Setup Whisper Service
We need to set up a Whisper service to transcribe audio to text, so the content of an audio file can be used as context in the later conversation.
We use the onerahmet/openai-whisper-asr-webservice:latest-gpu
image to set up the Whisper service; refer to this project for details. You can also use onerahmet/openai-whisper-asr-webservice:latest
if you don't have GPU resources, but performance will be lower.
Run the service on Kubernetes in the kubeagi-system namespace.
# create the deployment
kubectl create deploy whisper-apiserver --image=onerahmet/openai-whisper-asr-webservice:latest-gpu -n kubeagi-system
# create the service
kubectl expose deploy whisper-apiserver --port=9000 -n kubeagi-system
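Note that kubectl create deploy does not request a GPU for the pod. If your cluster exposes GPUs through the NVIDIA device plugin (an assumption about your environment), a quick way to add a GPU limit and confirm the workload is running looks like this:
# add a GPU limit to the container (assumes the nvidia.com/gpu resource is available in your cluster)
kubectl patch deploy whisper-apiserver -n kubeagi-system --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/resources","value":{"limits":{"nvidia.com/gpu":1}}}]'
# check that the pod and service are up
kubectl get pods -n kubeagi-system -l app=whisper-apiserver
kubectl get svc whisper-apiserver -n kubeagi-system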
Then the service will be available at the endpoint whisper-apiserver.kubeagi-system:9000/asr, which is the default address the Whisper document loader will use to transcribe audio to text.
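You can sanity-check the endpoint before wiring it into an application. The sketch below assumes the /asr endpoint accepts a multipart audio_file field and an output query parameter, as in the upstream openai-whisper-asr-webservice project; verify the parameters against that project's docs.
# forward the service locally and transcribe a sample file
kubectl port-forward svc/whisper-apiserver 9000:9000 -n kubeagi-system &
curl -s -F "audio_file=@sample.wav" "http://localhost:9000/asr?task=transcribe&output=json"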
Advanced Configuration
It uses openai_whisper by default, but you can switch to faster_whisper with environment variables like below:
ASR_ENGINE=faster_whisper
ASR_MODEL=/data/models/whisper/large-v3 # mount /data/models/whisper/ to /tmp/whisper in the container
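One way to apply these settings to the deployment created above is kubectl set env:
# switch the ASR engine and point it at the mounted model directory
kubectl set env deploy/whisper-apiserver -n kubeagi-system \
  ASR_ENGINE=faster_whisper \
  ASR_MODEL=/data/models/whisper/large-v3
The model directory itself still has to be mounted into the pod (for example with a hostPath or PVC volume on the deployment), matching the mount described in the comment above.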
Get the models from Hugging Face.
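If you want to pre-download the faster_whisper weights into that directory, one option is huggingface-cli; the Systran/faster-whisper-large-v3 repository name below is an assumption about which large-v3 conversion you need, so substitute the model you actually use.
# fetch the large-v3 weights into the directory referenced by ASR_MODEL (repository name is an example)
pip install -U "huggingface_hub[cli]"
huggingface-cli download Systran/faster-whisper-large-v3 --local-dir /data/models/whisper/large-v3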
When you use the DocumentLoader in your application and pass .mp3 or .wav files when calling the application, the Whisper document loader will call this API to extract text from the audio. Refer to DocumentLoader for details.