GPT Combined with Basic Business Capabilities: Voice

To make it feel more like the Jarvis we have in mind, we need voice interaction.

from pathlib import Path
from tempfile import NamedTemporaryFile

import gradio as gr
import openai
from gtts import gTTS
from playsound import playsound

from agent import getScheduleAgent  # type: ignore


def transcribe(audio):
    # The audio recorded through Gradio arrives as a file path;
    # rename it so that it carries a .wav suffix
    myfile = Path(audio)
    myfile = myfile.rename(myfile.with_suffix('.wav'))
    # Convert the speech to text through the OpenAI API
    with open(myfile, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    # Call the business logic; here we reuse the earlier schedule scenario
    response = getScheduleAgent(transcript["text"])  # type: ignore

    print(response)
    # Convert the result from text to speech, then play it back
    with NamedTemporaryFile(suffix=".mp3") as voice:
        gTTS(text=response, lang="zh", slow=False).write_to_fp(voice)
        voice.flush()  # make sure the audio is on disk before playback
        playsound(voice.name)

    return response


demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=None)

demo.launch()
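
The flow above is three stages chained together: speech to text, the business agent, then text to speech. As a rough sketch (not part of the original script), the same round trip can be written with the stages passed in as plain functions, which makes it possible to try the flow without a microphone or an API key. The name `run_voice_turn` and the stand-in lambdas are hypothetical and only for illustration.

```python
from typing import Callable


def run_voice_turn(
    audio_path: str,
    speech_to_text: Callable[[str], str],
    agent: Callable[[str], str],
    text_to_speech: Callable[[str], None],
) -> str:
    """Run one voice round trip: audio file -> text -> agent reply -> spoken reply."""
    question = speech_to_text(audio_path)  # e.g. a Whisper call in the real script
    reply = agent(question)                # e.g. getScheduleAgent in the real script
    text_to_speech(reply)                  # e.g. gTTS + playsound in the real script
    return reply


# Stand-in stages (hypothetical) so the flow runs offline.
spoken = []
reply = run_voice_turn(
    "question.wav",
    speech_to_text=lambda path: "what is on my schedule today",
    agent=lambda text: f"You asked: {text}",
    text_to_speech=spoken.append,
)
print(reply)
```

In the real script each stand-in would be replaced by the corresponding call from `transcribe`; keeping the stages separate also makes it easier to swap one speech library for another.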

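I tried several libraries here. One of them was pyttsx3, which turned out not to match my current Python 3 version and raised errors.
So I gave up on it and called OpenAI's capabilities directly.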
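
When a library may or may not work on a given Python version, one option is to probe for it at startup and fall back to another. The following is a minimal sketch under that assumption; the helper name `first_usable_module` and the preference order are hypothetical, and a clean import does not guarantee that the library will also initialize correctly at runtime.

```python
import importlib
from typing import Iterable, Optional


def first_usable_module(names: Iterable[str]) -> Optional[str]:
    """Return the first module name that imports cleanly, or None if none do."""
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Covers both "not installed" and "installed but fails to import
            # on this Python version".
            continue
        return name
    return None


# Preference order is an assumption: try the offline engine first, then gTTS.
backend = first_usable_module(["pyttsx3", "gtts"])
print(backend)
```

If `backend` comes back as `None`, neither library could be imported in the current environment.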

The example on the LangChain website, https://python.langchain.com/en/latest/use_cases/chatbots/voice_assistant.html,

uses pyttsx3 and speech_recognition.