Combining GPT with Business, Basic Capabilities: Voice

To get closer to the Jarvis we have in mind, we need voice interaction.

from pathlib import Path
import gradio as gr
import openai
from gtts import gTTS
from tempfile import NamedTemporaryFile
from playsound import playsound
from agent import getScheduleAgent # type: ignore

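def transcribe(audio):
    # Gradio hands us the recorded audio as a file path
    myfile = Path(audio)
    myfile = myfile.rename(myfile.with_suffix('.wav'))
    # Convert speech to text through the OpenAI API (whisper-1)
    with open(myfile, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    # Call the business logic; here we reuse the schedule agent from the earlier scenario
    response = getScheduleAgent(transcript["text"])  # type: ignore

    print(response)
    # Convert the result from text to speech, then play it back
    with NamedTemporaryFile(suffix=".mp3") as voice:
        gTTS(text=response, lang="zh", slow=False).write_to_fp(voice)
        voice.flush()  # make sure the audio bytes are on disk before playback
        playsound(voice.name)

    return response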

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=None)

demo.launch()
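
The flow above has three steps: speech to text, the business call, and text to speech. Below is a minimal sketch of the same flow with each step passed in as a plain callable, which makes it easier to swap a library (for example when one fails on your Python version) or to try the flow without a microphone. The names `VoicePipeline`, `speech_to_text`, `answer` and `text_to_speech` are hypothetical and are not part of Gradio, OpenAI, or gTTS; the stand-in lambdas only imitate the real calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wrapper: audio path -> text -> reply -> spoken reply."""
    speech_to_text: Callable[[str], str]   # e.g. a Whisper call
    answer: Callable[[str], str]           # e.g. the schedule agent
    text_to_speech: Callable[[str], None]  # e.g. gTTS + playsound

    def run(self, audio_path: str) -> str:
        text = self.speech_to_text(audio_path)
        reply = self.answer(text)
        self.text_to_speech(reply)
        return reply


# Stand-in steps so the sketch runs without a microphone or network access.
spoken = []
pipeline = VoicePipeline(
    speech_to_text=lambda path: "what is on my schedule today",
    answer=lambda text: f"You asked: {text}",
    text_to_speech=spoken.append,
)
reply = pipeline.run("recording.wav")
print(reply)
```

In the real app, `pipeline.run` would take the place of the body of `transcribe`, with the three callables wrapping the Whisper call, `getScheduleAgent`, and the gTTS playback.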

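I tried a few libraries here. One of them was pyttsx3, but it did not work with my current Python 3 version and raised errors.
So I dropped it and called the OpenAI API directly instead.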

The example on the official LangChain site: https://python.langchain.com/en/latest/use_cases/chatbots/voice_assistant.html

It uses pyttsx3 and speech_recognition.
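
For comparison, here is a rough sketch of how those two libraries are commonly used for one listen/speak round trip. It is not copied from the LangChain page. The function names `listen_once` and `speak`, and the optional `recognizer`, `microphone` and `engine` parameters, are hypothetical hooks added so the functions can be tried with stand-ins. The real libraries are imported lazily, so the block loads even if they are not installed; `sr.Microphone()` is assumed to need PyAudio and a working input device.

```python
def listen_once(recognizer=None, microphone=None, language="zh-CN"):
    """Record one utterance and return the recognized text."""
    if recognizer is None:
        import speech_recognition as sr  # lazy import
        recognizer = sr.Recognizer()
    if microphone is None:
        import speech_recognition as sr  # lazy import
        microphone = sr.Microphone()
    with microphone as source:
        audio = recognizer.listen(source)
    # recognize_google sends the audio to Google's web speech API
    return recognizer.recognize_google(audio, language=language)


def speak(text, engine=None):
    """Read the text aloud with an offline TTS engine."""
    if engine is None:
        import pyttsx3  # lazy import
        engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

With both libraries installed, `speak(getScheduleAgent(listen_once()))` would be the rough equivalent of the `transcribe` function above, without Gradio.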