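Hello, I installed the FunAudioLLM-APP environment under WSL2 on Windows 11 and ran app.py in voice_chat, which worked. I then copied the text-to-speech and speech-to-text functions out of app.py into a server.py and tried to rewrite them as FastAPI or Flask endpoints. The following problems appeared:
- Resource usage: memory usage climbs to 100%. Once memory is full, disk usage maxes out as well, and once the disk is saturated VS Code loses its connection to WSL and the computer becomes very slow to respond. GPU usage does not rise at any point during this, even though I have set "CUDA_VISIBLE_DEVICES" = "0".
- I added some debugging output: I inserted traceback.print_stack() at line 52 of FunAudioLLM-APP/cosyvoice/cosyvoice/cli/frontend.py and then ran my script. The method turns out to be called twice. The first call is the normal execution; the second comes from a process that is started somewhere I cannot identify and that also reaches this point.
The log is as follows: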
(funaudio) hygx@hygx:~/code/funaudiollm-app/FunAudioLLM-APP/voice_chat$ python server.py
2025-02-18 10:24:57,118 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2025-02-18 10:24:57,118 - modelscope - INFO - Loading ast index from /home/hygx/.cache/modelscope/ast_indexer
2025-02-18 10:24:57,264 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 20c375857ea53f3bb9f252cbf4c9cf58 and a total number of 980 components indexed
transformer is not installed, please install it if you want to use related modules
/home/hygx/anaconda3/envs/funaudio/lib/python3.8/site-packages/torch/_jit_internal.py:726: FutureWarning: ignore(True) has been deprecated. TorchScript will now drop the function call on compilation. Use torch.jit.unused now. {}
warnings.warn(
/home/hygx/anaconda3/envs/funaudio/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-02-18 10:25:03.834324912 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-02-18 10:25:03.834361929 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
00000000000000000000000000000000000000000000000000000000000000000000
111111111111111111111111111
File "server.py", line 52, in <module>
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
File "/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat/../cosyvoice/cosyvoice/cli/cosyvoice.py", line 30, in __init__
self.frontend = CosyVoiceFrontEnd(configs['get_tokenizer'],
File "/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat/../cosyvoice/cosyvoice/cli/frontend.py", line 54, in __init__
traceback.print_stack()
load leagacy transf breakmodel
load leagacy transf breakmodel
text.cc: festival_Text_init
open voice lang map failed
break model index not valid
Loading remote code successfully: ./sensevoice/model.py
2025-02-18 10:25:11,448 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2025-02-18 10:25:11,448 - modelscope - INFO - Use user-specified model revision: master
INFO: Will watch for changes in these directories: ['/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat']
INFO: Uvicorn running on http://0.0.0.0:5987 (Press CTRL+C to quit)
INFO: Started reloader process [338184] using WatchFiles
2025-02-18 10:25:13,625 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2025-02-18 10:25:13,625 - modelscope - INFO - Loading ast index from /home/hygx/.cache/modelscope/ast_indexer
2025-02-18 10:25:13,685 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 20c375857ea53f3bb9f252cbf4c9cf58 and a total number of 980 components indexed
transformer is not installed, please install it if you want to use related modules
/home/hygx/anaconda3/envs/funaudio/lib/python3.8/site-packages/torch/_jit_internal.py:726: FutureWarning: ignore(True) has been deprecated. TorchScript will now drop the function call on compilation. Use torch.jit.unused now. {}
warnings.warn(
/home/hygx/anaconda3/envs/funaudio/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-02-18 10:25:20.475357737 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-02-18 10:25:20.475410315 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
00000000000000000000000000000000000000000000000000000000000000000000
111111111111111111111111111
File "<string>", line 1, in <module>
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/hygx/anaconda3/envs/funaudio/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat/server.py", line 52, in <module>
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
File "/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat/../cosyvoice/cosyvoice/cli/cosyvoice.py", line 30, in __init__
self.frontend = CosyVoiceFrontEnd(configs['get_tokenizer'],
File "/home/hygx/code/funaudiollm-app/FunAudioLLM-APP/voice_chat/../cosyvoice/cosyvoice/cli/frontend.py", line 54, in __init__
traceback.print_stack()
load leagacy transf breakmodel
load leagacy transf breakmodel
text.cc: festival_Text_init
open voice lang map failed
break model index not valid
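A possible reading of the second stack trace, offered as a sketch rather than a confirmed diagnosis: the log shows uvicorn starting a reloader process, and the second trace enters through multiprocessing/spawn.py. Under the spawn start method a child process re-runs the parent's main script as `__mp_main__`, so any module-level code (here, the model loading in server.py) executes again in the child. The stdlib-only demo below reproduces that behaviour; `spawn_demo.py`, `DEMO` and `run_demo` are hypothetical names, and the print stands in for the model loading.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# A tiny stand-in for server.py: module-level work plus a spawned child process.
DEMO = textwrap.dedent('''
    import multiprocessing as mp

    # Stands in for the expensive module-level model loading.
    print(f"module-level code running as {__name__}", flush=True)

    def child():
        pass

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=child)
        p.start()
        p.join()
''')


def run_demo():
    """Run the demo script in a fresh interpreter and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "spawn_demo.py"
        script.write_text(DEMO)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The module-level print appears twice: once under `__main__` in the parent and once under `__mp_main__` in the spawned child. In server.py the child additionally imports the app through the string "server:app", which may run the module-level code yet again under the name `server`.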
My code:
import re
import torch
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
from fastapi import FastAPI, File, UploadFile
from typing import Optional
from fastapi.responses import StreamingResponse, FileResponse, JSONResponse
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
from http import HTTPStatus
import dashscope
from dashscope import Generation
from dashscope.api_entities.dashscope_response import Role
from typing import List, Optional, Tuple, Dict
from uuid import uuid4
from modelscope import HubApi
import torchaudio
import sys
sys.path.insert(1, "../cosyvoice")
sys.path.insert(1, "../sensevoice")
sys.path.insert(1, "../cosyvoice/third_party/AcademiCodec")
sys.path.insert(1, "../cosyvoice/third_party/Matcha-TTS")
sys.path.insert(1, "../")
from utils.rich_format_small import format_str_v2
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
from funasr import AutoModel
app = FastAPI()
class TextToSpeechRequest(BaseModel):
    text: str
REPLY_FILES_DIR = "reply"
TMP_FILES_DIR = "tmp"
os.makedirs(REPLY_FILES_DIR, exist_ok=True)
os.makedirs(TMP_FILES_DIR, exist_ok=True)
dashscope.api_key = "sk-xxxxxx"
speaker_name = '中文女'
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
sense_voice_model = AutoModel(model="iic/SenseVoiceSmall",
                              vad_model="fsmn-vad",
                              vad_kwargs={"max_single_segment_time": 30000},
                              trust_remote_code=True, device="cuda:0", remote_code="./sensevoice/model.py")
def text_to_speech(text):
    # Expected input: "生成风格: <style>;播报内容: <content>" (generation style; content to speak)
    pattern = r"生成风格:\s*([^;]+);播报内容:\s*(.+)"
    match = re.search(pattern, text)
    if match:
        style = match.group(1).strip()
        content = match.group(2).strip()
        tts_text = f"{style}<endofprompt>{content}"
        print(f"生成风格: {style}")
        print(f"播报内容: {content}")
    else:
        print("No match found")
        tts_text = text
    # text_list = preprocess(text)
    text_list = [tts_text]
    for i in text_list:
        output = cosyvoice.inference_sft(i, speaker_name)
        yield (22050, output['tts_speech'].numpy().flatten())
def asr(file_path):
    '''file_path may be a URL or a local path to a wav file.'''
    # samplerate, data = audio
    # file_path = f"./tmp/asr_{uuid4()}.wav"
    # torchaudio.save(file_path, torch.from_numpy(data).unsqueeze(0), samplerate)
    res = sense_voice_model.generate(
        input=file_path,
        cache={},
        language="zh",
        text_norm="woitn",
        batch_size_s=0,
        batch_size=1,
        hotword='华衣共享'
    )
    return res[0]['text']
@app.post("/tts/")
async def tts(request: TextToSpeechRequest):
    """
    Convert text to audio, save it as a WAV file,
    and return the audio file.
    """
    print(11111111111111111111111111111111111)
    text = request.text.strip()
    if not text:
        raise HTTPException(status_code=400, detail="Text must not be empty")
    # text_to_speech is a generator; take its first (and only) chunk.
    sample_rate, speech_data = next(text_to_speech(text))
    # Build the path where the audio file is saved
    file_name = "tts_tmp_file.wav"
    file_path = os.path.join(os.getcwd(), REPLY_FILES_DIR, file_name)
    print(f'file path is:{file_path}')
    # Write the audio to the file; torchaudio.save expects a 2-D tensor
    # (channels, frames), not a flat NumPy array.
    torchaudio.save(file_path, torch.from_numpy(speech_data).unsqueeze(0), sample_rate)
    # Make sure the file exists
    if not os.path.exists(file_path):
        raise HTTPException(status_code=500, detail="Audio generation failed")
    # Return the audio file
    return FileResponse(file_path, media_type="audio/wav", filename="output.wav")
@app.post("/asr/")
async def funaudio_asr(file: UploadFile = File(...)):
    print(11111111111111111111111111111111111)
    # Save the uploaded audio to a temporary file
    temp_audio_path = "temp_audio.wav"
    with open(temp_audio_path, "wb") as temp_audio_file:
        temp_audio_file.write(file.file.read())
    text = asr(temp_audio_path)
    # Delete the temporary file
    os.remove(temp_audio_path)
    # Return the recognition result
    return JSONResponse(content={"text": text}, status_code=200)
if __name__ == "__main__":
    # uvicorn.run("server:app", host="0.0.0.0", port=7999, reload=True, ssl_keyfile="./key.pem", ssl_certfile="./cert.pem")
    uvicorn.run("server:app", host="0.0.0.0", port=5987, reload=True)
To reproduce: place this script at FunAudioLLM-APP/voice_chat/server.py, fill in the api_key, and run python server.py. The bug then reproduces directly.