Official repository
https://github.com/suno-ai/bark
Environment setup
For speed and convenience, this walkthrough uses the official Colab notebook.
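What is Google Colab?
Google Colab is a free cloud development environment from Google that runs directly in the browser. It is built on Jupyter Notebook and provides an interactive environment for Python programming, data analysis, machine learning, and similar tasks. Colab gives free access to cloud GPU and TPU resources, which is very useful for deep learning workloads that need a lot of compute. It also integrates with Google Drive, which simplifies file management and sharing.
Because Colab is cloud-based, notebooks can be saved and shared at any time, which makes collaboration easier.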
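Open the notebook and sign in with a Google account before using it.
Once signed in, click the run button to the left of the Install cell to install the required dependencies. This step takes a fairly long time.
When installation finishes, scroll down to the Basic section and run it to import the dependencies and preload the models.
At this point the environment is fully set up and ready to use.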
Usage
As above, add a new code cell, write the code, and run it.
Basic demo
With no speaker specified, a voice is chosen at random, so the result differs on every run.
text_prompt = """
大家好,才是真的好。
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)
Adding special sound effects
text_prompt = """
[clears throat]大家好,才是真的好。[laughs]
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)
The available sound-effect tags are:
- [laughter]
- [laughs]
- [sighs]
- [music]
- [gasps]
- [clears throat]
- — or … for hesitations
- ♪ for song lyrics
- capitalization for emphasis of a word
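Because a mistyped tag is easy to miss in a long prompt, a small pre-check can help. The sketch below is not part of Bark: `KNOWN_TAGS` and `find_unknown_tags` are hypothetical names, and the tag set is simply copied from the list above. Bark may accept other tags, so treat the result as a hint rather than a hard error.

```python
import re

# Bracketed tags copied from the list above (assumption: the list is not exhaustive).
KNOWN_TAGS = {"laughter", "laughs", "sighs", "music", "gasps", "clears throat"}


def find_unknown_tags(text_prompt):
    """Return the bracketed tags in the prompt that are not in KNOWN_TAGS."""
    tags = re.findall(r"\[([^\[\]]+)\]", text_prompt)
    return [tag for tag in tags if tag.strip().lower() not in KNOWN_TAGS]


# "[laugh]" is a typo for "[laughs]", so it is reported.
print(find_unknown_tags("[clears throat] hello everyone [laugh]"))  # ['laugh']
```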
Dialogue between a man and a woman
text_prompt = """
WOMAN: 早上好,吃早饭了吗?
MAN: 吃了,吃了俩油条和一个鸡蛋。
"""
audio_array = generate_audio(text_prompt)
Audio(audio_array, rate=SAMPLE_RATE)
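For longer conversations it can be tidier to assemble the prompt from data. The following is a minimal sketch that reproduces the `WOMAN:` / `MAN:` layout used above; `build_dialogue` is a hypothetical helper, not a Bark function, and the resulting string would be passed to `generate_audio` exactly as in the example.

```python
def build_dialogue(turns):
    """Join (speaker, line) pairs into one prompt, one 'SPEAKER: line' row per turn."""
    rows = [f"{speaker.upper()}: {line.strip()}" for speaker, line in turns]
    # Leading and trailing newlines mirror the triple-quoted prompts above.
    return "\n" + "\n".join(rows) + "\n"


text_prompt = build_dialogue([
    ("woman", "Good morning, have you had breakfast?"),
    ("man", "Yes, two fried dough sticks and an egg."),
])
print(text_prompt)
```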
Specifying the language and voice
text_prompt = """
大家好,才是真的好。
"""
audio_array = generate_audio(text_prompt, history_prompt="zh_speaker_2")
Audio(audio_array, rate=SAMPLE_RATE)
The available voices are:
en_speaker_0
en_speaker_1
en_speaker_2
en_speaker_3
en_speaker_4
en_speaker_5
en_speaker_6
en_speaker_7
en_speaker_8
en_speaker_9
de_speaker_0
de_speaker_1
de_speaker_2
de_speaker_3
de_speaker_4
de_speaker_5
de_speaker_6
de_speaker_7
de_speaker_8
de_speaker_9
es_speaker_0
es_speaker_1
es_speaker_2
es_speaker_3
es_speaker_4
es_speaker_5
es_speaker_6
es_speaker_7
es_speaker_8
es_speaker_9
fr_speaker_0
fr_speaker_1
fr_speaker_2
fr_speaker_3
fr_speaker_4
fr_speaker_5
fr_speaker_6
fr_speaker_7
fr_speaker_8
fr_speaker_9
hi_speaker_0
hi_speaker_1
hi_speaker_2
hi_speaker_3
hi_speaker_4
hi_speaker_5
hi_speaker_6
hi_speaker_7
hi_speaker_8
hi_speaker_9
it_speaker_0
it_speaker_1
it_speaker_2
it_speaker_3
it_speaker_4
it_speaker_5
it_speaker_6
it_speaker_7
it_speaker_8
it_speaker_9
ja_speaker_0
ja_speaker_1
ja_speaker_2
ja_speaker_3
ja_speaker_4
ja_speaker_5
ja_speaker_6
ja_speaker_7
ja_speaker_8
ja_speaker_9
ko_speaker_0
ko_speaker_1
ko_speaker_2
ko_speaker_3
ko_speaker_4
ko_speaker_5
ko_speaker_6
ko_speaker_7
ko_speaker_8
ko_speaker_9
pl_speaker_0
pl_speaker_1
pl_speaker_2
pl_speaker_3
pl_speaker_4
pl_speaker_5
pl_speaker_6
pl_speaker_7
pl_speaker_8
pl_speaker_9
pt_speaker_0
pt_speaker_1
pt_speaker_2
pt_speaker_3
pt_speaker_4
pt_speaker_5
pt_speaker_6
pt_speaker_7
pt_speaker_8
pt_speaker_9
ru_speaker_0
ru_speaker_1
ru_speaker_2
ru_speaker_3
ru_speaker_4
ru_speaker_5
ru_speaker_6
ru_speaker_7
ru_speaker_8
ru_speaker_9
tr_speaker_0
tr_speaker_1
tr_speaker_2
tr_speaker_3
tr_speaker_4
tr_speaker_5
tr_speaker_6
tr_speaker_7
tr_speaker_8
tr_speaker_9
zh_speaker_0
zh_speaker_1
zh_speaker_2
zh_speaker_3
zh_speaker_4
zh_speaker_5
zh_speaker_6
zh_speaker_7
zh_speaker_8
zh_speaker_9
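The names above follow a regular `<language>_speaker_<index>` pattern, so they can be generated rather than typed. This sketch assumes only what the list shows: 13 language codes with voices 0 through 9 each. `LANGUAGES` and `speaker_names` are illustrative names, not part of Bark.

```python
# Language codes exactly as they appear in the list above.
LANGUAGES = ["en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh"]
VOICES_PER_LANGUAGE = 10


def speaker_names(language=None):
    """Return all voice names, or only those for one language code."""
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    langs = [language] if language is not None else LANGUAGES
    return [f"{lang}_speaker_{i}" for lang in langs for i in range(VOICES_PER_LANGUAGE)]


print(len(speaker_names()))      # 130
print(speaker_names("zh")[:3])   # ['zh_speaker_0', 'zh_speaker_1', 'zh_speaker_2']
```

Any of these strings can be passed as `history_prompt`, as in the `zh_speaker_2` example above.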
Complete code
Install dependencies with pip
# install bark as well as pytorch nightly to get blazing fast flash-attention
!pip install git+https://github.com/suno-ai/bark.git && \
pip uninstall -y torch torchvision torchaudio && \
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
Code
from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
preload_models()
text_prompt = """
大家好,我是初始安全公众号,我的博客是https://blog.gm7.org/,主要是用于构建自己的知识库。
不忘初心,方得始终。
纸上得来终觉浅,绝知此事要躬行。
"""
audio_array = generate_audio(text_prompt, history_prompt="zh_speaker_1")
Audio(audio_array, rate=SAMPLE_RATE)
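`Audio(...)` only plays the clip inside the notebook. To keep the result, one option is to write it to a WAV file. The sketch below uses only the Python standard library and assumes the audio is a mono sequence of floats in the range [-1, 1] at 24 kHz (in the notebook, pass `SAMPLE_RATE` instead of relying on the default). `save_wav` is a hypothetical helper, and a synthetic tone stands in for `audio_array` so the example runs without Bark.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for audio_array: one second of a 440 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
out_path = os.path.join(tempfile.gettempdir(), "bark_demo.wav")
save_wav(out_path, tone)
```

In the notebook the call would look like `save_wav("bark_demo.wav", audio_array, SAMPLE_RATE)`.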
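Ideas
Get a reply from ChatGPT, then use bark to turn that reply into expressive speech.
That said, bark still seems to have some trouble with Chinese at times, so this needs more investigation when time allows.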
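One way this idea might be wired together is sketched below. It is an assumption-laden outline, not a tested integration: `get_reply` and `synthesize` are hypothetical callables supplied by the caller, and splitting the reply into sentences so that each generation stays short is a design choice made here, not something the walkthrough above verifies. Stubs replace both services so the sketch runs on its own.

```python
import re


def split_sentences(text):
    """Split on Chinese or Western sentence endings, keeping the punctuation."""
    parts = re.findall(r"[^。!?!?\n]+[。!?!?]?", text)
    return [part.strip() for part in parts if part.strip()]


def speak_reply(question, get_reply, synthesize):
    """Fetch a text reply, then synthesize it one sentence at a time."""
    reply = get_reply(question)
    return [synthesize(sentence) for sentence in split_sentences(reply)]


# Stubs standing in for the chat model and for bark (both hypothetical).
def fake_reply(question):
    return "你好。今天天气不错!"


def fake_synthesize(sentence):
    return f"<audio:{sentence}>"


clips = speak_reply("Say hello", fake_reply, fake_synthesize)
print(clips)  # ['<audio:你好。>', '<audio:今天天气不错!>']
```

In the notebook, `synthesize` could be `lambda s: generate_audio(s, history_prompt="zh_speaker_1")`, with `get_reply` wrapping whichever chat client is in use.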