Synthesizing Speech in an iOS App with AVSpeechSynthesizer (Mandarin, English, Cantonese, and More)

Swift, iOS

 

Contents

  • Introduction
  • Apple AVSpeechSynthesizer
  • Google Text-to-Speech
  • Summary

 

Introduction

 

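Ficow built an iOS app of his own, 小粤粤-学粤语, for learning Cantonese. If you want to learn Cantonese, consider giving it a try~ 🤪 Feedback is always welcome!

[Screenshot: the user interface of the 小粤粤-学粤语 App as shown on the App Store]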

 

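Back to the topic. Ficow used to keep the audio that the app plays in Alibaba Cloud object storage. Every time audio had to be played, the app downloaded the file from the object storage server. However, the quality of those audio files was not great, and the recordings did not lend themselves well to playback-speed adjustment. After several rounds of research, Ficow decided to synthesize the audio inside the app.

Since the audio is going to be synthesized, the next question is which tool to use!

So far, the better speech synthesis tools Ficow has found are:

  • AVSpeechSynthesizer, officially supported by Apple
    • Pros: officially supported, free, simple to use, supports adjusting speed and pitch;
    • Cons: the synthesized audio is middling (it still sounds like a robot reading a script);
  • Google's Text-to-Speech
    • Pros: excellent synthesis quality, a generous free quota, supports adjusting speed and pitch;
    • Cons: blocked in mainland China; using google.cn might be a workaround

Ficow went with AVSpeechSynthesizer, hence the title of this post. The Text-to-Speech option is included for reference only~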

 

Apple AVSpeechSynthesizer

 

Without further ado, here is the code:

import AVFoundation

final class CantoneseSpeechSynthesizer: NSObject {

    private lazy var synthesizer: AVSpeechSynthesizer = {
        let speechSynthesizer = AVSpeechSynthesizer()
        speechSynthesizer.delegate = self
        do {
            // Use the playback category so speech is audible even in silent mode
            try AVAudioSession.sharedInstance().setCategory(.playback)
            try AVAudioSession.sharedInstance().setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print(error)
        }
        return speechSynthesizer
    }()

    func playCantoneseText(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "zh-HK") // Set the language; if unset, the iOS system language is used
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate // Valid range: [0, 1]; the default rate is 0.5
        synthesizer.speak(utterance)
    }
}

// Observe AVSpeechSynthesizer events in the delegate methods
extension CantoneseSpeechSynthesizer: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("didStart")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("didFinish")
    }
}
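The class above hard-codes Cantonese and the default rate. Since the same API also covers Mandarin and English and supports changing speed and pitch, here is a minimal sketch of a more general helper. The function name `makeUtterance` and its parameters are hypothetical and not part of the original app; the clamping bounds for pitch (0.5 to 2.0) follow the range Apple documents for `pitchMultiplier`.

```swift
import AVFoundation

/// Hypothetical helper: builds an utterance for any language code,
/// clamping rate and pitch to the ranges AVFoundation accepts.
func makeUtterance(text: String,
                   languageCode: String,
                   rate: Float = AVSpeechUtteranceDefaultSpeechRate,
                   pitch: Float = 1.0) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    // The initializer returns nil when no voice matches the language code;
    // a nil voice makes the synthesizer fall back to the system default voice.
    utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
    // Keep the rate between the minimum and maximum constants.
    utterance.rate = min(max(rate, AVSpeechUtteranceMinimumSpeechRate),
                         AVSpeechUtteranceMaximumSpeechRate)
    // pitchMultiplier is documented as 0.5 (lower) to 2.0 (higher); 1.0 is the default.
    utterance.pitchMultiplier = min(max(pitch, 0.5), 2.0)
    return utterance
}
```

With a synthesizer instance in hand, a call such as `synthesizer.speak(makeUtterance(text: "Good morning", languageCode: "en-US", rate: 0.4, pitch: 1.2))` would speak English slightly slower and higher than the defaults; passing "zh-HK" or "zh-CN" switches to Cantonese or Mandarin.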

To see which languages AVSpeechSynthesisVoice supports, you can print the list of available voices:

print(AVSpeechSynthesisVoice.speechVoices())

The output looks like this:

[[AVSpeechSynthesisVoice 0x600003d80820] Language: ar-SA, Name: Maged, Quality: Default [com.apple.ttsbundle.Maged-compact], [AVSpeechSynthesisVoice 0x600003d80da0] Language: cs-CZ, Name: Zuzana, Quality: Default [com.apple.ttsbundle.Zuzana-compact], [AVSpeechSynthesisVoice 0x600003d80b20] Language: da-DK, Name: Sara, Quality: Default [com.apple.ttsbundle.Sara-compact], [AVSpeechSynthesisVoice 0x600003d80460] Language: de-DE, Name: Anna, Quality: Default [com.apple.ttsbundle.Anna-compact], [AVSpeechSynthesisVoice 0x600003d808e0] Language: el-GR, Name: Melina, Quality: Default [com.apple.ttsbundle.Melina-compact], [AVSpeechSynthesisVoice 0x600003d80660] Language: en-AU, Name: Karen, Quality: Default [com.apple.ttsbundle.Karen-compact], [AVSpeechSynthesisVoice 0x600003d80520] Language: en-GB, Name: Daniel, Quality: Default [com.apple.ttsbundle.Daniel-compact], [AVSpeechSynthesisVoice 0x600003d80960] Language: en-IE, Name: Moira, Quality: Default [com.apple.ttsbundle.Moira-compact], [AVSpeechSynthesisVoice 0x600003d80aa0] Language: en-IN, Name: Rishi, Quality: Default [com.apple.ttsbundle.Rishi-compact], [AVSpeechSynthesisVoice 0x600003d80ae0] Language: en-US, Name: Samantha, Quality: Default [com.apple.ttsbundle.Samantha-compact], [AVSpeechSynthesisVoice 0x600003d80be0] Language: en-ZA, Name: Tessa, Quality: Default [com.apple.ttsbundle.Tessa-compact], [AVSpeechSynthesisVoice 0x600003d809a0] Language: es-ES, Name: Mónica, Quality: Default [com.apple.ttsbundle.Monica-compact], [AVSpeechSynthesisVoice 0x600003d80a60] Language: es-MX, Name: Paulina, Quality: Default [com.apple.ttsbundle.Paulina-compact], [AVSpeechSynthesisVoice 0x600003d80b60] Language: fi-FI, Name: Satu, Quality: Default [com.apple.ttsbundle.Satu-compact], [AVSpeechSynthesisVoice 0x600003d80420] Language: fr-CA, Name: Amélie, Quality: Default [com.apple.ttsbundle.Amelie-compact], [AVSpeechSynthesisVoice 0x600003d80c20] Language: fr-FR, Name: Thomas, Quality: Default 
[com.apple.ttsbundle.Thomas-compact], [AVSpeechSynthesisVoice 0x600003d804a0] Language: he-IL, Name: Carmit, Quality: Default [com.apple.ttsbundle.Carmit-compact], [AVSpeechSynthesisVoice 0x600003d807a0] Language: hi-IN, Name: Lekha, Quality: Default [com.apple.ttsbundle.Lekha-compact], [AVSpeechSynthesisVoice 0x600003d80860] Language: hu-HU, Name: Mariska, Quality: Default [com.apple.ttsbundle.Mariska-compact], [AVSpeechSynthesisVoice 0x600003d804e0] Language: id-ID, Name: Damayanti, Quality: Default [com.apple.ttsbundle.Damayanti-compact], [AVSpeechSynthesisVoice 0x600003d80360] Language: it-IT, Name: Alice, Quality: Default [com.apple.ttsbundle.Alice-compact], [AVSpeechSynthesisVoice 0x600003d806a0] Language: ja-JP, Name: Kyoko, Quality: Default [com.apple.ttsbundle.Kyoko-compact], [AVSpeechSynthesisVoice 0x600003d80d20] Language: ko-KR, Name: Yuna, Quality: Default [com.apple.ttsbundle.Yuna-compact], [AVSpeechSynthesisVoice 0x600003d80560] Language: nl-BE, Name: Ellen, Quality: Default [com.apple.ttsbundle.Ellen-compact], [AVSpeechSynthesisVoice 0x600003d80ca0] Language: nl-NL, Name: Xander, Quality: Default [com.apple.ttsbundle.Xander-compact], [AVSpeechSynthesisVoice 0x600003d80a20] Language: no-NO, Name: Nora, Quality: Default [com.apple.ttsbundle.Nora-compact], [AVSpeechSynthesisVoice 0x600003d80d60] Language: pl-PL, Name: Zosia, Quality: Default [com.apple.ttsbundle.Zosia-compact], [AVSpeechSynthesisVoice 0x600003d807e0] Language: pt-BR, Name: Luciana, Quality: Default [com.apple.ttsbundle.Luciana-compact], [AVSpeechSynthesisVoice 0x600003d805e0] Language: pt-PT, Name: Joana, Quality: Default [com.apple.ttsbundle.Joana-compact], [AVSpeechSynthesisVoice 0x600003d805a0] Language: ro-RO, Name: Ioana, Quality: Default [com.apple.ttsbundle.Ioana-compact], [AVSpeechSynthesisVoice 0x600003d80920] Language: ru-RU, Name: Milena, Quality: Default [com.apple.ttsbundle.Milena-compact], [AVSpeechSynthesisVoice 0x600003d80780] Language: sk-SK, Name: Laura, Quality: 
Default [com.apple.ttsbundle.Laura-compact], [AVSpeechSynthesisVoice 0x600003d803d0] Language: sv-SE, Name: Alva, Quality: Default [com.apple.ttsbundle.Alva-compact], [AVSpeechSynthesisVoice 0x600003d80620] Language: th-TH, Name: Kanya, Quality: Default [com.apple.ttsbundle.Kanya-compact], [AVSpeechSynthesisVoice 0x600003d80ce0] Language: tr-TR, Name: Yelda, Quality: Default [com.apple.ttsbundle.Yelda-compact], [AVSpeechSynthesisVoice 0x600003d80c60] Language: zh-CN, Name: Tian-Tian, Quality: Default [com.apple.ttsbundle.Ting-Ting-compact], [AVSpeechSynthesisVoice 0x600003d80ba0] Language: zh-HK, Name: Sin-Ji, Quality: Default [com.apple.ttsbundle.Sin-Ji-compact], [AVSpeechSynthesisVoice 0x600003d808a0] Language: zh-TW, Name: Mei-Jia, Quality: Default [com.apple.ttsbundle.Mei-Jia-compact]]

In the output, you can find the entry for zh-HK (Cantonese):

[AVSpeechSynthesisVoice 0x600003d80ba0] Language: zh-HK, Name: Sin-Ji, Quality: Default [com.apple.ttsbundle.Sin-Ji-compact]

There is also the entry for zh-CN (Mandarin Chinese):

[AVSpeechSynthesisVoice 0x600003d80c60] Language: zh-CN, Name: Tian-Tian, Quality: Default [com.apple.ttsbundle.Ting-Ting-compact]

And the entry for en-US (American English):

[AVSpeechSynthesisVoice 0x600003d80ae0] Language: en-US, Name: Samantha, Quality: Default [com.apple.ttsbundle.Samantha-compact]

As you can see, many languages are supported, so just pick the one you need~
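Rather than scanning the full printout by eye, the list can be filtered in code. The following is a small sketch; the function name `voices(forLanguage:)` is hypothetical, and the sort assumes that a higher `quality` raw value means a better voice (default below enhanced).

```swift
import AVFoundation

/// Hypothetical helper: returns the installed voices for one language code,
/// best quality first.
func voices(forLanguage languageCode: String) -> [AVSpeechSynthesisVoice] {
    AVSpeechSynthesisVoice.speechVoices()
        .filter { $0.language == languageCode }
        // Assumption: larger raw values correspond to higher-quality voices.
        .sorted { $0.quality.rawValue > $1.quality.rawValue }
}

// Print the name and identifier of every voice for the three languages in this post.
for code in ["zh-HK", "zh-CN", "en-US"] {
    let descriptions = voices(forLanguage: code).map { "\($0.name) (\($0.identifier))" }
    print(code, descriptions)
}
```

Which voices appear depends on the device and on what the user has downloaded, so the result can be empty for a given code. A specific voice can then be selected with `AVSpeechSynthesisVoice(identifier:)`.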

 

Google Text-to-Speech

 

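First, you can go to the Text-to-Speech page and listen to samples of the synthesized speech. Compared with Apple's AVSpeechSynthesizer, the result is indeed better, mainly because it sounds more fluent and more natural.

Google Text-to-Speech offers two types of voices: Basic and WaveNet. The former sounds fairly ordinary, while the latter relies on machine learning to achieve noticeably better results.

Good as it is, Ficow still suggests weighing the following factors before adopting it: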

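  • Price: this speech synthesis service is not free;
  • Whether the servers used by Text-to-Speech can be reached from mainland China;
  • If you put Text-to-Speech behind your own server and have that server expose a speech synthesis endpoint, you need to guard against others abusing your endpoint;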

For the second point, Ficow came up with the following possible solutions, for your reference only:

  • For mainland China, use google.cn to reach the API used by Google Text-to-Speech;
  • Set up a server outside mainland China and have that server call the API used by Google Text-to-Speech;
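Whichever route is chosen, the call itself is a JSON POST. Below is a rough sketch of how such a request might be built and its response decoded, assuming the public v1 REST endpoint `text:synthesize`, authentication with an API key in the `key` query parameter, and a Cantonese language code of `yue-HK`; all of these should be checked against Google's current documentation. The type and function names are hypothetical, and the `host` parameter is only there so that a relay server could be substituted.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Request body, assuming the v1 text:synthesize JSON schema.
struct SynthesisRequest: Encodable {
    struct Input: Encodable { let text: String }
    struct Voice: Encodable { let languageCode: String }
    struct AudioConfig: Encodable {
        let audioEncoding: String
        let speakingRate: Double
    }
    let input: Input
    let voice: Voice
    let audioConfig: AudioConfig
}

// Response body: the audio arrives as a Base64 string.
struct SynthesisResponse: Decodable { let audioContent: String }

enum SynthesisError: Error { case invalidURL, invalidAudio }

/// Hypothetical helper: builds the POST request. `host` defaults to Google's
/// endpoint and can be replaced with your own relay server (an assumption).
func makeSynthesisRequest(text: String,
                          languageCode: String,
                          apiKey: String,
                          host: String = "texttospeech.googleapis.com") throws -> URLRequest {
    var components = URLComponents()
    components.scheme = "https"
    components.host = host
    components.path = "/v1/text:synthesize"
    components.queryItems = [URLQueryItem(name: "key", value: apiKey)]
    guard let url = components.url else { throw SynthesisError.invalidURL }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        SynthesisRequest(input: .init(text: text),
                         voice: .init(languageCode: languageCode),
                         audioConfig: .init(audioEncoding: "MP3", speakingRate: 1.0)))
    return request
}

/// Decodes the JSON response into raw audio bytes.
func decodeAudio(from data: Data) throws -> Data {
    let response = try JSONDecoder().decode(SynthesisResponse.self, from: data)
    guard let audio = Data(base64Encoded: response.audioContent) else {
        throw SynthesisError.invalidAudio
    }
    return audio
}
```

The request can be sent with `URLSession`, and the decoded bytes played with `AVAudioPlayer(data:)`. Shipping an API key inside an app is risky, which is one more reason the relay-server option, with its own access control, may be the safer of the two.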

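Given that this requires a server outside mainland China, and that good ones are not cheap, Ficow decisively chose Apple's AVSpeechSynthesizer. I did buy a US server on vultr, but its performance fell short of my expectations. The configuration I bought was probably too low-end, so the latency was rather high.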

 

Summary

 

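Apple's official AVSpeechSynthesizer is simple, efficient, and free, but the quality of the synthesized speech is fairly average.
Google's Text-to-Speech comes with more problems to solve, and it is not free.

If you have other ideas or suggestions, feel free to leave me a message! Let's learn from each other and improve together~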

 


Reprint notice: unless otherwise stated, all articles on this site are original. When reposting, please credit: Ficow Shen's Blog
