Python Speech Recognition: Introduction and Practice (2)

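The context manager opens the file and reads its contents, storing the data in an AudioFile instance. The record() method then records the data from the entire file into an AudioData instance. You can confirm this by checking the type of audio: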

>>> type(audio)
<class 'speech_recognition.AudioData'>

You can now call recognize_google() to try to recognize the speech in the audio.

>>> r.recognize_google(audio)
'the stale smell of old beer lingers it takes heat
to bring out the odor a cold dip restores health and
zest a salt pickle taste fine with ham tacos al
Pastore are my favorite a zestful food is the hot
cross bun'

That completes the transcription of your first audio file.
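The names r, harvard, and audio in these examples are not defined in this part and were presumably created earlier in the series. As a convenience, the steps so far can be gathered into one function. This is only a minimal sketch: it assumes the SpeechRecognition package is installed and a network connection is available, and the function name transcribe_file is hypothetical rather than part of the library.

```python
def transcribe_file(path, offset=None, duration=None):
    """Transcribe all or part of an audio file with the Google Web Speech API."""
    # Imported lazily so this module can be loaded even where the
    # SpeechRecognition package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile is a context manager; entering it opens the file for reading.
    with sr.AudioFile(path) as source:
        # With offset and duration left as None, record() reads the whole file.
        audio = recognizer.record(source, offset=offset, duration=duration)
    # This call sends the audio over the network. It can raise
    # sr.UnknownValueError (speech not understood) or sr.RequestError
    # (API unreachable), which callers may want to catch.
    return recognizer.recognize_google(audio)
```

Under those assumptions, calling transcribe_file("harvard.wav") on the same recording would be expected to return a transcription like the one shown above.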

Capturing Audio Segments With offset and duration

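What if you only want to capture part of the speech in a file? The record() method accepts a duration keyword argument that stops the recording after the specified number of seconds.

For example, the following captures only the speech in the first four seconds of the file: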

>>> with harvard as source:
...     audio = r.record(source, duration=4)
...
>>> r.recognize_google(audio)
'the stale smell of old beer lingers'

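When record() is called inside a with block, the file stream advances. This means that if you record for four seconds and then record for four seconds again, the second call returns the four seconds of audio that follow the first four, not the same audio again.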

>>> with harvard as source:
...     audio1 = r.record(source, duration=4)
...     audio2 = r.record(source, duration=4)
...
>>> r.recognize_google(audio1)
'the stale smell of old beer lingers'
>>> r.recognize_google(audio2)
'it takes heat to bring out the odor a cold dip'
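The advancing stream can be illustrated without any speech library. The sketch below uses only the standard-library wave module and a synthetic in-memory WAV file in which every sample stores the second it belongs to. It is an analogy for how consecutive reads consume a stream, not a description of how SpeechRecognition is implemented; RATE, make_wav, and read_seconds are names made up for this demo.

```python
import io
import struct
import wave

RATE = 8000  # samples per second (arbitrary value chosen for the demo)


def make_wav(seconds):
    """Build an in-memory mono 16-bit WAV; each sample stores its second index."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(RATE)
        for second in range(seconds):
            # One second of audio: RATE copies of the same 16-bit sample.
            wf.writeframes(struct.pack("h", second) * RATE)
    buf.seek(0)
    return buf


def read_seconds(wf, duration):
    """Read `duration` seconds starting at the stream's current position."""
    return wf.readframes(int(duration * wf.getframerate()))


with wave.open(make_wav(10), "rb") as wf:
    first = read_seconds(wf, 4)    # seconds 0 to 3
    pos_after_first = wf.tell()    # frame index where the next read begins
    second = read_seconds(wf, 4)   # seconds 4 to 7, not 0 to 3 again

first_value = struct.unpack("h", first[:2])[0]
second_value = struct.unpack("h", second[:2])[0]
print(pos_after_first // RATE, first_value, second_value)  # prints: 4 0 4
```

The second read starts at the four-second mark because the first read left the stream there, which mirrors the audio1 and audio2 behaviour above.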

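Besides a recording duration, you can give record() a starting point with the offset keyword argument. Its value is the number of seconds to skip from the beginning of the file before recording starts. For example, to capture only the second phrase in the file, use an offset of 4 seconds and a duration of 3 seconds: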

>>> with harvard as source:
...     audio = r.record(source, offset=4, duration=3)
...
>>> r.recognize_google(audio)
'it takes heat to bring out the odor'
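To make the arithmetic behind offset and duration concrete, the following sketch converts a segment given in seconds into a range of frame indices. It is an illustration of the idea only, not the library's internal code; the helper frame_range is hypothetical, and the 44100 Hz sample rate and 10-second length are assumed values, not measured properties of the example file.

```python
def frame_range(offset, duration, framerate, total_frames):
    """Return (start, stop) frame indices for a segment given in seconds.

    A duration of None means "until the end of the file". Both ends are
    clamped so the range never runs past the last frame.
    """
    # round() avoids off-by-one results from floating-point products
    # such as 4.7 * 44100.
    start = min(round(offset * framerate), total_frames)
    if duration is None:
        stop = total_frames
    else:
        stop = min(start + round(duration * framerate), total_frames)
    return start, stop


# Assumed file: 10 seconds at 44100 frames per second.
print(frame_range(4, 3, 44100, 441000))     # (176400, 308700)
print(frame_range(8, 4, 44100, 441000))     # (352800, 441000), clamped at the end
print(frame_range(4, None, 44100, 441000))  # (176400, 441000)
```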

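The offset and duration keyword arguments are useful for segmenting an audio file when you already know the structure of the speech in it. Chosen imprecisely, however, they can produce poor transcriptions, because a segment that starts or ends in the middle of a word leaves the recognizer with fragments it has to guess at: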

>>> with harvard as source:
...     audio = r.record(source, offset=4.7, duration=2.8)
...
>>> r.recognize_google(audio)
'Mesquite to bring out the odor Aiko'
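When the phrase boundaries are known in advance, one way to avoid cutting through words is to derive every (offset, duration) pair from a single list of phrase start times, so that each segment ends exactly where the next begins. The helper below is a hypothetical sketch. In the usage line, the starts at 0 and 4 seconds follow the examples above, while the start at 7 seconds and the total length of 10 seconds are made-up values for illustration.

```python
def segments_from_starts(starts, total_length):
    """Turn sorted phrase start times (in seconds) into (offset, duration) pairs."""
    starts = list(starts)
    if starts != sorted(starts):
        raise ValueError("start times must be in ascending order")
    if starts and starts[-1] > total_length:
        raise ValueError("a start time lies beyond the end of the audio")
    # Each phrase ends where the next one starts; the last ends with the file.
    ends = starts[1:] + [total_length]
    return [(start, end - start) for start, end in zip(starts, ends)]


print(segments_from_starts([0, 4, 7], 10))  # [(0, 4), (4, 3), (7, 3)]
```

Each pair can then be passed to record() as offset and duration, each in its own with block as in the examples above.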
