Sphinx Speech Recognition Study Notes (3): Small-Vocabulary English Recognition
http://www.cnblogs.com/yin52133/archive/2012/06/21/2557219.html - (1) Basic run test
http://www.cnblogs.com/yin52133/archive/2012/07/12/2587282.html - (2) Study of natural language processing principles
http://www.cnblogs.com/yin52133/archive/2012/07/12/2587419.html - (3) Small-vocabulary English recognition
http://www.cnblogs.com/yin52133/archive/2012/07/12/2588201.html - (4) Small-vocabulary Chinese recognition
http://www.cnblogs.com/yin52133/archive/2012/06/22/2558806.html - (5) Debugging errors
http://www.cnblogs.com/yin52133/archive/2012/07/12/2588418.html - (6) My goals and a few envisioned approaches (on hold)
So how can we improve recognition accuracy?
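Based on the analysis in Chapter 4, we need a better language model, and a better language model has to be built from a number of sentences or word combinations rather than isolated words.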
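This is because the statistics the model captures are over consecutive words: how likely each word is to occur, and how likely it is that a given word is followed by another particular word.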
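In formula form (standard notation, not taken from the original post), a bigram model approximates the probability of a whole word sequence as the product of each word's probability given only the word immediately before it, with sentence-start and sentence-end markers added at both ends:

```latex
P(w_1 w_2 \cdots w_n) \approx \prod_{i=1}^{n+1} P(w_i \mid w_{i-1}),
\qquad w_0 = \langle s \rangle, \quad w_{n+1} = \langle /s \rangle
```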
For how to build and use a language model, see http://cmusphinx.sourceforge.net/wiki/tutoriallm
To illustrate, I wrote a new text file:
4906.txt
open browser
open music
open note
close window
close music
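To make these statistics concrete, here is a minimal Python sketch of my own. It is not what lmtool runs internally. It counts word pairs in the five sentences above, pads each sentence with the markers <s> and </s>, and computes the unsmoothed relative-frequency estimate P(w2 | w1). The names CORPUS, count_ngrams and bigram_prob are illustrative.

```python
from collections import Counter

# The five command sentences from 4906.txt, inlined so the sketch is self-contained.
CORPUS = [
    "open browser",
    "open music",
    "open note",
    "close window",
    "close music",
]


def count_ngrams(sentences):
    """Count tokens and adjacent token pairs, padding each sentence with <s> ... </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for line in sentences:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(w1, w2, unigrams, bigrams):
    """Relative-frequency (maximum-likelihood) estimate of P(w2 | w1)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]


if __name__ == "__main__":
    uni, bi = count_ngrams(CORPUS)
    for w2 in ("browser", "music", "note", "window"):
        print(f"P({w2} | open) = {bigram_prob('open', w2, uni, bi):.3f}")
```

Running it prints 0.333 for browser, music and note, and 0.000 for window, because "open window" never occurs in the file. This raw estimate gives unseen pairs a hard zero. A real language model applies smoothing, so it does not.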
Then I used the online tool http://www.speech.cs.cmu.edu/tools/lmtool.html to generate the .lm and .dic files directly.
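The .dic file is plain text. Each line holds a word followed by its phones, and an alternate pronunciation is written as WORD(2). A word that has no dictionary entry cannot be recognized at all, so it is worth checking that every corpus word is covered. The sketch below does that check. The pronunciations in SAMPLE_DIC are a hypothetical CMUdict-style sample, not a copy of the real 4906.dic, and WINDOW is deliberately left out so the check has something to report. The function names are illustrative.

```python
import re

# Hypothetical sample in the Sphinx .dic layout: a word, then its phones, separated by
# whitespace; "WORD(2)" marks an alternate pronunciation. Not the real 4906.dic, and
# WINDOW is deliberately omitted so the coverage check below reports it.
SAMPLE_DIC = """\
BROWSER   B R AW Z ER
CLOSE     K L OW S
CLOSE(2)  K L OW Z
MUSIC     M Y UW Z IH K
NOTE      N OW T
OPEN      OW P AH N
"""

# Strips an alternate-pronunciation suffix such as "(2)" from the headword.
ALT_SUFFIX = re.compile(r"\(\d+\)$")


def load_dic(text):
    """Map each headword to a list of pronunciations, each a list of phones."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        word = ALT_SUFFIX.sub("", fields[0])
        entries.setdefault(word, []).append(fields[1:])
    return entries


def missing_words(sentences, dic):
    """Return the corpus words (upper-cased, as in the .dic) with no dictionary entry."""
    vocab = {w.upper() for line in sentences for w in line.split()}
    return sorted(vocab - set(dic))


if __name__ == "__main__":
    dic = load_dic(SAMPLE_DIC)
    corpus = ["open browser", "open music", "open note", "close window", "close music"]
    print(missing_words(corpus, dic))  # prints ['WINDOW']
```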
For the acoustic model I kept the default, hub4wsj_sc_8k,
and ran everything directly with pocketsphinx_continuous:
pocketsphinx_continuous -hmm hub4wsj_sc_8k -lm 4906.lm -dict 4906.dic
The test results were as follows:
000000010: CLOSE WINDOW
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 52.52 2.30 0.38 0.74 -0.22 -0.36 -0.25 0.07 0.17 -0.05 0.12 -0.41 -0.05 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 52.17 2.29 0.39 0.77 -0.19 -0.35 -0.23 0.08 0.17 -0.04 0.13 -0.39 -0.04 >
INFO: ngram_search_fwdtree.c(1549): 822 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 14143 senones evaluated (124/fr)
INFO: ngram_search_fwdtree.c(1553): 6385 channels searched (56/fr), 572 1st, 4781 last
INFO: ngram_search_fwdtree.c(1557): 1117 words for which last channels evaluated (9/fr)
INFO: ngram_search_fwdtree.c(1560): 135 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.05 CPU 0.041 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.02 wall 1.768 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 8 words
INFO: ngram_search_fwdflat.c(940): 177 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 13906 senones evaluated (122/fr)
INFO: ngram_search_fwdflat.c(944): 7497 channels searched (65/fr)
INFO: ngram_search_fwdflat.c(946): 546 words searched (4/fr)
INFO: ngram_search_fwdflat.c(948): 363 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.03 CPU 0.027 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.018 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.103
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 35 nodes, 37 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:103:112) = -716626
INFO: ps_lattice.c(1390): Joint P(O,S) = -721218 P(S|O) = -4592
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
000000011: CLOSE MUSIC
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 52.17 2.29 0.39 0.77 -0.19 -0.35 -0.23 0.08 0.17 -0.04 0.13 -0.39 -0.04 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 52.13 2.48 0.07 0.71 -0.04 -0.31 -0.25 0.16 0.18 -0.05 0.03 -0.37 -0.08 >
INFO: ngram_search_fwdtree.c(1549): 724 words recognized (6/fr)
INFO: ngram_search_fwdtree.c(1551): 14052 senones evaluated (117/fr)
INFO: ngram_search_fwdtree.c(1553): 5970 channels searched (49/fr), 567 1st, 4580 last
INFO: ngram_search_fwdtree.c(1557): 1153 words for which last channels evaluated (9/fr)
INFO: ngram_search_fwdtree.c(1560): 88 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.01 wall 1.675 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(940): 152 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(942): 11290 senones evaluated (94/fr)
INFO: ngram_search_fwdflat.c(944): 5553 channels searched (46/fr)
INFO: ngram_search_fwdflat.c(946): 527 words searched (4/fr)
INFO: ngram_search_fwdflat.c(948): 320 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.015 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.107
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 30 nodes, 12 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:107:118) = -677028
INFO: ps_lattice.c(1390): Joint P(O,S) = -677028 P(S|O) = 0
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
000000012: OPEN BROWSER
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 52.13 2.48 0.07 0.71 -0.04 -0.31 -0.25 0.16 0.18 -0.05 0.03 -0.37 -0.08 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 51.56 2.26 0.20 0.84 -0.14 -0.35 -0.22 0.12 0.18 -0.03 0.08 -0.42 -0.04 >
INFO: ngram_search_fwdtree.c(1549): 787 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 13726 senones evaluated (117/fr)
INFO: ngram_search_fwdtree.c(1553): 5723 channels searched (48/fr), 625 1st, 4153 last
INFO: ngram_search_fwdtree.c(1557): 1222 words for which last channels evaluated (10/fr)
INFO: ngram_search_fwdtree.c(1560): 94 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.027 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.04 wall 1.746 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 6 words
INFO: ngram_search_fwdflat.c(940): 211 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 11139 senones evaluated (95/fr)
INFO: ngram_search_fwdflat.c(944): 5235 channels searched (44/fr)
INFO: ngram_search_fwdflat.c(946): 497 words searched (4/fr)
INFO: ngram_search_fwdflat.c(948): 281 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.01 wall 0.005 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.105
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 43 nodes, 14 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:105:115) = -663256
INFO: ps_lattice.c(1390): Joint P(O,S) = -663256 P(S|O) = 0
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
000000013: OPEN MUSIC
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 51.35 2.26 0.23 0.79 -0.10 -0.33 -0.25 0.15 0.18 -0.01 0.06 -0.42 -0.04 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 50.94 2.14 0.22 0.80 -0.16 -0.34 -0.20 0.14 0.18 -0.00 0.07 -0.44 -0.02 >
INFO: ngram_search_fwdtree.c(1549): 656 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 11822 senones evaluated (119/fr)
INFO: ngram_search_fwdtree.c(1553): 5069 channels searched (51/fr), 541 1st, 3713 last
INFO: ngram_search_fwdtree.c(1557): 1023 words for which last channels evaluated (10/fr)
INFO: ngram_search_fwdtree.c(1560): 84 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.032 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 1.89 wall 1.908 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 6 words
INFO: ngram_search_fwdflat.c(940): 160 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 11640 senones evaluated (118/fr)
INFO: ngram_search_fwdflat.c(944): 5898 channels searched (59/fr)
INFO: ngram_search_fwdflat.c(946): 437 words searched (4/fr)
INFO: ngram_search_fwdflat.c(948): 263 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.016 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.018 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.90
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 42 nodes, 12 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:90:97) = -566632
INFO: ps_lattice.c(1390): Joint P(O,S) = -566744 P(S|O) = -112
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
000000014: OPEN NOTE
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 50.94 2.14 0.22 0.80 -0.16 -0.34 -0.20 0.14 0.18 -0.00 0.07 -0.44 -0.02 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 50.90 2.33 0.24 0.59 -0.04 -0.31 -0.26 0.20 0.18 -0.01 0.04 -0.48 -0.01 >
INFO: ngram_search_fwdtree.c(1549): 533 words recognized (5/fr)
INFO: ngram_search_fwdtree.c(1551): 13409 senones evaluated (133/fr)
INFO: ngram_search_fwdtree.c(1553): 5722 channels searched (56/fr), 572 1st, 4236 last
INFO: ngram_search_fwdtree.c(1557): 1096 words for which last channels evaluated (10/fr)
INFO: ngram_search_fwdtree.c(1560): 129 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.031 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 1.86 wall 1.838 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(940): 166 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 14460 senones evaluated (143/fr)
INFO: ngram_search_fwdflat.c(944): 7607 channels searched (75/fr)
INFO: ngram_search_fwdflat.c(946): 542 words searched (5/fr)
INFO: ngram_search_fwdflat.c(948): 336 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.015 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.017 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.91
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 35 nodes, 12 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:91:99) = -650418
INFO: ps_lattice.c(1390): Joint P(O,S) = -650418 P(S|O) = 0
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
000000015: OPEN WINDOW
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 50.90 2.33 0.24 0.59 -0.04 -0.31 -0.26 0.20 0.18 -0.01 0.04 -0.48 -0.01 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 50.80 2.08 0.32 0.79 -0.16 -0.38 -0.21 0.20 0.21 -0.00 0.08 -0.47 -0.01 >
INFO: ngram_search_fwdtree.c(1549): 861 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 15363 senones evaluated (125/fr)
INFO: ngram_search_fwdtree.c(1553): 6943 channels searched (56/fr), 614 1st, 5227 last
INFO: ngram_search_fwdtree.c(1557): 1227 words for which last channels evaluated (9/fr)
INFO: ngram_search_fwdtree.c(1560): 134 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.06 CPU 0.051 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.11 wall 1.720 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(940): 225 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 12072 senones evaluated (98/fr)
INFO: ngram_search_fwdflat.c(944): 6521 channels searched (53/fr)
INFO: ngram_search_fwdflat.c(946): 561 words searched (4/fr)
INFO: ngram_search_fwdflat.c(948): 333 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.014 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.111
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 42 nodes, 43 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:111:121) = -702331
INFO: ps_lattice.c(1390): Joint P(O,S) = -707956 P(S|O) = -5625
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.003 xRT
000000016: CLOSE MUSIC
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 50.44 2.00 0.30 0.77 -0.17 -0.37 -0.22 0.23 0.22 -0.01 0.09 -0.45 -0.02 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 51.19 2.05 0.42 0.55 -0.13 -0.39 -0.26 0.22 0.19 -0.00 0.09 -0.50 -0.04 >
INFO: ngram_search_fwdtree.c(1549): 786 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 14040 senones evaluated (119/fr)
INFO: ngram_search_fwdtree.c(1553): 6064 channels searched (51/fr), 649 1st, 4340 last
INFO: ngram_search_fwdtree.c(1557): 1260 words for which last channels evaluated (10/fr)
INFO: ngram_search_fwdtree.c(1560): 141 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.026 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.08 wall 1.760 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 8 words
INFO: ngram_search_fwdflat.c(940): 213 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 12917 senones evaluated (109/fr)
INFO: ngram_search_fwdflat.c(944): 6890 channels searched (58/fr)
INFO: ngram_search_fwdflat.c(946): 601 words searched (5/fr)
INFO: ngram_search_fwdflat.c(948): 359 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.01 wall 0.012 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.108
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 40 nodes, 32 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:108:116) = -682573
INFO: ps_lattice.c(1390): Joint P(O,S) = -686913 P(S|O) = -4340
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
000000017: CLOSE WINDOW
READY....
Listening...
Stopped listening, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 51.19 2.05 0.42 0.55 -0.13 -0.39 -0.26 0.22 0.19 -0.00 0.09 -0.50 -0.04 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 51.03 2.23 0.53 0.47 -0.05 -0.38 -0.27 0.29 0.19 -0.01 0.07 -0.47 -0.05 >
INFO: ngram_search_fwdtree.c(1549): 874 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1551): 15967 senones evaluated (133/fr)
INFO: ngram_search_fwdtree.c(1553): 7237 channels searched (60/fr), 693 1st, 5296 last
INFO: ngram_search_fwdtree.c(1557): 1305 words for which last channels evaluated (10/fr)
INFO: ngram_search_fwdtree.c(1560): 207 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 2.08 wall 1.735 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(940): 292 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(942): 16616 senones evaluated (138/fr)
INFO: ngram_search_fwdflat.c(944): 9007 channels searched (75/fr)
INFO: ngram_search_fwdflat.c(946): 624 words searched (5/fr)
INFO: ngram_search_fwdflat.c(948): 334 word transitions (2/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.020 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.107
INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
INFO: ngram_search.c(1386): Lattice has 38 nodes, 33 links
INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:107:118) = -797261
INFO: ps_lattice.c(1390): Joint P(O,S) = -805533 P(S|O) = -8272
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.003 xRT
000000018: CLOSE NOTE
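Most of that output is decoder logging. In the log above, each actual result is a 9-digit utterance ID, a colon, and the recognized words. Other pocketsphinx versions may format results differently, so treat that pattern as an assumption. Below is a small Python sketch that keeps only those result lines. HYP_LINE and extract_hypotheses are illustrative names.

```python
import re

# In the log above, a result is a 9-digit utterance ID, a colon and the recognized
# words, e.g. "000000010: CLOSE WINDOW"; INFO lines and READY/Listening prompts do not match.
HYP_LINE = re.compile(r"^(\d{9}):\s*(.*)$")


def extract_hypotheses(lines):
    """Yield (utterance_id, hypothesis) pairs, skipping logs and prompts."""
    for line in lines:
        match = HYP_LINE.match(line.strip())
        if match:
            yield match.group(1), match.group(2).strip()


if __name__ == "__main__":
    # A few lines copied from the log above; a saved log file (hypothetical name)
    # could be passed instead, e.g. extract_hypotheses(open("run.log")).
    sample = [
        "INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT",
        "000000015: OPEN WINDOW",
        "READY....",
        "Listening...",
        "000000018: CLOSE NOTE",
    ]
    for utt_id, hyp in extract_hypotheses(sample):
        print(utt_id, hyp)
```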
In my tests, accuracy immediately jumped to above 90%.
Note that my original text corpus contained only these sentences:
open browser
open music
open note
close window
close music
Yet when, out of curiosity, I tried saying "open window" and "close note", neither of which is in the corpus, both were recognized correctly.
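This is less surprising than it looks. A back-off n-gram model, such as the ARPA-format .lm that lmtool produces, still gives an unseen pair like "open window" some probability. It borrows that probability from how often each word occurs on its own. The sketch below shows the effect using simple linear interpolation. This is a simpler smoothing scheme than the back-off lmtool actually uses, and the weight LAMBDA = 0.9 is a hypothetical value chosen for illustration.

```python
from collections import Counter

CORPUS = ["open browser", "open music", "open note", "close window", "close music"]

# Hypothetical interpolation weight; real toolkits derive their smoothing from the data.
LAMBDA = 0.9


def train(sentences):
    """Collect history counts, bigram counts and unigram counts of predicted tokens."""
    history, bigrams, predicted = Counter(), Counter(), Counter()
    for line in sentences:
        tokens = ["<s>"] + line.split() + ["</s>"]
        history.update(tokens[:-1])             # tokens that are followed by something
        bigrams.update(zip(tokens, tokens[1:]))
        predicted.update(tokens[1:])            # tokens that can be predicted (no <s>)
    return history, bigrams, predicted


def interpolated_prob(w1, w2, model, lam=LAMBDA):
    """P(w2 | w1) = lam * P_ML(w2 | w1) + (1 - lam) * P_ML(w2)."""
    history, bigrams, predicted = model
    p_bigram = bigrams[(w1, w2)] / history[w1] if history[w1] else 0.0
    p_unigram = predicted[w2] / sum(predicted.values())
    return lam * p_bigram + (1 - lam) * p_unigram


if __name__ == "__main__":
    model = train(CORPUS)
    for w1, w2 in [("open", "music"), ("open", "window"), ("close", "note")]:
        print(f"P({w2} | {w1}) = {interpolated_prob(w1, w2, model):.4f}")
```

Seen pairs such as "open music" keep most of their probability. Unseen pairs such as "open window" and "close note" get a small but non-zero share, so a clear acoustic match can still win.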
However, although accuracy is high with a normal accent, deliberately drawing out the pronunciation still causes misrecognition.
For example, when I stretched "open note" out to nearly 5 seconds, the result was:
000000020: CLOSE OPEN NOTE OPEN NOTE
Why did accuracy improve so much? It comes down to how recognition with a statistical language model works.
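Recall that the model counts how often two words occur one after the other and then turns those counts into relative frequencies, i.e. the probability that one word follows another.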
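As a worked example, take the 4906.txt corpus with each sentence padded by the markers <s> and </s>, and use raw counts C(·) with no smoothing. The real .lm from lmtool is smoothed, so its numbers will differ. The relative-frequency estimate and the resulting probability of "open music" are:

```latex
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}

P(\text{open music})
  = P(\text{open} \mid \langle s \rangle)\,
    P(\text{music} \mid \text{open})\,
    P(\langle /s \rangle \mid \text{music})
  = \frac{3}{5} \cdot \frac{1}{3} \cdot \frac{2}{2} = \frac{1}{5}
```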
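The set of isolated words I naively tested with earlier had no such relative frequencies: every entry was a single word, so when the words were spoken in combination there were no statistics about which word tends to precede or follow which. Accuracy was therefore very low, and recognition had to rely almost entirely on matching pronunciations from the .dic file.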
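The difference can be made visible by counting word-to-word pairs in both kinds of corpus. In the sketch below, WORD_CORPUS is a hypothetical one-word-per-line list standing in for the earlier isolated-word test, not the exact list I used before. The probabilities are unsmoothed and kept as exact fractions. Smoothing would turn the zero into a small number, but the missing word-order information stays missing.

```python
from collections import Counter
from fractions import Fraction

SENTENCE_CORPUS = ["open browser", "open music", "open note", "close window", "close music"]
# Hypothetical single-word corpus standing in for the earlier isolated-word test.
WORD_CORPUS = ["open", "close", "browser", "music", "note", "window"]


def bigram_counts(sentences):
    """Count histories and bigrams, padding each line with <s> ... </s>."""
    history, bigrams = Counter(), Counter()
    for line in sentences:
        tokens = ["<s>"] + line.split() + ["</s>"]
        history.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return history, bigrams


def sentence_prob(sentence, history, bigrams):
    """Unsmoothed bigram probability of a whole sentence, as an exact fraction."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = Fraction(1)
    for w1, w2 in zip(tokens, tokens[1:]):
        if history[w1] == 0:
            return Fraction(0)
        prob *= Fraction(bigrams[(w1, w2)], history[w1])
    return prob


def word_to_word_pairs(bigrams):
    """Bigrams linking two real words (ignoring the <s>/</s> boundary markers)."""
    return {pair for pair in bigrams if "<s>" not in pair and "</s>" not in pair}


if __name__ == "__main__":
    for name, corpus in [("sentences", SENTENCE_CORPUS), ("single words", WORD_CORPUS)]:
        history, bigrams = bigram_counts(corpus)
        pairs = len(word_to_word_pairs(bigrams))
        print(name, pairs, sentence_prob("open music", history, bigrams))
```

The sentence corpus yields 5 word-to-word pairs and P(open music) = 1/5. The single-word corpus yields no such pairs, so "open music" gets probability 0 and nothing in the model says which word should come next.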