Acoustic Model Discussions

universal acoustic/language models? + some other questions
User: ardwennem2
Date: 12/16/2015 6:58 am
Views: 7204
Rating: 0

Hi,

I am a student at TU Delft doing research on speech recognition (SR). I am still quite new to open-source ASR systems, and I have some questions about language models and acoustic models specifically.

As far as I could find on the internet, almost all LMs/AMs are in the ARPA format (at least for Julius and Sphinx), and since Julius uses LMs from HTK, I assume HTK uses the same format as well? That is actually my first question, because I also found that HTK uses the HTK ASCII format. Is this different?
I found that Kaldi uses the FST format, but there appear to be programs that can convert an LM/AM from the ARPA format to the FST format.
I could not check ISIP, since the site had some problems and gave me a lot of 404 errors.

The reason I ask is this: if these formats are indeed the same, does this mean you could use an LM/AM from, for example, Sphinx in Julius, HTK and Kaldi as well?

Another question I have: when looking at commercial systems (Nuance, the Google API, Windows 10 Cortana), I could not find out whether you can add your own existing LM/AM. For example, if I have a Sphinx LM/AM, can I also use it with Cortana?

I found a model for Dutch (LM or AM?) at http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
However, I cannot tell whether this is an LM, an AM, or both, so I was wondering if anyone knows how I can check that, or knows more about that specific model?

And one last question: if the models you can use with Sphinx, HTK, Julius and Kaldi are indeed the same, does this mean there is (almost) no difference in the performance of each ASR? And if so, where do the differences lie between HTK, Sphinx and Julius? I did find an article from 2005, but since it's from 10 years ago, I was hoping I could get a more up-to-date answer ;3

With kind regards,

Tom Brunner

--- (Edited on 12/16/2015 6:58 am [GMT-0600] by ardwennem2) ---
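
On the question of how to check whether a downloaded file is a language model: as far as I know, the ARPA format describes n-gram language models only (acoustic models are stored in toolkit-specific formats), and an uncompressed ARPA file is plain text that starts with a `\data\` header listing `ngram N=count` lines. Under that assumption, a rough check can be sketched in Python as below. The function name and the sample text are made up for illustration, and the check will not recognise compressed or binary LM files.

```python
import re

# Header lines in an ARPA file look like "ngram 1=12345".
NGRAM_LINE = re.compile(r"ngram\s+(\d+)\s*=\s*(\d+)")


def arpa_ngram_counts(text):
    """Return {order: count} from an ARPA-style header, or {} if none is found.

    An empty result means the text does not look like an ARPA n-gram LM.
    """
    counts = {}
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = NGRAM_LINE.fullmatch(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            # First section marker such as "\1-grams:" ends the header.
            break
    return counts


# Tiny hand-written sample, not taken from any real model.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.5 </s>
-0.5 <s> -0.3
-0.5 hallo -0.3

\\2-grams:
-0.2 <s> hallo
-0.2 hallo </s>

\\end\\
"""

counts = arpa_ngram_counts(SAMPLE)
is_arpa_lm = bool(counts)
```

For a real file you would read its contents first (for example with `open(path, encoding="utf-8", errors="replace").read()`, where `path` is whatever you downloaded) and pass the text to the function.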

Re: universal acoustic/language models? + some other questions
User: kmaclean
Date: 12/21/2015 9:11 pm
Views: 233
Rating: 0

>where do the differences lie between HTK, Sphinx and Julius?

Comparing Open-Source Speech Recognition Toolkits

--- (Edited on 12/21/2015 10:11 pm [GMT-0500] by kmaclean) ---

Re: universal acoustic/language models? + some other questions
User: Visitor
Date: 1/11/2016 2:50 am
Views: 118
Rating: 0

Thank you for your answer; the article helps a lot.

However, what I gather from the article, and from my own impressions of the different programs, is this:

Kaldi provides the best WER but is the most difficult to use.
HTK (with Julius and HDecode) has good documentation but takes the longest. With Julius it has the worst WER, and with HDecode it has the second-best WER.
CMU Sphinx (with PocketSphinx and Sphinx 4) is the easiest to use but still has a worse WER than both Kaldi and HTK with HDecode.

So my questions following this are:
Apart from ease of use and better documentation, are there other reasons why one would prefer CMU Sphinx or HTK over Kaldi?
And why would anyone choose Julius over HDecode, or Sphinx over PocketSphinx?

Thank you,

Tom 

--- (Edited on 1/11/2016 2:50 am [GMT-0600] by Visitor) ---
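
For readers comparing the WER figures mentioned above: word error rate is usually defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, computed from a word-level edit distance. The following is only a minimal Python sketch of that standard definition; it is not the scoring script used in the article, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) against 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because the result is normalised by the reference length only, a hypothesis with many inserted words can give a WER above 1.0.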

Re: universal acoustic/language models? + some other questions
User: colbec
Date: 1/12/2016 4:46 am
Views: 2844
Rating: 1

It depends on your perspective. I am interested in high accuracy with speaker-dependent models, so WER is not a good indicator for me. Julius, despite having the poorest WER overall, frequently gives me close to 100% correct recognition.

Assuming that you are interested in the speaker-independent domain: as a beginner, you will perhaps be inclined towards the tool that provides the best and most prompt answers to questions. You will find that Dan Povey and his associates are great on GitHub with support for Kaldi; Nickolay Shmyrev has helped maybe thousands of beginners with Sphinx on these and other forums; and Professor Akinobu Lee, who has recently moved Julius over to GitHub, is also providing help with Julius. Ken Maclean is also a great help generally. These are all very busy people, so they won't do your reading for you and may be less inclined to help if you show that you have not done your research.

The choice between HDecode and Julius is frequently a matter of licensing. The documentation and previous discussions will give you more on this.

--- (Edited on 2016-01-12 5:46 am [GMT-0500] by colbec) ---
