Audio and Prompts Discussions

Downsampling to 16 KHz
User: tpavelka
Date: 1/30/2009 7:35 am
Views: 6152
Rating: 3

Hello,

I have downloaded the speech files and found out that the audio files from

http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/

do not correspond with the MFC files in

http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/MFCC/16kHz_16bit/MFCC_0_D/

While there are approx. 58 hours of MFCCs, there are only about 2.5 hours of audio files in Main/16kHz_16bit. I am guessing that the 58 hours came from all the files in the Original directory, which have different sampling rates.

I would like to use the speech data to train HTK acoustic models, which will later be used in my own recognizer. The problem is that the parametrization is not 100% compatible with HTK, so I will need to parametrize the audio files myself.

I am trying to figure out how to get all the audio files to the same sample rate (preferably 16 kHz). So the question is: were the audio files downsampled before the MFCCs in 16kHz_16bit/MFCC_0_D were created? And if so, can you tell me what software you used for that or, even better, can I see the scripts and configs?

--- (Edited on 1/30/2009 7:35 am [GMT-0600] by tpavelka) ---

Re: Downsampling to 16 KHz
User: kmaclean
Date: 1/30/2009 9:16 am
Views: 56
Rating: 3

Hi tpavelka,

>While there are approx 58 hours of MFCCs there are only about 2.5
>hours of audio files in Main/16kHz_16bit.

No. 

Not sure how you made your calculations, but the MFCCs are generated from the Main directory - there should be the same number of hours of speech audio in both...
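One way to check the MFCC side of that comparison is to sum the durations recorded in the feature-file headers. The following is only a rough sketch, and it assumes the files are standard HTK parameter files (a 12-byte big-endian header holding the frame count and the frame period in 100 ns units) with a `.mfc` extension. Neither assumption is confirmed in this thread, and the function names are made up for illustration.

```python
import os
import struct

def htk_seconds(path):
    """Duration implied by one HTK parameter file header (assumed layout)."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Assumed header: nSamples (int32), sampPeriod in 100 ns units (int32),
    # sampSize (int16), parmKind (int16), all big-endian.
    n_frames, period_100ns, _samp_size, _parm_kind = struct.unpack(">iihh", header)
    return n_frames * period_100ns / 1e7

def mfcc_total_hours(root, ext=".mfc"):
    """Sum the durations of all feature files below root, in hours."""
    total = 0.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(ext):
                total += htk_seconds(os.path.join(dirpath, name))
    return total / 3600.0
```

The result is the number of frames times the frame shift, so it only approximates the length of the underlying audio.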

>So the question is were the audio files downsampled before the
>MFCCs in 16kHz_16bit/MFCC_0_D were created?

Yes, from the 'Original' directory into the 'Main' directory.

>And if so, can you tell me, what software did you use for that, or, even
>better can I see the scripts and configs?

SoX.

See the scripts: Main.pm calls the 'Downsample' method in AUDIO.pm.
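For readers who just want the general idea, here is a minimal sketch of a SoX-based downsampling step driven from Python. It is not the code from Main.pm or AUDIO.pm; the function names are hypothetical, and the `-r` and `-b` output options are those of recent SoX releases (older releases may spell the sample-size option differently).

```python
import subprocess

def build_sox_command(src, dst, rate=16000, bits=16):
    """Build a SoX command line that resamples src and writes it to dst."""
    # Options placed before the output file name apply to the output file.
    return ["sox", src, "-r", str(rate), "-b", str(bits), dst]

def downsample(src, dst, rate=16000, bits=16):
    """Run SoX (assumed to be on the PATH); raises if SoX reports an error."""
    subprocess.run(build_sox_command(src, dst, rate, bits), check=True)
```

Calling `downsample("in.wav", "out.wav")` would then write a 16 kHz, 16-bit copy of `in.wav`, provided SoX is installed.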

Ken

--- (Edited on 1/30/2009 10:16 am [GMT-0500] by kmaclean) ---

Re: Downsampling to 16 KHz
User: tpavelka
Date: 1/30/2009 10:01 am
Views: 2393
Rating: 3

Hi, thanks for the fast reply. I used an automatic downloader to fetch all the speech files. I guess the process was terminated for some reason and I did not notice. That's why, when I measured the total length of the speech data on my hard drive, I only got 2.5 hours.
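In case it helps anyone checking their own download, a measurement like that can be sketched with Python's standard `wave` module. This is only an illustration, not the tool actually used here: it assumes the audio is stored as uncompressed `.wav` files, and the function names are made up.

```python
import os
import wave

def wav_seconds(path):
    """Length of one WAV file in seconds (frames divided by sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def wav_total_hours(root):
    """Sum the lengths of all .wav files below root, in hours."""
    total = 0.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".wav"):
                total += wav_seconds(os.path.join(dirpath, name))
    return total / 3600.0
```

Comparing the file count and total hours against the repository listing should show quickly whether a download stopped early.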

Thanks for the info on conversion software.

Tomas

--- (Edited on 1/30/2009 10:01 am [GMT-0600] by tpavelka) ---
