Audio and Prompts Discussions

Sources forum
User: CarlFK
Date: 6/17/2008 2:33 pm
Views: 15221
Rating: 11

Hi Gang,

I just started looking into speech recognition about 24 hours ago, so forgive my newness.  

I have seen a few threads here and there talking about different places to get text and speech (like DVD subs and closed captions), but I bet the idea will come up again as the current posts age.

How about a new forum dedicated to sources? 

I think the topic is 'separate' enough to isolate it from the other topics (for skimming and searching, say), and it would make the "has this been discussed" and "has this angle been mentioned" questions easier to answer.  I have a few thoughts:  karaoke (both in bars and at home, which includes the recent RockStar explosion), speech training, tapping into existing streams (the weekly reading to children at the local library), reading Bible passages.

 IVR systems that sample a stranger's voice, analyze it, confirm the hit ("please speak your address", "8345 Newland Av", "did you say eighty-three forty-five Newland Avenue?", "yes")

 Call centers: 100's of people reading text from a screen.

 Cleaned up dictations: audio -> text, human cleans up the text, submit the pair.  

 Are court transcriptions public?

 Some of these sources may be 'noisy', which might poison the database if shoveled in willy-nilly. I have a different thread for that.

--- (Edited on 6/17/2008 2:33 pm [GMT-0500] by CarlFK) ---

Re: Sources forum
User: Visitor
Date: 6/17/2008 9:07 pm
Views: 111
Rating: 9

more sources:

 http://www.gutenberg.org/browse/categories/1  "Audio Book, human-read" - about 380, but in compressed formats.  I'll go find the thread about compressed vs raw.

  http://www.1-language.com/esllistening/easy1_script.htm (compressed, not sure what the licensing is)

http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm (can't find a copyright notice)

Also, I hear there is a local project where text is read and recorded for people who have trouble reading.  They already have A) a nice recording environment/equipment, B) no conflict of interest, C) charitable status (so they might be of the mindset to want to help a project like VoxForge).



--- (Edited on 6/17/2008 9:07 pm [GMT-0500] by Visitor) ---

Re: Sources forum
User: kmaclean
Date: 6/18/2008 12:10 pm
Views: 125
Rating: 10

Hi CarlFK,

Thanks for the suggestions! 

>How about a new forum dedicated to sources?

We have a dev Wiki page for this: AudioSources  

Ken 

--- (Edited on 6/18/2008 1:10 pm [GMT-0400] by kmaclean) ---

Re: Sources forum
User: kmaclean
Date: 6/18/2008 12:27 pm
Views: 147
Rating: 8

Hi CarlFK,

Thanks again for the links!

The big thing from an acoustic model training perspective (aside from the copyright issues) is to have *transcribed* speech audio - i.e. text containing the exact words spoken in the speech audio file.

With this, we can then process the audio for use in training acoustic models (AMs).  This involves segmenting the speech into roughly 15-20 word sentences, and creating a text file that contains transcriptions of each file (one per line). 
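
As a rough illustration of the text side of this step only, here is a minimal Python sketch that cuts a transcript into chunks of at most 20 words and writes one transcription per line. The function names, the `seg0001` style IDs, and the rule of breaking at sentence ends first are assumptions made for the example, not the actual VoxForge scripts; the real segmentation also has to cut the audio at matching points, which is what the forced alignment is for.

```python
import re


def split_into_prompts(text, max_words=20):
    """Split text into chunks of at most max_words words.

    Sentence boundaries are used first; a sentence longer than
    max_words is cut into consecutive max_words pieces.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for sentence in sentences:
        words = sentence.split()
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return [chunk for chunk in chunks if chunk]


def format_prompts(chunks, prefix="seg"):
    """Return one 'ID transcription' line per chunk (hypothetical ID scheme)."""
    return [f"{prefix}{i:04d} {chunk}" for i, chunk in enumerate(chunks, start=1)]


if __name__ == "__main__":
    sample = "One two three. Four five six seven eight."
    for line in format_prompts(split_into_prompts(sample, max_words=3)):
        print(line)
```

Breaking at sentence ends first keeps most segments readable on their own; the fixed-size fallback only applies to sentences longer than the limit.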

The segmentation process is described here:  Automated Audio Segmentation Using Forced Alignment (Draft).  Although most of the segmentation process is automated, there is still an issue of how to deal with words that are not currently in the pronunciation dictionary.  There are a few approaches, and I've settled on the Sequitur G2P (Grapheme-to-Phoneme) scripts (written in Python - you'll be happy to know...).

The problem is that although Sequitur G2P is quite good, it is not 100% accurate, and pronunciations need to be validated.  I'm currently working on a script (in Perl...) that will help speed things up.  See here for the source: AudioBook.pm.  It's still not finished, but it's getting there.
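
To make the "words not in the pronunciation dictionary" problem concrete, here is a minimal Python sketch that lists the words in a transcript that have no dictionary entry, i.e. the ones that would need a G2P guess and then a human check. It assumes a dictionary with one entry per line and the word as the first whitespace-separated field; the function names are made up for the example, and this is not what AudioBook.pm does.

```python
import re


def load_dictionary_words(lines):
    """Collect the headword (first field) of every non-empty dictionary line."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0].upper())
    return words


def find_missing_words(transcript_lines, known_words):
    """Return the sorted, upper-cased words that have no dictionary entry."""
    missing = set()
    for line in transcript_lines:
        # Letters and apostrophes only; digits and punctuation are ignored.
        for token in re.findall(r"[A-Za-z']+", line):
            if token.upper() not in known_words:
                missing.add(token.upper())
    return sorted(missing)


if __name__ == "__main__":
    dictionary = ["HELLO [HELLO] hh ah l ow", "WORLD [WORLD] w er l d"]
    transcript = ["Hello brave new world", "new words"]
    known = load_dictionary_words(dictionary)
    print(find_missing_words(transcript, known))
```

The resulting list is the validation queue: each word on it gets a generated pronunciation that someone still has to confirm.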

This script will go a long way to help in processing the backlog of uncompressed LibriVox submissions (scroll to the bottom of the page) we have that need to be segmented before they can be used for training an AM. 

Ken 

--- (Edited on 6/18/2008 1:27 pm [GMT-0400] by kmaclean) ---

Re: Sources forum
User: CarlFK
Date: 6/18/2008 2:17 pm
Views: 115
Rating: 8
"Authorization Required" - can I get an account?

--- (Edited on 6/18/2008 2:17 pm [GMT-0500] by CarlFK) ---

Re: Sources forum
User: Visitor
Date: 6/18/2008 6:55 pm
Views: 126
Rating: 10

 

>>“add in Alphabetical sequence.” that one is gonna take some doing :)

just joking – seemed odd that you hadn't already automated that.

>>“you need to make a judgment call

> These types of errors may or may not be caught by the iterative process described above - my sense is that a better acoustic model would catch these errors, but this assumption needs to be tested.

I have a feeling it will be better to eliminate the human interaction from the process. This will speed up the assimilation, and the overall system is based on statistical data, so the anomalies will show up once the data set gets sufficiently large. It may require some sort of auditing of the corpus: periodically re-scan all the wavs used to create it, and make sure each one still matches its transcription. Although I have to confess I have 0.0 clue how the wav data is matched or converted to the phones, so I may be assuming 'some magic happens.'

I am assuming you are keeping all the .wav and text files, right?

 

> Step 8 - Realigning the Training Data “It is time well spent to review the log to make sure that HVite recognized all the words for each line in your prompts file. ”

Another for the list of things to automate, right?
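
A minimal sketch of what that automation might look like, in Python. It assumes that every line of the prompts file starts with an utterance name (possibly with a path or `*/` in front), that the aligner output is an HTK-style master label file in which each aligned utterance has a quoted header line such as `"*/sample1.lab"`, and that an utterance that failed to align simply has no header. The function names are made up, and none of this has been checked against the real HVite log.

```python
import os


def utterance_ids_in_mlf(mlf_lines):
    """Collect utterance names from quoted header lines like "*/sample1.lab"."""
    ids = set()
    for line in mlf_lines:
        line = line.strip()
        if len(line) > 1 and line.startswith('"') and line.endswith('"'):
            base = line.strip('"').replace("\\", "/").split("/")[-1]
            ids.add(os.path.splitext(base)[0])
    return ids


def unaligned_prompts(prompt_lines, mlf_lines):
    """Return prompt IDs (in file order) that have no entry in the MLF."""
    aligned = utterance_ids_in_mlf(mlf_lines)
    missing = []
    for line in prompt_lines:
        parts = line.split()
        if not parts:
            continue
        utt_id = parts[0].replace("\\", "/").split("/")[-1]
        if utt_id not in aligned:
            missing.append(utt_id)
    return missing


if __name__ == "__main__":
    prompts = ["*/s1 HELLO WORLD", "*/s2 GOOD MORNING"]
    mlf = ["#!MLF!#", '"*/s1.lab"', "sil", "hh", "."]
    print(unaligned_prompts(prompts, mlf))
```

This only flags utterances that are missing entirely; it would not notice an utterance that aligned against the wrong words.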


>>Open Source English-Catalan Dictionary Project [...] But I get a vibe that says these guys would be interested in helping voxforge.

> In what ways do you think they might be able to help?

Providing transcribed audio (from the videos). I would also expect them to add in pronunciations, either textual or audio. But they are slipping down my list as other, more lucrative sources get added.

 

--- (Edited on 6/19/2008 8:08 pm [GMT-0500] by CarlFK) ---

Re: Sources forum
User: kmaclean
Date: 6/20/2008 9:30 am
Views: 4067
Rating: 15

>I am assuming you are keeping all the .wav and text files, right?

I'm not sure I understand what you are asking... all submitted audio and text gets put in a Subversion repository, and gets replicated to the VoxForge Repository.  The Listen page contains an English Forum page, with an entry for each submission (including the text of the submission, and a link to the audio on the VoxForge Repository) so that people can add comments to a submission (errors, rate the submission,...).

Ken 

--- (Edited on 6/20/2008 10:30 am [GMT-0400] by kmaclean) ---
