Audio and Prompts Discussions

medical technical language voice files
User: paradocs
Date: 7/2/2009 4:38 am
Views: 11508
Rating: 7

Greetings
While working on open source programming
I came across your interesting project.
I would be happy to contribute my
Midwest US Iowa voice.

I ask if the current focus of VoxForge
for Desktop Command and Control includes
or will include medical transcription.
Perhaps it is not time for this yet.

What is the relative value to VoxForge of:

1). Open source simulated medical text files, i.e.
office visit, history, physical, surgery,
disability, radiology, and laboratory reports.

2). Reading of a dictionary list of terms.

3). Reading of a medical term in context
in a phrase.

3). Philips dss/dss2 to wav, Dictaphone digital
hand mike computer unit, vDictate hand mike
recordings of the same material.

4). Male Female reading the same material.

I hope I am not being too ambitious here
as much may depend on my wife, my transcriptionists,
legal advice and cooperation from the influenza
virus.

Best Wishes
paradocs

--- (Edited on 7/2/2009 4:38 am [GMT-0500] by paradocs) ---

Re: medical technical language voice files
User: kmaclean
Date: 7/2/2009 11:12 am
Views: 103
Rating: 10

Hi paradocs,

>What is the relative value to VoxForge of:

>1). Open source simulated medical text files

Any recording of open source (GPL compatible) text is greatly appreciated! Just follow the instructions on the "Submit an AudioBook (LibriVox) Recording" page to submit your recordings and associated transcriptions.

>2). Reading of a dictionary list of terms.

That is OK too - it can help to ensure we have good coverage of all phones and triphones in the English language.

>3). Reading of a medical term in context

>in a phrase.

This is better than isolated words, since you can model triphones across different word contexts during acoustic model training.
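
As a rough illustration of why context helps, here is a small Python sketch that lists triphones in the HTK-style "left-centre+right" notation. The function name and the two pronunciations are typed in by hand for illustration (assumptions, not taken from the VoxForge dictionary), so check them against your own dictionary before relying on them.

```python
def triphones(phones):
    """Return 'left-centre+right' labels for a flat phone sequence."""
    labels = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + centre + right)
    return labels

# Hand-typed, unstressed pronunciations (assumed for this example only).
FINAL = "F AY N AH L".split()
DIAGNOSIS = "D AY AH G N OW S AH S".split()

isolated = triphones(FINAL) + triphones(DIAGNOSIS)
in_phrase = triphones(FINAL + DIAGNOSIS)

# Labels that only appear when the words are read together in a phrase.
cross_word = [label for label in in_phrase if label not in isolated]
print(cross_word)
```

Reading the two words as a phrase adds the labels that span the word boundary, which isolated word readings never supply.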

>3). Philips dss/dss2 to wav, Dictaphone digital

>hand mike computer unit, vDictate hand mike

>recordings of the same material.

We are collecting speech from many microphone types (with varying degrees of success...). Even though recording your voice using different voice recorders may not make that much difference (especially if your microphones are good quality), a person does not read the same text exactly the same way every time, so multiple recordings of the same text by the same person are still beneficial.

However, there are diminishing returns in the context of a speech corpus... where having too many recordings from one person can result in acoustic models that are overtrained for that person (2-3 hours being the max).

Please note that we prefer uncompressed audio recordings.

>4). Male Female reading the same material.

Yes this is OK.  We are, in essence, doing that now with the speech submission application  :) 

>I hope I am not being too ambitious here

>as much may depend on my wife, my transcriptionists,

>legal advice and cooperation from the influenza

>virus.

Any contribution that you can make would be greatly appreciated. 

thanks,

Ken

--- (Edited on 7/2/2009 12:12 pm [GMT-0400] by kmaclean) ---

Re: medical technical language voice files
User: nsh
Date: 7/4/2009 6:56 pm
Views: 92
Rating: 10

I would also suggest reviewing the medical dictionary that we can generate automatically. It is a highly specialized task that requires experience in both medical terminology and phonetics, but it would be very helpful. Here is how it looks:


immunoablative   IH M Y UW N OW AE B L AH T IH V
immunoablative(1)   IH M Y UW N OW AH B L EY T IH V
immunoablative(2)   IH M Y UW N OW AH B L AH T IH V
immunoabsorbent   IH M Y UW N OW AH B Z AO R B AH N T
immunoaffinity   IH M Y UW N OW AH F IH N AH T IY
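
For anyone who wants to review such a list with a script, the following Python sketch groups the entries by word. It assumes the layout shown above (word, optional "(n)" variant marker, then space-separated phones); the function and variable names are made up for this example.

```python
import re
from collections import defaultdict

# Sample lines copied from the listing above.
SAMPLE = """\
immunoablative   IH M Y UW N OW AE B L AH T IH V
immunoablative(1)   IH M Y UW N OW AH B L EY T IH V
immunoablative(2)   IH M Y UW N OW AH B L AH T IH V
immunoabsorbent   IH M Y UW N OW AH B Z AO R B AH N T
immunoaffinity   IH M Y UW N OW AH F IH N AH T IY
"""

# word, optional "(n)" variant index, whitespace, phones
ENTRY = re.compile(r"^(?P<word>[^\s(]+)(?:\((?P<index>\d+)\))?\s+(?P<phones>.+)$")

def parse_dictionary(text):
    """Return {word: [phone list, ...]} with variants kept in file order."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized dictionary line: {line!r}")
        entries[match.group("word")].append(match.group("phones").split())
    return dict(entries)

lexicon = parse_dictionary(SAMPLE)
for word, variants in lexicon.items():
    print(word, len(variants), "pronunciation(s)")
```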

--- (Edited on 7/4/2009 6:56 pm [GMT-0500] by nsh) ---

Re: medical technical language voice files
User: paradocs
Date: 7/6/2009 12:53 am
Views: 101
Rating: 8

Thanks for your encouragement.

My experience is with indexing
audio files with text to speech
for audio learners, dyslectic,
low vision, and blind.
Please forgive my newcomer
status in the speech to text field.

1) Phonetic medical dictionary:
Yes, this looks interesting to me.
I may have some of the necessary
qualifications. I may have to study
non North American dialects
if this is important.
Perhaps the speech corpora and
model make these adjustments?

2) Script to audit a machine dictionary:
I am using a script to control
festival to read a dictionary
list with the cmu phonetic option.
I can repeat and easily insert
optional phonemes to compare.

3) Sample editing procedure:
a) Get dictionary phonetics
immunoablative  (im"u-no-ab´lə-tiv)
Dorland's Medical Dictionary
b) convert phonemes to lower case
c) loop festival to read options
(lex.select "cmu")
(SayPhones (quote ( ih m y uw n ow  ae b l ah t ih v ) ) )
d) to consult, make wav files,
e-mail them, play them for others, and poll for a consensus
(set! ut1 (SayPhones (quote ( ih m y uw n ow  ae b l ey t ih v ))))
(utt.save.wave ut1 "immunoablative1.wav" "wav" )
e) sample analysis
immunoablative      IH M Y UW N OW AE B L AH T IH V
immunoablative(1)   IH M Y UW N OW AH B L EY T IH V
immunoablative(2)   IH M Y UW N OW AH B L AH T IH V
I have reviewed immunoablative0.wav,
immunoablative1.wav and immunoablative2.wav.
I find that 0 and 2 do not match any
regional accent, even Southern US.
Number 1 should be promoted to the top.
I would then add as number 1
immunoablative      IH M Y UW N OW EY B L EY T IH V
Because during dictation initial vowels are
simply thought or mouthed but not vocalized,
should bad pronunciations be recognized?
immunoablative      IH M Y UW N OW B L EY T IH V
immunoablative         M Y UW N OW B L EY T IH V
immunoablative      M Y UW N OW AH B L EY T IH V
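
Steps b) to d) above could be scripted along these lines. This is only a Python sketch that writes out Festival commands of the same form as the ones shown in c) and d); the function name is hypothetical, and the generated text has not been run against Festival here.

```python
def festival_commands(word, variants):
    """Build Festival Scheme text that speaks and saves each variant.

    `variants` is a list of upper-case phone strings as they appear in
    the dictionary; they are lower-cased as in step b).
    """
    lines = ['(lex.select "cmu")']
    for n, phones in enumerate(variants):
        lowered = " ".join(p.lower() for p in phones.split())
        lines.append(f"(set! ut{n} (SayPhones (quote ({lowered}))))")
        lines.append(f'(utt.save.wave ut{n} "{word}{n}.wav" "wav")')
    return "\n".join(lines)

script = festival_commands(
    "immunoablative",
    [
        "IH M Y UW N OW AE B L AH T IH V",
        "IH M Y UW N OW AH B L EY T IH V",
        "IH M Y UW N OW AH B L AH T IH V",
    ],
)
print(script)
```

The output can be saved to a file and loaded into Festival, which gives one wav file per candidate pronunciation for review.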

4) Recorded medical dental technical dictation:
My strategy is to use short several word segments
to avoid ethical and legal issues
of possible association of a recorded segment
with any identifiable patient.
Are there ways to safely use commercial
speech to text software that will not
contaminate the open source status of
the VoxForge Speech Corpus?

My local 5 to 10 second wav recordings
will have file names which include the
prompt and end of sentence indicator
if the phrase is a sentence or heading:
0001_final_diagnosis+diabetic_ketoacidosis+.wav
0002_plan+he is to return to clinic in two weeks+.wav
The idea is to allow error checking,
replacement, and searchable prompts.
Simple scripts can then build the files
needed for submission to VoxForge.
Please forgive my rambling if this
makes no sense.
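
To make the naming idea concrete, here is a Python sketch of the kind of simple script meant above. It assumes that the number is separated from the rest by the first underscore, that "+" separates fields, that underscores inside a field stand for spaces, and that a trailing "+" is the end of sentence or heading indicator. The function name and the returned keys are only for illustration.

```python
import os

def parse_recording_name(filename):
    """Split a recording file name into its number and prompt text."""
    stem, _ext = os.path.splitext(filename)
    number, _, rest = stem.partition("_")
    ends_sentence = rest.endswith("+")
    # Drop empty pieces left by the trailing "+", then restore spaces.
    fields = [field.replace("_", " ") for field in rest.split("+") if field]
    return {
        "id": number,
        "fields": fields,
        "prompt": " ".join(fields),
        "ends_sentence": ends_sentence,
    }

for name in (
    "0001_final_diagnosis+diabetic_ketoacidosis+.wav",
    "0002_plan+he is to return to clinic in two weeks+.wav",
):
    info = parse_recording_name(name)
    print(info["id"], info["prompt"])
```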

Best Wishes
paradocs

--- (Edited on 7/6/2009 12:53 am [GMT-0500] by paradocs) ---

--- (Edited on 7/6/2009 12:58 am [GMT-0500] by paradocs) ---

--- (Edited on 7/6/2009 1:05 am [GMT-0500] by paradocs) ---

Re: medical technical language voice files
User: kmaclean
Date: 7/6/2009 9:52 am
Views: 308
Rating: 7

Hi Paradocs,

>1) Phonetic medical dictionary:
>[...]Perhaps the speech corpora and
>model make these adjustments?

In a sense, it does make these adjustments in the acoustic model training process, when "forced alignment" is performed on the training set to select those pronunciation dictionary entries that best match the speech in the training set.

However, you still need to create the multiple pronunciation entries in the pronunciation dictionary for this "selection of best pronunciation" to occur. 

To start with, it is much easier to create a single best match for a word pronunciation (which best reflects how you or your wife pronounce the word), rather than trying to figure out all possible dialect pronunciations.

>3) Sample editing procedure:
>[...]
>I have reviewed immunoablative0.wav,
>immunoablative1.wav and immunoablative2.wav.
>I find that 0 and 2 do not match any
>regional accent, even Southern US.
>Number 1 should be promoted to the top.
>I would then add as number 1
>immunoablative      IH M Y UW N OW EY B L EY T IH V

seems like a reasonable (though time consuming) approach...

>Because during dictation initial vowels are
>simply thought or mouthed but not vocalized,
>should bad pronunciations be recognized?

In my opinion, no.  These infrequent mispronunciations become part of the acoustic model, and don't need a specific training case.

But nsh is the expert, he may have a different view...

>4) Recorded medical dental technical dictation:
>My strategy is to use short several word segments
>to avoid ethical and legal issues
>of possible association of a recorded segment
>with any identifiable patient.

That is OK, as long as whoever is reading the text is OK to have their speech released under the GPL license.

>Are there ways to safely use commercial
>speech to text software that will not
>contaminate the open source status of
>the VoxForge Speech Corpus?

RalfHerzog used Dragon dictate to validate his speech before submitting it to VoxForge, that way he did not have to manually review each submission after recording it. 

It is also good to use something like the Audacity audio editor (open source) to review the waveforms generated by the recordings, to make sure that the recordings are not too loud or too soft, and to ensure that there is half a second of silence before and after the utterance (the acoustic model training process needs this...)
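
If you have many short files, a script can do a first pass before you look at anything in Audacity. The Python sketch below is only a rough screen, not a replacement for listening: the thresholds (quiet, too_soft, too_loud) are arbitrary values assumed for 16-bit audio, and the function names are made up for this example.

```python
import array
import sys
import wave

def load_mono_16bit(path):
    """Read a 16-bit mono WAV file into (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        samples = array.array("h", wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate

def check_recording(samples, rate, quiet=500, too_soft=3000,
                    too_loud=32000, pad_seconds=0.5):
    """Flag level problems and missing silence around the utterance."""
    peak = max((abs(s) for s in samples), default=0)
    voiced = [i for i, s in enumerate(samples) if abs(s) > quiet]
    if not voiced:
        return {"peak": peak, "level_ok": False,
                "lead_ok": False, "tail_ok": False}
    lead = voiced[0] / rate                         # silence before speech
    tail = (len(samples) - 1 - voiced[-1]) / rate   # silence after speech
    return {
        "peak": peak,
        "level_ok": too_soft <= peak <= too_loud,
        "lead_ok": lead >= pad_seconds,
        "tail_ok": tail >= pad_seconds,
    }

# Synthetic example: 0.5 s silence, 0.5 s of signal, 0.5 s silence at 16 kHz.
demo = [0] * 8000 + [10000, -10000] * 4000 + [0] * 8000
print(check_recording(demo, 16000))
```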

>My local 5 to 10 second wav recordings
>will have file names which include the
>prompt and end of sentence indicator

It is best to use the approach used by HTK and Julius, which is to have a separate text file (called prompts or transcriptions) containing the name of the audio file in the first column (without the suffix) and then the prompt transcription. 

Sphinx uses a separate prompt file also, but in a slightly different format (a little more XML'ish).
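
A sketch of the two layouts, in Python. The first function follows the description above (file name without suffix, then the prompt). The second assumes the Sphinx transcription layout is "<s> text </s> (file_id)", which is my recollection of the SphinxTrain format and should be checked against the Sphinx documentation. Both function names are hypothetical.

```python
def htk_prompt_line(file_id, text):
    """One prompts-file line: audio file name (no suffix), then the prompt."""
    return f"{file_id} {text}"

def sphinx_transcript_line(file_id, text):
    """One transcription line in the assumed Sphinx layout."""
    return f"<s> {text} </s> ({file_id})"

recordings = [
    ("0001", "final diagnosis diabetic ketoacidosis"),
    ("0002", "plan he is to return to clinic in two weeks"),
]

for file_id, text in recordings:
    print(htk_prompt_line(file_id, text))
    print(sphinx_transcript_line(file_id, text))
```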

>Please forgive my rambling if this
>makes no sense.

All were excellent questions.

thanks for your interest in helping out,

Ken

--- (Edited on 7/6/2009 10:52 am [GMT-0400] by kmaclean) ---

Re: medical technical language voice files
User: paradocs
Date: 7/20/2009 4:50 am
Views: 96
Rating: 8

Hi kmaclean
Hi nsh

I continue to be interested in
machine generated but human checked
medical and technical phonetic lists.

I have made good progress on an editor
to allow efficient and rapid modifications
to a letter to phoneme generated list.

>seems like a reasonable (though time consuming) approach...
Yes, agreed.  The possibility of such consultation was
illustrative of a comprehensive but rare occurrence.

1) Which list of phonemes should be used?
For Festival CMU generated phonemes I do not find
documentation for DX and IX.
Are these needed for efficient phonetic alignment?
Are needs better served with espeak's phoneme symbols?

2) Word list size and subdivision:
Do too many words and too many alternatives
add a cost to the efficiency of a speech model?
If this is so, perhaps specialty lists may help.
Proprietary drug name use is generally tied to
the 17 year U.S. patent period until generics come.
(Hmmmm perhaps that has something to do with locusts.)
Except for the few that become standard words,
this list is best edited separately.

3) There are many dictionary formats that
appear to have just grown from each project.
I am considering a standard feature rich format
from which existing formats may be extracted.
Is there any reference to this in the literature?

Best Wishes and thank you for your guidance,
paradocs

--- (Edited on 7/20/2009 4:56 am [GMT-0500] by paradocs) ---

Re: medical technical language voice files
User: kmaclean
Date: 7/20/2009 11:07 am
Views: 982
Rating: 7

Hi Paradocs,

>I continue to be interested in
>machine generated but human checked
>medical and technical phonetic lists.

You might be interested in this thread: Sequitur G2P

>1) Which list of phonemes should be used?

The VoxForge pronunciation dictionary was derived from the xvoice version of  the CMU pronunciation dictionary/phone list.  It can be used with HTK and/or Julius.

I believe that the Sphinx group of speech recognition engines uses CMU pronunciation dictionary version 0.7 (as far as I know, there were no changes to the phoneme list, it is just better edited...)

Someday VoxForge will have to move to version 0.7 of the CMU dict... therefore, I would recommend using it, and its phone set.

>For Festival CMU generated phonemes I do not find
>documentation for DX and IX.
>Are these needed for efficient phonetic alignment?

Now that is an interesting question... 

I have not looked at this in a while, but it seems that when I first created the VoxForge dictionary, I could not use CMU dict v0.6 (contrary to what I state in this FAQ entry: What is the VoxForge phoneset?) but actually used the CMU unstressed dictionary from the  xvoice site (see Notes file). 

All this time, I have assumed that CMU dict v0.6 and the CMU unstressed dictionary used the same phone set, but that does not seem to be the case, e.g.:

xvoice cmu-unstressed:

ABBREVIATED                    AX B R IY V IY EY DX AX D

VoxForge pronunciation dictionary:

ABBREVIATED     [ABBREVIATED]   ax b r iy v iy ey dx ax d

cmu dict v0.6 (stressed - remove numbers for unstressed):

ABBREVIATED  AH0 B R IY1 V IY0 EY2 T AH0 D

Speech recognition engines don't really care what identifier you use for a particular phoneme, as long as you are consistent. The CMU pronunciation dictionary is the de facto standard open source pronunciation dictionary (for English), so I would recommend that you use the most current version (CMU dict v0.7).
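
To see the difference for the ABBREVIATED entries quoted above, here is a small Python sketch that removes the stress numbers from the cmu dict v0.6 entry and compares the phones used with the xvoice entry. The function name is made up for this example, and one word is of course not a full comparison of the two phone sets.

```python
def strip_stress(pronunciation):
    """Remove the trailing stress digits (0, 1, 2) from each phone."""
    return " ".join(phone.rstrip("012") for phone in pronunciation.split())

XVOICE = "AX B R IY V IY EY DX AX D"          # xvoice cmu-unstressed entry
CMU_06 = "AH0 B R IY1 V IY0 EY2 T AH0 D"      # cmu dict v0.6 entry

unstressed = strip_stress(CMU_06)
only_xvoice = sorted(set(XVOICE.split()) - set(unstressed.split()))
only_cmu = sorted(set(unstressed.split()) - set(XVOICE.split()))

print(unstressed)
print("only in xvoice entry:", only_xvoice)
print("only in cmu dict entry:", only_cmu)
```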

>Do too many words and too many alternatives
>add a cost to the efficiency of a speech model?

As I alluded to in the previous post, alternative pronunciations are only relevant for the training process, not during speech recognition. 

The forced alignment process in training an acoustic model looks at the actual pronunciations in your training set and selects the pronunciation that best matches the speech, and then uses that for speech recognition... all other alternative pronunciations are ignored during speech recognition.  That is why you sometimes see specialized acoustic models trained for specific dialects.

So your time is better spent creating pronunciations that match the actual speech in your training data.

>3) There are many dictionary formats that
>appear to have just grown from each project.
>I am considering a standard feature rich format
>from which existing formats may be extracted.
>Is there any reference to this in the literature?

This is a project in itself.  Ralfherzog has been very interested in doing something like this.

Look at the SAMPA and IPA formats, and PLS (XML based).

Ken

--- (Edited on 7/20/2009 12:07 pm [GMT-0400] by kmaclean) ---

Re: medical technical language voice files
User: nsh
Date: 7/21/2009 8:22 pm
Views: 92
Rating: 7

Btw, we already have a list of medical terms automatically extracted from the PubMed database:

http://www.ncbi.nlm.nih.gov/pubmed/


and their transcriptions generated by Sequitur G2P. The list can be of arbitrary length and is already automatically transcribed. If you are interested, you could just check the results.

 

--- (Edited on 7/21/2009 8:22 pm [GMT-0500] by nsh) ---

Re: medical technical language voice files
User: paradocs
Date: 7/22/2009 2:47 am
Views: 100
Rating: 7

Greetings,

This is interesting and can perhaps
help compare the efficiency of
Sequitur G2P, espeak, festival and machine
translation of electronic dictionaries'
phonetics.

Please give me the link to the term list and
trial phonemes.

PubMed may be more research science oriented
than clinical.  I was looking at words from
http://en.wiktionary.org/wiki/Category:Medicine
and an open source medical spell checking list :
http://www.e-medtools.com/openmedspel.html
I do not believe word lists present
any copyright issue.

Best Wishes
paradocs

--- (Edited on 7/22/2009 2:47 am [GMT-0500] by paradocs) ---

Re: medical technical language voice files
User: nsh
Date: 7/22/2009 8:19 pm
Views: 2945
Rating: 9

> http://www.e-medtools.com/openmedspel.html


Thanks a lot, I didn't know about it.

>Please give me the link to the term list and trial phonemes.

Here is the automatically transcribed part of the list above

http://www.mediafire.com/?dwzxyiizted

 

--- (Edited on 7/22/2009 8:19 pm [GMT-0500] by nsh) ---
