Mixed Language Speech Recognition without Explicit Identification of Language
22 Apr 2020 | 16 views | 0 downloads | 6 Pages | 457.71 KB

Transcription

Kiran Bhuvanagirir et al.: Mixed Language Speech Recognition without Explicit Identification of Language. American Journal of Signal Processing 2012, 2(5): 92-97.

[...] of our knowledge, there is no work reported in literature for an India-specific language mix.

There are two major distinct frameworks to build mixed language automatic speech recognition (ML-ASR), namely the multi-pass and the one-pass framework. In a multi-pass ML-ASR, the exact instance in spoken speech where the language switch happens is determined and the language of the speech is identified. Once the language of the speech segment is known, the corresponding language-dependent automatic speech recognition (ASR) is used to recognize the speech segment. Note that a typical ASR is language specific and uses an acoustic model (AM), a language model (LM) and a pronunciation lexicon (PL) built for that language to recognize spoken speech. The AM, LM and PL are constructed from language-specific speech and text corpus through a training process. In the one-pass approach, an ASR is built, namely AM, LM and PL, which encompasses both the languages in the mixed language. This enables ML-ASR on mixed language speech. The one-pass approach is simpler compared to the multi-pass approach because there is no need (a) to specifically identify the language and (b) to employ several language-specific ASRs. However, the one-pass approach to ML-ASR poses problems in the form of a need to collect a sufficient amount of mixed language speech corpus (audio) and the associated text transcription, which can be used to build the mixed language acoustic model and the ML language model required for ML-ASR. In this paper we hypothesize that one could use available resources, for example acoustic models of one of the languages in the mixed language, and carefully construct the LM and PL to do ML-ASR. We conducted several experiments on mixed language speech where the primary language is Hindi and the secondary language is English. It should be noted that the approach is independent of the language mix, in the sense that any other Indian language can take the place of Hindi with appropriate mapping of the phones in that Indian language to English.

The rest of the paper is organised as follows. A short review of multi-pass and one-pass frameworks for multilingual speech is given in Section 2, followed by a discussion of the mixed language database used in our experiments and of our approach to mixed language ASR in Section 3. In Section 4 we discuss experimental results, and we conclude in Section 5.

2 Existing Approaches

Recognition of mixed language speech is still in its initial stages of research. There are two approaches reported in literature, one being the multi-pass framework [4] and the other the one-pass framework [3]. However, multilingual speech recognition is another area of research which has a close relationship with ML-ASR. In multilingual speech recognition the spoken speech is not a mix of two languages, unlike ML-ASR; however, the main challenge is that one does not know a priori the identity of the language. So the first task in multilingual ASR is to identify the language. This problem of identifying the language is well addressed in literature [5]. Language identification using LPC-based acoustic features was proposed by Cimarusti et al. [5]. They were able to identify eight different languages with reasonable success. In another work, Foil [7] used prosodic features for language identification and Naratil et al. [10] successfully used phonotactic acoustic features. Later, Yan [9] applied a combination of acoustic, phonotactic and prosodic information for language identification. Nagawaka [8] compared four different methods to identify languages and concluded that the continuous hidden Markov model (HMM) based method works best. Many recognizers, like Gaussian Mixture Model (GMM), single-language phone recognition followed by language modelling (PRLM), parallel PRLM (PPRLM), GMM tokenization [6] and Gaussian Mixture Bi-gram Model (GMBM) [11], have also been studied in literature for multilingual speech recognition.

In order to use the multilingual approaches in mixed language speech recognition, one needs to identify the exact time instants at which switching from one language to another occurs and follow it up with language identification. Automatic segmentation of different languages within a speech utterance has been addressed by Wu et al. [4], who use the Bayesian information criterion (BIC) on Delta-MFCC (Mel Frequency Cepstral Coefficients). In another related work, Chi-Jiun et al. [12] use a statistical approach to segment and identify language in a speech utterance. They use a maximum a posteriori (MAP) estimate to find the boundary segments to do language identification.

Figure 2. Multi-pass approach for mixed language ASR

Mixed language speech recognition using the multi-pass framework can be realised using the following steps (see Fig. 2). The mixed language speech input is divided into segments based on identification of the instant where the language change occurs. Then the language of the segment is identified using a language identification module. Then a language-dependent ASR is used to recognize that particular segment of speech. The recognition performance of the multi-pass approach depends on (a) the performance of the language boundary detection, (b) the language identification block and (c) the actual performance of the language-specific ASR. Clearly, a poor performance by any one of the three blocks affects the overall performance of the multi-pass based ML-ASR system. The one-pass framework [3] avoids the drawback of the multi-pass system by building a PL, AM and LM to encompass both the languages in the mixed language. The acoustic model for mixed language is an AM generated for the combined phoneme set of the languages in the mixed language. The advantage of this approach (shown in Fig. 3) is that it is not dependent on the language boundary detection block or the language identification block. It is similar to a language-specific ASR except that the AM, LM and PL are built for the mixed language. Note that this approach needs mixed language speech and text corpus, which generally is not available. Clearly, the existing approaches cannot be used for ML-ASR.

Figure 3. One-pass approach for mixed language ASR

In our approach we used the one-pass framework; however, we used the AM of a single language, which was readily available, instead of trying to undertake the Herculean task of collecting speech corpus and transcribing it to build an AM for the complete phone set which encompasses both the languages. We however built a small database of mixed language corpus (a) to construct the language model to handle mixed language recognition [16] and (b) to test our approach.

3 Proposed Approach

We have worked on a specific language mix, namely Hindi-English, whose usage is very common in the Indian subcontinent. Specifically, Hindi, being the native language, is spoken the majority of the time compared to the non-native language English. In our corpus a little more than two thirds of the total spoken words were spoken in Hindi, and the rest, namely one third, were either English words or proper nouns. Overall, our corpus consisted of 46 different speakers with sufficient gender and age variability from different metros in India. Each of the speakers uttered three to five different sentences which had a mix of Hindi-English, of which at least one sentence uttered by the speaker was elicited speech. The elicited speech gave an indication of the actual mix of the language as spoken in everyday conversation. In all there were 213 unique spoken sentences consisting of 1946 words. All the experimental results reported in this paper are based on these word utterances. During data collection the speakers were supplied a speaker sheet in Hindi script and were asked to call from a quiet environment, and the recording was done using a telephony card; specifically, we used a Dialogic CTI card. The speech was recorded at 11 kHz and 8 bits per sample using a home-grown data collecting application.

Our approach retains the framework of a one-pass method with the use of an appropriate PL. The use of a modified PL enables us (a) to avoid building an AM for the mixed language (note that mixed language speech corpus is difficult to collect) and (b) to perform recognition with the ASR of one of the languages. We used the public domain speech recognition engine Sphinx [15] with the HUB4 AM (English phones) in one set of experiments, and in another set of experiments we used the readily available Hindi ASR [20] AM (Hindi phones). The reason for using these AMs instead of an AM for mixed language was that (a) these AMs were readily available for use and (b) building acoustic models for mixed language was too cumbersome, requiring actual on-the-field collection of a large amount of speech corpus to which we did not have access. It should be noted that a Hindi ASR has 59 phonemes while English has only 39 phonemes. When using English acoustic models, we approximate those phonemes, mainly occurring in Hindi words, which are not in English by replacing the phoneme in Hindi by a combination of two or more English phonemes [13]. The PL that supports the ASR is constructed in the usual way by using the CMU language toolkit [14] for all the English words in the corpus. However, all the Hindi words are first transliterated into English and the pronunciation of this English word is obtained using [14] or approximate phoneme mapping (APM).

Figure 4. Proposed approach

4 Results and Discussion

We conducted in all a set of nine different experiments to evaluate the performance of our approach for ML-ASR. In the first set of experiments we used the English AMs, while in the second set of experiments we used the Hindi language AMs.

In all our experiments we used the Sphinx ASR [15] and the well-known n-gram LM created using the mixed language speech corpus that we collected (Section 2). In each of these experiments the manner of construction of the PL was different. The distribution of the Hindi, English and proper noun words in the corpus was 62%, 28% and 10% respectively. For the first set of eight experiments, done using English AMs, we used two different methods of PL construction for the three different types of words, namely English words, Hindi words and proper nouns. The first method of PL creation is based on the CMU toolkit [14] and the second method is based on approximate phoneme mapping (APM). In the APM method of lexicon creation, a word is first transliterated and the equivalent Hindi phonemes are generated; each of these Hindi phonemes is then replaced by one or more equivalent English phonemes. For example, the Hindi word Matsyagandha is represented using the CMU toolkit as M AE TH S AY AH G AH N D (see Fig. 5(a)), while the equivalent pronunciation representation using the Hindi phoneme set is M A T A S Y A G A N DH A (see Fig. 5(b)). Using APM, the same word is represented as M AH TH AH S Y AH G AH N DH HH AH (see Fig. 5(c)). Note that in APM a Hindi phoneme is replaced by one or more equivalent English phonemes. For example, the phone DH occurring only in Hindi is substituted by the phones DH HH in English (see Fig. 5). For example, the English word Identification can be transliterated similarly as aidentiphikation, and the equivalent pronunciation using the Hindi phoneme set is EI D E N T I PH I K EI SH A N A (see Fig. 5(b)). Using APM it is represented as AY D EH N T IY F IY K EY SH AH N (see Fig. 5(c)). In the ninth experiment we represented every word in the PL using both the alternative phonetic representations (see Fig. 5(d)), namely using CMU and APM. Table 1 shows the experiment number and the method used to construct the PL. For example, in Expt 6 APM was used to construct the English [...]

[...] have presented word error rates (WER) on the Train dataset (Table 2) and the Test dataset, averaged over three rounds of cross validation, in Table 3 separately. In the case of the Train dataset, the textual data used for constructing the LM is the same as the corresponding speech data used for recognition, while in the case of the Test dataset the text data used for constructing the LM was not part of the speech data used for recognition; in that sense the data used for LM construction and that used for recognition were complementary sets. It can be seen that the word accuracies for Hindi and proper nouns of the ML-ASR are higher when the PL for the Hindi words and proper nouns is built using the approximated English phones (Expt 3, Expt 5 and Expt 9) compared to when the PL is built using the CMU toolkit (Expt 1, Expt 2, Expt 4, Expt 6, Expt 7 and Expt 8). We can conclude that representing non-English words (in this case Hindi words and proper nouns) using approximate English phonemes decreases the WER. Overall WER is less when English words are represented by the CMU toolkit and the Hindi words and proper nouns are represented using APM (Expt 3 and Expt 9) in the PL. Also note that the performance in Expt 1 and Expt 8 for English words is far poorer compared to the performance in all other experiments; this can be attributed to the imperfect representation of Hindi words or proper nouns in Expt 1 and Expt 8, resulting in misrecognition of English words preceding or succeeding Hindi words or proper nouns (we used a 3-gram representation of the mixed language in the LM).

Table 2. Word Error Rate (Train dataset). Columns: Experiments; Accuracies (English words, Hindi words, Proper nouns); Overall accuracy.
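
The approximate phoneme mapping (APM) step described in the transcription, where each Hindi phoneme is replaced by one or more English phonemes, can be illustrated with a minimal Python sketch. The table `APM_TABLE` and the helper names `apm_pronunciation` and `lexicon_entry` are hypothetical: the table is inferred only from the single Matsyagandha example and the DH to DH HH substitution quoted in the text, so it is not the authors' full mapping, and the one-line "WORD phones" lexicon layout is likewise an assumption.

```python
# Hypothetical APM table, inferred only from the Matsyagandha example in the
# text (Hindi phonemes -> English phonemes). Not the authors' complete mapping.
APM_TABLE = {
    "M": ["M"],
    "A": ["AH"],
    "T": ["TH"],
    "S": ["S"],
    "Y": ["Y"],
    "G": ["G"],
    "N": ["N"],
    "DH": ["DH", "HH"],  # Hindi-only phone approximated by two English phones
}


def apm_pronunciation(hindi_phonemes):
    """Replace each Hindi phoneme by one or more English phonemes."""
    english = []
    for phoneme in hindi_phonemes:
        if phoneme not in APM_TABLE:
            # Fail loudly rather than guess a mapping that the text does not give.
            raise KeyError(f"no APM entry for Hindi phoneme {phoneme!r}")
        english.extend(APM_TABLE[phoneme])
    return english


def lexicon_entry(word, hindi_phonemes):
    """Format one pronunciation-lexicon line (assumed layout: WORD then phones)."""
    return f"{word.upper()} {' '.join(apm_pronunciation(hindi_phonemes))}"


if __name__ == "__main__":
    matsyagandha = "M A T A S Y A G A N DH A".split()
    print(lexicon_entry("Matsyagandha", matsyagandha))
```

Keeping the mapping as a plain dictionary from one phoneme to a list of phonemes makes the one-to-many substitutions explicit, and the explicit `KeyError` avoids silently inventing pronunciations for phonemes the example does not cover.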
