Meta Platforms Inc.’s artificial intelligence research team today said it has open-sourced a new project called Massively Multilingual Speech, which aims to overcome the challenges of creating accurate and reliable speech recognition models.

AI models that can recognize human speech and respond to it clearly have a lot of potential, especially for people who rely entirely on voice access to obtain information. However, training high-quality models generally requires enormous amounts of data — thousands of hours of audio, together with transcriptions of what’s being said. For many languages, especially the more obscure ones, that data simply doesn’t exist.

Meta’s MMS project does away with that requirement by combining a self-supervised learning algorithm called wav2vec 2.0 with a new dataset that provides labeled data for over 1,100 languages, and unlabeled data for almost 4,000 languages.

To overcome the lack of data for certain languages, Meta’s researchers turned to the Bible, which unlike most other books has already been translated into many thousands of languages. Its translations are often studied for text-based language translation research, and for many languages there are also publicly available audio recordings of people reading these texts.

“As part of this project, we created a dataset of readings of the New Testament in over 1,100 languages, which provided on average 32 hours of data per language,” Meta’s researchers said.

Of course, 32 hours of data is not enough to train a conventional supervised speech recognition model, and that’s why wav2vec 2.0 was used. Wav2vec 2.0 is a self-supervised learning algorithm that enables machines to learn without relying on labeled training data. With it, it’s possible to train speech recognition models on far less data.

The MMS project trained multiple self-supervised models on around 500,000 hours of speech data in over 1,400 languages, before fine-tuning the resulting models for specific speech tasks, such as multilingual speech recognition or language identification.

“We trained multilingual speech recognition models on over 1,100 languages using a 1B parameter wav2vec 2.0 model,” Meta’s researchers explained. Meta said the resulting models performed well both on standard benchmarks such as FLEURS and in comparison with other speech recognition models.
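The key idea behind wav2vec 2.0’s label-free pretraining is a contrastive objective: the model masks spans of latent speech representations and learns to pick out the true latent for a masked timestep from among distractors sampled elsewhere in the utterance. The sketch below illustrates that InfoNCE-style loss in plain NumPy; the dimensions, temperature, and distractor sampling are illustrative assumptions, not Meta’s actual training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def contrastive_loss(pred, target, distractors, temperature=0.1):
    """Simplified InfoNCE-style loss of the kind used in wav2vec 2.0 pretraining.

    pred:        (D,) context-network output at a masked timestep
    target:      (D,) true latent representation for that timestep
    distractors: (K, D) latents sampled from other timesteps of the same utterance
    """
    candidates = np.vstack([target, distractors])  # (K+1, D); true target is row 0
    # Cosine similarity between the prediction and each candidate latent.
    sims = candidates @ pred / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(pred) + 1e-9
    )
    logits = sims / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # negative log-probability of the true target

# Toy latent sequence: 50 timesteps of 16-dim features
# (a stand-in for the convolutional encoder's output on raw audio).
latents = rng.normal(size=(50, 16))
t = 10  # a masked timestep
others = [i for i in range(50) if i != t]
distractors = latents[rng.choice(others, size=5, replace=False)]

# A prediction close to the true latent yields a low loss...
good = contrastive_loss(latents[t] + 0.01 * rng.normal(size=16), latents[t], distractors)
# ...while an unrelated prediction yields a higher one.
bad = contrastive_loss(rng.normal(size=16), latents[t], distractors)
assert good < bad
```

Because the targets come from the audio itself rather than from transcriptions, this objective can be optimized on unlabeled speech, which is what lets MMS pretrain on languages with no transcribed data before fine-tuning on the small labeled sets.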