They trained it on two new data sets: one that contains audio recordings of the New Testament Bible and its corresponding text taken from the internet in 1,107 languages, and another containing unlabeled New Testament audio recordings in 3,809 languages. The team processed the speech audio and the text data to improve its quality before running an algorithm designed to align audio recordings with accompanying text. They then repeated this process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without the accompanying text.
“We can use what that model learned to then quickly build speech systems with very, very little data,” says Michael Auli, a research scientist at Meta who worked on the project.
“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.”
The researchers say their models can converse in over 1,000 languages but recognize more than 4,000.
They compared the models with those from rival companies, including OpenAI Whisper, and claim theirs had half the error rate, despite covering 11 times more languages.
However, the team warns the model is still at risk of mistranscribing certain words or phrases, which could result in inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.
While the scope of the research is impressive, the use of religious texts to train AI models can be controversial, says Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project.
“The Bible has quite a lot of bias and misrepresentations,” he says.