Last Saturday, I was the keynote speaker at the Goethe Institute and presented something never before seen…well, heard…in Cameroon.
Uwe Jung, the librarian at Goethe, had thrown out another challenge: he asked the group if we could get the computer to speak Cameroonian languages. He must have known this would interest me, and he was right. I dove in, searching for documentation on the challenges of converting these complex languages from text back into speech.
I knew some of the challenges and graces. We have a common alphabet here in Cameroon for all 280 languages. This is great, because the computer will see a subset of the same 40 characters, and I can predict what I might find. The challenge is that each language has used each letter to represent a slightly (or wildly) different sound…so I knew this couldn’t be a one-size-fits-all job.
Challenge 1: Pronunciation
I started with Ewondo, the local language I’d been learning. Even though I’m a newbie, it is the only Cameroonian language whose sounds and words I both knew. I set to work, placing basic phrases in the application, and it seemed that French pronunciation was closer than any other. I had my framework…my pre-invented wheel.
Next…to add the letters that aren’t in French…mostly fun vowels. It turned out single vowels were easy, but double vowels produced unpredictable results. OK…so I learned to add exceptions.
At this point, a young Ewondo woman came into my office for tech support. After I finished fixing her problem, and as she was packing up to leave…I quickly typed in and played a greeting in Ewondo from my computer, the equivalent of “Good day, how are you?” She responded without thinking with the correct response. If I ever needed confirmation that I was on the right track, that was it.
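For the curious, here is a rough sketch in Python of the “rules plus exceptions” idea. It is not the application’s actual rule format, and the letters and sound labels in the table are placeholders I made up for illustration, not real Ewondo phonology. The only point is that trying the longest spelling first lets a double vowel win over the single-vowel rule.

```python
# Hypothetical letter-to-sound table. The sound labels on the right are
# invented placeholders, not real Ewondo sounds or any real program's codes.
RULES = {
    "a": "A",
    "e": "E",
    "ɔ": "O",
    "aa": "A:",  # exception: a double vowel gets its own rule
    "b": "b",
    "m": "m",
}

LONGEST = max(len(spelling) for spelling in RULES)


def to_sounds(word):
    """Convert a word to sound labels, trying the longest spelling first."""
    sounds = []
    i = 0
    while i < len(word):
        for size in range(LONGEST, 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                sounds.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: flag the letter so it can be fixed later.
            sounds.append("?")
            i += 1
    return sounds
```

With this table, `to_sounds("baa")` uses the double-vowel rule once instead of the single-vowel rule twice, and any letter without a rule shows up as `"?"`.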
Challenge 2: Tone
Ewondo uses tone to express meaning. Think of how you would pronounce “Coming.” or “Coming?”. The way your voice rises on the question, that’s tone. Ewondo, and most of the languages of this region, use tone to determine the meaning of the word…so your two pronunciations of “coming” could have 2 completely different word meanings.
Sometimes linguists choose to mark tones as accents over the letters, but most find that simplicity determines which languages are easy to learn and write. Ewondo marks most of its tones, and the ones not marked are predictable. Bassaa does the same, but has many more complex possibilities. Badw’ee only marks tone when distinguishing between two different words with the same spelling (homographs). Bassaa and Ewondo are easy to “translate” into speech because most of the important information is written in the words. Badw’ee would need a large dictionary giving the “true” pronunciation of each word.
Anyway, I had no idea how to represent tone in this application, so I got in touch with the developer. He gave me some cryptic hints that eventually led to a solution. Now, the computer could pronounce tone (frankly better than I can) without creating confusion.
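To show what “the information is written in the words” means for a computer, here is a small Python sketch that reads the accents off each vowel. The mapping from accent to tone name is an assumption for illustration (acute for high, grave for low, and so on), as is the vowel list; each language’s orthography would need its own table. It says nothing about how the speech application itself represents tone.

```python
import unicodedata

# Assumed accent-to-tone mapping, for illustration only.
TONE_MARKS = {
    "\u0301": "high",     # combining acute accent
    "\u0300": "low",      # combining grave accent
    "\u0302": "falling",  # combining circumflex
    "\u030C": "rising",   # combining caron
}

# Assumed vowel letters; a real orthography would define its own set.
VOWELS = set("aeiouɛɔə")


def tones(word, default="unmarked"):
    """Return (vowel, tone) pairs, reading tone from the accent marks."""
    result = []
    on_vowel = False
    # NFD splits an accented letter into the base letter plus its accents.
    for ch in unicodedata.normalize("NFD", word):
        if ch.lower() in VOWELS:
            result.append((ch, default))
            on_vowel = True
        elif unicodedata.combining(ch):
            if on_vowel and ch in TONE_MARKS:
                result[-1] = (result[-1][0], TONE_MARKS[ch])
        else:
            on_vowel = False
    return result
```

Vowels that come back as `"unmarked"` are the ones that need either a prediction rule (as in Ewondo) or a dictionary lookup (as Badw’ee would).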
Challenge 3: Special cases
Even though Cameroonian languages are quite consistent, there are always words that require a special pronunciation that you wouldn’t expect from reading them. Maybe they came from a neighboring language, French, or English. Maybe two words are pronounced as one. Cleaning up the pronunciation will be a long process of putting in texts and “fiddling” when it makes mistakes. After some fiddling, I moved on to the PowerPoint at the last moment…finishing at about 4am Saturday morning.
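In code terms, the “fiddling” amounts to keeping a list of exception words that is checked before the regular rules run. Here is a minimal Python sketch; the entry in the list and the one-sound-per-letter fallback are both made up for illustration and stand in for whatever the real rules would be.

```python
# Hypothetical exception list: the word and its sound string are invented.
EXCEPTIONS = {
    "taxi": "t a k s i",
}


def spell_out(word):
    """Stand-in for the regular rules: one sound per letter."""
    return " ".join(word)


def pronounce(text):
    """Use the exception list first, then fall back to the regular rules."""
    return [
        EXCEPTIONS.get(word, spell_out(word))
        for word in text.lower().split()
    ]
```

Each time the voice trips over a word, a new entry goes into the list and the regular rules stay untouched.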
Challenge 4: Fine tuning
I stood on the shoulders of giants to create this. My version was a Frankenstein of several languages. I pulled the vowel from “why” to pronounce one sound, and I used the French I for both I and e (in certain situations). These are rough equivalents and great for a mock-up, but not complete work. A trained linguist (which I am not yet) could do recordings and analysis to create vowels specially for Ewondo that match exactly the tone and timbre. I counted over 30 numerical variables to define a single vowel.
A few short hours later, I was on my way to Goethe with some colleagues and making final adjustments to my presentation.
I talked about the technology and the tools, all in French and on 4 hours of sleep. I knew about the orthography of most of the languages in the room, and could point out some of the challenges that each might have.
I had decided that the best example of the technology would be to create a male and female voice, and create an Ewondo conversation. Mouths dropped around the room…the computer was speaking one of their languages. I had noticed in town that it was easier to understand an Ewondo woman than an Ewondo man, and they seemed to agree. So we stuck with the female voice for working. Obviously it had the accent of an American cow, but it was close enough that they could understand and follow it.
Click Ewondo Conversation to listen to it.
They all had questions…wondering if we could make it speak Ghomala, Kwasio, Bassaa. They also wondered if voice recognition (I talk, it types) could work. It’s all a possibility, but consider how many millions of work hours have gone into English recognition, and how flaky it is.
The Ewondo team was most excited, and took my code to immediately start improving it. They started adding pronunciation rules I hadn’t even imagined, and even jumped over what I thought were limitations in the program. The copy I have now from them is much better…the obvious benefit of having a mother-tongue linguist and programmer in tandem.
We had some issues with the Windows version, but Kwasio started a draft voice as well.
Another linguist, Kibassa, from the Nuasue language, couldn’t come to the presentation, but was quite interested in the practical applications of this technology and wanted it to speak his language. Kibassa had the vision that he could use it for reading to children and the illiterate, teaching the language, and even checking the naturalness of a text. Yesterday, Kibassa and I sat down to work out Nuasue. (Actually the word “nuasue” means “ours” in Nuasue. Imagine the conversation where a traveling linguist asks which language a man is speaking, and he replies in his language “ours, of course”.)
So we started with the Ewondo framework…and had to rework almost everything. The vowels and consonants are often the same, but sound nothing alike. The tones were too strong and jumpy for Nuasue, so we smoothed them out. Kibassa found that the male voice was more understandable for Nuasue…interesting. We still have more work to do to make the voice natural (we think it’s breaking syllables in the wrong places), but it’s getting there. We impressed his fellow linguist with spoken Biblical text when he returned. Kibassa was right about the naturalness check. The spoken voice gave a surprising pronunciation of one word, and he realized that he had misspelled it in his translation.