Last night I was attempting to learn a little bit of Chinese, after being inspired by a series of documentaries and travel vlogs (ChopStick Travel) that I’ve been taking in over the last 3-4 weeks.
Anyway, I opened up DuoLingo and was quickly disappointed. Then I tried Memrise, which was slightly better, but it suffered from the same issues.
Most critically, these apps fail because they have no variability. Every time you hear the recording of “nĭ hăo” it is the exact same one. Every time you see the script written “你好” it uses the same font.
Deep learning is a mathematical approximation of how the neurons in our brains work. At least compared to very simple brains like those in worms, current AI tech is a reasonably good simulation. I think it’s reasonable to assume that we can gain insight into the human brain from things we learn developing AIs.
When we build these simulated neural networks, how they are trained is critically important to how they perform. For instance: training a computer to understand spoken language requires an immense number of audio samples (Mozilla’s open data set is currently at 12GB of compressed audio). Training an AI to do visual character recognition takes thousands of samples to get a good working model. Elon Musk has said they need 1 billion miles of recorded driving to get a reasonable self-driving model. AIs need a LOT of data to train on.
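In fact, when recordings are scarce, a standard trick in speech training is to manufacture variability: take one clean sample and generate many shifted, noisy copies of it. Here is a minimal stdlib-only sketch of that idea — the speed, gain, and noise ranges are arbitrary illustrations, not values from any real pipeline:

```python
import math
import random

def augment(sample, n_variants=5, seed=0):
    """Return varied copies of one audio sample (a list of floats):
    random speed change via linear resampling, random gain, and added
    noise — the kind of variability a single studio recording lacks."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        speed = rng.uniform(0.8, 1.2)          # +/-20% tempo shift
        n_out = max(2, int(len(sample) / speed))
        stretched = []
        for i in range(n_out):
            # Linear interpolation between neighbouring input samples.
            pos = i * (len(sample) - 1) / (n_out - 1)
            lo = int(pos)
            hi = min(lo + 1, len(sample) - 1)
            frac = pos - lo
            stretched.append(sample[lo] * (1 - frac) + sample[hi] * frac)
        gain = rng.uniform(0.5, 1.5)           # volume variation
        variants.append([gain * x + rng.gauss(0, 0.01) for x in stretched])
    return variants

# One "clean recording": a 440 Hz tone, 0.1 s at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(1600)]
varied = augment(tone)  # five noisy, speed- and volume-shifted copies
```

A model trained on the varied copies sees something closer to real-world conditions than one trained on the pristine original — which is exactly the gap between a studio recording of “nĭ hăo” and hearing it at a food market.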
Granted, the human brain is vastly better at learning than current deep learning algorithms, but we are not so much better that we can learn new words from a single sample. Hearing a word said one way, recorded on a sound stage, will do very little to help recognition when you hear it in person at a loud food market, or with a different accent; even a cadence change might throw off your comprehension.
If you came to an AI expert to get a voice recognition system trained, having spent $50,000 at a sound studio getting one perfect example of each word as your training data, you’d be laughed at for wasting so much money on something so useless. It is preposterous to train AI on single samples, but that seems to be what we expect of people.
I wish there was a language learning platform that took the approach of compiling a minimum of 100 samples of each word (at least at the beginning levels), ideally taken in-context with video so you can see the facial movements and pick up on body language. It could initially seed the library with clips from YouTube or movies. Over time it could ask users to contribute Snapchat-style short clips of themselves saying words/phrases in their native language to help others on the platform. A particularly memorable clip might be the one that cements the word in your memory, and hearing all the variations will help train your ear to recognise the word. It could also pair words directly with images, the way image-recognition training sets do, so learners avoid building a mental translation step that slows down fluency.
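The core mechanic of such a platform is simple: store many variants per word and serve a different one on each review, instead of replaying the single studio recording existing apps use. A minimal sketch, with entirely made-up names and data:

```python
import random

# Hypothetical clip library: each word maps to many recorded variants.
# Speakers and sources here are illustrative placeholders only.
library = {
    "nĭ hăo": [
        {"speaker": "native_user_1", "source": "short user clip"},
        {"speaker": "native_user_2", "source": "street interview"},
        {"speaker": "film_scene_7", "source": "movie excerpt"},
    ],
}

def next_review(word, rng):
    """Pick a random variant, so the learner hears a different voice,
    accent, and cadence across reviews rather than one fixed recording."""
    return rng.choice(library[word])

rng = random.Random(1)
heard = {next_review("nĭ hăo", rng)["speaker"] for _ in range(50)}
```

Over 50 reviews the learner almost surely encounters every variant — the human analogue of an AI seeing a diverse training set rather than one sample repeated.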
Taking the same kind of data we use to train a deep learning algorithm, but using it to train humans, would be super interesting. It might unlock insights in psychology, education and computer science.
As an endeavour it would be awesome to have 10 new people try the app every month, do a one-hour session on-site with a new language, then see how far they get in a conversation with a native speaker afterwards. Optimise the app over time for real improvement in comprehension, and perhaps conversation. Language learning is an ideal case for testing the limits of how quickly humans can learn something new.
Perhaps a billion-dollar idea, if only I had the capacity to pursue it. (The user-contributed data alone could be a gold mine if it worked.)