Learning a new language is hard work, especially if it is Chinese, a tonal language written with more than 10,000 different characters. Finding our way around this linguistic labyrinth is a daunting task. But do not fear, for we have the power of Python at our side, and with its help we will machine-learn Chinese!
Chinese is a tonal language, which means that pronouncing a syllable with a different tone will usually change its meaning. And while this can be very funny, it can also be rather embarrassing for language learners. So, to keep us from getting into linguistic trouble, we’ll write a little Python tool that helps us improve our pronunciation.
Reading the morning newspaper while having a nice cup of tea doesn’t sound so complicated, does it? Well, if that newspaper is printed in Chinese, we will have to know about 2,500 characters just to make it through the first pages. Again, machine learning will come to our rescue!
With pronunciation and characters mastered, we’ll still have to learn a large number of words and phrases, so where to begin? To answer this, we’ll use Bayesian techniques to identify the low-hanging fruit of the Chinese language.
Congratulations, you should now be fluent in Chinese (or at least in machine learning).
Very nice and appealing proposal, congrats!
Nevertheless, I think it would be helpful to better clarify the contents and the expectations for this training directly in the abstract rather than briefly sketching them in the notes for reviewers.
If you do so, you will get all of my attention (and votes) in return !-)
Thanks a lot for your consideration.
Cheers
To clarify the content:
This tutorial is about using machine learning in language learning. We will explore various methods of machine learning and data analysis to help us get a better understanding of the Chinese language.
I hope to help participants learn more about the following:
* Using Python to work with unstructured data (speech, text, images)
* Extracting meaningful features from this data for analysis
* Applying various data analysis techniques to extract meaning from the data
* Visualizing the results, e.g. with matplotlib
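To give a rough idea of what "extracting features from text" could look like, here is a minimal sketch. It is only an illustration, not the actual tutorial material. It counts how often each Chinese character occurs in a text and reports how many of the most frequent characters are needed to cover a given share of it. The function names `coverage_curve` and `chars_needed` and the sample sentence are made up for this example, and it uses only the standard library.

```python
from collections import Counter


def coverage_curve(text):
    """Return (character, cumulative coverage) pairs, most frequent first."""
    # Keep only characters from the basic CJK Unified Ideographs block;
    # punctuation, digits and Latin letters are ignored (a simplification).
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    counts = Counter(chars)
    total = len(chars)

    curve = []
    seen = 0
    for char, n in counts.most_common():
        seen += n
        curve.append((char, seen / total))
    return curve


def chars_needed(text, target=0.9):
    """Number of most-frequent characters needed to cover `target` of the text."""
    for rank, (_, covered) in enumerate(coverage_curve(text), start=1):
        if covered >= target:
            return rank
    return 0  # the text contains no Chinese characters


# Hypothetical sample text; a real analysis would use a large corpus.
sample = "我是学生。你是老师。我们都是中国人。"
curve = coverage_curve(sample)
print(curve[:3])
print(chars_needed(sample, target=0.5))
```

The cumulative values in `curve` could be passed directly to matplotlib to plot coverage against the number of characters known.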
In the tutorial we will use the "canonical" Python data analysis stack:
* IPython notebook
* numpy, scipy & (possibly) pandas
* scikit-learn
* matplotlib
In addition, we are going to use a variety of libraries to help us retrieve and process different types of input data.
I will provide a repository with the code and data for the tutorial, and a VirtualBox image with all the necessary tools preinstalled (for people who don't want to set up everything themselves).