Saturday, 2 March 2013

Machine learning endeavour - Mid-term review

It’s been a few months since I posted my machine learning goals, and this seems an appropriate time to review what’s been done so far and what’s left to go.
I had started out with a goal of 5 ML-related courses, and I am on track to do a little more than I had planned. These are the courses that I've now targeted for completion (all on Coursera):
Course name                                Status
Probabilistic Graphical Models             June 2012
Computing for Data Analysis                Dec 2012
Machine Learning                           Dec 2012
Neural Networks for Machine Learning       Dec 2012
Data Analysis                              in progress
Linear and Discrete Optimization           in progress
Computational Methods for Data Analysis    in progress (non-certificate course)
Natural Language Processing                in progress
I also took a large part of the “Design and Analysis of Algorithms I” course which I could not complete due to overlapping course schedules.
My take on each of these courses is in another blog post.  
I had another goal of entering 4 Kaggle competitions and finishing in the top 50%, which appears to be a little more difficult to achieve. I have entered more than 2 competitions so far; my best placing is 55/940 in one (the Digit Recognizer training competition), with no significant placings in the others.
My perspective is that competitions are probably not the best place to learn skills, for a few reasons.
First, competition data usually needs quite a bit of pre-processing and exploratory analysis before you can even figure out the approach.
Second, most competition data is too large to fit in a typical PC’s memory, and thus requires big data and/or parallelization approaches such as Apache Mahout or CUDA, along with knowledge of the accompanying toolset. A significant part of your time is therefore spent learning a toolset which is useful, but orthogonal to the goal of getting feedback on your machine learning approach.
Third, getting high on the leaderboard usually requires ensemble methods, in other words, trying out several models and then averaging their predictions (model averaging).
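To make the model averaging idea concrete, here is a minimal sketch in plain Python. It assumes each model has already produced per-class probabilities for the same samples; the three "models" and their numbers below are made up purely for illustration and do not come from any actual competition entry.

```python
def average_predictions(model_probs):
    """Average per-class probabilities across several models.

    model_probs: one entry per model; each entry is a list of rows,
    and each row is a list of class probabilities for one sample.
    """
    n_models = len(model_probs)
    n_rows = len(model_probs[0])
    averaged = []
    for i in range(n_rows):
        # Collect row i from every model, then average column by column.
        rows = [probs[i] for probs in model_probs]
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


def predict_labels(avg_probs):
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Hypothetical outputs of three models for two samples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]]

avg = average_predictions([model_a, model_b, model_c])
labels = predict_labels(avg)
print(labels)
```

This is the simplest form of averaging, with every model weighted equally. Weighted averages or stacking are common variations, but they need a held-out set to tune, which is part of why ensembles take so much time in a competition.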

However, Kaggle competitions are a great way to hone your skills once you have a good grasp of the toolset that you plan to use, and the ups and downs on the leaderboard are quite a bit more exciting than assignments in a course. And it's certainly possible to get decent results with simpler methods such as regression.
Hopefully my next ‘status update’ will see me close to completion. :)
