Positive Trail: Machine learning endeavour - Mid-term review

Mid-term review

It’s been a few months since I posted my machine learning goals, so this seems like a good time to review what has been done so far and what is left to go.

I started out with a goal of 5 ML-related courses, and I am on track to do a little more than I had planned. These are the courses I have now targeted for completion (all on Coursera):

No	Course name	Completed	Certificate
1	Probabilistic Graphical Models	June 2012	link
2	Computing for Data Analysis	Dec 2012	link
3	Machine Learning	Dec 2012	link
4	Neural Networks for Machine Learning	Dec 2012	link
5	Data Analysis	in progress
6	Linear and Discrete Optimization	in progress
7	Computational Methods for Data Analysis	in progress	non-certificate course
8	Natural Language Processing	in progress

I also completed a large part of the “Design and Analysis of Algorithms I” course, but could not finish it because of overlapping course schedules.

My take on each of these courses is in another blog post.

I had another goal of entering 4 Kaggle competitions and finishing in the top 50%, which is proving a little harder to achieve. I have entered more than 2 competitions; my best placing was 55th out of 940 in one of them (the Digit Recognizer training competition), with no significant placings in the others.

In my view, competitions are probably not the best place to learn new skills, for a few reasons.

First, competition data usually needs quite a bit of pre-processing and exploratory analysis before you can even settle on an approach.

Second, most competition data is too large to fit in a typical PC’s memory, so it requires big-data and/or parallelization approaches such as Apache Mahout or CUDA, along with knowledge of the accompanying toolset. As a result, a significant part of your time goes into learning a toolset that is useful, but orthogonal to the goal of getting feedback on your machine learning approach.

Third, getting high on a competition leaderboard usually requires ensemble methods, in other words, training several models and then averaging their predictions.
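To make “model averaging” concrete, here is a minimal Python sketch, assuming the simplest form of ensembling: several already-trained models each predict the same validation rows, and their predictions are combined with a (weighted) mean. The helper names `average_predictions` and `mse` and all the numbers are hypothetical, made up for illustration; they are not from any actual competition.

```python
from typing import List, Optional, Sequence


def average_predictions(
    model_preds: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Hypothetical helper: combine several models' predictions for the
    same rows into one prediction per row via a (weighted) mean."""
    if weights is None:
        weights = [1.0] * len(model_preds)  # equal weights = plain average
    if len(weights) != len(model_preds):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    n_rows = len(model_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_preds)) / total
        for i in range(n_rows)
    ]


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Hypothetical validation targets and predictions from three different models.
targets = [1.0, 0.0, 1.0, 0.0]
model_a = [0.9, 0.3, 0.6, 0.2]
model_b = [0.7, 0.1, 0.8, 0.4]
model_c = [0.8, 0.2, 0.9, 0.1]

blend = average_predictions([model_a, model_b, model_c])
for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("blend", blend)]:
    print(name, round(mse(preds, targets), 4))
```

Because squared error is convex, an equally weighted blend can never have a higher MSE than the average of the individual models' MSEs. It is not guaranteed to beat the best single model, though (in these toy numbers `model_c` alone does better), which is why blend weights are usually tuned on held-out data.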

However, Kaggle competitions are a great way to hone your skills once you have a good grasp of the toolset you plan to use, and the ups and downs on the leaderboard are quite a bit more exciting than course assignments. It is also certainly possible to get decent results with simpler methods such as regression.
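As an illustration of how small a regression baseline can be, here is a minimal sketch of ordinary least-squares linear regression with a single feature, using the closed-form slope and intercept. It assumes “regression” means plain linear regression (a classification task such as the digit recognizer would more typically use logistic regression), and `fit_least_squares` is a hypothetical helper, not a library API.

```python
from statistics import mean
from typing import Callable, Sequence


def fit_least_squares(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical helper: fit y ~ a + b*x by ordinary least squares
    (closed form, one feature) and return a prediction function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # b = Sxy / Sxx
    intercept = y_bar - slope * x_bar    # a = y_bar - b * x_bar
    return lambda x: intercept + slope * x


# Toy data lying exactly on y = 1 + 2x.
predict = fit_least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(5))  # 11.0
```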

Hopefully my next ‘status update’ will see me close to completion. :)

Saturday, 2 March 2013