It’s been a few months since I posted my machine learning goals, and this seems an appropriate time to review what’s been done so far and what’s left to go.
I had started out with a goal of 5 ML related courses, and I am on track to do a little more than I had planned. These are the courses that I've now targeted for completion (all @ Coursera)
No
|
Course name
|
Completed
|
Certificate
|
1
|
Probabilistic Graphical Models
|
June 2012
| |
2
|
Computing for Data Analysis
|
Dec 2012
| |
3
|
Machine Learning
|
Dec 2012
| |
4
|
Neural Networks for machine learning
|
Dec 2012
| |
5
|
Data Analysis
|
in progress
| |
6
|
Linear and Discrete Optimization
|
in progress
| |
7
|
Computational Methods for Data Analysis
|
in progress
|
non-certificate course
|
8
|
Natural Language Processing
|
in progress
|
I also took a large part of the “Design and Analysis of Algorithms I” course which I could not complete due to overlapping course schedules.
My take on each of these courses is in another blog post.
I had another goal of entering 4 kaggle competitions and finishing in the top 50%, which appears to be a little more difficult to achieve. I entered more than 2 competitions, and with my best placing being 55/940 in one (digit recognizer training) competition, with no significant placings in others.
My perspective is that competitions are probably not the best place to learn skills for a few reasons.
One, competition data usually needs quite a bit of pre-processing and exploratory analysis to figure out the approach.
Secondly, most competition data is not small enough to fit in a typical PC’s memory, and thus will require some big data and/or parallelization approaches such as Apache Mahout/CUDA and the knowledge of the accompanying toolset. Thus a significant part of your time would be spent in learning a toolset which is useful, but orthogonal to the goal of getting feedback on your machine learning approach.
Third, to get high on the leaderboard for a competition usually requires ensemble methods, or in other words, trying out several models and then using model averaging.
However, Kaggle competitions are a great way to hone your skills once you have a good grasp of the toolset that you plan to use, and the ups and downs on the leaderboard are quite a bit more exciting than assignments in a course. And its certainly possible to get decent results with simpler methods such as regression.
Hopefully my next ‘status update’ would see me close to completion. :)
No comments:
Post a Comment