Saturday, 23 February 2013

Kaggle digit recognizer using SVM

Originally posted on Posterous (Oct 7, 2012), which is shutting down.
This is the second post in the series, solving Kaggle digit recognition with the algorithms taught in the Coursera ML course.
For the second try, I used a Support Vector Machine (SVM). SVMs were introduced in their modern soft-margin form in 1995 and quickly became a popular tool for both classification and regression. Although multi-class classification with SVMs is not as straightforward as with logistic regression, the prominent toolkits (I used LIBSVM) support it.
I first did some pre-processing to scale the digit image pixel values from the 0-255 range to 0-1, as is recommended for SVMs. Some Clojure snippets that do the same are attached.
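A minimal Python sketch of the same scaling step, for illustration only. It assumes each training row is a label followed by raw pixel values in 0-255, and it writes LIBSVM's sparse text format (`label index:value ...`, with 1-based indices). The function name `row_to_libsvm` is hypothetical, not part of LIBSVM.

```python
def row_to_libsvm(label, pixels):
    """Format one image as a LIBSVM line, scaling pixels from 0-255 to 0-1."""
    features = []
    # LIBSVM feature indices are 1-based and must be in ascending order.
    for index, value in enumerate(pixels, start=1):
        if value != 0:  # zero-valued features may be omitted in the sparse format
            features.append("%d:%.6g" % (index, value / 255.0))
    return " ".join([str(label)] + features)


# Tiny made-up example: a "5" with four pixels instead of 784.
print(row_to_libsvm(5, [0, 255, 0, 51]))
```

Skipping zero pixels keeps the file small, since most of each digit image is blank background.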

I then used LIBSVM to find the best values of the constants (C and gamma) used by the SVM. The only difficulty I encountered was that with the "-v" (cross-validation) option, LIBSVM reports only the cross-validation accuracy and does not generate a model file. I had to resort to the Python interface to train and save the model file, and then ran svm-predict to get the predictions.
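The search and the retraining step might look roughly like the sketch below. It is not the exact code I ran: the helper names are made up for illustration, the power-of-two grid is the coarse range suggested in the LIBSVM guide, and the two LIBSVM-backed functions assume the LIBSVM Python bindings are installed (older source distributions import from `svmutil` rather than `libsvm.svmutil`).

```python
import itertools


def parameter_grid(c_exponents, gamma_exponents):
    """Return (C, gamma) pairs on a power-of-two grid."""
    return [(2.0 ** c, 2.0 ** g)
            for c, g in itertools.product(c_exponents, gamma_exponents)]


def best_parameters(grid, cv_accuracy):
    """Pick the (C, gamma) pair with the highest cross-validation accuracy."""
    return max(grid, key=lambda pair: cv_accuracy(*pair))


def libsvm_cv_accuracy(y, x, folds=5):
    """Build a scoring function backed by LIBSVM cross-validation."""
    from libsvm.svmutil import svm_train  # assumption: bindings are installed

    def score(c, gamma):
        # With -v, svm_train returns the accuracy and produces no model.
        return svm_train(y, x, "-q -v %d -c %g -g %g" % (folds, c, gamma))
    return score


def train_and_save(y, x, c, gamma, model_path):
    """Retrain without -v on the chosen parameters and write the model file."""
    from libsvm.svmutil import svm_train, svm_save_model
    model = svm_train(y, x, "-q -c %g -g %g" % (c, gamma))
    svm_save_model(model_path, model)


# Coarse grid: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3, in steps of 2^2.
grid = parameter_grid(range(-5, 16, 2), range(-15, 4, 2))
print(len(grid), "parameter pairs to try")
```

With labels `y` and features `x` loaded, `best_parameters(grid, libsvm_cv_accuracy(y, x))` would give the pair to pass to `train_and_save`. The resulting model file can then be handed to svm-predict.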
Note that the test file also needs to be scaled the same way (from 0-255 to 0-1), and each row of the test file needs a dummy value for the label 'y' (which should be between 0 and 9).
With a vanilla SVM, the Kaggle score was about 97% :)
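As a rough illustration of preparing the test file, the sketch below assumes the test CSV has a header row and pixel columns only (no label column). The function name `convert_test_csv` and the choice of 0 as the dummy label are mine. svm-predict only uses that label to report an accuracy figure, which is meaningless here and can be ignored.

```python
import csv
import io


def convert_test_csv(csv_file, out_file, dummy_label=0):
    """Write pixel-only CSV rows in LIBSVM format, each with a dummy label."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row of column names
    count = 0
    for row in reader:
        # Same scaling as the training data: 0-255 becomes 0-1.
        features = ["%d:%.6g" % (i, int(v) / 255.0)
                    for i, v in enumerate(row, start=1) if int(v) != 0]
        out_file.write(" ".join([str(dummy_label)] + features) + "\n")
        count += 1
    return count


# Made-up three-pixel example standing in for the real 784-pixel rows.
sample = io.StringIO("pixel0,pixel1,pixel2\n0,255,51\n0,0,0\n")
out = io.StringIO()
convert_test_csv(sample, out)
print(out.getvalue())
```

For the real data, the two `io.StringIO` objects would be replaced by open file handles for the test CSV and the output file.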
