Monday, 8 July 2013

My attempt at the Microsoft Author-Paper Identification Challenge at Kaggle


Introduction


The Microsoft Author-Paper Identification challenge is a machine learning competition hosted at Kaggle.com, and this post documents my attempts at this problem.

The Data


The data consists of a training and validation set that were released early in the competition. The training data consisted of the following comma separated files:

  • Paper.csv, where each record described data about a Paper, such as its id, publication year, journal or conference.
  • Author.csv, where each record described an Author, his Affiliation, etc
  • Journal.csv and Conference.csv, which described each journal or conference
  • PaperAuthor.csv, a noisy dataset that listed Authors and Papers ascribed to them.
The training file consisted of an AuthorId, followed by a list of PaperIds that were correctly, and incorrectly assigned to that author. The submission format was that each line should contain an author id, followed by a list of paper ids, in decreasing order of probability that the paper was authored by the given author.
For e.g. Author1, P1 P2 P3
Where the probability that Author1 wrote P1 is highest, and P3 being the lowest.

Analysis


Although the evaluation metric used was Mean Average Precision, this problem is a classification problem. Given an Author id-Paper id tuple, predict the probability that the Paper was authored by the said author. 

Provided along with the data was the code for a simple model that generated a 5 features, and then used a RandomForest classifier to classify each tuple.

Data Cleaning


I wasn't too comfortable with using sql to extract features, hence I loaded the csv files into mongodb using mongo import. The data cleaning stage took a while, I changed the contents to lowercase, removed newlines within some dataset entries and removed English stop words.

Approach:


The initial 5 features provided looked at the commonsense features such as number of coauthors who published in the same journal and so on. While treating this as a classification problem, the out-of-bag error rate (using RandomForests) was already around 95%.
I noticed at this point that to improve scores, I would have to start using features that depended on text fields such as AuthorName, PaperName, Affiliation (e.g. University).

In some cases the Author name was just marked wrong. For e.g. if 'Andrew B' was marked as the Author name for 99 papers and 'Kerning A' was the Author name for the 100th paper, this was obviously a mistake. I calculated the 'Jaccard distance' on the following set of features:

  • Author names: Intersection of names in the author set with the current author name.
  • Co author names: Intersection of coauthor names similar with current co-author.
  • Affiliation: Intersection of this paper's affiliation with the current author's affiliations.
  • Title: similarly computed the Jaccard similarity.
Adding all of these features improved my validation set submission score to about 95%. Despite toying around, I didn't make much improvements on the score after this.

I also tried to use a Naive Bayes classifier to classify the 'bag of title words', into 2 classes, 'current author' or 'other'. Unfortunately the projected time it would take seemed like 4 continuous days of CPU time, and with Bangalore's power failures, it was unlikely it would ever finish, so I abandoned the approach.

My last score improvement came unexpectedly, I noticed that several author-paper entries that were 'correctly marked', had multiple entries in the dataset. I was surprised at this rather obvious mistake (or goofup, however you want to look at it), but adding this feature (number of paper-author counts) improved the classification rate by a whole percent. 



Learning and Notes:



  • This competition required creating hand-coded features from the multiple files that comprised of the dataset. The ability to quickly try out a feature and see if it improved scores was critical. I spent a lot of time on implementing a Naive Bayes classifier from scratch, without testing out the classification accuracy on a small part of the dataset, and finally had to abandon this feature due to lack of CPU hours. 
  • This was probably one competition where Hadoop/MapReduce skills wouldn't hurt, since the datasets were large enough that parallelization would have helped. 
  • The time required for training and prediction using RandomForests was really small, probably of the order of 10 minutes or less. This was very different from other competitions (where I used deep neural nets, for e.g.) where I would have to watch plots of errors of training and validation sets for hours on end. 
  • Observing quirks of the data is important, as the duplicate paper-author ids demonstrated. 
  • I didn't visualize or graph anything in this competition, and that's something I should improve upon, although it wasn't obvious what it might have been useful for, in this case. 


This was the first competition where I got into the top 25%, and my overall Kaggle ranking was inside the top 1%.
   

Wednesday, 12 June 2013

Machine learning endeavour: At the finish line



So I'm finally at a point where I can say I'm done with my Machine Learning endeavour that I started about 19 months ago. Of course, learning is a lifelong process, but getting to the finish line of what seemed to be a far-fetched goal sometime back, feels incredibly sweet.

Some quick statistics:
  • Number of courses completed: 6
  • Number of courses partially completed: 3
  • Number of programming assignments: 32
    • Natural Language Processing: 3
    • Machine Learning: 8
    • Probabilistic Graphical models :9
    • Neural Networks for Machine Learning: 4
    • Computing for Data Analysis: 2
    • Data Analysis: 2
    • Design and Analysis of Algorithms: 4
  • Number of Kaggle competitions entered: 4
    • Notable rankings:
      • 72/1158 in the Digit Recognition competition. 
      • 125/557 in the KDD Cup 2013-Author Paper Identification Challenge
      • Ranked among the top 2% of all Kagglers. 

This is the list of courses that I completed:

No
Course name
Completed
1
Probabilistic Graphical Models: Advanced track with distinction
June 2012
2
Dec 2012
3
Dec 2012
4
Dec 2012
5
Mar 2013
6
Natural Language Processing with distinction
May 2013


Thoughts on getting here? In no particular order:


  • Learning is incredibly fun. And self-motivated learning is much more fun than college. Motivation in college education is usually built around too much stick and too little carrot. 
  • Although online learning might seem very disconnected, the online student community is tight-knit and extremely helpful. I would not have completed certain assignments in PGM and NLP if not for other students who posted helpful tips. 
  • Support from family and friends matter a lot. 
  • Finishing a course is just one part of the story. Applying the learning in a real-world context is a very different challenge.
Blogs/books that I found helpful:
Other self-learners:



Sunday, 14 April 2013

Data Analysis @ Coursera - a review


Here is my review of the Data Analysis course at Coursera which I recently completed.

There are several "data X" and "big data Y" kind of courses nowadays, and its quite difficult to know up front if the course you signed up for is the course you need. I'll try to outline what this particular course is, and what you can expect from it.

First off, this is a Data Analysis course in R. Knowing R is a prerequisite and if you come to this course without any knowledge of R expecting to pick up the basics along the way, it will be quite challenging. Completing Prof. Roger Peng's R course is the ideal way to ease into the material for this course.

This course teaches you statistics while trying to make sense of the data. There are innumerable data sets that are explained, and playing with data is the ideal tool to practically understand those statistics concepts. The initial part of the course tends to repeat R's strengths in graphing and data cleaning and munging. The initial part of the course goes at a relaxed pace, and somewhere in the middle the speed picks up and the last few weeks become quite hectic. The second part of the course is pretty much a machine learning course, with clustering and classification algorithms being explained in quick succession. If you have been through Prof. Andrew Ng's machine learning course, the difference here is that very little mathematics involved. The classification methods used (such as random forests, which I had never used before) are explained from a "how to use it" point of view and the math basics are not covered.

Since it does not try to get into the mathematical basis of every method, it covers much more ground, such as ensemble learning and ways to do model averaging. Although the knowledge of math is certainly useful, this course showed that it is possible to do predictive modelling quite effectively simply by knowing the methods and learning how to apply them. It is therefore a practitioner's course in Data Analysis.

The difference between this course and the Machine learning course, is that this one is much more exploratory. Often in machine learning problems, the goal is often to just get to the lowest mis-classification rate on the validation/test set. Here, the emphasis is much more on interpreting and explaining the data the data (usually graphically) and understanding how a few features (especially if the dataset has relatively many dimensions) are responsible for most of the variance. I often struggled with this, because in a classification problem, it was relatively easy to do dimensionality reduction (using Principal Component Analysis) and then use multiple classifiers such as SVM/Neural Nets/Random forests, while it was relatively hard to explain feature variance on data that has been processed through PCA.

The assignments in this course also reflect its exploratory nature and are peer assessed using a rubric. Both assignments requires one to write a data analysis explaining the motivations, the methods to clean the data, methods used to classify or cluster, the statistical tests used and so on. So sticking to what the rubric demands is quite important, and straying from it, even though your write-up is excellent, leads to lower scores.

All said and done, this is an excellent course to improve your knowledge of data analysis, statistics and machine learning.

Here are my suggestions to make this course even better:

  • More assignments. The assignments are quite big, and they take a bit of time. I would prefer shorter assignment that gives one the opportunity to play with more data sets.
  • The course probably tries to cover too much material. I was quite confused by the various tests for statistical significance which were explained in bits and pieces in various parts of the course, and only later did I develop some understanding of what should be used in which case.  If this course included more material, it can be certainly split into a basic and advanced data analysis course. 
  • I would love to have this course offered in a language agnostic way. For simple one and two liners, R is bearable. Writing code longer than that makes one itchy for a 'real' language (I'm working my way through Data analysis in Clojure, hopefully with the knowledge gained there I could finally say goodbye to R). 
  • The course should probably include some material on handling analysis in big data. R holds its data  largely in-memory, and datasets that push this limit will make any analysis difficult. I think Clojure/Incanter is a good combination of an excellent language married to a robust toolkit, but I'm yet to run classification on large data sets using that toolkit, so that calls for more experimentation (and a blog post too).

Wednesday, 13 March 2013

Kaggle event recommendation competition


 


This post documents my attempts on the Kaggle event recommendation competition.

This competition was the first non-training competition that I entered, and it was a good learning experience. The competition's goal was to classify a set of events that are presented to a user, sorted by how likely that particular user will click "Yes" to attending that event. 

The data consists of several 'parts', such as:
1.      A given user's recommendation of 'yes', or 'no' for an event.
2.      Some data about the user such as location, gender, age etc
3.      Some data about the event, such as the time and place.
4.      A list of friends for a set of users
5.      A list of attendees for a set of events.
My initial approach was to treat this as a classification problem, when given a [user,event] tuple, to assign a score of 1 if the user is likely to click on Yes for that event. I started out by hand-coding features such as
1.      percentage of friends who marked yes, no, maybe to that event.
2.      Parsing the location fields of the event and user to determine city, state and country and a set of binary features for the user being in the country, state and city as the event.
3.      The event popularity.
4.      Time based features, such as whether a user is old enough to visit an event, and the difference in hours between the user seeing the event, and the event starting.

Approaches


I initially used the SVM classifier from libSVM but I ran into a couple of issues with libsvm. One, the scaling feature of libsvm didn't work very well (or rather the results were worse than when I scaled them outside libsvm), and second, a lot of times libsvm would predict the same class forall the tuples in the test set.
I then used a logistic regression classifier with a tiny (0.005) regularization value, which gave a classification rate of about 74%. The submission to kaggle resulted in a 47% success rate, which was only better than random values, but worse than the result sorted by popularity of the event.



At first sight, this competition resembled a typical collaborative filtering(CF) problem where each user gives a 'rating' to an event of 1 or 0. The only catch was, that in addition to the user-event matrix, there are additional features for both users and event that could not be directly used in CF solvers. 

My second approach was to solve the CF problem. I had a choice of using Restricted Boltzmann Machines (described quite nicely in the Neural networks course @ Coursera), or using the normal matrix factorization approach. I used the excellent Graphlib package which has a toolkit for solving CF problems. The input is the user-event sparse matrix with 1's where ever a user marked yes for an event. I used the ALS solver (which tackles the problem of finding low rank approximation of the recommendation matrix) in the CF toolkit, which gave an error of 7% after some experimentation with step sizes. My submission with the results from the ALS solver again gave a similar score of 45% on the kaggle leader board.

I had then given up on the comp, when I got an idea of averaging the 2 models used. On averaging the score from both the models with equal weights, the score on the leader board exceeded the popularity measure by a small bit, about 51%. 

What I learnt from this competition


Apart from the scoring and algorithms bit, I realized that I needed a whole new set of skills for such data analysis. The most important was the tools to clean and sort data. Since I had some experience with Clojure, I decided to put it to test. Initially I tried to load the data in-memory, which as expected, resulted in OutofMemoryErrors despite having 16G of RAM. Fortunately, I discovered Incanter, which gives Clojure capabilities similar to R. (As an aside, if you're looking at playing with R, do check out the Data Analysis course at Coursera by Jeff Leek, the capabilities of R described in the course are a revelation for the R newbie). Incanter too had similar memory issues, but thanks to its excellent integration with mongodb, I was able to load the datasets into mongodb and dissect the data as required. 

The second facet which was new to me was the dirty-ness and variation among data. There were multiple sets of users, about 3k+ users in test and train datasets, about 38k users who had features such as age and gender described, and a much large set of users (about 1 million+ if I recall correctly) who were friends of the earlier set of users, but had nothing else known about them. It was a similar situation with the events, where a small set were well described and a large set had no features associated. 

The third notable learning was the size of the data. While the dataset would not qualify as 'big data', while attempting to run Principal Component Analysis on the user-friend matrix, I ran into memory issues with Octave even on sparse matrices. This matrix had about 38k rows (only the well described users), and had about 210k entries. I was unable to run "svds" on this sparse matrix as I was unable to scale the features of the covariance matrix to have a zero mean and standard deviation of 1. Perhaps using Graphlab to get the matrix  decomposition would have been a better idea.

At the end of the day, these submissions ranked in the middle of the leaderboard. Hoping to do better next time.

Here's my Clojure and Incanter sources on Github.

Edit: Here's another take on the same competition.

Saturday, 2 March 2013

Reviews of some Coursera courses that I completed

A long overdue post on some Coursera courses that I completed.

Probabilistic Graphical Models by Prof Koller


Probabilistic Graphical Models is probably the most difficult course that I did. I had ventured into this course with a fair background of linear algebra and probability, but I still found this course extremely demanding.

Here are some reasons why:
     the course material is compressed and the instructor (Prof. Koller) moves through the material rather quickly.
     This course has a few prerequisites that aren’t mentioned as, well, prerequisites: basic machine learning, felicity with Octave/Matlab, the ability to convert mathematical notation to working Matlab code.
     I found the programming assignments to be quite tough, or put differently, more difficult than any other Coursera course. If not for the generous help of other students in the forum, (a view echoed by many other students), it would be impossible to get through the assignments. In my case, I had to sometimes grind through the assignment to finish it in time, sometimes missing the intuition behind a method that was used in the assignment.
     The time estimate of about 15 hours per week (spent on the course) is quite low in my opinion, it would be closer to 20/25 hours per week.

However, this is still a must-do course for students of ML. The material, as well as the instructor are excellent, and the programming exercises are beautifully designed, in that they give you a real world problem to solve in almost all the assignments which leads to much better insights. Having completed the course, I keep going back to the assignments and course material to refresh my memory every once in a few weeks.

Course difficulty (on  a scale of 5): 5/5
Time required per week: 20 hours.
Prerequisites: Probability, Octave, Basic ML
Fun quotient: 4/5



Machine Learning by Prof. Andrew Ng


One cannot ask for a better introduction to Machine learning. This course has a gentle pace, no prerequisites, and gives a 10000 ft view of the many aspects of Machine learning in general.
Prof Ng is a very gentle instructor, and he tries to break down the complicated math and explain it in a very understanding manner.
The assignments are probably too easy as they just ask you to fill up a few lines in Octave files, but that does not take away the usefulness of the course. It shouldn’t be hard to complete this course in parallel with a couple of others

Course difficulty (on  a scale of 5): 2.5/5
Time required per week: 6 hours.
Prerequisites: none
Fun quotient: 4/5



Neural Networks for Machine Learning by Prof. Hinton


Prof. Hinton is one of the leading lights of Neural Networks, an area of ML research that had been relegated to the sidelines in the 80s and 90s but is now in the limelight thanks to recent advances in the field.

The basic ML course does dip its toes in the neural networks pool, but this course (but naturally) goes much deeper. The material as well as the instructor are excellent, and the course lectures are punctuated by Prof. Hinton’s deadpan humour. This course requires a little mathematical background (such as a calculating derivatives for backpropagation) but the math is explained quite nicely.
My only suggestion in this course would be to have more programming assignments (there were only 4 in the first edition), however, the quizzes do require some programming to answer questions.

Course difficulty (on  a scale of 5): 3.5/5
Time required per week: 10 hours.
Prerequisites: Bits of calculus, octave,
Fun quotient: 5/5







Machine learning endeavour- Mid term review

Mid term review
It’s been a few months since I posted my machine learning goals, and this seems an appropriate time to review what’s been done so far and what’s left to go.
I had started out with a goal of 5 ML related courses, and I am on track to do a little more than I had planned. These are the courses that I've now targeted for completion (all @ Coursera)
No
Course name
Completed
Certificate
1
Probabilistic Graphical Models
June 2012
2
Computing for Data Analysis
Dec 2012
3
Machine Learning
Dec 2012
4
Neural Networks for machine learning
Dec 2012
5
Data Analysis
in progress
6
Linear and Discrete Optimization
in progress
7
Computational Methods for Data Analysis
in progress
non-certificate course
8
Natural Language Processing
in progress
I also took a large part of the “Design and Analysis of Algorithms I” course which I could not complete due to overlapping course schedules.
My take on each of these courses is in another blog post.  
I had another goal of entering 4 kaggle competitions and finishing in the top 50%, which appears to be a little more difficult to achieve. I entered more than 2 competitions, and with my best placing being 55/940 in one (digit recognizer training) competition, with no significant placings in others.
My perspective is that competitions are probably not the best place to learn skills for a few reasons.
One, competition data usually needs quite a bit of pre-processing and exploratory analysis to figure out the approach.
Secondly, most competition data is not small enough to fit in a typical PC’s memory, and thus will require some big data and/or parallelization approaches such as Apache Mahout/CUDA and the knowledge of the accompanying toolset. Thus a significant part of your time would be spent in learning a toolset which is useful, but orthogonal to the goal of getting feedback on your machine learning approach.
Third, to get high on the leaderboard for a competition usually requires ensemble methods, or in other words, trying out several models and then using model averaging.

However, Kaggle competitions are a great way to hone your skills once you have a good grasp of the toolset that you plan to use, and the ups and downs on the leaderboard are quite a bit more exciting than assignments in a course. And its certainly possible to get decent results with simpler methods such as regression.
Hopefully my next ‘status update’ would see me close to completion. :)

Saturday, 23 February 2013

Kaggle digit recognizer using SVM







Originally posted on Posterous(Oct 7 2012) which is shutting down
This is the second post in the series, solving Kaggle digit recognition with the algorithms taught in the Coursera ML course.
For the second try, Support Vector Machine was the algorithm used. SVMs were invented (or discovered) in 1995 and quickly became the tool of choice for both regression and classification. Although multi-class classification in SVM is not as straightforward as in logistic regression, the prominent tool kits (LIBSVM was the one I used) have support for it.
I first did some pre-processing to scale the digit image values from 0-255 to 0 to 1, which is recommended by SVM. Some clojure snippets that do the same are attached.
Loading ....


I then used libsvm to first find the optimum values of the constants (C & Gamma) used in SVM. The only difficulty I encountered was that when using the
"-v" option, libsvm would not generate the model file. I had to resort to using the python interface to save the model file and then run svm-predict to get the predictions.
Note that the test file also needs to be scaled to the same values (from 0-255 to 0-1), and the test file needs to have a dummy value of 'y' (which should be between 0 and 9).
With vanilla SVM the kaggle scores were at 97% :)

Machine learning- using Kaggle Digit Recognizer as a test bed for ML algorithms






Originally posted on Posterous (Sept 24 2012), which is shutting down

The Machine Learning course by Prof. Andrew Ng is an excellent introduction to machine learning(ML henceforth). While doing the course, I came across Kaggle's handwriting digit recognition problem. The digit recognition problem is a Kaggle 'non-competition' that works as a test bed for ML.
What follows is a series of blog posts that describes how the algorithms taught in the Coursera ML course can be applied to the digit recognition problem.
One of the algorithms taught in the ML course is logistic regression. Logistic regression is an algorithm used in classification problems, such as classifying email as spam and non-spam. Digit recognition is also a classification problem. While classifying email as spam and non-spam requires 2 classes, digit recognition requires an image of a digit to be classified as any single digit numbers from  0 to 9, or into 10 classes. 
The available data:
Kaggle provides 28000 images of training data in a csv file. Each image is represented by a 28x28 pixel matrix. For logistic regression, this can be thought of as a feature vector with 784 features. The value of lambda (a parameter to prevent overfitting) has been fixed at 0.1. 
The algorithm:
The algorithm learns the parameters (784 plus one) using the data on the training set, and then computes the class of the digit (or rather, makes a prediction on the digit) for each image in the test set. Since multi-class logistic regression is used, the algorithm will output 10 scores per input image, one for each digit. The digit with the highest score is taken to be output.   
The output:
The classification accuracy using multi-class regularized logistic regression came to approximately 91%. Kaggle mentions that only a part of the test set is evaluated, so when the entire test set is evaluated, the score could be better or worse.
Next steps:
  • Improve the value of the lambda parameter to get a better fit
  • Use a different algorithm such as SVM or Artificial Neural Networks
The octave source code can be found in the Github repo.

Hacking my education- Machine learning Endeavour



Originally posted on posterous (Oct 9 2012) which is shutting down.
I have had an interest in machine learning since a few years. In spite of the numerous resources available online, I had never really gotten to a system of organized study due to various reasons.
With the advent of Coursera and other MOOCs, it has become relatively easy to gain knowledge in a particular field of study, with all the advantages of a classroom-like environment (great teachers, a well-rounded and structured syllabus and motivated fellow students) without the attendant disadvantages (cost, distance and time).
When I started on this endeavour in Dec 2011, I did not think I would get this far. Having left college 12 years ago, it was quite an effort to get starting studying ML, and I was told that my getting into machine learning would not be easy because of the math prerequisites. However, close to 10 months down the line, I realize that nothing is impossible.
So this is my attempt to “enlist the goals” in my machine learning endeavour and chart my progress so far.  I’ve checked the syllabi of a few master’s courses in Machine Learning, and here are some of the courses that are required to be taken.  
I plan to finish at least 5 courses related to ML before May 31st 2013, from the ones in this list:
  1. Probabilistic  Graphical Models @ Coursera
  2. Machine Learning @ Coursera
  3. Neural Networks @ Coursera
  4. Computing for Data Analysis (a course on the R language) @ Coursera
  5. Image and Video processing @ Coursera
  6. Natural Language processing @ Coursera
  7. Optimization @ Coursera
  8. Statistics 101 @ Udacity.
Real world machine learning is more than just taking courses, it should involve working on some projects. Since I’m mostly self studying, it would be difficult to find a proctored project to work on. However, there’s Kaggle(a site that hosts Machine learning competitions)  to the rescue. My goal is to compete and finish at least 4 competitions in the top 50% of the field.
My progress so far:
  1. A course in Linear Algebra @ GuruPrevails-completed Jan 2012
  2. Probability & Statistics @ GuruPrevails-completed Mar 2012.
  3. Probabilistic  Graphical Models @ Coursera – completed successfully in June 2012
  4. Machine Learning @ Coursera – in progress, ends Oct 2012.
  5. Neural Networks @ Coursera – in progress, ends Dec 2012
  6. Computing for Data Analysis (a course on the R language) @ Coursera – in progress, ends Oct 2012.
  7. Algorithms – Design and Analysis @ Coursera – didn’t complete beyond 4thassignment.
  8. Kaggle digit recognizer “learning” competition: currently 23rd/400 teams (Oct 2012).
Along the way, I’ve realized that my learning methods are sub-optimal, there are other ways to learn than just ‘being motivated and working hard’. I signed up for the “learn more study less” course by Scott H Young, and although it is too early to see any results, the course material is certainly worth it.
I’m off to a much needed break to Roopkund (it’s been 4 years since I hiked in the Himalayas L) , and its back to the Machine Learning Endeavour after that!

Clojure zippers - 2


Continuation of the earlier post on zippers.

So what if you want a tree with content in the parent nodes? Something like this:



We'll have to use records or maps as each node, and then use functions (usually keys) to get the content from the parent. Alex has a blog post on using records, here I'll stick with using a map
Loading ....

Note that "makenode" simply has to assoc(iate) the provided "children" vector to the existing map to add children.
On a side note, clojure.zip source is an excellent introduction on how to use metadata.


Paths to the root node
Although clojure.zip has the 'path' function that gives the list of nodes traversed from the root to the current node, it contains the whole node rather than just the content of :self. That is easily remedied thus:
Loading ....



Adding children to all leaf nodes

Adding children to all the leaf nodes is a useful feature in programming a board game, where all the prospective moves are stored in a tree, and we wish to look 4/5/6/n moves ahead. In such cases, we can grow the leaf node (find out the next possible set of moves).

Loading ....
We use a different function called add-children to create a vector of children to attach to each leaf node

clojure zippers


I bumped into Clojure's zipper data structure while trying to implement aMinimax algorithm for a game. There are some excellent resources on zippers, Brian's is a good introduction
Why should you use zippers? That's a good question, when you already have tree-seq in clojure/core.
Tree-seq is good for navigating a tree, printing out its nodes. Zipper provide the ability to edit a tree, which tree-seq does not.
A couple of things stumped me when I first encountered zippers. 
One of them is that a zipper declared with vector-zip doesn't have any content in a parent node. In other words, all the number elements of the following vector are leaf nodes.

Loading ....


Brian calls the head of the tree the "tomorrow tree". Actually, all the parent elements (or those that have branches), are vectors (or the tomorrow tree) and do not contain any "content".  And here's how you could visualize the tree





Tree editing


Since clojure is functional, each tree edit returns a new (copy of) tree. So the basic idea is, do some edits and call zip/root to get the root of the tree. Note that zip/root calls zip/node on the tree, so you need to create a zipper again if you want to traverse the edited tree.
Loading ....