Saturday, June 27, 2009

10% already?!

I have not made much progress in the past four months. As my "new" ideas no longer made a noticeable contribution to the final result, I stopped working on improvements and instead tried to develop some theory about the methods used in the Netflix competition. It really bugs me to fiddle with hundreds of predictors and billions of parameters without understanding exactly what I am doing and why.

Thinking surely is more enjoyable than computing the same unchanging RMSEs. And I thought I could do that for a long time; after all, the 10% seemed to reside in the distant future. It's kinda funny in retrospect, because part of my theory was to measure the signal-to-noise ratio, thinking it might show that the data is simply too noisy to achieve a 10% improvement. The team "BellKor's Pragmatic Chaos" just proved how wrong I was. It's like watching a marathon runner finish the race with a 15-meter triple jump. Amazing! Kudos to them!

It will be interesting to watch what happens in the next 30 days.

Monday, February 23, 2009

Dace in Netflix

Today is the last day of my paternity leave. Before going back to work, I guess it's time to log the fun I have had with the Netflix challenge.

I began to study the problem when I got a break between projects in late October 2008. After reading papers from the leading teams (many thanks to BellKor and Gravity for their papers), I realized it was such an interesting and well-defined problem. Ever since my college days, I have wanted to work on artificial intelligence, but I ended up living with VLSI designs every day. Now here is a good chance to scratch that itch and learn from experts in this area. Besides, the remote chance of one million dollars surely does not hurt.

This actually became a perfect task after my son was born. The competition is mostly about algorithms, yet it needs a significant amount of run time to produce results. So I can do the thinking while holding the baby in my arms and mumbling lullabies. I find a break to code something, kick off a run, and only need to come back hours later to check the results. And when I am watching the baby gradually learn to interact with the world, I can't help thinking about where machine learning can lead us.

There is a big difference though. While my son keeps growing every day, my progress on the Netflix Prize seems to have reached an asymptotic stop. As of today, my score is a 9.52% improvement. Starting tomorrow, I will have less time to spend on the problem, but I will still work on it. There are still a few ideas to explore and it's fun!