How to win a free trip to San Francisco

By: CriteoLabs / 18 Jul 2014

One possible way was to qualify to the TopCoder Open onsite final, San Francisco, Nov 2014. Let me explain how I managed to take one of the last four places of the last qualification Marathon Match round. TopCoder Marathon Matches are complex algorithmic challenges to be solved in one or two weeks (hopefully the onsite final is only 12 hours long).

The subject

In this last qualification round, we had to create a collage of 200 images to reconstruct a given target image. The source and target images were randomly taken from Wikipedia. We were allowed to downscale the images but not to rotate them. The goal was to find a non-overlapping collage that minimizes the RMSE with a target image. See the full problem statement.

The problem was really fun but unfortunately the outcome wasn’t really visually impressive. Indeed, the RMSE error we had to optimize mainly penalizes luminosity error and is not really suited to capture human eye perception. Here are two illustrations of the output of my final solution after two weeks of competition:

High contrast with bad RMSE (47) but decent visual outcome

High contrast with bad RMSE (47) but decent visual outcome


Low contrast with better RMSE (35) but poor visual outcome

Low contrast with better RMSE (35) but poor visual outcome

The approach

We had 10 seconds per test-case to construct the collage given a target image and 200 smaller source images. My approach was the following:

  • Find a good initial position
  • Optimize the solution using some kind of simulated annealing
  • Have a highly optimized error and re-sampling computation.

Finding a good initial image layout

We first find a good initial image layout of sub-images, then assign the 200 images to minize error on this fixed layout

A way to find a good layout is to do a recursive cut that selects at each step which sub-image to split into two rectangles so that the lowest in-rectangle variance is achieved (try detect dark and light sub-images).

But as we knew all the 1000 possible target images, we could actually precompute the layout of sub-images for all these images. We simply kept the layout of the best solution as an initial step for other solutions using the same target. The encoding took 2 bytes per sub-image and I managed to make it fit in the 256KB source code limit we had.

Once we had a layout of sub-images, we used a greedy approach that assigned the best remaining image for each position, starting with the position that had the largest error gap between the best image and the second best. The idea was that, if an image is really better for a given place than all the others, then we need to pick it before it is used in another place.

Other competitors optimally solved this assignment problem using the Hungarian Algorithm which I didn’t try during the contest.

Local improvements of the solution
Then we optimize the position using a simulated annealing like approach. The following elementary steps are allowed:

  • Swap two images (including swapping with an already allocated image)
  • Merge two images into one (if they are sharing an edge)
  • Split an image into two sub images
  • Move the edge of an image with all the other images touching the edge (and recursively).

If no move improves the RMSE, we make a random swap. Note that we don’t evaluate all splits or edge moves but a random subset of them. And we of course keep the best solution seen so far.

Fast computation of resampling and RMSE
The key point was to find a good solution in 10s was to have a highly optimized computation of image RMSE and re-sampling. The following tricks were used:

  • Use cache of already computed image squared errors. Note that STL unordered_map was too slow and was replaced by a very basic Hash Table. I had a hit ratio of 80% on this cache, showing that we tried to reevaluate the same moves many times.
  • Decompose the re-sampling of the image into a horizontal re-sampling followed by a vertical re-sampling, then cache all the horizontal re-sampled images (up to 100 images to store per source image)
  • Use floating representation of pixels during re-sampling and optimize the computation using SSE.
  • Compute the RMSE using SSE.
  • Use a lower bound of the squared error computed from mean colors of both target and source image to avoid computing unnecessary errors when we cannot do better than a current best move found.

With all these improvements, 80% of the CPU time was consumed by RMSE and resampling computations.
I tried but abandoned quite soon to solve the problem using sub-sampled smaller images to be able to reduce the computational cost of error evaluation with a reduced accuracy. First results were disappointing and I didn’t try further.


I worked on the contest during evenings, nights and week-ends. I got my final solution at the end of the second week-end and submitted it on Monday evening hoping to stay in the top 4 for the next two days. Last day was quite stressful, looking at the increasing scores of the competitors. I ended 3rd of provisional and final ranking (my undercover name is doudouille), which gave me my ticket to San Francisco. It will be my 5th TCO final with competitors from all around the world: Poland(3), China(2), France(2), South Korea(1), Ukraine(1), Brazil(1), Japan(1), Georgia(1).

Author: Pascal Pons

  • CriteoLabs

    Our lovely Community Manager / Event Manager is updating you about what's happening at Criteo Labs.