
Prototype: GPU for real-time prediction server

By: CriteoLabs / 05 Mar 2014

A few months ago, we tried to move our real-time prediction component to dedicated GPU servers.

The basic operations are simple enough: hashes, scalar products, plus a few operations specific to our prediction algorithm.
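To make "hashes and scalar products" concrete, here is a minimal C++ sketch of a hashed linear scorer. The post does not describe the actual model, so the structure below is an assumption for illustration only: feature strings are hashed into a flat weight table, and the score is the sum of the selected weights. `murmur3_32` and `predict_score` are hypothetical names. The hash follows the public MurmurHash3_x86_32 algorithm, which is the hash the post names further down.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// MurmurHash3_x86_32, following Austin Appleby's public-domain algorithm.
uint32_t murmur3_32(const void* key, std::size_t len, uint32_t seed) {
    const uint8_t* data = static_cast<const uint8_t*>(key);
    const std::size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    uint32_t h1 = seed;
    auto rotl = [](uint32_t x, int r) { return (x << r) | (x >> (32 - r)); };

    // Body: mix each full 4-byte block into the state.
    for (std::size_t i = 0; i < nblocks; ++i) {
        uint32_t k1;
        std::memcpy(&k1, data + i * 4, 4);  // assumes a little-endian host (e.g. x86)
        k1 *= c1; k1 = rotl(k1, 15); k1 *= c2;
        h1 ^= k1; h1 = rotl(h1, 13); h1 = h1 * 5 + 0xe6546b64;
    }
    // Tail: mix the remaining 1 to 3 bytes.
    const uint8_t* tail = data + nblocks * 4;
    uint32_t k1 = 0;
    switch (len & 3) {
        case 3: k1 ^= uint32_t(tail[2]) << 16; [[fallthrough]];
        case 2: k1 ^= uint32_t(tail[1]) << 8;  [[fallthrough]];
        case 1: k1 ^= tail[0];
                k1 *= c1; k1 = rotl(k1, 15); k1 *= c2; h1 ^= k1;
    }
    // Finalization: avalanche the bits.
    h1 ^= uint32_t(len);
    h1 ^= h1 >> 16; h1 *= 0x85ebca6b;
    h1 ^= h1 >> 13; h1 *= 0xc2b2ae35;
    h1 ^= h1 >> 16;
    return h1;
}

// Hypothetical hashed linear model: each feature is hashed to a slot of a
// fixed-size weight table; the score is a sparse scalar product in which
// every present feature has an implicit value of 1.
float predict_score(const std::vector<std::string>& features,
                    const std::vector<float>& weights, uint32_t seed = 0) {
    float score = 0.0f;
    for (const auto& f : features) {
        std::size_t idx = murmur3_32(f.data(), f.size(), seed) % weights.size();
        score += weights[idx];  // effectively random access into a large table
    }
    return score;
}
```

Each feature costs one hash plus one read at an effectively random position in `weights`; that read pattern matters when the hash and prediction kernels are compared.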

Some of our coworkers were skeptical, because these operations do not map naturally onto the GPU paradigm: notably, the prediction step requires random access to large chunks of RAM, which does not fit the GPU memory model well.

So we built a quick C++ prototype using OpenCL, ran it on a Quadro 600 found in an unused workstation, benchmarked it against our reference C# code, and got some interesting results.
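The rates quoted in this post are sequential predictions per second. A minimal single-threaded timing harness of the kind one might use for such a comparison could look like this; `predictions_per_second` is a hypothetical helper, not the prototype's actual benchmark code.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: calls `predict_batch` (assumed to perform `batch_size`
// predictions per call) `iterations` times on the current thread and returns
// the measured rate in predictions per second.
double predictions_per_second(const std::function<void()>& predict_batch,
                              std::size_t batch_size, std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) predict_batch();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return static_cast<double>(batch_size * iterations) / elapsed.count();
}
```

Running both the reference path and the GPU path through the same harness keeps the comparison on equal footing. For the GPU path, the callable would need to include the transfers and block until the device has finished (for example with `clFinish`), otherwise asynchronous work would go unmeasured.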

The algorithm, as run on the GPU, comprises five steps:

1. Data transfer to the GPU
2. Data formatting within the GPU: some byte array manipulations to adapt our data structures to the GPU
3. Hash computation
4. Prediction: apply the prediction model
5. Transfer of the results back to the main CPU
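Steps 1 and 2 exist because requests are variable-length structures, while a kernel works best on flat arrays. The post does not detail the layout, so the following is only an assumed example of a common pattern for moving variable-length data to a device: a batch of feature strings packed into one contiguous byte buffer plus an offsets array. `PackedBatch` and `pack_features` are hypothetical names, and the packing is done on the host here for clarity, whereas the prototype performed its reformatting on the GPU.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flat layout for a batch of variable-length features.
struct PackedBatch {
    std::vector<uint8_t> bytes;     // all feature bytes, back to back
    std::vector<uint32_t> offsets;  // feature i spans [offsets[i], offsets[i + 1])
};

PackedBatch pack_features(const std::vector<std::string>& features) {
    PackedBatch batch;
    batch.offsets.reserve(features.size() + 1);
    batch.offsets.push_back(0);
    for (const auto& f : features) {
        batch.bytes.insert(batch.bytes.end(), f.begin(), f.end());
        batch.offsets.push_back(static_cast<uint32_t>(batch.bytes.size()));
    }
    return batch;
}
```

Work-item i would then read its feature from `bytes[offsets[i]]` up to `bytes[offsets[i + 1]]`, and the whole batch reaches the device in two buffer copies instead of one per request.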

TABLE1

20% of the time is overhead caused by GPU technical constraints (steps 1, 2 and 5).

Comparing the remaining 80% against the CPU:

TABLE2

We increased the sequential prediction rate from 140,000 predictions/s to 430,000 predictions/s on our test server. The main improvement is in the MurmurHash3 computation step, where we get a nearly 10x speed-up.
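Because the measurement is sequential, these rates convert directly into an overall speed-up and an average time per prediction. These are averages derived from throughput, not measured per-request latencies:

```latex
\frac{430\,000}{140\,000} \approx 3.07
\qquad
\frac{1\ \mathrm{s}}{140\,000} \approx 7.14\ \mu\mathrm{s}
\qquad
\frac{1\ \mathrm{s}}{430\,000} \approx 2.33\ \mu\mathrm{s}
```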

Interestingly, the prediction step itself is much slower on the GPU than on the CPU.

We think these results can be explained by memory access patterns:

The hash computation has good locality.
The kernel used for prediction makes many accesses to global memory, where our prediction models are stored. These accesses follow a random pattern that cannot take advantage of memory coalescing, which makes them very inefficient bandwidth-wise.
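The difference between the two patterns can be illustrated with a deliberately simplified cost model, which is an assumption and not a description of the Quadro 600's memory controller: a group of work-items reading 4-byte floats is served in aligned 128-byte segments, and the cost is the number of distinct segments touched. All names below are hypothetical, and a cheap integer mixer stands in for MurmurHash3 to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr std::size_t kSegmentBytes = 128;  // assumed transaction size
constexpr std::size_t kElemBytes = sizeof(float);

// Number of distinct aligned segments touched by a group of float reads.
std::size_t segments_touched(const std::vector<std::size_t>& element_indices) {
    std::set<std::size_t> segments;
    for (std::size_t idx : element_indices)
        segments.insert(idx * kElemBytes / kSegmentBytes);
    return segments.size();
}

// Work-item i reads element i: contiguous input, as when scanning a packed buffer.
std::vector<std::size_t> contiguous_indices(std::size_t group_size) {
    std::vector<std::size_t> idx(group_size);
    for (std::size_t i = 0; i < group_size; ++i) idx[i] = i;
    return idx;
}

// Work-item i reads a pseudo-random slot of a large weight table,
// as a hashed model lookup would.
std::vector<std::size_t> hashed_indices(std::size_t group_size, std::size_t table_size) {
    std::vector<std::size_t> idx(group_size);
    for (std::size_t i = 0; i < group_size; ++i) {
        uint32_t h = static_cast<uint32_t>(i) * 0x9e3779b1u;  // stand-in mixer
        h ^= h >> 15;
        idx[i] = h % table_size;
    }
    return idx;
}
```

Under this model, 32 work-items reading consecutive floats share a single segment, while 32 hashed lookups into a table of 2^24 floats are spread over many different segments, so the same amount of useful data costs many more memory transactions.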

One topic we chose to ignore is the concurrency model: this experiment was run in a single-threaded scenario, whereas in production we obviously run multithreaded.

We are still looking for tricks to avoid global memory accesses and improve our prototype, and we would like to try it out on an APU.

By the way, if you want to play with these technologies at scale, we are hiring! 😉

Authors: Laurent Vion & Vincent Perez

