Mickaël Lacour and Justin Coffey at Twitter openconf

By: CriteoLabs / 04 Apr 2014

Last week, Mickaël Lacour and Justin Coffey went to Twitter’s OSS #conf to talk about our work migrating over 1PB of data from RCFile to Parquet.

We import over 30B records per day into our Hadoop cluster, where a large user base runs thousands of queries against it and many hundreds of aggregations execute throughout the day. The benefits of columnar storage engines for analytic workloads are numerous (http://en.wikipedia.org/wiki/Column-oriented_DBMS and http://research.google.com/pubs/pub36632.html are good places to start), and given our heavy usage of Hive, we quickly opted for the RCFile format.

RCFile has been great to us, but we wanted to move away from a Hive-only solution and towards a more open format that would be easy to use across any hadoopian execution engine (think Scalding, Pig, Spark, Impala, etc.). Parquet (http://parquet.io/) looked like the perfect fit, but was lacking Hive support. We went ahead and contributed that layer and then got to work putting it into production.
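
For readers new to columnar formats, here is a rough Python sketch of why a column-oriented layout suits analytic queries. It is a toy model only: it uses neither Parquet nor RCFile, and the field names and helper functions are made up for illustration rather than taken from our schema.

```python
# Toy records; the field names are hypothetical, chosen only for illustration.
rows = [
    {"user_id": 1, "country": "FR", "clicks": 3},
    {"user_id": 2, "country": "US", "clicks": 5},
    {"user_id": 3, "country": "FR", "clicks": 2},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (a columnar layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def values_touched_row_scan(records):
    """In a row layout, reaching one field means reading every whole record."""
    return sum(len(record) for record in records)


def values_touched_column_scan(columns, field):
    """In a columnar layout, only the requested field's values are read."""
    return len(columns[field])


columns = to_columns(rows)

# An aggregation over a single field only needs that field's column.
total_clicks = sum(columns["clicks"])
```

In this sketch, summing `clicks` over the row layout passes over all nine stored values, while the columnar layout reads only the three values in the `clicks` column. Real formats add encodings and compression on top of this basic idea.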

Parquet is currently live on a few of our largest datasets. We are working on moving over the rest and expect to complete the job in the next month or so.

We are super excited about this as we look forward to taking advantage of all the work being done in the Parquet world (new encodings, indexes, and more) and the flexibility to start looking at alternatives to Hive for analytics work.

finally at #CONF we have @jqcoffey wrapping things up by talking about @ParquetFormat usage at @CriteoEng pic.twitter.com/PZDZzU1aRb

— Twitter Open Source (@TwitterOSS) April 3, 2014

