Criteo.MemCache – why we wrote yet another MemCache driver
Some time ago we were using the Enyim/Couchbase client to connect to our Memcached/Couchbase cluster.
Since then, we have developed our own MemCache client library at Criteo, and it was recently open sourced on GitHub ( http://bit.ly/15DE04z). This article presents this client and the reasons we created it.
#Why did we develop it?
At one point we had serious issues on our Couchbase production clusters, and also on the web servers that connect to them. A useful thing to know: the Memcached instance that runs under Couchbase is limited to 10K incoming connections. Beyond this limit, the process simply refuses new connections. When we reached that point, the clients on the web servers saw the node as dead. This led to a very unpredictable distribution of keys inside the cluster, redundant copies of many values, a huge increase in memory usage, and finally the apocalypse on our web applications. All of that because we added new web servers to handle growing traffic. That is when we had to find another solution.
So we tried moxi, connecting to only a few nodes in the cluster. This worked for a while, but we soon learned that moxi is not very stable, triples the network bandwidth, and uses a huge amount of CPU.
We also wanted an asynchronous API for our MemCache client library: we were using thread pools to handle synchronous requests, which left a huge number of threads doing nothing but waiting on network responses. Since we replicate data across our two data centers in the USA, we were dealing with links that have a 70 ms round trip, so we needed an enormous number of sleeping threads just to keep up.
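A bit of back-of-the-envelope arithmetic shows why blocking threads don't scale over high-latency links. This is a sketch with illustrative numbers, not figures from our production (Python here purely for illustration; the driver itself is .NET):

```python
import math

# Little's law: requests in flight = throughput x latency. A synchronous
# client parks one thread per in-flight request, so the thread count
# grows linearly with both traffic and round-trip time.
def blocking_threads_needed(requests_per_second: int, rtt_ms: int) -> int:
    """Threads a fully synchronous client needs to sustain a given load."""
    return math.ceil(requests_per_second * rtt_ms / 1000)

# Hypothetical load: 20,000 req/s over a 70 ms cross-datacenter link.
print(blocking_threads_needed(20_000, 70))  # -> 1400 threads, all just waiting
```

An asynchronous API keeps the same number of requests in flight with only a handful of I/O threads.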
We had to prepare for the future, so we decided to experiment with a new way to connect to Memcached.
#Our first prototype
To begin this experiment, we measured what kind of performance we could expect from a raw TCP socket over these high-latency links. The first step was to send a huge batch of pre-computed requests while tweaking the socket parameters, and to simply drop the incoming responses. In the end, the only tweak needed was the TCP buffer size: the higher the latency, the bigger the buffer you need. Since our dedicated links are quite reliable, we saw very few errors, and large TCP buffers had almost no drawbacks.
After proper tuning, a single socket could saturate our gigabit interface. This confirmed that the Memcached server itself was not the bottleneck.
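The "bigger buffers for higher latency" rule is the bandwidth-delay product. A minimal sketch of sizing a socket buffer from it (again Python for illustration; the numbers are examples):

```python
import socket

def bdp_bytes(bandwidth_bits_per_s: int, rtt_seconds: float) -> int:
    """Bandwidth-delay product: how many bytes can be 'in flight' on the
    link. TCP buffers must hold at least this much to keep the pipe full."""
    return int(bandwidth_bits_per_s / 8 * rtt_seconds)

# A 1 Gb/s link with a 70 ms round trip keeps ~8.75 MB in flight.
buf = bdp_bytes(10**9, 0.070)
print(buf)  # 8750000

# Ask the OS for buffers that big (the kernel may cap the value).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf)
sock.close()
```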
For the second try, we read the responses offline to check that there were no errors. Good: no errors. Memcached really can handle our traffic; our only limit is the driver itself. Let's get to the real coding!
#Do we patch the existing open-source driver?
At first glance, the Enyim driver seems very plugin oriented. I was excited to hack into it and write my own plugin!
After a deeper look, a plugin was not really feasible: the internals are not that well exposed. A better way would be to fork and alter the code. Let's go patching!
Hundreds of lines later, I realized the whole project was designed around the text protocol, with an adaptation bolted on to handle the binary protocol. Everything was fully synchronous, and it was a pain to add an asynchronous API on top of it. It went nowhere. After several attempts, I concluded that for the small number of features we actually use, it would be simply cleaner and easier to rewrite everything from scratch. The idea of our own driver was born!
#Preparing the battle plan
We were starting a very exciting project, but since we wanted the new project to fit our needs better, we first had to define what those needs were.
The first one is for the public interfaces to be very similar to Enyim's; otherwise, migrating our applications would be a pain.
Let's list the features we actually use: get, set, and delete, with values that are only byte arrays, plus the same key distribution across the cluster as Enyim uses (Ketama). We don't need increment, compare-and-swap, or append, so we shouldn't implement them at first. Keep it simple!
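For reference, Ketama-style distribution boils down to a consistent-hash ring: each node contributes many points on a circle, and a key goes to the first point clockwise from its own hash. A minimal Python sketch of the idea (not the exact point-generation scheme Enyim uses):

```python
import bisect
import hashlib

class KetamaRing:
    """Minimal consistent-hash ring in the spirit of Ketama (illustrative)."""

    def __init__(self, nodes, points_per_node=160):
        self._ring = []
        for node in nodes:
            for i in range(points_per_node):
                # Many virtual points per node smooth out the distribution.
                digest = hashlib.md5(f"{node}-{i}".encode()).digest()
                point = int.from_bytes(digest[:4], "little")
                self._ring.append((point, node))
        self._ring.sort()
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        digest = hashlib.md5(key.encode()).digest()
        point = int.from_bytes(digest[:4], "little")
        # First ring point clockwise from the key's hash (wrap around).
        idx = bisect.bisect(self._points, point) % len(self._points)
        return self._ring[idx][1]

ring = KetamaRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.node_for("user:42"))  # the same key always lands on the same node
```

The payoff over plain modulo hashing: adding or removing a node only remaps the keys near that node's points, instead of reshuffling everything.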
The plugin-oriented interfaces of Enyim are really useful; the Couchbase driver relies on them heavily to implement its own features. We should do the same. This design also seems more testable, so it looks like the way to go.
Remember the main motivation: being asynchronous. The prototype showed that the second asynchronous API of .NET sockets (the one based on SocketAsyncEventArgs) is really efficient. Let's use it!
The prototype also showed that handling requests and responses separately was very efficient, so we decoupled request sending from response handling. A queue that matches each response to its pending request lets us multiplex requests over a single connection.
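The decoupling can be sketched as a pipeline: sending enqueues a pending slot, and each incoming response completes the oldest one. A toy Python version of the idea (the real driver does this asynchronously over one TCP socket):

```python
from collections import deque
from concurrent.futures import Future

class RequestPipeline:
    """Decouples sending from receiving: each sent request enqueues a
    Future; each incoming response completes the oldest pending one."""

    def __init__(self, send_bytes):
        self._send = send_bytes   # transport callback (here: a stub)
        self._pending = deque()

    def submit(self, request: bytes) -> Future:
        fut = Future()
        self._pending.append(fut)  # remember who is waiting
        self._send(request)        # fire and forget on the wire
        return fut

    def on_response(self, response: bytes):
        # Memcached answers single-response operations in request order,
        # so the head of the queue owns this response.
        self._pending.popleft().set_result(response)

wire = []                          # stand-in for the socket
pipe = RequestPipeline(wire.append)
f1 = pipe.submit(b"get key1")
f2 = pipe.submit(b"get key2")      # second request in flight before any reply
pipe.on_response(b"value1")
pipe.on_response(b"value2")
print(f1.result(), f2.result())  # b'value1' b'value2'
```

This is what lets a single socket carry many overlapping requests instead of one blocked request at a time.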
Sounds simple, no? OK, not quite, but after a few months of development and experimentation we were ready to release it to production. No issues! Hmm, I mean no major issues! OK, I mean no major issue we were not able to solve…
The driver is now live and really fulfills our needs. All our problems are solved and there is no more war in the world! OK, I mean it helps, and all the problems it raised have since been solved. We can now think about the future.
We didn't develop it just to fit basic needs; that was only the first stage. We can now think about more features.
We want to support the full memcache protocol, which means our goal is to implement all the opcodes documented in the binary protocol specification ( http://bit.ly/1uBhdgp).
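As a taste of that protocol, every binary-protocol request starts with a fixed 24-byte header (magic, opcode, key/extras lengths, vbucket id, total body length, opaque, CAS). A Python sketch framing a GET request:

```python
import struct

MAGIC_REQUEST = 0x80
OP_GET = 0x00  # GET opcode from the binary protocol spec

def get_request(key: bytes, opaque: int = 0) -> bytes:
    """Frame a memcached binary-protocol GET: 24-byte header + key."""
    header = struct.pack(
        ">BBHBBHIIQ",
        MAGIC_REQUEST,  # magic byte: request
        OP_GET,         # opcode
        len(key),       # key length
        0,              # extras length (GET requests carry no extras)
        0,              # data type
        0,              # vbucket id
        len(key),       # total body length = extras + key + value
        opaque,         # echoed back by the server, handy for matching
        0,              # CAS
    )
    return header + key

pkt = get_request(b"user:42", opaque=7)
print(len(pkt))  # 31 bytes: 24-byte header + 7-byte key
```

Once this framing exists, adding a new opcode is mostly a matter of filling in the same header with different fields.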
We do use Couchbase buckets, and they don't use the Ketama key distribution. This means that if we don't want to stay stuck with moxi for them, we must implement the vBucket key distribution and fetch its configuration from the Couchbase REST API.
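For reference, vBucket distribution replaces the hash ring with a lookup table: a CRC32-based hash picks one of (by default) 1024 vBuckets, and a map downloaded from the REST API says which server owns each vBucket. A sketch of the idea, with a made-up map (the exact hash formulation below is the commonly documented one, shown here as an illustration):

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase default

def vbucket_id(key: bytes) -> int:
    # CRC32-based mapping used by Couchbase clients (illustrative form).
    return ((zlib.crc32(key) >> 16) & 0x7FFF) % NUM_VBUCKETS

# vbucket_map[vbucket_id] -> index of the server owning that vBucket.
# In a real client this map comes from the Couchbase REST API;
# here it is a hypothetical round-robin assignment for the demo.
servers = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]
vbucket_map = [i % len(servers) for i in range(NUM_VBUCKETS)]

vb = vbucket_id(b"user:42")
print(servers[vbucket_map[vb]])  # the node currently owning this key
```

Unlike Ketama, rebalancing just rewrites entries in the map, which is why the client must keep re-fetching the configuration.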
We really want to experiment with a connectionless protocol. Memcached also exposes a UDP interface, so a very simple implementation of the transport layer over UDP is definitely something we will try. It may not go anywhere, but at least we must test it.
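The UDP interface frames each datagram with an 8-byte header: request id, sequence number, datagram count, and a reserved field, all 16-bit big-endian. A minimal sketch of building such a frame:

```python
import struct

def udp_frame(request_id: int, body: bytes) -> bytes:
    """Prefix a memcached request with the 8-byte UDP frame header.
    Single-datagram case: sequence number 0 out of 1 total."""
    header = struct.pack(">HHHH", request_id, 0, 1, 0)
    return header + body

# Text-protocol GET wrapped in a UDP frame (illustrative request id).
frame = udp_frame(1, b"get foo\r\n")
print(len(frame))  # 17 bytes: 8-byte frame header + 9-byte command
```

The request id is what lets the client match a reply datagram to its request, since UDP gives no ordering guarantee.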
The byte-array-oriented interface is a good start. The next step will be to expose a generic interface with pluggable serializers.
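Such a generic layer can simply wrap the byte-array client with a pair of serialize/deserialize functions. A Python sketch against a hypothetical raw client exposing get/set on bytes (JSON chosen arbitrarily for the demo):

```python
import json

class TypedClient:
    """Generic get/set layered over a byte-array client: a hypothetical
    `raw` object exposing set(key, bytes) and get(key) -> bytes | None."""

    def __init__(self, raw,
                 dumps=lambda v: json.dumps(v).encode(),
                 loads=lambda b: json.loads(b.decode())):
        self._raw, self._dumps, self._loads = raw, dumps, loads

    def set(self, key, value):
        self._raw.set(key, self._dumps(value))

    def get(self, key):
        data = self._raw.get(key)
        return None if data is None else self._loads(data)

# In-memory stand-in for the byte-array driver, just for the demo.
class FakeRaw(dict):
    def set(self, k, v): self[k] = v
    def get(self, k): return super().get(k)

client = TypedClient(FakeRaw())
client.set("user:42", {"name": "Ada"})
print(client.get("user:42"))  # {'name': 'Ada'}
```

Swapping the `dumps`/`loads` pair changes the wire format without touching the transport underneath.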
#Should you use it?
If you have read everything up to this point, the answer should be clear: use it if you have the same constraints as us and no additional needs.
* You can serialize everything in your own code: the current interface only accepts byte arrays.
* You don't need the currently unimplemented features, or you think you can easily implement them. Append, prepend, increment, and decrement are easy to add; for multi-response operations, remember that the transport is not yet able to handle them cleanly. Implement them at your own risk.
* You don't use a Couchbase bucket, or you accept connecting to it through moxi.
This project was really exciting. On top of that, it solves critical issues we were not able to address before:
* The high-latency links are no longer an issue for us. We simply use more memory for bigger TCP buffers and a queue of pending requests.
* We dropped from up to 16 TCP connections per node to only 1, while handling even more load.
* We are now able to use asynchronous interfaces and lower the number of spawned threads. Our web applications are much cleaner and more responsive than before.
All this work was really useful for us, and I hope to hear the same story from other people. If you use our driver and enjoy the way it works, please let us know!