JCrete 2018

By: CriteoLabs / 19 Sep 2018

What is JCrete

As described on the web site:

An Open Spaces Conference on an Island in the Mediterranean Sea

But all attendees speak about it as an “unconference”, as opposed to a traditional one. What is so different? No CFP, no speakers, no pre-planned agenda, and the use of Open Space Technology.

In practice, when you arrive at JCrete you don’t know which subjects will be discussed during the 4 days (the last day is a Hackathon). Even if the main topic is the Java/JVM ecosystem, other subjects from “Self-awareness of introverts” to “Startup experiences” via “Spectre/Meltdown crash course” are discussed.

https://twitter.com/badrelhouari/status/1022493744296275969

Everybody is a speaker. Subjects are submitted and voted on by everybody. Then, we try to schedule them according to the number of votes. When a session starts, everybody is allowed to participate in the open discussion about the subject.

2018 (deluxe) edition

I had the chance this year, again, to attend this unconference on the beautiful island of Crete. As several sessions were happening at the same time, the choice was sometimes difficult. Here is my report on the ones I attended:

Day 1

Exception handling

Proposed by Cliff Click – Video

What is the best way to handle errors in a programming language? Automatically unwinding the stack with exceptions (catch and rethrow), or Go style (or Elm): either panic and kill the process, or handle each error manually and unwind if necessary.
Another alternative mentioned is to use a coroutine mechanism to patch the faulting code and retry.

The crucial thing when dealing with exceptions is to keep the context information needed to troubleshoot correctly. A typical mistake is catching a checked exception and then throwing a new (runtime) exception that loses the first one (the cause).
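To make the point about losing the cause concrete, here is a minimal sketch. The class and method names (CausePreservation, readConfig, and so on) are made up for illustration; only the exception classes come from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class CausePreservation {

    // Hypothetical low-level operation that always fails with a checked exception.
    static String readConfig(String path) throws IOException {
        throw new IOException("cannot open " + path);
    }

    // Problematic: the original exception is dropped, so its message
    // and stack trace are no longer available when troubleshooting.
    static String loadLosingCause(String path) {
        try {
            return readConfig(path);
        } catch (IOException e) {
            throw new RuntimeException("config load failed");
        }
    }

    // Better: add context and pass the original exception along as the cause.
    static String loadKeepingCause(String path) {
        try {
            return readConfig(path);
        } catch (IOException e) {
            throw new UncheckedIOException("config load failed for " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            loadKeepingCause("/etc/app.conf");
        } catch (UncheckedIOException e) {
            // Both the added context and the root cause are still visible.
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```

With the second variant, a stack trace printed at the top level shows a "Caused by:" section pointing at the original failure.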

Regarding performance, we do not care, because exceptions should be exceptional…

It was remarked that two ideas are conflated: a real error (OutOfMemory, StackOverflow, NullPointer, …) and raising an exception when, for instance, a record is not found in a table.

In functional programming, errors are handled with two monads: Try and Either.

There is a consensus that Java checked exceptions made error handling very difficult, forcing people to deal with exceptions when they don’t know what to do about them.
The language could technically chain rethrown exceptions together automatically through causes.

Why catch an exception? Either you can recover, so you handle the exception and you are done, or you catch it but don’t know what to do, so why bother catching it? Maybe you annotate the exception with additional information and rethrow it, because this is not the final catch.
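As an illustration of the Either idea, here is a minimal hand-rolled sketch in plain Java. The JDK has no built-in Either type, so this Either class and the parsePort and describe helpers are assumptions made for the example, not a library API.

```java
import java.util.function.Function;

public class EitherSketch {

    // Minimal Either: holds either a left (error) or a right (success) value.
    static final class Either<L, R> {
        private final L left;
        private final R right;
        private final boolean isRight;

        private Either(L left, R right, boolean isRight) {
            this.left = left;
            this.right = right;
            this.isRight = isRight;
        }

        static <L, R> Either<L, R> left(L error) {
            return new Either<>(error, null, false);
        }

        static <L, R> Either<L, R> right(R value) {
            return new Either<>(null, value, true);
        }

        // Applies f only on success; an error passes through untouched.
        <T> Either<L, T> map(Function<? super R, ? extends T> f) {
            return isRight ? Either.<L, T>right(f.apply(right)) : Either.<L, T>left(left);
        }

        // Collapses both cases into a single result.
        <T> T fold(Function<? super L, ? extends T> onLeft,
                   Function<? super R, ? extends T> onRight) {
            return isRight ? onRight.apply(right) : onLeft.apply(left);
        }
    }

    // Returns the error as a value instead of letting the exception propagate.
    static Either<String, Integer> parsePort(String s) {
        try {
            return Either.right(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return Either.left("not a number: " + s);
        }
    }

    static String describe(String input) {
        return parsePort(input)
                .map(p -> p + 1)
                .fold(err -> "error: " + err, port -> "next port: " + port);
    }

    public static void main(String[] args) {
        System.out.println(describe("8080")); // next port: 8081
        System.out.println(describe("abc"));  // error: not a number: abc
    }
}
```

The caller is forced to deal with both cases at the fold call, without any stack unwinding.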

 

Challenges of AOT

Proposed by Ivan Krylov

There are 2 different issues that you may want to address: startup and warmup. You can address warmup by storing JIT compiler data from a previous run and applying it at startup, to avoid gathering it again during a warmup period. This is what Azul is doing with ReadyNow! on the Zing JVM.

Excelsior JET and OpenJDK provide an AOT feature that gives fast startup and no warmup, at the expense of peak performance, as no JIT will perform optimistic optimizations. There are also some limitations, like no dynamic loading of classes, no or limited reflection, and no method handles or lambda expressions.

In the Serverless case, where you need to restart containers frequently, it may make sense to precompile to save startup time.

Day 2

Value Types

Proposed by Rémi Forax – Video

The Valhalla project includes 2 things: Value Types and partial reification of generics (AKA generic specialization).

A value type is not a struct: there are no copy semantics, in order to avoid performance and concurrency issues.
It is an immutable data structure.
The VM decides whether value types live on the stack or on the heap. In the interpreter they will be on the stack, but once JIT-compiled they will mainly live in registers. If a value type is big enough, it may automatically be boxed on the heap.
Regarding arrays, small arrays of value types can be flattened, but if the array is too big, it will be transformed into an array of references.
Unlike with Escape Analysis, the optimization will happen most of the time.

For initialization, there are two primitives: default and withfield. The Java constructor of a value type will be transformed into the following bytecodes:

default
withfield 
withfield
...

default gives a zero-initialized value type, then each withfield creates a new value type in which the designated field is set to the given value.

Currently, an array inside a value type remains a reference; it is not flattened.
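The default/withfield sequence can be pictured with an analogy in today's Java. This is not Valhalla syntax: the Point class and its defaultValue, withX, withY and of methods are invented for the sketch, and an ordinary immutable class stands in for a value type.

```java
public class WitherSketch {

    // Ordinary immutable class standing in for a value type.
    static final class Point {
        final int x;
        final int y;

        private Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Plays the role of `default`: every field is zero-initialized.
        static Point defaultValue() {
            return new Point(0, 0);
        }

        // Play the role of `withfield`: return a new value with one field replaced.
        Point withX(int newX) {
            return new Point(newX, y);
        }

        Point withY(int newY) {
            return new Point(x, newY);
        }

        // What a constructor Point(x, y) conceptually turns into:
        // default, withfield x, withfield y.
        static Point of(int x, int y) {
            return defaultValue().withX(x).withY(y);
        }
    }

    public static void main(String[] args) {
        Point zero = Point.defaultValue();
        Point p = zero.withX(3).withY(4);
        System.out.println(zero.x + "," + zero.y); // the starting value is unchanged
        System.out.println(p.x + "," + p.y);
    }
}
```

Each step produces a new value rather than mutating the previous one, which is what keeps the data structure immutable.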

Some restrictions for value types:

  • no identity (Object header)
  • == returns false at the VM level and does not compile in source code. To perform a component-wise comparison you need to use the .equals() method
  • no null value for value type

Perf War Stories

Proposed by Jean-Philippe Bempel

This session was a potpourri of performance war stories from various people, so I will not report all of them. However, I will report mine:

At Criteo, we run one of the largest Hadoop clusters in Europe. It runs around 150M containers a day. Each container is a JVM that is spawned each time. The team in charge of running the cluster built a tool to monitor those JVMs individually. I was in charge of collecting JMX metrics and building heuristics based on those metrics. While doing that, I discovered that every single JVM reported a Full GC during startup. The cause of this Full GC was “Metadata GC Threshold”. When the metaspace, where class metadata resides, is not big enough and needs to be resized, the JVM triggers a Full GC to grow this space. The induced pause is only 200-500 ms long. For batch jobs this does not seem like a big deal, but at the scale of 150M containers the time adds up dramatically fast, up to hundreds of hours. It’s frustrating, as it is very simple to fix by increasing the default value for the Metaspace with the JVM option -XX:MetaspaceSize. With this simple fix we were able to save significant resource usage on the cluster!
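For readers who want to look at similar numbers in their own JVMs, here is a minimal sketch that reads GC counters and Metaspace usage through the standard java.lang.management beans. It is not the tool described above; the class and method names are invented, and the pool name "Metaspace" is the one used by HotSpot, so other JVMs may report nothing.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class GcStartupReport {

    // One line per collector: cumulative collection count and time since JVM start.
    static List<String> collectorStats() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    // Bytes currently used in the Metaspace pool, or -1 if no such pool is exposed.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage != null ? usage.getUsed() : -1L;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        collectorStats().forEach(System.out::println);
        System.out.println("Metaspace used bytes: " + metaspaceUsedBytes());
    }
}
```

This sketch only reports counts and times, not the GC cause. To try the fix, the JVM can be started with something like -XX:MetaspaceSize=256m, where 256m is an arbitrary illustrative value that has to be sized for the application.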

GC shout out

Proposed by Simone Bordet – Video

More low-latency GCs are arriving in the JVM ecosystem:

  • Azul C4
  • Red Hat Shenandoah
  • Oracle ZGC

Also added in JDK 11: Epsilon GC, which removes barrier overhead. It only does allocation: no reclamation, no special reference processing, no pinning management.

For concurrent collectors, the mutator (the application) can outpace the collector, and with a limited heap you can run into a problem.
How to deal with it:

  • Shenandoah paces allocations and, if that is not enough, falls back to a pause to finish the work
  • ZGC does not fall back to a Full GC; it stalls allocations until room is made
  • C4 and ZGC store forwarding information outside the heap and quickly release the collection set, so chances are the first page released can satisfy the current allocation

In C4, mutator threads can do extra work when reference fixups have not yet been performed for some already relocated objects.
Escape Analysis can avoid creating some objects, so sophisticated compilers can also reduce the work for the GC.

Playing with Stop-The-World collectors tends to force you to limit the heap, while with a concurrent collector and the outpacing problem you tend to increase the heap size, which also helps the GC by letting it do more work concurrently.

Regarding footprint (related to Cloud constraints), STW collectors have the following trade-off: either you accept a larger footprint, or you have frequent GCs, which increases latency. A concurrent collector can be run proactively and periodically to get rid of floating garbage and reduce footprint without incurring pauses, but you trade CPU cycles for it.
Counter-intuitively, concurrent collectors do a better job when they have more room than just the live set to balance the work.

Memory spaces on alternative devices (NV-DIMM) are a new layer in the caching hierarchy, so this is a cache policy problem.

Are those low-latency GCs ready for production?

  • Shenandoah is functionally complete, performance is good, could be tried in production. Stable enough to be shipped with RHEL.
  • ZGC is under experimental flag in JDK 11 and works only on Linux x86-64.
  • C4 is in production on Linux x86-64 for about 8 years.

Some tests with C4 report that mutator throughput is impacted by the collector. There is a few percent overhead due to barriers, more specifically the Loaded Value Barrier (LVB).
Concurrent collectors have very low pauses thanks to barriers, but if you care about throughput you should choose an STW collector, which does not have this overhead.

Day 3

JDK migration/Release cadence

Proposed by Andres Almiray & Aristos Tofallis – Video

Migrating to a new JDK should be as easy as changing a pom file, but when you try to migrate to JDK 10, you realize that third-party libraries are not ready. Travis CI offers early-access JDKs to build a project, but not all companies can use Travis CI, as there are privacy concerns about Travis-like tools. Migrating to at least JDK 9 and dealing with module issues can ease future JDK upgrades.

Starting from JDK 11, Oracle releases are not free to use in production! So which JDK to use? OpenJDK? Zulu from Azul? Red Hat? IBM? What about support? Backports of security fixes and bug fixes?
Regarding Oracle JDK, it is free for development and test, but for production usage you need to pay for a license. The LTS (Long Term Support) label only applies to Oracle JDK.
Some vendors, like Azul and Red Hat, offer support for some JDK versions.
OpenJDK support depends only on the vendor’s downstream build. Oracle says it is six months, but that is only for its own version, not for OpenJDK itself.
AdoptOpenJDK also provides downstream builds that pass the TCK.
Either you have enough internal expertise to run a team and build your own version of OpenJDK (like SAP, Twitter, …), or you pay a vendor (Azul, Red Hat, Oracle…).

There are 2 levels of deprecation: deprecated, or deprecated for removal. The annotation also records the version in which the API was deprecated. When APIs are marked for removal, it is because they are harmful.
Some modules like JAXB or Corba are removed from the JDK because it was a mistake to add them in the first place (JDK 6). They are available on Maven. Some companies have a painful process for adding dependencies, for licensing or security reasons, and for them something that is part of the JDK is easier to rely on.
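The two deprecation levels mentioned above map to the @Deprecated annotation as enhanced in JDK 9. In the sketch below, the class and its methods are invented for the example, and the code needs JDK 9 or later to compile.

```java
import java.lang.reflect.Method;

public class DeprecationLevels {

    // Ordinary deprecation: use is discouraged, no removal announced.
    // `since` records the release in which the element was deprecated.
    @Deprecated(since = "9")
    static void oldButStaying() {
    }

    // Deprecated for removal: expected to disappear in a future release.
    @Deprecated(since = "9", forRemoval = true)
    static void goingAway() {
    }

    // @Deprecated is retained at runtime, so it can be inspected reflectively.
    static boolean isForRemoval(String methodName) throws NoSuchMethodException {
        Method m = DeprecationLevels.class.getDeclaredMethod(methodName);
        Deprecated d = m.getAnnotation(Deprecated.class);
        return d != null && d.forRemoval();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println("oldButStaying forRemoval=" + isForRemoval("oldButStaying"));
        System.out.println("goingAway forRemoval=" + isForRemoval("goingAway"));
    }
}
```

The compiler warns about uses of both kinds of elements; the forRemoval flag is what signals that calling code will eventually have to change.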

OpenJDK can be seen as the Linux kernel, and Oracle, Azul and Red Hat as the distributions.

Some critical libraries that others depend on (Maven, ASM, …) need to follow the OpenJDK release cycle closely to be ready in time, so 6-month releases may be too fast for them to keep pace. At least OpenJDK is moving faster, and there are not so many differences between, for example, JDK 10 and 11, so it shouldn’t be that difficult.

GraalVM

Proposed by Oleg Selajev

GraalVM is a polyglot VM that supports JVM languages, but also LLVM-based ones. It originated as an academic research project at Oracle Labs.
This VM is able to turn an interpreter for a language into an optimizing compiler, targeting the JVM but also node.js, Oracle DB, or a standalone executable. The latter is done using SubstrateVM, which embeds in the executable a VM with a simple GC implementation.

Since the JVM Compiler Interface (JVMCI) was included in OpenJDK (JDK 9), you can easily change the JIT compiler. The Graal compiler plugs in through JVMCI to get bytecode and generate optimized code. This setup can be used for Scala applications and turns out to be more efficient than C2. Twitter has successfully pushed this to production.

Truffle is a framework to create managed languages. It provides an API for creating an interpreter for your language. Sulong is another framework that targets LLVM.

There are 2 editions:

  • Community Edition (Open source, GPLv2 + classpath exception)
  • Enterprise Edition (paid licence) includes additional performance (more optimizations)

Day 4

Spectre/Meltdown crash course

Proposed by Cliff Click

In this session, Cliff gave a lecture on how speculative execution works on x86 and how Spectre/Meltdown exploits are able to read any memory content.

There is a good summary by Charles Nutter here.

CLR-JVM implementation differences

Proposed by Jean-Philippe Bempel

On the CLR, code must be compiled before it can be executed; there is no interpreter. There is currently only 1 tier, close to C1 on the JVM. It is constrained to compile quickly, as the JIT compiler runs in the calling thread.

A big difference regarding the job of the JIT is that virtual methods need to be explicitly declared. So virtual calls are the exception rather than the norm.

.NET interface calls are more expensive than their equivalent on the JVM: resolving the concrete method to call requires lookups in different places.

There are plans to add tiered compilation to the CLR. New optimizations are included in .NET Core, such as devirtualization in trivial cases.

Regarding GC, the big difference is in the sizing: the JVM needs a fixed limit on the heap, while the CLR can consume all the physical memory available on the machine.

3 generations are available: gen0 (Eden), gen1 (Survivors), gen2 (Old). There is also a special space for large objects (> 85,000 bytes): the LOH (Large Object Heap).

The algorithm is very similar to CMS GC: for gen2, there is background marking and sweeping, and at some point gen2 can be compacted (like a Full GC).

A specificity of the CLR is that there are a lot of pinned objects (allocated buffers that should not be moved because of native calls). As a consequence, the GC handles free-lists and deals with fragmentation a lot more than what we see on the JVM.

Very little tuning is available on the CLR. It works based on heuristics. On the JVM, by contrast, we have a lot of control, but sometimes we see a lot of options used in production systems without any understanding of why those options were set in the first place!

Post written by:


Jean-Philippe Bempel

Staff Software Engineer, R&D.

Twitter: jpbempel

 
