Tech note: Java Virtual Machine(JVM) vs Erlang Runtime System(ERTS)

Java was my first language. In the third year of my undergraduate studies I switched to C++ because of coursework in image processing and computer graphics. After graduation I stopped using Java, since it is not used much in the game industry.

Elixir is the language I picked up most recently. It builds on Erlang and runs on the Erlang virtual machine. I came to Elixir along the path Unity -> Ulink -> Riak (Ulink's default database option, which is written in Erlang) -> Erlang -> Elixir. Along the same path came Phoenix, one of the more recent web frameworks.

Now I have many programming-language friends: Java, C++, C#, Python, Elixir and some niche languages. To know a programming language, I only have to know the machine and how the machine speaks the language. For Java and Elixir in particular, the core is how the underlying virtual machine works.

In the case of Erlang and its virtual machine, I was forced to think about the relationship among OS processes, OS threads and CPU cores. To keep it simple:
1. An OS process is a kind of resource center for execution units, which are called threads.
2. An OS process must contain at least one thread. At any given moment, a running thread is executed by one CPU core. (Typical applications contain several threads. For example, Microsoft Office Word uses at least two threads, one for saving data and one for UI display, so that Word can auto-save the document while the user is editing.)
3. Processes share nothing among themselves. OS processes share no memory. Erlang processes share no memory.
4. An OS process running Erlang spawns one OS thread per CPU core. Such an OS thread is called a scheduler, and it can manage a large number of Erlang processes.

Disclaimer: the following is a self-study note written after reading "Comparison of Erlang Runtime System and Java Virtual Machine". Almost all of the content is summarized or copied from that article. Refer to the article for completeness.

Java Virtual Machine(JVM) vs Erlang Runtime System(ERTS)

The comparison is in these four areas

overall architecture
memory layout
parallelism/concurrency
runtime optimization

1. Overall architecture

Java Virtual Machine(JVM)

The JVM works by executing bytecode from class files, which are generated by compiling Java source code. This indirection is what gives programmers the ability to compile their source code once and then execute it on any platform that has a JVM implementation. HotSpot VM by Oracle is the most widely used implementation and is the one chosen for comparison in this article.

Erlang Runtime System(ERTS)

Erlang is a functional programming language based on the actor model, which dates back to 1973. The actor model specifies the actor as the unit of concurrency. In Erlang, this unit is called an Erlang process. Actors can only communicate with each other by sending messages and are otherwise independent, which allows them to run in parallel with each other.

Similarly to Java, ERTS works by executing an intermediate representation of Erlang source code, also known as BEAM code.

2. Memory layout

JVM(HotSpot VM)

Figure 1. HotSpot VM memory layout

Heap: where all objects and arrays live
The heap on the HotSpot VM consists of three distinct areas called Eden, Survivor and Tenured space. Eden and Survivor form the young generation, and Tenured is the old generation. (This optimization stems from the observation that most objects die young, so it makes sense to put them into a separate area and garbage collect that smaller area more often than the whole heap.)

Non-heap area: holds metadata about the loaded classes, plus a Code Cache, which stores the methods that the JIT compiler has compiled to native code.
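To see these areas on a real JVM, here is a minimal sketch that lists the memory pools the running VM reports through the standard java.lang.management API. The class name MemoryPools is my own. The pool names you get back depend on the JVM version and the garbage collector in use, so I am not assuming any particular names here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    // Returns one line per memory pool, prefixed with HEAP or NON_HEAP.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() tells us whether the pool belongs to the heap or not.
            lines.add(pool.getType().name() + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot VM the HEAP lines should correspond to the Eden, Survivor and Tenured (old) areas, and the NON_HEAP lines to class metadata and the code cache, although the exact labels vary.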

Thread: every thread in the JVM has

a program counter, which holds the address of the current instruction (if the method is not native),
a stack, which holds a frame for each executing method,
a native stack, which is used for native methods and is created per thread. A JVM that cannot load native methods does not need native method stacks. The size of a native stack is managed like that of the regular JVM stacks, either fixed or dynamic, and the JVM throws StackOverflowError or OutOfMemoryError accordingly.

Every thread in the JVM is mapped one-to-one to an OS-level thread, which is then scheduled and managed by the OS.
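The per-thread stack can be poked at directly. Below is a small sketch that starts a thread with a requested stack size and recurses until the JVM throws StackOverflowError, then reports how many frames fit. The class name StackDepthProbe and the 512KB figure are my own choices for illustration. The Thread constructor with a stackSize argument is part of the standard library, but the JVM is allowed to treat the size as a hint, so the reported depth will differ between machines.

```java
class StackDepthProbe {
    private static int depth;

    // Each call pushes one more frame onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs unbounded recursion on a fresh thread and returns the depth reached
    // when the stack overflowed. stackSizeBytes is only a hint to the JVM.
    static int maxDepth(long stackSizeBytes) {
        int[] result = new int[1];
        Thread probe = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth;
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + maxDepth(512 * 1024));
    }
}
```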

The JVM ships with several garbage collection algorithms, which cater to the different needs of applications. One example is generational garbage collection, where the heap is divided into a young and an old generation.

In general to collect garbage from the shared heap, a stop-the-world pause is needed, where all application threads are halted.

ERTS

Figure 2. Erlang Runtime System memory layout

Each Erlang process has its own memory area, which contains

Heap and stack: unlike a Java thread, each Erlang process has its own heap. The heap shares one memory area with the stack, and the two grow towards each other. This setup makes it very cheap to check whether a process is running out of heap or stack space.
Process control block: holds various metadata about the process. Binary data that is larger than 64 bytes is kept in a separate area (Binaries), so that different processes can simply refer to it by a pointer.
Message queue: the input queue for the messages the process receives.

Garbage Collection(GC) in Erlang
Binary area: reference counting
Per-process GC: a copying collector, which allocates another memory area and then copies over each element that is still referenced, starting with the root set. When a process dies, all its memory can be deallocated right away. There are no global stop-the-world pauses.
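The copying idea can be sketched in a few lines of Java. This is a toy model only: the Cell class, the list used as the "to-space" and the forwarding map are all my own stand-ins, and the real ERTS collector works on raw process memory inside the runtime. What the sketch does show is the core step described above: start from the root set, copy whatever is reachable into a new area, and leave everything else behind.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CopyingGc {
    // A toy heap cell: a value plus references to other cells.
    static final class Cell {
        final String value;
        final List<Cell> refs = new ArrayList<>();
        Cell(String value) { this.value = value; }
    }

    // Copies every cell reachable from the roots into a fresh to-space.
    // The roots list is updated in place to point at the copies.
    static List<Cell> collect(List<Cell> roots) {
        List<Cell> toSpace = new ArrayList<>();
        Map<Cell, Cell> forwarded = new IdentityHashMap<>(); // old cell -> its copy
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace, forwarded));
        }
        // Scan the copies in order; copying their children appends to toSpace,
        // so the loop ends once every reachable cell has been visited.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Cell cell = toSpace.get(scan);
            for (int j = 0; j < cell.refs.size(); j++) {
                cell.refs.set(j, copy(cell.refs.get(j), toSpace, forwarded));
            }
        }
        return toSpace;
    }

    private static Cell copy(Cell old, List<Cell> toSpace, Map<Cell, Cell> forwarded) {
        Cell existing = forwarded.get(old);
        if (existing != null) return existing; // already moved, reuse the copy
        Cell fresh = new Cell(old.value);
        fresh.refs.addAll(old.refs); // still old references; fixed up by the scan loop
        forwarded.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        Cell a = new Cell("a");
        Cell b = new Cell("b");
        Cell garbage = new Cell("unreachable");
        a.refs.add(b);
        b.refs.add(a); // a cycle is handled by the forwarding map
        List<Cell> roots = new ArrayList<>(List.of(a));
        System.out.println("live cells after collection: " + collect(roots).size());
    }
}
```

Because only live data is ever touched, the cost depends on how much survives rather than on how much garbage there is, which suits small per-process heaps.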

3. Parallelism/concurrency

Java: more general-purpose programming language
Erlang: a tool for building massively scalable soft real-time systems with requirements on high availability

3.1 Java--Shared State

In 1971 Edsger Dijkstra posed and solved the Dining Philosophers Problem about concurrency. His solution is often referred to as shared state concurrency, which is the programming model Java uses.

The JVM has a single heap, and state is shared between concurrent parties via locks that guarantee mutual exclusion over a specific memory area. This means that the burden of guaranteeing correct access to shared variables lies with the programmer, who must guard the critical sections with locks. Shared state between threads brings several problems:
a. Deadlocks (for example, two threads both need resources A and B. One locks A and the other locks B. Neither thread can make progress.)
b. Race conditions (these occur when the programmer has not correctly guaranteed the order of execution between two threads. The correctness of the program then depends on the OS scheduler, which is not deterministic, resulting in errors that are hard to reproduce.)
c. Spending too much time in the critical section (one thread holds shared resources for too long, making the other interested parties wait and do nothing during that time)
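A minimal sketch of the shared state model: two threads increment one counter, and a lock guards the critical section so that no update is lost. The class and method names are my own. Without the synchronized block, count++ would be a race condition, since it is a read, an add and a write that two threads can interleave.

```java
class SharedCounter {
    private final Object lock = new Object();
    private long count;

    void increment() {
        synchronized (lock) { // critical section: one thread at a time
            count++;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts two threads that each increment the same counter perThread times.
    static long runTwoThreads(int perThread) {
        SharedCounter counter = new SharedCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000));
    }
}
```

Keeping the critical section this small also speaks to point c: the lock is held only for the increment itself.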

3.2 Erlang--Message passing

In the actor model, actors use message passing to communicate. There is no shared memory among actors, so there is nothing to lock. In Erlang, such an actor is called a process, which is not an OS process. OS processes share no memory, and Erlang execution units share no memory either. That could be the reason why the Erlang execution unit is named a process. Because of the heap-per-process architecture in ERTS, message passing is done by copying the message from one process heap to another.
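Message passing by copying can be imitated in Java with a queue used as a mailbox. This is only a sketch under my own assumptions: the Mailbox class is hypothetical, a Java thread stands in for an Erlang process, and the copy is a plain defensive copy of a list. It shows the key property, though: once a message is sent, the sender changing its own data cannot affect what the receiver sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Mailbox {
    private final BlockingQueue<List<Integer>> queue = new LinkedBlockingQueue<>();

    // Copies the message before enqueueing, so sender and receiver share nothing.
    void send(List<Integer> message) {
        queue.add(new ArrayList<>(message));
    }

    // Blocks until a message arrives.
    List<Integer> receive() throws InterruptedException {
        return queue.take();
    }

    // One receiver thread sums the numbers in the first message it gets.
    static int demo() {
        Mailbox mailbox = new Mailbox();
        int[] sum = new int[1];
        Thread receiver = new Thread(() -> {
            try {
                for (int n : mailbox.receive()) {
                    sum[0] += n;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        List<Integer> message = new ArrayList<>(List.of(1, 2, 3));
        mailbox.send(message);
        message.clear(); // the sender mutates its own copy; the receiver is unaffected
        try {
            receiver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println("receiver summed: " + demo());
    }
}
```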

Figure 3 Erlang process, scheduler and CPU cores

The JVM maps its threads one-to-one to OS-level threads, which are then scheduled by the OS scheduler. In Erlang, a process is simply a separate memory area, and processes are mapped n-to-m onto OS-level threads. Since 2006 ERTS has supported true multithreading through Symmetric Multi Processing (SMP). With SMP, ERTS starts with the same number of schedulers as there are cores. As Figure 3 shows, each core gets one Erlang scheduler, which is an OS thread, and each scheduler can schedule a large number of Erlang processes.
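The closest everyday Java analogue to this n-to-m mapping is a fixed thread pool sized to the number of cores, with many small tasks fed into it. The sketch below is an analogy under my own naming, not how ERTS is built: pool tasks run to completion and are never preempted part-way, whereas Erlang schedulers can switch processes in the middle of their work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ManyTasksFewThreads {
    // Runs taskCount tiny tasks on one worker thread per core and
    // returns how many of them finished.
    static int runTasks(int taskCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService schedulers = Executors.newFixedThreadPool(cores);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            schedulers.execute(finished::incrementAndGet);
        }
        schedulers.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!schedulers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks finished: " + runTasks(10_000));
    }
}
```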

Each scheduler has a run queue that contains runnable processes. A process runs until it tries to receive a message and finds its mailbox empty, or until it runs out of reductions. The meaning of a reduction in ERTS is not clearly defined, but reductions are meant to represent “units of work” and are roughly equivalent to function calls. Each process that starts running is given 2000 reductions, and when it has used them up it is put at the end of the run queue.
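The run queue and the reduction budget can be simulated in a few lines. This is a toy model with made-up names (ReductionScheduler, Proc, remainingWork), where one unit of work costs exactly one reduction and nobody waits on a mailbox. It only illustrates the "run until the budget is spent, then go to the back of the queue" rule, using the 2000 figure quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ReductionScheduler {
    static final int REDUCTION_BUDGET = 2000; // per time slice, figure from the text

    // A simulated process with a fixed amount of work left to do.
    static final class Proc {
        final String name;
        int remainingWork; // assumption: one unit of work = one reduction
        Proc(String name, int remainingWork) {
            this.name = name;
            this.remainingWork = remainingWork;
        }
    }

    // Runs all processes to completion and returns the order of time slices.
    static List<String> run(List<Proc> procs) {
        Deque<Proc> runQueue = new ArrayDeque<>(procs);
        List<String> trace = new ArrayList<>();
        while (!runQueue.isEmpty()) {
            Proc current = runQueue.pollFirst();
            trace.add(current.name);
            int used = Math.min(REDUCTION_BUDGET, current.remainingWork);
            current.remainingWork -= used;
            if (current.remainingWork > 0) {
                runQueue.addLast(current); // out of reductions: back of the queue
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        List<Proc> procs = List.of(new Proc("A", 5000), new Proc("B", 1000));
        System.out.println(run(procs)); // prints [A, B, A, A]
    }
}
```

Process A needs three slices, but B gets its turn after A's first slice, so a long-running process cannot starve a short one.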

Schedulers balance work among themselves. Blocking an Erlang process does not block a scheduler. Erlang processes are not OS-level threads; they are much more lightweight, which is the main reason ERTS can run hundreds of thousands of processes. By default, an Erlang process uses around 2.5KB on a 64-bit machine, whereas a Java thread starts off with a 1024KB stack on a 64-bit machine.
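To put the two defaults side by side, here is a back-of-the-envelope calculation. The count of 100,000 units is my own illustrative pick; the per-unit sizes are the ones quoted above. Treat the Java figure as a rough upper bound, since an OS usually reserves a thread's stack as address space and only backs it with real memory as it is used.

```java
class FootprintEstimate {
    // Total kilobytes needed for count units of perUnitKb each.
    static double totalKb(long count, double perUnitKb) {
        return count * perUnitKb;
    }

    static double toMb(double kb) {
        return kb / 1024.0;
    }

    public static void main(String[] args) {
        long units = 100_000; // illustrative count, not a measured workload
        System.out.println("Erlang processes: " + toMb(totalKb(units, 2.5)) + " MB");
        System.out.println("Java threads:     " + toMb(totalKb(units, 1024)) + " MB");
    }
}
```

That works out to 100,000 x 2.5KB = 250,000KB, or about 244MB, for the Erlang processes, against 100,000 x 1024KB = 100,000MB, or about 97.7GB, of stack space for the Java threads.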

Java has its own takes on lightweight concurrency, such as Quasar (fibers) and Akka (actors). However, without fundamental changes in how the JVM works, one cannot guarantee that an arbitrary piece of code will not block.

4. Runtime optimization

Neither Erlang nor Java source code is compiled directly to executable binaries or native code. Instead, both rely on the runtime to execute the statements in an intermediate language (bytecode for Java and BEAM code for Erlang).

Java: JIT compiler

Purely interpreting commands sequentially is slower, simply because the commands have to be translated into the machine's native instructions over and over again, not to mention the various optimizations that good compilers perform. The JIT compiler in HotSpot works by keeping track of which methods are “hot” (called often) and then optimizing and compiling them to native code. This has the benefit that no effort is spent compiling or optimizing methods that never execute or execute only rarely.
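The "keep track of what is hot" idea can be shown with a toy invocation counter. To be clear about the assumptions: HotMethodTracker, its threshold of 3 and the string method names are all invented for this sketch, and nothing here reflects how HotSpot actually counts calls or which thresholds it uses. It only shows the principle that a method is interpreted until it has been called often enough to be worth compiling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HotMethodTracker {
    private final int threshold; // hypothetical value, chosen by the caller
    private final Map<String, Integer> callCounts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    HotMethodTracker(int threshold) {
        this.threshold = threshold;
    }

    // Records one call and returns true if the method now counts as compiled.
    boolean invoke(String method) {
        int calls = callCounts.merge(method, 1, Integer::sum);
        if (calls >= threshold) {
            compiled.add(method); // stand-in for "compile to native code"
        }
        return compiled.contains(method);
    }

    public static void main(String[] args) {
        HotMethodTracker tracker = new HotMethodTracker(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " compiled? " + tracker.invoke("render"));
        }
        System.out.println("rare method compiled? " + tracker.invoke("init"));
    }
}
```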

Erlang: HiPE and BEAMJIT
ERTS comes with an ahead-of-time (AOT) compiler called HiPE (High Performance Erlang). The user has to choose which functions or modules are compiled to native code; HiPE does not do this automatically at runtime for the most-used functions.

Compiling ahead of time rules out many optimizations that the HotSpot JIT compiler can perform. Essentially, the HotSpot JIT compiler gambles on assumptions about the running program and takes shortcuts to get better performance. For example, it assumes that a method never throws an exception, or that most methods are not overridden, and so it links a call site directly to a specific method instead of traversing the class hierarchy each time to find the most specific one.

A just-in-time compiler for ERTS is called BEAMJIT.

Summary

The JVM provides tools to retrofit any concurrency model, but retrofitting something will never be the same as building it into the initial design.

“Erlang accidentally has the right properties to exploit multi-core architectures – not by design but by accident”--Joe Armstrong
