
Tech note: Java Virtual Machine (JVM) vs Erlang Runtime System (ERTS)

Java was my first language. In the third year of my undergraduate studies I switched to C++ because of coursework in image processing and computer graphics. After graduation I never used Java again, as it is not used much in the game industry.

Elixir is my most recent language. It builds on Erlang and runs on the Erlang virtual machine. I came to Elixir along the path Unity -> Ulink -> Riak (Ulink's default database option, which is written in Erlang) -> Erlang -> Elixir. Along the same path came Phoenix (one of the more recent web frameworks).

Now I have many programming-language friends: Java, C++, C#, Python, Elixir and some niche languages. To know a programming language, I only have to know the machine and how the machine speaks the language. For Java and Elixir in particular, the core is how the underlying virtual machine works.

Working with Erlang and its virtual machine forced me to think about the relationship among OS processes, OS threads and CPU cores. To keep it simple:
1. An OS process is a kind of resource center for execution units, which are called threads.
2. An OS process must contain at least one thread. At any given moment a thread runs on at most one CPU core. (Applications normally contain several threads. For example, Microsoft Office Word uses at least two threads, one for saving data and one for UI display, so that Word can auto-save the document while the user is editing.)
3. Processes share nothing among themselves. OS processes share no memory by default, and Erlang processes share no memory either.
4. An OS process running the Erlang VM spawns one OS thread per CPU core. Each such OS thread is called a scheduler and can manage a large number of Erlang processes.

Disclaimer: what follows is a self-study note written after reading "Comparison of Erlang Runtime System and Java Virtual Machine". Almost all content is summarized or copied from that article. Refer to the article for completeness.

    Java Virtual Machine (JVM) vs Erlang Runtime System (ERTS)


The comparison covers these four areas:
  • overall architecture
  • memory layout
  • parallelism/concurrency
  • runtime optimization

1. Overall architecture


Java Virtual Machine (JVM)

The JVM works by executing bytecode from class files, which are generated by compiling Java source code. This indirection is what lets programmers compile their source code once and then run it on any platform that has a JVM implementation. Oracle's HotSpot VM is the most widely used implementation and is the one chosen for comparison in the article.

Erlang Runtime System (ERTS)

Erlang is a functional programming language based on the actor model, which dates back to 1973. The actor model specifies the actor as the unit of concurrency. In Erlang, this unit is called an Erlang process. Actors can communicate with each other only by sending messages and are otherwise independent, which allows them to run in parallel.

Like the JVM, ERTS works by executing an intermediate representation of Erlang source code, known as BEAM code.


2. Memory layout


JVM(HotSpot VM)
 Figure 1. HotSpot VM memory layout

Heap: where all objects and arrays live.
The heap on the HotSpot VM consists of three distinct areas called Eden, Survivor and Tenured space. Eden and Survivor together form the young generation; Tenured is the old generation. (This optimization stems from the observation that most objects die young, so it makes sense to put them into a separate area and garbage collect that smaller area more often than the whole heap.)
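
To make the "most objects die young" idea concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions: the function name, the capacity and the 5% survival rate are made up for illustration, the Survivor spaces are left out, and it does not reflect how HotSpot actually implements collection.

```python
import random

def simulate_generational_heap(num_objects=10_000, young_capacity=100,
                               survival_rate=0.05, seed=0):
    """Toy model: allocate into a young area, collect it when full,
    and promote survivors to an old area. Not HotSpot's real algorithm."""
    rng = random.Random(seed)
    young, old = [], []
    minor_collections = 0
    for obj_id in range(num_objects):
        if len(young) == young_capacity:
            # Minor collection: only the small young area is scanned.
            survivors = [o for o in young if rng.random() < survival_rate]
            old.extend(survivors)   # promote objects that are still alive
            young = []              # everything else is reclaimed at once
            minor_collections += 1
        young.append(obj_id)
    return minor_collections, len(old)

minor, promoted = simulate_generational_heap()
print(minor, promoted)
```

Many cheap collections of the small area happen, while only a small fraction of objects ever reaches the old area, which therefore needs collecting far less often.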

Non-heap area: holds metadata about the loaded classes and the Code Cache, which the JIT compiler uses to compile methods and to store the resulting native code.

Thread: every thread in the JVM has
  • a program counter, which holds the address of the current instruction (if the method is not native),
  • a stack, which holds a frame for each executing method,
  • a native stack, which is used for native methods and created per thread. A JVM that cannot load native methods does not need native method stacks. As with ordinary JVM stacks, the size can be fixed or dynamic, and the JVM throws StackOverflowError or OutOfMemoryError accordingly.
Every thread in the JVM is mapped one-to-one to an OS-level thread, which is then scheduled and managed by the OS.

The JVM ships with several garbage collection algorithms that cater to different application needs. One example is generational garbage collection, in which the heap is divided into a young and an old generation.

In general, collecting garbage from the shared heap requires a stop-the-world pause, during which all application threads are halted.


ERTS

 Figure 2. Erlang Runtime System memory layout


Each Erlang process has its own memory area, which contains
  • Heap and stack: Unlike a Java thread, each Erlang process has its own heap. The heap shares one memory block with the stack, and the two grow towards each other. This setup makes it very cheap to check whether a process is running out of heap or stack space.
  • Process control block: holds various metadata about the process.
  • Message queue: the input queue for messages the process receives.
Binary data larger than 64 bytes is kept in a separate area (Binaries), so that different processes can simply refer to it by a pointer.
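
The heap-and-stack layout above can be sketched as a single block with two cursors. This is a toy model in Python, not ERTS code; the class and method names are invented for illustration. The point is that "is there room left?" reduces to one subtraction.

```python
class ProcessMemory:
    """Toy model of one process-private block: heap grows up, stack grows down."""

    def __init__(self, size):
        self.cells = [None] * size
        self.heap_top = 0          # next free heap slot (grows upward)
        self.stack_top = size      # lowest used stack slot (grows downward)

    def free_words(self):
        # One subtraction answers "is there room?" for both heap and stack.
        return self.stack_top - self.heap_top

    def heap_alloc(self, value):
        if self.free_words() < 1:
            raise MemoryError("heap met stack: a real runtime would collect or grow the block")
        self.cells[self.heap_top] = value
        self.heap_top += 1
        return self.heap_top - 1

    def stack_push(self, value):
        if self.free_words() < 1:
            raise MemoryError("stack met heap: a real runtime would collect or grow the block")
        self.stack_top -= 1
        self.cells[self.stack_top] = value

mem = ProcessMemory(4)
mem.heap_alloc("a")
mem.stack_push("frame1")
mem.heap_alloc("b")
mem.stack_push("frame2")
print(mem.free_words())
```

Once the two cursors meet, the next allocation or push fails in this sketch; a real runtime would instead trigger a collection or grow the block.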

Garbage Collection(GC) in Erlang

Binary area: reference counting.
Per-process GC: a copying collector allocates another memory area and then copies over each element that is still referenced, starting with the root set. When a process dies, all its memory can be deallocated right away. There are no global stop-the-world pauses.
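
A copying collector of the kind described above can be sketched in a few lines. This is a simplified illustration in Python, assuming a toy heap represented as a dictionary from addresses to lists of referenced addresses; the function name and the representation are assumptions, and the real ERTS collector is considerably more involved.

```python
def copy_collect(heap, roots):
    """Copy every object reachable from `roots` into a fresh heap.

    `heap` maps an address to the list of addresses it references (toy model).
    Returns (new_heap, new_roots); anything not copied is garbage.
    """
    new_heap = {}
    forwarding = {}                 # old address -> new address

    def copy(addr):
        if addr not in forwarding:
            new_addr = len(forwarding)
            forwarding[addr] = new_addr     # record first, so shared objects are copied once
            new_heap[new_addr] = None       # reserve the slot before following references
            new_heap[new_addr] = [copy(ref) for ref in heap[addr]]
        return forwarding[addr]

    new_roots = [copy(r) for r in roots]
    return new_heap, new_roots

old_heap = {10: [20], 20: [], 30: [10], 40: []}
new_heap, new_roots = copy_collect(old_heap, roots=[10])
print(new_heap, new_roots)
```

Objects 30 and 40 are never visited, so they cost nothing to reclaim: the old area is simply dropped as a whole. The same reasoning explains why a dead process can release all its memory at once.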



3. Parallelism/concurrency



Java: a more general-purpose programming language
Erlang: a tool for building massively scalable soft real-time systems with high-availability requirements

3.1 Java--Shared State

In 1971 Edsger Dijkstra posed and solved the Dining Philosophers Problem about concurrency. His solution is often referred to as shared-state concurrency, which is the programming model Java uses.

The JVM has a single heap, and concurrent parties share state via locks that guarantee mutual exclusion on a specific memory area. This means that the burden of guaranteeing correct access to shared variables lies with the programmer, who must guard the critical sections with locks. Shared state between threads brings trouble:
a. Deadlocks (for example, two threads both need resources A and B. One locks A and the other locks B. Neither thread can make progress.)
b. Race conditions (these occur when the programmer has not correctly guaranteed the order of execution between two threads. The correctness of the program then depends on the OS scheduler, which is not deterministic, resulting in hard-to-reproduce errors.)
c. Spending too much time in the critical section (one thread holds shared resources for too long, making other interested parties wait and do nothing during that time.)
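
As a small illustration of problems (a) and (c), here is a hedged sketch in Python rather than Java. The Account class and transfer function are hypothetical. It shows one common way to avoid the A/B deadlock, namely always acquiring locks in a fixed global order, while keeping the critical section short.

```python
import threading

class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always lock the lower id first, so two opposite transfers cannot
    # each hold one lock while waiting for the other (the deadlock in (a)).
    first, second = sorted((src, dst), key=lambda acc: acc.account_id)
    with first.lock, second.lock:
        # Critical section: kept as short as possible (problem (c)).
        src.balance -= amount
        dst.balance += amount

def run_transfers(rounds=1000):
    a, b = Account(1, 1000), Account(2, 1000)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance

print(run_transfers())
```

If each transfer instead locked its source account first, the two threads could each grab one lock and wait forever for the other. The ordering rule is a convention the programmer has to remember, which is exactly the burden described above.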


3.2 Erlang--Message passing


In the actor model, actors use message passing to communicate. There is no shared memory among actors to lock. In Erlang, such an actor is called a process, which is not an OS process. OS processes share no memory, and Erlang execution units share no memory either; that could be the reason why the Erlang execution unit is named a process. Because of the heap-per-process architecture in ERTS, message passing is done by copying the message from one process heap to another.
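
The copy-on-send behaviour can be imitated in Python with a mailbox queue and a deep copy. This is only a sketch: the Actor class is hypothetical, and it uses one OS thread per actor, which is precisely what Erlang avoids. It does show why the sender and receiver cannot step on each other's data.

```python
import copy
import queue
import threading

class Actor:
    """Toy actor: private state plus a mailbox; nothing else is shared."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.received = []          # private state, only touched by the actor's thread
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def send(self, message):
        # Copy on send, mimicking the heap-to-heap copy described above.
        self.mailbox.put(copy.deepcopy(message))

    def _run(self):
        while True:
            message = self.mailbox.get()    # blocks while the mailbox is empty
            if message == "stop":
                break
            self.received.append(message)

    def stop(self):
        self.send("stop")
        self.thread.join()

actor = Actor()
payload = {"scores": [1, 2, 3]}
actor.send(payload)
payload["scores"].append(99)   # the sender mutates its own copy afterwards
actor.stop()
print(actor.received)          # the actor still sees the original contents
```

Because the message was copied, no lock is needed around the payload: the later mutation by the sender is invisible to the receiver.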
Figure 3 Erlang process, scheduler and CPU cores

The JVM maps its threads one-to-one to OS-level threads, which are then scheduled by the OS scheduler. In Erlang, a process is simply a separate memory area, and processes are mapped n-to-m to OS-level threads. Since 2006 ERTS has supported true multithreading through Symmetric Multi Processing (SMP). With SMP, ERTS starts with as many schedulers as there are cores. As Figure 3 shows, each core has an Erlang scheduler, which is an OS thread, and each scheduler can schedule a large number of Erlang processes.

Each scheduler has a run queue that contains runnable processes. A process runs until it tries to receive a message while its mailbox is empty, or until it runs out of reductions. The meaning of reductions in ERTS is not clearly defined, but they represent “units of work” and are roughly equivalent to function calls. Each process that starts running is given a budget of 2000 reductions (the figure quoted in the article; the exact number depends on the ERTS release), and when the budget runs out the process is put at the end of the run queue.
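
The run-queue behaviour can be sketched with Python generators, where each `yield` stands for one reduction. This is a toy model with invented names and a tiny budget of 2 instead of 2000; it leaves out waiting on an empty mailbox and work balancing between schedulers.

```python
from collections import deque

def worker(name, units_of_work, log):
    """Toy process: each `yield` stands for one reduction."""
    for step in range(units_of_work):
        log.append((name, step))
        yield

def run_scheduler(processes, reductions_per_slice):
    """Round-robin over a run queue, preempting after a fixed reduction budget."""
    run_queue = deque(processes)
    slices = []                       # order in which processes got the scheduler
    while run_queue:
        name, proc = run_queue.popleft()
        slices.append(name)
        for _ in range(reductions_per_slice):
            try:
                next(proc)
            except StopIteration:
                break                 # process finished; drop it
        else:
            run_queue.append((name, proc))   # budget used up: back of the queue
    return slices

log = []
procs = [("A", worker("A", 5, log)), ("B", worker("B", 2, log))]
print(run_scheduler(procs, reductions_per_slice=2))
```

The long-running process A cannot starve B: after its budget is spent it goes to the back of the queue, whether or not it has finished.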

Schedulers balance work among themselves. Blocking an Erlang process doesn't block a scheduler. Erlang processes are not OS-level threads; they are far more lightweight, which is the main reason ERTS can run hundreds of thousands of processes. By default, an Erlang process uses around 2.5KB on a 64-bit machine, whereas a Java thread starts off with a 1024KB stack on a 64-bit machine.
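
A quick back-of-the-envelope calculation shows what these two figures mean at scale. The sketch below only multiplies the numbers quoted above; the count of 100,000 is an arbitrary example, and a Java thread stack is typically reserved address space rather than memory that is fully used up front, so the Java figure is an upper bound on the stacks alone.

```python
KB = 1024
erlang_process_bytes = 2.5 * KB      # per-process figure quoted above
java_thread_stack_bytes = 1024 * KB  # default stack figure quoted above

def footprint_mb(count, bytes_each):
    """Total size in MB for `count` units of `bytes_each` bytes."""
    return count * bytes_each / (1024 * KB)

count = 100_000
print(footprint_mb(count, erlang_process_bytes))     # Erlang processes, in MB
print(footprint_mb(count, java_thread_stack_bytes))  # Java thread stacks, in MB
```

Under these assumptions, 100,000 Erlang processes need roughly 244 MB, while 100,000 Java thread stacks would reserve about 100,000 MB.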

Java has its own versions of lightweight threads and actors, for example Quasar and Akka. However, without fundamental changes in how the JVM works, one cannot guarantee that an arbitrary piece of code will not block.

4. Runtime optimization

Neither Erlang nor Java source code is compiled directly to executable binaries or native code. Instead, both rely on the runtime to execute statements in an intermediate language (bytecode for Java and BEAM code for Erlang).

Java: JIT compiler

Purely interpreting commands sequentially is slower, simply because commands have to be translated into the machine's native instructions over and over again, not to mention the various optimizations that good compilers perform. The JIT compiler in HotSpot works by keeping track of which methods are “hot” (called often) and then optimizing and compiling them to native code. This has the benefit that no effort is spent compiling or optimizing methods that never execute or execute rarely.
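
The "count calls, then compile the hot ones" idea can be mimicked in Python. This is a loose analogy only: HotMethodTracker, the threshold of 3 and the stand-in "compiler" are all invented for illustration, and HotSpot's real heuristics are far more elaborate.

```python
class HotMethodTracker:
    """Toy model of counting invocations and swapping in a faster version."""

    def __init__(self, interpreted, compile_fn, threshold):
        self.impl = interpreted
        self.compile_fn = compile_fn     # hypothetical stand-in for the JIT compiler
        self.threshold = threshold
        self.calls = 0
        self.compiled = False

    def __call__(self, *args):
        self.calls += 1
        if not self.compiled and self.calls >= self.threshold:
            self.impl = self.compile_fn()   # pay the compile cost only for hot methods
            self.compiled = True
        return self.impl(*args)

def slow_square(x):
    # "Interpreted" version: repeated addition (assumes a non-negative integer).
    total = 0
    for _ in range(x):
        total += x
    return total

square = HotMethodTracker(slow_square, lambda: (lambda x: x * x), threshold=3)
results = [square(4) for _ in range(5)]
print(results, square.compiled)
```

A method that is called only once or twice never crosses the threshold, so no compilation effort is wasted on it.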

Erlang: HiPE and BEAMJIT
ERTS comes with an ahead-of-time (AOT) compiler called HiPE (High Performance Erlang). The user has to choose which functions or modules are compiled into native code; HiPE does not do this automatically at runtime for the most-used functions.

Compiling ahead of time rules out many of the optimizations that the HotSpot JIT compiler performs. Essentially, the HotSpot JIT compiler gambles its way to better performance by taking shortcuts. For example, it assumes that a method never throws an exception, or that most methods are not overridden, and links a call site directly to a specific method instead of traversing the class hierarchy each time to find the most specific one. If such an assumption later turns out to be wrong, the JIT has to fall back to slower, more general code.
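
The gamble can be illustrated with a toy "inline cache" in Python: the call site bets that it will keep seeing the same receiver type, checks that bet with a cheap guard, and falls back to a full lookup when the bet fails. The CallSite class and the shape classes are hypothetical, and this is an analogy for the idea rather than a description of HotSpot internals.

```python
class CallSite:
    """Toy monomorphic inline cache: assume one receiver type, guard, fall back."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None
        self.slow_lookups = 0

    def invoke(self, receiver):
        if type(receiver) is self.cached_type:
            return self.cached_method(receiver)      # fast path: the bet paid off
        # Guard failed: do the full lookup and re-specialise.
        self.slow_lookups += 1
        self.cached_type = type(receiver)
        self.cached_method = getattr(type(receiver), self.method_name)
        return self.cached_method(receiver)

class Circle:
    def area(self):
        return 3

class Square:
    def area(self):
        return 4

site = CallSite("area")
values = [site.invoke(s) for s in [Circle(), Circle(), Circle(), Square()]]
print(values, site.slow_lookups)
```

An ahead-of-time compiler has no record of which types actually show up at a call site at runtime, so it cannot place this kind of bet.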

A just-in-time compiler for ERTS is called BEAMJIT. 

Summary

The JVM provides tools to retrofit any concurrency model, but retrofitting is never the same as building it into the initial design.

“Erlang accidentally has the right properties to exploit multi-core architectures – not by design but by accident”--Joe Armstrong
