JVM performance tuning
Objective: Java applications can run into performance issues if garbage collection is not tuned properly. You may have seen your JVM stop responding, spend a lot of time in garbage collection, or log out-of-memory errors. If you are facing these problems, you need to know how to analyse heap dumps, thread dumps and much more.
Basics: Let's understand some basics about the JVM and simple techniques for tuning its performance.
Firstly, it's important to understand the structure of Java heap memory. With the generational collectors discussed here (ParNew with CMS), the heap is divided into a young generation, made up of Eden and two survivor spaces, and an old (tenured) generation. The permanent generation was never part of the heap proper and was removed in JDK 8; class metadata now lives in Metaspace, outside the heap.
You can analyse the Java heap with tools like JConsole or VisualVM. Before your analysis you need to understand the memory regions and their purpose :-
1) Heap Memory Pool "Par Eden Space"
2) Heap Memory Pool "Par Survivor Space"
3) Heap Memory Pool "CMS Old Gen"
4) Non Heap Memory Pool "Metaspace"
5) Non Heap Memory Pool "Code Cache"
6) Non Heap Memory Pool "Compressed Class Space"
Let's look at each heap/non-heap memory division one by one.
1) Par Eden Space : Most new Java objects are created in Par Eden Space. Once this area is exhausted, the JVM initiates a stop-the-world minor garbage collection (GC) of the young generation. At this point, based on the GC algorithm, the JVM discards all unreachable objects and recovers their memory. Objects that are still reachable are moved to the Survivor space.
2) Par Survivor Space : The young generation has two survivor spaces, and only one of them holds objects at a time. Reachable objects from Eden are copied here, and on each later minor GC the survivors are copied across to the other survivor space. This happens within the same stop-the-world minor GC that collects Eden, not as a separate collection. Objects that survive enough minor GCs (see the JVM argument MaxTenuringThreshold), or that no longer fit in the survivor space, are moved to CMS Old Gen.
3) CMS Old Gen : Any objects that outlive the survivor space are moved to the old gen. This is the heap area meant for long-lived objects. Ideally, collections in this region should be rare, because they are the most expensive. So size this region on the basis of the memory occupied by your long-lived objects, and size the Par Eden and Survivor spaces so that temporary objects do not end up here.
4) Metaspace : This memory region was introduced in JDK 8 HotSpot as the replacement for the permanent generation. It holds your live class metadata and classloaders. The JVM argument MaxMetaspaceSize limits its size. When its usage reaches the current threshold, a GC runs to unload unused classes, and if the specified maximum is still exceeded the JVM throws an OutOfMemoryError. By default there is no maximum and this region can grow up to the native memory available from the operating system.
5) Code Cache : Native code produced by the JIT compiler for compiled methods lives here. The size of this memory space can be set with the JVM argument ReservedCodeCacheSize. In JDK 8 the default is 240M with tiered compilation enabled (the default) and 48M when it is disabled.
6) Compressed Class Space : The JVM reports this pool as "Compressed Class Space". It is used when UseCompressedClassPointers is enabled via JVM arguments, which in JDK 8 also requires UseCompressedOops. It is the part of Metaspace that holds the class metadata referenced through compressed (32-bit) class pointers, and its size is limited by the JVM argument CompressedClassSpaceSize.
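The pools described above can also be listed from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API; the class name MemoryPoolReport and the output format are made up for illustration. The pool names you see depend on the collector in use, so "Par Eden Space" and "CMS Old Gen" only appear when ParNew and CMS are selected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class MemoryPoolReport {

    /** Builds one line per memory pool: name, type (HEAP or NON_HEAP), used and max size. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined (e.g. Metaspace by default).
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + " KB";
            lines.add(String.format("%-28s %-9s used=%d KB max=%s",
                    pool.getName(), pool.getType().name(), usage.getUsed() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it with different collector arguments is a quick way to see how the pool names change between GC algorithms.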
Now that we have a basic idea of the memory spaces in the JVM, we can look at the performance tuning basics one by one. They are as follows:-
1) Young generation size and its ratio to Old Gen : In an ideal scenario, a young gen GC recovers the memory of almost all objects and only a few are moved to the survivor space. By the time objects leave the survivor space, all temporary objects should have been collected, so that only long-lived objects are moved to the old gen. If temporary objects are getting moved to the old gen, your GC is probably not tuned properly, and you would want to increase the size of the young gen memory space. This is because GC in the Young and Survivor spaces is happening too frequently, so temporary objects are promoted before they die. A typical young gen to old gen ratio is 1:4. To tune it, you need to simulate an appropriate load for your application. On the other hand, the young gen should not be very large, as its GC pauses will then be longer.
3) Tune the processes which take up the most memory : Identifying the process which causes memory spikes is the most critical step. You may need to run processes in isolation to do so. Use the jmap command or a tool like VisualVM to create a heap dump in hprof format. Analyse it in Eclipse MAT to get the deep (retained) sizes of objects. This shows which objects are holding on to the most memory, which is not possible from shallow statistics on heap usage.
2) Choose GC Algorithm : Use G1 (JVM argument UseG1GC) when you are on JDK 8 and have a heap larger than 4 GB. Also enable string de-duplication (JVM argument UseStringDeduplication) along with G1 so that more memory is reclaimed. Strings and character arrays take up the most memory on most JVM heaps. G1 detects duplicate strings and makes them point to the same internal character array. The G1 (Garbage First) collector divides the heap into smaller regions and identifies the regions with the most garbage. It collects those regions first, thereby reducing the overall GC time. G1 also compacts objects as it collects, and has other benefits. It is the preferred GC in most scenarios. Concurrent Mark Sweep and ParNew are other widely used GC algorithms.
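To judge whether the young generation is sized well, or to compare GC algorithms under the same load, it helps to look at how often each collector runs and how long it takes. The sketch below reads those counters through the standard GarbageCollectorMXBean API. The class name GcActivity and the allocation loop are only illustrative; the collector names depend on the chosen algorithm (for example "ParNew" and "ConcurrentMarkSweep", or "G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of any library.
final class GcActivity {

    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are -1 if the collector does not report them.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so that the young collector has something to do.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] temp = new byte[1024];
            temp[0] = (byte) i;
            checksum += temp[0];
        }
        System.out.println("checksum=" + checksum);
        snapshot().forEach((name, values) ->
                System.out.printf("%-24s collections=%d timeMs=%d%n", name, values[0], values[1]));
    }
}
```

If the old generation collector's count keeps climbing under a steady load, that is a hint that temporary objects are being promoted and the young generation may be too small.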
Also, Oracle's flight recorder is a very useful profiler.
More to follow soon...........
Questions :-
1) How can I monitor the memory performance of my JVM ?
Just tap into the HeapMemoryUsage attribute of the MBean java.lang:type=Memory. You can monitor all performance metrics for the JVM by tapping into its MBeans. Also, consider publishing your own MBeans to make sure that the monitoring is robust.
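As a minimal sketch of what tapping into that MBean looks like, the code below reads the HeapMemoryUsage attribute from the platform MBean server of the current JVM. The class name HeapUsageProbe is made up for illustration; a real monitoring setup would more likely read the same attribute remotely over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper class, not part of any library.
final class HeapUsageProbe {

    /** Reads the HeapMemoryUsage attribute of the java.lang:type=Memory MBean. */
    static MemoryUsage readHeapUsage() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // MEMORY_MXBEAN_NAME is the constant for "java.lang:type=Memory".
            ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
            CompositeData data = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return MemoryUsage.from(data);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read heap usage", e);
        }
    }

    public static void main(String[] args) {
        MemoryUsage heap = readHeapUsage();
        System.out.printf("used=%d MB committed=%d MB max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}
```

The same attribute name works from JConsole or any JMX client, since the MBean is registered under the same object name.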
2) Give me one major cause of high GC overhead in Java applications ?
Object serialisation and deserialisation is one of the major causes of GC overhead. Default Java serialization and deserialization carry a performance overhead. A custom serializer such as Kryo is recommended.
3) What is the size of a heap dump file?
As a rough rule of thumb, a heap dump file is around 1.5x the size of your JVM heap as occupied after a full garbage collection.
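The ratio above is a rule of thumb, so it is worth measuring on your own application. The sketch below assumes a HotSpot JVM, where the com.sun.management.HotSpotDiagnosticMXBean is available. It writes a heap dump of live objects and reports the file size. The class name HeapDumpSize and the file name are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; relies on a HotSpot-specific MXBean.
final class HeapDumpSize {

    /** Writes a heap dump of live objects to the given .hprof path and returns its size in bytes. */
    static long dumpAndMeasure(Path target) {
        try {
            Files.deleteIfExists(target); // dumpHeap fails if the file already exists
            HotSpotDiagnosticMXBean diagnostics =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // true = dump only live (reachable) objects, which triggers a full GC first.
            diagnostics.dumpHeap(target.toString(), true);
            return Files.size(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "sample-heap.hprof");
        long dumpBytes = dumpAndMeasure(target);
        long usedBytes = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        System.out.printf("dump=%d KB, heap used after dump=%d KB%n",
                dumpBytes / 1024, usedBytes / 1024);
    }
}
```

The resulting file can be opened in Eclipse MAT in the same way as a dump taken with jmap.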
4) Which tool should be used to analyze heap dumps?
Eclipse MAT is a very good option to analyse JVM heap dumps and generate deep heap profiles. Oracle's flight recorder is another option for profiling memory, although it records runtime events rather than analysing heap dumps.
5) Does ObjectMapper in Jackson create a lot of GC overhead ?
Yes, it can. ObjectMapper's convertValue method should be used as little as possible. Do not convert your stream of bytes to a String and then to your POJO. The stream of bytes should be deserialized directly into your POJO.
6) How do I benchmark my code ?
For an important piece of code, we should benchmark in the unit testing phase itself. I have noted down the best techniques for benchmarking through unit tests in the link below.
benchmark-your-code.html
Other Links:
sparkInsides.html
GC Tuning
