Java - Garbage Collection

In software and information technology, garbage collection (GC) is a form of automatic memory management that helps avoid memory problems such as leaks and dangling references, at the cost of some additional resource consumption. At runtime, the system tries to identify memory areas that are no longer needed and releases them, which keeps the program's memory footprint down. Some garbage collectors also compact the memory areas that are still in use, moving live objects together to reduce fragmentation.

Garbage collection relieves the programmer of manual memory management, in which the programmer specifies which objects to deallocate and return to the memory system, and when to do so. Related techniques include stack allocation, region inference, memory ownership, and combinations of these. Garbage collection may take a significant proportion of a program's total processing time and can therefore have a significant influence on performance.

In Java, the developer does not explicitly free memory in the program code; instead, the garbage collector finds objects that are no longer needed (garbage) and removes them. The garbage collector was designed around the following two hypotheses:

  • Most objects soon become unreachable.
  • References from old objects to young objects only exist in small numbers.

Whenever you run a Java program, the JVM starts several threads. In a simplified view, three of them matter here:

  1. Main thread
  2. Thread scheduler
  3. Garbage collector thread

Of these three, the main thread is a user thread, and the other two are daemon threads that run in the background.

The main thread executes the main() method. The thread scheduler schedules the threads. The garbage collector thread sweeps abandoned objects out of heap memory.
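The user/daemon distinction can be checked from inside a program. The following is a minimal sketch; the class name DaemonCheck and its helper method are made up for illustration. Note that the collector's own worker threads are internal to the VM and typically do not appear in this listing, which shows only Java-level threads.

```java
// Minimal sketch: inspect the daemon flag of Java-level threads.
// DaemonCheck and isDaemonAfterMarking are hypothetical names.
class DaemonCheck {

    // Creates a thread, marks it as a daemon, and reports the flag back.
    static boolean isDaemonAfterMarking() {
        Thread worker = new Thread(() -> { });
        worker.setDaemon(true); // must be called before start()
        return worker.isDaemon();
    }

    public static void main(String[] args) {
        Thread current = Thread.currentThread();
        System.out.println(current.getName() + " daemon? " + current.isDaemon());

        // List every live Java thread known to this JVM with its daemon flag.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.println(t.getName() + " daemon? " + t.isDaemon());
        }
    }
}
```

A JVM exits once all user threads have finished, even if daemon threads are still running, which is why background services such as garbage collection do not keep a program alive.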

The two hypotheses stated earlier (most objects soon become unreachable, and few references point from old objects to young ones) are together called the weak generational hypothesis. To take advantage of it, the HotSpot VM physically divides the heap into a young generation and an old generation, as follows:

  • Young Generation: The area where short-lived objects reside; most newly created objects are allocated here. When objects disappear from this area, we say a "minor GC" has occurred. It is divided into two kinds of spaces:
    • Eden Space: When an object is created with the new keyword, its memory is allocated in this space.
    • Survivor Space: The pool containing objects that have survived a garbage collection of the Eden space.
  • Old Generation: This pool contains the tenured space and virtual (reserved) space, and holds objects that survived garbage collection in the young generation. When objects disappear from the old generation, we say a "major GC" (or a "full GC") has occurred.
    • Tenured Space: This pool contains objects that survived multiple garbage collections, that is, objects that outlived their time in the Survivor space.
  • Permanent Generation: As the name suggests, this pool contains permanent class metadata and descriptor information, so PermGen space is reserved for classes and the data tied to them, such as static members. Note that PermGen exists only up to Java 7; Java 8 removed it and stores class metadata in Metaspace, which lives in native memory.
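To see how a particular JVM actually lays out these pools, you can query the standard java.lang.management API. This is a minimal sketch; the class name MemoryPools is made up for illustration, and the pool names printed depend on the JVM version and the collector in use, so no specific output is assumed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: list the memory pools the running JVM exposes.
// MemoryPools is a hypothetical class name.
class MemoryPools {

    // Builds one descriptive line per memory pool.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            // getType() is HEAP or NON_HEAP
            lines.add(pool.getName() + " [" + pool.getType() + "] used=" + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a HotSpot JVM the heap pools usually include names that mention Eden, Survivor, and Old Gen (or Tenured), while Metaspace shows up as a non-heap pool on Java 8 and later.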

GC for the Young Generation

The young generation is divided into 3 spaces.

  • One Eden space
  • Two Survivor spaces

Objects move through these spaces in the following order:

  • The majority of newly created objects are allocated in the Eden space.
  • After the first GC in the Eden space, the surviving objects are moved to one of the Survivor spaces.
  • After each later GC in the Eden space, the surviving objects pile up in the Survivor space that already holds earlier survivors.
  • Once that Survivor space is full, its surviving objects are moved to the other Survivor space, and the full Survivor space is then emptied completely.
  • Objects that survive a number of repetitions of these steps are moved to the old generation.

As these steps show, one of the Survivor spaces must always be empty. If data exists in both Survivor spaces, or the usage is 0 for both, take that as a sign that something is wrong with your system.
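The movement of objects between Eden, the two Survivor spaces, and the old generation can be sketched with a toy model. This is not how HotSpot is implemented: the class YoungGenToy, the reachable flag, and the promotion threshold TENURE_AGE are all assumptions made for illustration, and the model simplifies the steps by copying survivors to the other Survivor space at every minor GC.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of young-generation collection. All names are hypothetical.
class YoungGenToy {

    static final class Obj {
        final String name;
        int age;                  // number of minor GCs survived
        boolean reachable = true; // stands in for real reachability analysis
        Obj(String name) { this.name = name; }
    }

    static final int TENURE_AGE = 2; // assumed promotion threshold

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // Survivor space holding earlier survivors
    List<Obj> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<Obj> old = new ArrayList<>();

    // New objects always start in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // Copies live objects out of a space, promoting old enough ones, then empties it.
    private void evacuate(List<Obj> space) {
        for (Obj o : space) {
            if (!o.reachable) continue; // garbage is simply not copied
            o.age++;
            if (o.age > TENURE_AGE) old.add(o); else to.add(o);
        }
        space.clear();
    }

    void minorGc() {
        evacuate(eden);
        evacuate(from);
        // Swap roles so that one Survivor space is empty again.
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGenToy heap = new YoungGenToy();
        heap.allocate("kept");
        heap.allocate("dropped").reachable = false;
        for (int gc = 1; gc <= 3; gc++) {
            heap.minorGc();
            System.out.println("after GC " + gc + ": eden=" + heap.eden.size()
                    + " from=" + heap.from.size() + " to=" + heap.to.size()
                    + " old=" + heap.old.size());
        }
    }
}
```

In this model the unreachable object vanishes at the first minor GC, while the reachable one stays in a Survivor space for two collections and is promoted at the third. After every collection, one Survivor space is empty.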

GC Performance Tuning

One of the most important contributors to optimal application performance is the Java garbage collection process. To collect garbage efficiently, the heap is divided into two main sub-areas:

  • Young generation (nursery space)
  • Old generation (tenured space)

Choosing the right garbage collector is a determining factor for application performance, scalability, and reliability. The available GC algorithms are:

  • Serial Collector
    • Has the smallest footprint of any collector, since it needs only a small number of data structures
    • Uses a single thread for both minor and major collections
  • Parallel Collector
    • Stops all application threads and performs garbage collection with multiple threads
    • Best suited for apps that run on multicore systems and need better throughput
  • Concurrent Mark-Sweep (CMS) Collector
    • Has less throughput, but shorter pauses, than the parallel collector
    • Suited to applications that favor short pauses over throughput; deprecated in JDK 9 and removed in JDK 14
  • Garbage-First (G1) Collector
    • Designed as the long-term replacement for the CMS collector; the default collector since JDK 9
    • Manages the entire heap by dividing it into multiple regions
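To check which collector a running JVM is actually using, you can list its garbage collector beans. This is a minimal sketch; the class name ActiveCollectors is made up for illustration, and the bean names vary by JVM version and collector. A collector is normally selected at startup with a flag such as -XX:+UseSerialGC, -XX:+UseParallelGC, or -XX:+UseG1GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: report the collectors active in this JVM.
// ActiveCollectors is a hypothetical class name.
class ActiveCollectors {

    // Returns the names of the garbage collector beans.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Most collectors expose one bean for young-generation collections and another for old-generation or full collections, so the counts also give a rough picture of how often each kind of GC runs.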

Garbage Collection Tuning

GC tuning is not required for every Java-based service or application. A system that already meets the following conditions usually does not need it:

  • The memory size is specified with the options -Xms and -Xmx.
  • The option -server is included.
  • Logs such as timeout logs are not being left in the system.

In other words, you need GC tuning if you have not set the memory size and your system is printing too many timeout logs.

But remember one thing: GC tuning should be the last thing you do.

Think about the underlying reason for GC tuning. The garbage collector clears objects created in Java. The number of objects it must remove, and the number of GCs it must run, depend on the number of objects created. Therefore, to control the GC performed by your system, first reduce the number of objects created.

Make it a habit to use StringBuilder or StringBuffer instead of String when building text piece by piece, because String is immutable and every concatenation creates a new object.
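The difference is easiest to see in a loop. In the sketch below (the class JoinDemo and its method names are made up for illustration), both methods return the same text, but the first creates a new String on every iteration while the second reuses one buffer.

```java
// Minimal sketch comparing String concatenation with StringBuilder.
// JoinDemo and its methods are hypothetical names.
class JoinDemo {

    // Each += allocates a new String, leaving the previous one as garbage.
    static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // A single StringBuilder grows in place; only the final String is created.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithConcat(5));  // prints 0,1,2,3,4,
        System.out.println(joinWithBuilder(5)); // prints 0,1,2,3,4,
    }
}
```

StringBuilder is the usual choice for single-threaded code; StringBuffer offers the same operations with synchronization.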

It is also better to write as few logs as possible.

GC tuning has two purposes:

  • Minimize the number of objects passed to the old area
  • Reduce the execution time of Full GC

The following table shows options related to memory size among the GC options that can affect performance.

Classification    Option               Description
Heap area size    -Xms                 Heap area size when starting JVM
                  -Xmx                 Maximum heap area size
New area size     -XX:NewRatio         Ratio of New area and Old area
                  -XX:NewSize          New area size
                  -XX:SurvivorRatio    Ratio of Eden area and Survivor area
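To confirm that the heap size options took effect, a program can read the limits back through the Runtime API. This is a minimal sketch; the class name HeapSettings and the sizes in the sample command are assumptions for illustration, not recommended values.

```java
// Minimal sketch: read back the heap limits the JVM was started with.
// HeapSettings is a hypothetical class name.
class HeapSettings {

    // Upper bound the heap may grow to, which roughly corresponds to -Xmx.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        // Heap currently reserved by the JVM; starts near -Xms and may grow.
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("free in heap:   " + rt.freeMemory() / mb + " MB");
    }
}
```

With a Java 11 or later launcher, this could be run as, for example, java -Xms256m -Xmx512m HeapSettings.java, and the reported maximum should then be close to 512 MB. The exact figure can differ slightly because of how the JVM accounts for Survivor spaces.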