What is the Garbage-First Collector?
Garbage-First (G1) Collector is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput. G1 is the long term replacement of the Concurrent Mark-Sweep Collector (CMS). Whole-heap operations, such as global marking, are performed concurrently with the application threads, to prevent interruptions proportional to heap or live-data size. Concurrent marking provides both collection "completeness" and identifies regions ripe for reclamation via compacting evacuation. This evacuation is performed in parallel on multi-processors, to decrease pause times and increase throughput.
The G1 collector achieves these goals through several techniques. The heap is partitioned into a set of equal-sized heap regions, each a contiguous range of virtual memory. G1 compacts as it proceeds. It copies objects from one area of the heap to the other. Thus, it will not encounter fragmentation issues that CMS might encounter. There will always be areas of contiguous free space from which to allocate, allowing G1 to have consistent pauses over time. G1 uses a pause prediction model to meet a user-defined pause time target. It achieves smoother pause times than CMS at comparable throughput.
After G1 performs a global marking phase determining the liveness of objects throughout the heap, it will immediately know where in the heap there are regions that are mostly empty. It will tackle those regions first, potentially making a lot of space available. This way, the garbage collector will obtain more breathing space, decreasing the probability of a full GC. This is also why the garbage collector is called Garbage-First. As its name suggests, G1 concentrates its collection and compaction activity first on the areas of the heap that are likely to be full of reclaimable objects, thus improving its efficiency.
While most real-time collectors work at the highly granular level of individual objects, G1 collects at the region level. If any region contains no live objects it is immediately reclaimed. The user can specify a goal for the pauses and G1 will do an estimate of how many regions can be collected in that time based on previous collections. So, the collector has a reasonably accurate model of the cost of collecting the regions, and therefore "the collector can choose a set of regions that can be collected within a given pause time limit with high probability." In other words, G1 is not a hard real-time collector - it meets the soft real-time goal with high probability but not absolute certainty. G1 attempts to yield higher throughput in return for the softer, but still moderately stringent, real-time constraints. This is a good fit for large-scale server applications which often have large amounts of live heap data and considerable thread-level parallelism. G1 also provides some finer control, allowing a user to specify a fraction of time during a period of execution to be spent on garbage collection. For example, for every 250ms of execution spend no more than 50ms on garbage collection.
G1 is planned to replace CMS in the Hotspot JVM. There are two major differences between CMS and G1. The first is that G1 is a compacting collector. G1 compacts sufficiently to completely avoid the use of fine-grain free lists for allocation, which considerably simplifies parts of the collector and mostly eliminates potential fragmentation issues. As well as compacting, G1 offers more predictable garbage collection pauses than the CMS collector and allows users to set their desired pause targets.