Java GC Tutorials - Herong's Tutorial Examples - v1.11, by Dr. Herong Yang
GCPerfP99V2.java - Work Load Level
This section provides tutorials to find the work load level for the GCPerfP99V2.java program to the impact of execution environment interruptions to a very level. Work load of 1000 gives me a coefficient variation of 1.22% on my computer used in 2019.
A major difficulty of measuring an application performance are execution interruptions from the execution environment. In my case, the JVM may interrupt the application thread to run the GC thread or other threads and the operating system may interrupt the JVM to run other processes.
In this tutorial let's see if I can increase the work load to reduce the impact of environment interruptions to a very level.
Test 1 - Running GCPerfP99V2.java with the default work load using Serial GC.
herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \ GCPerfP99V2 1024 32 32 0 10 1 100 Parameters: Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=1 [0.063s] Run start 1 [0.094s] Run end 1: 31ms [0.109s] Run start 2 [0.125s] Run end 2: 31ms [0.125s] Run start 3 GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 24.365ms [0.156s] Run end 3: 31ms -- Run 3 took 7ms = 31ms - 24ms [0.156s] Run start 4 [0.172s] Run end 4: 16ms [0.172s] Run start 5 [0.188s] Run end 5: 16ms [0.188s] Run start 6 GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 27.026ms [0.219s] Run end 6: 31ms -- Run 3 took 4ms = 31ms - 27ms [0.219s] Run start 7 [0.234s] Run end 7: 15ms [0.234s] Run start 8 [0.25s] Run end 8: 16ms [0.25s] Run start 9 GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 26.779ms [0.281s] Run end 9: 31ms [0.281s] Run start 10 [0.297s] Run end 10: 16ms Statistics of percentile runs: Total number of runs = 10 Percentile % = 100 Execution time = 234 ms Average time per run = 23 ms Standard deviation = 7.615773105863909 ms Objects processed = 320 Throughput = 1367 objects/second Latency = 968 ms/1000 objects Throughput (max, ave, min) = (2133, 1367, 1032) Latency (min, ave, max) = (468, 731, 968) Worst runs dropped:
As you can see, normal runs with no GC pauses took 15ms to 31ms. This tells me that the impact of execution environment is too high comparing to the application execution time. This is also reflected by the coefficient of variation (the ratio of the standard deviation over the average): 8ms/23ms = 24.78%.
So we need to increase the work load to reduce the impact of OS interruptions.
Test 2 - Running GCPerfP99V2.java with 10 times of the default work load, which could improve the coefficient of variation.
herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \ GCPerfP99V2 1024 32 32 0 10 10 100 Parameters: Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=10 [0.094s] Run start 1 [0.109s] Run end 1: 15ms [0.125s] Run start 2 [0.156s] Run end 2: 47ms [0.156s] Run start 3 GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 21.724ms [0.187s] Run end 3: 31ms -- Run 3 took 9ms = 31ms - 22ms [0.187s] Run start 4 [0.203s] Run end 4: 16ms [0.203s] Run start 5 [0.219s] Run end 5: 16ms [0.234s] Run start 6 GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 28.368ms [0.266s] Run end 6: 47ms -- Run 6 took 19ms = 47ms - 28ms [0.266s] Run start 7 [0.281s] Run end 7: 15ms [0.281s] Run start 8 [0.312s] Run end 8: 31ms [0.312s] Run start 9 GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 28.963ms [0.359s] Run end 9: 47ms [0.359s] Run start 10 [0.375s] Run end 10: 16ms Statistics of percentile runs: Total number of runs = 10 Percentile % = 100 Execution time = 281 ms Average time per run = 28 ms Standard deviation = 13.67845020461017 ms Objects processed = 320 Throughput = 1138 objects/second Latency = 1470 ms/1000 objects Throughput (max, ave, min) = (2133, 1138, 680) Latency (min, ave, max) = (468, 878, 1470) Worst runs dropped:
Work load of 10 has no obvious differences from work load 1. In fact, coefficient of variation increased to 14ms/28ms = 50.00%.
So I need to increase the work load to a higher value.
Test 3 - Running GCPerfP99V2.java with 100 times of the default work load:
herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \ GCPerfP99V2 1024 32 32 0 10 100 100 Parameters: Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=100 [0.152s] Run start 1 [0.259s] Run end 1: 107ms [0.267s] Run start 2 [0.371s] Run end 2: 112ms [0.372s] Run start 3 GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 25.388ms [0.488s] Run end 3: 117ms [0.488s] Run start 4 [0.58s] Run end 4: 92ms [0.58s] Run start 5 [0.672s] Run end 5: 92ms [0.672s] Run start 6 GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 25.515ms [0.788s] Run end 6: 116ms -- Run 6 took 90ms = 116ms - 26ms [0.788s] Run start 7 [0.88s] Run end 7: 92ms [0.88s] Run start 8 [0.972s] Run end 8: 92ms [0.972s] Run start 9 GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 26.074ms [1.095s] Run end 9: 123ms -- Run 9 took 97ms = 123ms - 26ms [1.095s] Run start 10 [1.183s] Run end 10: 88ms Statistics of percentile runs: Total number of runs = 10 Percentile % = 100 Execution time = 1031 ms Average time per run = 103 ms Standard deviation = 12.533953885346794 ms Objects processed = 320 Throughput = 310 objects/second Latency = 3846 ms/1000 objects Throughput (max, ave, min) = (363, 310, 260) Latency (min, ave, max) = (2754, 3225, 3846) Worst runs dropped:
Ok, work load of 100 gives me a coefficient of variation of 12.62%.
Test 4 - Running GCPerfP99V2.java with 1000 times of the default work load:
herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \ GCPerfP99V2 1024 32 32 0 10 1000 100 Parameters: Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=1000 [1.197s] Run start 1 [2.352s] Run end 1: 1155ms [2.361s] Run start 2 [3.501s] Run end 2: 1149ms [3.502s] Run start 3 GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 18.735ms [4.665s] Run end 3: 1164ms [4.666s] Run start 4 [5.829s] Run end 4: 1164ms [5.829s] Run start 5 [6.979s] Run end 5: 1150ms [6.98s] Run start 6 GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 22.835ms [8.14s] Run end 6: 1161ms [8.14s] Run start 7 [9.268s] Run end 7: 1128ms [9.269s] Run start 8 [10.4s] Run end 8: 1132ms [10.4s] Run start 9 GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 23.174ms [11.554s] Run end 9: 1154ms [11.555s] Run start 10 [12.682s] Run end 10: 1128ms Statistics of percentile runs: Total number of runs = 10 Percentile % = 100 Execution time = 11485 ms Average time per run = 1148 ms Standard deviation = 13.516656391282572 ms Objects processed = 320 Throughput = 27 objects/second Latency = 37037 ms/1000 objects Throughput (max, ave, min) = (28, 27, 27) Latency (min, ave, max) = (35714, 37037, 37037) Worst runs dropped:
Perfect, work load of 1000 gives me a coefficient of variation of 1.22%. Now I know what level of work load is needed remove the impact of operating system interrupts from our performance tests.
Table of Contents
Heap Memory Area and Size Control
JVM Garbage Collection Logging
Introduction of Garbage Collectors
Serial Collector - "+XX:+UseSerialGC"
Parallel Collector - "+XX:+UseParallelGC"
Concurrent Mark-Sweep (CMS) Collector - "+XX:+UseConcMarkSweepGC"
Garbage First (G1) Collector - "+XX:+UseG1GC"
The Z Garbage Collector (ZGC) - "+XX:+UseZGC"
Object References and Garbage Collection
►Garbage Collection Performance Test Program
GCPerformance.java - GC Performance Test Program
GCPerformance.java - Program Output
Performance Impact of Wait Time
Performance Impact of Object Size
Performance Impact of Chunk Size
Performance Jumps Not Related to GC
Performance Test and System Interruptions
"START /REALTIME" - Run JVM with Highest Priority
GCPerfP99.java - 99th Percentile Performance
GCPerfP99.java - Output Verification
GCPerfP99V2.java - Percentile Performance with Load
►GCPerfP99V2.java - Work Load Level
GCPerfP99V2.java - Object Number and Size
Performance Tests on Serial Collector
Performance Tests on Parallel collector
Performance Tests on Concurrent collector
Performance Tests on G1 collector