GCPerfP99V2.java - Work Load Level

Java GC Tutorials - Herong's Tutorial Examples

∟Garbage Collection Performance Test Program

∟GCPerfP99V2.java - Work Load Level

This section provides tutorials to find the work load level for the GCPerfP99V2.java program to the impact of execution environment interruptions to a very level. Work load of 1000 gives me a coefficient variation of 1.22% on my computer used in 2019.

A major difficulty of measuring an application performance are execution interruptions from the execution environment. In my case, the JVM may interrupt the application thread to run the GC thread or other threads and the operating system may interrupt the JVM to run other processes.

In this tutorial let's see if I can increase the work load to reduce the impact of environment interruptions to a very level.

Test 1 - Running GCPerfP99V2.java with the default work load using Serial GC.

herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \
   GCPerfP99V2 1024 32 32 0 10 1 100

Parameters:
   Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=1
[0.063s] Run start 1
[0.094s] Run end 1: 31ms
[0.109s] Run start 2
[0.125s] Run end 2: 31ms
[0.125s] Run start 3
GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 24.365ms
[0.156s] Run end 3: 31ms
   -- Run 3 took 7ms = 31ms - 24ms

[0.156s] Run start 4
[0.172s] Run end 4: 16ms
[0.172s] Run start 5
[0.188s] Run end 5: 16ms
[0.188s] Run start 6
GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 27.026ms
[0.219s] Run end 6: 31ms
   -- Run 3 took 4ms = 31ms - 27ms

[0.219s] Run start 7
[0.234s] Run end 7: 15ms
[0.234s] Run start 8
[0.25s] Run end 8: 16ms
[0.25s] Run start 9
GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 26.779ms
[0.281s] Run end 9: 31ms
[0.281s] Run start 10
[0.297s] Run end 10: 16ms
Statistics of percentile runs:
   Total number of runs = 10
   Percentile % = 100
   Execution time = 234 ms
   Average time per run = 23 ms
   Standard deviation = 7.615773105863909 ms
   Objects processed = 320
   Throughput = 1367 objects/second
   Latency = 968 ms/1000 objects
   Throughput (max, ave, min) = (2133, 1367, 1032)
   Latency (min, ave, max) = (468, 731, 968)
Worst runs dropped:

As you can see, normal runs with no GC pauses took 15ms to 31ms. This tells me that the impact of execution environment is too high comparing to the application execution time. This is also reflected by the coefficient of variation (the ratio of the standard deviation over the average): 8ms/23ms = 24.78%.

So we need to increase the work load to reduce the impact of OS interruptions.

Test 2 - Running GCPerfP99V2.java with 10 times of the default work load, which could improve the coefficient of variation.

herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \
   GCPerfP99V2 1024 32 32 0 10 10 100

Parameters:
   Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=10
[0.094s] Run start 1
[0.109s] Run end 1: 15ms
[0.125s] Run start 2
[0.156s] Run end 2: 47ms
[0.156s] Run start 3
GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 21.724ms
[0.187s] Run end 3: 31ms
   -- Run 3 took 9ms = 31ms - 22ms

[0.187s] Run start 4
[0.203s] Run end 4: 16ms
[0.203s] Run start 5
[0.219s] Run end 5: 16ms
[0.234s] Run start 6
GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 28.368ms
[0.266s] Run end 6: 47ms
   -- Run 6 took 19ms = 47ms - 28ms

[0.266s] Run start 7
[0.281s] Run end 7: 15ms
[0.281s] Run start 8
[0.312s] Run end 8: 31ms
[0.312s] Run start 9
GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 28.963ms
[0.359s] Run end 9: 47ms
[0.359s] Run start 10
[0.375s] Run end 10: 16ms
Statistics of percentile runs:
   Total number of runs = 10
   Percentile % = 100
   Execution time = 281 ms
   Average time per run = 28 ms
   Standard deviation = 13.67845020461017 ms
   Objects processed = 320
   Throughput = 1138 objects/second
   Latency = 1470 ms/1000 objects
   Throughput (max, ave, min) = (2133, 1138, 680)
   Latency (min, ave, max) = (468, 878, 1470)
Worst runs dropped:

Work load of 10 has no obvious differences from work load 1. In fact, coefficient of variation increased to 14ms/28ms = 50.00%.

So I need to increase the work load to a higher value.

Test 3 - Running GCPerfP99V2.java with 100 times of the default work load:

herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \
   GCPerfP99V2 1024 32 32 0 10 100 100

Parameters:
   Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=100
[0.152s] Run start 1
[0.259s] Run end 1: 107ms
[0.267s] Run start 2
[0.371s] Run end 2: 112ms
[0.372s] Run start 3
GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 25.388ms
[0.488s] Run end 3: 117ms

[0.488s] Run start 4
[0.58s] Run end 4: 92ms
[0.58s] Run start 5
[0.672s] Run end 5: 92ms
[0.672s] Run start 6
GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 25.515ms
[0.788s] Run end 6: 116ms
   -- Run 6 took 90ms = 116ms - 26ms

[0.788s] Run start 7
[0.88s] Run end 7: 92ms
[0.88s] Run start 8
[0.972s] Run end 8: 92ms
[0.972s] Run start 9
GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 26.074ms
[1.095s] Run end 9: 123ms
   -- Run 9 took 97ms = 123ms - 26ms

[1.095s] Run start 10
[1.183s] Run end 10: 88ms
Statistics of percentile runs:
   Total number of runs = 10
   Percentile % = 100
   Execution time = 1031 ms
   Average time per run = 103 ms
   Standard deviation = 12.533953885346794 ms
   Objects processed = 320
   Throughput = 310 objects/second
   Latency = 3846 ms/1000 objects
   Throughput (max, ave, min) = (363, 310, 260)
   Latency (min, ave, max) = (2754, 3225, 3846)
Worst runs dropped:

Ok, work load of 100 gives me a coefficient of variation of 12.62%.

Test 4 - Running GCPerfP99V2.java with 1000 times of the default work load:

herong> java -XX:+UseSerialGC -Xms400m -Xmx400m -Xlog:gc \
   GCPerfP99V2 1024 32 32 0 10 1000 100

Parameters:
   Size=1024KB, Base=32, Chunk=32, Warmup=0, Runs=10, Loads=1000
[1.197s] Run start 1
[2.352s] Run end 1: 1155ms
[2.361s] Run start 2
[3.501s] Run end 2: 1149ms
[3.502s] Run start 3
GC(0) Pause Young (Allocation Failure) 106M->38M(386M) 18.735ms
[4.665s] Run end 3: 1164ms
[4.666s] Run start 4
[5.829s] Run end 4: 1164ms
[5.829s] Run start 5
[6.979s] Run end 5: 1150ms
[6.98s] Run start 6
GC(1) Pause Young (Allocation Failure) 144M->74M(386M) 22.835ms
[8.14s] Run end 6: 1161ms
[8.14s] Run start 7
[9.268s] Run end 7: 1128ms
[9.269s] Run start 8
[10.4s] Run end 8: 1132ms
[10.4s] Run start 9
GC(2) Pause Young (Allocation Failure) 180M->119M(386M) 23.174ms
[11.554s] Run end 9: 1154ms
[11.555s] Run start 10
[12.682s] Run end 10: 1128ms
Statistics of percentile runs:
   Total number of runs = 10
   Percentile % = 100
   Execution time = 11485 ms
   Average time per run = 1148 ms
   Standard deviation = 13.516656391282572 ms
   Objects processed = 320
   Throughput = 27 objects/second
   Latency = 37037 ms/1000 objects
   Throughput (max, ave, min) = (28, 27, 27)
   Latency (min, ave, max) = (35714, 37037, 37037)
Worst runs dropped:

Perfect, work load of 1000 gives me a coefficient of variation of 1.22%. Now I know what level of work load is needed remove the impact of operating system interrupts from our performance tests.