GCPerfP99.java - 99th Percentile Performance

This section provides a GC test program, GCPerfP99.java, that uses 99th percentile performance measurements.

From previous tutorials, we learned that a long system interruption has a huge impact on latency, but only a small impact on throughput. This is because latency is defined by the worst (slowest) run, while throughput is defined by the average execution time over all runs.
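To make this difference concrete, here is a minimal sketch using made-up numbers. The class name InterruptionImpact, the helper methods, and the sample data (1000 runs of 10 ms each, one of them stretched to 200 ms) are assumptions for illustration only and are not part of the test program.

```java
import java.util.Arrays;

class InterruptionImpact {
    // Worst-case latency: the longest single run, in milliseconds.
    static long worstLatency(long[] runMillis) {
        long worst = 0;
        for (long t : runMillis) {
            worst = Math.max(worst, t);
        }
        return worst;
    }

    // Average throughput: runs completed per second over the whole test.
    static double throughput(long[] runMillis) {
        long total = 0;
        for (long t : runMillis) {
            total += t;
        }
        return 1000.0 * runMillis.length / total;
    }

    public static void main(String[] args) {
        // Hypothetical data: 1000 runs of 10 ms each.
        long[] clean = new long[1000];
        Arrays.fill(clean, 10);

        // Same data, but one run is stretched to 200 ms by an interruption.
        long[] interrupted = clean.clone();
        interrupted[500] = 200;

        System.out.println("Clean:       latency = " + worstLatency(clean)
            + " ms, throughput = " + throughput(clean) + " runs/second");
        System.out.println("Interrupted: latency = " + worstLatency(interrupted)
            + " ms, throughput = " + throughput(interrupted) + " runs/second");
    }
}
```

With this sample data, a single interrupted run raises the worst-case latency from 10 ms to 200 ms, while the throughput drops by less than 2%.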

One way to reduce the impact of system interruptions on latency is to use the 99th percentile (or P99) latency, which discards the worst 1% of runs and then takes the worst latency among the remaining 99% of runs.

P99 latency is a better measurement, because if system interruptions affect fewer than 1% of the runs, all of the affected runs are discarded and the P99 latency is not distorted by interruptions at all.
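The following sketch illustrates this 1% threshold under stated assumptions. The class name P99Sketch, its helper methods, and the sample data (10 ms runs with some runs stretched to 200 ms) are hypothetical. It uses the same integer index rule, (runs*99)/100, as the test program.

```java
import java.util.Arrays;

class P99Sketch {
    // Longest run time among all runs, in milliseconds.
    static long maxLatency(long[] runMillis) {
        long[] sorted = runMillis.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length - 1];
    }

    // Longest run time after discarding the slowest 1% of runs.
    // Assumes at least one sample; at least one run is always kept.
    static long p99Latency(long[] runMillis) {
        long[] sorted = runMillis.clone();
        Arrays.sort(sorted);                               // fastest to slowest
        int kept = Math.max(1, (sorted.length * 99) / 100); // runs kept
        return sorted[kept - 1];
    }

    // Builds hypothetical data: 10 ms runs, with 'slow' runs stretched to 200 ms.
    static long[] sample(int runs, int slow) {
        long[] data = new long[runs];
        Arrays.fill(data, 10);
        for (int i = 0; i < slow; i++) {
            data[i] = 200;
        }
        return data;
    }

    public static void main(String[] args) {
        long[] few = sample(1000, 5);    // 0.5% of runs interrupted
        long[] many = sample(1000, 20);  // 2% of runs interrupted
        System.out.println("0.5% interrupted: max = " + maxLatency(few)
            + " ms, P99 = " + p99Latency(few) + " ms");
        System.out.println("2% interrupted:   max = " + maxLatency(many)
            + " ms, P99 = " + p99Latency(many) + " ms");
    }
}
```

When only 0.5% of the runs are interrupted, the P99 latency stays at 10 ms even though the maximum is 200 ms. When 2% of the runs are interrupted, some slow runs survive the cut and the P99 latency rises to 200 ms as well.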

I have created another GC test program, GCPerfP99.java, that uses the P99 latency measurement:

/* GCPerfP99.java
 * Copyright (c) 2018, HerongYang.com, All Rights Reserved.
 */
class GCPerfP99 {
   static MyList objList = null;
   static int objSize = 1024;  // in KB, default = 1 MB
   static int baseSize = 32;   // # of objects in the base
   static int chunkSize = 32;  // # of objects per run chunk
   static int warmup = 64;     // warmup loops: 64*32 = 2GB
   static int runs = 1000;     // number of runs
   public static void main(String[] arg) {
      if (arg.length>0) objSize = Integer.parseInt(arg[0]);
      if (arg.length>1) baseSize = Integer.parseInt(arg[1]);
      if (arg.length>2) chunkSize = Integer.parseInt(arg[2]);
      if (arg.length>3) warmup = Integer.parseInt(arg[3]);
      if (arg.length>4) runs = Integer.parseInt(arg[4]);
      System.out.println("Parameters:");
      System.out.println("   Size="+objSize+"KB"
         +", Base="+baseSize +", Chunk="+chunkSize
         +", Warmup="+warmup+", Runs="+runs);
      objList = new MyList();
      myTest();
   }
   public static void myTest() {
      for (int m=0; m<baseSize; m++) {
         objList.add(new MyObject());
      }
      for (int k=0; k<warmup; k++) {
         for (int m=0; m<chunkSize; m++) {
            objList.add(new MyObject());
         }
         for (int m=0; m<chunkSize; m++) {
            objList.removeTail();
         }
      }
      
      long[] times = new long[runs+1];
      times[0] = System.currentTimeMillis();
      for (int i=0; i<runs; i++) {
         for (int m=0; m<chunkSize; m++) {
            objList.add(new MyObject());
         }
         for (int m=0; m<chunkSize; m++) {
            objList.removeTail();
         }
         times[i+1] = System.currentTimeMillis();
      }

      long[] samples = new long[runs];
      for (int i=0; i<runs; i++) {
         samples[i] = times[i+1] - times[i];          // in millis     
      }
      java.util.Arrays.sort(samples);        // sorted low to high

      int p99 = (runs*99)/100;                  // 99th percentile
      long duration = 0;
      for (int i=0; i<p99; i++) {
         duration += samples[i]; 
      }
      long avePerf = (1000L*p99*chunkSize)/duration; // obj/second
      long maxPerf = (1000*chunkSize)/samples[0];
      long minPerf = (1000*chunkSize)/samples[p99-1];
      long latency = 1000000/minPerf;           // millis/1000 obj
      System.out.println("Results:");
      System.out.println("   Total execution time = "
         +(duration/1000)+" seconds");
      System.out.println("   Total objects processed = "
         +(runs*chunkSize));
      System.out.println("   Average time per run = "
         +(duration/p99)+" milliseconds");
      System.out.println("   Throughput = " 
         +avePerf+" objects/second");
      System.out.println("   Latency = "
         +latency+" milliseconds/1000 objects");
      System.out.println("   Throughput (max, ave, min) = ("
         +maxPerf+", "+avePerf+", "+minPerf+")");
      System.out.println("   Latency (min, ave, max) = ("
         +(1000000/maxPerf)+", "+(1000000/avePerf)+", "
         +(1000000/minPerf)+")");

      System.out.println("1% worst runs dropped:");
      for (int i=p99; i<runs; i++) {
         System.out.println("   Run, Time, Throughput = "
            +(i+1)+", "+samples[i]+", "+(1000*chunkSize)/samples[i]);
      }
      System.out.println("Press ENTER to end...");
      try { 
         System.in.read();
      } catch (Exception e) { 
      }
   }

   static class MyObject {
      private long[] obj = null;
      public MyObject next = null;
      public MyObject prev = null;
      public MyObject() {
         obj = new long[objSize*128];          // 128*8=1024 bytes
         for (int i=0; i<objSize*128; i++) {
            obj[i] = i/2+i/3+i/4+i/5;            // some work load
         }
      }
   }

   static class MyList {
      MyObject head = null;
      MyObject tail = null;
      void add(MyObject o) {
         if (head==null) {
            head = o;
            tail = o; 
         } else {
            o.prev = head;
            head.next = o;
            head = o;
         }
      }
      void removeTail() {
         if (tail!=null) {
            if (tail.next==null) {
               tail = null;
               head = null;
            } else {
               tail = tail.next;
               tail.prev = null;
            }
         }
      }
   }
}
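
To check the arithmetic in the result calculation, here is a small sketch that applies the same formulas as myTest() to a fixed set of run times. The class name P99Metrics, the compute() helper, and the sample data are assumptions for illustration. The sketch uses long arithmetic throughout and assumes every kept sample is at least 1 ms, so that no division by zero occurs.

```java
import java.util.Arrays;

class P99Metrics {
    // Returns {avePerf, maxPerf, minPerf, latency}, using the formulas of myTest().
    // Assumes every kept sample is at least 1 ms (no division by zero).
    static long[] compute(long[] runMillis, int chunkSize) {
        long[] samples = runMillis.clone();
        Arrays.sort(samples);                       // sorted low to high
        int p99 = (samples.length * 99) / 100;      // number of runs kept
        long duration = 0;
        for (int i = 0; i < p99; i++) {
            duration += samples[i];
        }
        long avePerf = (1000L * p99 * chunkSize) / duration;    // obj/second
        long maxPerf = (1000L * chunkSize) / samples[0];        // fastest run
        long minPerf = (1000L * chunkSize) / samples[p99 - 1];  // slowest kept run
        long latency = 1000000 / minPerf;           // millis/1000 obj
        return new long[] {avePerf, maxPerf, minPerf, latency};
    }

    public static void main(String[] args) {
        // Hypothetical data: 50 runs of 16 ms, 49 runs of 20 ms, 1 run of 500 ms.
        long[] runs = new long[100];
        Arrays.fill(runs, 0, 50, 16);
        Arrays.fill(runs, 50, 99, 20);
        runs[99] = 500;

        long[] r = compute(runs, 32);
        System.out.println("Throughput (max, ave, min) = ("
            + r[1] + ", " + r[0] + ", " + r[2] + ")");
        System.out.println("Latency = " + r[3] + " milliseconds/1000 objects");
    }
}
```

With this data, the 500 ms run is the only one dropped. The slowest kept run takes 20 ms for 32 objects, which gives a minimum throughput of 1600 objects/second and a P99 latency of 625 milliseconds per 1000 objects.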

Changes made on the test program:

Last update: 2018.
