Java 25's New CPU-Time Profiler (Part 1)

Johannes Bechberger introduces the new CPU-time profiler, which was merged into OpenJDK 25 after more than three years of development. This experimental method sampler gives clearer insight into CPU consumption than the current execution-time-based profiler.

---

Current Profiling Strategy in JFR

The existing default profiler samples 5 Java threads plus 1 native thread at fixed intervals (e.g., every 10-20 ms). It iterates over all threads and skips those that are not running Java code. The result is subsampling that depends on system parallelism: on a 32-core system, assuming one running thread per core, only 6 of 32 threads (~19%) are sampled per round, which widens the effective per-thread interval from 10 ms to ~53 ms. The current method also prioritizes Java threads over native ones, which can skew results. It is an execution-time profiler: it measures elapsed time rather than CPU cycles consumed.

Problems With Execution-Time Profiling

Execution time conflates CPU usage and waiting time (e.g., for I/O). Two examples:

- A method that sorts arrays spends its time on the CPU, so its execution time aligns well with its CPU time.
- A method that waits on network I/O is idle most of the time, so its execution time is inflated without corresponding CPU usage.

Execution-time profiles cannot distinguish CPU-heavy methods from I/O-bound ones, which leads to misleading optimization targets. JFR execution sampling also drops failed samples (up to 33% loss), which complicates interpretation.

---

CPU-Time Profiling: The New Approach

The new profiler samples threads based on CPU time consumed, not wall-clock time. Each thread is sampled every n milliseconds of CPU time, so threads are represented in proportion to their CPU use and no subsampling occurs. It uses Linux kernel CPU timers (introduced in Linux 2.6.12) to fire signals at fixed CPU-time intervals. Sampling every thread per CPU-time interval yields accurate, stable CPU utilization profiles. Unlike third-party tools (e.g., async-profiler), the new JDK profiler is integrated safely into the JVM.
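To see why per-thread CPU-time sampling avoids the subsampling of the fixed-size execution sampler, the arithmetic can be sketched in a few lines of Java. This is an illustrative model only, not how the JVM implements either sampler: the class and method names are hypothetical, and the inputs (a 10 ms interval, 6 threads sampled per round, 32 running threads) are assumptions chosen to match the 32-core figures quoted for the execution sampler.

```java
// Hypothetical helper for illustration; not part of the JDK or JFR.
class SamplingCoverage {

    /** Fraction of running threads that one round of a fixed-size sampler can cover. */
    static double coveredFraction(int threadsPerRound, int runningThreads) {
        return Math.min(1.0, (double) threadsPerRound / runningThreads);
    }

    /** Average time between two samples of the same thread under the fixed-size sampler. */
    static double effectiveIntervalMs(double intervalMs, int threadsPerRound, int runningThreads) {
        return intervalMs / coveredFraction(threadsPerRound, runningThreads);
    }

    /** Idealized CPU-time sampling: one sample per full interval of CPU time a thread consumes. */
    static long cpuTimeSamples(double cpuTimeMs, double intervalMs) {
        return (long) Math.floor(cpuTimeMs / intervalMs);
    }

    public static void main(String[] args) {
        // Assumed inputs: 5 Java + 1 native thread per round, 32 running threads, 10 ms interval.
        double fraction = coveredFraction(6, 32);
        double effective = effectiveIntervalMs(10.0, 6, 32);
        System.out.println("covered fraction: " + fraction);
        System.out.println("effective interval (ms): " + effective);

        // Under CPU-time sampling, the sample count depends only on the thread's own CPU time.
        System.out.println("samples for 1000 ms of CPU time: " + cpuTimeSamples(1000.0, 10.0));
    }
}
```

With these inputs the covered fraction is 6/32 = 0.1875 and the effective interval is 10 / 0.1875, roughly 53.3 ms. In the idealized CPU-time model, a thread's sample count depends only on the CPU time that thread consumed, not on how many other threads are running.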
A new event type, jdk.CPUTimeSample (disabled by default), is introduced separately from jdk.ExecutionSample, so execution-time and CPU-time samples can be recorded simultaneously.

Demonstration Example

An HTTP request program runs two threads: one makes 10 fast requests (10 ms each), the other makes one slow request (100 ms). Execution-time profiling shows similar time spent in both methods because response latency dominates. CPU-time profiling reveals that the 10-request method consumes significantly more CPU cycles, so the optimization focus shifts accordingly. The new profiler also reports failed and lost samples, which improves profile completeness.

---

Challenges and Limitations

- Platform support: currently Linux-only, which limits use in Windows and macOS developer environments.
- Late inclusion in JDK 25, so the feature is missing from announcement videos and presentations.
- Follow-up issues exist:
  - Spinlock synchronization in native code (fixed in July).
  - Memory ordering clarifications in the CPU-time sampler.
  - Interval recomputation correctness when the number of hardware threads changes (fixed in July).
- Ongoing testing and bug reports are encouraged.

---

Key Features of the New jdk.CPUTimeSample Event

Fields:

- stackTrace (nullable)
- eventThread
- failed (boolean): stack walk failure indicator.
- samplingPeriod: actual sampling interval computed by the signal handler.
- biased (boolean): indicates safepoint-biased sampling.

The sampler uses bounded queues; lost samples are counted in the jdk.CPUTimeSamplesLost event. It provides detailed, precise CPU-time profiling data for Java and native methods uniformly.

---

Configuration & Usage

Sampling is controlled by a throttle property, which can be either:

- a fixed CPU-time interval (e.g., 10ms per thread), or
- an event rate limit (e.g., 500 events/sec), which adjusts the sampling interval dynamically based on the number of hardware threads.

Example enabling CPU-time samples at launch: -