Life as a remote worker: Organization
Written on March 10, 2018 by Daniel Mitterdorfer
I have been working at Elastic for more than two years, focusing on benchmarking, performance analysis and performance tuning for Elasticsearch. In my daily work, I deal with many people: our users, several of our engineering teams, support, consulting, marketing, sales, product management and our developer relations team. As you might imagine, that is a lot of balls to juggle.
Structuring Work
Often, I get requests that I cannot immediately work on. Therefore, I need a way to organize the incoming stream of tasks. I use a variation of Personal Kanban that I have implemented in Trello.
You can see several columns (from left to right):
- Backlog: I will not work on these in the immediate future.
- Next: Ready for work. I expect to start working on them soon.
- Doing: I am actively working on these items at the moment.
- Blocked: I started working but I need to wait for some external event to happen (e.g. waiting for somebody else to finish a task).
- Done this week: This helps me to see progress.
Handling Incoming Tasks
I think the structure is pretty simple to grasp but another important part is how I interact with the board.
There are several streams from where I get new tasks:
- I get an idea for a new feature, an improvement, a blog post etc.
- A user has a question which requires research.
- One of my colleagues approaches me on Slack with a request or a question.
- A new ticket is raised on GitHub (and I intend to work on it soon).
- A benchmark run reveals an interesting result and I need to investigate the cause.
Whenever this happens, I create a new Trello card. Depending on the urgency, the card either goes into the backlog or the "Next" column, or I add it to "Doing" and start working on it immediately. Thanks to this external memory I never forget anything, even if I do not get to it for a long time.
Suppose, for example, that I get an idea for a feature that I could demo for Rally (our macrobenchmarking tool for Elasticsearch):
As you can see from the screenshot, a card consists of more than just its title. Using comments helps me organize my thoughts around a topic and document areas that I need to look into.
There are several more features in Trello that I use, e.g.:
- Due dates: are useful for tasks that have a hard deadline (like submitting proposals for conference talks).
- Checklists: help when a task has many subtasks.
Progress Reporting
At the end of each week, we send a brief update to the rest of the team, for which I have set up a notification. I then kick off a script that reads my board and produces a draft report within seconds. Before sending, I just have to prune the list so it includes only tasks that are relevant for the rest of the team. Finally, I archive all cards in the "Done this week" column. That whole process takes at most one minute per week. Another important aspect: because everything is externalized, I don't need to spend any brain cycles on it.
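To give you a rough idea of how little is involved, here is a minimal sketch of such a report script in Python. This is not my actual script; it assumes the Trello REST API and an API key, token and board id supplied via environment variables:

import os
import requests

API = "https://api.trello.com/1"
AUTH = {"key": os.environ["TRELLO_API_KEY"], "token": os.environ["TRELLO_TOKEN"]}
BOARD_ID = os.environ["TRELLO_BOARD_ID"]

# Find the "Done this week" column on the board ...
lists = requests.get(f"{API}/boards/{BOARD_ID}/lists", params=AUTH).json()
done = next(l for l in lists if l["name"] == "Done this week")

# ... and print one bullet point per card as a draft for the weekly update.
cards = requests.get(f"{API}/lists/{done['id']}/cards", params=AUTH).json()
for card in cards:
    print(f"* {card['name']}")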
Xprof, the forgotten HotSpot profiler
Written on August 17, 2017 by Daniel Mitterdorfer
Did you know that every HotSpot-based JVM includes a profiler? You can activate it on the command line with -Xprof and it dumps its output to stdout.
The information about it is pretty sparse, so I decided to write a short blog post about it.
Getting Started
Let's dive right in with an example:
public class Fibonacci {
    private static long fib(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be a natural number");
        } else if (n == 1 || n == 2) {
            return 1;
        } else {
            return fib(n - 2) + fib(n - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(fib(50));
    }
}
If you compile this code and run it with java -Xprof Fibonacci, you'll get the following output after the program has finished:
Flat profile of 45.00 secs (3664 total ticks): main
Compiled + native Method
100.0% 3664 + 0 Fibonacci.fib
100.0% 3664 + 0 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Unknown: no last frame
This output already reveals a few things about the profiler:
- Why is main() not showing up in the profile? The reason is that Xprof is a "flat" profiler. We are normally used to hierarchical profilers that show complete stack traces, but this profiler only shows the top-most method to avoid the overhead of stack-walking. As this example is very simple, this is not immediately apparent, but we'll see it in a more complex example later.
- It is a sampling profiler. Samples are captured at each "tick", and a tick occurs every 10 milliseconds (see the creation of FlatProfilerTask and the definition of the delay interval in the JDK sources).
- It prints the captured information only when a thread exits (see before_exit(JavaThread* thread) in the JDK sources).
If you pay close attention, you'll see that the number of ticks does not match the reported time period. For 45 seconds, it should capture 4,500 samples, given a sampling frequency of 10 milliseconds. A quick experiment (just running Thread#sleep()) shows that the sampling interval (on my machine) is actually 12 milliseconds:
Sleep duration [ms] | Total samples
---|---
5,000 | 421
10,000 | 830
20,000 | 1,665
40,000 | 3,328
Before we move on, let's see what additional information we can get from Xprof.
We can see more information about JVM threads with -XX:+ProfileVM:
Flat profile of 0.00 secs (3665 total ticks): VM Thread
Compiled + native Method
100.0% 0 + 3665 __psynch_cvwait
100.0% 0 + 3665 Total compiled
We can see the following with -XX:+ProfilerPrintByteCodeStatistics (this will get more interesting with a complex example):
Bytecode ticks:
Global summary of 45.00 seconds:
100.0% 3665 Received ticks
0.0% 1 Unknown code
And finally we can get even lower-level info about hot methods with -XX:+ProfilerRecordPC:
Printing compiled methods with PC buckets having more than 15 ticks
===================================================================
[CodeBlob (0x000000010c4b7890)]
Framesize: 8
Could not load hsdis-amd64.dylib; library not loadable; PrintAssembly is disabled
Apparently we need the Java disassembler library hsdis, which you need to build yourself (see the build instructions for your platform in the JITWatch wiki). With -XX:+ProfilerRecordPC and export LD_LIBRARY_PATH=/path/to/hsdis we indeed get some assembly. If you prefer Intel assembly syntax instead of AT&T, just add -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel.
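Putting the pieces together, the full invocation could look like this (with /path/to/hsdis replaced by the directory that contains the hsdis library you built):
export LD_LIBRARY_PATH=/path/to/hsdis
java -Xprof -XX:+ProfilerRecordPC -XX:+UnlockDiagnosticVMOptions -XX:PrintAssemblyOptions=intel Fibonacci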
Here is a shortened version of the output:
Printing compiled methods with PC buckets having more than 15 ticks
===================================================================
[CodeBlob (0x000000010bf09650)]
Framesize: 8
Loaded disassembler from hsdis-amd64.dylib
Decoding compiled method 0x000000010bf09650:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000124b60b70} 'fib' '(J)J' in 'Fibonacci'
# parm0: rsi:rsi = long
# [sp+0x40] (sp of caller)
0x000000010bf097c0: mov DWORD PTR [rsp-0x14000],eax
0x000000010bf097c7: push rbp
0x000000010bf097c8: sub rsp,0x30
0x000000010bf097cc: mov r10,rsi
0x000000010bf097cf: test rsi,rsi
0x000000010bf097d2: jle 0x000000010bf0989e
0x000000010bf097d8: cmp rsi,0x1
0x000000010bf097dc: je 0x000000010bf097e4
0x000000010bf097de: cmp rsi,0x2
0x000000010bf097e2: jne 0x000000010bf09844
0x000000010bf097e4: mov eax,0x1
0x000000010bf097e9: jmp 0x000000010bf09838
0x000000010bf097eb: mov eax,0x1
0x000000010bf097f0: jmp 0x000000010bf09833
0x000000010bf097f2: mov r11d,0x1
0x000000010bf097f8: mov QWORD PTR [rsp+0x10],r11
0x000000010bf097fd: dec r10
0x000000010bf09800: test r10,r10
0x000000010bf09803: jle 0x000000010bf098f9
0x000000010bf09809: cmp r10,0x1
0x000000010bf0980d: je 0x000000010bf097eb
0x000000010bf0980f: cmp r10,0x2
0x000000010bf09813: je 0x000000010bf097eb
0x000000010bf09815: mov rbp,r10
0x000000010bf09818: data32 xchg ax,ax
0x000000010bf0981b: call 0x000000010bf097c0 ; OopMap{off=96}
;*invokestatic fib
; - Fibonacci::fib@37 (line 10)
; - Fibonacci::fib@43 (line 10)
; {static_call}
0x000000010bf09820: mov rbp,rax
0x000000010bf09823: mov rsi,QWORD PTR [rsp+0x8]
0x000000010bf09828: data32 xchg ax,ax
0x000000010bf0982b: call 0x000000010bf097c0 ; OopMap{off=112}
;*invokestatic fib
; - Fibonacci::fib@43 (line 10)
; - Fibonacci::fib@43 (line 10)
; {static_call}
36.1% [1321]
0x000000010bf09830: add rax,rbp
0x000000010bf09833: add rax,QWORD PTR [rsp+0x10]
0x000000010bf09838: add rsp,0x30
0x000000010bf0983c: pop rbp
0x000000010bf0983d: test DWORD PTR [rip+0xfffffffffded97bd],eax # 0x0000000109de3000
; {poll_return}
27.0% [989]
0x000000010bf09843: ret
0x000000010bf09844: mov r11,rsi
0x000000010bf09847: add r11,0xfffffffffffffffe
0x000000010bf0984b: mov QWORD PTR [rsp+0x8],r11
0x000000010bf09850: test r11,r11
0x000000010bf09853: jle 0x000000010bf098c9
0x000000010bf09855: add rsi,0xfffffffffffffffd
0x000000010bf09859: cmp r11,0x1
0x000000010bf0985d: je 0x000000010bf097f2
0x000000010bf0985f: cmp r11,0x2
0x000000010bf09863: je 0x000000010bf097f2
0x000000010bf09865: mov QWORD PTR [rsp],rsi
0x000000010bf09869: mov rbp,r10
0x000000010bf0986c: mov rsi,r10
0x000000010bf0986f: add rsi,0xfffffffffffffffc
0x000000010bf09873: call 0x000000010bf097c0 ; OopMap{off=184}
;*invokestatic fib
; - Fibonacci::fib@37 (line 10)
; - Fibonacci::fib@37 (line 10)
; {static_call}
0x000000010bf09878: mov QWORD PTR [rsp+0x10],rax
0x000000010bf0987d: mov rsi,QWORD PTR [rsp]
[...]
A more complex example
For this example I started Elasticsearch 5.5.0 with -Xprof -XX:+ProfileVM -XX:+ProfilerPrintByteCodeStatistics and ran a macrobenchmark that stresses indexing. Here is one example (the flat profile of a bulk indexing thread):
Flat profile of 573.60 secs (31156 total ticks): elasticsearch[vWpZ2JV][bulk][T#2]
Interpreted + native Method
0.0% 0 + 10 java.lang.Thread.yield
0.0% 9 + 0 org.apache.lucene.index.DefaultIndexingChain$PerField.invert
0.0% 0 + 6 sun.nio.ch.FileDispatcherImpl.force0
0.0% 5 + 0 org.apache.lucene.document.Field.tokenStream
0.0% 4 + 0 org.elasticsearch.index.shard.IndexShard.shouldFlush
0.0% 4 + 0 org.elasticsearch.common.util.concurrent.KeyedLock$ReleasableLock.<init>
0.0% 3 + 0 org.apache.lucene.util.BytesRef.<init>
0.0% 3 + 0 org.apache.lucene.index.DefaultIndexingChain.getOrAddField
0.0% 3 + 0 org.apache.lucene.index.DefaultIndexingChain.processDocument
0.0% 3 + 0 java.util.function.BinaryOperator$$Lambda$1634.<init>
0.0% 3 + 0 org.elasticsearch.action.support.replication.TransportWriteAction$WritePrimaryResult.respond
0.0% 0 + 2 sun.nio.ch.FileChannelImpl.unmap0
0.0% 2 + 0 org.elasticsearch.indices.IndexingMemoryController.getIndexBufferRAMBytesUsed
0.0% 2 + 0 org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingTerm.<init>
0.0% 2 + 0 org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>
0.0% 2 + 0 java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor
0.0% 2 + 0 org.elasticsearch.action.bulk.BulkResponse.toXContent
0.0% 2 + 0 org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put
0.0% 2 + 0 org.elasticsearch.index.mapper.FieldNamesFieldMapper.parseCreateField
0.0% 2 + 0 java.util.concurrent.ThreadPoolExecutor.runWorker
0.0% 2 + 0 org.elasticsearch.index.mapper.BaseGeoPointFieldMapper.parse
0.0% 2 + 0 org.apache.lucene.index.DefaultIndexingChain.writeDocValues
0.0% 0 + 1 sun.nio.fs.UnixNativeDispatcher.open0
0.0% 1 + 0 org.apache.lucene.util.packed.PackedInts$Format.longCount
0.0% 1 + 0 org.elasticsearch.index.mapper.DocumentParser.parsedDocument
0.4% 101 + 22 Total interpreted (including elided)
Compiled + native Method
6.0% 1695 + 1 org.apache.lucene.index.DefaultIndexingChain.processField
5.2% 1476 + 0 org.apache.lucene.index.TermsHashPerField.add
5.1% 1460 + 0 org.apache.lucene.util.BytesRefHash.add
4.6% 1308 + 0 org.apache.lucene.index.DefaultIndexingChain$PerField.invert
2.2% 624 + 0 org.apache.lucene.codecs.compressing.LZ4.compress
1.8% 525 + 0 org.apache.lucene.document.Field$StringTokenStream.incrementToken
1.7% 478 + 0 org.apache.lucene.index.DefaultIndexingChain.processDocument
1.3% 358 + 0 org.apache.lucene.util.BytesRefHash.findHash
1.2% 335 + 20 org.elasticsearch.index.mapper.FieldNamesFieldMapper.parseCreateField
1.1% 324 + 0 org.apache.lucene.util.MSBRadixSorter.radixSort
1.1% 309 + 0 org.apache.lucene.util.IntroSorter.quicksort
1.1% 302 + 3 org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm
1.0% 280 + 2 org.apache.lucene.util.BytesRefHash.rehash
1.0% 273 + 0 org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken
0.9% 242 + 0 org.apache.lucene.index.TermVectorsConsumerPerField.start
0.8% 216 + 0 org.elasticsearch.index.mapper.DocumentParser.internalParseDocument
0.8% 214 + 0 org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken
0.7% 202 + 0 org.apache.lucene.analysis.FilteringTokenFilter.incrementToken
0.7% 194 + 8 org.elasticsearch.index.mapper.FieldMapper.parse
0.7% 195 + 1 org.apache.lucene.document.Field.tokenStream
0.7% 161 + 32 org.apache.lucene.index.DocumentsWriter.updateDocument
0.6% 176 + 0 org.apache.lucene.index.DefaultIndexingChain.getOrAddField
0.6% 158 + 1 java.util.DualPivotQuicksort.sort
0.6% 156 + 1 org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField
0.5% 138 + 1 org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm
55.2% 15335 + 364 Total compiled (including elided)
Stub + native Method
20.8% 0 + 5906 java.lang.Object.notifyAll
9.3% 0 + 2643 sun.reflect.Reflection.getCallerClass
5.5% 0 + 1573 sun.reflect.Reflection.getCallerClass
1.6% 0 + 458 java.lang.Thread.yield
1.5% 0 + 418 sun.nio.ch.FileDispatcherImpl.write0
1.1% 0 + 308 java.lang.Thread.isAlive
1.1% 0 + 306 sun.nio.ch.NativeThread.current
0.8% 0 + 214 java.lang.String.intern
0.7% 0 + 213 sun.nio.ch.FileDispatcherImpl.force0
0.6% 0 + 167 sun.nio.fs.UnixNativeDispatcher.open0
0.5% 0 + 140 java.security.AccessController.getStackAccessControlContext
0.3% 24 + 66 java.security.AccessController.doPrivileged
0.2% 0 + 57 sun.nio.ch.FileDispatcherImpl.close0
0.1% 0 + 37 java.io.UnixFileSystem.canonicalize0
0.1% 0 + 27 sun.nio.ch.FileChannelImpl.unmap0
0.1% 0 + 16 sun.nio.fs.UnixNativeDispatcher.stat0
0.0% 0 + 11 sun.nio.fs.UnixNativeDispatcher.unlink0
0.0% 0 + 8 sun.nio.ch.FileDispatcherImpl.size0
0.0% 0 + 5 java.lang.Object.hashCode
0.0% 0 + 4 java.lang.System.arraycopy
0.0% 0 + 4 java.lang.Thread.isInterrupted
0.0% 0 + 4 sun.misc.Unsafe.copyMemory
0.0% 0 + 3 sun.nio.ch.FileChannelImpl.map0
0.0% 0 + 2 sun.nio.ch.KQueueArrayWrapper.interrupt
0.0% 0 + 2 java.lang.Object.clone
44.4% 24 + 12598 Total stub (including elided)
Thread-local ticks:
8.7% 2712 Blocked (of total)
Remember that this shows a flat profile. So every method printed here was at some point in time the top-most method on the stack. We do not see anything about the caller-callee relationship of these methods.
We can see that the profiler differentiates between interpreted, compiled and JVM stub samples. For an application in steady state, we'd expect the majority of samples in compiled code (which is the case here). Most samples also show Lucene, which makes sense considering that Lucene is responsible for indexing the data in the end.
When we terminate the application, we'll see a global summary. The bytecode ticks summary also shows some statistics now:
Bytecode ticks:
47 0 = ldc
1 0 = ldc_w
1 0 = ifgt
3 0 = if_icmplt
46 0 = goto
2 0 = ireturn
2 0 = areturn
2 0 = return
8 0 = getstatic
1 0 = putstatic
3 0 = getfield
3 0 = putfield
5 0 = invokevirtual
2 0 = invokespecial
6 0 = invokestatic
2 0 = invokeinterface
4 0 = invokedynamic
29 1 = new
91 0 = newarray
106 0 = anewarray
2 0 = checkcast
2 0 = instanceof
2 0 = monitorenter
2 0 = monitorexit
In order to understand the output, you need to glance at the source code. The left column shows the number of ticks outside of stub code and the right column the number of ticks in stub code.
However, you may find the VM summary more useful, as it gives us a feeling for how much time the JVM has spent on compilation, GC and other internal operations.
Global summary of 592.86 seconds:
100.0% 34399 Received ticks
4.6% 1571 Received GC ticks
21.1% 7253 Compilation
0.3% 86 Other VM operations
0.0% 1 Class loader
Flat profile of 0.00 secs (34300 total ticks): VM Thread
0.3% 99 Lost ticks
Compiled + native Method
99.2% 0 + 34034 __psynch_cvwait
0.2% 0 + 76 SafepointSynchronize::begin()
0.2% 0 + 60 PromotionInfo::promoted_oops_iterate_nv(ParScanWithoutBarrierClosure*)
0.0% 0 + 15 ThreadSafepointState::examine_state_of_thread()
0.0% 0 + 13 PromotionInfo::nextDisplacedHeader()
0.0% 0 + 11 SymbolTable::buckets_unlink(int, int, int*, int*, unsigned long*)
0.0% 0 + 9 __psynch_cvsignal
0.0% 0 + 8 RelocIterator::next()
0.0% 0 + 5 JNIHandleBlock::weak_oops_do(BoolObjectClosure*, OopClosure*)
0.0% 0 + 5 __psynch_mutexwait
0.0% 0 + 4 ObjectSynchronizer::deflate_idle_monitors()
0.0% 0 + 4 pthread_getspecific
0.0% 0 + 3 Monitor::IWait(Thread*, long)
0.0% 0 + 3 CodeHeap::find_start(void*) const
0.0% 0 + 3 nmethod::get_deopt_original_pc(frame const*)
0.0% 0 + 2 __vfprintf
0.0% 0 + 2 RelocIterator::advance_over_prefix()
0.0% 0 + 2 __mmap
0.0% 0 + 2 ThreadSafepointState::roll_forward(ThreadSafepointState::suspend_type)
0.0% 0 + 2 _platform_memmove$VARIANT$Haswell
0.0% 0 + 2 ReceiverTypeData::clean_weak_klass_links(BoolObjectClosure*)
0.0% 0 + 2 __mprotect
0.0% 0 + 2 _pthread_cond_wait
0.0% 0 + 2 TemplateInterpreter::notice_safepoints()
0.0% 0 + 2 OopMapSet::find_map_at_offset(int) const
100.0% 0 + 34300 Total compiled (including elided)
Summary
So when should you use -Xprof? It has a couple of advantages:
- Due to its low overhead (it only captures a flat profile so there is no need to walk the stack) you can enable it in production.
- It is already available on every (HotSpot) JVM.
However, I think the lack of information about it on the Internet is telling. In my opinion it has serious disadvantages:
- You can neither activate it on an already running JVM, nor can you force it to dump the captured samples.
- There is no flag that allows you to write the output to a dedicated file.
- It only provides a very limited view: output is dumped per-thread and the profile is only flat, so you'll have no clue which code paths called a hot method.
I'd always prefer a "proper" profiler like Java Flight Recorder, Honest Profiler or YourKit during development. If you need to quickly analyze a problem in production, periodically taking thread dumps is usually helpful (either manually or automated, e.g. with stcap). An alternative to the latter may be -Xprof, especially if you are in an extremely limited environment and want to quickly get at least some information out of your application.
Simplicity
Written on February 06, 2017 by Daniel Mitterdorfer
I admit it: I use NoScript, and I block trackers like Google Analytics and all sorts of ads. If you browse the web this way, you see strange things: in the best case layouts are completely broken, but in some cases pages do not show up at all. And no, I am not talking about interactive pages like Google Maps; I am talking about pages that are content heavy. To me, the people behind such pages are just being downright sloppy.
As a counter-statement I have now eliminated most waste on my own page ("let the one who is without sin cast the first stone"). From today on, this page does NOT use:
- JavaScript
- Cookies
- Page tracking
- Categories
- Comments
What is left? The essentials: HTML, CSS and images. There are some old presentations that I made with reveal.js, but the rest of this page will not see JavaScript any time soon.
There goes my low latency: Analyzing hiccups with jHiccup, Elasticsearch and Kibana
Written on April 18, 2016 by Daniel Mitterdorfer
Image by USMC Archives; license: CC
The other day I had to analyze search latency spikes in Elasticsearch for a customer. When latency spikes are involved, one of the first things I want to know is the "noisiness" of the host where the problem occurs.
A great tool that helps with this analysis is Gil Tene's jHiccup. It records hiccups in the application and also in a control process. One thing that bugged me about jHiccup for a long time is that you need Microsoft Excel to visualize the data. However, I don't own an MS Office license and as much as I like jHiccup, I won't buy a license just for that.
So I thought about alternatives. At Elastic, we also store our benchmark results in Elasticsearch and visualize the data in Kibana. As I wanted to analyze multiple data sources anyway, it made sense to import the data into Elasticsearch. jHiccup comes with a tool called jHiccupLogProcessor that can convert jHiccup log files to CSV. After a few unsatisfying experiences with various CSV to JSON converters, I hacked together a small Python script that converts jHiccup's CSV file format to Elasticsearch bulk requests and imported the file with curl.
This approach worked but was cumbersome. Apart from that, I figured other people might benefit from an import tool for Elasticsearch too. So I spiced up the script a little bit and the result is hiccup2es. Although it is really basic, it is a start and allows you to import data into Elasticsearch in one step. It even creates the index for you.
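To give a rough idea of what such a conversion involves, here is a minimal sketch in Python. This is not the actual implementation of hiccup2es; it assumes a plain CSV file with a header row, whereas the real jHiccup log also contains comment lines that the tool has to handle:

import csv
import json
import sys

def bulk_lines(csv_file, index_name="hiccups", doc_type="hiccup"):
    # Emit one action line and one document line per CSV row,
    # which is the format expected by Elasticsearch's _bulk endpoint.
    with open(csv_file, newline="") as f:
        for row in csv.DictReader(f):
            yield json.dumps({"index": {"_index": index_name, "_type": doc_type}})
            yield json.dumps(row)

if __name__ == "__main__":
    for line in bulk_lines(sys.argv[1]):
        print(line)

The emitted lines can then be sent to Elasticsearch's _bulk endpoint, for example with curl.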
It is quite easy to use (I hope). Suppose you have a CSV jHiccup log called "hiccuplog.csv" that has been created by jHiccupLogProcessor. Then you can import the data into your local Elasticsearch cluster in one step:
python3 hiccup2es.py --input-file=hiccuplog.csv --create-index
That's it! hiccup2es assumes a lot of defaults, so if you want to know what you can configure, run hiccup2es --help to find out.
After the data are imported, you can easily create a nice looking visualization in Kibana. Here is a hiccup log during a benchmark I ran on my local machine:
I hope you find hiccup2es as useful as I do. If you want to know more, just contact me on Twitter. If you find a problem or can think of any enhancement, just raise an issue on GitHub.
Hidden Gems in the JVM: jcmd
Written on May 27, 2015 by Daniel Mitterdorfer
jcmd is one of those neat tools that only a few people seem to know about. It is a command line tool that is shipped with each Oracle JDK installation and provides basic information about a running Java process, which is practical for unobtrusively inspecting a running production instance.
How to Use jcmd
To use jcmd we need to specify the process id of a running Java process. When jcmd is invoked without any arguments, it prints a list of all running Java processes:
daniel@thebe:~ $ jcmd
10322 name.mitterdorfer.demos.jcmd.Main
10307 org.gradle.launcher.GradleMain run
10325 sun.tools.jcmd.JCmd
Next, we can list the available commands for a process by issuing jcmd <<PID>> help:
daniel@thebe:~ $ jcmd 10322 help
10322:
The following commands are available:
JFR.stop
JFR.start
JFR.dump
JFR.check
VM.native_memory
VM.check_commercial_features
VM.unlock_commercial_features
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
Thread.print
GC.class_stats
GC.class_histogram
GC.heap_dump
GC.run_finalization
GC.run
VM.uptime
VM.flags
VM.system_properties
VM.command_line
VM.version
help
For more information about a specific command use 'help <command>'.
The supported commands vary from process to process (e.g. due to VM options). For this example, there are four categories of commands:
- JFR: These are commands that control Java Flight Recorder, which is really great for monitoring an application and deserves its own post (or even a series of posts). Note that while you are allowed to use Java Flight Recorder during development, you need to obtain a separate license from Oracle to use it on production systems [1].
- VM: Basic information about the running JVM instance
- GC: Basic information about GC behavior and commands to force a GC or finalization
- Others: help, options to control the Java management agent and taking thread dumps
There is also one option that is not listed here, called "PerfCounter.print". It prints JVM-internal performance counters which contain lots of details on the behavior of components like the classloader, the JIT, the GC and a few others. To get started, the commands listed above should do, but if you are really curious just try it.
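For example, reusing the process id from above (the output is a long list of counter=value pairs, so it is not reproduced here):
daniel@thebe:~ $ jcmd 10322 PerfCounter.print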
A Practical Example
Suppose an application takes ages to start on a production system although it worked fine in development. To determine what's going on, we can take a thread dump:
daniel@thebe:~ $ jcmd 10378 Thread.print
10378:
2015-04-20 11:21:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode):
"Attach Listener" #10 daemon prio=9 os_prio=31 tid=0x00007fa7ef008000 nid=0x5b03 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
[...]
"main" #1 prio=5 os_prio=31 tid=0x00007fa7ee800000 nid=0x1903 waiting on condition [0x0000000102dbf000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at name.mitterdorfer.demos.jcmd.Main.main(Main.java:8)
"VM Thread" os_prio=31 tid=0x00007fa7ec857800 nid=0x3503 runnable
"GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007fa7ee80d000 nid=0x2503 runnable
[...]
"GC task thread#7 (ParallelGC)" os_prio=31 tid=0x00007fa7ec80b800 nid=0x3303 runnable
"VM Periodic Task Thread" os_prio=31 tid=0x00007fa7ed801800 nid=0x5903 waiting on condition
JNI global references: 7
Can you see what's going on? The main thread is in TIMED_WAITING state, which means it is sleeping. Although this is not a very realistic example, such a thread dump once helped me find a weird startup issue in production: the logging framework of an application was misconfigured to log everything on DEBUG level but without specifying one of our application's file appenders. Therefore, no log statement ever reached the application's log files and it seemed that the application was not doing anything for minutes. I took a few thread dumps, which revealed the problem quickly: most of them showed that a debug statement was being issued deep in library code. We corrected the log configuration and the application started smoothly again.
Summary
jcmd supports basic inspections in limited environments such as a production machine. Sure, there are more sophisticated tools like jconsole, Java Mission Control or third-party tools like AppDynamics. However, for a quick glimpse jcmd is quite handy as it comes with every JDK installation and requires no graphical interface.
[1] Disclaimer: I am not a lawyer. Please read the license terms of Java SE yourself.