Julien's tech blog

talking about tech stuff: stuff I'm interested in, intriguing stuff, random stuff.

Detecting low memory in Java

This seems to be a common need and difficult thing to do in Java. Here is my solution, let me know what you think.

1. The problem

I am focusing on only one aspect of the problem as I have a very specific use case. I have a piece of code that accumulates information about a (possibly big) data set. Let’s assume I have a very big file that contains a list of records encoded in JSON and that I iterate on the records to accumulate statistics about the data. This is close enough to what I’m doing. The result of the operation is a report containing the schema inferred from the data and statistics about the fields (max/min/avg size, list of unique values and count if bounded, type, …). This is implemented as an Algebraic UDF (so that it can use the Combiner) in a Pig script, so there’s a fair amount of the code that I don’t control. If the data is homogenous everything is fine and it generates a rather small report with a lot of interesting information about the data. Now if the data is random enough (dynamic keys, …) it will eventually eat up all the memory until it fails. The first thing to do is to make it configurable to limit the number of fields/values/… it will accumulate but getting this configuration right is still black magic and unsatisfying. What I really want is keep accumulating until I run out of memory and output what I can. Of course without getting to the dreaded OutOfMemoryError as it can be thrown anywhere and most often outside of my code where I can’t catch it. One important point here: I don’t want false positives.

2. Trying out things

First I looked into java.lang.Runtime.{free|max|total}Memory() and the more precise MemoryPoolMXBean. This tells me how much memory I am using and I can even set a usage threshold to be notified. Solved? Not really. The trouble is that reaching a level of usage does not mean you’re going to run out of memory. The unreachable objects do not get garbage collected until it is needed. To simplify we will first eat up all the memory then the garbage collector will free old objects and again and again. Graphing memory usage will show an upward trend with a drop every time GC kicks in. One way to make the value returned by freeMemory() more accurate is to force a garbage collection right before, but of course this slows down the application drastically; there’s a reason the garbage collector runs only when needed. Also there’s a special type of OutOfMemoryError (“GC overhead limit exceeded”) that gets thrown when you spend over a certain percentage of time in GC and you could artificially trigger it. Another way to look at this is to read the value of freeMemory() right after a (full) GC, but there’s no way to get notified of GC from the java side. It seems you need to write an agent in C which exceeded by far the complexity threshold I had set for the solution. You could also imagine polling GarbageCollectorMXBean which knows how many GCs happened (current>previous ? get freeMemory() ) but I did not try that (there was also a threshold on the time I spent on this 🙂 ) and I’m not sure how reliable it would be.

3. My solution

I settled for something very different which happens to be triggered by the garbage collector when you are close to get an OutOfMemoryError. I initialize a byte array and I set it in a SoftReference. The byte array is big enough so that freeing it will give me enough memory to finish what I’m doing gracefully. This acts like canaries in coal mines: the SoftReference “dies” as a warning that we are out of memory.

canary = new SoftReference<Object>(new byte[bufferSize])

Now in each iteration I can check canary.get() == null as a signal that I’m running low on memory before actually getting an OutOfMemoryError (remember that it can be thrown somewhere I can not catch it). You could also use a ReferenceQueue to get notified of this. A SoftReference is how you tell the JVM that you’d rather keep the object but that it can be freed when there’s a need.

Experimentation monitoring the memory and GCs shows that the SoftReference actually gets freed in last resort and not before so it fits the bill.

One big inconvenient of this is that the buffer is actually using memory that can not be used for something else. In particular the less memory is left, the more time the GC takes. The consequence is that the application will slow down a lot right before freeing the buffer, delaying the detection. My experiments showed that it was acceptable for my use case. The big advantage: this is a very simple solution.

If you have an opinion about this, please comment.

edit: I have posted an update regarding the MemoryPoolMXBean.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: