Groovy Retrospective: An Addendum - Memory usage & PermGen

I can't really have a Groovy retrospective without mentioning memory.

Over the last four years I have spent more time than any sane person should have to investigating memory leaks in production Groovy code. The dynamic nature of Groovy, and it's dynamic meta-programming presents different considerations for memory management compared to Java, simply because perm gen is no longer a fixed size. Java has a fixed number of classes that would normally be loaded in to memory (hot-reloading in long living containers aside), where as Groovy can easily change or create new classes on the fly. As a result, permgen GC is not as sophisticated (pre moving permgen to the normal heap anyway) and largely in Java if you experienced an Out-Of-Memory permgen exception then you would just increase your permgen size to support the required number of classes being loaded by the application.

To be fair, the majority of the problems encountered were due to trying to hot-reload our code coupled the setup of the application container (in this case tomcat) and having Groovy on the server classpath rather than bundled with the application (much like you wouldn't bundle Java itself with an application, however, bundling Groovy with your application is recommended).

A couple of points of interest, if you are considering Groovy (especially relevant in a long running process where you attempt to reload your code):


The MetaClassRegistry

When you meta-program a Groovy class, e.g. dynamically add a method to a class,  its MetaClass is added to the MetaClassRegistry, which is a static object on the GroovySystem class. This means that any dynamically programmed class creates a tie back to the core Groovy classes.

The main consideration to keep in mind when meta programming in a Groovy environment is that if you want to reload your classes you now have a link between your custom code and the core Groovy code so you must either 1) explicitly clear out the MetaClassRegistry; 2) reload the core Groovy classes as well (throw everything out on reload)

I think coming from a Java environment, where you would likely use the JAVA_HOME on the server for long running applications, it can often seem logical to have a similar server classpath entry for Groovy also - but actually, the easiest approach is to bundle the groovy classes with your application so is a normal candidate for reloading.

If you decide not to reload Groovy, you can add explicit code to clear out the registry - this is pretty simple code, but a note of warning, without just throwing everything away there are still plenty of risks that you can leave links that stop your classes being collected (which was the case for me for a long time until just making all the third party classes (groovy included) candidates to be thrown away and reloaded.  Even with throwing away all third party libraries you can still be caught out if you use shutdown hooks (e.g. jvm shutdown hook to clean up connections will tie your classes back to your underlying JRE, meaning that no classes can be collected until you restart your application!)

Anyway, above is code to clear the registry, it assumes you have access to the GroovyClassLoader, but you can also follow the same approach by just grabbing the MetaClassRegistry from any arbitrary Groovy class and iterate through that. Give it a try, if you play around a little you will probably find its quite easy to create a leak if you want to!


Anonymous Classes

Another thing to keep in mind in Groovy applications is the generation of classes by your application. As you would expect, as you load in Groovy classes to your application they will be compiled to class files (or if you are pre-compiling into a JAR or something) which will be added to PermGen. However, in addition to this, your code being executed may also result in additional (possibly anonymous) classes also being created and added to PermGen - so without care, these can start to fill up that space and cause OOM exceptions (although generated classes will often be very little, so might take a while before it actually errors).

An example of what might do this is loading Groovy config files - if you are loading sensibly and just doing it once then it won't be an issue, but if you find yourself re-loading the config every request/execution then it can keep adding those to PermGen.  Another example of where this happens (surprisingly) is if you are using Groovy templating. Consider the following code:

(Taken from the SimpleTemplateEngine JavaDocs)

The example is a simple example of binding a Groovy template with a map of values - maybe something that you would do to send an email or create a customized document for someone - but behind the scenes Groovy will create a class that is added to permgen for each execution. Now this isn't a  lot, however, if you are dealing with high throughput it can certainly add up pretty quickly.


Lazy Garbage Collection

Another interesting behaviour that I observed over the last few years is that, in the JVM implementation I was using, the PermGen garbage collector collected lazily.  As I mentioned, because nothing interesting traditionally happened in the permanent generation in the JVM, the garbage collectors didn't do anything interesting. Further more, because it was always assumed that the contents of perm gen were fairly static (as the name suggests) the collection happens fairly in-frequently, and often only kicks in for a full collection (which is more costly).  What this means is that even if everytime you reload you free up lots of classes for collection (say, your entire application), it might not GC permgen for several reloads as a full collection isn't required, and the JVM will just lazily perform the collection when permgen is almost full.



If you look at the diagram above, it displays a common pattern I observed - each little step in the up swing of the chart represents a full application reload, but you will see that there are multiple reloads before the permgen usage approaches the limit, and it is only when the usage is close to the limit that it actually performs the collection.

The challenges this can present is that if you push the permgen close to the limit but still not triggering a full collection, then the following reload it can once again spike the permgen (because the entire application and its associated third party code is being reloaded into memory), this can push it over the limit and cause OOM exceptions.  This was not something I ever saw on production environments, but was fairly common in desktop/development environments where less resources were available.

(another interesting observation in this particular pattern is that the GC seems to be intermittently collecting fewer classes. I never got to the bottom of that question mark: there was no difference in activity between application reload each time, so there is no change in application behaviour to trigger a leak and the middle period of reloading maintains a constant level of memory usage patterns - which also shows no leak behaviour)



Hopefully someone smarter than me about all this stuff is reading this and can shed some better insights into it, otherwise, hopefully its helpful if you are about to start doing crazy things with perm gen..

0 comments:

Groovy: A Retrospective

I have been using Groovy, in ernest, for just over 4 years now, and as I am possibly moving to using something else (Scala) I wanted to write up some notes about it all (not getting into memory leaks, as that's a whole other story!).

Over the years I have read people saying that it is not production ready, or is only really a language designed for scripting and testing etc but I have been using it in a production setting for the last four years and generally find that it is now at a stable enough level for production use (yes, it is still slower than Java, even with @CompileStatic, so if you know from the start that you are building some low-latency stuff, then maybe go with something else).

Java to Groovy

I started using Groovy 4 years ago when I joined a team that was already using it as the primary development language and came straight from an enterprise Java background.  The nice thing about Groovy (certainly versus something like Scala) is that it is, for the most part, very similar to Java syntax, and if you can write Java then you can also write Groovy.  Which is great if you either have an existing Java team or a large Java hiring pool and you want to start using Groovy. The downside is that it takes considerable discipline and code review process to ensure a consistently styled code base, as from my experience, whilst the Java-to-Groovy switch can start in an instance, the journey is a gradual one.

From my experience, both of making the transition myself and from hiring other Java developers into the team, there are three main stages of the journey:

  1. I like the return keyword! - If making an immediate switch to a Groovy environment, most former Java devs (myself included) likely end up just writing java in *.groovy files - most people find dropping the semicolon the easiest first step, but lots of Java developers wont fully embrace idiomatic Groovy, a good example of this being use of the return keyword.

    I remember early code review conversations being advised that I didn't need the return, to which I insisted that I liked it and found it clearer to read - A conversation I have since been on the opposite end of with several newer colleagues since then.  The effect of this is fairly minimal - it means as you bring more Java developers on board, you will find patches of your codebase which are distinctly Java in style (return keyword, occasional semicolons, getters & setters, for-each loops etc) - still readable and functional groovy, just not idiomatic.

  2. I can type less!? - The second phase that appears to be common is the opposite end of the spectrum - with the discovery of dynamic typing and the ability to define everything as def (or not include the type at all in the case of the method arguments).

    I'm not sure if this is a result of laziness (fewer character to type, and less thinking required as to specific types) - which I don't really buy - or whether it is just perceived as the idiomatic way.  Another common trait here is to over use scripts (they have a use, but are harder to unit-test and in my opinion, don't offer much over just using a groovy class with a simple main method which just handles the CliBuilder and instantiates the class to run).

    Again, in my opinion, typing is better - I still don't buy that typing def rather than the actual type is actually that much of a saving, and often ends up being a code-smell (even if it didn't start out as one).  The official Groovy recommendation is to type things, with the exception of local method scoped variables - but even in that case I cannot see the benefit: there is no real gain to the developer in typing those precious fewer characters and the future readability suffers as other developers have to track through the code more to be sure as to what a variable is doing.

  3. Idiomatic, typed Groovy - In a circle-of-life analogy, the resulting code is actually quite close to Java (well, it's statically typed and I don't focus on brevity of code at the cost of readability), but makes full use of the best groovy features - automatic getters and setters with dot notation for access (where brevity is good!), all the collection methods (each, findAll, collect, inject), explicit Map and List declaration, optional (but typed!) method arguments, truthy-ness, etc.

    I suspect the evolution to this stage, in part, comes from working on a sufficiently large code base, for long enough to have to start dealing with the fall out of nothing being accurately typed - for me this was having to port some groovy code to Java and found several methods with several un-typed (sometimes defaulted) arguments, and the method body code just a large if block checking the types of the passed data and handling differently.

    A week or so ago I put together a small web-scraping project in Groovy - its just a few classes, but gives a good picture of how I code Groovy now after 4 years.

The good bits

I think it was probably a combination of the above journey and Groovy's maturing (I started using it at version 1.7), but it took probably 2 1/2 years before I wanted to start using Groovy at home and for my own projects, continuing to use Java in those first few years.  But now, I am happy to recommend Groovy (performance concerns noted), its a nice syntactical sugar that can really compliment the way you would normally program in Java and there are some things that I really miss when going back to Java.

  • The explicit maps and list definition is awesome - This is a good example where brevity and readability come together, and is kind of puzzling that this hasn't been natively supported in Java anyway.  The closest I can think that Java has is its Arrays.asList(..) method, which is a nice helper, but still more clunky than it needs to be.  When I go back to Java I definitely miss the ability to just natively define nested Map/List structures
  • The extended collection methods with closure support, for iterating, collecting, filtering etc are also great, and probably the biggest part of Groovy that I miss going back to Java (I know that Java streams have started to address this area, and my limited experience of these have been pretty nice, but not quite as good).  This isn't just a nice syntax for iterating or filtering (although, again, how has there not been a better native support for find/findAll in Java until now?), but also the support for more Functional Programming techniques such as Map-Reduce.
  • Another simple nice feature that adds readability and conciseness - being able to simply assert if ( list )  or if ( map ) to determine if a collection has any values is great.
  • Automatic getters and setters - this is a nice one in theory. I agree that it makes no sense to always have to create getters and setters for everything (even if it is normally the IDE generating them), and I like the dot notation for accessing them.  At first glance it feels like it is breaking the encapsulation of the object (and really maybe they should have implemented this feature as an annotation rather than messing with the standard Java access modifier behaviour) - but really its just doing what you would do by hand - if you leave your variables as default modifier (and don't add your own getters and setters) then Groovy will add the, at compile time, and you are free to use the getter and setter methods throughout the code. The idea that you then use private modifier for variables that you want for just internal use and not exposed. However, there is a long standing bug that means this doesn't work, and you actually get getters and setters for everything! Still, a nice idea.


It's not quite a farewell, as I will continue to use Groovy (partly because I still enjoy using Spring framework so much) for the time being, and only time will tell whether Scala and some other frameworks take its place as my go to hobbyist platform of choice!

1 comments: