Search This Blog

Wednesday 19 September 2012

String Pool and SLF4J

A few days back we had a code review and one of the concerns raised was "too many logs".
The statement made was : Too many logs in code result could in memory issues.
This was the first time somebody had come with such a feedback. Most of the time it was absence of logs that was the common concern. I decided to spend some time analyzing the complaint.
The reasoning behind the complaint was not entirely out of place.The way developers write logs:
logger.debug("New record created for " + userId + " at " + time);
The problem is that at run time the JVM will concat the strings to create the final parameter for the debug function. 
Whether the log statement actually gets displayed or not is immaterial. The final string for display has been generated and the memory for the same has been used. For every different value of 'userId' and 'time' a new String will be created in the string pool.
This results in  a large number of intermediate strings being created for the execution of a single logger.debug.(Am not even thinking of the CPU cycles involved in these operations)
This is one reason why open source projects like Spring check if the log level is enabled before writing any log statements in the code.
if (logger.isTraceEnabled()) {
    logger.trace("ContextSingletonBeanFactoryLocator.getInstance(): instances.hashCode=" +
            instances.hashCode() + ", instances=" + instances);
    }
Putting this if check for every log statement in the code however is not a very easy or fun job.In the light of these concerns I find SLF4J very impressive.The log methods are slightly different to those we see in LOG4J or Commons Logging. For e.g.the signature for the logger's error method is :
public void debug(String format, Object arg1, Object arg2)
The method takes not a string but a format message or a template. So the above log statement would be rewritten as:
logger.debug("New record created for {} at {}", userId, time);
There are no more concat operators. Instead what we have is a template message and a series of parameters to use for generating the final message.
Inside the method, the logger checks if a particular level is enabled. If true it then prepares the final message that is to be logged.This is where the generation of the actual String occurs. Thus unless the log is actually enabled no unnecessary strings are created in the memory pool.
This means that for very simple statements we do not need to add the level checks.Of course if there are some complex computations involved in generating the log message it is a different story, for e.g.:
if (logger.isEnabledFor(Level.DEBUG)) {
    String mg1 = method();
    String msg2 = msg1 + timestamp;
    logger.error(mg1);
}
In this case the creation of strings msg1 and msg2 is needed only for the purpose of logging. If the DEBUG level is not enabled we do not need the method() function to execute or String msg2 to be created. Hence applying the level check is essential and beneficial here.
The SLF4J method signature that I displayed above supports only two run time parameters. But what if my template requires more than two  substitutions?
In that case we have we can use:
public void debug(String format, Object[] argArray)
For more than two parameters simply pass the parameters in an Object array and SLF4J will take care of making the substitutions work.
The code from one of the SLF4J implementations is as below:
public void error(String format, Object[] argArray) {

    if (logger.isEnabledFor(Level.ERROR)) {
      FormattingTuple ft = MessageFormatter.arrayFormat(format, argArray);
      logger.log(FQCN, Level.ERROR, ft.getMessage(), ft.getThrowable());
    }
  }
As can be seen the real string is only generated if the log level is enabled. Isn't this cool. What do you think ?

6 comments:

  1. Good post, you'll be surprised how many developers I have seen who, even knowing we used SLF4J, created methods like:

    public static void logInfo(String message) {
    if(log.isInfoEnabled()) {
    log.info(message);
    }
    }

    and call them like this, completely losing the usefulness of isInfoEnabled (which is actually not needed in most cases in SLF4J):

    logInfo("New record created for " + userId + " at " + time);

    ReplyDelete
    Replies
    1. Thanks Richard.
      What you say is true, we often don't think about the way we log or for that matter what we log. I think we have got so used to having logging, emailing, executors etc set up and easily available in our code, that we do not bother to actually check if we are using these functionalities the right way.

      Delete
  2. Nice article. Please note that as of SLF4J 1.7, the (String, Object[]) variant is replaced with (String, Object...). Interestingly enough, these two variants are 100% binary compatible.

    ReplyDelete
  3. If one of the format arguments is a primitive then they will be auto boxed, allocating memory and adding to GC. Also, every call that has more than two arguments allocates an array. Calling #isTraceEnabled is still the most efficient if you are really looking for performance.

    ReplyDelete
    Replies
    1. Hi Andy,
      Yes in case of primitives, auto-boxing will lead to object creation. Thats a good point. The array argument is also valid.
      For logs set at Trace level which are almost always suppressed, calling the #isTraceEnabled is the most efficient.
      For certain cases like the ones mentioned in the post, SLF4J comes up tops. Also the use of templates as parameter definitely checks the String pool growth.
      You need to handle the above scenarios on a case by case basis. Thanks for sharing those valid points though.

      Delete