Thursday, June 23, 2011

Java analysis and reverse engineering

Some really neat tools here:

http://www.woodmann.com/collaborative/tools/index.php/Category:Java_Tools

Java bytecode analysis

As with any language, Java has its best practices. One that I wanted to look into more deeply comes from Joshua Bloch's book, Effective Java: item 4, "Avoid creating duplicate objects."

I wanted to see what is actually happening at the bytecode level. Bloch's recommendation is to never create a String as follows:

String str = new String("my string"); //never do this


and instead do:

String s = "my string";


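The first statement creates a new String instance every time it is executed, in addition to the instance that already represents the literal "my string". The second statement simply reuses that one instance.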
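The difference can be observed from plain Java by comparing references rather than contents. The following is a minimal sketch; the class and method names are made up for illustration and are not from Effective Java.

```java
// Hypothetical demo class; names are illustrative only.
class StringIdentityDemo {

    // Returns the instance that represents the literal.
    static String literal() {
        return "my string";
    }

    // Returns a fresh String object with the same contents on every call.
    static String copy() {
        return new String("my string"); // the pattern item 4 warns against
    }

    public static void main(String[] args) {
        // Same literal, same instance: reference comparison succeeds.
        System.out.println(literal() == literal());
        // A copy is a different object, even though the contents match.
        System.out.println(literal() == copy());
        System.out.println(literal().equals(copy()));
        // intern() maps the copy back to the shared literal instance.
        System.out.println(literal() == copy().intern());
    }
}
```

Comparing with == here is only a way to make object identity visible; normal code should keep using equals() to compare strings.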

Looking at the Java bytecode for the version that assigns the literal directly, we see:

LDC "my string" //push string "my string" onto stack
ASTORE 2 //store it in local variable 2


So a String declaration and assignment from a literal requires only two instructions. Let's compare this to the "bad" example, which creates a String instance unnecessarily:

NEW java/lang/String //Make a new String object and leave a reference to it on the stack:

[ Stack now contains: objectref ]

DUP //Duplicate the object reference:

[ Stack now contains: objectref objectref ]

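LDC "my string" //push string "my string" onto stack

[ Stack now contains: objectref objectref "my string" ]

INVOKESPECIAL java/lang/String.<init>(Ljava/lang/String;)V
//call the String instance initialization method; this pops the argument and one objectref

[ Stack now contains: objectref ]

ASTORE 3 //and store it in local variable 3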



As we can see, there is quite a difference: it now takes five instructions, including a constructor call, to end up with an equal String. What I would really like to look at next is how each version is handled by the interpreter and what native code is generated for it.
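To reproduce listings like the two above, a small class containing both declarations is enough. This is only a sketch: the class and method names are hypothetical, and the local variable slots will differ from the 2 and 3 shown above, because slot numbers depend on the surrounding method.

```java
// Hypothetical class for disassembly; names are illustrative only.
class StringDeclarations {

    // Contains both forms so their bytecode can be compared side by side.
    static boolean declareBoth() {
        String s = "my string";               // literal: load constant, store
        String str = new String("my string"); // allocation plus constructor call
        return s.equals(str);                 // use both so the intent is clear
    }

    public static void main(String[] args) {
        System.out.println(declareBoth());
    }
}
```

After compiling with javac, running javap -c StringDeclarations prints the instructions for each method. Note that javap uses lowercase mnemonics, constant pool indices, and short forms such as astore_0, so its output will not match the uppercase notation used in this post character for character.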