StringBuilder Optimizations Demystified
Introduction
There are lots of myths around concatenating Strings in Java. Let's find out which of them are actually true.
Myth 1: You should always use StringBuilder when concatenating Strings
When choosing between StringBuffer and StringBuilder, it is usually fair to say that StringBuilder is better, because all of StringBuffer's methods are synchronized, which can hurt performance. However, you do not always have to write StringBuilder yourself:
The most obvious case is concatenating constants. Joining standard Java constants like these:
public static final String X = "123";
public static final String Y = X + "456";
public static final String Z = Y + 789;
...will be optimized by the compiler to:
public static final String X = "123";
public static final String Y = "123456";
public static final String Z = "123456789";
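One way to observe the folding without reading bytecode is an identity check: compile-time constant Strings are interned, so a folded constant is the very same object as the equivalent literal. The sketch below is only an illustration; the class name ConstantFoldingDemo and the fields W and V are made up for this example. It also hints at a caveat: final helps only when the initializer is itself a constant expression.

```java
// Hypothetical demo class; not part of the original article.
class ConstantFoldingDemo {
    public static final String X = "123";
    public static final String Y = X + "456";
    public static final String Z = Y + 789;

    // Final, but initialized by a method call, so NOT a compile-time constant.
    public static final String W = String.valueOf(123);
    // Computed at run time, because W is not a constant.
    public static final String V = W + "456";

    // Using == on purpose: we are comparing object identity, not contents.
    static boolean yIsFolded() { return Y == "123456"; }
    static boolean zIsFolded() { return Z == "123456789"; }
    static boolean vIsFolded() { return V == "123456"; }

    public static void main(String[] args) {
        System.out.println(yIsFolded());  // true: Y was folded into a literal
        System.out.println(zIsFolded());  // true: Z was folded as well
        System.out.println(vIsFolded());  // false: V is built at run time
        System.out.println(V.equals(Y));  // true: same contents, different object
    }
}
```

The == comparisons are deliberate here; in ordinary code you would compare Strings with equals.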
Perhaps surprisingly, final on its own is sufficient (as long as the initializer is a constant expression), so this snippet:
public class SomeClass {
    public final String a = "123";
    public final String b = a + "456";
    public final String c = b + 789;
}
...will be optimized to:
public class SomeClass {
    public final String a;
    public final String b;
    public final String c;

    public SomeClass() {
        a = "123";
        b = "123456";
        c = "123456789";
    }
}
The bytecode behind it looks like this:
public class SomeClass {
  public final java.lang.String a;
  public final java.lang.String b;
  public final java.lang.String c;

  public SomeClass();
    Code:
       0: aload_0
       1: invokespecial #18 // Method java/lang/Object."<init>":()V
       4: aload_0
       5: ldc #7 // String 123
       7: putfield #20 // Field a:Ljava/lang/String;
      10: aload_0
      11: ldc #10 // String 123456
      13: putfield #22 // Field b:Ljava/lang/String;
      16: aload_0
      17: ldc #13 // String 123456789
      19: putfield #24 // Field c:Ljava/lang/String;
      22: return
}
What about non-final fields? You should think twice about why you need them, but leaving the concatenation to the compiler is still OK. Although non-final fields are not folded straight into String constants, the compiler uses StringBuilder for them. The following code, for example:
public class SomeClass {
    public String x = "123";
    public String y = x + "456";
    public String z = y + 789;
}
...will be transformed by the compiler into something like this (the exact output depends on the compiler and its version):
public class SomeClass {
    public String x;
    public String y;
    public String z;

    public SomeClass() {
        x = "123";
        y = (new StringBuilder(String.valueOf(x))).append("456").toString();
        z = (new StringBuilder(String.valueOf(y))).append(789).toString();
    }
}
The bytecode, for the curious:
public class SomeClass {
  public java.lang.String x;
  public java.lang.String y;
  public java.lang.String z;

  public SomeClass();
    Code:
       0: aload_0
       1: invokespecial #12 // Method java/lang/Object."<init>":()V
       4: aload_0
       5: ldc #14 // String 123
       7: putfield #16 // Field x:Ljava/lang/String;
      10: aload_0
      11: new #18 // class java/lang/StringBuilder
      14: dup
      15: aload_0
      16: getfield #16 // Field x:Ljava/lang/String;
      19: invokestatic #20 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
      22: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
      25: ldc #29 // String 456
      27: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      30: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      33: putfield #39 // Field y:Ljava/lang/String;
      36: aload_0
      37: new #18 // class java/lang/StringBuilder
      40: dup
      41: aload_0
      42: getfield #39 // Field y:Ljava/lang/String;
      45: invokestatic #20 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
      48: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
      51: sipush 789
      54: invokevirtual #41 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      57: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      60: putfield #44 // Field z:Ljava/lang/String;
      63: return
}
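The difference between final and non-final fields can also be observed at run time. The sketch below assumes two hypothetical classes, FinalFields and NonFinalFields, which mirror the SomeClass variants above. It relies only on the language rule that a constant expression yields an interned String, while a run-time concatenation yields a newly created one. That rule holds regardless of how a particular compiler implements the concatenation; newer javac releases, for instance, emit an invokedynamic call instead of explicit StringBuilder code.

```java
// Hypothetical demo classes; the names are not from the original article.
class FieldFoldingDemo {
    // Identity comparison on purpose: is the field the interned literal?
    static boolean finalFieldIsLiteral() {
        return new FinalFields().c == "123456789";
    }

    static boolean nonFinalFieldIsLiteral() {
        return new NonFinalFields().z == "123456789";
    }

    public static void main(String[] args) {
        System.out.println(finalFieldIsLiteral());     // true: folded at compile time
        System.out.println(nonFinalFieldIsLiteral());  // false: concatenated at run time
        // The contents are the same either way.
        System.out.println(new NonFinalFields().z.equals(new FinalFields().c));  // true
    }
}

class FinalFields {
    public final String a = "123";
    public final String b = a + "456";
    public final String c = b + 789;
}

class NonFinalFields {
    public String x = "123";
    public String y = x + "456";
    public String z = y + 789;
}
```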
How about concatenating arguments of different types inside a method? Leave it to the compiler. This piece of code:
long a = 123;
String b = "456";
int c = 789;
String d = "000";
String result = a + b + c + d;
...will be turned into this:
long a = 123L;
String b = "456";
int c = 789;
String d = "000";
String result = (new StringBuilder(String.valueOf(a))).append(b).append(c).append(d).toString();
The bytecode:
Code:
   0: ldc2_w #51 // long 123l
   3: lstore_1
   4: ldc #29 // String 456
   6: astore_3
   7: sipush 789
  10: istore 4
  12: ldc #53 // String 000
  14: astore 5
  16: new #18 // class java/lang/StringBuilder
  19: dup
  20: lload_1
  21: invokestatic #55 // Method java/lang/String.valueOf:(J)Ljava/lang/String;
  24: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
  27: aload_3
  28: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  31: iload 4
  33: invokevirtual #41 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
  36: aload 5
  38: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  41: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  44: astore 6
  46: return
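As a quick sanity check, the single-expression form and a hand-written builder chain should produce the same String. The sketch below uses a hypothetical MixedConcatDemo class with made-up method names. It also illustrates one thing worth keeping in mind when mixing types: the + operator is evaluated left to right, so numeric operands that come before the first String are added as numbers.

```java
// Hypothetical demo class; method names are illustrative only.
class MixedConcatDemo {
    // The single-expression form the compiler handles for us.
    static String viaOperator(long a, String b, int c, String d) {
        return a + b + c + d;
    }

    // A hand-written equivalent using one StringBuilder.
    static String viaBuilder(long a, String b, int c, String d) {
        return new StringBuilder().append(a).append(b).append(c).append(d).toString();
    }

    // Here a + c is numeric addition, because no String is involved yet.
    static String numbersFirst(long a, int c, String b) {
        return a + c + b;
    }

    public static void main(String[] args) {
        System.out.println(viaOperator(123, "456", 789, "000"));  // 123456789000
        System.out.println(viaBuilder(123, "456", 789, "000"));   // 123456789000
        System.out.println(numbersFirst(123, 789, "456"));        // 912456
    }
}
```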
Myth 2: You can rely on the StringBuilder optimization when you are concatenating inside an if statement
After reading some Java compiler documentation you might fall for this one. Unfortunately, it is not true. Whenever you concatenate Strings across more than one statement, you should watch out. For example, this:
String x = "123";
x += "456";
x += "789";
...will unfortunately be "optimized" to this code:
String x = "123";
x = (new StringBuilder(String.valueOf(x))).append("456").toString();
x = (new StringBuilder(String.valueOf(x))).append("789").toString();
...when we expected this code:
String x = (new StringBuilder("123")).append("456").append("789").toString();
Again, here is the bytecode for comparison with the decompiled Java code:
Code:
0: ldc #14 // String 123
2: astore_1
3: new #18 // class java/lang/StringBuilder
6: dup
7: aload_1
8: invokestatic #20 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
11: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
14: ldc #29 // String 456
16: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
22: astore_1
23: new #18 // class java/lang/StringBuilder
26: dup
27: aload_1
28: invokestatic #20 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
31: invokespecial #26 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
34: ldc #66 // String 789
36: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
39: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
42: astore_1
43: return
Going further, the "advanced" example with an if statement fares no better. The following:
String y = "123";
long z = 789;
boolean b = false;
y += "456";
if (!b) {
    y += "789";
    y += z;
}
...will NOT be magically turned into this:
StringBuilder sb = new StringBuilder("123");
long z = 789;
boolean b = false;
sb.append("456");
if (!b) {
    sb.append("789");
    sb.append(z);
}
String y = sb.toString();
Instead it will come down to this:
String y = "123";
long z = 789L;
boolean b = false;
y = (new StringBuilder(String.valueOf(y))).append("456").toString();
if (!b) {
    y = (new StringBuilder(String.valueOf(y))).append("456").toString().isEmpty() ? y : (new StringBuilder(String.valueOf(y))).append("789").toString();
    y = (new StringBuilder(String.valueOf(y))).append(z).toString();
}
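If you want a single builder across several statements and branches, you have to write it yourself. Below is a minimal, runnable sketch of the two variants side by side; the class ConditionalConcatDemo and its method names are hypothetical, and the values simply mirror the example above. Both methods return the same String, but only the second one reuses a single StringBuilder.

```java
// Hypothetical demo class wrapping the snippets above in methods.
class ConditionalConcatDemo {
    // Each += is a separate concatenation producing an intermediate String.
    static String withPlusEquals(boolean b, long z) {
        String y = "123";
        y += "456";
        if (!b) {
            y += "789";
            y += z;
        }
        return y;
    }

    // One builder for the whole method, converted to a String once at the end.
    static String withBuilder(boolean b, long z) {
        StringBuilder sb = new StringBuilder("123");
        sb.append("456");
        if (!b) {
            sb.append("789").append(z);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals(false, 789));  // 123456789789
        System.out.println(withBuilder(false, 789));     // 123456789789
        System.out.println(withBuilder(true, 789));      // 123456
    }
}
```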
As an exercise, you can check the bytecode yourself:
javac YourJavaClass.java
javap -cp . -c YourJavaClass > bytecode.txt
Myth 3: StringBuilder should be used only if you are concatenating inside a loop
This myth is based on the assumption that the compiler is clever enough to work out where and how to use StringBuilder most of the time. Having read this far, you already know that this is not true: StringBuilder should be used much more often than you might expect. However, joining Strings inside a loop is a special case where you can be sure you need StringBuilder (or, on rare occasions, StringBuffer). For instance, this:
String x = "123";
for (int i = 1; i < 100; i++) {
    x += i;
}
...is equivalent to this:
String x = "123";
for (int i = 1; i < 100; i++) {
    x = (new StringBuilder(String.valueOf(x))).append(i).toString();
}
...when what we really wanted was this:
StringBuilder sb = new StringBuilder("123");
for (int i = 1; i < 100; i++) {
    sb.append(i);
}
String x = sb.toString();
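To get a rough feel for the difference, the two loops can be wrapped in methods and timed. The sketch below is a naive measurement under stated assumptions, not a rigorous benchmark: the class LoopConcatDemo, its method names, and the limit of 10,000 are all made up for this example, and the timings will vary with the JVM, the compiler, and warm-up effects.

```java
// Hypothetical demo class; a rough comparison, not a proper benchmark.
class LoopConcatDemo {
    // Creates a new, ever longer String on every iteration.
    static String concatInLoop(int limit) {
        String x = "123";
        for (int i = 1; i < limit; i++) {
            x += i;
        }
        return x;
    }

    // Appends to a single builder and converts to a String once.
    static String appendInLoop(int limit) {
        StringBuilder sb = new StringBuilder("123");
        for (int i = 1; i < limit; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int limit = 10_000;  // arbitrary size chosen for illustration

        long t0 = System.nanoTime();
        String slow = concatInLoop(limit);
        long t1 = System.nanoTime();
        String fast = appendInLoop(limit);
        long t2 = System.nanoTime();

        System.out.println("same result: " + slow.equals(fast));
        System.out.println("+= in loop:  " + (t1 - t0) / 1_000 + " us");
        System.out.println("one builder: " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both methods build the same String; only the amount of intermediate copying differs, which is why the gap tends to grow with the number of iterations.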
Summary
I hope you enjoyed the article. Please leave a comment if you think something should be added to the topic.
UPDATE: