Understanding Erasure in Generics using Java ByteCode

You might have heard several times that Generics in Java is not true parametric polymorphism, and about type erasure in Java. We also come across the term reified generics, and the fact that Generics in Java is not reified. What does this all mean and what would reified Generics buy us? In Java, Generics is little more than syntactic sugar: at runtime a List object carries none of its type-parameter information. So a List of Strings declared as List<String> and a List of Integers declared as List<Integer> look the same to the JVM; the type argument is erased and all that the executing bytecode works with is a plain List. Given only the object, there is no way to figure out whether this List was originally declared as List<String>, List<Integer> or List<Timbuktu>. To understand this further let's look at the bytecode that javap shows for the class below:

package com.basic.generics;

import java.util.ArrayList;
import java.util.List;

/**
 * Created by sasurendran on 2/8/2015.
 */
public class GenericsByteCode {

    public void checkParameterizedByteCode(){
        List<String> strings = new ArrayList<>();
        strings.add("I am added to a parameterized list");
        String value = strings.get(0);
    }

    public void checkNormalByteCode() {
        List strings = new ArrayList<>();
        strings.add("I am added to a normal(non-parameterized) list");
        String value = (String) strings.get(0);
    }
}

One of the methods uses a parameterized version of the List interface and the other one doesn't. If we had true generics then we would expect some difference in the bytecode of the two methods. Let's check this out using the command "javap -c -v com.basic.generics.GenericsByteCode". This shows the following Java bytecode:

Classfile /C:/Users/sasurendran/Dropbox/Practicals/Threads/out/production/Basic/com/basic/generics/GenericsByteCode.class
  Last modified Feb 8, 2015; size 980 bytes
  MD5 checksum 7c1bf69eda0a333c4e12e0673c52e559
  Compiled from "GenericsByteCode.java"
public class com.basic.generics.GenericsByteCode
  SourceFile: "GenericsByteCode.java"
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #10.#28        //  java/lang/Object."<init>":()V
   #2 = Class              #29            //  java/util/ArrayList
   #3 = Methodref          #2.#28         //  java/util/ArrayList."<init>":()V
   #4 = String             #30            //  I am added to a parameterized list
   #5 = InterfaceMethodref #31.#32        //  java/util/List.add:(Ljava/lang/Object;)Z
   #6 = InterfaceMethodref #31.#33        //  java/util/List.get:(I)Ljava/lang/Object;
   #7 = Class              #34            //  java/lang/String
   #8 = String             #35            //  I am added to a normal(non-parameterized) list
   #9 = Class              #36            //  com/basic/generics/GenericsByteCode
  #10 = Class              #37            //  java/lang/Object
  #11 = Utf8               <init>
  #12 = Utf8               ()V
  #13 = Utf8               Code
  #14 = Utf8               LineNumberTable
  #15 = Utf8               LocalVariableTable
  #16 = Utf8               this
  #17 = Utf8               Lcom/basic/generics/GenericsByteCode;
  #18 = Utf8               checkParameterizedByteCode
  #19 = Utf8               strings
  #20 = Utf8               Ljava/util/List;
  #21 = Utf8               value
  #22 = Utf8               Ljava/lang/String;
  #23 = Utf8               LocalVariableTypeTable
  #24 = Utf8               Ljava/util/List<Ljava/lang/String;>;
  #25 = Utf8               checkNormalByteCode
  #26 = Utf8               SourceFile
  #27 = Utf8               GenericsByteCode.java
  #28 = NameAndType        #11:#12        //  "<init>":()V
  #29 = Utf8               java/util/ArrayList
  #30 = Utf8               I am added to a parameterized list
  #31 = Class              #38            //  java/util/List
  #32 = NameAndType        #39:#40        //  add:(Ljava/lang/Object;)Z
  #33 = NameAndType        #41:#42        //  get:(I)Ljava/lang/Object;
  #34 = Utf8               java/lang/String
  #35 = Utf8               I am added to a normal(non-parameterized) list
  #36 = Utf8               com/basic/generics/GenericsByteCode
  #37 = Utf8               java/lang/Object
  #38 = Utf8               java/util/List
  #39 = Utf8               add
  #40 = Utf8               (Ljava/lang/Object;)Z
  #41 = Utf8               get
  #42 = Utf8               (I)Ljava/lang/Object;
{
  public com.basic.generics.GenericsByteCode();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0       
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return        
      LineNumberTable:
        line 9: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/basic/generics/GenericsByteCode;

  public void checkParameterizedByteCode();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=3, args_size=1
         0: new           #2                  // class java/util/ArrayList
         3: dup           
         4: invokespecial #3                  // Method java/util/ArrayList."<init>":()V
         7: astore_1      
         8: aload_1       
         9: ldc           #4                  // String I am added to a parameterized list
        11: invokeinterface #5,  2            // InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
        16: pop           
        17: aload_1       
        18: iconst_0      
        19: invokeinterface #6,  2            // InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
        24: checkcast     #7                  // class java/lang/String
        27: astore_2      
        28: return        
      LineNumberTable:
        line 12: 0
        line 13: 8
        line 14: 17
        line 15: 28
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      29     0  this   Lcom/basic/generics/GenericsByteCode;
            8      21     1 strings   Ljava/util/List;
           28       1     2 value   Ljava/lang/String;
      LocalVariableTypeTable:
        Start  Length  Slot  Name   Signature
            8      21     1 strings   Ljava/util/List<Ljava/lang/String;>;

  public void checkNormalByteCode();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=3, args_size=1
         0: new           #2                  // class java/util/ArrayList
         3: dup           
         4: invokespecial #3                  // Method java/util/ArrayList."<init>":()V
         7: astore_1      
         8: aload_1       
         9: ldc           #8                  // String I am added to a normal(non-parameterized) list
        11: invokeinterface #5,  2            // InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
        16: pop           
        17: aload_1       
        18: iconst_0      
        19: invokeinterface #6,  2            // InterfaceMethod java/util/List.get:(I)Ljava/lang/Object;
        24: checkcast     #7                  // class java/lang/String
        27: astore_2      
        28: return        
      LineNumberTable:
        line 18: 0
        line 19: 8
        line 20: 17
        line 21: 28
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      29     0  this   Lcom/basic/generics/GenericsByteCode;
            8      21     1 strings   Ljava/util/List;
           28       1     2 value   Ljava/lang/String;
}

In order to understand the bytecode better we will have to look at the Java Virtual Machine Instruction Set, and it would also help if you check my previous post where I have described some aspects of the Java class structure. Compare the Code section of the parameterized version of the method (checkParameterizedByteCode()) with the Code section of the normal version (checkNormalByteCode()).

You will see that the instructions in both of them are exactly the same, apart from the string constant that each method loads with ldc. The only other difference is the LocalVariableTypeTable entry on the parameterized method, which is debugging metadata recording the declared type List<String>; the JVM does not consult it when executing the method. Let's see what is happening in both methods. At offset 0 of both methods the 'new' instruction is executed with a reference to the java.util.ArrayList class, and then at offset 4 'invokespecial' is executed. The JVM specification describes both instructions in detail. Basically, the new instruction creates an uninitialized object of the java.util.ArrayList class and pushes a reference to it onto the operand stack, and invokespecial is the instruction that invokes the constructor of this class. In both methods these instructions are byte for byte the same, i.e., the parameterized version is not calling a specialized version of the class.
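The same point can be observed from inside a running program, without reading bytecode. The following is only a minimal sketch; the class name ErasedRuntimeClass and the helper sameRuntimeClass are made up for illustration and are not part of the class discussed in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original example.
class ErasedRuntimeClass {

    // Returns true when both lists report the very same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();

        // Both objects are plain java.util.ArrayList instances at runtime.
        System.out.println(sameRuntimeClass(strings, integers));
        System.out.println(strings.getClass().getName());
    }
}
```

Running main prints true followed by java.util.ArrayList: there is one ArrayList class, whatever the declared type argument was.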

But then how does Java satisfy the requirements of a parameterized list?

To understand that let's look at offsets 19 and 24. You will see that the bytecode for the parameterized version of the method has an invokeinterface, which invokes the get() method of the List interface, followed by a checkcast instruction. The invokeinterface instruction invokes get(); the call runs in a new stack frame (more on this later, but it is sufficient to say that each method call adds a new stack frame on the current thread's stack) and the return value is pushed onto the operand stack of the calling method. After this the checkcast instruction checks that the object reference on the top of the stack is of a particular type (in this case String). If not, it throws a ClassCastException. The JVM specification describes the checkcast instruction in detail.

Hence the parameterized version has a cast check which verifies that the returned object is of the type argument specified for the List, and this is no different from the explicit cast that we are doing in the non-parameterized version. So what we see from this is that the Java compiler, under the hood, is converting our generic code to non-generic code via erasure, and hence the information about the actual type argument of the List is lost to the executing code. However, it doesn't mean that Generics is useless: it does provide compile-time safety and makes the code much more manageable.
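To see the compiler-inserted checkcast at work, here is a small sketch that deliberately sneaks an Integer into a List<String> through a raw reference. The class and method names (CheckcastDemo, pollutedList, readFailsWithClassCast) are hypothetical and chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original example.
class CheckcastDemo {

    // Adds an Integer to a List<String> by going through a raw reference.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> pollutedList() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // same object, type argument forgotten
        raw.add(42);          // compiles with an unchecked warning; no runtime check here
        return strings;
    }

    // Returns true if reading the element as a String throws ClassCastException.
    static boolean readFailsWithClassCast() {
        List<String> strings = pollutedList();
        try {
            // No cast in the source, but the compiler emits checkcast String here.
            String value = strings.get(0);
            return false;     // not reached for the polluted list
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("size after add: " + pollutedList().size());
        System.out.println("read failed: " + readFailsWithClassCast());
    }
}
```

The add succeeds because List.add takes an Object after erasure; the failure only appears at the read, which is exactly where the checkcast sits in the bytecode above.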

Now you may ask why was this done in this particular manner?

If specific constructs had been introduced in the JVM to support Generics then it would have broken backwards compatibility. Imagine a library written for JDK 1.4.2 with a method returning a plain old List, and a caller assigning the result to a variable declared as a parameterized List. If the JVM treated the two as different types, this would fail immediately even if the old method was returning a list of Strings. To avoid breaking all the existing code the decision was made to go this route. In .NET, however, this break was made, and there are two sets of collection libraries: the old non-parameterized one and the new generic one (so I hear :) ).
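A rough sketch of the compatibility scenario described above, assuming a made-up pre-generics method legacyNames() standing in for the old library code (LegacyInterop and namesAsGeneric are likewise hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original example.
class LegacyInterop {

    // Stand-in for a pre-Java-5 library method: it only knows raw List.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List legacyNames() {
        List names = new ArrayList();
        names.add("alpha");
        names.add("beta");
        return names;
    }

    // New code assigns the raw result to a parameterized variable.
    // The compiler allows this with an unchecked warning, and the JVM
    // sees no difference because both are just List at runtime.
    @SuppressWarnings("unchecked")
    static List<String> namesAsGeneric() {
        List<String> names = legacyNames();
        return names;
    }

    public static void main(String[] args) {
        for (String name : namesAsGeneric()) {
            System.out.println(name.toUpperCase());
        }
    }
}
```

Because of erasure the old method and the new caller agree on the runtime type, so this prints ALPHA and BETA without the library being recompiled or changed.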

Another question you may ask is why is this a problem?

One immediate thing which is clear is performance, because of the extra cast check that we saw on every read in the bytecode above. The other thing I hear is that reified generics would help statically typed JVM languages like Scala to support pattern matching on parameterized types. There are some other reasons that I don't fully understand, and if you do, feel free to drop me a note.
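Since the type argument is not available at runtime, code that needs to match on a type usually has to be handed that type explicitly. One common workaround is to pass a Class object as a "type token". The sketch below is only an illustration of that idea; TypeTokenDemo and elementsOfType are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original example.
class TypeTokenDemo {

    // Collects the elements that are instances of the given class.
    // The Class<T> argument carries the type information that erasure
    // removes from T itself.
    static <T> List<T> elementsOfType(List<?> source, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object element : source) {
            if (type.isInstance(element)) {
                result.add(type.cast(element));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> mixed = new ArrayList<>();
        mixed.add("one");
        mixed.add(2);
        mixed.add("three");

        System.out.println(elementsOfType(mixed, String.class));
        System.out.println(elementsOfType(mixed, Integer.class));
    }
}
```

This prints [one, three] and then [2]. With reified generics the extra Class<T> parameter would presumably not be needed, since T itself could be inspected at runtime.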

Posted February 9, 2015 by salilsurendran in Uncategorized
