Assembly Language generated for volatile variables in x86_64

In our last post we looked at the Java bytecode generated for volatile variables and saw that the Java compiler does nothing special when generating bytecode for accesses to them, apart from marking the field as volatile. In this post we shall look at the assembly language generated by the JVM when running on a 64-bit Intel processor.
In order to see the machine-level instructions generated by the JVM on 64-bit Windows 7, we need a disassembler built specifically for this operating system and 64-bit CPU architecture. Finding the disassembler (hsdis-amd64.dll) was quite time-consuming, but I finally managed to download a working one from here. The alternative is to build the disassembler as instructed here. Once you have hsdis-amd64.dll, copy it to the folder that contains the jvm.dll of your Java installation (for me it was C:\Program Files\Java\jdk1.7.0_55\jre\bin\server). Once this is done, run the program with the following command:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation -XX:PrintAssemblyOptions=intel,mpad=10,cpad=10,code com.salil.threads.IncrementClass

This will produce an output like this:

Java HotSpot(TM) 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
Loaded disassembler from C:\Program Files\Java\jdk1.7.0_55\jre\bin\server\hsdis-amd64.dll
Decoding compiled method 0x00000000026d2590:
Code:
RIP: 0x26d26c0 Code size: 0x00000078
[Disassembling for mach='amd64']
[Entry Point]
[Verified Entry Point]
[Constants]
 # {method} 'main' '([Ljava/lang/String;)V' in 'com/salil/threads/IncrementClass'
 0x00000000026d26c0: cc int3
 0x00000000026d26c1: 6666660f1f840000000000 nop word ptr [rax+rax+0h]
 0x00000000026d26cc: 66666690 nop
 0x00000000026d26d0: 4881ec38000000 sub rsp,38h
 0x00000000026d26d7: 48896c2430 mov qword ptr [rsp+30h],rbp
 0x00000000026d26dc: 488bca mov rcx,rdx
 0x00000000026d26df: 49bab004266e00000000 mov r10,6e2604b0h
 0x00000000026d26e9: 41ffd2 call indirect r10 ;*iload_1 ; - com.salil.threads.IncrementClass::main@2 (line 10)
 0x00000000026d26ec: 49baa03ecdd507000000 mov r10,7d5cd3ea0h ; {oop(a 'java/lang/Class' = 'com/salil/threads/IncrementClass')}
 0x00000000026d26f6: 41ff4274 inc dword ptr [r10+74h] ;*if_icmpge ; - com.salil.threads.IncrementClass::main@5 (line 10)
 0x00000000026d26fa: 458b5a70 mov r11d,dword ptr [r10+70h]
 0x00000000026d26fe: 41ffc3 inc r11d
 0x00000000026d2701: 45895a70 mov dword ptr [r10+70h],r11d
 0x00000000026d2705: f083042400 lock add dword ptr [rsp],0h ;*putstatic j ; - com.salil.threads.IncrementClass::main@27 (line 12)
 0x00000000026d270a: 4883c430 add rsp,30h
 0x00000000026d270e: 5d pop rbp
 0x00000000026d270f: 8505ebd8b7fd test dword ptr [250000h],eax ; {poll_return}
 0x00000000026d2715: c3 ret
 0x00000000026d2716: f4 hlt
 0x00000000026d2717: f4 hlt
 0x00000000026d2718: f4 hlt
 0x00000000026d2719: f4 hlt
 0x00000000026d271a: f4 hlt
 0x00000000026d271b: f4 hlt
 0x00000000026d271c: f4 hlt
 0x00000000026d271d: f4 hlt
 0x00000000026d271e: f4 hlt
 0x00000000026d271f: f4 hlt
[Exception Handler]
[Stub Code]
 0x00000000026d2720: e93bcdffff jmp 26cf460h
 ; {no_reloc}
[Deopt Handler Code]
 0x00000000026d2725: e800000000 call 26d272ah
 0x00000000026d272a: 48832c2405 sub qword ptr [rsp],5h
 0x00000000026d272f: e9cc68fdff jmp 26a9000h
 ; {runtime_call}
 0x00000000026d2734: f4 hlt
 0x00000000026d2735: f4 hlt
 0x00000000026d2736: f4 hlt
 0x00000000026d2737: f4 hlt

To understand the assembly code better we shall use this webpage, which has easier-to-understand descriptions of the machine-level mnemonics. For a more in-depth understanding, look at this web page, which has the x86_64 instruction set; how to read the instruction set is described on its home page.

 0x00000000026d26ec: 49baa03ecdd507000000 mov r10,7d5cd3ea0h ; {oop(a 'java/lang/Class' = 'com/salil/threads/IncrementClass')}
 0x00000000026d26f6: 41ff4274 inc dword ptr [r10+74h] ;*if_icmpge ; - com.salil.threads.IncrementClass::main@5 (line 10)
 0x00000000026d26fa: 458b5a70 mov r11d,dword ptr [r10+70h]
 0x00000000026d26fe: 41ffc3 inc r11d
 0x00000000026d2701: 45895a70 mov dword ptr [r10+70h],r11d
 0x00000000026d2705: f083042400 lock add dword ptr [rsp],0h ;*putstatic j ; - com.salil.threads.IncrementClass::main@27 (line 12)

Here is what the above instructions do:

  • mov r10,7d5cd3ea0h : Moves the pointer to IncrementClass.class into register r10. The IncrementClass.class object lies at address 7d5cd3ea0h.
  • inc dword ptr [r10+74h] : Increments the 4-byte value at address [r10 + 74h], i.e. the address held in register r10 plus an offset of 74h bytes, which is where i resides.
  • mov r11d,dword ptr [r10+70h] : Moves the 4-byte value at address [r10 + 70h] into register r11d (i.e. loads the value of j into r11d). This load is essential because j is a volatile variable, so the JVM has to re-read its current value from memory rather than reuse a copy held in a register. But where are the memory barriers before reading this variable?
  • inc r11d : Increments r11d, i.e. the copy of j held in the register.
  • mov dword ptr [r10+70h],r11d : Writes the value of r11d back to [r10 + 70h], i.e. stores the new value of j so that it can become visible to other threads.
  • lock add dword ptr [rsp],0h : Performs a locked add of 0 to the 4-byte value at the address held in the stack pointer rsp (the top of the stack). This acts as a StoreLoad barrier. How does this work? After all, the instruction is a dummy: it adds 0 to the value at the top of the stack, which leaves that value unchanged.
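
For orientation, the annotated instructions are consistent with Java source roughly like the sketch below. The actual IncrementClass is not reproduced in this post, so the class name IncrementSketch, the step() helper and the loop bound are assumptions made purely for illustration; only the fields i (plain) and j (volatile) come from the listing above.

```java
// Hypothetical reconstruction: the real IncrementClass source is not shown in this post.
class IncrementSketch {
    static int i;            // plain static field (at [r10+74h] in the listing above)
    static volatile int j;   // volatile static field (at [r10+70h] in the listing above)

    // One unit of work, mirroring the disassembled sequence.
    static void step() {
        i++;   // plain increment; the listing shows a single inc on memory
        j++;   // volatile load, increment, volatile store; ordered and visible, but NOT atomic
    }

    public static void main(String[] args) {
        // Assumed loop bound: enough iterations for the JIT compiler to kick in.
        for (int k = 0; k < 100000; k++) {
            step();
        }
        System.out.println(i + " " + j);
    }
}
```

Note that j++ on a volatile field is still three separate steps (load, increment, store), which matches the three instructions in the listing; volatile gives ordering and visibility, not atomicity of the increment.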

The reason there are no memory barriers before the read of a volatile variable is that x86 has a strong memory model. In particular:

  • Loads are not reordered with other loads. This means that the LoadLoad memory barrier is a no-op.
  • Stores are not reordered with other stores. This means that the StoreStore memory barrier is a no-op.
  • Stores are not reordered with older loads. This means that the LoadStore memory barrier is a no-op.

This means that only a StoreLoad barrier is needed after the volatile write, and that is what the ‘lock’ prefix achieves. On x86 the ‘lock’ prefix makes the instruction’s read-modify-write of its memory operand atomic. Here the operand is the memory at [rsp] (the top of the stack), and adding 0 leaves it unchanged, so the locking itself should not have any effect, right? But locked instructions also carry memory-ordering guarantees on x86: loads and stores are not reordered with locked instructions, so all earlier stores must become visible before any later load is performed. Hence the locked add acts as a StoreLoad barrier.
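
To see why the StoreLoad barrier matters, here is a minimal sketch of the classic store-buffering (Dekker-style) test. All names in it (StoreLoadSketch, bothZero, countBothZero) are made up for illustration. Each thread writes one volatile field and then reads the other. The Java memory model forbids both threads reading 0 when the fields are volatile, and on x86 it is the barrier after each volatile store that rules this outcome out.

```java
// Store-buffering litmus test; every name here is hypothetical, chosen for illustration.
class StoreLoadSketch {
    static volatile int x, y;   // volatile: each store is followed by a StoreLoad barrier
    static int r1, r2;          // results; read only after join(), so plain fields suffice

    // Runs one trial and reports whether both threads read 0.
    static boolean bothZero() {
        x = 0;
        y = 0;
        Thread a = new Thread(new Runnable() {
            public void run() { x = 1; r1 = y; }   // store x, then load y
        });
        Thread b = new Thread(new Runnable() {
            public void run() { y = 1; r2 = x; }   // store y, then load x
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r1 == 0 && r2 == 0;
    }

    // Counts how many trials ended with both threads reading 0.
    static int countBothZero(int trials) {
        int count = 0;
        for (int t = 0; t < trials; t++) {
            if (bothZero()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(2000));
    }
}
```

With volatile fields the count is always 0. If the volatile modifier is removed from x and y, the both-zero outcome becomes legal: each store can sit in its CPU's store buffer while the following load runs ahead of it, which is exactly the reordering that x86 still permits.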

Posted January 24, 2015 by salilsurendran in Uncategorized
