
Hello Java

Bytecode & JVM

ICSE Computer Applications

So far, we have seen that platform independence and the Write Once, Run Anywhere (WORA) promise made Java a huge hit for internet programming. So how does Java achieve this cross-platform portability for its programs?

The magic lies in Java’s Bytecode and the Java Virtual Machine, which is more popularly referred to by its initials, JVM. To understand Bytecode and the JVM, we first need to understand how programs are compiled and executed.

Machine Language

The single most important component of any computer is the CPU, commonly referred to as the processor. A processor only understands binary digits, that is, 0 and 1. For the processor to perform any computation, we need to give it the instructions and data as a sequence of 0s and 1s. This binary sequence that a processor understands is known as its machine language. Machine language differs from one family of processors to another. For example, Intel and AMD processors for personal computers belong to the same family (x86) and share a machine language, while the ARM processors found in most phones use a different one. As you can imagine, it is very tough for a human to write a program in machine language. To overcome this limitation, computer scientists invented assembly language.

Assembly Language

Instructions in assembly language are given in English-like words, so it is friendlier to humans than machine language. A program known as the assembler takes the assembly language code as input and converts it into machine language that the processor understands. Although assembly language is an improvement over machine language, it is still processor specific, and writing and debugging any moderately complex program in it is very time intensive.

As computers became smaller, more powerful and more reliable, there was a huge demand for software that is reusable and easy to maintain and modify. This led to the invention of High-Level languages like COBOL, Pascal, C, C++, etc.

High-Level Language

High-Level languages have an English-like syntax, so they are easily understandable by humans. Programs written in High-Level languages are processor independent: in general, no modifications to the source code are required to execute it on different types of computers. This makes High-Level languages highly desirable for developing reusable, maintainable software. Now remember, our processor still loves its 0s and 1s. It doesn’t understand anything other than machine language. So, the program written in a High-Level language needs to be converted into machine language. This is done by a program called the compiler, and the process of converting High-Level code to machine code is called compilation. The output of the compilation process is an executable file that can run our program on the computer.

So how does High-Level language code maintain portability across platforms? Before we try to answer this question, let’s first understand what the term platform means here.

What is a Platform?


On a personal computer, you don’t directly execute your program on the processor. The computer has specialized software called the operating system running on it, which acts as a bridge between our programs and the processor. I am very sure you are not hearing the term operating system or OS for the first time. The combination of an operating system and a processor is called a platform. Windows running on machines with Intel processors is one of the most common platforms in the world. Other examples of platforms are Linux on Intel processors and Mac OS X on PowerPC processors.

Code portability with High-Level language

Coming back to the original question of how High-Level language code is portable across platforms, the key here is the compiler. The compiler for each platform is different. Windows will have its own compiler, Linux will have its own compiler, Mac OS X will have its own compiler and so on. So if you have developed software that you want people to install and use on different platforms like Windows, Linux and Mac, you will take that program and compile it using the compiler of each platform that you want to support.

Figure: Code portability with High-Level language

As shown in the picture above, you will compile your program using the Windows compiler and get an executable file which will run the program on Windows. Similarly, you will compile your program with the Linux and Mac compilers and get two more executables for the Linux and Mac platforms respectively. Now you will have 3 executable files, one each for Windows, Linux and Mac. You will put up all 3 files on your website. Depending on their platform, users will download and install the appropriate executable file. You cannot take an executable file generated by a compiler for Windows and run it on a Linux system. It will not execute.

Java's magic of portability

With High-Level languages like C++, even though you don’t need to write your code again for a different platform, additional work in terms of compilation is required to run it there. You also need to maintain different executables for all the platforms you want your program to run on. Java solves this exact problem with Bytecode and the JVM. In Java, the file you get after compilation works across platforms: no additional work is needed, and there is no need to maintain multiple executables for different platforms. You can compile your Java program on Windows and take the compiled output to Linux or Mac. The exact same file will work on Linux and Mac too. Sounds like magic, doesn’t it? Now let’s try to demystify this magic by understanding the science behind it.

Bytecode

The output of a Java compiler is not an executable file but a file containing Bytecode. You can consider Bytecode a close relative of machine code. Like machine code, it is a compact binary format that is not meant to be read by humans, but unlike machine code, the platform cannot understand and execute it directly. The cool thing about Bytecode is that it is the same for all platforms. Bytecode is understood by the Java Virtual Machine, or JVM.
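One way to see that Bytecode is the same everywhere is to look at the first four bytes of a compiled class file. The sketch below is only an illustration: the class name `BytecodePeek` and the method `readMagic` are made up for this example, and it assumes the compiled `java/lang/Object.class` that ships with the JDK can be read through the system class loader. Every valid class file, on every platform, begins with the same four bytes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BytecodePeek {

    // Reads the first four bytes of a compiled class file as one int.
    // We use java.lang.Object's class file because it ships with every JDK.
    static int readMagic() {
        try (InputStream in =
                 ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                // Assumption of this sketch: the resource is visible here.
                throw new IllegalStateException("Object.class could not be located");
            }
            return new DataInputStream(in).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same value is expected on Windows, Linux and Mac.
        System.out.println(Integer.toHexString(readMagic()).toUpperCase());
    }
}
```

If the resource is found, this should print CAFEBABE, the marker that identifies a Java class file regardless of the platform it was compiled on.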

Java Virtual Machine (JVM)

The JVM is software that needs to be running on the computer where you want to execute your Java program. The JVM takes the Bytecode as input, converts it into the machine code of the specific platform it is running on and executes it. Think of the JVM as a virtual platform running on the actual platform. Let’s see how we can support multiple platforms for our Java program.
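Before turning to multiple platforms, it may help to see the idea of a virtual machine in miniature. The sketch below is not the real JVM and does not use real Bytecode: the class `ToyVm` and its three instructions (`PUSH`, `ADD`, `MUL`) are invented purely for illustration. It only shows the general idea of a program that reads instructions one by one and carries them out on whatever machine it happens to be running on.

```java
public class ToyVm {

    // Hypothetical instruction codes for this sketch; real JVM instructions differ.
    static final int PUSH = 0; // push the number that follows onto the stack
    static final int ADD = 1;  // replace the top two numbers with their sum
    static final int MUL = 2;  // replace the top two numbers with their product

    // Executes the instructions in order and returns the value left on top.
    // Assumes a well-formed, non-empty program.
    static int run(int[] program) {
        int[] stack = new int[program.length]; // large enough for any program
        int sp = 0; // number of values currently on the stack
        int pc = 0; // position of the next instruction
        while (pc < program.length) {
            int op = program[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = program[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown instruction: " + op);
            }
        }
        return stack[sp - 1];
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4 in the toy instruction set.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL};
        System.out.println(run(program)); // prints 20
    }
}
```

The array `program` plays the role of Bytecode here: it means the same thing everywhere, and only the program that executes it has to be built for each platform.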

Code portability with Bytecode and JVM

We give our Java program as input to the Java compiler. The output of the compiler is the equivalent Bytecode of our Java program. Remember, Bytecode is not machine code, so it cannot be executed directly on the target machine. We need a program running on the target machine which can understand this Bytecode and convert it into machine code that the target machine can then execute. This program is the JVM, or Java Virtual Machine. As you would have guessed by now, the JVM is platform specific: each platform has its own JVM. There is a JVM for Windows, a different JVM for Linux and yet another JVM for Mac. The JVM needs to be installed on the target machine before we can run any Java program on it. Although the details of the JVM differ from platform to platform, they all understand the same Java Bytecode. Once a JVM is available for a platform, any Java program can run on it without any modification. This is how Java achieves its magic of platform independence and provides Write Once, Run Anywhere (WORA) capabilities.
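To try this out, here is a small sketch of a program whose compiled Bytecode can be copied unchanged between machines. The class name `WhereAmI` and the method `describe` are made up for this example; the three system properties it reads are standard ones provided by the JVM.

```java
public class WhereAmI {

    // Builds a one-line description of the platform the JVM is running on.
    static String describe() {
        return "OS: " + System.getProperty("os.name")
                + ", CPU: " + System.getProperty("os.arch")
                + ", JVM: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile it once with `javac WhereAmI.java`, then run the resulting `WhereAmI.class` with `java WhereAmI` on different platforms. The Bytecode stays the same, while the line it prints changes with the operating system, processor and JVM of each machine.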