| <?xml version="1.0"?> |
| <!-- |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, |
| * software distributed under the License is distributed on an |
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| * KIND, either express or implied. See the License for the |
| * specific language governing permissions and limitations |
| * under the License. |
| --> |
| <document> |
| <properties> |
| <title>The BCEL API</title> |
| </properties> |
| |
| <body> |
| <section name="The BCEL API"> |
| <p> |
| The <font face="helvetica,arial">BCEL</font> API abstracts from |
| the concrete circumstances of the Java Virtual Machine and how to |
| read and write binary Java class files. The API mainly consists |
| of three parts: |
| </p> |
| |
| <p> |
| |
| <ol type="1"> |
| <li> A package that contains classes that describe "static" |
| constraints of class files, i.e., that reflect the class file format |
| and are not intended for byte code modifications. The classes may be |
| used to read and write class files from or to a file. This is |
| useful especially for analyzing Java classes without having the |
| source files at hand. The main data structure is called |
| <tt>JavaClass</tt>, which contains methods, fields, etc.</li> |
| |
| <li> A package to dynamically generate or modify |
| <tt>JavaClass</tt> or <tt>Method</tt> objects. It may be used to |
| insert analysis code, to strip unnecessary information from class |
| files, or to implement the code generator back-end of a Java |
| compiler.</li> |
| |
| <li> Various code examples and utilities like a class file viewer, |
| a tool to convert class files into HTML, and a converter from |
| class files to the <a |
| href="http://jasmin.sourceforge.net">Jasmin</a> assembly |
| language.</li> |
| </ol> |
| </p> |
| |
| <subsection name="JavaClass"> |
| <p> |
| The "static" component of the <font |
| face="helvetica,arial">BCEL</font> API resides in the package |
| <tt>org.apache.bcel.classfile</tt> and closely represents class |
| files. All of the binary components and data structures declared |
| in the <a |
| href="http://docs.oracle.com/javase/specs/">JVM |
| specification</a> and described in section <a |
| href="#2 The Java Virtual Machine">2</a> are mapped to classes. |
| |
| <a href="#Figure 3">Figure 3</a> shows a UML diagram of the |
| hierarchy of classes of the <font face="helvetica,arial">BCEL |
| </font>API. <a href="#Figure 8">Figure 8</a> in the appendix also |
| shows a detailed diagram of the <tt>ConstantPool</tt> components. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 3"> |
| <img src="../images/javaclass.gif"/> <br/> |
| Figure 3: UML diagram for the JavaClass API</a> |
| </p> |
| |
| <p> |
| The top-level data structure is <tt>JavaClass</tt>, which in most |
| cases is created by a <tt>ClassParser</tt> object that is capable |
| of parsing binary class files. A <tt>JavaClass</tt> object |
| basically consists of fields, methods, and symbolic references to |
| the super class and to the implemented interfaces. |
| </p> |
| |
| <p> |
| The constant pool serves as some kind of central repository and is |
| thus of outstanding importance for all components. |
| <tt>ConstantPool</tt> objects contain an array of fixed size of |
| <tt>Constant</tt> entries, which may be retrieved via the |
| <tt>getConstant()</tt> method taking an integer index as argument. |
| Indexes to the constant pool may be contained in instructions as |
| well as in other components of a class file and in constant pool |
| entries themselves. |
| </p> |
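| |
| <p> |
| The way entries refer to one another by index can be sketched as |
| follows. This is a simplified model in plain Java, not the BCEL |
| classes: <tt>ConstantPoolSketch</tt>, <tt>Utf8</tt>, |
| <tt>ClassRef</tt>, and <tt>resolveClassName()</tt> are hypothetical |
| names introduced only for illustration. |
| </p> |
| |

```java
/** Hypothetical model (not BCEL): constant pool entries that refer to each other by index. */
final class ConstantPoolSketch {
    /** A UTF-8 string entry. */
    static final class Utf8 {
        final String value;
        Utf8(String value) { this.value = value; }
    }

    /** A class entry that stores only the index of the Utf8 entry holding its name. */
    static final class ClassRef {
        final int nameIndex;
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    // Fixed-size array; index 0 is left unused, as in real class files.
    private final Object[] entries;

    ConstantPoolSketch(Object... entries) {
        this.entries = new Object[entries.length + 1];
        System.arraycopy(entries, 0, this.entries, 1, entries.length);
    }

    /** Retrieves an entry by its integer index. */
    Object getConstant(int index) {
        return entries[index];
    }

    /** Follows the index stored in a ClassRef entry to the name it denotes. */
    String resolveClassName(int classIndex) {
        ClassRef ref = (ClassRef) getConstant(classIndex);
        Utf8 name = (Utf8) getConstant(ref.nameIndex);
        return name.value;
    }

    public static void main(String[] args) {
        // Entry 1 holds the name, entry 2 points back at entry 1.
        ConstantPoolSketch cp =
            new ConstantPoolSketch(new Utf8("java/lang/String"), new ClassRef(1));
        System.out.println(cp.resolveClassName(2));
    }
}
```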
| |
| <p> |
| Methods and fields contain a signature, symbolically defining |
| their types. Access flags like <tt>public static final</tt> occur |
| in several places and are encoded by an integer bit mask, e.g., |
| <tt>public static final</tt> corresponds to the Java expression |
| </p> |
| |
| |
| <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source> |
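| |
| <p> |
| Reading such a bit mask back works the same way in reverse. The |
| following sketch is self-contained plain Java; the class |
| <tt>AccessFlagsSketch</tt> and its <tt>describe()</tt> method are |
| hypothetical helpers, while the three flag values are those given |
| in the JVM specification. |
| </p> |
| |

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of BCEL): decodes an access-flag bit mask. */
final class AccessFlagsSketch {
    // Flag values as defined by the JVM specification.
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_STATIC = 0x0008;
    static final int ACC_FINAL  = 0x0010;

    /** Returns the modifier keywords whose bits are set in the mask. */
    static String describe(int accessFlags) {
        List<String> names = new ArrayList<>();
        if ((accessFlags & ACC_PUBLIC) != 0) names.add("public");
        if ((accessFlags & ACC_STATIC) != 0) names.add("static");
        if ((accessFlags & ACC_FINAL) != 0) names.add("final");
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        int accessFlags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
        System.out.println(describe(accessFlags));
    }
}
```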
| |
| <p> |
| As mentioned in <a href="jvm.html#Java_class_file_format">section |
| 2.1</a> already, several components may contain <em>attribute</em> |
| objects: classes, fields, methods, and <tt>Code</tt> objects |
| (introduced in <a href="jvm.html#Method_code">section 2.3</a>). The |
| latter is an attribute itself that contains the actual byte code |
| array, the maximum stack size, the number of local variables, a |
| table of handled exceptions, and some optional debugging |
| information coded as <tt>LineNumberTable</tt> and |
| <tt>LocalVariableTable</tt> attributes. Attributes are in general |
| specific to some data structure, i.e., no two components share the |
| same kind of attribute, though this is not explicitly |
| forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped |
| with the component they belong to. |
| </p> |
| |
| </subsection> |
| |
| <subsection name="Class repository"> |
| <p> |
| Using the provided <tt>Repository</tt> class, reading class files into |
| a <tt>JavaClass</tt> object is quite simple: |
| </p> |
| |
| <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source> |
| |
| <p> |
| The repository also contains methods providing the dynamic equivalent |
| of the <tt>instanceof</tt> operator, and other useful routines: |
| </p> |
| |
| <source> |
| if (Repository.instanceOf(clazz, super_class)) { |
| ... |
| } |
| </source> |
| |
| </subsection> |
| |
| <h4>Accessing class file data</h4> |
| |
| <p> |
| Information within the class file components may be accessed like |
| Java Beans via intuitive set/get methods. All of them also define |
| a <tt>toString()</tt> method so that implementing a simple class |
| viewer is very easy. In fact, all of the examples used here have |
| been produced this way: |
| </p> |
| |
| <source> |
| System.out.println(clazz); |
| printCode(clazz.getMethods()); |
| ... |
| public static void printCode(Method[] methods) { |
| for (int i = 0; i < methods.length; i++) { |
| System.out.println(methods[i]); |
| |
| Code code = methods[i].getCode(); |
| if (code != null) // Non-abstract method |
| System.out.println(code); |
| } |
| } |
| </source> |
| |
| <h4>Analyzing class data</h4> |
| <p> |
| Last but not least, <font face="helvetica,arial">BCEL</font> |
| supports the <em>Visitor</em> design pattern, so one can write |
| visitor objects to traverse and analyze the contents of a class |
| file. Included in the distribution is a class |
| <tt>JasminVisitor</tt> that converts class files into the <a |
| href="http://jasmin.sourceforge.net">Jasmin</a> |
| assembler language. |
| </p> |
| |
| <subsection name="ClassGen"> |
| <p> |
| This part of the API (package <tt>org.apache.bcel.generic</tt>) |
| supplies an abstraction level for creating or transforming class |
| files dynamically. It makes the static constraints of Java class |
| files like the hard-coded byte code addresses "generic". The |
| generic constant pool, for example, is implemented by the class |
| <tt>ConstantPoolGen</tt> which offers methods for adding different |
| types of constants. Accordingly, <tt>ClassGen</tt> offers an |
| interface to add methods, fields, and attributes. |
| <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 4"> |
| <img src="../images/classgen.gif"/> |
| <br/> |
| Figure 4: UML diagram of the ClassGen API</a> |
| </p> |
| |
| <h4>Types</h4> |
| <p> |
| We abstract from the concrete details of the type signature syntax |
| (see <a href="jvm.html#Type_information">2.5</a>) by introducing the |
| <tt>Type</tt> class, which is used, for example, by methods to |
| define their return and argument types. Concrete sub-classes are |
| <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>, |
| the last of which consists of the element type and the number of |
| dimensions. For commonly used types the class offers some |
| predefined constants. For example, the method signature of the |
| <tt>main</tt> method as shown in |
| <a href="jvm.html#Type_information">section 2.5</a> is represented by: |
| </p> |
| |
| <source> |
| Type return_type = Type.VOID; |
| Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) }; |
| </source> |
| |
| <p> |
| <tt>Type</tt> also contains methods to convert types into textual |
| signatures and vice versa. The sub-classes contain implementations |
| of the routines and constraints specified by the Java Language |
| Specification. |
| </p> |
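| |
| <p> |
| To make the textual form concrete, the following sketch converts |
| Java type names into JVM signatures by hand. It is an illustration |
| in plain Java and does not use the <tt>Type</tt> class; |
| <tt>SignatureSketch</tt>, <tt>toSignature()</tt>, and |
| <tt>methodSignature()</tt> are hypothetical names. |
| </p> |
| |

```java
/** Hypothetical helper (not BCEL): builds JVM type signatures from Java type names. */
final class SignatureSketch {
    /** Converts a type name such as "int" or "java.lang.String[]" to its JVM signature. */
    static String toSignature(String typeName) {
        StringBuilder sb = new StringBuilder();
        // Each array dimension contributes one leading '['.
        while (typeName.endsWith("[]")) {
            sb.append('[');
            typeName = typeName.substring(0, typeName.length() - 2);
        }
        switch (typeName) {
            case "void":    return sb.append('V').toString();
            case "boolean": return sb.append('Z').toString();
            case "byte":    return sb.append('B').toString();
            case "char":    return sb.append('C').toString();
            case "short":   return sb.append('S').toString();
            case "int":     return sb.append('I').toString();
            case "long":    return sb.append('J').toString();
            case "float":   return sb.append('F').toString();
            case "double":  return sb.append('D').toString();
            default:
                // Object types are written as L<internal name>;
                return sb.append('L').append(typeName.replace('.', '/')).append(';').toString();
        }
    }

    /** Builds a method signature from the return type and the argument types. */
    static String methodSignature(String returnType, String... argTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (String arg : argTypes) {
            sb.append(toSignature(arg));
        }
        return sb.append(')').append(toSignature(returnType)).toString();
    }

    public static void main(String[] args) {
        // Signature of: void main(String[] argv)
        System.out.println(methodSignature("void", "java.lang.String[]"));
    }
}
```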
| |
| <h4>Generic fields and methods</h4> |
| <p> |
| Fields are represented by <tt>FieldGen</tt> objects, which may be |
| freely modified by the user. If they have the access rights |
| <tt>static final</tt>, i.e., are constants and of basic type, they |
| may optionally have an initializing value. |
| </p> |
| |
| <p> |
| Generic methods offer methods to add exceptions the method may |
| throw, local variables, and exception handlers. The latter two are |
| represented by user-configurable objects as well. Because |
| exception handlers and local variables contain references to byte |
| code addresses, they also take the role of an <em>instruction |
| targeter</em> in our terminology. Instruction targeters contain a |
| method <tt>updateTarget()</tt> to redirect a reference. This is |
| somewhat related to the Observer design pattern. Generic |
| (non-abstract) methods refer to <em>instruction lists</em> that |
| consist of instruction objects. References to byte code addresses |
| are implemented by handles to instruction objects. If the list is |
| updated the instruction targeters will be informed about it. This |
| is explained in more detail in the following sections. |
| </p> |
| |
| <p> |
| The maximum stack size needed by the method and the maximum number |
| of local variables used may be set manually or computed |
| automatically via the <tt>setMaxStack()</tt> and |
| <tt>setMaxLocals()</tt> methods. |
| </p> |
| |
| <h4>Instructions</h4> |
| <p> |
| Modeling instructions as objects may look somewhat odd at first |
| sight, but in fact enables programmers to obtain a high-level view |
| upon control flow without handling details like concrete byte code |
| offsets. Instructions consist of an opcode (sometimes called |
| tag), their length in bytes and an offset (or index) within the |
| byte code. Since many instructions are immutable (stack operators, |
| e.g.), the <tt>InstructionConstants</tt> interface offers |
| shareable predefined "fly-weight" constants to use. |
| </p> |
| |
| <p> |
| Instructions are grouped via sub-classing; the type hierarchy of |
| instruction classes is illustrated by an (incomplete) figure in the |
| appendix. The most important family of instructions are the |
| <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to |
| targets somewhere within the byte code. Obviously, this makes them |
| candidates for playing an <tt>InstructionTargeter</tt> role, |
| too. Instructions are further grouped by the interfaces they |
| implement; there are, e.g., <tt>TypedInstruction</tt>s that are |
| associated with a specific type like <tt>ldc</tt>, or |
| <tt>ExceptionThrower</tt> instructions that may raise exceptions |
| when executed. |
| </p> |
| |
| <p> |
| All instructions can be traversed via <tt>accept(Visitor v)</tt> |
| methods, i.e., the Visitor design pattern. There is, however, a |
| special trick in these methods that allows the handling of certain |
| instruction groups to be merged. The <tt>accept()</tt> methods do not |
| only call the corresponding <tt>visit()</tt> method, but call |
| <tt>visit()</tt> methods of their respective super classes and |
| implemented interfaces first, i.e., the most specific |
| <tt>visit()</tt> call is last. Thus one can group the handling of, |
| say, all <tt>BranchInstruction</tt>s into one single method. |
| </p> |
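| |
| <p> |
| The calling order described above can be sketched with a much |
| reduced hierarchy. All names in this example (<tt>VisitorSketch</tt>, |
| <tt>Branch</tt>, <tt>Goto</tt>, <tt>IfEq</tt>, |
| <tt>BranchCounter</tt>) are hypothetical stand-ins and not the BCEL |
| classes; the point is only that each <tt>accept()</tt> calls the |
| super class method before the most specific one. |
| </p> |
| |

```java
import java.util.List;

/** Hypothetical model (not BCEL): accept() visits the super class before the concrete class. */
final class VisitorSketch {
    interface Visitor {
        default void visitBranch(Branch b) {}
        default void visitGoto(Goto g) {}
        default void visitIfEq(IfEq i) {}
    }

    abstract static class Branch {
        abstract void accept(Visitor v);
    }

    static final class Goto extends Branch {
        @Override
        void accept(Visitor v) {
            v.visitBranch(this); // super class first
            v.visitGoto(this);   // most specific call last
        }
    }

    static final class IfEq extends Branch {
        @Override
        void accept(Visitor v) {
            v.visitBranch(this);
            v.visitIfEq(this);
        }
    }

    /** Handles every kind of branch in one single method. */
    static final class BranchCounter implements Visitor {
        int branches;

        @Override
        public void visitBranch(Branch b) {
            branches++;
        }
    }

    static int countBranches(List<Branch> code) {
        BranchCounter counter = new BranchCounter();
        for (Branch b : code) {
            b.accept(counter);
        }
        return counter.branches;
    }
}
```

| <p> |
| A visitor that only cares about the group overrides |
| <tt>visitBranch()</tt> and ignores the specific methods, as |
| <tt>BranchCounter</tt> does here. |
| </p> |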
| |
| <p> |
| For debugging purposes it may even make sense to "invent" your own |
| instructions. In a sophisticated code generator like the one used |
| as a backend of the <a href="http://barat.sourceforge.net">Barat |
| framework</a> for static analysis one often has to insert |
| temporary <tt>nop</tt> (No operation) instructions. When examining |
| the produced code it may be very difficult to trace back where the |
| <tt>nop</tt> was actually inserted. One could think of a derived |
| <tt>nop2</tt> instruction that contains additional debugging |
| information. When the instruction list is dumped to byte code, the |
| extra data is simply dropped. |
| </p> |
| |
| <p> |
| One could also think of new byte code instructions operating on |
| complex numbers that are replaced by normal byte code upon |
| load-time or are recognized by a new JVM. |
| </p> |
| |
| <h4>Instruction lists</h4> |
| <p> |
| An <em>instruction list</em> is implemented by a list of |
| <em>instruction handles</em> encapsulating instruction objects. |
| References to instructions in the list are thus not implemented by |
| direct pointers to instructions but by pointers to instruction |
| <em>handles</em>. This makes appending, inserting and deleting |
| areas of code very simple and also allows us to reuse immutable |
| instruction objects (fly-weight objects). Since we use symbolic |
| references, computation of concrete byte code offsets does not |
| need to occur until finalization, i.e., until the user has |
| finished the process of generating or transforming code. We will |
| use the term instruction handle and instruction synonymously |
| throughout the rest of the paper. Instruction handles may contain |
| additional user-defined data using the <tt>addAttribute()</tt> |
| method. |
| </p> |
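| |
| <p> |
| Why handles make editing simple can be seen in a small model. The |
| sketch below is plain Java and deliberately much simpler than the |
| real instruction list; <tt>HandleListSketch</tt>, <tt>Handle</tt>, |
| and <tt>assignOffsets()</tt> are hypothetical names. A branch keeps |
| a reference to a handle, so inserting code in front of the target |
| does not invalidate it, and concrete offsets are only computed at |
| the end. |
| </p> |
| |

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not BCEL): symbolic branch targets with late offset computation. */
final class HandleListSketch {
    /** A handle wraps an instruction; its byte offset is unknown until finalization. */
    static final class Handle {
        final String opcode;
        final int length;   // instruction length in bytes
        Handle target;      // symbolic branch target, null for non-branches
        int position = -1;  // assigned by assignOffsets()

        Handle(String opcode, int length) {
            this.opcode = opcode;
            this.length = length;
        }
    }

    private final List<Handle> handles = new ArrayList<>();

    /** Appends an instruction at the end and returns its handle. */
    Handle append(String opcode, int length) {
        Handle h = new Handle(opcode, length);
        handles.add(h);
        return h;
    }

    /** Inserts a new instruction before an existing handle and returns the new handle. */
    Handle insert(Handle before, String opcode, int length) {
        Handle h = new Handle(opcode, length);
        handles.add(handles.indexOf(before), h);
        return h;
    }

    /** Finalization step: assigns concrete byte offsets in list order. */
    void assignOffsets() {
        int offset = 0;
        for (Handle h : handles) {
            h.position = offset;
            offset += h.length;
        }
    }

    public static void main(String[] args) {
        HandleListSketch il = new HandleListSketch();
        Handle jump = il.append("goto", 3);
        Handle end = il.append("return", 1);
        jump.target = end;           // symbolic reference, no offset involved
        il.insert(end, "nop", 1);    // the target handle stays valid
        il.assignOffsets();
        System.out.println("goto -> offset " + jump.target.position);
    }
}
```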
| |
| <p> |
| <b>Appending:</b> One can append instructions or other instruction |
| lists anywhere to an existing list. The instructions are appended |
| after the given instruction handle. All append methods return a |
| new instruction handle which may then be used as the target of a |
| branch instruction, e.g.: |
| </p> |
| |
| <source> |
| InstructionList il = new InstructionList(); |
| ... |
| GOTO g = new GOTO(null); |
| il.append(g); |
| ... |
| // Use immutable fly-weight object |
| InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL); |
| g.setTarget(ih); |
| </source> |
| |
| <p> |
| <b>Inserting:</b> Instructions may be inserted anywhere into an |
| existing list. They are inserted before the given instruction |
| handle. All insert methods return a new instruction handle which |
| may then be used as the start address of an exception handler, for |
| example. |
| </p> |
| |
| <source> |
| InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP); |
| ... |
| mg.addExceptionHandler(start, end, handler, "java.io.IOException"); |
| </source> |
| |
| <p> |
| <b>Deleting:</b> Deletion of instructions is also very |
| straightforward; all instruction handles and the contained |
| instructions within a given range are removed from the instruction |
| list and disposed of. The <tt>delete()</tt> method may however throw |
| a <tt>TargetLostException</tt> when there are instruction |
| targeters still referencing one of the deleted instructions. The |
| user is forced to handle such exceptions in a <tt>try-catch</tt> |
| clause and redirect these references elsewhere. The <em>peep |
| hole</em> optimizer described in the appendix gives a detailed |
| example for this. |
| </p> |
| |
| <source> |
| try { |
| il.delete(first, last); |
| } catch (TargetLostException e) { |
| for (InstructionHandle target : e.getTargets()) { |
| for (InstructionTargeter targeter : target.getTargeters()) { |
| targeter.updateTarget(target, new_target); |
| } |
| } |
| } |
| </source> |
| |
| <p> |
| <b>Finalizing:</b> When the instruction list is ready to be dumped |
| to pure byte code, all symbolic references must be mapped to real |
| byte code offsets. This is done by the <tt>getByteCode()</tt> |
| method which is called by default by |
| <tt>MethodGen.getMethod()</tt>. Afterwards you should call |
| <tt>dispose()</tt> so that the instruction handles can be reused |
| internally. This helps to improve memory usage. |
| </p> |
| |
| <source> |
| InstructionList il = new InstructionList(); |
| |
| ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object", |
| "<generated>", ACC_PUBLIC | ACC_SUPER, null); |
| ConstantPoolGen cp = cg.getConstantPool(); // Constant pool of the generated class |
| MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC, |
| Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) }, |
| new String[] { "argv" }, "main", "HelloWorld", il, cp); |
| ... |
| cg.addMethod(mg.getMethod()); |
| il.dispose(); // Reuse instruction handles of list |
| </source> |
| |
| <h4>Code example revisited</h4> |
| <p> |
| Using instruction lists gives us a generic view upon the code: In |
| <a href="#Figure 5">Figure 5</a> we again present the code chunk |
| of the <tt>readInt()</tt> method of the factorial example in section |
| <a href="jvm.html#Code_example">2.6</a>: The local variables |
| <tt>n</tt> and <tt>e1</tt> both hold two references to |
| instructions, defining their scope. There are two <tt>goto</tt>s |
| branching to the <tt>iload</tt> at the end of the method. One of |
| the exception handlers is displayed, too: it references the start |
| and the end of the <tt>try</tt> block and also the exception |
| handler code. |
| </p> |
| |
| <p align="center"> |
| <a name="Figure 5"> |
| <img src="../images/il.gif"/> |
| <br/> |
| Figure 5: Instruction list for <tt>readInt()</tt> method</a> |
| </p> |
| |
| <h4>Instruction factories</h4> |
| <p> |
| To simplify the creation of certain instructions the user can use |
| the supplied <tt>InstructionFactory</tt> class which offers a lot |
| of useful methods to create instructions from |
| scratch. Alternatively, one can also use <em>compound |
| instructions</em>: When producing byte code, some patterns |
| typically occur very frequently, for instance the compilation of |
| arithmetic or comparison expressions. You certainly do not want |
| to rewrite the code that translates such expressions into byte |
| code in every place they may appear. In order to support this, the |
| <font face="helvetica,arial">BCEL</font> API includes a <em>compound |
| instruction</em> (an interface with a single |
| <tt>getInstructionList()</tt> method). Instances of classes that |
| implement this interface may be used in any place where normal |
| instructions would occur, particularly in append operations. |
| </p> |
| |
| <p> |
| <b>Example: Pushing constants</b> Pushing constants onto the |
| operand stack may be coded in different ways. As explained in <a |
| href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are |
| some "short-cut" instructions that can be used to make the |
| produced byte code more compact. The smallest instruction to push |
| a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other |
| possibilities are <tt>bipush</tt> (can be used to push values |
| between -128 and 127), <tt>sipush</tt> (between -32768 and 32767), |
| or <tt>ldc</tt> (load constant from constant pool). |
| </p> |
| |
| <p> |
| Instead of repeatedly selecting the most compact instruction in, |
| say, a switch, one can use the compound <tt>PUSH</tt> instruction |
| whenever pushing a constant number or string. It will produce the |
| appropriate byte code instruction and insert entries into the |
| constant pool if necessary. |
| </p> |
| |
| <source> |
| InstructionFactory f = new InstructionFactory(class_gen); |
| InstructionList il = new InstructionList(); |
| ... |
| il.append(new PUSH(cp, "Hello, world")); |
| il.append(new PUSH(cp, 4711)); |
| ... |
| il.append(f.createPrintln("Hello World")); |
| ... |
| il.append(f.createReturn(type)); |
| </source> |
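| |
| <p> |
| The kind of selection that a compound push instruction has to make |
| for integer constants can be sketched in plain Java. This is an |
| illustration of the value ranges given above, not the |
| implementation of <tt>PUSH</tt>; <tt>PushSketch</tt> and |
| <tt>opcodeFor()</tt> are hypothetical names, and the sketch assumes |
| the <tt>iconst</tt> short-cuts cover the values -1 to 5. |
| </p> |
| |

```java
/** Hypothetical helper (not BCEL): picks the most compact instruction for an int constant. */
final class PushSketch {
    /** Returns the mnemonic of the smallest instruction that pushes the given value. */
    static String opcodeFor(int value) {
        // One-byte short-cut instructions exist for -1 through 5.
        if (value >= -1 && value <= 5) {
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        // bipush carries a one-byte operand.
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush";
        }
        // sipush carries a two-byte operand.
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush";
        }
        // Anything larger has to be loaded from the constant pool.
        return "ldc";
    }

    public static void main(String[] args) {
        for (int value : new int[] { 1, 100, 4711, 100000 }) {
            System.out.println(value + " -> " + opcodeFor(value));
        }
    }
}
```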
| |
| <h4>Code patterns using regular expressions</h4> |
| <p> |
| When transforming code, for instance during optimization or when |
| inserting analysis method calls, one typically searches for |
| certain patterns of code to perform the transformation at. To |
| simplify handling such situations <font |
| face="helvetica,arial">BCEL </font>introduces a special feature: |
| One can search for given code patterns within an instruction list |
| using <em>regular expressions</em>. In such expressions, |
| instructions are represented by their opcode names, e.g., |
| <tt>LDC</tt>; one may also use their respective super classes, e.g., |
| "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>, |
| <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus, |
| the expression |
| </p> |
| |
| <source>"NOP+(ILOAD|ALOAD)*"</source> |
| |
| <p> |
| represents a piece of code consisting of at least one <tt>NOP</tt> |
| followed by a possibly empty sequence of <tt>ILOAD</tt> and |
| <tt>ALOAD</tt> instructions. |
| </p> |
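| |
| <p> |
| The meaning of this pattern can be tried out with |
| <tt>java.util.regex</tt> on a textual list of opcode names. The |
| sketch below only illustrates the idea and is not necessarily how |
| the BCEL search is implemented; <tt>OpcodePatternSketch</tt> and |
| <tt>firstMatch()</tt> are hypothetical names, and the encoding of |
| one blank after every opcode is an assumption of this example. |
| </p> |
| |

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not BCEL): matches "NOP+(ILOAD|ALOAD)*" over opcode names. */
final class OpcodePatternSketch {
    // Every opcode name in the encoded text is followed by one blank,
    // so each alternative in the pattern includes that blank.
    private static final Pattern PATTERN =
        Pattern.compile("\\b(NOP )+((ILOAD|ALOAD) )*");

    /** Returns the first matching run of opcode names, or null if there is none. */
    static String firstMatch(String... opcodes) {
        String code = String.join(" ", opcodes) + " ";
        Matcher m = PATTERN.matcher(code);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(
            firstMatch("ICONST_0", "NOP", "NOP", "ILOAD", "ALOAD", "IRETURN"));
    }
}
```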
| |
| <p> |
| The <tt>search()</tt> method of class |
| <tt>org.apache.bcel.util.InstructionFinder</tt> gets a regular |
| expression and a starting point as arguments and returns an |
| iterator describing the area of matched instructions. Additional |
| constraints to the matching area of instructions, which cannot be |
| implemented via regular expressions, may be expressed via <em>code |
| constraint</em> objects. |
| </p> |
| |
| <h4>Example: Optimizing boolean expressions</h4> |
| <p> |
| In Java, the boolean values <tt>true</tt> and <tt>false</tt> are |
| mapped to 1 and 0, respectively. Thus, the simplest way to evaluate |
| boolean expressions is to push a 1 or a 0 onto the operand stack |
| depending on the truth value of the expression. But this way, the |
| subsequent combination of boolean expressions (with |
| <tt>&&</tt>, e.g.) yields long chunks of code that push |
| lots of 1s and 0s onto the stack. |
| </p> |
| |
| <p> |
| When the code has been finalized these chunks can be optimized |
| with a <em>peep hole</em> algorithm: An <tt>IfInstruction</tt> |
| (e.g. the comparison of two integers: <tt>if_icmpeq</tt>) that |
| either produces a 1 or a 0 on the stack and is followed by an |
| <tt>ifne</tt> instruction (branch if stack value ≠ 0) may be |
| replaced by the <tt>IfInstruction</tt> with its branch target |
| replaced by the target of the <tt>ifne</tt> instruction: |
| </p> |
| |
| <source> |
| CodeConstraint constraint = new CodeConstraint() { |
| public boolean checkCode(InstructionHandle[] match) { |
| IfInstruction if1 = (IfInstruction) match[0].getInstruction(); |
| GOTO g = (GOTO) match[2].getInstruction(); |
| return (if1.getTarget() == match[3]) && |
| (g.getTarget() == match[4]); |
| } |
| }; |
| |
| InstructionFinder f = new InstructionFinder(il); |
| String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)"; |
| |
| for (Iterator e = f.search(pat, constraint); e.hasNext(); ) { |
| InstructionHandle[] match = (InstructionHandle[]) e.next(); |
| ... |
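| // Update target: redirect the first branch to the target of the trailing IFEQ/IFNE |
| BranchInstruction last = (BranchInstruction) match[5].getInstruction(); |
| ((BranchInstruction) match[0].getInstruction()).setTarget(last.getTarget()); |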
| ... |
| try { |
| il.delete(match[1], match[5]); |
| } catch (TargetLostException ex) { |
| ... |
| } |
| } |
| </source> |
| |
| <p> |
| The applied code constraint object ensures that the matched code |
| really corresponds to the targeted expression pattern. Subsequent |
| application of this algorithm removes all unnecessary stack |
| operations and branch instructions from the byte code. If any of |
| the deleted instructions is still referenced by an |
| <tt>InstructionTargeter</tt> object, the reference has to be |
| updated in the <tt>catch</tt>-clause. |
| </p> |
| |
| <p> |
| <b>Example application:</b> |
| The expression: |
| </p> |
| |
| <source> |
| if ((a == null) || (i < 2)) |
| System.out.println("Ooops"); |
| </source> |
| |
| <p> |
| can be mapped to both of the chunks of byte code shown in <a |
| href="#Figure 6">Figure 6</a>. The left column represents the |
| unoptimized code while the right column displays the same code |
| after the peep hole algorithm has been applied: |
| </p> |
| |
| <p align="center"><a name="Figure 6"> |
| <table> |
| <tr> |
| <td valign="top"><pre> |
| 5: aload_0 |
| 6: ifnull #13 |
| 9: iconst_0 |
| 10: goto #14 |
| 13: iconst_1 |
| 14: nop |
| 15: ifne #36 |
| 18: iload_1 |
| 19: iconst_2 |
| 20: if_icmplt #27 |
| 23: iconst_0 |
| 24: goto #28 |
| 27: iconst_1 |
| 28: nop |
| 29: ifne #36 |
| 32: iconst_0 |
| 33: goto #37 |
| 36: iconst_1 |
| 37: nop |
| 38: ifeq #52 |
| 41: getstatic System.out |
| 44: ldc "Ooops" |
| 46: invokevirtual println |
| 52: return |
| </pre></td> |
| <td valign="top"><pre> |
| 10: aload_0 |
| 11: ifnull #19 |
| 14: iload_1 |
| 15: iconst_2 |
| 16: if_icmpge #27 |
| 19: getstatic System.out |
| 22: ldc "Ooops" |
| 24: invokevirtual println |
| 27: return |
| </pre></td> |
| </tr> |
| </table> |
| </a> |
| </p> |
| </subsection> |
| </section> |
| </body> |
| </document> |