| \A{ndisasm} \i{Ndisasm} |
| |
| The Netwide Disassembler, NDISASM |
| |
| \H{ndisintro} Introduction |
| |
| |
| The Netwide Disassembler is a small companion program to the Netwide |
| Assembler, NASM. It seemed a shame to have an x86 assembler, |
| complete with a full instruction table, and not make as much use of |
| it as possible, so here's a disassembler which shares the |
| instruction table (and some other bits of code) with NASM. |
| |
| The Netwide Disassembler does nothing except produce |
| disassemblies of \e{binary} source files. Unlike \c{objdump}, NDISASM |
| has no understanding of object file formats, and unlike \c{debug} it |
| will not understand \c{DOS .EXE} files. It just |
| disassembles. |
| |
| |
| \H{ndisrun} Running NDISASM |
| |
| To disassemble a file, you will typically use a command of the form |
| |
| \c ndisasm -b {16|32|64} filename |
| |
| NDISASM can disassemble 16-, 32- or 64-bit code equally easily, |
| provided of course that you remember to specify which it is to work |
| with. If no \i\c{-b} switch is present, NDISASM works in 16-bit mode |
| by default. The \i\c{-u} switch (for USE32) also invokes 32-bit mode. |
| |
| Two more command line options are \i\c{-r} which reports the version |
| number of NDISASM you are running, and \i\c{-h} which gives a short |
| summary of command line options. |
| |
| |
| \S{ndiscom} Specifying the Input Origin |
| |
| To disassemble a \c{DOS .COM} file correctly, a disassembler must assume |
| that the first instruction in the file is loaded at address \c{0x100}, |
| rather than at zero. NDISASM, which assumes by default that any file |
| you give it is loaded at zero, will therefore need to be informed of |
| this. |
| |
| The \i\c{-o} option allows you to declare a different origin for the |
| file you are disassembling. Its argument may be expressed in any of |
| the NASM numeric formats: decimal by default; \c{hex} if it begins |
| with `\c{$}' or `\c{0x}' or ends in `\c{H}'; \c{octal} if it ends in |
| `\c{Q}'; and \c{binary} if it ends in `\c{B}'. |
| |
| Hence, to disassemble a \c{.COM} file: |
| |
| \c ndisasm -o100h filename.com |
| |
| will do the trick. |
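| |
| As a rough illustration of the numeric formats listed above, the |
| following Python sketch converts such an argument to an integer. The |
| function name \c{parse_nasm_number} is hypothetical, and the |
| case-insensitive handling of suffixes is an assumption based on the |
| \c{-o100h} example; it is not NDISASM's own parser. |
| |

```python
def parse_nasm_number(text: str) -> int:
    """Convert a number written in one of the formats described above.

    Hypothetical helper for illustration only; prefixes are checked
    before suffixes so that a value such as 0x1B is read as hex.
    """
    s = text.strip()
    lower = s.lower()
    if lower.startswith("0x"):
        return int(s[2:], 16)      # 0x prefix: hex
    if s.startswith("$"):
        return int(s[1:], 16)      # $ prefix: hex
    if lower.endswith("h"):
        return int(s[:-1], 16)     # H suffix: hex
    if lower.endswith("q"):
        return int(s[:-1], 8)      # Q suffix: octal
    if lower.endswith("b"):
        return int(s[:-1], 2)      # B suffix: binary
    return int(s, 10)              # no marker: decimal


if __name__ == "__main__":
    for arg in ("100h", "0x100", "$100", "400q", "256"):
        print(arg, "->", parse_nasm_number(arg))
```

| |
| Under these assumptions, every argument in the loop names the same |
| origin as \c{-o100h}. |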
| |
| |
| \S{ndissync} Code Following Data: Synchronization |
| |
| Suppose you are disassembling a file which contains some data which |
| isn't machine code, and \e{then} contains some machine code. NDISASM |
| will faithfully plough through the data section, producing machine |
| instructions wherever it can (although most of them will look |
| bizarre, and some may have unusual prefixes, e.g. `\c{FS OR AX,0x240A}'), |
| and generating `\c{DB}' instructions every so often if it's totally stumped. |
| Then it will reach the code section. |
| |
| Suppose NDISASM has just finished generating a strange machine |
| instruction from part of the data section, and its file position is |
| now one byte \e{before} the beginning of the code section. It's |
| entirely possible that another spurious instruction will get |
| generated, starting with the final byte of the data section and |
| swallowing the first bytes of the code section. The correct first |
| instruction in the code section will then not be seen, because |
| disassembly resumes partway through it. This isn't really ideal. |
| |
| To avoid this, you can specify a `\i{synchronization}' point, or indeed |
| as many synchronization points as you like (although NDISASM can |
| only handle 2147483647 sync points internally). The definition of a sync |
| point is this: NDISASM guarantees to hit sync points exactly during |
| disassembly. If it is thinking about generating an instruction which |
| would cause it to jump over a sync point, it will discard that |
| instruction and output a `\c{db}' instead. So it \e{will} start |
| disassembly exactly from the sync point, and so you \e{will} see all |
| the instructions in your code section. |
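| |
| The rule described above can be modelled with a short Python sketch. |
| It is a toy model, not NDISASM's code: \c{walk_with_sync} and the |
| \c{insn_length} callback (a stand-in for a real instruction decoder) |
| are hypothetical names, and emitting exactly one byte per `\c{db}' is |
| an assumption of the sketch. |
| |

```python
from typing import Callable, Iterable, List, Tuple


def walk_with_sync(
    origin: int,
    size: int,
    sync_points: Iterable[int],
    insn_length: Callable[[int], int],
) -> List[Tuple[int, str, int]]:
    """Model a linear pass that never steps over a sync point.

    insn_length is a hypothetical decoder: given an address, it returns
    how many bytes the instruction decoded there would occupy.
    """
    pending = sorted(set(sync_points))
    end = origin + size
    addr = origin
    listing = []
    while addr < end:
        # Forget sync points that have already been reached.
        while pending and pending[0] <= addr:
            pending.pop(0)
        limit = min(pending[0], end) if pending else end
        length = insn_length(addr)
        if length < 1 or addr + length > limit:
            # The instruction would cross the next sync point (or the
            # end of the input): discard it and emit a single db byte.
            listing.append((addr, "db", 1))
            addr += 1
        else:
            listing.append((addr, "insn", length))
            addr += length
    return listing


if __name__ == "__main__":
    # Pretend every instruction is two bytes long; sync at 0x103.
    for entry in walk_with_sync(0x100, 8, [0x103], lambda addr: 2):
        print(hex(entry[0]), entry[1], entry[2])
```

| |
| In this model the two-byte instruction that would start at \c{0x102} |
| is replaced by a `\c{db}', so the next instruction begins exactly at |
| the sync point \c{0x103}. |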
| |
| Sync points are specified using the \i\c{-s} option: they are measured |
| in terms of the program origin, not the file position. So if you |
| want to synchronize after 32 bytes of a \c{.COM} file, you would have to |
| do |
| |
| \c ndisasm -o100h -s120h file.com |
| |
| rather than |
| |
| \c ndisasm -o100h -s20h file.com |
| |
| As stated above, you can specify multiple sync markers if you need |
| to, just by repeating the \c{-s} option. |
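| |
| The arithmetic behind the example above is simply origin plus file |
| offset. A minimal Python sketch, using the hypothetical helper names |
| \c{sync_address} and \c{ndisasm_args}, and writing the numbers in the |
| `\c{0x}' form rather than with an `\c{H}' suffix: |
| |

```python
from typing import List


def sync_address(origin: int, file_offset: int) -> int:
    """Sync points are measured from the program origin, so a position
    given as a file offset has the origin added to it."""
    return origin + file_offset


def ndisasm_args(origin: int, file_offset: int, filename: str) -> List[str]:
    """Build an argument list using only the -o and -s options
    described in this section (hypothetical helper)."""
    return [
        "ndisasm",
        f"-o0x{origin:x}",
        f"-s0x{sync_address(origin, file_offset):x}",
        filename,
    ]


if __name__ == "__main__":
    # 32 bytes into a .COM file loaded at 0x100
    print(" ".join(ndisasm_args(0x100, 32, "file.com")))
```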
| |
| |
| \S{ndisisync} Mixed Code and Data: Automatic (Intelligent) Synchronization |
| \I\c{auto-sync} |
| |
| Suppose you are disassembling the boot sector of a \c{DOS} floppy (maybe |
| it has a virus, and you need to understand the virus so that you |
| know what kinds of damage it might have done to you). Typically, this |
| will contain a \c{JMP} instruction, then some data, then the rest of the |
| code. So there is a very good chance of NDISASM being \e{misaligned} |
| when the data ends and the code begins. Hence a sync point is |
| needed. |
| |
| On the other hand, why should you have to specify the sync point |
| manually? What you'd do in order to find where the sync point would |
| be, surely, would be to read the \c{JMP} instruction, and then to use |
| its target address as a sync point. So can NDISASM do that for you? |
| |
| The answer, of course, is yes: using either of the synonymous |
| switches \i\c{-a} (for automatic sync) or \i\c{-i} (for intelligent |
| sync) will enable \c{auto-sync} mode. Auto-sync mode automatically |
| generates a sync point for any forward-referring PC-relative jump or |
| call instruction that NDISASM encounters. (Since NDISASM is one-pass, |
| if it encounters a PC-relative jump whose target has already been |
| processed, there isn't much it can do about it...) |
| |
| Only PC-relative jumps are processed, since an absolute jump is |
| either through a register (in which case NDISASM doesn't know what |
| the register contains) or involves a segment address (in which case |
| the target code isn't in the same segment that NDISASM is working |
| in, and so the sync point can't be placed anywhere useful). |
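| |
| To make the idea concrete, here is a hedged Python sketch of how a |
| sync point could be derived from one particular PC-relative jump, the |
| x86 short \c{JMP} (opcode \c{EB} followed by a signed 8-bit |
| displacement measured from the end of the instruction). The function |
| names are hypothetical, the sketch treats "forward" as meaning a |
| target at or beyond the following instruction, and it does not claim |
| to reproduce NDISASM's internal logic. |
| |

```python
from typing import Optional

SHORT_JMP_LENGTH = 2  # opcode byte EB plus one displacement byte


def short_jump_target(address: int, rel8: int) -> int:
    """Target of a short JMP located at `address`.

    rel8 is the raw displacement byte (0..255); it is sign-extended and
    added to the address of the next instruction.
    """
    displacement = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return address + SHORT_JMP_LENGTH + displacement


def auto_sync_point(address: int, rel8: int) -> Optional[int]:
    """Return a sync point for a forward jump, or None for a backward
    one, whose target a one-pass disassembler has already passed."""
    target = short_jump_target(address, rel8)
    next_address = address + SHORT_JMP_LENGTH
    return target if target >= next_address else None


if __name__ == "__main__":
    # Illustrative boot sector loaded at 0x7C00 and starting with EB 3C.
    print(hex(auto_sync_point(0x7C00, 0x3C)))
    # A jump back to itself (EB FE) yields no sync point.
    print(auto_sync_point(0x7C10, 0xFE))
```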
| |
| For some kinds of file, this mechanism will automatically put sync |
| points in all the right places, and save you from having to place |
| any sync points manually. However, it should be stressed that |
| auto-sync mode is \e{not} guaranteed to catch all the sync points, and |
| you may still have to place some manually. |
| |
| Auto-sync mode doesn't prevent you from declaring manual sync |
| points: it just adds automatically generated ones to the ones you |
| provide. It's perfectly feasible to specify \c{-i} \e{and} some \c{-s} |
| options. |
| |
| Another caveat with auto-sync mode is that if, by some unpleasant |
| fluke, something in your data section should disassemble to a |
| PC-relative call or jump instruction, NDISASM may obediently place a |
| sync point in a totally random place, for example in the middle of |
| one of the instructions in your code section. So you may end up with |
| a wrong disassembly even if you use auto-sync. Again, there isn't |
| much I can do about this. If you have problems, you'll have to use |
| manual sync points, or use the \c{-k} option (documented below) to |
| suppress disassembly of the data area. |
| |
| |
| \S{ndisother} Other Options |
| |
| The \i\c{-e} option skips a header on the file, by ignoring the first N |
| bytes. This means that the header is \e{not} counted towards the |
| disassembly offset: if you give \c{-e10 -o10}, disassembly will start |
| at byte 10 in the file, and this will be given offset 10, not 20. |
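| |
| The relationship between file position, skipped header and origin can |
| be written down as a one-line calculation. The following Python |
| sketch uses the hypothetical name \c{displayed_offset}; it only |
| restates the rule above and is not taken from NDISASM. |
| |

```python
def displayed_offset(file_position: int, header_bytes: int, origin: int) -> int:
    """Offset shown for the byte at `file_position` when `header_bytes`
    are skipped (as with -e) and the origin is `origin` (as with -o).

    The header is not counted: the first byte after it gets the origin.
    """
    if file_position < header_bytes:
        raise ValueError("byte lies inside the skipped header")
    return origin + (file_position - header_bytes)


if __name__ == "__main__":
    # -e10 -o10: byte 10 of the file is shown at offset 10, not 20.
    print(displayed_offset(10, header_bytes=10, origin=10))
```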
| |
| The \i\c{-k} option is provided with two comma-separated numeric |
| arguments, the first of which is an assembly offset and the second |
| is a number of bytes to skip. This \e{will} count the skipped bytes |
| towards the assembly offset: its use is to suppress disassembly of a |
| data section which wouldn't contain anything you wanted to see |
| anyway. |
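| |
| By contrast with \c{-e}, skipped bytes here still advance the offset. |
| The Python sketch below models which address ranges would remain to |
| be disassembled; \c{disassembled_ranges} is a hypothetical name, and |
| the model assumes each skip is given as an (offset, byte count) pair |
| as described above. |
| |

```python
from typing import List, Tuple


def disassembled_ranges(
    origin: int, size: int, skips: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) address ranges left after skipping.

    Each skip is (assembly_offset, byte_count). Skipped bytes still
    count towards the offset, so addresses after a skip are unchanged.
    """
    end = origin + size
    addr = origin
    ranges = []
    for start, count in sorted(skips):
        if start > addr:
            ranges.append((addr, min(start, end)))
        addr = max(addr, start + count)
    if addr < end:
        ranges.append((addr, end))
    return ranges


if __name__ == "__main__":
    # A 0x100-byte image at origin 0x100 with 0x20 data bytes at 0x120.
    for start, stop in disassembled_ranges(0x100, 0x100, [(0x120, 0x20)]):
        print(hex(start), hex(stop))
```

| |
| In this model the code after the data area is still listed from |
| \c{0x140}, exactly where it would appear without the skip. |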
| |
| |