diy_disassembler

Sat Sep 26 12:33:40 GMT 1998

Lately, several people have been looking for disassemblers.  If you have
some programming ability, and a machine/environment to do it in, a
simple disassembler for an 8 bit CPU is easy to write.  Here are two
methods:

The first method takes about 30 lines of BASIC.  First, fill a 256
element string array with strings like "RTS", "LDA #nn", "STA hhll",
"BEQ *+rr".  The string at each array index is the mnemonic for the
(8 bit) opcode with that value.
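
As a rough illustration of the table described above, here is a minimal
sketch in Python rather than BASIC.  Only four 6502 opcodes are filled
in.  The names KNOWN, build_table and MNEMONICS, and the "???" filler
for unlisted opcodes, are assumptions made for this example.

```python
# A few 6502 opcodes, keyed by opcode value.  The lower-case placeholders
# (nn, hhll, rr) mark where operands get substituted later.
KNOWN = {
    0x60: "RTS",
    0xA9: "LDA #nn",
    0x8D: "STA hhll",
    0xF0: "BEQ *+rr",
}


def build_table(known):
    """Return a 256-entry list whose index is the opcode value."""
    table = ["???"] * 256          # "???" = not filled in (an assumption)
    for opcode, text in known.items():
        table[opcode] = text
    return table


MNEMONICS = build_table(KNOWN)
print(MNEMONICS[0xA9])
```

A real table needs an entry for every opcode the CPU defines.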

The BASIC program goes something like this:

  Grab a byte of object code
  Pull the corresponding mnemonic string from the array
  If the string contains "nn",
    replace the "nn" with a single byte operand from the object code
  If the string contains "hhll",
    replace the "hhll" with a two byte address operand
  And so on for each type of operand
  Print the original opcode address and the edited string
  Repeat till the end of object code data
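
The steps above might look something like this in Python.  This is a
sketch under stated assumptions, not a finished tool: the table holds
only four 6502 opcodes, the data is assumed to end on an instruction
boundary, operands are printed in hex with a "$" prefix, and "*" is
taken to mean the address of the branch opcode itself.  The names TABLE
and disassemble are made up for the example.

```python
# Only a few 6502 opcodes are filled in; a real table needs all of them.
TABLE = ["???"] * 256
TABLE[0x60] = "RTS"
TABLE[0xA9] = "LDA #nn"
TABLE[0x8D] = "STA hhll"
TABLE[0xF0] = "BEQ *+rr"


def disassemble(code, table, origin=0):
    """Return a list of (address, text) pairs, one per instruction.

    Assumes the data ends on an instruction boundary.
    """
    listing = []
    pc = 0
    while pc < len(code):
        address = origin + pc
        text = table[code[pc]]              # grab a byte, pull its string
        pc += 1
        if "hhll" in text:
            lo, hi = code[pc], code[pc + 1]  # 6502 stores the low byte first
            text = text.replace("hhll", "$%02X%02X" % (hi, lo))
            pc += 2
        elif "nn" in text:
            text = text.replace("nn", "$%02X" % code[pc])
            pc += 1
        elif "+rr" in text:
            offset = code[pc]
            if offset >= 0x80:               # the operand is a signed byte
                offset -= 0x100
            # The CPU counts from the byte after the operand, so add 2 to
            # get a displacement from the opcode address ("*").
            text = text.replace("+rr", "%+d" % (offset + 2))
            pc += 1
        listing.append((address, text))
    return listing


code = bytes([0xA9, 0x01, 0x8D, 0x00, 0x20, 0xF0, 0xFB, 0x60])
for address, text in disassemble(code, TABLE, origin=0x0600):
    print("%04X  %s" % (address, text))
```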

The program can be embellished to work from an object code file (instead
of PEEKing from memory), to handle a partial opcode at the end of the
file, to print out in hex instead of decimal, and so on.
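
Two of those embellishments, sketched in Python: reading the object
code from a file, and splitting off a partial opcode at the end so it
can be listed as plain data.  The function names, the ".byte" directive
and the four-opcode 6502 table are assumptions made for this example.

```python
TABLE = ["???"] * 256
TABLE[0x60] = "RTS"
TABLE[0xA9] = "LDA #nn"
TABLE[0x8D] = "STA hhll"
TABLE[0xF0] = "BEQ *+rr"

# Operand bytes implied by each placeholder.
OPERAND_SIZES = {"hhll": 2, "nn": 1, "rr": 1}


def operand_size(text):
    """Return how many operand bytes a mnemonic string asks for."""
    for placeholder, size in OPERAND_SIZES.items():
        if placeholder in text:
            return size
    return 0


def split_partial_tail(code, table):
    """Return (whole, tail): the complete instructions, then any
    leftover bytes of an instruction cut off by the end of the data."""
    pc = 0
    while pc < len(code):
        size = 1 + operand_size(table[code[pc]])
        if pc + size > len(code):
            break                   # not enough bytes left for this opcode
        pc += size
    return code[:pc], code[pc:]


def format_tail(tail, address):
    """List leftover bytes as data, one per line, in hex."""
    return ["%04X  .byte $%02X" % (address + i, b)
            for i, b in enumerate(tail)]


def load_object_file(path):
    """Read raw object code from a binary file."""
    with open(path, "rb") as f:
        return f.read()


whole, tail = split_partial_tail(bytes([0xA9, 0x01, 0x8D, 0x00]), TABLE)
print(format_tail(tail, 0x0600 + len(whole)))
```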

This works great for prefix-less CPUs like the 6800, 6801, 8080, and
6502.  A CPU which uses prefix bytes (68HC11, Z80, 6809) needs extra
processing of the prefixes.  Another 256 element string array for each
extra opcode page is the simplest solution.
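
A minimal sketch of the extra-page idea, using a handful of Z80
opcodes.  It covers only the CB and ED prefixes; the DD and FD prefixes
(and the DD CB / FD CB forms) need more handling than is shown here.
The names page, MAIN, PAGES and lookup are made up for the example.

```python
def page(entries):
    """Build one 256-entry opcode page from a sparse dict."""
    table = ["???"] * 256
    for opcode, text in entries.items():
        table[opcode] = text
    return table


# The unprefixed page, plus one extra page per prefix byte.
MAIN = page({0x00: "NOP", 0x3E: "LD A,nn", 0xC9: "RET"})
PAGES = {
    0xCB: page({0x00: "RLC B", 0x47: "BIT 0,A"}),
    0xED: page({0x44: "NEG", 0xB0: "LDIR"}),
}


def lookup(code, pc):
    """Return (mnemonic string, opcode length) for the bytes at code[pc]."""
    first = code[pc]
    if first in PAGES and pc + 1 < len(code):
        # A prefix byte selects the page; the next byte indexes into it.
        return PAGES[first][code[pc + 1]], 2
    return MAIN[first], 1


print(lookup(bytes([0xED, 0xB0]), 0))
```

Operand substitution then proceeds as before, starting after the opcode
length that lookup reports.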

(I've written an 8080/8085 disassembler this way at least a dozen
times.  It is easier to write it again from scratch than find the old
program.)

The second method uses something like the Unix tool "sed", aka "stream
editor".  sed is programmed with a script that contains a list of what
are essentially text editor Find&Replace commands (plus some other
commands).  Each line of the input file is processed by all the commands
before it is printed to the output, and the process is then repeated on
the next line.

To make a disassembler out of this, I start with a text file containing
a memory dump of the target code.  Each line of this file contains the
address (in hex), and the byte (in hex) at that address.  This file is
then passed through sed to get the mnemonics added.
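
A dump file of that shape can be produced with a few lines of Python.
The exact layout used here (four hex digits of address, one space, two
hex digits of data) is an assumption; the text above only says each
line holds an address and a byte, both in hex.

```python
def dump_lines(code, origin=0):
    """Return one 'AAAA BB' line per byte: hex address, then hex data."""
    return ["%04X %02X" % (origin + i, b) for i, b in enumerate(code)]


for line in dump_lines(bytes([0xA9, 0x01, 0x60]), origin=0x0600):
    print(line)
```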

The sed script is mostly just a long list of Find&Replaces, one for each
opcode.  The Finds only trigger when the line has enough hex data bytes
to fill all operand requirements.  The Replaces stuff operands into the
mnemonics as needed.  If the input line reaches the end of the script
without triggering a Replace, then its bytes are saved to combine with
the next line of input.  The next line is then passed to the Finds with
two (or more) data bytes on it.
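
The carry-forward scheme can be imitated in Python with the re module
standing in for sed.  This is a sketch, not the actual sed script: it
has rules for only three 6502 opcodes, it assumes "AAAA BB" input
lines, and the three-byte give-up limit (MAX_BYTES) is an addition so
that an opcode with no rule can't swallow the rest of the input.

```python
import re

# One Find&Replace per opcode.  Each Find is anchored so it only matches
# once the line holds every operand byte the opcode needs.
RULES = [
    (re.compile(r"^(\w{4}) 60$"), r"\1  RTS"),
    (re.compile(r"^(\w{4}) A9 (\w\w)$"), r"\1  LDA #$\2"),
    (re.compile(r"^(\w{4}) 8D (\w\w) (\w\w)$"), r"\1  STA $\3\2"),
]
MAX_BYTES = 3   # longest 6502 instruction; give up beyond this


def disassemble_dump(lines):
    """Run every line through RULES, carrying unmatched bytes forward."""
    out = []
    pending = ""
    for line in lines:
        line = line.strip()
        if pending:
            # Keep the first address, append only the new data byte.
            line = pending + " " + line.split()[1]
        for pattern, replacement in RULES:
            new_line, count = pattern.subn(replacement, line)
            if count:
                out.append(new_line)
                pending = ""
                break
        else:
            # Fell off the end of the rules: save the line for next time.
            pending = line
            if len(pending.split()) - 1 >= MAX_BYTES:
                out.append(pending)     # no rule fits; pass it through
                pending = ""
    if pending:
        out.append(pending)             # partial opcode at end of input
    return out


dump = ["0600 A9", "0601 01", "0602 8D", "0603 00", "0604 20", "0605 60"]
for text in disassemble_dump(dump):
    print(text)
```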

A caveat: standard Unix sed can't do this as described.  It can't retain
the unprocessed line for combining with the next input line.  I use a
sed "clone" with extra features that can carry the data forward.

Yes, this is quite slow, but I have a computer that is 1000 times faster
than the old 8 bit systems.  It is plenty fast enough to save
programming effort this way.  (This is sort of like putting a blown big
block in a Hummer instead of fine tuning a WWII Jeep.)

The disassemblies on my web pages were produced with this "sed" method.
The sed output was further post-processed with both shell scripts and
other sed scripts.

-- 
Ludis Langens                               ludis (at) cruzers (dot) com
Mac, Fiero, & engine controller goodies:  http://www.cruzers.com/~ludis/