M6811DIS Disassembler

Thu Jul 15 20:37:18 GMT 1999

jgwynne at mrcday.com wrote:
> 
> | relative, it won't matter if the matched code is relocated or not...
> 
> yes... depending on how you make the signature, it sounds similar to
> what I was thinking but without the function boundaries. I wrote a
> couple of classes over the weekend to hold and compare assembly source
> code. At the moment, I'm thresholding correlation length to find common
> code.

Are you comparing or ignoring operands?  It is very likely that the same
function in two different PROMs will reference different RAM/PROM
locations.  The purpose of those locations hasn't changed, just the
absolute address has moved around between different builds.
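
One way to ignore operands is to reduce each instruction to its mnemonic
plus the "shape" of its addressing mode before comparing.  The sketch
below is only an illustration: it assumes each line holds a bare
instruction (no label field, no comment), and the names mask_operands
and signature are made up for this example.  It also masks immediate
values, which may be more aggressive than you want.

```python
def mask_operands(line):
    """Reduce 'LDAA $0123' to 'LDAA addr', 'STAA 2,X' to 'STAA off,X', etc."""
    fields = line.split()
    if not fields:
        return ""
    mnemonic = fields[0].upper()
    if len(fields) == 1:
        return mnemonic                      # inherent addressing, no operand
    operand = fields[1].upper()
    if operand.startswith("#"):
        shape = "#imm"                       # immediate value masked
    elif operand.endswith(",X") or operand.endswith(",Y"):
        shape = "off," + operand[-1]         # indexed: keep register, drop offset
    else:
        shape = "addr"                       # direct/extended/relative target
    return f"{mnemonic} {shape}"


def signature(lines):
    """Operand-independent signature of a code fragment."""
    return [mask_operands(line) for line in lines if line.strip()]


# The same routine from two hypothetical builds, with RAM moved around.
build_a = ["LDAA $0123", "ADDA #$10", "STAA 2,X", "RTS"]
build_b = ["LDAA $0456", "ADDA #$10", "STAA 4,X", "RTS"]
print(signature(build_a) == signature(build_b))
```
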

Years ago I wanted to compare disassembled 680x0 code.  I had a linker
xref from which to obtain the "public" function names.  Local labels
(sequentially numbered) were used as much as possible within functions. 
The remaining generated (non-local) labels were replaced with a hash
derived from the previous public label.  This way all absolute code
addresses were hidden.
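
A simplified sketch of that kind of label rewriting follows.  It is not
the original tool.  It assumes labels start in column 0, that generated
labels look like "L" plus four hex digits, and it uses "public name plus
an ordinal" as a stand-in for both the local numbering and the hash.
The routine name CalcSpark and the addresses are invented.

```python
import re

# Assumed form of disassembler-generated labels: L + 4 hex digits.
GENERATED = re.compile(r"\bL[0-9A-F]{4}\b")


def normalize_labels(lines, public_names):
    """Rename address-based labels relative to the preceding public label."""
    rename = {}
    current, count = "START", 0
    for line in lines:
        if not line or line[0].isspace():
            continue                          # no label on this line
        label = line.split()[0]
        if label in public_names:
            current, count = label, 0         # entering a new public function
        elif GENERATED.fullmatch(label):
            count += 1
            rename[label] = f"{current}.{count}"
    # Second pass: substitute definitions and references alike.
    return [GENERATED.sub(lambda m: rename.get(m.group(0), m.group(0)), line)
            for line in lines]


rev_a = ["CalcSpark  TSTA", "           BEQ LE012", "           INCA", "LE012      RTS"]
rev_b = ["CalcSpark  TSTA", "           BEQ LE4A0", "           INCA", "LE4A0      RTS"]
print(normalize_labels(rev_a, {"CalcSpark"}) == normalize_labels(rev_b, {"CalcSpark"}))
```

After this pass the two revisions are identical text, so an ordinary
file compare sees no difference where only addresses have moved.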

Because the above was done in text files, a simple text file compare
program could find differences between code revisions.  Overall, this
didn't work too well.  Usually, a one or two line bug fix would have
effects throughout a function.  The result is that I got buried in
output from the file diff utility.

Tom Sharpe wrote:
>
> As you are
> planning to identify patterns, how about translating some of them into
> macros and "shorten" the code.

The disassemblies on my web pages use macros for common code fragments. 
For 6801 code, bit (flag aka boolean) manipulation macros are useful. 
Also useful (for all 68xx) is a set of macros that clip
over/underflows.  In other words,  A := MIN(A + B, 255)  and
A := MAX(0, A - B).  This hides many branch opcodes and the labels they
require.
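
To make the clipping semantics concrete, here is a small model of what
such macros compute on an 8-bit accumulator.  The instruction sequences
named in the comments (add, branch on carry clear, load $FF or clear)
are an assumption about a typical expansion, not taken from the actual
macro definitions.

```python
def add_clip(a, b):
    """Model of A := MIN(A + B, 255) using 8-bit add plus carry test."""
    total = a + b
    carry = total > 0xFF        # carry flag after the 8-bit add
    a = total & 0xFF            # accumulator wraps to 8 bits
    if carry:                   # assumed expansion: BCC skip / LDAA #$FF
        a = 0xFF
    return a


def sub_clip(a, b):
    """Model of A := MAX(0, A - B) using 8-bit subtract plus borrow test."""
    borrow = b > a              # carry flag is set on borrow after subtract
    a = (a - b) & 0xFF          # accumulator wraps to 8 bits
    if borrow:                  # assumed expansion: BCC skip / CLRA
        a = 0
    return a


print(add_clip(200, 100), sub_clip(50, 60))
```
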

> We can then change the
> code at will and generate new BIN files for downloading. We can change more
> than the tables......

I've got the '7170 code disassembled to the point where every memory
location, every flag bit, and every constant is referenced via an
EQUate.  The code also has ASSERT statements in every place where
EQUates are assumed to have certain relationships.  It assembles back to
the original binary.  If 2/3 of it wasn't in ROM, I'd have already been
making changes all over the place!

jgwynne at mrcday.com wrote:
> 
> This brings up a good question that I've been pondering for a couple
> of days. In comparing axxc (manual) and anht (auto), it looks like we
> could easily add a few lines of conditional assembly to have one
> commented source code that could generate several different bins (code
> segment, that is). So... what's the best way to organize and catalog
> the various routines. one source file and conditional compilation (ok
> if only a few changes between all of the codes), code library (gnu
> tools already in place), macro assembly,.... what else...

Conditional assembly is good.  That's what I've used to put all 15 '7170
PROMs into one disassembly file.  Generating readable conditional
assembly is hard to automate though.

Libraries wouldn't work well - only 1% of the code is modular enough to
place in a separate file.  When you are curious about how a
variable/constant is used, it is _very_ useful to have all the code in
one file.  This allows a text editor's FIND command to show all the uses.

Macros are good.  I've limited my macros to only those that appear to be
CISC opcodes.  Mixing a pseudo high level language with assembly on a
statement by statement basis would be hard to read.  I won't use a macro
where a "hidden" side effect is relied upon.  Some macros actually
generate no code - they simply make these side effects visible.

A function xref and/or calling tree is useful.
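
A function xref can be pulled out of the disassembly text with very
little code.  The sketch below assumes labels start in column 0, that
only JSR and BSR count as calls, and that the set of public function
names is already known.  The routine names are invented.

```python
from collections import defaultdict


def build_xref(lines, public_names):
    """Map each called routine to the sorted list of functions calling it."""
    callers = defaultdict(set)
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if not line[0].isspace():             # label field present
            if fields[0] in public_names:
                current = fields[0]           # now inside this function
            fields = fields[1:]
        if len(fields) >= 2 and fields[0].upper() in ("JSR", "BSR"):
            callers[fields[1]].add(current)
    return {callee: sorted(who) for callee, who in callers.items()}


source = [
    "Main    JSR ReadAdc",
    "        JSR Filter",
    "        RTS",
    "Filter  JSR ReadAdc",
    "        RTS",
    "ReadAdc LDAA ADR1",
    "        RTS",
]
xref = build_xref(source, {"Main", "Filter", "ReadAdc"})
for callee in sorted(xref):
    print(callee, "<-", ", ".join(xref[callee]))
```

Inverting the same table (caller to callees) gives the calling tree.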

All this leads to a question on a change I might make to my
disassemblies.  Because my HC11 assembler is really just a set of macros
running in a different assembler, I can change how opcodes are
assembled.  As ECM code gets more complex, GM is increasingly doing
this:

Foo ...
Bar ...
Zot ...
    ...
    LDX #Foo
    LDAA 0,X
    LDAB 1,X
    ADDA 2,X

When fully EQUated, the code should be:

    LDX #Foo
    LDAA Foo-Foo,X
    LDAB Bar-Foo,X
    ADDA Zot-Foo,X

This gets hard to read!  At least it is easy to search for all
references to Zot.  I could have the code promise the assembler that X
will contain a certain value.  Any extended mode operand which is within
255 bytes of the index value could then get assembled to use the indexed
addressing mode.  The code would be something like this:

    LDX #Foo
    DIRECTX Foo
    LDAA Foo,X
    LDAB Bar,X
    ADDA Zot,X
    DIRECTX

Would this be a good feature?  Or would it be too hard to run through
other assemblers?
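
For discussion, the operand decision such a DIRECTX directive implies
could look roughly like the sketch below.  The function name, the
symbol table layout, and the addresses are all hypothetical.  The one
hard fact used is that the HC11 indexed mode takes an unsigned 8-bit
offset (0 to 255).

```python
def choose_mode(symbols, operand, promised_x=None):
    """Pick indexed or extended addressing for one operand.

    symbols    -- dict of EQUate name -> absolute address
    operand    -- name of the operand being assembled
    promised_x -- name whose address X is promised to hold, or None
    """
    address = symbols[operand]
    if promised_x is not None:
        offset = address - symbols[promised_x]
        if 0 <= offset <= 0xFF:               # fits the 8-bit indexed offset
            return ("indexed", offset)
    return ("extended", address)              # fall back to a 16-bit address


symbols = {"Foo": 0x0100, "Bar": 0x0101, "Zot": 0x0102, "Far": 0x0300}
for name in ("Foo", "Bar", "Zot", "Far"):
    print(name, choose_mode(symbols, name, promised_x="Foo"))
```

Operands out of reach of the promised X value, and everything after the
closing DIRECTX, simply assemble in extended mode as before.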

-- 
Ludis Langens                               ludis (at) cruzers (dot) com
Mac, Fiero, & engine controller goodies:  http://www.cruzers.com/~ludis/