Calling all gurus

Tue Sep 8 13:23:51 GMT 1998

"Mike Pitts" <mpitts at mail.emi.net> wrote:
> as well as tell me what the following routine does
>
> F047    3C          PSHX                ; Store X
> F048    30          TSX                 ; X = (SP) + 1
> F049    A3 00       SUBD        $00,x   ; D = D - X[0]
> F04B    37          PSHB                ; store B
> F04C    E6 00 25    LDAB        $00,y   ; B = Y[0]
> F04F    25 03       BLO         $F054   ; if D < 0 goto $F054

> F051    3D          MUL                 ; D = A * B
> F052    20 04       BRA         $F058   ; goto $F058

> F054    3D          MUL                 ; D = A * B
> F055    A0 00 E3    SUBA        $00,y   ; A = A - Y[0]

> F058    E3 00       ADDD        $00,x   ; D = D + X[0]
> F05A    ED 00       STD         $00,x   ; X[0] = D
> F05C    32          PULA                ; get A from stack (was B)
> F05D    E6 00 3D    LDAB        $00,y   ; B = Y[0]
> F060    3D          MUL                 ; D = A * B
> F061    A9 01       ADCA        $01,x   ; A = A + X[1] + carry
> F063    16          TAB                 ; B = A
> F064    A6 00       LDAA        $00,x   ; A = X[0]
> F066    89 00       ADCA        #$00    ; A = A + carry
> F068    38          PULX                ; restore X
> F069    39          RTS                 ; return

/* Compiler dependent types go here.  The two below are typical choices. */
typedef unsigned short uint16;  /* must be 16 bits wide */
typedef unsigned long  uint32;  /* must be at least 32 bits wide */

uint16 Blend(uint16 numD, uint16 numX, unsigned char *ratio)
    {
    uint32 temp, portionD, portionX;

    portionD = *ratio;          /* weight of the D input, out of 256 */
    portionX = 256 - *ratio;    /* weight of the X input, out of 256 */

    temp = (numD * portionD) + (numX * portionX);
    return (uint16)((temp + 128) >> 8);     /* round, then scale back */
    }

Yes, really!  Now let's do a blow-by-blow analysis:

First, put the contents of X on the stack, where they can be accessed
more easily.  The stack address of the saved value is then placed into X
so that $00,x can be used to access it.

    PSHX
    TSX

Now compute the difference between the X and D inputs.  Save the low
order byte on the stack for later.  Working on the difference means that
only one 8x16 multiply will be needed.

    SUBD  $00,x
    PSHB
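
The reason one multiply is enough comes from rearranging the weighted
sum: D*r + X*(256-r) is the same number as 256*X + (D-X)*r.  Here is a
minimal C sketch of that identity.  The function names are mine (they do
not come from the ROM), and plain long arithmetic stands in for the
byte-wise work the HC11 has to do.

```c
/* Two multiplies: the straightforward weighted sum.
   (Illustrative names, not from the original routine.) */
long blend_two_mul(long d, long x, long r)
{
    return d * r + x * (256 - r);
}

/* One multiply: start from 256*X and add the scaled signed difference. */
long blend_one_mul(long d, long x, long r)
{
    return 256 * x + (d - x) * r;
}

/* Spot-check that both forms agree, including cases where D < X.
   Returns 1 when every sampled combination matches. */
int blend_forms_agree(void)
{
    static const long v[] = { 0, 1, 255, 256, 4660, 32768, 65535 };
    static const long r[] = { 0, 1, 64, 128, 255 };
    unsigned i, j, k;

    for (i = 0; i < 7; i++)
        for (j = 0; j < 7; j++)
            for (k = 0; k < 5; k++)
                if (blend_two_mul(v[i], v[j], r[k]) !=
                    blend_one_mul(v[i], v[j], r[k]))
                    return 0;
    return 1;
}
```

The catch is that the difference can be negative, which is what the
next two cases deal with.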

If D >= X, in other words, if the difference is positive, multiply the
upper byte of the difference (in A) by the blend ratio.  Afterwards, the
value in D is aligned with bits 2^23 through 2^8 of the full 24 bit
product (of the difference and the blend ratio).

    LDAB  $00,y
    BLO   @0
    MUL
    BRA   @1

If D < X, multiply the upper byte of the difference by the blend ratio. 
The subtract fixes the result of the unsigned multiply instruction to
account for the signed (negative) operand.  (Essentially, 256 is added
to A to make it positive, the multiply is performed on unsigned/positive
operands, and then 256 times the multiplier (old B) is subtracted to
arrive at the proper value.)

@0  MUL
    SUBA  $00,y
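
A rough C model of this fix-up, assuming the borrow from the earlier
SUBD is passed in as a flag (the function name and parameters are my own
invention for illustration):

```c
/* Multiply the upper difference byte by the ratio, the way the routine
   does.  a is the raw byte in the A register, ratio is the unsigned
   multiplier, borrow is nonzero when the SUBD found D < X.
   Returns the 16-bit contents of D afterwards. */
unsigned fixup_mul(unsigned a, unsigned ratio, int borrow)
{
    unsigned d = (a & 0xFF) * (ratio & 0xFF);   /* MUL: unsigned 8x8 */

    if (borrow)                       /* BLO taken: difference negative   */
        d -= (ratio & 0xFF) << 8;     /* SUBA $00,y: minus 256 * ratio    */
    return d & 0xFFFF;                /* D is only 16 bits wide           */
}
```

For example, an upper byte of $FF with a borrow stands for -1, so with a
ratio of 128 the result is $FF80, which is -128 in 16-bit two's
complement.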

Add this portion of the product to the original X input.  This
effectively performs the >>8 ahead of time, on the (as yet) unfinished
product calculation.  If the original D >= X, then the value on the
stack is increased most of the way towards the final function result.
If the original D < X, then the value on the stack is decreased past the
final result - the final stage will increase it back up as needed.

@1  ADDD  $00,x
    STD   $00,x

Now multiply the (saved) lower byte of the difference by the ratio. 
This generates a result that needs to be added to bits 2^15 through 2^0
of the full 24 bit product.  Because this product will be >>8, only the
result in A is fully needed.  The low result byte in B will be used just
for rounding.  The 68HC11 will conveniently place this rounding info
into the C flag.

    PULA
    LDAB  $00,y
    MUL
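
The rounding trick can be sketched in C like this.  It relies on the
68HC11 MUL instruction copying bit 7 of the result (bit 7 of B) into the
carry flag.  The helper names are made up for the illustration.

```c
/* The carry flag as MUL leaves it: bit 7 of the 16-bit product. */
unsigned mul_carry(unsigned a, unsigned b)
{
    return (((a & 0xFF) * (b & 0xFF)) >> 7) & 1;
}

/* Round-to-nearest >>8 done the routine's way: keep the high byte of
   the product and add the carry.  Same value as (product + 128) >> 8. */
unsigned round_shift8(unsigned a, unsigned b)
{
    unsigned p = (a & 0xFF) * (b & 0xFF);

    return (p >> 8) + mul_carry(a, b);
}
```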

Sum the rest of the product into the X number.  Rounding is rippled up
through the upper byte.

    ADCA  $01,x
    TAB
    LDAA  $00,x
    ADCA  #$00

Clear off the intermediate value stored on the stack, and return to the
caller.  Even though this looks like a restore of X, it isn't, because
that stack location has been modified.  So, the function result is in D,
X has been trashed, and Y has not been modified.

    PULX
    RTS
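
To double-check the walk-through, here is a sketch of the routine
modelled one step at a time in C, next to the closed form used by the
Blend() function at the top.  This is my own model of the registers and
flags, not anything from the ROM; all names are illustrative.

```c
/* Step-by-step model.  d_in and x_in are D and X on entry, ratio is the
   byte that Y points at.  Returns D on exit. */
unsigned blend_hc11(unsigned d_in, unsigned x_in, unsigned ratio)
{
    unsigned long diff, d, slot, p;
    unsigned carry, lo, hi;

    d_in &= 0xFFFF;  x_in &= 0xFFFF;  ratio &= 0xFF;

    slot  = x_in;                                  /* PSHX / TSX          */
    diff  = ((unsigned long)d_in - x_in) & 0xFFFF; /* SUBD $00,x          */
    carry = d_in < x_in;                           /* C set on borrow     */
    lo    = (unsigned)(diff & 0xFF);               /* PSHB                */

    d = (diff >> 8) * ratio;                       /* LDAB $00,y / MUL    */
    if (carry)                                     /* BLO @0              */
        d -= (unsigned long)ratio << 8;            /* SUBA $00,y          */
    slot = (d + slot) & 0xFFFF;                    /* ADDD / STD $00,x    */

    p     = (unsigned long)lo * ratio;             /* PULA / LDAB / MUL   */
    carry = (unsigned)((p >> 7) & 1);              /* MUL: C = bit 7 of B */

    lo    = (unsigned)(p >> 8) + (unsigned)(slot & 0xFF) + carry; /* ADCA $01,x / TAB */
    carry = lo >> 8;                               /* carry out of ADCA   */
    hi    = ((unsigned)(slot >> 8) + carry) & 0xFF; /* LDAA / ADCA #$00   */

    return (hi << 8) | (lo & 0xFF);                /* result in D         */
}

/* Reference: the closed form, (D*r + X*(256-r) + 128) >> 8. */
unsigned blend_ref(unsigned d_in, unsigned x_in, unsigned ratio)
{
    unsigned long t;

    ratio &= 0xFF;
    t = (unsigned long)(d_in & 0xFFFF) * ratio
      + (unsigned long)(x_in & 0xFFFF) * (256 - ratio);
    return (unsigned)(((t + 128) >> 8) & 0xFFFF);
}

/* Compare the two over a coarse grid of inputs; returns 1 if they agree. */
int blend_models_agree(void)
{
    unsigned long d, x;
    unsigned r;

    for (d = 0; d <= 0xFFFF; d += 257)
        for (x = 0; x <= 0xFFFF; x += 251)
            for (r = 0; r <= 255; r += 5)
                if (blend_hc11((unsigned)d, (unsigned)x, r) !=
                    blend_ref((unsigned)d, (unsigned)x, r))
                    return 0;
    return 1;
}
```

With D = 1000, X = 2000 and a ratio of 128, both give 1500.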

This function is likely used by loading a new 16 bit value into D, the
old filter contents into X, pointing Y at the blend ratio, invoking the
function, and then storing D into the filter.  A blend ratio of 0 means
that the "new" value is ignored and the filter value never changes.  A
blend ratio of 128 moves the filter halfway towards the "new" value.

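As a sketch of that usage pattern in C (my guess at how a caller would
use it, with made-up names; the blend formula is restated so the snippet
stands alone):

```c
/* One filter update: blend the new reading into the old filter value. */
unsigned filter_step(unsigned new_val, unsigned filt, unsigned char ratio)
{
    unsigned long t = (unsigned long)new_val * ratio
                    + (unsigned long)filt * (256 - ratio);

    return (unsigned)((t + 128) >> 8);
}

/* Run the filter n times against a constant input; returns the state. */
unsigned filter_run(unsigned new_val, unsigned filt, unsigned char ratio,
                    int n)
{
    while (n-- > 0)
        filt = filter_step(new_val, filt, ratio);
    return filt;
}
```

Starting from 0 with a constant input of 1000 and a ratio of 128, the
filter goes 500, 750, 875, and so on.  With a ratio of 0 it stays at 0.
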
When filtering 8 bit numbers, it can be useful to make the filter have 8
fractional bits.  In that case, put the 8 bit value into A, zero out B
(or set it to 128 aka half), and then use the upper byte of the filter
value as the "output".
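
A sketch of the 8 bit case in C, assuming the sample goes into the upper
byte and the lower byte is preset to 128.  The names and the choice of
128 over 0 are mine.

```c
/* Update a 16-bit filter state that carries 8 fractional bits.
   The 8-bit sample goes in the upper byte (A); the lower byte (B) is
   set to 128, i.e. one half. */
unsigned filter8_update(unsigned state, unsigned char sample,
                        unsigned char ratio)
{
    unsigned new_val = ((unsigned)sample << 8) | 128;
    unsigned long t = (unsigned long)new_val * ratio
                    + (unsigned long)state * (256 - ratio);

    return (unsigned)((t + 128) >> 8);
}

/* The 8-bit "output" is the upper byte of the filter state. */
unsigned char filter8_output(unsigned state)
{
    return (unsigned char)(state >> 8);
}
```

The fractional byte lets a small ratio keep accumulating even when each
step moves the filter by less than one count of the 8 bit output.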

BTW, there are several ways to write this function.  The optimal
implementation depends a lot on the target processor, desired rounding,
and so on.  The 'HC11 is not an easy architecture for this function.  GM
probably "paid" $1000 to $2000 for this one function.

-- 
Ludis Langens                               ludis (at) cruzers (dot) com
Mac, Fiero, & engine controller goodies:  http://www.cruzers.com/~ludis/