C for embedded controllers

Dan J. Declerck declrckd at cig.mot.com
Mon Nov 27 15:43:08 GMT 1995


> From: Ed Lansinger <elansi01 at mpg.gmpt.gmeds.com>

> |I have yet to see a meaningful difference in productivity for small
> |projects such as I believe we are discussing here which is the
> |result of language alone, assuming equal familiarity and practice with
> |two languages in question.  The real differences seem to come through
> |design tools, support tools such as simulators and debuggers, code re-use,
> |and (for larger projects) effective project management.  But note that
> |assembly language precludes none of that.

> Ed, I agree for the most part; however, if we again limit our scope to the
> time needed to code and debug then I still maintain that a higher
> level language such as C is superior. (IMHO --- the M stands for "my"). 

> |I agree with the sentiment that a high level debugger is a Good Thing,
> |but I disagree with "unnecessary" and I don't find it "very hard".
> |When debugging EFI in C or C++ I find I often go down to the assembly
> |language level.  

> In case you are not familiar with something like gdb: it debugs at the
> assembly language level too (as it must, since ultimately C is assembly).
> My original comment should not have been interpreted as it being
> unnecessary to examine C generated assembly but rather a high level
> debugger encompasses the functionality of low level monitors while
> maintaining the higher level advantages such as symbol tables.
> The lower level monitor is replaced by the higher level debugger.

> From: "Dan J. Declerck" <declrckd at cig.mot.com>

> |Yes, but if you use a 68332 to the max of its capabilities, there are MANY
> |performance mods you can do in assembly that you can't do in C.
> |
> |Don't get me wrong, I do most of my stuff in C, and use assembly when I
> |want that extra burst of performance. For example:
> |
> |	xdef	memcpy
> |	
> |memcpy
> |	movem.l	a0-a1/d0,-(a7)
> |	move.l	(24,a7),d0	* size
> |	movea.l	(16,a7),a0	* dest
> |	movea.l	(20,a7),a1	* src
> |
> |	subq.l	#1,d0
> |loop
> |	move.b	(a1)+,(a0)+
> |	dbra	d0,loop
> |
> |	movem.l	(a7)+,a0-a1/d0
> |	rts
> |

> OK, let's try:

> #include <string.h>

> void *memcpy(void *dest, const void *src, size_t n) {
>   do {
>     *(char *)(dest++) = *(char *)(src++);
>   } while (--n);
>   return (dest);
> }
>    7                    memcpy:
>    8 0000 4E56 0000             link.w %a6,#0
>    9 0004 226E 0008             move.l 8(%a6),%a1
>   10 0008 206E 000C             move.l 12(%a6),%a0
>   11 000c 202E 0010             move.l 16(%a6),%d0
>   12                    .L7:
>   13 0010 12D8                  move.b (%a0)+,(%a1)+
>   14 0012 5380                  subq.l #1,%d0
>   15 0014 66FA                  jbne .L7
>   16 0016 2009                  move.l %a1,%d0
>   17 0018 4E5E                  unlk %a6
>   18 001a 4E75                  rts
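
For comparison, a minimal sketch of a stricter version of the C routine quoted above. The name copy_bytes is hypothetical, chosen so the sketch does not redefine the library's memcpy. Unlike the quoted code, it avoids arithmetic on void * (a GNU extension), copies nothing when n is 0, and returns the original destination pointer, which is what ISO C specifies for memcpy. The code a compiler generates for it will not match the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* copy_bytes is a hypothetical name used for illustration only. */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;        /* byte pointers: no void * arithmetic */
    const unsigned char *s = src;

    /* A zero-length copy may pass anything; otherwise both must be valid. */
    assert(n == 0 || (dest != NULL && src != NULL));

    while (n--) {                   /* tests n first, so n == 0 copies nothing */
        *d++ = *s++;
    }
    return dest;                    /* original destination, not the end */
}
```

The while (n--) form tests the count before the first byte is moved, where the quoted do/while loop always copies at least one byte.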

> execution time:

> hand assembly:
> 			head/tail/cycles
> move.b	(a1)+,(a0)+     2    2     6(0/1/1)
> dbra	d0,loop		6    -2    10(0/2/0)

> 6+10-min(-2,2)-min(2,6)=6+10+2-2=16 clock cycles

> gcc -O3:
> 			head/tail/cycles
> move.b (%a0)+,(%a1)+	2    2    6(0/1/1)
> subq.l #1,%d0		0    0    2(0/1/0)
> jbne .L7		2    -2   8(0/2/0)

> 6+2+8-min(-2,2)-min(2,0)-min(0,2)=6+2+8+2-0-0=18 clock cycles

> In this case C and assembly are the same size (the extra 2 bytes for
> the return should be added to the above assembly); however, the assembly
> is 11% faster as a result of better pipeline utilization.
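
The head/tail arithmetic quoted above can be written out as a small C helper. This is only a sketch of the sums as they appear in the post (total cycles minus the overlap between each instruction's tail and the next instruction's head, wrapping around the loop); it is not a CPU32 simulator, and struct timing and loop_cycles are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Head, tail and cycle figures for one instruction, as in the tables above. */
struct timing {
    int head;
    int tail;
    int cycles;
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Cycles for one pass through a loop of n instructions: the sum of the
   cycle counts minus, for each instruction, min(its tail, head of the
   next one), where the last instruction wraps back to the first. */
int loop_cycles(const struct timing *insn, size_t n)
{
    int total = 0;

    assert(insn != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        const struct timing *next = &insn[(i + 1) % n];
        total += insn[i].cycles - min_int(insn[i].tail, next->head);
    }
    return total;
}
```

Fed the two tables above, it gives 16 for the hand-written loop and 18 for the gcc loop, so the hand-written loop needs 2/18, or about 11%, fewer cycles per byte by this estimate.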

It's more than 11% faster (about 20%)! You forgot to REMOVE the instruction fetch
cycles for the MOVE, since after the first iteration it is no longer fetched.
I challenge you to use an emulator/logic analyzer and look at the bus cycles.

> A good
> place to use assembly.  I wrote this in maybe 45 seconds... it's
> moderately slower but is of the same size.... Dan, how long did you
> spend optimizing this by hand *AND* did you just luck-out in the
> pipeline optimization? 

No, it took me about 30 seconds to write. The most difficult part was getting
the stack offsets correct for loading the parameters into registers (understand
that I have over 10 years' experience with the 68K).

I love C, and use it 99% of the time. There are places where it doesn't belong.
I wrote an OS for a cellular phone product. This OS was 1/8th the size of VRTX,
and was a LOT faster. The OS had only 4 assembly routines, the trickiest being
the task switcher. The other 40 or so routines were in C.

In my opinion, use C, and then call assembly for ultimate performance.


> The 68000 core would run both of these codes at the
> ***same*** speed!!!!!!!!! (check my math Ed :).

Yes, but we're using a CPU32 core now, aren't we?? (last I checked, the 68332 used this core)


> (****emphasis**** added for clarity)

> |I believe this snippet of code is the fastest memcpy() function on the
> |planet.

> You should have a look at the memcpy in glibc. It looks at the CPU
> data bus size and does word or long transfers for the case of the m68k. It
> should be nearly twice as fast as your code for large moves. 

A majority of memcpy() calls are short, but it would be trivial to go larger
and do it faster.

> No
> disrespect intended, but the memcpy written in C for glibc may be twice
> as fast as what you perceived as optimal assembly code... I
> would have to check this, but I believe I have made my point.
> Beware of those who promote assembly language :):).
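
To make the word/long transfer idea concrete, here is a minimal sketch in C. It is not the glibc source: copy_longs and word_t are hypothetical names, and the may_alias attribute is a GCC extension used so that the 32-bit accesses are allowed to alias byte buffers. It moves 4 bytes at a time only when source and destination share the same alignment, and falls back to byte moves otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* word_t: a 32-bit type that may alias other objects. Without the GCC
   attribute the wide accesses below are not strictly conforming ISO C. */
#if defined(__GNUC__)
typedef uint32_t __attribute__((__may_alias__)) word_t;
#else
typedef uint32_t word_t;
#endif

/* copy_longs is a hypothetical name used for illustration only. */
void *copy_longs(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (dest != NULL && src != NULL));

    /* The wide path is usable only if both pointers have the same
       offset within a 4-byte word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        /* Byte moves until both pointers are 4-byte aligned. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Long moves: one transfer per 4 bytes. */
        while (n >= 4) {
            *(word_t *)d = *(const word_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Remaining bytes, or the whole copy if alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

For short copies the alignment checks can cost more than they save, which is the trade-off behind the remark above that most memcpy() calls are short.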

> |You can't:
> |
> |Modify the SR in C

If it's not in K&R, it's an extension and is not portable from compiler to compiler.

Avoid the C programmer who cannot switch compilers in a day or less of work. :)


> yes you can... look at the source for RTEMS... It has some of the best
> embedded C techniques that I've seen.

> |Conditionally branch on some overflow conditions in C, either.

> I agree with this, but it may not be a limitation.

Try doing bit interleaving for a serial bus without it.
Try doing precise math without it.

> |Use the Table lookup and interpolate instructions in the CPU32.

> not true...

What compiler emits TBLUN instructions??
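
For readers who have not used the table instructions, here is a rough C rendering of a byte-sized table lookup with linear interpolation. It assumes the operand layout I understand the CPU32 to use (upper byte of the 16-bit operand selects the entry, lower byte is the fraction in 1/256 steps toward the next entry). The name table_interp, the len parameter and the round-to-nearest step are assumptions for this sketch; it is not a bit-exact model of any particular instruction variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x bits 15..8: table index; x bits 7..0: fraction (n/256) of the way
   from table[index] to table[index + 1]. len is the number of entries. */
uint8_t table_interp(const uint8_t *table, size_t len, uint16_t x)
{
    size_t index = (size_t)(x >> 8);
    uint32_t frac = x & 0xFFu;

    /* The entry after the selected one must also exist. */
    assert(table != NULL && index + 1 < len);

    uint32_t a = table[index];
    uint32_t b = table[index + 1];

    /* Weighted average of the two entries, rounded to nearest.
       Written this way so every term stays non-negative. */
    return (uint8_t)((a * (256u - frac) + b * frac + 128u) / 256u);
}
```

With the table {0, 100, 200}, an operand of 0x0080 lands halfway between the first two entries and yields 50. The C version needs a shift, a mask, two loads, two multiplies and a divide to say what the instruction set expresses in one opcode.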


> *********************

> If this thread continues, could we try to focus on how to write better
> embedded C and/or assembly?

> loop structures, frame pointer omission, ....

In summary, I understand why automotive electronics components are so expensive:

1) Little SW/HW re-use in the industry.
2) Long development cycles.
3) No software artisans developing product.


I ought to go into business selling black boxes to the big 3...

-Dan
 



-- 
=> Dan DeClerck                        | EMAIL: declrckd at cig.mot.com      <=
=> Motorola Cellular CSD               |                                  <=
=>"The truth to CDMA... is spreading"  | Phone: (708) 632-4596            <=
----------------------------------------------------------------------------


