Undeliverable Message

MAILER-DAEMON at seb.varian.com
Thu Nov 23 01:35:53 GMT 1995


To:            <diy_efi at coulomb.eng.ohio-state.edu>
Cc:            
Subject:       Undeliverable Message

Message not delivered to recipients below.  Press F1 for help with VNM
error codes.               

	VNM3043:  David Atchley at m2000@TFS_BLDG_6
To:            <diy_efi at coulomb.eng.ohio-state.edu>
Cc:            
Subject:       C for embedded controllers


From: Ed Lansinger <elansi01 at mpg.gmpt.gmeds.com>

|I have yet to see a meaningful difference in productivity for small
|projects such as I believe we are discussing here which is the
|result of language alone, assuming equal familiarity and practice with
|two languages in question.  The real differences seem to come through
|design tools, support tools such as simulators and debuggers, code re-use,
|and (for larger projects) effective project management.  But note that
|assembly language precludes none of that.

Ed, I agree for the most part; however, if we again limit our scope to the
time needed to code and debug, then I still maintain that a higher-level
language such as C is superior (IMHO: the M stands for "my").

|I agree with the sentiment that a high level debugger is a Good Thing,
|but I disagree with "unnecessary" and I don't find it "very hard".
|When debugging EFI in C or C++ I find I often go down to the assembly
|language level.  

In case you are not familiar with something like gdb: it also debugs at
the assembly language level (as it must, since C is ultimately compiled
to machine code). My original comment should not be read as saying that
examining compiler-generated assembly is unnecessary, but rather that a
high-level debugger encompasses the functionality of a low-level monitor
while keeping higher-level advantages such as symbol tables. The
low-level monitor is replaced by the high-level debugger.

From: "Dan J. Declerck" <declrckd at cig.mot.com>

|Yes, but if you use a 68332 to the max of its capabilities, there are MANY
|performance mods you can do in assembly that you can't do in C.
|
|Don't get me wrong, I do most of my stuff in C, and use assembly when I
|want that extra burst of performance. For example:
|
|	xdef	memcpy
|	
|memcpy
|	movem.l	a0-a1/d0,-(a7)
|	move.l	(24,a7),d0	* size
|	movea.l	(16,a7),a0	* dest
|	movea.l	(20,a7),a1	* src
|
|	subq.l	#1,d0
|loop
|	move.b	(a1)+,(a0)+
|	dbra	d0,loop
|
|	movem.l	(a7)+,a0-a1/d0
|	rts
|

OK, let's try:

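#include <string.h>

/* Quick byte-copy loop. As written it assumes n > 0 (like the dbra
   version above) and returns the advanced destination pointer rather
   than the original dest that ISO C memcpy returns; the listing below
   reflects that. Char pointers replace the GNU-only void * arithmetic. */
void *memcpy(void *dest, const void *src, size_t n) {
  char *d = dest;
  const char *s = src;
  do {
    *d++ = *s++;
  } while (--n);
  return d;
}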

gcc -O3 output:

               memcpy:
0000 4E56 0000         link.w %a6,#0
0004 226E 0008         move.l 8(%a6),%a1
0008 206E 000C         move.l 12(%a6),%a0
000c 202E 0010         move.l 16(%a6),%d0
               .L7:
0010 12D8              move.b (%a0)+,(%a1)+
0012 5380              subq.l #1,%d0
0014 66FA              jbne .L7
0016 2009              move.l %a1,%d0
0018 4E5E              unlk %a6
001a 4E75              rts

execution time (CPU32, one loop iteration):

hand assembly:
                          head  tail  cycles
move.b (a1)+,(a0)+          2     2   6(0/1/1)
dbra   d0,loop              6    -2   10(0/2/0)

6+10-min(-2,2)-min(2,6) = 6+10+2-2 = 16 clock cycles

gcc -O3:
                          head  tail  cycles
move.b (%a0)+,(%a1)+        2     2   6(0/1/1)
subq.l #1,%d0               0     0   2(0/1/0)
jbne   .L7                  2    -2   8(0/2/0)

6+2+8-min(-2,2)-min(2,0)-min(0,2) = 6+2+8+2-0-0 = 18 clock cycles
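The head/tail arithmetic above can be mechanized. This is a minimal sketch that assumes the rule exactly as applied in the hand calculations: each instruction's cycle count minus min(its tail, the next instruction's head), wrapping from the loop branch back to the first instruction. The table values are copied from above; `struct timing`, `loop_cycles`, `hand_loop` and `gcc_loop` are hypothetical names, not part of any Motorola tool.

```c
/* One row of the head/tail/cycles tables above. */
struct timing {
    const char *insn;
    int head;   /* cycles that can overlap the previous instruction's tail */
    int tail;   /* cycles that can overlap the next instruction's head */
    int cycles; /* cycle count from the table */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Cycles for one loop iteration: each instruction's cycles minus the
   overlap min(its tail, next head), wrapping from the last instruction
   (the branch) back to the first. A negative tail adds cycles. */
int loop_cycles(const struct timing *t, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        const struct timing *next = &t[(i + 1) % count];
        total += t[i].cycles - min_int(t[i].tail, next->head);
    }
    return total;
}

static const struct timing hand_loop[] = {
    { "move.b (a1)+,(a0)+", 2,  2,  6 },
    { "dbra d0,loop",       6, -2, 10 },
};

static const struct timing gcc_loop[] = {
    { "move.b (%a0)+,(%a1)+", 2,  2, 6 },
    { "subq.l #1,%d0",        0,  0, 2 },
    { "jbne .L7",             2, -2, 8 },
};
```

Applied to the two tables, `loop_cycles(hand_loop, 2)` gives 16 and `loop_cycles(gcc_loop, 3)` gives 18, matching the hand arithmetic.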

In this case the C and assembly versions are roughly the same size (the
extra 2 bytes for returning a value should be added to the assembly
above); however, the assembly takes 11% fewer cycles per iteration (16
vs. 18) as a result of better pipeline utilization. A good place to use
assembly. I wrote the C version in maybe 45 seconds, and it is only
moderately slower and about the same size. Dan, how long did you spend
optimizing this by hand, *AND* did you just get lucky with the pipeline
optimization? Without the CPU32's head/tail overlap, both loops would
take the ***same*** 16 clock cycles (6+10 vs. 6+2+8)!!!!!!!!! (check my
math, Ed :).

(****emphasis**** added for clarity)
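The quick C version above always copies at least one byte (so n == 0 wraps around to a huge count) and returns the advanced pointer instead of dest. A minimal sketch of the same byte loop with ISO C memcpy semantics follows; `byte_memcpy` is a hypothetical name chosen to avoid clashing with the library function. The code gcc generates for it will differ from the listing above, and its cycle count has not been worked out here.

```c
#include <stddef.h>

/* Byte-at-a-time copy with ISO C memcpy semantics:
   n == 0 copies nothing, and the original dest is returned. */
void *byte_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dest;
}
```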

|I believe this snippet of code is the fastest memcpy() function on the
|planet.

You should have a look at the memcpy in glibc. For the m68k it looks at
the CPU data bus size and does word or long transfers, so it should be
nearly twice as fast as your code for large moves. No disrespect
intended, but a memcpy written in C may well be twice as fast as what
you perceived as optimal assembly code. I would have to check this, but
I believe I have made my point. Beware of those who promote assembly
language :):).
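The word-transfer idea can be sketched in C. This is not glibc's actual code, only a minimal illustration of the technique: copy single bytes until the destination is word-aligned, move whole `unsigned long` words (32 bits with gcc on the 68k), then finish the tail with bytes. `wide_memcpy` and `word_t` are hypothetical names, and the `__may_alias__` attribute is a GCC/Clang extension that keeps the word-sized accesses to byte buffers legal under strict aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: word accesses may alias any object type. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

#define WORD_MASK ((uintptr_t)(sizeof(word_t) - 1))

void *wide_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies only work if src and dest can be aligned together. */
    if (((uintptr_t)d & WORD_MASK) == ((uintptr_t)s & WORD_MASK)) {
        /* Leading bytes until dest (and therefore src) is word-aligned. */
        while (n > 0 && ((uintptr_t)d & WORD_MASK) != 0) {
            *d++ = *s++;
            n--;
        }
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (n >= sizeof(word_t)) {
            *dw++ = *sw++;
            n -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Remaining tail, or everything if the pointers were mutually misaligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

On a CPU32 with a 16-bit external bus, a long move still costs two bus cycles, so any speedup comes mostly from fewer loop iterations; the actual gain would have to be measured.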


|You can't:
|
|Modify the SR in C

Yes, you can; look at the source for RTEMS. It has some of the best
embedded C techniques that I've seen.

|Conditionally branch on some overflow conditions in C, either.

I agree with this, but it may not be a limitation.
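One common reason to branch on overflow in EFI code is saturating arithmetic, such as clamping a correction term. A minimal sketch of doing that portably in C, assuming 16-bit signed values: widen to 32 bits so the sum cannot overflow, then clamp. `sat_add16` is a hypothetical helper, and no claim is made about whether a given compiler turns it into an add followed by a branch on the V flag.

```c
#include <stdint.h>

/* Saturating 16-bit signed add: the 32-bit sum cannot overflow,
   so the range checks below are well-defined C. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```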

|Use the Table lookup and interpolate instructions in the CPU32.

not true...
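For reference, the kind of computation the CPU32 table instructions accelerate can be written in plain C: look up an entry in a table and linearly interpolate toward the next one. The sketch below assumes an 8.8 fixed-point index (high byte selects the entry, low byte is the fraction in 1/256ths) and truncating division; `table_interp_u16` is a hypothetical name, and the real TBLU/TBLS operand forms and rounding options are not modeled. It does not claim that any compiler emits TBLU for it.

```c
#include <stdint.h>

/* Table lookup with linear interpolation. x is 8.8 fixed point:
   x >> 8 selects table[i], x & 0xFF is the fraction toward table[i + 1].
   The caller must ensure table has at least (x >> 8) + 2 entries. */
uint16_t table_interp_u16(const uint16_t *table, uint16_t x)
{
    unsigned idx = x >> 8;
    int32_t frac = x & 0xFF;
    int32_t y0 = table[idx];
    int32_t y1 = table[idx + 1];

    /* y0 + (y1 - y0) * frac / 256; the slope may be negative, and C
       division truncates toward zero, so the result stays in [y0, y1]. */
    return (uint16_t)(y0 + ((y1 - y0) * frac) / 256);
}
```

For example, with a table {100, 200, 150}, an index of 0x0080 (halfway between entries 0 and 1) gives 150, and 0x0180 gives 175.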

*********************

If this thread continues, could we try to focus on how to write better
embedded C and/or assembly?

For example: loop structures, frame pointer omission, ...

                                   John S Gwynne
                                          Gwynne.1 at osu.edu
_______________________________________________________________________________
               T h e   O h i o - S t a t e   U n i v e r s i t y
    ElectroScience Laboratory, 1320 Kinnear Road, Columbus, Ohio 43212, USA
                Telephone: (614) 292-7981 * Fax: (614) 292-7297
-------------------------------------------------------------------------------





