GP-TRICKS Logo

Scalar's GP-TRICKS Homepage

Site MapTutorial SectionWriting a Sprite Compiler


A Different Way of Blitting

Writing a Sprite Compiler



Introduction

The first graphics tool I have ever written, still back on my old CP/M computer, was a vector graphics program that generated BASIC source to directly draw lines, boxes, and circles. Although the graphics that could be done with the program - and with the computer at all - was very simple, this scheme worked amazingly well. There was no need to load a file with single pixels, and the BASIC commands could be executed reasonably fast. And since there was no standard bitmap file format, why invent one if it was disadvantageous?

Then came the time of the PC, with well standardized formats and comfortable software. I quickly abandoned self-made graphics tools and resorted to higher quality software. In the old DOS days, PCX seemed to be the format all programs supported, and so it also became my graphics exchange format of choice.

Loading a bitmap and blitting it to memory is easy. If no transparency (on pixel basis, of course) is required, it all boils down to copying chunks of memory. And even if pixel transparency is needed, in a palettized graphics mode it's still easy: You just define one of the colors available as being transparent and skip it. Here are some bytes unequal to the transparent color, draw them. Here comes one that is equal to the transparent color, skip it. Here comes another one; and another one. And so on.

This gave some smart people something to think. In order to determine if a pixel is to be drawn or not, the program has to compare the pixel's color to the transparent color. In assembler, it looks like this:

Listing 1
Assembler (x86)
lodsb      ; load byte
cmp al,255 ; compare it to transparent color
jz @skip   ; if equal, skip byte
stosb      ; otherwise draw it

Remarkable about this little piece of code, which might well be found in many inner loops, is that even if no pixel is drawn - if nothing is done - the first three instructions will still be executed. "But how should the blitter know that the pixel is transparent before checking it?" you might ask, and you are right. The program can't know, but this situation is still something to complain about. You are wasting cycles. What the program needs, of course, is foresight. And as usual in programming, this foresight has to be built in by the programmer.


Description of a Sprite Compiler

What a sprite compiler does is examine all pixels whether they have to be drawn. If so, the assembler code necessary for this task if generated, otherwise not. It's this simple: a sprite compiler converts a bitmap into executable code doing the same job of blitting. What you end up with is code that only draws the pixels that it needs to draw, and the only overhead remaining is there to calculate the memory address of the top left corner of the bitmap. And if that is fixed, too, you could even wipe it out, too. Here's an excerpt from a compiled sprite:

Listing 2
Assembler (x86)
mov ax,y                     ; top column to ax
mov bx,320                   ; line width to bx
mul bx                       ; multiply with line width
add ax,x                     ; add x position to address
les di,VGA_DirectScreen      ; Get efficent address of screen
add di,ax                    ; add pixel offset to address
mov word ptr[es:di+62],33153 ; write 33153 to address 62 bytes...
mov word ptr[es:di+64],33410 ; ...away from top left corner
mov word ptr[es:di+66],33410
mov byte ptr[es:di+68],129
@line1:
mov word ptr[es:di+378],33153
mov word ptr[es:di+380],33410
mov word ptr[es:di+382],33410
mov word ptr[es:di+384],33667


Efficiency Considerations

This is the beginning of a compiled sprite with dimensions 90 by 60. As you can see, only seven pixels are drawn in the first line, six of them are handled by word transfers. Pretty efficient, huh? Maybe my idea of generating BASIC code wasn't that bad after all. And for most sprites, the results will be similar since non-transparent pixels are consecutive most of the time: In the example above, I had 1263 word transfers and only 207 byte transfers.

The downside of compiled sprites, however, are enormous memory requirements. The beforementioned 90 by 60 bitmap takes up 5400 bytes in an 8-bit color mode. The compiled sprite, even though half of the pixels were omitted, consumes 10104 bytes of memory! This is due to the fact that a byte transfer is coded with six bytes and a word transfer with seven. You know what that means if only small parts of the bitmap are transparent...

Before we get to talk about writing a sprite compiler, let me mention that you have to consider the advantages and disadvantages of a compiled sprite carefully. A compiled sprite is good for small sprites with a lot of transparency that have to be drawn very often, like a mouse cursor. Big bitmaps, like backdrop screens, don't lend themselves to be compiled. Sprites with no or only little transparency will also love you for being blitted the normal way. And bitmaps that need to be changed during runtime are also sure compilation killers. Although it is possible to compile sprites in runtime using techniques similar to self-modifying code, the hassles and the time required to generate the code usually outweight the benefits.


Program Discussion

So, what exactly is the program doing? At the very beginning of execution, the command line parameters are checked, of which there need to be six: the name of a PCX file, the name of the Pascal file to be created, the Pascal procedure name, the X and Y dimensions of the sprite, and the transparent color. Error checking and help is coming a bit too short in the listing below, but that is only due to limited web space.

Then the program opens the output textfile and writes the procedure header and a few lines of assembler that calculate the address of the top left corner of the sprite by doing a calculation like address:=screen_address + y * line_width + x.

Next, graphics mode 13h is initialized and the PCX file is loaded. The area not occupied by the sprite is filled with the transparent color to ensure that no pixels are drawn there.

Finally, the main loop, which cycles through all lines of the sprite, begins. Within the loop, the first thing to do is to calculate the memory offset from the top left of the sprite to the beginning of the new line (offset:=line_nr * line_width). Then the pixels are processed on a two-at-a-time basis. This is done because the program employs word transfers if possible. If it were using DWORD transfers, four bytes would be examined at once. The pixels are loaded into a structure that mimics a word. There are four cases that can occur:

  1. both pixels are transparent
  2. both pixels are non-transparent
  3. the first pixel is non-transparent, the second one transparent
  4. and vice versa
The first case is what we like most: Nothing needs to be done. The second one involves copying a WORD containing both pixels onto screen, and the necessary assembler code is generated. Take notice that the pixel data is supplied as an immediate value; there is no need to load it from memory:
mov word ptr[es:di+xxxxx],yyyyy
In the third and fourth case, either the first byte or the second byte is copied to screen. Again, the byte is given as an immediate:
mov byte ptr[es:di+xxxxx],yyy
The last thing to do inside the loop is to advance the offset counter by two, because two pixels were processed. When all lines are processed, the Pascal procedure is completed with an "end" statement, and the output file is closed.


Possible Improvements

This program is by no means perfect. In this version, for example, you run into problems if the sprite has to be clipped against the top or bottom edge of the screen. Including above and below clipping is easy, though, and does almost no harm to performance; it has been implemented in the full version of the sprite compiler not published below.

Another way to improve this sprite compiler is to switch from word transfers to DWORD transfers. Then, you would have some more cases to differentiate, but that is not too hard, either. Borland Pascal just makes it very difficult to use DWORD transfer, or any 32-bit assembler code, and I had some trouble figuring out the right way to use the OPSIZE prefix without getting operand size mismatches from Borland Pascal. Of course, you could generate external ASM files and link them. With other compilers, though, there are no problems. Data and code alignment also play a crucial role on newer processors. Because of that, it might be interesting to consider alignment issues with compiled sprites. Is it advantageous to use only aligned DWORD transfers? Research in this area still needs to be done.

The cache is also one of the factors that can really slow down compiled sprites. Because of the huge code size, it might be difficult to have a sprite reside in first or second level cache without getting cache misses elsewhere.

All these things, though, are left as an "exercise for the reader". If you find out something interesting, I would love to hear from you. When I consider how old and still powerful this technique is, I really marvel about where it is going to take us...

Listing 3
Borland Pascal 7.0
program SpriteCompiler;
uses Crt,VGALight,Utils;
type TWORD = array[1..2] of byte;
var outf:text;
 pron:string;
 clipcol:integer;
 clr:TWORD;
 xsiz,ysiz,pixofs,xp,yp:word;
 a:integer;

procedure Help(s:string);
begin
 WriteLn(s,#10#13); halt;
end;

procedure WriteHeader
begin
 WriteLn(outf,'procedure '+pron+'(x,y:integer); assembler;');
 WriteLn(outf,'asm');
 WriteLn(outf,'   mov ax,y; mov bx,320; mul bx');
 WriteLn(outf,'   add ax,x; les di,VGA_Buffer; add di,ax');
end;

procedure ParamCheck;
begin
 if ParamCount<6 then Help('Error: Not enough parameters');
 if not Found(ParamStr(1)) then Help('Error: Input file not found');
 pron:=ParamStr(3);
 Val(ParamStr(4),xsiz,a);
 Val(ParamStr(5),ysiz,a);
 Val(ParamStr(6),clipcol,a);
end;

procedure Done;
begin
 WriteLn(outf,'end;');
 Close(outf); VGADone;
 TextAttr:=$07; Crt.ClrScr;
end;

begin
 ParamCheck;
 Assign(outf,ParamStr(2)); Rewrite(outf); WriteHeader;
 { initialize VGA mode 13h and load bitmap }
 VGAInit; LoadPCX(ParamStr(1),1,1,true);
 { fill unused area with clipping color }
 Bar(xsiz+1,1,320,200,clipcol); Bar(1,ysiz+1,xsiz,200,clipcol);

 { initialize buffer }
 FillChar(clr[1],2,0);

 { for every line do... }
 for yp:=0 to ysiz-1 do
 begin
  { calculate new offset }
  pixofs:=yp*320;
  { for every two X pixels do... }
  for xp:=0 to (xsiz div 2) do
  begin
   { for every buffer byte do... }
   for a:=1 to 2 do
   begin
    { fill buffer }
    clr[a]:=GetPix(xp*2+a,1+yp);
   end;
   { case 1: trans/trans; do nothing }

   { case 2: solid/solid; word transfer }
   if (clr[1]<>clipcol) and (clr[2]<>clipcol) then
   begin
    WriteLn(outf,'   mov word ptr[es:di+',pixofs,'],',WORD(clr));
   end;

   { case 3: solid/trans; set first byte }
   if (clr[1]<>clipcol) and (clr[2]=clipcol) then
   begin
    WriteLn(outf,'   mov byte ptr[es:di+',pixofs,'],',clr[1]);
   end;

   { case 3: trans/solid; set 2nd byte }
   if (clr[1]=clipcol) and (clr[2]<>clipcol) then
   begin
    WriteLn(outf,'   mov byte ptr[es:di+',pixofs+1,'],',clr[2]);
   end;

   inc(pixofs,2);
  end;
 end;
 Done;
end.

The program as such does not compile; some functions inside the unit VGALight are required, namely VGAInit and VGADone to initialize and exit the VGA mode 13h, GetPix to read a pixel from screen, Bar to draw a solid bar on screen, and finally LoadPCX to load a PCX file into video memory. If you have trouble getting those functions written, I would happily send you the VGALight library per e-mail.



Back to Top of Page


This article is © 1998-2008 by M. G. Ricken and was exclusively written for GP-Tricks.
Copyright © 1998-2008 by M.G.Ricken        E-Mail: Scalar@psynet.net     |     mgricken@gptricks.de