Optimizing DirectDraw
A short overview
Written by Daniel Kastenholz
Monday, April 19, 1999
DirectDraw IS slow, believe me. I know, there are people who proclaim the opposite. Yes, DirectDraw HAS direct hardware support,
it HAS DMA access, and hey, how nice, it even comes with ready-to-use sprite code. And indeed, if you're not planning anything HUGE, you
might be quite happy with DirectDraw. But have you ever noticed what happens when video memory is exhausted and your bitmaps end up
in system memory? Your framerate will fall, fall, fall. Every blit from system to video memory becomes torture. To avoid such situations,
you should follow these rules:
Always watch your memory state!
Don't place bitmaps in video memory that don't NEED to be there, such as menu graphics! Video memory is for in-game graphics
(sprites, fonts, background textures), not for your title screen or your fullscreen credits bitmap!
Don't blit!
This is NOT a typo. I suggest you do NOT blit. (With one exception: when you're copying from video memory to video memory, grin...) But in
ALL other cases (that is: VIDMEM -> SYSMEM, SYSMEM -> VIDMEM, and SYSMEM -> SYSMEM), it is NOT a good idea to use the standard
blitting functions. In many cases, things will run faster if you LOCK both surfaces and use "memcpy" to copy your bitmap line by line
(one call per row, advancing each pointer by its surface's pitch), although this isn't the best way to get things done, as you will
learn further below.
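As a minimal sketch of that line-by-line copy, here is what the loop could look like in plain C. The LockedSurface struct and
copy_rows are hypothetical names made up for this example; they merely stand in for the pointer and pitch that a lock on each
surface gives you, and no DirectDraw call is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for what a surface lock hands back:
   the address of the first pixel and the pitch of the surface. */
typedef struct {
    unsigned char *pixels;  /* first byte of row 0 */
    size_t         pitch;   /* bytes from one row to the next (may exceed the visible row) */
} LockedSurface;

/* Copy `rows` rows of `row_bytes` bytes each from src to dst.
   Only the visible part of each row is copied; padding bytes are left alone. */
void copy_rows(LockedSurface dst, LockedSurface src, size_t row_bytes, size_t rows)
{
    unsigned char       *d = dst.pixels;
    const unsigned char *s = src.pixels;
    size_t y;

    assert(row_bytes <= dst.pitch && row_bytes <= src.pitch);

    for (y = 0; y < rows; ++y) {
        memcpy(d, s, row_bytes);  /* one row */
        d += dst.pitch;           /* step to the next row of each surface */
        s += src.pitch;
    }
}
```

Note that the two surfaces may have different pitches, which is why a single memcpy over the whole bitmap is not safe in general.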
Don't lock!
Those damn locks are one of the most confusing things about DirectDraw. Imagine you want to copy "normal" data from "a" to "b", a string,
for example. Do you LOCK "a" and "b" before copying? No, of course not. So why do you lock system memory surfaces before accessing them?
Because good old Bill's men want you to? Hey, don't be a fool. Did you really believe the position of your bitmaps in system memory would ever change
if you don't want it to? A system memory surface is nothing more than some bytes in memory, previously allocated by "malloc", and it behaves exactly the
same way. Now that you know that, you should try the following: When you create your system memory surface, lock it, store the obtained
pointer (and the pitch), and unlock the surface. From now on, just use the pointer to access your surface directly, and forget about further locks! Oh, and the
funniest thing: As you unlocked the surface after receiving its pointer, you can still use the original blit functions (if you have to, grin...). Nice, isn't it?
But please don't try this on video memory surfaces. ;-)
Copy on your own!
If you successfully implemented the previous tricks in your game, it is probably faster already. Not fast enough yet, if you ask me. Under "Don't blit!",
you were told to use "memcpy" instead of blits. Of course, that was a good thing, but nothing beats true hand-written code. And that means:
Take an assembler like MASM (or the C++ inline assembler, if you don't have MASM, but I prefer MASM) and write your own copy functions.
If you do it the right way, you can avoid some more overhead by integrating the "y loop" (the line-by-line loop from "Don't blit!") and all this pitch stuff
directly into your copy function. For REALLY cool results, read an article about self-modifying code!
Take advantage of common standards!
As you know, a common CPU works with 32-bit registers. But today, most CPUs are equipped with MMX technology. Therefore, it would
be a mistake not to make use of it. Using MASM, it takes just a few lines of code (and the MMX include file, grin...) to write code that copies
bitmaps 64 bits at a time instead of 32, as MMX registers are twice as large as a standard register. Try this technique, and you can reach speed-ups
of up to 25%!
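To give an idea of the 64-bit stride without any assembler, here is a rough sketch in portable C. It is NOT real MMX code: copy_rect64
is a hypothetical name, and the uint64_t chunk merely stands in for an MMX register. A MASM version would move each chunk through an
MMX register with movq and execute emms when done. The row loop and the pitch handling live inside the function, as suggested in the
previous rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a rectangle of `rows` rows, `row_bytes` bytes each, 8 bytes per step.
   Pitches are in bytes. Portable stand-in for an MMX movq loop (assumption:
   the compiler turns each 8-byte memcpy into a single 64-bit load or store). */
void copy_rect64(unsigned char *dst, size_t dst_pitch,
                 const unsigned char *src, size_t src_pitch,
                 size_t row_bytes, size_t rows)
{
    size_t x, y;

    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    for (y = 0; y < rows; ++y) {
        /* Bulk of the row: 64 bits at a time. */
        for (x = 0; x + 8 <= row_bytes; x += 8) {
            uint64_t chunk;
            memcpy(&chunk, src + x, sizeof chunk);
            memcpy(dst + x, &chunk, sizeof chunk);
        }
        /* Tail: leftover bytes when the row length is not a multiple of 8. */
        for (; x < row_bytes; ++x)
            dst[x] = src[x];

        dst += dst_pitch;  /* pitch handling integrated into the copy function */
        src += src_pitch;
    }
}
```

Whether this beats memcpy on a given machine is something you have to measure; the sketch only shows the shape of the loop.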
Use your display buffers wisely!
Before you start a new project, you should take a moment to work out how many display buffers you need and where to put them.
A primary ("front") buffer should be stored in video memory. Although it is possible to create a primary buffer in system memory on SOME
graphics boards, I'm not sure whether this is supported by the MAJORITY of boards on the market. If possible, you should store your secondary
("back") buffer in video memory, too, and lock it for drawing operations. When you're finished, unlock it and perform a flip. That's the method I've had
the best experience with.
Stay watchful!
Don't believe everything Big Bill tells you. There are still dozens of chances to optimize your code - those DirectDraw sprite functions, for example,
will suddenly seem extremely slow to you once you have managed to write your own: ones that fit their purpose exactly and contain
as little overhead as possible. My special tip is: Write special code for every bit depth you work with (although it isn't easy at all to program a FAST sprite
engine for 24-bit mode, hehe...).
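To show what bit-depth-specific sprite code can look like, here is a small sketch for 16-bit surfaces only. Everything in it is an
assumption for illustration: the name draw_sprite16, the use of a single transparent color key, and the lack of clipping. A 24-bit
version cannot use one native integer per pixel (each pixel is three bytes), which is one reason that mode is harder to make fast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draw a 16-bit sprite onto a 16-bit destination, skipping every source
   pixel equal to `key` (the transparent color). Pitches are in bytes.
   No clipping: the caller must make sure the sprite fits the destination. */
void draw_sprite16(uint16_t *dst, size_t dst_pitch,
                   const uint16_t *src, size_t src_pitch,
                   size_t width, size_t height, uint16_t key)
{
    size_t x, y;

    /* Pitches must keep every row aligned for 16-bit access. */
    assert(dst_pitch % 2 == 0 && src_pitch % 2 == 0);
    assert(width * 2 <= dst_pitch && width * 2 <= src_pitch);

    for (y = 0; y < height; ++y) {
        for (x = 0; x < width; ++x) {
            uint16_t p = src[x];
            if (p != key)       /* transparent pixels leave the background intact */
                dst[x] = p;
        }
        /* Advance both row pointers by their pitch in bytes. */
        dst = (uint16_t *)((unsigned char *)dst + dst_pitch);
        src = (const uint16_t *)((const unsigned char *)src + src_pitch);
    }
}
```

Because the pixel type is fixed at 16 bits, the inner loop needs no per-pixel format checks, which is the whole point of writing one routine per bit depth.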