The ZX-Spectrum screen layout: Part III

The code

In previous posts I’ve discussed the Spectrum’s attribute map and screen layout. Give those posts a read if you’ve not done so already; it’ll help you understand what I’m about to cover.

In this post I’m going to describe two ways to find the screen address of a pixel location. Both solutions address the same problem: given a pixel y address (0..191) in register B and a character x address (0..31) in register C, calculate the screen address that represents those coordinates and return it in HL. It’s assumed the subroutine is free to trash all registers.

The two approaches to solving this problem are calculating the address programmatically and using a lookup table. Once we’ve covered the implementations, we’ll talk about the relative performance and the trade-offs in storage, time and complexity.

Calculating a screen address

The Spectrum’s screen memory starts at #4000, so the most significant three bits of our address will always be 010. The 5 least significant bits (0 to 4) will always be the X (column) address. The 8 bits from 5 to 12 represent the pixel Y, but not in the way you might imagine.

Bit:    15 14 13 12 11 10  9  8 |  7  6  5  4  3  2  1  0
Value:   0  1  0 Y7 Y6 Y2 Y1 Y0 | Y5 Y4 Y3 X4 X3 X2 X1 X0

The lowest three bits of the y address (Y0, Y1 and Y2) have been picked up and dropped into the middle of the other five bits of the address. This is part of the reason why the Spectrum screen address calculation is a strange beast.

However, putting the lowest three bits of the screen Y coordinate into the lowest three bits of the upper byte of the address is why adding #100 to an address moves down one pixel row within a character cell.
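To make the layout above concrete, here is a minimal Python model of it. Python is used only as a modern stand-in, and the name screen_address is a hypothetical one chosen for illustration. It splits Y into its three bit groups and reassembles them in the order the layout shows.

```python
def screen_address(y, x):
    """Model of the pixel address for pixel row y (0..191), column x (0..31)."""
    if not (0 <= y <= 191 and 0 <= x <= 31):
        raise ValueError("y must be 0..191 and x must be 0..31")
    y_low = y & 0b00000111           # Y2 Y1 Y0
    y_mid = (y >> 3) & 0b00000111    # Y5 Y4 Y3
    y_high = (y >> 6) & 0b00000011   # Y7 Y6
    high_byte = 0b01000000 | (y_high << 3) | y_low   # 0 1 0 Y7 Y6 Y2 Y1 Y0
    low_byte = (y_mid << 5) | x                      # Y5 Y4 Y3 X4 X3 X2 X1 X0
    return (high_byte << 8) | low_byte


for y, x in [(0, 0), (1, 0), (8, 0), (191, 31)]:
    print(f"y={y:3} x={x:2} -> #{screen_address(y, x):04X}")
```

Rows 0, 1 and 8 come out as #4000, #4100 and #4020, which agree with the first, second and ninth entries of the screen_map table shown later in this post.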

The subroutine to calculate the address from the coordinates, as set out above, is:

Instruction      T   M   Comment
ld a,b           4   1   ; Work on the upper byte of the address
and %00000111    7   2   ; a = Y2 Y1 Y0
or %01000000     7   2   ; first three bits are always 010
ld h,a           4   1   ; store in h
ld a,b           4   1   ; get bits Y7, Y6
rra              4   1   ; move them into place
rra              4   1   ;
rra              4   1   ;
and %00011000    7   2   ; mask off
or h             4   1   ; a = 0 1 0 Y7 Y6 Y2 Y1 Y0
ld h,a           4   1   ; calculation of h is now complete
ld a,b           4   1   ; get y
rla              4   1   ; move Y5 Y4 Y3 into place
rla              4   1   ;
and %11100000    7   2   ; a = Y5 Y4 Y3 0 0 0 0 0
ld l,a           4   1   ; store in l
ld a,c           4   1   ; get the character x
and %00011111    7   2   ; a = X4 X3 X2 X1 X0
or l             4   1   ; a = Y5 Y4 Y3 X4 X3 X2 X1 X0
ld l,a           4   1   ; calculation of l is complete
ret              10  1

For a total of 105 T-States in 26 bytes of memory.
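One detail that is easy to miss: rra and rla rotate through the carry flag, so unwanted bits wrap around into the accumulator, and it is the and masks that throw them away. The sketch below is a rough Python model of the routine's register traffic. The helper names rra, rla and calc_address are invented for illustration, and it assumes carry is clear before each run of rotates, which should hold because the preceding or instruction resets it. The result does not depend on that assumption, since the masks discard every bit the carry could reach.

```python
def rra(a, carry):
    """Rotate A right through carry. Returns (new_a, new_carry)."""
    return ((a >> 1) | (carry << 7)) & 0xFF, a & 1


def rla(a, carry):
    """Rotate A left through carry. Returns (new_a, new_carry)."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def calc_address(b, c):
    """Mirror the routine step by step: B = pixel y, C = character x."""
    a = b & 0b00000111          # and %00000111
    a |= 0b01000000             # or  %01000000
    h = a                       # ld h,a
    a, carry = b, 0             # ld a,b (carry assumed clear after the or)
    for _ in range(3):          # rra x 3
        a, carry = rra(a, carry)
    a &= 0b00011000             # keep only Y7 Y6, now in bits 4 and 3
    h = a | h                   # or h / ld h,a
    a, carry = b, 0             # ld a,b (carry cleared by or h)
    for _ in range(2):          # rla x 2
        a, carry = rla(a, carry)
    a &= 0b11100000             # keep only Y5 Y4 Y3, now in bits 7..5
    l = a                       # ld l,a
    a = c & 0b00011111          # ld a,c / and %00011111
    l = a | l                   # or l / ld l,a
    return (h << 8) | l


print(f"#{calc_address(191, 31):04X}")
```

For y = 191 the three rra instructions leave %11010111 in A before the mask, so the top bits really are polluted by the rotate, and the and %00011000 is what makes the result safe.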

Looking up a screen address

Instead of calculating the screen address every time we need it, a better alternative may be pre-calculating the results and placing them in a lookup table.

In modern programming terms, we store the address of the first pixel in each screen row in an array (let’s call it screen_map). We then calculate the address as screen_map[y*2] + x. The multiplier of 2 is there because the table is indexed in bytes and each address is a two-byte word.

I remember writing a BASIC program to print the hex addresses for the first pixel of each screen row on the Sinclair printer, then spinning up my assembler (from tape) and entering the values by hand.
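These days the table can be generated rather than typed in. The following is a small Python sketch of that idea, not the original BASIC program: row_address and defw_lines are hypothetical names, and the eight-words-per-line layout is just an assumption about what is convenient to paste into an assembler source file.

```python
def row_address(y):
    """Address of the first byte of pixel row y (0..191)."""
    return (0x4000
            | ((y & 0x07) << 8)    # Y2 Y1 Y0 -> bits 10..8
            | ((y & 0x38) << 2)    # Y5 Y4 Y3 -> bits 7..5
            | ((y & 0xC0) << 5))   # Y7 Y6    -> bits 12..11


screen_map = [row_address(y) for y in range(192)]


def defw_lines(table, per_line=8):
    """Format the table as .defw directives, per_line words on each line."""
    lines = []
    for i in range(0, len(table), per_line):
        words = ", ".join(f"#{w:04X}" for w in table[i:i + per_line])
        lines.append(f"    .defw {words}")
    return lines


print("screen_map:")
print("\n".join(defw_lines(screen_map)))
```

Because this is a Python list of words rather than a block of bytes, a lookup here is simply screen_map[y] + x, with no doubling of the index.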

Image Copyright: Jbattersby. Open sourced

The code to perform our address translation (remember B is the Y coordinate and C the character X) becomes:

Instruction          T   M   Comment
ld h, 0              7   2
ld l, b              4   1   ; hl = Y
add hl, hl           11  1   ; hl = Y * 2
ld de, screen_map    10  3   ; de = screen_map
add hl, de           11  1   ; hl = screen_map + (row * 2)
ld a, (hl)           7   1   ; implements ld hl, (hl)
inc hl               6   1
ld h, (hl)           7   1
ld l, a              4   1   ; hl = address of first pixel from screen_map
ld d, 0              7   2
ld e, c              4   1   ; de = X
add hl, de           11  1   ; add the char X offset
ret                  10  1   ; return screen_map[Y*2] + X
screen_map: .defw #4000, #4100, #4200, #4300, #4400, #4500, #4600, #4700, #4020, #4120, #4220, #4320
; ... and so on: one word per pixel row, 192 words in all

That’s 99 T-States and 401 bytes of memory (17 bytes of code and 384 bytes for the lookup table).
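To show where the y*2 and the two separate byte reads come from, here is a hedged Python sketch that keeps the table as raw bytes, the way it sits in memory. SCREEN_MAP and lookup_address are illustrative names, and the table contents are generated from the bit layout rather than copied from the gist.

```python
import struct

# One 16-bit word per pixel row, stored low byte first as .defw would emit it.
_rows = [
    0x4000 | ((y & 0x07) << 8) | ((y & 0x38) << 2) | ((y & 0xC0) << 5)
    for y in range(192)
]
SCREEN_MAP = struct.pack("<192H", *_rows)


def lookup_address(y, x):
    """Model of the lookup routine: y is the pixel row, x the character column."""
    offset = y * 2                    # add hl,hl: two bytes per table entry
    low = SCREEN_MAP[offset]          # ld a,(hl)
    high = SCREEN_MAP[offset + 1]     # inc hl / ld h,(hl)
    return ((high << 8) | low) + x    # ld l,a / add hl,de


print(len(SCREEN_MAP), f"#{lookup_address(8, 5):04X}")
```

The packed table is 384 bytes long, matching the figure above, and adding x can never carry into the upper byte because every table entry has its low five bits clear.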

I’m not a mean-spirited guy. If you want to play along at home, here’s a link to a gist that contains the code and, more importantly, the lookup table!

Space-time trade-off

So which one of these approaches is the best? The answer is, as usual, it depends. Let’s compare the results:

Approach      Lines of code   T-States   Total memory (bytes)
Calculation   21              105        26
Lookup        13              99         401
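These figures can be re-derived from the two listings. The sketch below simply tallies the T and M columns; the mnemonics are abbreviated (n and nn stand for immediate operands), and the names CALCULATION, LOOKUP and totals are mine, chosen for illustration.

```python
# (mnemonic, T-states, bytes) copied from the two listings in this post.
CALCULATION = [
    ("ld a,b", 4, 1), ("and n", 7, 2), ("or n", 7, 2), ("ld h,a", 4, 1),
    ("ld a,b", 4, 1), ("rra", 4, 1), ("rra", 4, 1), ("rra", 4, 1),
    ("and n", 7, 2), ("or h", 4, 1), ("ld h,a", 4, 1),
    ("ld a,b", 4, 1), ("rla", 4, 1), ("rla", 4, 1), ("and n", 7, 2),
    ("ld l,a", 4, 1), ("ld a,c", 4, 1), ("and n", 7, 2), ("or l", 4, 1),
    ("ld l,a", 4, 1), ("ret", 10, 1),
]
LOOKUP = [
    ("ld h,n", 7, 2), ("ld l,b", 4, 1), ("add hl,hl", 11, 1),
    ("ld de,nn", 10, 3), ("add hl,de", 11, 1), ("ld a,(hl)", 7, 1),
    ("inc hl", 6, 1), ("ld h,(hl)", 7, 1), ("ld l,a", 4, 1),
    ("ld d,n", 7, 2), ("ld e,c", 4, 1), ("add hl,de", 11, 1), ("ret", 10, 1),
]
TABLE_BYTES = 192 * 2  # one two-byte word per pixel row


def totals(listing):
    """Return (lines of code, total T-states, total code bytes)."""
    return (len(listing),
            sum(t for _, t, _ in listing),
            sum(size for _, _, size in listing))


calc_lines, calc_t, calc_bytes = totals(CALCULATION)
look_lines, look_t, look_code = totals(LOOKUP)
look_bytes = look_code + TABLE_BYTES

print(f"calculation: {calc_lines} lines, {calc_t} T-states, {calc_bytes} bytes")
print(f"lookup:      {look_lines} lines, {look_t} T-states, {look_bytes} bytes")
print(f"slower by {100 * (calc_t / look_t - 1):.1f}%")
print(f"longer by {100 * (calc_lines / look_lines - 1):.1f}%")
print(f"memory ratio: table only {TABLE_BYTES / calc_bytes:.1f}x, "
      f"code plus table {look_bytes / calc_bytes:.1f}x")
```

Counting only the 384-byte table gives a memory ratio a little under 15, and counting the 17 bytes of lookup code as well pushes it a little over 15.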

The calculated approach is slower (about 6%) and more complex (a 61% longer method), but really efficient in memory (roughly 15 times less memory!). So for space-constrained applications like ROMs, calculation is the approach to take.

However, for games, well, speed is king. Even small margins make a difference, especially in crucial areas like screen rendering. So it’d be a rare Spectrum game that didn’t use techniques like this.

As a simple example of the difference in timing between these two approaches, I coded a race. On the left, the contender is calculated addresses; on the right, lookup tables. Each function is tested by filling the screen with pixels many times, alternating the border colour after each iteration.

You can see that by the end of the test, the lookup table function was in the lead by about a second. Winner, winner, chicken dinner :-)