The code

In previous posts I’ve discussed the Spectrum’s attribute map and screen layout. Give those posts a read if you’ve not done so already; they’ll help you understand what I’m about to cover.

In this post I’m going to describe two ways to find the pixel address of a screen location. Both of these solutions address the same problem: given a pixel y address (0..191) in register B and a character x address (0..31) in register C, calculate the screen address that represents those coordinates and return it in HL. It’s assumed the subroutine will trash all registers.

The two approaches to solving this problem are calculating the address programmatically and using a lookup table. Once we’ve covered the implementations, we’ll talk about the relative performance and trade-offs in storage, time and complexity.

Calculating a screen address

The Spectrum’s screen memory starts at #4000 so the most significant three bits of our address will always be 010. The 5 least significant bits will always be the X (column) address. The 8 bits from 5 - 12 represent the pixel Y, but not in the way you might imagine.

15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
0	1	0	Y₇	Y₆	Y₂	Y₁	Y₀	Y₅	Y₄	Y₃	X₄	X₃	X₂	X₁	X₀

The lowest three bits of the y address (Y₀, Y₁ and Y₂) have been picked up and dropped into the middle of the other five bits. This is part of the reason why the Spectrum screen address calculation is a strange beast.

However, putting those three bits into the lowest three bits of the upper byte of the address is why adding #100 to an address moves down one pixel row within a character cell.
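As a sanity check on the bit layout in the table above, here is a minimal sketch in Python (used purely as a model, not Spectrum code). The function name `screen_address` is made up for illustration; the shifts and masks follow the table directly.

```python
def screen_address(y, x):
    """Pack pixel row y (0..191) and character column x (0..31) into an address."""
    if not (0 <= y <= 191 and 0 <= x <= 31):
        raise ValueError("y must be 0..191 and x must be 0..31")
    return (
        0x4000                 # bits 15..13 are always 010
        | ((y & 0xC0) << 5)    # Y7 Y6    -> bits 12..11
        | ((y & 0x07) << 8)    # Y2 Y1 Y0 -> bits 10..8
        | ((y & 0x38) << 2)    # Y5 Y4 Y3 -> bits 7..5
        | x                    # X4..X0   -> bits 4..0
    )


if __name__ == "__main__":
    for y, x in [(0, 0), (1, 0), (8, 0), (191, 31)]:
        print(f"y={y:3d} x={x:2d} -> #{screen_address(y, x):04X}")
```

Rows 0 and 1 come out as #4000 and #4100, which is the #100 step between consecutive pixel rows, while row 8 (the top of the next character row) lands at #4020.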

The subroutine to calculate the address from the coordinates as set out above is:

Instruction	T-States	Bytes	Comment
ld a,b	4	1	; Work on the upper byte of the address
and %00000111	7	2	; a = Y2 Y1 Y0
or %01000000	7	2	; first three bits are always 010
ld h,a	4	1	; store in h
ld a,b	4	1	; get bits Y7, Y6
rra	4	1	; move them into place
rra	4	1	;
rra	4	1	;
and %00011000	7	2	; mask off
or h	4	1	; a = 0 1 0 Y7 Y6 Y2 Y1 Y0
ld h,a	4	1	; calculation of h is now complete
ld a,b	4	1	; get y
rla	4	1	;
rla	4	1	;
and %11100000	7	2	; a = Y5 Y4 Y3 0 0 0 0 0
ld l,a	4	1	; store in l
ld a,c	4	1	; get X
and %00011111	7	2	; a = X4 X3 X2 X1 X0
or l	4	1	; a = Y5 Y4 Y3 X4 X3 X2 X1 X0
ld l,a	4	1	; calculation of l is complete
ret	10	1

That comes to a total of 105 T-States in 26 bytes of memory.
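To see why the rotates followed by masks are safe, the routine can be traced step by step in a small model. This is a sketch in Python rather than Z80; the helper names `rra`, `rla` and `calc_address` are invented here, and the carry handling assumes the documented Z80 behaviour that RRA and RLA rotate through carry while AND and OR clear it.

```python
def rra(a, carry):
    """Rotate an 8-bit value right through carry; returns (new_a, new_carry)."""
    return (a >> 1) | (carry << 7), a & 1


def rla(a, carry):
    """Rotate an 8-bit value left through carry; returns (new_a, new_carry)."""
    return ((a << 1) & 0xFF) | carry, a >> 7


def calc_address(b, c):
    """Mirror the listing: B = pixel y (0..191), C = character x (0..31)."""
    h = (b & 0b00000111) | 0b01000000   # ld a,b / and / or / ld h,a
    carry = 0                           # OR leaves the carry flag clear
    a = b                               # ld a,b
    for _ in range(3):                  # rra, rra, rra
        a, carry = rra(a, carry)
    h |= a & 0b00011000                 # and / or h / ld h,a (stray bits masked off)
    carry = 0                           # AND and OR both clear carry
    a = b                               # ld a,b
    for _ in range(2):                  # rla, rla
        a, carry = rla(a, carry)
    l = a & 0b11100000                  # and / ld l,a
    l |= c & 0b00011111                 # ld a,c / and / or l / ld l,a
    return (h << 8) | l


if __name__ == "__main__":
    print(f"#{calc_address(191, 31):04X}")
```

The rotates do drag low bits of Y round through the carry into the far end of the accumulator, but each mask keeps only the bits that were shifted into place, so the stray bits never reach HL.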

Looking up a screen address

Instead of calculating the screen address every time we need it, a better alternative may be pre-calculating the results and placing them in a lookup table.

In modern programming terms, we store the address of the first pixel in each screen row in an array (let’s call it screen_map). We then calculate the address as screen_map[y*2] + x. The multiplier of 2 is there because the table is indexed in bytes and each address is a two-byte word.

I remember writing a BASIC program to print the hex addresses for the first pixel of each screen row and send them to the Sinclair printer. I then spun up my assembler (from tape) and entered the values by hand.
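The byte-versus-word indexing is easier to see with the table held as raw bytes. The sketch below is a Python model, not Spectrum code; `row_address` and `lookup_address` are illustrative names, and the table is built from the bit layout described earlier rather than typed in by hand.

```python
def row_address(y):
    """Address of the first byte of pixel row y (0..191)."""
    return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)


# 192 two-byte words stored low byte first, as the Z80 expects: 384 bytes.
screen_map = bytearray()
for y in range(192):
    screen_map += row_address(y).to_bytes(2, "little")


def lookup_address(y, x):
    """Model of screen_map[y*2] + x, reading one word from the byte table."""
    offset = y * 2                                   # two bytes per entry
    row = screen_map[offset] | (screen_map[offset + 1] << 8)
    return row + x


if __name__ == "__main__":
    print(len(screen_map), f"#{lookup_address(9, 5):04X}")
```

In a language with a native array of 16-bit words the index would simply be `y`; the doubling only appears because the offset is counted in bytes.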

Image Copyright: Jbattersby. Open sourced

The code to perform our address translation (remember B is the Y coordinate and C the character X) becomes:

Instruction	T-States	Bytes	Comment
ld h, 0	7	2
ld l, b	4	1	; hl = Y
add hl, hl	11	1	; hl = Y * 2
ld de, screen_map	10	3	; de = screen_map
add hl, de	11	1	; hl = screen_map + (Y * 2)
ld a, (hl)	7	1	; implements ld hl, (hl)
inc hl	6	1
ld h, (hl)	7	1
ld l, a	4	1	; hl = address of first pixel from screen_map
ld d, 0	7	2
ld e, c	4	1	; de = X
add hl, de	11	1	; add the char X offset
ret	10	1	; return screen_map[Y*2] + X

screen_map: .defw #4000, #4100, #4200, #4300, #4400, #4500, #4600, #4700, #4020, #4120, #4220, #4320

That’s 99 T-States and 401 bytes of memory (17 bytes of code and 384 bytes for the lookup table).

I’m not a mean-spirited guy. If you want to play along at home here’s a link to a gist that contains the code and more importantly the lookup table!
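The listing above shows only the first twelve entries of the table. If you would rather regenerate all 192 than type them in, something like the following Python sketch should do it. The function names are invented for illustration, and the output assumes your assembler accepts the same `.defw` directive and `#` hex prefix used in this post; other assemblers may want `dw` or `$` instead.

```python
def row_address(y):
    """Address of the first byte of pixel row y (0..191)."""
    return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)


def defw_lines(per_line=8):
    """Return the whole table as assembler source, per_line words per line."""
    words = [f"#{row_address(y):04X}" for y in range(192)]
    lines = []
    for i in range(0, len(words), per_line):
        prefix = "screen_map: " if i == 0 else " " * 12   # label on first line only
        lines.append(prefix + ".defw " + ", ".join(words[i:i + per_line]))
    return lines


if __name__ == "__main__":
    print("\n".join(defw_lines()))
```

With eight words per line, each line of output covers one character row of the screen.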

Space time trade off

So which one of these approaches is the best? The answer is, as usual, it depends. Let’s compare the results:

Approach	Lines of code	T-States	Total memory (bytes)
Calculation	21	105	26
Look up	13	99	401

The calculated approach is slower (by about 6%) and more complex (about 62% more instructions), but really efficient in memory (about 15 times less: 26 bytes against 401 once the lookup table is counted). So for space constrained applications like ROMs, calculation is the approach to take.
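For anyone who wants to check the arithmetic, the ratios can be recomputed from the figures given for the two routines. This is just a sketch in Python; the lookup total is taken as 17 bytes of code plus the 384 byte table.

```python
calc = {"lines": 21, "t_states": 105, "bytes": 26}
lookup = {"lines": 13, "t_states": 99, "bytes": 17 + 384}   # code + table

# How much slower and longer the calculated routine is, as percentages.
slower_pct = (calc["t_states"] / lookup["t_states"] - 1) * 100
longer_pct = (calc["lines"] / lookup["lines"] - 1) * 100

# How many times more memory the lookup approach needs.
memory_ratio = lookup["bytes"] / calc["bytes"]

if __name__ == "__main__":
    print(f"slower by {slower_pct:.1f}%")       # slower by 6.1%
    print(f"longer by {longer_pct:.1f}%")       # longer by 61.5%
    print(f"memory ratio {memory_ratio:.1f}x")  # memory ratio 15.4x
```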

However, for games, well, speed is king. Even small margins make a difference, especially in crucial areas like screen rendering. So it’d be a rare Spectrum game that didn’t use techniques like this.

As a simple example of the difference in timing of these two approaches I coded a race. On the left the contender is calculated addresses and on the right, lookup tables. Each function is tested by filling the screen with pixels many times, alternating the border colour after each iteration.

You can see that by the end of the test, the lookup table function was in the lead by about a second. Winner, Winner, Chicken Dinner :-)
