The Demo Corner: Stretching Sprites

by Pasi 'Albert' Ojala (po87553@cs.tut.fi)

(All timings are in PAL, principles will apply to NTSC too)

You might have heard that it is possible to expand sprites to more than
twice their original size. Imagine a sprite scroller with 6-times expanded
sprites. However, there is no need to expand all of them equally. Using
this technique, it is possible to make easy sinus effects and constantly
expanding and shrinking letters.

The VIC (video interface controller) may be fooled in many ways. One of
them is the vertical expansion of sprites. If you clear the expand flag and
then set it back straight away, VIC will think it has only displayed the
first one of the expanded lines. If we do the trick again, VIC will
continue to display the same data again and again. But why does VIC behave
like this?

_Logic gates will tell the truth_

It is not really a bug, but a feature. The hardware design to implement the
vertical enlargement was just as simple as possible. Those who do not care
about hardware should skip this part...

The whole y-enlargement is handled with five simple logic gates. Each
sprite has an associated Set-Reset flip-flop to tell whether to jump to the
next sprite line (add three bytes to the data counter) or not. Let's call
the state of the flip-flop Q and the inputs R (reset) and S (set). The
function of an SR flip-flop is quite simple: if R is one, Q goes to zero,
if S is one, Q goes to one. Otherwise the state of the flip-flop does not
change.

In this case the flip-flop is set if either the Y-enlargement bit is zero,
or the state of the flip-flop is zero at the end of a scan line. The
flip-flop is reset if both the state and the Y-enlargement bit are ones at
the end of the line.

When you clear the bit in the vertical expansion register ($D017), the
flip-flop will be set regardless of the electron beam position on the scan
line.
If you set the bit again before the end of the line, the flip-flop will be
cleared at the end of the line and VIC will display the same sprite line
again. In other words, VIC will think that it is starting to display the
second line of the expanded sprite row. This way any of the lines in any of
the sprites may be stretched as wanted.

  .---- Current flipflop state (if one, enables add to sprite pointer)
  | .---- Y-expansion bit.
  | | .--- End of line pulse (briefly one at end of line)
  | | | .--- Next state (What state will become under these conditions)
  | | | |
  0 0 0 1
  0 0 1 1
  0 1 0 no change
  0 1 1 1
  1 0 0 1            Clear $D017 -> flip-flop is set
  1 0 1 1
  1 1 0 no change    Set $D017 -> flip-flop resets at the end of line
  1 1 1 0

So, simply, at any time, if vertical expand is zero, the add enable is set
to one. At the end of the line - before adding - the state is inverted if
vertical expand is one, so a set flip-flop is cleared and the add is
skipped.

_Even odder?_

Something very weird happens when we clear the expansion bit right when VIC
is adding three to the sprite image counters. The values in the counters
will be increased only by two, and the data is then read from the wrong
place.

Normally the display of a sprite ends when VIC has shown all of the 21
lines of the sprite (the counter will end up at $3f). If there has been a
counter mixup, $3f is not reached after 21 lines and VIC will go on
counting and will display the sprite again, now normally. If we fool the
counter only once, the counter value $3f is reached when the sprite has
been displayed twice.

_Fiddling_

I don't think the distorted counter effect can be used for anything, but
there are many things where the variable stretching could be used. When you
open the borders, you can be sure that the amount of time available on each
line is constant, if you stretch the sprites to the whole length of the
area.

You may stretch only the first and last lines, stretch the other lines by a
constant or using a table, or using a variable table or any of the
combinations possible.
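
The flip-flop rule from the logic gates section can be tried out on paper,
or with a small model. The following Python sketch is only an illustration
built from the description and the state table above, not a model of the
real chip timing. The names next_state and displayed_rows are made up for
this example, and it assumes the raster routine writes the stretch bit and
then its inverse to the expand register on every line.

```python
def next_state(q, yexp, eol):
    """Next flip-flop state from the current state, the Y-expansion bit and
    the end-of-line pulse (all 0 or 1), following the rule in the text."""
    set_input = (yexp == 0) or (eol == 1 and q == 0)
    reset_input = (q == 1 and yexp == 1 and eol == 1)
    if set_input:
        return 1
    if reset_input:
        return 0
    return q  # neither input active: no change


def displayed_rows(advance_bits):
    """Sprite row shown on each scan line for one sprite.

    advance_bits holds one stretch-data bit per scan line: 1 lets VIC move
    to the next row, 0 repeats the current row.
    """
    q, row, shown = 1, 0, []
    for bit in advance_bits:
        shown.append(row)
        q = next_state(q, bit, 0)      # first write to the expand register
        q = next_state(q, bit ^ 1, 0)  # second write: the inverted value
        q = next_state(q, bit ^ 1, 1)  # end-of-line pulse
        if q:                          # add enabled: step to the next row
            row += 1
    return shown


print(displayed_rows([1, 0, 0, 1, 1]))  # prints [0, 1, 1, 1, 2]
```

In the example the second sprite row stays on screen for three scan lines,
because the two zero-bits hold the data counter in place.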

_A raster routine is a must_

Because you have to access the VIC registers on each line during the
stretch, you need some kind of routine which can do other kinds of tricks
besides the stretch. You can open the side borders and change the
background color and maybe you have to shift the screen (and the bad lines
with it) downwards. [See previous C=Hacking Issues for talk about raster
interrupts.]

Look at the demo program. In the beginning of the raster routine there is
first some timing, then a loop that uses exactly 46 processor cycles.
Together with the 17 cycles that VIC takes for the sprites, one pass takes
exactly one 63-cycle scan line. Inside the loop we first do the necessary
modifications to the vertical scroll register, then we change the
background color and then we open the side borders. And finally we handle
the stretching using the stretch data, where a zero-bit means that the
corresponding sprite will be stretched. A one-bit means that VIC is allowed
to go to the next line of the sprite data.

_Stretching takes time_

Besides showing the stretched sprites we need time to generate the
stretching data, unless of course, the stretch is constant. We have to have
20 one-bits for each sprite in our table. It is not feasible to determine
the state of each byte in the table, instead you clear the table and plot
the needed bits.

The routine is quite straightforward, but many optimizations may be applied
to make it faster. First we load Y with the position of the first line (the
y-coordinate of the data). Then we use it as an index to the table and plot
the right bit and increase Y by the expansion value. Then we do it again
until we have all of the 20 bits scattered to the table. The last sprite
line will then stretch until we stop the stretching, because the last line
is not allowed to be drawn.

_Speed is everything_

The calculation itself is easy, but optimizing the routine is not. If all
of the sprites are stretched equally (by integer amounts) and from the same
position, the routine is the fastest possible.
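
The clear-and-plot procedure from "Stretching takes time" can be sketched
in a few lines of Python. This is a minimal illustration with integer
expansion values only. The function name build_stretch_table, the 128-entry
table length and the (start, step) pairs are assumptions for the example,
with sprite n using bit n of each table byte.

```python
def build_stretch_table(sprites, table_len=128, lines=20):
    """Clear the table, then plot `lines` one-bits for each sprite.

    sprites is a list of (start, step) pairs, one per sprite: the table
    index of the first one-bit and the integer expansion value.
    """
    table = [0] * table_len            # all zero-bits: stretch everything
    for bit, (start, step) in enumerate(sprites):
        y = start
        for _ in range(lines):
            if y < table_len:          # ignore bits that fall off the table
                table[y] |= 1 << bit   # one-bit: VIC may go to the next row
            y += step                  # next one-bit is `step` lines later
    return table


# Sprite 0 unstretched from line 0, sprite 1 tripled from line 4.
table = build_stretch_table([(0, 1), (4, 3)])
print(table[:8])  # prints [1, 1, 1, 1, 3, 1, 1, 3]
```

Only 20 bytes per sprite are touched after the clear, which is why plotting
is cheaper than deciding the state of every byte in the table.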

You can also have variable and smooth stretch. Smooth stretch uses other
than integer expansion values and thus also needs more processor time. If
each sprite has to be stretched individually, you need much more time to do
it.

The fastest routine I have ever written uses some serious selfmodification
tricks. There are also some other tricks to speed up the stretch, but they
are all secret ones.. :-) Well, what the h*ck, I will include it anyway. By
the time you read this I have already made a faster routine..

You can speed up that routine (by 17%) by unrolling the inner loop, but you
have to use a different addressing mode for ORA (zero-page). You also need
to place some restrictions on the tables used.. If you unroll both loops,
you can get a ~25% faster routine than the Fore!-version.

_Demo program_

I tried to collect all of the main principles of stretching and raster
routines to the demo program. I use the term "raster routine" when the
execution is tightly synchronized to the electron beam and to the screen
display. The program may be unclear in places, but I wanted to keep it as
short as possible.

The routine opens the side borders, scrolls the screen vertically, changes
the background color and stretches the sprites. The stretcher routine
allows different y-position and amount of expansion for each sprite. This
routine uses 1/8 fractions to do the counting, and so it is much too slow
to use in a real demo.

VIC registers are initialized from a table, instead of setting them
separately. Interrupt position is one line above the sprites. The program
does not open the top or bottom borders. (I usually use an NMI to open the
vertical borders, so that I only need to use one raster-IRQ position.)

I tried to make an NTSC version, but I couldn't get it to synchronize.
There are also fewer cycles available so you can't stretch all of the
sprites individually in NTSC (with this routine that is..).
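
The 1/8-fraction counting mentioned for the demo stretcher can be pictured
with the following Python sketch. It is a simplified illustration, not a
translation of the listing: step_positions is a made-up name, the height is
given in eighths of a line (8 = one line, 12 = one and a half), and the
carry flag and 8-bit wrap-around of the real routine are ignored.

```python
def step_positions(y0, height_eighths, lines=20):
    """Table indices where the one-bits of one sprite are plotted.

    The remainder keeps the low three bits (the 1/8 fractions) from one
    line to the next, so the fractional parts add up over time.
    """
    y, remainder, positions = y0, 0, []
    for _ in range(lines):
        positions.append(y)
        remainder = (remainder & 7) + height_eighths  # old fraction + height
        y += remainder >> 3                           # integer part moves y
    return positions


print(step_positions(0, 12, 6))  # prints [0, 1, 3, 4, 6, 7]
```

With a height of 12/8 the steps alternate between one and two lines, which
averages out to the wanted 1.5 lines per sprite row.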

--------------------------------------------------------------------------
Fast-stretch from Megademo92 (part: Fore!)

SINPOS      Stretch sinus index
SINSPEED    Stretch sinus index speed
YSINPOS     Y-sinus index
YSINSPEED   Y-sinus index speed
MASK        Bit mask for passes (usually $01,$02,$04,$08,$10..)
YSINUS      Y-sinus table
STRETCH     Sprite line sizes (LSB of the address must be 0)
SIZET       Sprite size/2 table (LSB of the address must be 0)
DATA        Stretch data table (cleared before this routine)

[xx] marks selfmodification. For example loop counter, bit mask and index
to the stretch and size data tables are stored straight in the code.

0b90  lda #$06            ; Number of sprites-1 (here I used only 7 sprites)
0b92  sta $0b96
0b95  ldx #$[ff]          ; Load counter
0b97  clc                 ; Clear carry for adc
0b98  lda SINPOS,x        ; Stretch sinus position
0b9b  sta $0bd1           ; Set low bytes of indices
0b9e  sta $0bb8
0ba1  adc SINSPEED,x      ; Add stretch sinus speed (carry is not set)
0ba4  and #$7f            ; Table is 128 bytes (twice)
0ba6  sta SINPOS,x        ; Save new sinus position
0ba9  lda YSINPOS,x       ; Get the Y sinus position
0bac  adc YSINSPEED,x     ; Add Y sinus speed
0baf  sta YSINPOS,x       ; Save new Y sinus position
0bb2  tay                 ; Position to index register
0bb3  lda YSINUS,y        ; Get Y-position from table (can be 256 bytes long)
0bb6  sec                 ; adc either sets or clears carry, we have to set it
0bb7  sbc SIZET[1e]       ; Subtract size of the sprite/2 to get the sprite
0bba  clc                 ;  to stretch from the middle.
0bbb  tay                 ; MaxSize/2 < Y-sinus < AreaHeight-MaxSize/2
0bbc  lda MASK,x          ; Get the ora-mask for this pass
0bbf  sta $0bcb           ; Store mask
0bc2  sta $0bdb
0bc5  ldx #$13            ; 19 lines here + 1 after
0bc7  lda DATA,y          ; Load & ora-mask & store
0bca  ora #[$01]
0bcc  sta DATA,y
0bcf  tya
0bd0  adc STRETCH[1e],x   ; Add the stretch from the table (carry is not set)
0bd3  tay
0bd4  dex                 ; decrease counter
0bd5  bne $0bc7           ; Do the 19 lines
0bd7  lda DATA,y          ; Load & ora-mask & store the 20th line
0bda  ora #[$01]
0bdc  sta DATA,y
0bdf  dec $0b96           ; Next sprite(s)
0be2  bpl $0b95
0be4  rts

Timings:
-------
clear 128 bytes:   514 + 12 cycles    8.16 lines
7 passes:         3820 + 12 cycles   60.6  lines  = 8.66 lines/pass

The unrolled clear routine consists of one load (lda #$00) and 128 store
instructions (sta $nnnn). 12 cycles are counted for JSR/RTS.

Stretching of 8 sprites would take slightly less than 80 lines, which is
one fourth of the total raster time. Displaying a 128-line high stretcher
takes about 130 lines (counting sprite setup and synchronization), a
scroller a couple of lines more. The total of 212 lines leaves 100 lines
(6300 cycles) free for other activities in a PAL system. In an NTSC system
you would have only 50 lines left.

A simple basic routine to create the stretch data:
-------------------------------------------------
a=0:for f=0 to 127:a=a+Height*(2+sin(f*PI/64)):poke Table+f,a:
  poke Table+f+128,a:a=a-int(a):next f

This will also handle the 'rounding'. Because of this we don't have to
handle fractions in the stretcher routine. The use of a table also gives
the opportunity to have a separate size for each sprite line. The table
does not need to be a sinus, it could have triangle or any other 'waveform'
as long as the minimum value in the table (sprite line size) is 1.
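
To see how the basic routine above handles the 'rounding', here is a rough
Python equivalent. It is a sketch under assumptions: make_stretch_table is
a made-up name, the returned list stands in for the memory at Table, and
poke is taken to store only the integer part of the value.

```python
import math


def make_stretch_table(height, entries=128):
    """Line sizes following height*(2+sin), stored twice like the pokes.

    The fraction left over from each entry is carried into the next one,
    so the stored integers follow the sine curve on average.
    """
    table = []
    a = 0.0
    for f in range(entries):
        a += height * (2 + math.sin(f * math.pi / 64))
        table.append(int(a))   # poke keeps only the integer part
        a -= int(a)            # carry the fraction to the next entry
    return table + table       # second copy at Table+128


table = make_stretch_table(1)
print(min(table), max(table))
```

With a height of 1 every entry lands between 1 and 3 lines, and the first
128 entries add up to within one line of the exact total of 256.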

A basic routine to do the size/2 table:
--------------------------------------
a=0:for f=0 to 19:a=a+peek(Table+f):next f: rem get the size in position 0
for f=0 to 127:poke STable+f,a/2:a=a-peek(Table+f)+peek(Table+f+20):next f

--------------------------------------------------------------------------
_Stretcher program_

YSCROLL= $CF00  ; Vertical scroll table (moves bad lines)
STRETCH= $CF80  ; Stretch table
COLORS=  $CE80  ; Table for background colors
YCOORD=  $0380  ; Sprite y-positions (eight bytes)
HEIGHT=  $0388  ; Sprite stretches (eight bytes)
YPOS=    52     ; Sprite y-coordinate
SPRCOL=  2      ; Sprite colors

*= $C000

        SEI             ; Disable interrupts
        LDA #$7F
        STA $DC0D       ; Disable timer interrupts
        LDA #<IRQ       ; Point the IRQ vector to our routine
        STA $0314
        LDA #>IRQ
        STA $0315
        LDX #$3E        ; We create a sprite to cassette buffer
LOOP    LDA SPRITE,X
        STA $0340,X
        DEX
        BPL LOOP
        LDX #7
LOOP2   LDA #$D         ; Set the sprite image pointers
        STA $07F8,X
        LDA #SPRCOL     ; Set sprite colors
        STA $D027,X
        DEX
        BPL LOOP2
        LDX #$26
LOOP3   LDA VIDEO,X     ; Init VIC
        STA $D000,X
        DEX
        BPL LOOP3
        LDX #$7F        ; Create the y-scroll table
LOOP4   TXA             ;  and clear the color table
        AND #$07
        ORA #$10        ; Non-blank screen
        STA YSCROLL,X
        LDA #$00
        STA COLORS,X
        DEX
        BPL LOOP4
        STA $3FFF
        LDX #23         ; Create a color table
LOOP5   LDA BACK,X
        STA COLORS+8,X
        STA COLORS+32,X
        STA COLORS+56,X
        STA COLORS+80,X
        STA COLORS+96,X
        DEX
        BPL LOOP5
        JSR CHANGE      ; Init sprite sizes and y-positions
        CLI             ; Enable interrupts
        RTS

IRQ     LDX #$01
        LDY #$08        ; 'normal' $D016
        NOP             ; Timing
        NOP
        NOP
        BIT $EA         ; (Add NOP's etc. for NTSC)
LOOP6   LDA YSCROLL-1,X ; Move the screen (bad lines)        5
        STA $D011       ;                                    4
        LDA COLORS,X    ; Load the background color          4
        DEC $D016       ; Open the border                    6
        STA $D021       ; Set the background color           4
        STY $D016       ; Screen to normal                   4
        LDA STRETCH,X   ; Stretch the sprites                4
        STA $D017       ;                                    4
        EOR #$FF        ;                                    2
        STA $D017       ;                                    4
                        ; (Add NOP for NTSC +2)
        INX             ; Increase counter                   2
        BPL LOOP6       ; Loop 127 times                   + 3
                        ;                                  ---
        LDA #1          ; Ack the raster interrupt         =46
        STA $D019       ;                         +17(sprites)
                        ;                                  ---
        JSR DOSTRETCH   ; New stretch               =63(whole)
        JMP $EA31

SPRITE  BYT 0,0,0,3,$FB,0,7,$7E         ; An Example sprite
        BYT 0,$35,$DF,0,$1D,$77,0,$B7
        BYT $5D,0,$BD,$83,$7E,$EF,1,$DE
        BYT $BB,1,$78,$AE,3,$70,$EB,0
        BYT 0,$BA,3,$60,$EE,3,$D8,$FB
        BYT 2,$F6,$FE,$83,$BD,$9F,$BA,0
        BYT $37,$EE,0,$3D,$FB,0,7,$7E
        BYT 0,3,$DF,0,0,0,0

VIDEO   BYT $E8,YPOS,$20,YPOS,$50,YPOS,$80,YPOS,$B0,YPOS
        BYT $E0,YPOS,$10,YPOS,$40,YPOS,$C1,$18,YPOS-1,0,0
        BYT $FF,8,$FF,$15,1,1,$FF,$FF,$FF,0,0,0,0,0,0,0,1,10
                        ; Init values for VIC - sprites, interrupts, colors

BACK    BYT 0,$B,$C,$F,1,$F,$C,$B       ; Example color bars
        BYT 0,6,$E,$D,1,$D,$E,6
        BYT 0,9,2,$A,1,$A,2,9

DOSTRETCH
        LDX #31         ; Clear the table
        LDA #0          ; (Unrolling will help the speed,
LOOP7   STA STRETCH,X   ;  because STA nnnn,X is 5 cycles
        STA STRETCH+32,X ;  and STA nnnn is only 4 cycles.)
        STA STRETCH+64,X
        STA STRETCH+96,X
        DEX
        BPL LOOP7
        STA REMAIND+1   ; Clear the remainder
        LDA #7
        STA COUNTER+1   ; Init counter for 8 loops
        LDA #$80
        STA MASK+1      ; First sprite 7, mask is $80
COUNTER LDX #$00        ; The argument is the counter
        LDY YCOORD,X    ; y-position
        LDA HEIGHT,X    ; Height of one line (5 bit integer part)
        STA ADD+1
        LDX #20         ; Handle 20 lines
LOOP8   LDA STRETCH+2,Y
MASK    ORA #$00
        STA STRETCH+2,Y ; Set a one-bit
        STY YADD+1
REMAIND LDA #0
        AND #7          ; Previous remainder
ADD     ADC #0          ; add to the height
        STA REMAIND+1   ; Save the new value
        LSR
        LSR
        LSR
        CLC             ; Take the integer part
YADD    ADC #0
        TAY             ; New value to y-register
        DEX
        BNE LOOP8
        LSR MASK+1      ; Use new mask
        DEC COUNTER+1   ; Next sprite
        BPL COUNTER

CHANGE  LDA #$00
        ASL             ; Sprite height changes with 2x speed
        AND #$3F
        TAY             ; 64 bytes long table
        INC CHANGE+1    ; Increase the counter
        LDX #7          ; Do eight sprites
LOOP9   LDA SINUS,Y
        LSR
        LSR
        CLC             ; Use the same sinus as y-data
        ADC #8
        STA HEIGHT,X    ; Sprite height will be from 1 to 3 lines
        TYA
        ADC #10         ; Next sprite enlargement will be 10 entries
        AND #$3F        ;  from this
        TAY
        DEX
        BPL LOOP9
        LDX #7
        LDA CHANGE+1
        AND #$3F
        TAY
LOOP10  LDA SINUS,Y     ; Y-position
        STA YCOORD,X
        TYA
        ADC #10         ; Next sprite position is 10 entries from this one
        AND #$3F
        TAY
        DEX
        BPL LOOP10
        RTS

SINUS   BYT $20,$23,$26,$29,$2C,$2F,$31,$34 ; A part of a sinus table
        BYT $36,$38,$3A,$3C,$3D,$3E,$3F,$3F
        BYT $3F,$3F,$3F,$3E,$3D,$3C,$3A,$38
        BYT $36,$34,$31,$2F,$2C,$29,$26,$23
        BYT $20,$1C,$19,$16,$13,$10,$E,$B
        BYT 9,7,5,3,2,1,0,0,0,0,0,1,2,3,5,7
        BYT 9,$B,$E,$10,$13,$16,$19,$1C