Feersum Beasts: VideoBeast

VideoBeast API - v0.9

15 February 2025

Documentation is the love letter you write to your future self - Damian Conway

Foreword

This is a living document covering prototype hardware that is currently under development. Whilst there are good reasons for many choices, there are also a few "Friday afternoon" ideas in here. Tread carefully, for here be dragons.

Introduction

VideoBeast is a graphics chip for 8-bit CPUs that generates high performance arcade quality 2D graphics.

It produces video output at 320x240, 640x480 (4:3) and 424x240, 848x480 (16:9) resolutions in 9 bit (3:3:3) colour. It supports text modes with software defined fonts, tile maps and sprites with 1024 unique graphics per layer and bitmaps in 1, 4 and 8 bit per pixel modes. Up to six layers may be configured using any combination of these graphic types. VideoBeast is implemented in custom hardware and delivers high speed graphics without depending on modern co-processors or software defined video.

Example screenshot

It achieves this by providing efficient ways to manipulate large volumes of graphic data. Specifically it provides tiles, sprites and bitmaps that can be composited in layers with transparency, windowing and per-layer scrolling. It also supports raster effects and sprite and layer multiplexing. VideoBeast features 1MByte of on chip graphics memory with fast DMA transfer from SD Card, so a large amount of data can be pre-loaded. This reduces the demand on the host CPU.

VideoBeast acts as a static RAM chip, occupying 4, 8 or 16KB of host memory depending on hardware configuration with no other interfacing requirements (it can optionally generate a raster-synchronised interrupt signal). In normal operation, the CPU address space (4, 8 or 16K) is mapped to one or more pages of onboard video memory. The top 256 bytes of the address space are mapped to control registers that define page mapping, graphics modes, layer arrangements, palette data etc. The host CPU can read and write to any address in the video memory, and most registers.

Colours are defined in palettes of up to 256 values, each of which may be one of 512 colours (3:3:3 RGB) or transparent.

Most layers generate a fixed bitmap size (e.g. 1024x512 pixels, depending on layer type) regardless of the display mode. This means that software running on the host CPU does not have to take account of different graphics modes when altering the contents of a layer (pixel locations in memory are fixed relative to the base of the layer data). Layers are displayed within user defined windows on screen, with a scroll offset applied (wrapping in X and Y). Layer windows are fixed to 8-pixel boundaries on screen and may freely overlap.

Most layers provide reasonable off-screen areas for continuous scrolling, and depending on graphics modes may allow double buffering through switching scroll offset.

Memory Map

From the host CPU perspective, VideoBeast provides a small address space that is suitable for most 8-bit processors to access. Control registers allow the host CPU to map its smaller address range to any part of the internal 1MB video RAM. The data in the video RAM has no fixed arrangement. Instead each layer can specify where graphics and other data are stored. This allows the host CPU to allocate more or less memory to different graphic types as required by the software being executed. Layers are controlled by registers in layer config blocks.

In the ideal hardware configuration, VideoBeast presents a 16KB address space to the host CPU (14 address lines, A0-A13). Alternate configurations support smaller 8KB or 4KB address ranges where the host CPU can only dedicate a limited part of its memory map to VideoBeast graphics. If the host system cannot support the full 16KB address space, address lines A13 and A12 may be connected to +5V (high) so that the host system effectively accesses the top 8KB or 4KB of the address space.
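As a rough illustration of the tie-off described above, the sketch below computes which VideoBeast address a host access reaches when the unused address lines are held high. The function name `chip_address` and the way the window size is passed in are illustrative only, not part of the VideoBeast API.

```python
# Sketch: where an offset inside the host window lands in VideoBeast's
# 16KB address space when unused high address lines are tied to +5V.
FULL_SPACE = 0x4000  # 16KB, address lines A0-A13


def chip_address(host_offset: int, window_size: int) -> int:
    """Translate an offset inside the host window to a VideoBeast address."""
    if window_size not in (0x1000, 0x2000, 0x4000):
        raise ValueError("window must be 4KB, 8KB or 16KB")
    if not 0 <= host_offset < window_size:
        raise ValueError("offset outside the host window")
    # Lines tied high select the top of the 16KB space.
    return (FULL_SPACE - window_size) + host_offset


print(hex(chip_address(0x0000, 0x2000)))  # 8KB window starts at 0x2000
print(hex(chip_address(0x0FFF, 0x1000)))  # last byte of a 4KB window: 0x3fff
```

With any window size, the last 256 bytes of the window therefore coincide with the top 256 bytes of the 16KB space.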

In the rest of this document, VideoBeast addresses will be given as though the full 16KB address space is connected to the host system starting at address 0. In this configuration, the host may write or read from addresses 0 to 03FFFh, and VideoBeast will translate (map) addresses in this range to a relevant section of its internal 1MB graphic memory, or registers.

Regardless of hardware configuration, the host address space is then divided into one or more pages, each of which may be mapped to a different area of VideoBeast's internal 1MB graphic memory. Memory mapping and other configuration is controlled through registers. Registers overlay a variable number of bytes at the top of the address space. By setting appropriate registers, page size can be configured to suit the host hardware arrangement and graphic task, and pages can be mapped to areas of VideoBeast graphic memory to control text, sprites, bitmaps and tiles on screen.

The 16KB host address space may be split into one 16KB page, two 8KB pages or four 4KB pages. Each page may be mapped to any 4KB aligned offset in the video memory, so the host can read and write the entire 1MB address space.
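The page arrangement above can be sketched as a host-to-video address translation. This is a minimal model with several assumptions: pages sit in ascending order in the host window, each page's base is its bank offset register (listed under Global registers) multiplied by 4KB, and addresses wrap at 1MB. It covers only the 16KB and 8KB arrangements and ignores the register overlay. All names are hypothetical.

```python
PAGE_UNIT = 0x1000      # bank offsets count in 4KB units
VIDEO_SIZE = 0x100000   # 1MB of video memory


def video_address(host_addr, page_size, bank_offsets):
    """Map a host address (0-3FFFh) to a video memory address.

    bank_offsets[n] is the assumed bank offset register value for page n.
    """
    if page_size not in (0x4000, 0x2000):
        raise ValueError("sketch covers 16KB and 8KB pages only")
    if not 0 <= host_addr < 0x4000:
        raise ValueError("host address out of range")
    page = host_addr // page_size      # which page the access falls in
    offset = host_addr % page_size     # position inside that page
    # Assumption: accesses past the end of video memory wrap around.
    return (bank_offsets[page] * PAGE_UNIT + offset) % VIDEO_SIZE


# One 16KB page based at 20000h in video memory.
print(hex(video_address(0x0010, 0x4000, [0x20])))        # 0x20010
# Two 8KB pages; the second is based at 80000h.
print(hex(video_address(0x2004, 0x2000, [0x00, 0x80])))  # 0x80004
```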

The top 2 bytes (3FFEh and 3FFFh) are always mapped to control registers #FE and #FF which set the display mode, paging and register access. When register access is enabled, the top 256 bytes (3F00h-3FFFh) are mapped to registers. The rest of this document uses the convention that

#XX refers to a register at address 03FXXh.

The fact that registers and mapped pages share the same address space from the host's perspective leads to a few compromises. It is, however, a deliberate choice that maximises the features that can be achieved with a restricted hardware interface. When mapped as a single page, the register overlay means that the top two bytes in the page cannot be read or written to. However, the page mapping may be adjusted within the 1MB range of the graphics memory so that those two bytes are accessible. Page offsets in VideoBeast graphics memory are aligned to 4KB boundaries (or a 2KB boundary in the case of 4KB pages).

All word (2 byte) or long (4 byte) values are stored as little endian (least significant byte first).

Global registers

Global registers are stored in the 32 byte block #E0-#FF. Bit number X within a byte or word is shown as [X]. Bit sets, from bit X to bit Y are shown as [X:Y]. Any bits that are not explicitly defined should be left as 0.

Byte Meaning
#FF [2:0] -> Screen Mode 0=640x480 1=848x480 2-7=undefined
[3] -> Pixel doubling 0=Off 1=On
[4] -> Testcard 0=Off 1=On
[7:5] -> Page map 0=16k 1=8k 2=4k (Low) 3=4K (High), 4=Sinclair
#FE =243 (0xF3) -> Enable register page. Any other value: Hide and lock all other registers
#FD-#FC Background colour 5:5:5 RGB
#FB-#FA Line interrupt Write->Interrupt on line, Read->Current line
#F9 Bank offset 0 Page 0 (16K, 8K, 4K) base address / 4KB
#F8 Bank offset 1 Page 1 (8K, 4K) base address / 4KB
#F7 Bank offset 2 Page 2 (4K) base address / 4KB
#F6 Bank offset 3 Page 3 (4K) base address / 2KB
#F5 Lower Registers [2] -> Palette number 0/1
[1:0] -> Palette bank number (4 x 64 colours)
---
#F4-#F0 Undefined
---
#EF SPI Command/Status register
#EE SPI Data register 6
#ED SPI Data register 5
#EC SPI Data register 4 DMA transfer length high
#EB SPI Data register 3 DMA transfer length mid
#EA SPI Data register 2 DMA transfer length low
#E9 SPI Data register 1 DMA Address high
#E8 SPI Data register 0 DMA Address low
---
#E7-#E4 X * Y - Maths accelerator 32 bit product of the two 16 bit values X and Y
#E3-#E2 Y 16 bit signed value Y
#E1-#E0 X 16 bit signed value X
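The bit fields of #FF can be packed as in the sketch below, which also shows the #XX to 03FXXh address convention. `reg_addr` and `mode_byte` are hypothetical helper names, and how the resulting bytes are actually written depends on the host system.

```python
REG_BASE = 0x3F00        # registers overlay 3F00h-3FFFh
REG_ENABLE_VALUE = 0xF3  # written to #FE to enable the register page


def reg_addr(reg):
    """Address of register #XX in the 16KB VideoBeast address space."""
    if not 0 <= reg <= 0xFF:
        raise ValueError("register number must be 00h-FFh")
    return REG_BASE + reg


def mode_byte(screen_mode, pixel_doubling=False, testcard=False, page_map=0):
    """Pack the #FF fields: [7:5] page map, [4] testcard, [3] doubling, [2:0] mode."""
    if not 0 <= screen_mode <= 7 or not 0 <= page_map <= 7:
        raise ValueError("field out of range")
    return (page_map << 5) | (int(testcard) << 4) | (int(pixel_doubling) << 3) | screen_mode


# 848x480 with two 8KB pages, then enable the register page.
writes = [
    (reg_addr(0xFF), mode_byte(1, page_map=1)),
    (reg_addr(0xFE), REG_ENABLE_VALUE),
]
for address, value in writes:
    print(f"{address:04X}h <- {value:02X}h")
```

The order of the two writes is a choice made for this sketch; since #FE and #FF are always mapped, either order should be accepted.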

(Stretch goal) A fast memory fill/copy process might be run when the screen generation is idling. This would allow for rapid updates by removing the need for the host CPU to clear on-screen graphics before drawing additional frames. This might be generalised to a fast horizontal line draw, a triangle fill or a square fill to support vector graphics. Not sure what this would look like in practice.

Palette

VideoBeast produces a 9 bit (3:3:3) RGB signal. Since 9 bits are inconvenient to work with, all layers use palettes to translate a 1, 4 or 8 bit pixel value into a 9 bit output. There are (currently, at least) two palettes in VideoBeast, each of 256 colours. One palette is used for bitmap layers, and one palette is used for text, tile and sprite layers.

Where layers have 4bpp (16 colour) graphics, the layer specifies a palette index, which acts as the top 4 bits of the palette value. This allows one of 16 palettes, each of 16 colours to be chosen from the normal 256 colour palette.

Palette values are stored as two bytes, in 5:5:5 RGB format. The top bit indicates a transparent colour. Partial transparency is not supported. This means that a palette takes up 512 bytes. Palettes are accessed through the lower register space (#00 - #7F). 64 colour values are accessible, controlled through the lower register select register #F5. This selects which 64 colour bank within the palettes may be accessed. That 64 colour bank is then available at register locations #00 to #7F.
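To make the banked access concrete, the sketch below works out which value to write to #F5, and which lower register holds the first byte of a given colour. It assumes each colour occupies two consecutive registers within its 64 colour bank; `palette_select` is a hypothetical name.

```python
def palette_select(palette, colour):
    """Return (#F5 value, lower register number) for one palette entry.

    palette: 0 or 1, written to #F5 bit [2].
    colour:  0-255 index within that palette.
    """
    if palette not in (0, 1):
        raise ValueError("palette number must be 0 or 1")
    if not 0 <= colour <= 255:
        raise ValueError("colour index must be 0-255")
    bank = colour >> 6                 # four banks of 64 colours
    f5_value = (palette << 2) | bank   # #F5: [2] palette, [1:0] bank
    register = (colour & 0x3F) * 2     # two bytes per colour, #00-#7F
    return f5_value, register


f5_value, register = palette_select(1, 200)
print(f"#F5 = {f5_value:02X}h, colour at #{register:02X}-#{register + 1:02X}")
```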

Layers

Layers are composited 'bottom up' over a background colour (registers #FC,#FD). A layer acts as a window into a larger graphic data object such as a bitmap, tilemap or textmap. Whilst the graphic data objects are a fixed size (e.g. a tilemap is always 1024 by 512 pixels), the layer window can be any size on screen, and can display a pixel offset of the graphic data it's connected to (hardware scrolling within the layer window). Multiple layers can point to the same graphic data, or separate objects if required.

Any combination of layer types may be used, however some layers require additional bandwidth from the video memory, so it is possible to exceed the available draw time. 'Expensive' layers include those with 8bpp graphics, and sprites. Some testing will be required to determine if a particular layer combination can have problems - but as the firmware stabilises there will be some guidance drawn up as to the limits of the system. If draw time is exceeded, screen corruption may occur.

Display composition

Layers are composited on the current raster line from the six layer config blocks. This means that layer multiplexing is possible - the limit is that a maximum of six layers may be used on any given line. Updating the layer configs as the display is drawn allows larger numbers of layers to be arranged on screen.

(Stretch goal) it may be possible to report "maximum load" for a given frame, so software can measure how hard the graphics system is working and detect potential problems.

Text/1bpp Layer

Text layers occupy 16K of memory, and display 128 columns by 64 rows of 8x8 pixel characters (1024 x 512 pixels). Each character is stored as two consecutive bytes representing the character index (1 byte), foreground and background colours (1 byte). A palette of 16 colours is selected by the layer config block, and can include transparent foreground or background pixels. The layer is drawn with a 1bpp font stored in video memory (256 x 8 bytes, or 2KB).

In addition, text layers can support a one bit per pixel + attribute overlay mode, similar to some early 8-bit computers, allowing high resolution graphics with small memory footprints. To enable this mode, the layer config block specifies the address of a 64KB high resolution image. If an image address is set, then any text cell that has a character value of 255 will show the equivalent 8x8 pixel square from the high resolution image, using that cell's attribute values. This allows a high resolution screen of 1024 x 512 pixels with each 8 x 8 pixel square using foreground and background colours chosen from the given 16 colour palette.

Note that whilst the full 1024x512 pixel screen still takes up 64KB + 16KB for the attributes, VideoBeast supports special memory mapping modes where a sub-section of this screen area is mapped to more compressed memory layouts. This allows the host to write (for instance) graphics in the Sinclair Spectrum format to memory and VideoBeast automatically decodes the layout to the relevant part of its internal 1024x512 pixel representation.

Tile Layer

Tile layers occupy 16K of memory for the tile map, and display 128 columns by 64 rows of tiles. Tiles are 16 colours (4bpp) 8x8 pixels (32 bytes), with the layer therefore being 1024x512 pixels in size. Each tile may use one of 16 palettes. Tile graphics can be defined for up to 1024 unique tiles (occupying 32K). Layers may share tiles, or each layer may use a different tile set as appropriate.

Bitmap Layer

Bitmap layers are either 8bpp or 4bpp (256 or 16 colours from a bitmap palette). Bitmaps take 256K and are either 1024 x 512 pixels (4bpp) or 512 x 512 pixels (8bpp). (Design choice - a 1024x512 8bpp bitmap would be significantly larger than the highest screen resolution, so a large amount of data would be unnecessarily wasted. Limiting 8bpp bitmaps to 512x512 means a full screen 8bpp image in higher resolution modes is possible only by using two bitmaps (with a lot of waste), but a partial screen is more efficient. An alternative would be to define bitmaps in vertical columns of 512 pixels, with the amount of memory used being controlled by number of columns requested for the bitmap.)
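The 256K figure follows directly from the dimensions and colour depth, as the short calculation below shows. `bitmap_bytes` is an illustrative helper, not part of any VideoBeast tooling.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Memory footprint of a packed bitmap, in bytes."""
    return width * height * bits_per_pixel // 8


for width, height, bpp in ((1024, 512, 4), (512, 512, 8)):
    size = bitmap_bytes(width, height, bpp)
    # Both formats come to 256KB, i.e. sixteen 16KB pages.
    print(f"{width}x{height} @ {bpp}bpp: {size // 1024}KB, {size // 0x4000} pages of 16KB")
```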

Sprite Layer

Sprites are 8 x 8 up to 32 x 32 pixels, 4bpp (16 colour) chosen from up to 4096 cells (128K data). Sprites are drawn from a sprite list, which is an array of 8-byte entries defining the position, orientation, palette and graphic to be drawn. A sprite list may have up to 256 entries (2K data). The Sprite layer specifies the number of sprites to be drawn from the list, and sprites may be individually disabled. Sprites therefore have a 'depth', with later entries in the array being drawn on top of earlier ones.

Note that sprites are drawn from the specified list on the current raster line. This means that sprite multiplexing is possible - the limitation being that each layer may define up to a maximum of 256 sprites, and too many sprites on a given line may exceed the draw time available.

Mode 7 Layer

(Stretch goal) Mode 7 layers are expensive to draw, so may restrict which other layers are used. A Mode 7 layer is like a 512 x 512 8bpp bitmap layer except that rotation, shear and transformation can be applied (an effect first used in the SNES 'Mode 7'). Raster effects are not possible with a Mode 7 layer, since pixel offsets are persisted across rows on screen.

Layer data/registers

Layers are defined in 6 x 16 byte blocks from #80 to #DF. Each 16 byte block specifies the layer type, the window on screen that the layer is drawn inside and a scroll offset. Windows are fixed to 8 pixel boundaries. Depending on the layer type, additional parameters are provided in the block, for instance to specify the memory location of the pixel data to be drawn. Layers are drawn in order (bottom to top), starting with the block at address #80.

Common block

All layer configs start with the same 8-byte data.

Byte Meaning
0 [2:0] -> Layer type 0=Disabled 1=Text 2=Sprite 3=Tile 4=8bpp bitmap 5=4bpp bitmap
1 Top of window on screen In units of 8 pixels. Gives 0-2040
2 Bottom of window Inclusive. Gives 7-2047
3 Window Left In units of 8 pixels
4 Window Right Inclusive. Gives 7-2047
5 [7:0] -> Scroll X[7:0]
6 [3:0] -> Scroll X[11:8]
[7:4] -> Scroll Y[11:8]
7 [7:0] -> Scroll Y[7:0]
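The common block can be assembled as in the following sketch, which packs the window and the split 12-bit scroll values into the eight bytes listed above. The helper names are hypothetical, and the window arguments are given directly in 8-pixel units as the registers expect.

```python
LAYER_BLOCK_BASE = 0x80  # first layer config block (#80)
LAYER_BLOCK_SIZE = 16


def layer_block_reg(layer):
    """Register number of the first byte of layer config block 0-5."""
    if not 0 <= layer <= 5:
        raise ValueError("there are six layer config blocks")
    return LAYER_BLOCK_BASE + layer * LAYER_BLOCK_SIZE


def common_block(layer_type, top, bottom, left, right, scroll_x, scroll_y):
    """Pack bytes 0-7 of a layer config block."""
    if not 0 <= layer_type <= 7:
        raise ValueError("layer type is a 3 bit field")
    if not (0 <= scroll_x <= 0xFFF and 0 <= scroll_y <= 0xFFF):
        raise ValueError("scroll values are 12 bits")
    return bytes([
        layer_type,
        top, bottom, left, right,      # bytes() rejects values above 255
        scroll_x & 0xFF,               # Scroll X[7:0]
        ((scroll_y >> 8) << 4) | (scroll_x >> 8),  # Y[11:8] high nibble, X[11:8] low
        scroll_y & 0xFF,               # Scroll Y[7:0]
    ])


# Tile layer (type 3) covering 640x480: rows 0-59 and columns 0-79 of 8 pixels.
block = common_block(3, 0, 59, 0, 79, scroll_x=0x123, scroll_y=0x245)
print(f"#{layer_block_reg(0):02X}: {block.hex()}")
```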

Text layer

The text layer consists of a character map and a font. The text layer is 1024 by 512 pixels, or 128 by 64 characters (occupying 16KB). The character map has pairs of bytes for each location. The first byte is the character index in the font (256 characters * 8 bytes, or 2KB). The second is the palette value (0-15) for the foreground (bits 7:4) and background (bits 3:0).

If character = 255, pixel data is optionally taken from a separate 64KB 1bpp bitmap, using the same attribute value. The bitmap is 1024 * 512 pixels, arranged as 128 bytes for each row (=64KB total).

Byte Meaning
8 Character map base >> 14 16K page number for character map
9 Font map base >> 11 2K page number for font (bottom 512K)
10 [3:0] -> Palette index upper bits
[4] -> 0:Normal or 1:Sinclair palette bit layout
11 High res bitmap base >> 14 16K page number for 1bpp bitmap (0 disables 1bpp)
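For the character map and 1bpp bitmap described in this section, byte offsets can be computed as sketched below. The sketch assumes the character map is stored row by row, left to right (the text does not state the ordering), and the function names are hypothetical. Bit order within a 1bpp byte is not specified here, so only the byte offset is computed.

```python
COLUMNS, ROWS = 128, 64    # character cells per text layer
BITMAP_ROW_BYTES = 128     # 1024 pixels at 1bpp


def cell_offset(column, row):
    """Byte offset of a cell's character byte; the attribute byte follows it."""
    if not (0 <= column < COLUMNS and 0 <= row < ROWS):
        raise ValueError("cell outside the 128 x 64 map")
    return (row * COLUMNS + column) * 2


def attribute(foreground, background):
    """Attribute byte: foreground in bits 7:4, background in bits 3:0."""
    if not (0 <= foreground <= 15 and 0 <= background <= 15):
        raise ValueError("colours are 0-15")
    return (foreground << 4) | background


def hires_byte_offset(x, y):
    """Byte offset in the 1bpp bitmap holding pixel (x, y)."""
    if not (0 <= x < 1024 and 0 <= y < 512):
        raise ValueError("pixel outside the 1024 x 512 bitmap")
    return y * BITMAP_ROW_BYTES + x // 8


print(cell_offset(0, 1))                    # first cell of the second row: 256
print(f"{attribute(15, 1):02X}h")           # F1h
print(hires_byte_offset(1023, 511))         # last byte of the bitmap: 65535
```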

Bitmap layer

Byte Meaning
8 Bitmap base >> 14 16K page number for bitmap
9 Unused, leave as 0
10 [3:0] -> Palette index

Memory Usage:

256KB -> 512 x 512 pixels @ 8bpp, or 1024 x 512 pixels @ 4bpp

Tile layer

Byte Meaning
8 Tile map base >> 14 16K page number for tile Map
9 Tile graphics base >> 15 32K page number for tile graphics
10

Memory Usage:

16KB -> 128 x 64 x 2 bytes Tile map (1024x512 or 2048x1024 pixels)
32KB -> 1024 x 32 bytes Tile graphics (8x8 4bpp pixels)

Sprite layer

Byte Meaning
8 Sprite object base >> 11 2K page number for sprite list (bottom 512K)
9 Sprite graphics base >> 15 32K page number for sprite graphics
10 Sprite count

Memory Usage:

2KB -> 256 x 8 byte sprite entries.
Up to 128KB -> 4096 x 32 bytes Sprite graphics (8x8 4bpp pixel cells)

1BPP and Sinclair Screen mapping

In addition to the 256 colour and 16 colour bitmap modes, VideoBeast supports a 1bpp (one bit per pixel) mode with a colour attribute map. This is achieved through a text layer configuration where the normal character+attribute text map can allow individual characters to 'peek through' to an underlying high resolution 1bpp bitmap. In this mode, a 64KB 1bpp bitmap (arranged as 1024x512 pixels, in 128 byte rows) is configured in memory. Then where the text map has a character value of 255, that character is replaced with the underlying bitmap using the specified attribute (foreground and background colour) values.

This layer type allows VideoBeast to support Sinclair style screen layouts with a special host page mapping mode. This maps the 16K page visible to the CPU to one of eight areas of a VideoBeast 1bpp bitmap, using the Sinclair memory layout.

Byte Meaning
#FF [7:5] = 4 -> Sinclair page mapping mode
...
#F9 Bank Offset 0 -> Attribute base >> 14 16K page number for character map
#F8 Bank Offset 1 -> Bitmap base >> 14 16K page number for 1bpp bitmap
#F7 Bank offset 2 - Offset 2 -> [2] -> Y8 [1:0] -> X9-X8 Screen offsets
#F6 Bank offset 3 - Offset 4 -> Offscreen base >> 14 16K page number for data outside of screen RAM

Tile Map

The tile map consists of 128 x 64, 2 byte words (16K), defining a 1024 x 512 pixel area. Each tile is 8 x 8 pixels, 4bpp, stored as 32 bytes in left to right, top to bottom order. The tile map specifies a 10 bit tile index (1024 values) to select a tile to draw.

Bytes Description
0 Tile index [7:0]
1 [7:4] -> Palette index
[3] -> 1: Visible.
[2] -> 1: Collide (sprites can hit tile)
[1:0] -> Tile index [9:8]
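A tile map entry can be packed from its fields as in this sketch; `tile_entry` is a hypothetical helper that returns the two bytes in the order they are stored.

```python
def tile_entry(index, palette, visible=True, collide=False):
    """Pack one 2 byte tile map entry."""
    if not 0 <= index < 1024:
        raise ValueError("tile index is 10 bits")
    if not 0 <= palette < 16:
        raise ValueError("palette index is 4 bits")
    byte0 = index & 0xFF                 # Tile index [7:0]
    byte1 = ((palette << 4)              # [7:4] palette index
             | (int(visible) << 3)       # [3] visible
             | (int(collide) << 2)       # [2] collide
             | (index >> 8))             # [1:0] tile index [9:8]
    return bytes([byte0, byte1])


print(tile_entry(0x2A5, palette=3).hex())  # a53a
```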

Sprite Data

Sprites are stored as up to 256 8-byte entries (a 2KB block). Each entry provides the graphic, palette and co-ordinates used to draw the sprite on screen. Sprites are made up of 8x8 pixel cells (same format as tiles), and may be 1, 2, 3 or 4 cells wide/high in any combination - e.g. 2x2, 4x3 etc.

Cells within a sprite are drawn left to right, top to bottom. The first cell to be drawn is given by the sprite cell index (bits 9 to 0 of the first word in the sprite entry - 1024 unique values). The layer parameter defines the base address of sprite cell data, then the start offset for drawing the sprite is calculated as the sprite cell index multiplied by the width of the sprite. This means that 1x1 sprites will be drawn with start offset at 0,1,2... (memory offset 0, 32, 64, 96 bytes), but 3x2 sprites will be drawn with start offset at 0,3,6... (memory offset 0, 96, 192 bytes). This slightly complicated arrangement compensates for the fact that with 1024 cell index values, 4x4 sprites (16 cells) would normally only allow 1024/16 -> 64 unique graphics to be defined (with cell index 0, 16, 32...). Multiplying by the width (4 in this case) extends the index range to give us 256 large sprite graphics (with cell index 0, 4, 8...).
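The start offset rule can be checked with a few lines of arithmetic. The sketch below reproduces the examples in the paragraph above; `sprite_start_offset` is an illustrative name.

```python
CELL_BYTES = 32  # one 8x8 cell at 4bpp


def sprite_start_offset(cell_index, width_cells):
    """Byte offset of a sprite's first cell from the sprite graphics base."""
    if not 0 <= cell_index < 1024:
        raise ValueError("cell index is 10 bits")
    if not 1 <= width_cells <= 4:
        raise ValueError("sprites are 1-4 cells wide")
    return cell_index * width_cells * CELL_BYTES


print([sprite_start_offset(i, 1) for i in range(4)])  # [0, 32, 64, 96]
print([sprite_start_offset(i, 3) for i in range(3)])  # [0, 96, 192]
# A 4x4 sprite spans 16 cells, so consecutive graphics sit 4 indices apart.
print(sprite_start_offset(4, 4) // CELL_BYTES)        # 16
```

The largest reachable offset is 1023 x 4 x 32 bytes, which stays inside the 128K of sprite graphics.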

Each sprite has a palette index, allowing one of 16 palettes to be chosen. There are also bit flags that are set to enable the sprite (disabled sprites are not drawn), flip or mirror the sprite, or draw it relative to the previous sprite in the list. This allows a group of sprites to be treated as a single object, with the co-ordinates controlled by the first sprite in the group.

Bytes Description
0,1 [15:12] -> Palette index
[11:10] -> Collision 0/1 plane visible
[9:0] -> Sprite cell index
2,3 [15:14] -> Unused *
[13:12] -> Sprite width (8,16,24,32 pixels)
[11] -> Sprite X mirror
[10:0] -> Sprite X position (0-2047)
4,5 [15] -> Sprite enable
[14] -> Sprite relative
[13:12] -> Sprite height (8,16,24,32 pixels)
[11] -> Sprite Y flip
[10] -> Unused *
[9:0] -> Sprite Y position (0-1023)
6,7 Unused *
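A sprite entry can be built from the table above as three little endian words plus two unused bytes. This sketch assumes the 2 bit width and height fields encode 1 to 4 cells as 0 to 3, which the table implies but does not state; `sprite_entry` and its parameter names are hypothetical.

```python
import struct


def sprite_entry(cell, palette, x, y, width_cells=1, height_cells=1,
                 enable=True, relative=False, x_mirror=False, y_flip=False,
                 collision=0):
    """Pack one 8 byte sprite list entry (words stored little endian)."""
    if not (0 <= cell < 1024 and 0 <= palette < 16 and 0 <= collision < 4):
        raise ValueError("cell, palette or collision out of range")
    if not (0 <= x < 2048 and 0 <= y < 1024):
        raise ValueError("position out of range")
    if not (1 <= width_cells <= 4 and 1 <= height_cells <= 4):
        raise ValueError("sprites are 1-4 cells in each direction")
    word0 = (palette << 12) | (collision << 10) | cell
    # Assumption: size fields hold (cells - 1), i.e. 0=8px ... 3=32px.
    word1 = ((width_cells - 1) << 12) | (int(x_mirror) << 11) | x
    word2 = ((int(enable) << 15) | (int(relative) << 14)
             | ((height_cells - 1) << 12) | (int(y_flip) << 11) | y)
    return struct.pack("<4H", word0, word1, word2, 0)  # bytes 6,7 unused


entry = sprite_entry(cell=5, palette=2, x=100, y=50, width_cells=2, height_cells=2)
print(entry.hex())  # 0520641032900000
```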

SPI Interface

To the host, the SPI interface appears as eight registers (R0-R7). These can be read to retrieve the state of the SPI interface, or written to control its function.

The SPI interface supports sending and reading registered values (streams of up to 6 bytes), and sending and reading to memory (DMA transfer).

Two SPI devices are supported - the external SD Card interface and the internal 2Mb Flash RAM.

Functions and status are accessed through R7 (the Command Register). Writing to R7 will initiate a command when the interface is not busy. Commands sent when the interface is busy will be ignored.

Command                  Bits 7 6 5  Bit 4     Bit 3               Bit 2  Bit 1       Bit 0
Send registers           0 0 0       Reset CS  Send 1's            Byte count 1-6 (bits 2-0)
Read registers           0 0 1       Reset CS  Wait for start bit  Byte count 1-6 (bits 2-0)
Send memory              0 1 0       Reset CS
Read memory from device  0 1 1       Reset CS  Wait for start bit
Set DMA address          1 1 0       -         -                   -      Set Length  Set Address
Set options              1 1 1       Clock     CS 1                CS 0   -           -

Reading R7 will return the interface status (TBD)

Status Bits 7 6 5 4 3 2 1 0
Clock CS 1 CS 0 Busy

When sending registers, the value of R0-R6 will be sent in ascending order (R0 first). If bit 3 (Send 1's) of the command register is set, then all 1's will be transmitted instead of the bit value of R0-R6. If bit 4 (Reset CS) is set, then both Chip Select lines will be reset at the end of the write.

When reading to registers, if Bit 3 (Wait for start bit) of the Command register is set, then the interface will read up to 16 clock cycles until a 0 bit is received before reading the requested number of bytes. If bit 4 (Reset CS) is set, then both Chip Select lines will be reset at the end of the read.

When setting the DMA address, R0 and R1 store the target DMA address as a page index (256 byte pages - i.e. A0-A7 are assumed to be 0, R0 stores bits A8 to A15, and R1 stores bits A16 to A23 of the full address). R2, R3 and R4 store the transfer length minus 1, in bytes. Note that DMA transfers are word aligned, so the transfer length is rounded down to the nearest even value. The values in the registers are only written to the internal counters if Bit 1 (set length) or Bit 0 (set address) of the Command Register are set. After a Set DMA address command is sent, the current values of the internal counters can be read from R0-R4.

Address: R1[2:0], R0[7:0], 00h
Length: R4[2:0], R3[7:0], R2[7:0]
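The register values for a Set DMA address command can be derived from a byte address and a transfer length as sketched below. The range checks follow the bit widths shown above (R1[2:0] and R4[2:0]); the word alignment rounding is not modelled, and `dma_registers` is a hypothetical helper.

```python
def dma_registers(address, length):
    """Return the five bytes to load into R0-R4 before Set DMA address."""
    if address % 256:
        raise ValueError("DMA address must be a multiple of 256 bytes")
    page = address >> 8            # A0-A7 are assumed to be 0
    count = length - 1             # registers hold the length minus 1
    if not 0 <= page < (1 << 11):
        raise ValueError("address exceeds R1[2:0], R0[7:0]")
    if not 0 <= count < (1 << 19):
        raise ValueError("length exceeds R4[2:0], R3[7:0], R2[7:0]")
    return bytes([
        page & 0xFF,               # R0: A8-A15
        page >> 8,                 # R1: A16 and up
        count & 0xFF,              # R2: length low
        (count >> 8) & 0xFF,       # R3: length mid
        count >> 16,               # R4: length high
    ])


# 16KB transfer to video address 20000h.
print(dma_registers(0x20000, 0x4000).hex())  # 0002ff3f00
```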

The Options command sets things like the clock speed (Bit 4: 0 sets the low 400kHz speed for initialising SD Cards, 1 sets high speed (25 MHz)) and the Chip Select lines for the two devices. CS 0 (Bit 2) is the internal Flash memory; CS 1 (Bit 3) is the external SD Card interface. Note that setting both CS 0 and CS 1 to 0 (active) will be ignored. Clock, CS 0 and CS 1 default to 1 (high speed, no device active).
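Putting the command layout together, the sketch below builds values for the Command Register (R7) from the bit positions given in this section. The function names are hypothetical, and the chip select arguments are line levels, so 0 means active.

```python
def send_registers(count, reset_cs=False, send_ones=False):
    """Command 000: send `count` bytes from the data registers."""
    if not 1 <= count <= 6:
        raise ValueError("byte count is 1-6")
    return (0b000 << 5) | (int(reset_cs) << 4) | (int(send_ones) << 3) | count


def read_registers(count, reset_cs=False, wait_start=False):
    """Command 001: read `count` bytes into the data registers."""
    if not 1 <= count <= 6:
        raise ValueError("byte count is 1-6")
    return (0b001 << 5) | (int(reset_cs) << 4) | (int(wait_start) << 3) | count


def set_dma(set_length=True, set_address=True):
    """Command 110: latch R0-R4 into the DMA counters."""
    return (0b110 << 5) | (int(set_length) << 1) | int(set_address)


def set_options(fast_clock, cs1, cs0):
    """Command 111: clock speed and chip select levels (0 = active)."""
    if cs1 == 0 and cs0 == 0:
        raise ValueError("both chip selects active is ignored by the interface")
    return (0b111 << 5) | (int(fast_clock) << 4) | (cs1 << 3) | (cs0 << 2)


# Select the SD Card (CS 1 low) at the slow initialisation clock.
print(f"{set_options(fast_clock=False, cs1=0, cs0=1):02X}h")   # E4h
# Clock out six bytes of all 1's.
print(f"{send_registers(6, send_ones=True):02X}h")              # 0Eh
# Read one byte after waiting for a start bit, then release both CS lines.
print(f"{read_registers(1, reset_cs=True, wait_start=True):02X}h")  # 39h
print(f"{set_dma():02X}h")                                      # C3h
```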