The Game Boy Advance is a handheld video game console created by Nintendo. It launched in Japan in 2001 and served as the successor to the Game Boy Color. It had an ARM7TDMI CPU clocked at 16.78 MHz, 32KB of internal work RAM, 256KB of external work RAM, and 96KB of VRAM. It isn’t the most powerful machine, but many people remember plenty of its games fondly. One game that never saw the light of day on the machine, though, was a prototype port of Quake, a game developed by id Software that helped define the first-person shooter genre as we know it today.
Quake is an incredibly detailed game with a fantastic soundtrack and addictive gameplay, and just like DOOM, it has been ported to almost every device you can think of. Its port to the Game Boy Advance is particularly impressive, since the handheld doesn’t natively support 3D graphics and Nintendo specifically marketed it as a two-dimensional gaming experience. That didn’t stop Randy Linden from developing his own port, though.
If you’re unfamiliar with Linden, he’s best known as the developer of both bleem! (a PlayStation emulator) and the SNES port of DOOM, an accomplishment that John Romero, co-founder of id Software, once said in an interview with Shacknews he didn’t think was possible. Linden’s track record made it clear that if anyone was going to make Quake on the Game Boy Advance a reality, it was probably him.
This port has come to light thanks to Linden releasing it himself through the Forest of Illusion project. Forest of Illusion aims to preserve the history of Nintendo’s games, and Linden reached out to it to distribute the copy of the Quake port he found on a 256MB flash card in his possession.
We would like to thank Randy Linden for taking the time to answer our questions and for ensuring the technical accuracy of this article. We would also like to thank Modern Vintage Gamer for allowing us to use any stills we needed from his video. This port has no official relation to id Software or ZeniMax and was developed as a solo project by Linden.
Quake’s Game Boy Advance port
Technically speaking, it’s a marvel that Quake runs as well as it does on the Game Boy Advance. It runs at a good frame rate and keeps the correct lighting and color palette of the original game. Everything is 3D, including weapons and monsters. Games on the Game Boy Advance usually faked 3D graphics with sprites, but this was the real deal. It doesn’t rely on ray casting the way other 3D games on the handheld did, and it even achieves point lighting effects on pre-rendered objects through a palette-changing trick that creates an illusion of dynamic lighting.
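The article doesn’t describe how Linden’s palette trick works internally, so the sketch below is only a point of comparison: the PC version of Quake’s software renderer lit surfaces with a colormap, a precomputed table that maps each (light level, palette index) pair to the palette entry closest to the darkened color. Relighting an 8-bit pixel then costs one table lookup instead of any color arithmetic. The 16-entry greyscale palette, the four light levels, and the function names are hypothetical, chosen to keep the example small.

```python
# Hypothetical 16-entry greyscale palette; Quake's real palette has 256 colors.
PALETTE = [(i * 17, i * 17, i * 17) for i in range(16)]


def nearest_index(rgb, palette):
    """Index of the palette color closest to `rgb` (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], rgb)))


def build_colormap(palette, levels):
    """colormap[level][index]: palette index of color `index` darkened to `level`.

    Level 0 is full brightness; each further level is darker.
    """
    colormap = []
    for level in range(levels):
        scale = (levels - level) / levels
        colormap.append([nearest_index(tuple(int(c * scale) for c in color), palette)
                         for color in palette])
    return colormap


def shade(pixels, level, colormap):
    """Relight a run of 8-bit palette pixels with one lookup per pixel."""
    row = colormap[level]
    return [row[p] for p in pixels]
```

Because the table is built once, the per-pixel cost at runtime is a single indexed load, and the palette itself never has to change.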
To be clear, this port isn’t the full game. It’s a prototype that Linden intended to take to id Software once it was finished, so that it could be prepared for release. However, the Game Boy Advance’s popularity began to wane, and the custom engine Linden wrote instead became the engine of Cyboid, another game developed entirely by Linden. Linden tells us that a “big chunk of the code” is still the original ARM code from the Game Boy Advance version. If you want to try Cyboid, an older version is available on the Google Play Store, but the official APK is now distributed through the Amazon Appstore, as the game contains a lot of low-level 32-bit code.
Linden also shared with us a video of his code running on the iPod Video, which turned out to be one of the earliest versions of Cyboid. It was built on the same engine code used for his Quake port to the Game Boy Advance.
The Game Boy Advance port of Quake doesn’t contain any of the game’s official assets, as Linden hasn’t approached either id Software or ZeniMax about distributing the E1M1 version, which does contain official Quake assets.
The version currently being distributed is also a debug build. Holding the R button on boot-up takes the player straight to the game’s second map, and holding left on the D-pad takes them to the third. Maps can also be swapped when the player dies, and monsters won’t attack until the player shoots at them first.
As for music, the demo uses publicly available .S3M files, and the sound mixer handles both stereo music and sound effects.
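How Linden’s mixer works isn’t covered here, but software mixing on the Game Boy Advance generally means summing voices into a buffer that the Direct Sound hardware plays back as signed 8-bit samples. The sketch below assumes that setup: each hypothetical voice (a channel of an .S3M module, or a sound effect) carries its own left and right volume on a 0 to 64 scale, and the summed result is saturated so that loud passages clip instead of wrapping around.

```python
def clamp8(value):
    """Scale a summed sample-times-volume value down and saturate to signed 8-bit."""
    return max(-128, min(127, value >> 6))  # >> 6 divides by the full volume, 64


def mix_stereo(voices, length):
    """Mix (samples, left_volume, right_volume) voices into left/right 8-bit buffers.

    Samples are signed 8-bit ints; volumes run 0..64 (a hypothetical scale).
    """
    left, right = [0] * length, [0] * length
    for samples, left_vol, right_vol in voices:
        for i, s in enumerate(samples[:length]):
            left[i] += s * left_vol
            right[i] += s * right_vol
    return [clamp8(v) for v in left], [clamp8(v) for v in right]
```

Panning a voice is just a matter of giving it different left and right volumes; music channels and sound effects go through the same loop.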
Technical barriers
There were a number of barriers on the Game Boy Advance that made this a difficult port. Some of the biggest obstacles were the low clock speed, the handheld’s lack of 3D graphics capabilities, and the lack of a floating-point unit (FPU). There were plenty of others along the way, but these were the particular pain points that Linden outlined to me. Before we get into them, it’s important to understand the architecture of the Game Boy Advance.
The Game Boy Advance has three sets of RAM: internal work RAM (IWRAM), external work RAM (EWRAM), and video RAM (VRAM). The 32KB of IWRAM is used for storing ARM instructions for fast execution, while the 256KB of EWRAM is better suited to Thumb-only instructions and smaller chunks of data. As Rodrigo Copetti notes, EWRAM can be up to six times slower to access than IWRAM. Most of the memory, in the form of EWRAM, is only accessible over a 16-bit bus, despite the Game Boy Advance being marketed as a 32-bit handheld, whereas IWRAM sits on a 32-bit bus. VRAM comes in at 96KB, and while it’s primarily for storing graphics data, it sits in the CPU’s memory map and can be used as normal memory storage, too.
Thumb instructions are a subset of the 32-bit ARM instruction set, encoded as 16-bit words. They keep much of the power of ARM instructions while each one takes up half the space, which makes them efficient for optimized development. That means that while EWRAM is slower to access, Thumb’s compact encoding offsets some of the penalty, since each instruction needs only one 16-bit fetch instead of two. The downside is that sometimes there isn’t a Thumb equivalent of the ARM instruction you want to execute. The EWRAM was used to store the output of the 3D math transformation logic, which was basically the list of polygon edges that were then traced out scanline by scanline by the rasterization code.
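To put numbers on the memory differences described above, the sketch below tabulates the default access costs that GBATEK (Martin Korth’s Game Boy Advance hardware reference) lists for each region: cycles for an 8-, 16- and 32-bit access. It ignores the extra cycle VRAM can cost when the CPU and the video hardware access it at the same time, and the instruction-fetch model (one 16-bit access per Thumb instruction, one 32-bit access per ARM instruction) is a simplification; the helper names are hypothetical.

```python
# Default access timings per region, as documented in GBATEK.
# cycles = cost of an (8-bit, 16-bit, 32-bit) access; VRAM contention is ignored.
REGIONS = {
    "EWRAM": {"base": 0x02000000, "size_kb": 256, "bus_bits": 16, "cycles": (3, 3, 6)},
    "IWRAM": {"base": 0x03000000, "size_kb": 32, "bus_bits": 32, "cycles": (1, 1, 1)},
    "VRAM": {"base": 0x06000000, "size_kb": 96, "bus_bits": 16, "cycles": (1, 1, 2)},
}

INSTRUCTION_BITS = {"arm": 32, "thumb": 16}


def fetch_cycles(region, instruction_set):
    """Cycles to fetch one instruction from `region` (memory cost only, no execution)."""
    width = INSTRUCTION_BITS[instruction_set]
    return REGIONS[region]["cycles"][{8: 0, 16: 1, 32: 2}[width]]


for name in REGIONS:
    print(f"{name}: ARM fetch {fetch_cycles(name, 'arm')} cycles, "
          f"Thumb fetch {fetch_cycles(name, 'thumb')} cycles")
```

With these figures, an ARM instruction fetched from EWRAM costs six cycles against one from IWRAM, matching the “up to six times slower” figure, while Thumb code in EWRAM costs three cycles per fetch.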
As Linden tells me, the most complex and difficult part of the entire port was the scanline renderer. It consists of over 10,000 lines of highly optimized ARM assembly code designed to draw a set of pixels to VRAM, and it used up most of the 32KB of IWRAM. The edges closest to the camera are active and rendered, and it’s essentially a large Binary Space Partitioning (BSP) tree. VRAM was used to store the results of the polygon transformation step as edge tables, because there wasn’t enough IWRAM, and VRAM on the Game Boy Advance is still faster than EWRAM. The graphics were also stored and displayed from here.
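The edge-table idea can be sketched in a few dozen lines. This is the generic textbook scanline fill, not Linden’s renderer, which is ARM assembly and does far more: each non-horizontal edge is bucketed by its top scanline, and then every scanline activates new edges, retires finished ones, and fills between pairs of x crossings. Positions use 16.16 fixed point because the handheld has no FPU; the function names are hypothetical, and the sketch assumes a convex polygon with integer vertices.

```python
FRAC_BITS = 16  # 16.16 fixed point: no FPU, so x positions are scaled integers


def build_edge_table(vertices):
    """Bucket each non-horizontal edge by the first scanline it covers.

    Each entry is [y_end, x_fixed, dx_fixed]: the scanline where the edge stops,
    its current x in 16.16, and how far x moves per scanline.
    """
    table = {}
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        if y0 == y1:
            continue  # horizontal edges add no left/right crossings
        if y0 > y1:
            x0, y0, x1, y1 = x1, y1, x0, y0  # always trace top to bottom
        dx = ((x1 - x0) << FRAC_BITS) // (y1 - y0)
        table.setdefault(y0, []).append([y1, x0 << FRAC_BITS, dx])
    return table


def rasterize(vertices):
    """Return (y, x_left, x_right) spans, right end exclusive, for a convex polygon."""
    table = build_edge_table(vertices)
    if not table:
        return []
    spans, active = [], []
    y, y_last = min(table), max(vy for _, vy in vertices)
    while y < y_last:
        active.extend(table.get(y, []))             # edges that start on this scanline
        active = [e for e in active if e[0] > y]    # drop edges that have ended
        xs = sorted(e[1] >> FRAC_BITS for e in active)
        for left, right in zip(xs[0::2], xs[1::2]):  # fill between pairs of crossings
            if right > left:
                spans.append((y, left, right))
        for e in active:
            e[1] += e[2]                            # step every active edge down one line
        y += 1
    return spans
```

For a 4 by 3 rectangle, `rasterize([(0, 0), (4, 0), (4, 3), (0, 3)])` yields one span from x = 0 to 4 on each of the three scanlines. Stepping x by a precomputed increment keeps the per-scanline work to additions.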
He spent a lot of time on optimizations to get the fastest execution time possible. Three things he did to speed up execution were:
- Self-modified the code before it was executed, so fewer instructions were required.
- Used a series of look-up tables for things like reciprocal, sine, cosine, tangent, and so on.
- Switched the CPU “mode” to gain access to additional registers (which are like “variables”) without having to save and restore the registers’ values.
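Here is a minimal sketch of the look-up table approach, assuming 16.16 fixed-point numbers and an angle unit that splits the circle into 256 steps. Both choices, the table sizes, and the function names are hypothetical; on real hardware the tables would be generated ahead of time and stored in ROM rather than computed at startup. Wrapping an angle becomes a bit mask instead of a modulo, and dividing by a small integer becomes a multiply by a stored reciprocal.

```python
import math

ONE = 1 << 16           # 16.16 fixed point: 65536 represents 1.0
ANGLE_STEPS = 256       # hypothetical angle unit: a full circle is 256 steps

# Precomputed tables (on a GBA these would be baked into ROM ahead of time).
SINE_LUT = [round(math.sin(2 * math.pi * i / ANGLE_STEPS) * ONE)
            for i in range(ANGLE_STEPS)]
RECIP_LUT = [0] + [ONE // n for n in range(1, 1024)]  # 1/n in 16.16; entry 0 unused


def fixed_sin(angle):
    """Sine of an angle in 256ths of a circle; the mask wraps it without a modulo."""
    return SINE_LUT[angle & (ANGLE_STEPS - 1)]


def fixed_cos(angle):
    """Cosine is the sine table read a quarter turn ahead."""
    return SINE_LUT[(angle + ANGLE_STEPS // 4) & (ANGLE_STEPS - 1)]


def fixed_mul(a, b):
    """Multiply two 16.16 values, dropping the extra 16 fraction bits."""
    return (a * b) >> 16


def fixed_div_small(a, n):
    """Divide a 16.16 value by an integer 1 <= n < 1024 using the reciprocal table."""
    return fixed_mul(a, RECIP_LUT[n])
```

Tangent and other functions follow the same pattern: spend memory once so that each call at runtime is an indexed load, plus at most a multiply.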
Switching CPU modes to gain extra registers is an incredibly clever maneuver that keeps values close to the CPU, so they can be retrieved in a single clock cycle. As Linden tells me, he could switch registers and retrieve a value in a single clock cycle, whereas storing a value in the Game Boy Advance’s RAM takes longer. The CPU itself runs at 16.78 MHz, meaning it completes roughly 16.78 million cycles per second. That sounds like a lot, but when you need to calculate and draw every pixel on the screen, those cycles add up quickly, and it becomes essential to shave off as many operations as you can.
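A quick back-of-the-envelope calculation shows how tight that budget is. The Game Boy Advance screen is 240 by 160 pixels, and the 16.78 MHz figure is exactly 2^24 Hz. The frame rates below are hypothetical targets, and the result is an upper bound, because game logic, sound mixing, and interrupts draw on the same cycles.

```python
CPU_HZ = 2 ** 24               # 16,777,216 Hz, the "16.78 MHz" figure
SCREEN_W, SCREEN_H = 240, 160  # Game Boy Advance LCD resolution


def cycles_per_pixel(fps):
    """Upper bound on CPU cycles available per screen pixel at a given frame rate.

    Assumes every cycle goes to drawing; in practice game logic, sound mixing
    and interrupts all come out of the same budget.
    """
    return CPU_HZ / (SCREEN_W * SCREEN_H * fps)


for fps in (15, 20, 30):  # hypothetical target frame rates
    print(f"{fps} fps: {cycles_per_pixel(fps):.1f} cycles per pixel")
```

At 20 frames per second that is under 22 cycles per pixel, and at 30 it drops to about 14.6, so a single cycle saved in the renderer’s inner loop is a meaningful fraction of the whole.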
The image above shows the general registers of the ARM7TDMI chip inside the Game Boy Advance. Typically, developers would only ever use the registers of the “System and User” mode and fall back on normal variables beyond that. Linden, however, made use of registers in all seven modes of the chip, and the best part is that switching modes retains the values in the other modes’ registers, so he could swap between them freely.
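The banking behavior described above can be modeled in a few lines. On the ARM7TDMI, User and System modes share one set of registers; FIQ mode has private copies of r8 to r14; and IRQ, Supervisor, Abort, and Undefined modes each have private copies of r13 and r14. The class below is a toy model with hypothetical names, not an emulator: it shows that a value parked in a banked register survives a trip through another mode, and that the core holds 31 general-purpose registers in total. In practice, parking data in banked r13 and r14 only works with interrupts disabled, since taking an exception overwrites the banked link register of the mode it enters.

```python
BANKED = {
    "usr": (),                          # User and System share the base set
    "sys": (),
    "fiq": (8, 9, 10, 11, 12, 13, 14),  # FIQ banks r8-r14
    "irq": (13, 14),                    # the rest bank only SP (r13) and LR (r14)
    "svc": (13, 14),
    "abt": (13, 14),
    "und": (13, 14),
}


class RegisterFile:
    """Toy model of ARM7TDMI register banking (values only, no timing)."""

    def __init__(self):
        self.shared = [0] * 16  # r0-r15 as seen from User/System mode
        self.banks = {mode: {r: 0 for r in regs} for mode, regs in BANKED.items()}
        self.mode = "sys"

    def switch_mode(self, mode):
        # On hardware this is an MSR write to the CPSR mode bits. No register
        # values are copied or cleared; other banks keep whatever they held.
        self.mode = mode

    def _slot(self, r):
        bank = self.banks[self.mode]
        return (bank, r) if r in bank else (self.shared, r)

    def read(self, r):
        store, key = self._slot(r)
        return store[key]

    def write(self, r, value):
        store, key = self._slot(r)
        store[key] = value & 0xFFFFFFFF


def physical_register_count():
    """Registers the core actually holds: the shared 16 plus every banked copy."""
    return 16 + sum(len(regs) for regs in BANKED.values())
```

For example, writing 111 to r8 in System mode, switching to FIQ mode and writing 222 to r8 there, then switching back, still reads 111: both values live side by side with no save or restore.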
Funnily enough, Linden also mentioned that his bank-switching approach unearthed a bug in the NanoBoyAdvance emulator. As it turned out, that emulator didn’t support storing values in the other CPU modes’ registers and switching between them, and his Quake demo was the first known game to actually do it.
Linden shared a photo of some of his notes with us and explained how he optimized his floating-point calculations in the absence of a proper FPU.
The image above is from Linden’s notes, and what’s particularly interesting is the section on “miscellaneous ARM cycle instruction counts”. He worked out ways to reduce the number of clock cycles each calculation took. As he described it to me, the ARM7TDMI’s multiply finishes early for smaller operands: an 8-bit number could be multiplied in one clock cycle, a 16-bit number in two, a 24-bit number in three, and a full 32-bit number in four.
“There were two or three stages of execution [in the ARM processor]. Say, for example, I multiply register one by register two and put the result into register three. If I knew that register two was a 16-bit number, instead of saying multiply register one by register two, I could flip it and say multiply register two by register one, because that would save me a clock cycle.”
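The ARM7TDMI Technical Reference Manual documents this “early termination”: a MUL takes one cycle plus m internal cycles, where m is 1, 2, 3, or 4 depending on how many of the multiplier operand’s upper bits are all zeros or all ones. Only the multiplier operand (the Rs register in MUL Rd, Rm, Rs) affects the timing, so when one value is known to be small, it pays to put it in that slot. The sketch models the rule and picks the cheaper operand order; it ignores instruction-fetch wait states, and the helper names are hypothetical.

```python
def mul_internal_cycles(rs):
    """m for MUL on ARM7TDMI: set by how far the multiplier operand's top bits
    are all zeros or all ones (8-, 16-, 24-bit values terminate early)."""
    rs &= 0xFFFFFFFF
    for m, low_bits in ((1, 8), (2, 16), (3, 24)):
        upper = rs >> low_bits
        if upper == 0 or upper == (0xFFFFFFFF >> low_bits):
            return m
    return 4


def mul_cycles(rm, rs):
    """Total MUL cycles (1S + mI); rm does not affect the timing."""
    return 1 + mul_internal_cycles(rs)


def cheaper_operand_order(a, b):
    """Return (rm, rs) with the operand that terminates earliest placed in Rs."""
    return (a, b) if mul_internal_cycles(b) <= mul_internal_cycles(a) else (b, a)
```

Multiplying 0x1234 by 0x12345678 costs five cycles with the 32-bit value as the multiplier operand, but three once the operands are swapped.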
He told me he did this to squeeze every bit of performance out of the Game Boy Advance, as a clock cycle saved here and there really adds up when a lot of calculations are being carried out. As for the self-modifying code, I asked Linden to explain it.
“The program comes from [storage], and it transfers a big block of the program into internal RAM for execution because it’s faster. Each RAM access is much, much slower, so I do a DMA [Direct Memory Access] of a big block from ROM into RAM, and then I alter the actual program code. For example, ARM has the ability to shift operands left or right, or it can mask off certain bits as part of the instruction set. The instruction specifies which bits you’re going to mask or how many bits you’re going to shift by. So, I would generate code that would modify what was just about to be executed based on how many bits I needed to shift. Another example is 3D matrix multiplication. There are a whole bunch of multiplications involved there. I would generate the actual instructions that do the multiplications into internal RAM and then execute them, so the code kind of built portions of itself while it was running.”
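The sketch below shows what “modifying the shift amount” means at the bit level. An ARM data-processing instruction such as MOV r0, r1, LSL #3 keeps its shift amount in a 5-bit field (bits 11 to 7), so code already in RAM can be retargeted by rewriting just those bits; a MUL is likewise a single 32-bit word that can be generated on the fly. The helpers are hypothetical and only produce instruction words, stored little-endian as on the Game Boy Advance. One caveat on real hardware: the ARM7TDMI has no cache, but its three-stage pipeline may already have fetched the next couple of instructions, so a patch needs to land a little ahead of the code that will run it.

```python
COND_ALWAYS = 0xE << 28   # condition field "AL": execute unconditionally


def encode_mov_lsl(rd, rm, shift):
    """Encode MOV rd, rm, LSL #shift (data-processing, immediate shift)."""
    if not (0 <= rd < 16 and 0 <= rm < 16 and 0 <= shift < 32):
        raise ValueError("register or shift amount out of range")
    return COND_ALWAYS | (0b1101 << 21) | (rd << 12) | (shift << 7) | rm


def encode_mul(rd, rm, rs):
    """Encode MUL rd, rm, rs (rd = rm * rs)."""
    return COND_ALWAYS | (rd << 16) | (rs << 8) | (0b1001 << 4) | rm


def patch_shift(code, index, shift):
    """Rewrite only the 5-bit shift-amount field (bits 11..7) of one instruction word."""
    code[index] = (code[index] & ~(0x1F << 7) & 0xFFFFFFFF) | ((shift & 0x1F) << 7)


def assemble(words):
    """Pack instruction words little-endian, as they would sit in IWRAM."""
    return b"".join(word.to_bytes(4, "little") for word in words)


# Hypothetical buffer: a shift instruction and a multiply, as if copied into RAM.
code = [encode_mov_lsl(0, 1, 3), encode_mul(2, 0, 3)]
patch_shift(code, 0, 5)   # now MOV r0, r1, LSL #5
machine_code = assemble(code)
```

Patching the first word from a shift of 3 to a shift of 5 changes 0xE1A00181 into 0xE1A00281 without touching the register fields.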
Self-modifying code has its own downsides, especially when it comes to debugging. It also removes the need for branch instructions, where the code would jump to another execution sequence and take precious computation time away from the main thread. Linden also told us that the look-up tables are aligned in ROM so that their addresses are an exact multiple of an eight-bit value shifted left. The look-up tables are immense and wouldn’t fit into RAM, and the alignment also avoids the need for an extra load instruction to get the base address of a table.
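Linden’s “eight-bit value shifted left” lines up with how ARM encodes immediate operands, though that is our reading of his remark rather than something he spelled out: a data-processing instruction can only hold an 8-bit constant rotated right by an even number of bits. An address that fits that pattern can be built with a single MOV, while other addresses usually have to be loaded from a literal pool in memory with an extra LDR. The checker below is a sketch of that encoding rule; the table addresses used with it are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotate_left32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def arm_immediate(value):
    """Return (imm8, rotate) if `value` fits an ARM data-processing immediate, else None.

    The hardware stores an 8-bit constant plus a 4-bit rotate field; the constant
    is rotated right by twice that field, so only even rotations are possible.
    """
    value &= MASK32
    for rotate in range(0, 32, 2):
        imm8 = rotate_left32(value, rotate)   # undo a rotate-right by `rotate`
        if imm8 <= 0xFF:
            return imm8, rotate
    return None
```

A hypothetical table at 0x08100000 (0x81 shifted left by 20) passes, so one MOV produces its base address, and an entry can then be reached with a single ADD using a shifted index; an address like 0x08012345 fails.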
All in all, the final prototype was developed over nearly two years.
The future of Randy Linden’s Quake port
I asked Linden what the future holds for the Quake port, and he told me that he was looking into asking ZeniMax and id Software about releasing the version with official Quake assets. He also told me that he’ll release the source code at some point, but it currently doesn’t build, as it requires an older computer.
I asked Linden why he chose Quake, and he told me that he loved the game and loved the challenge of it being the “impossible project”, coming off the back of his DOOM port for the SNES. He also mentioned that while he doesn’t believe the entire game could have been ported due to space constraints, the vast majority of it could have been, using the same engine.
If you’re interested in checking out Quake for the Game Boy Advance, be sure to try its release on Forest of Illusion.