The 65C816 XL/XE OS revision
The WDC 65C816S, or 65C816 for short, is a microprocessor designed and produced by the Western Design Center. The CPU is a successor to the 6502. Various versions of the 6502 were used in computers such as the Apple II, Atari 400/800/XL/XE, Commodore C-64, C-128 etc.
The difference between the 6502 and the 65C816 is that while the former is a fully 8-bit CPU, the latter, even though it has an 8-bit data path to memory, has an internal structure that allows 16-bit data processing: 16-bit wide registers, a stack of up to 64 kB, and many new addressing modes and instructions (including memory-to-memory moves, unknown to the 6502). And, most importantly, thanks to the 24-bit address bus the CPU lets you use 16 MB of directly addressable memory. It can also run at higher clock frequencies, up to 14 MHz.
All this makes the 65C816 an interesting extension to put into an 8-bit Atari. This page discusses some aspects of such an extension.
The 65C816 has two modes of operation. The first is the 6502 emulation mode. The CPU "wakes up" in it, to keep backward compatibility with systems written for the 6502. The other is the 65C816 native mode, which offers the most important extensions, such as 16-bit registers. Unfortunately, the XL/XE ROM must be modified before you can use the native mode, for the reasons explained below. This necessity made me think that it could be worth the effort to prepare not a patched XL/XE OS, but rather something like a new XL/XE OS release, written in 65C816 code, with some long-awaited bugfixes and extensions which seemed necessary, or at least useful.
I do all the work on the OS on my Atari 65XE equipped with a 65C816, internal SpartaDOS X, 256 kB of XE-compatible RAM, a 2.5-inch 2.1 GB Toshiba MK2103MAV hard drive (attached through the KMK/JŻ IDE interface), and of course the ROM described here. The system is written and assembled with MAE 1.2 by John Harris.
The ROM modification is done to achieve the following things:
II. User's manual
The way you use the new system hardly differs from the way you use the XL OS. The only noticeable difference is the function of the console keys:
1) Default boot drive
Unlike in the XL OS, the default boot drive number is determined before, not after, the PBI initialization. As a result, a hard drive handler can change the boot drive number without resorting to the tricks that are necessary under the XL OS. The KMK/JŻ interface makes use of this.
2) Boot menu
The menu lets you select the boot drive number. The selection is made either with the cursor keys (up/down arrows) or by pressing the key whose letter is shown at the left edge of the screen. After each change the computer tries to communicate with the selected drive and prints brief information about it:
If for some reason you don't want the turbo mode, press the "T" key. This toggles the turbo mode off and on for the preselected drive. The drive selection is confirmed with the RETURN key. Pressing it leaves the menu, and the system initialization continues.
The menu is 100% "E:" device compatible, so it should be visible and working even if an external video device is attached to the computer via the parallel bus interface. I tried to give the menu a classic Atari look, so it is styled after DOS 2.5.
There is also a small but noticeable difference in the boot procedure. If the default drive is not connected, or if the user has selected a nonexistent drive in the boot menu (this is allowed), the system tries to find the first bootable drive by searching through all possible units. This means that when no drive is attached, and no cartridge that would prevent booting is inserted either, the boot process takes a long time: searching all the drives takes about 30 seconds.
4) F1-F4 function keys
The XL OS contains routines to handle the function keys F1-F4; there are no such keys on the console, though. There are some hardware solutions that let you install and use the keys, but they usually don't take into account the fact that the routine handling F1-F4 conflicts with 130XE-style memory expansions. In short, pressing some F1-F4 key combinations causes memory banks to be switched in an uncontrolled way, and sooner or later this usually results in a system crash.
The reason is that the 1200XL, 1400XL and 1450XLD computers have console LEDs, which signal certain OS states you can select with F1-F4. These LEDs are controlled by bits of the PORTB register, and the same register controls the memory banking in the 130XE computer (and compatibles). The keyboard interrupt routine, when it detects certain F1-F4 key combos, tries to set up the LEDs accordingly, not realizing that they do not exist.
This has been fixed in the new system: using the F1-F4 function keys is safe and doesn't cause any side effects on computers with memory expansions. On the other hand, on computers without RAM expansions everything works traditionally, i.e. if there is no 130XE-compatible memory, the routine assumes that PORTB controls the LEDs.
III. Interrupt considerations
The native interrupt services pose a good deal of problems. Because of the lack of free ROM space, we basically have to arrange things so that the same interrupt routines serve both the native and the emulation mode.
We presume that:
On the other hand, the interrupt overhead is reduced in the emulation mode thanks to new instructions and addressing modes. For example, the NMI service routine, which takes 32 cycles on the 6502, is shortened to 26 cycles on the 65C816. Similarly, the EXITVBL routine is reduced from 23 to 19 cycles, and other savings take place inside the actual SYSVBL. All this does not make up for the increased interrupt overhead in the native mode, but it does reduce it.
The DLI is somewhat of a special case, because this interrupt is extremely time-critical. According to some sources, there are only 104 CPU cycles per ANTIC scanline, and thus an overhead of 94 cycles (the "save everything" option) is not acceptable.
Because of that, the DLI service routine is entered with the CPU state mostly undetermined. The PBR and DBR are set to 0, and the accumulator size is 8 bits. Nothing else is known or saved. The routine must end with RTI; the OS will restore the DBR for you.
Notice: the accumulator is not saved; you have to save it before use (with PHA) and restore it before RTI. Also note that if you want to use the full 16-bit accumulator, your code won't work in emulation mode.
Notice: the X/Y registers are not saved. Moreover, their size is unknown. Remember that if you're going to use them, you must first save their values with PHX and PHY respectively, and then set the required size. Consider the following examples:
; Example 1
; Use of the X register, size 8-bits.
; Note that you have to save them BOTH, even if
; you use ONLY ONE of them as 8-bit size register!!!
        rep #$10        ;set both registers to 16 bit
        phx             ;you must save them BOTH before switching to 8-bit
        phy
        sep #$10        ;switch to 8-bits
        ... your code here ...
        rep #$10        ;switch back to full size
        ply
        plx
        rti
The reason for switching X/Y to 16 bits before setting them to 8 bits is that the register size is not determined when the DLI service routine is entered (see above). Omitting the REP instructions would save 10 cycles if the registers are already set to 8-bit size (2*3 cycles for the two REPs, and 4*1 cycles for the pushes and pulls); but if they're NOT 8-bit, the PLY/PLX instructions at the end will pull the wrong number of bytes from the stack, causing your program to crash.
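The stack imbalance described above can be counted with a small C model. This is only a sketch of the byte arithmetic, not code that runs on the Atari; the function names are made up for illustration. It assumes the shortened sequence PHX, PHY, SEP #$10, ..., PLY, PLX, with both REPs left out.

```c
/* Bytes moved by one PHX/PHY or PLX/PLY for a given index register width. */
int index_bytes(int x_is_16bit)
{
    return x_is_16bit ? 2 : 1;
}

/* Stack bytes left behind by PHX PHY SEP #$10 ... PLY PLX (no REPs),
   depending on the index width the DLI routine happened to be entered with. */
int leftover_without_rep(int entry_x_is_16bit)
{
    int pushed = 2 * index_bytes(entry_x_is_16bit); /* PHX + PHY at entry width */
    int pulled = 2 * index_bytes(0);                /* SEP #$10 forced 8 bits   */
    return pushed - pulled;
}
```

With 8-bit registers at entry the result is 0 and the shortcut is harmless; with 16-bit registers two bytes stay on the stack, so the final RTI picks up the wrong status and return address.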
; Example 2
; Use of the X register only, size 16-bits.
; In this case you only save the registers
; which you use. Notice that this code won't
; work in emulation mode.
        rep #$10
        phx
        ... your code here ...
        plx
        rti
Similarly, if you want to use zero page registers, you will most probably first have to save and reset the D register. This is done as follows:
; Example 3
        phd             ;save the D register
        pea $0000       ;reset it
        pld
        ... your code here ...
        pld
        rti

; Example 4
; This gains 5 cycles, if you use 16-bit accumulator
; for other purposes too.
        rep #$20        ;16-bit accumulator
        pha
        phd
        lda #$0000
        tcd
        ... your code here ...
        pld
        pla
        rti
However, saving, resetting and restoring the D register is avoidable if:
That last approach is probably the most profitable, as long as we don't have multitasking.
In the native mode, the VBL routine is entered with D=0, DBR=0 and PBR=0. All registers are saved except for the MSB of the accumulator: if you're going to use the 16-bit accumulator or the XBA instruction in your VBL routine, you must first push the entire 16-bit accumulator onto the stack and pull it before exit.
The top of the stack looks identical in both native and emulation modes: the 8-bit Y register is the topmost, then below come the 8-bit X register and the 8-bit accumulator, the P register, and the 16-bit return address.
However, in the native mode there is additional data further below, such as the 24-bit return address, additional register values etc. All of this is pulled *AFTER* the EXITVBL routine is executed.
In the native mode, the IRQ routine is entered with D=0, DBR=0 and PBR=0. All registers are saved except for the MSB of the accumulator: if you're going to use the 16-bit accumulator or the XBA instruction in your IRQ service routine, you must first push the entire 16-bit accumulator onto the stack and pull it before exit.
The top of the stack looks identical in both native and emulation modes: the P register is the topmost, then below comes the 16-bit return address.
However, in the native mode there is additional data further below, such as the 24-bit return address, additional register values etc. All of this is pulled *AFTER* the RTI is executed.
Exact interrupt overhead
Exact interrupt overhead, in cycles, in comparison to the XL OS. The NMI overhead is calculated from the NMI signal to the moment when the first user instruction (pointed to by DLIV and VVBLKI) is fetched by the CPU, and from RTI completion to the actual return from the interrupt.
The IRQ overhead is calculated from the IRQ signal to the moment when the interrupt source recognition routine (SINRDYI) is reached, and from RTI completion to the actual return from the interrupt.
Note that in native mode the interrupt and the RTI each take 1 cycle longer, and this time is already included here.
Also note that both the NMI and IRQ interrupts have their own master vectors in RAM, which are jumped through first, before the system routines are executed. This adds 6 cycles to the overhead, and this time is already taken into account here (if such vectors existed in the XL OS, its interrupt overheads would be 23, 42 and 19 cycles for DLI, VBL and IRQ respectively).
IV. New system call mechanism
The old system call mechanism of the XL OS, based on a jump table and JSR calls, has a limitation that makes it problematic for new 65C816 programs, whose code may reside beyond the 64k boundary.
Namely, the JSR instruction has a 16-bit argument, and thus cannot cross the 64k bank boundary. As a result, when you make a JSR $E459 call while your program is executing in bank 0 (and on the 6502 this is always the case), the call will reach its destination, i.e. $00E459, where the actual ROM procedure resides. However, doing the same in bank 1 makes the call go to $01E459, which is not where it should end up.
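The bank problem can be shown with a few lines of C. This is just a model of how the CPU forms the target address of a 16-bit JSR; the helper name `jsr_target` is made up for illustration.

```c
#include <stdint.h>

/* A 16-bit JSR never changes the program bank: the bank byte comes from
   the program bank register (PBR), and only the low 16 bits come from
   the instruction operand. */
uint32_t jsr_target(uint8_t pbr, uint16_t operand)
{
    return ((uint32_t)pbr << 16) | operand;
}
```

Here `jsr_target(0x00, 0xE459)` gives `0x00E459`, the real ROM entry, while `jsr_target(0x01, 0xE459)` gives `0x01E459`, which is somewhere in extended RAM.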
On the other hand, the 65C816 offers the JSL instruction (jump to subroutine long), which accepts a 24-bit address as its argument and stores a 24-bit return address on the stack. However, the ROM routines expect a 16-bit return address on the stack, and we have to keep this behaviour to maintain compatibility with older programs. Thus, the JSL instruction cannot be used to call the system.
Providing an alternative jump table (for long calls) would solve the problem, but first of all it would be a great waste of ROM space. Not to mention that the jump table method is not very flexible: you can never change vectors that are in ROM, so you can't patch the OS in software when necessary.
The new calling method
The new calling method is based on the COP instruction handler. COP generates a non-maskable software interrupt, quite similar to the TRAP instruction on the Motorola 68k. COP accepts a constant (immediate) argument. Arguments from $80 to $FF are reserved by WDC for future use, possibly for new instructions. The rest, i.e. the range from $00 to $7F, is available for our own definitions.
The new OS vectors in RAM
The COP handler residing in ROM defines five new vectors in the memory. Most of them are 24-bit long. These are:
VCOPE $000251-$000252 WORD
Two COP interrupt vectors, for the emulation and native mode respectively. The first is not used by the OS for now; it simply points to an RTI instruction. The reason is that in emulation mode you do not really need the new calling mechanism: you may happily use the old one, as you cannot run code outside the first 64k anyway.
The other vector, VCOPN, points to the system handler. If you change this vector, you must end your code with a JMP to the old location, or RTI if you bypass the ROM completely.
VCOP0 $000262-$000264 LONG
These are secondary vectors jumped through by the system handler (the one pointed to by VCOPN). The code pointed to by them is called with a JSL instruction and must be ended with RTL (or a JMP to the old location).
The first vector is called, when a COP #$00 is executed. This instruction is reserved for the usage of the operating system, details below.
The second vector is called when any COP instruction with an argument range $01-$7F is executed. The third vector is called when the reserved COP instructions are executed (i.e. argument range $80-$FF).
The long vector at $000018 points to the argument of the instruction that caused the call (i.e. if COP #$00 was executed, the vector points to the "#$00" part of the instruction).
All the secondary vectors are called in native mode, with D=0, PBR=0, DBR=0, and with 16-bit register sizes. The contents of the CPU registers are unmodified and should be the same as when the COP interrupt occurred (except for VCOP0, where the registers reflect the content returned by the OS). The handler residing at the primary vector (VCOPN) is responsible, upon exit, for restoring the CPU context to the state it had prior to the call.
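The way the system handler picks a secondary vector from the COP argument can be summarised in C. This is a sketch of the rule as described above, not the ROM code; the enum names are placeholders, since only VCOP0 is named here.

```c
#include <stdint.h>

/* Which secondary vector the system COP handler jumps through.
   The identifiers are hypothetical labels for the three vectors. */
enum cop_vector { COP_VEC_ZERO, COP_VEC_USER, COP_VEC_RESERVED };

enum cop_vector cop_dispatch(uint8_t arg)
{
    if (arg == 0x00)
        return COP_VEC_ZERO;      /* COP #$00: reserved for the OS      */
    if (arg <= 0x7F)
        return COP_VEC_USER;      /* COP #$01-#$7F: freely definable    */
    return COP_VEC_RESERVED;      /* COP #$80-#$FF: reserved by the WDC */
}
```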
Special meaning of the COP #$00 instruction
The COP #$00 instruction has been defined as a system emulation interrupt. The ROM handler residing behind it accepts a 16-bit address pointing to a location within the bank 0 (first 64k), calls it with a JSR instruction, and returns the results of the call to you. Upon exit, the registers contain values returned by the OS. The P register bits returned by OS are preserved except for bits M and X, which are restored to the state prior to the call.
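The treatment of the P register on return can be pictured as a simple bit merge. The sketch below is my reading of the rule, written in C for clarity; the function name is invented. The mask values $20 (M) and $10 (X) are the same ones used with REP/SEP in the examples on this page.

```c
#include <stdint.h>

#define P_FLAG_M 0x20u  /* accumulator/memory width select */
#define P_FLAG_X 0x10u  /* index register width select     */

/* Flags seen by the caller after COP #$00: everything the OS returned,
   except M and X, which come back as they were before the call. */
uint8_t cop0_exit_flags(uint8_t p_before_call, uint8_t p_from_os)
{
    const unsigned width_bits = P_FLAG_M | P_FLAG_X;
    return (uint8_t)((p_from_os & ~width_bits) | (p_before_call & width_bits));
}
```

For example, a caller running with 16-bit registers (M=0, X=0) that gets carry set from the OS still sees 16-bit registers, with carry set.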
Here's an example of a system call using new calling method:
        sep #$30        ;set 8-bit registers
        ldx #$10        ;select IOCB #1
        lda #$0c        ;command: CLOSE
        sta >iccmd,x    ;store
        pea jciomain    ;push the OS function address
        cop #$00        ;call the OS
        pla             ;pop the argument off the stack
        pla
Notice that this is the only safe method when calling the system in native mode.
If you use the traditional method, you ought to call the system in emulation mode *only*. Even if the OS itself should work perfectly in both native and emulation modes, I take no responsibility for externally loaded device drivers (especially DOS-es), which can contain code that is not compatible with the native mode.
V. New SIO functions
The availability of most functions is indicated in the @:SYSDEF file. Relying on the ROM version number is not a good idea; everything said below about version numbers is for information only.
As of ROM version BB 02.08 the SIO is able to read data (e.g. disk sectors) directly into high memory past the first 64k, and to write data from that area. The DCB has been extended to 16 bytes; the most significant byte of the 24-bit address is located at $00030E.
However, for backward compatibility with old software, and, first of all, with existing parallel bus devices, setting a 24-bit address in the extended DCB won't have any effect. This is so because a device designed to do I/O within 16-bit address space may not understand 24-bit addresses, and thus the data would be stored to (or fetched from) improper memory location. There must exist an additional mechanism that prevents this.
For this purpose, three new virtual devices have been defined inside SIO; the codes are as follows:
Calling SIO with such a code in DDEVIC tells the SIO handler to take the full 24-bit address from the DCB, with the uppermost byte of this address at $00030E (otherwise the SIO assumes that the uppermost byte of the address is 0). Apart from that, the virtual devices operate identically to the traditional $31, $40 and $50, and the virtual device code is translated to the real one before the command is sent out via the serial port.
When accepting the command, the SIO does not verify in any way that the extended memory exists. It is assumed that the program already knows that the memory exists before it requests a read/write operation on it; it can find that out e.g. with the memory management functions.
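The address selection can be sketched in C as follows. This is a model only: the function name and the `is_virtual_device` flag are made up (the flag stands for "DDEVIC holds one of the virtual device codes"), and the low and high address bytes are assumed to sit in the usual DCB locations DBUFLO/DBUFHI at $0304/$0305.

```c
#include <stdint.h>

/* Buffer address the SIO would use for a transfer.
   dbuflo/dbufhi: the classic DCB buffer address bytes.
   bank:          the new byte at $00030E.
   The bank byte is honoured only for the virtual device codes. */
uint32_t sio_buffer_address(uint8_t dbuflo, uint8_t dbufhi, uint8_t bank,
                            int is_virtual_device)
{
    uint32_t addr = ((uint32_t)dbufhi << 8) | dbuflo;
    if (is_virtual_device)
        addr |= (uint32_t)bank << 16;
    return addr;    /* for traditional codes the bank is taken as 0 */
}
```

So with $4000 in the buffer address and $02 at $00030E, a traditional device code transfers to $004000, and a virtual one to $024000.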
VI. New CIO functions
As of version BB 02.03 the CIO accepts 0 (zero) as a device's unit number; for example, a name such as "D0:" will be decoded as "device D:, unit 0". In previous versions of the system such a number was silently changed to unit 1. Note that this extension is backported from the system developed at Atari Inc. for the 1450XLD computer.
As of version BB 02.05 the ROM-contained CIO interface features a new function: it can now automatically search the IOCB channels for a free (closed) one. A call like this:
        ldx #-1
        jsr jciomain
will return the status in the Y register (1 - success, or a negative error code otherwise), and the channel number multiplied by 16 in the X register.
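Decoding the result of such a search is a matter of one shift. The C sketch below models it; the function name is invented, and the two parameters stand for the Y and X register values after the call.

```c
#include <stdint.h>

/* Returns the free IOCB number, or -1 if Y holds a negative (error) status.
   X carries the channel number multiplied by 16, so the channel number is
   X shifted right by 4. */
int iocb_from_search(uint8_t y_status, uint8_t x_reg)
{
    if (y_status & 0x80)    /* bit 7 set means a negative status */
        return -1;
    return x_reg >> 4;
}
```

For instance X=$30 with Y=1 means that IOCB #3 is free.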
As of version BB 02.08 the ROM-contained "K:" handler features four XIO functions as follows:
As of version BB 02.08 the ROM-contained "K:" handler features a new "scanning" mode of operation. In this mode, when you call the GET function of the keyboard and no data is available (= the user did not press any key), the function does not block and busy-wait for data, but returns immediately instead.
The function gets activated for a particular stream by opening it for keyboard with bit 0 of ICAX2 set to 1, for example:
After that, any GET #1 returns status 1 in Y register and the ASCII code of the key only if a key was pressed. If no key was pressed, the function returns status code -78 (178, $B2), and the accumulator value is meaningless.
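The three notations of the "no key" status (-78, 178, $B2) are the same byte read as signed or unsigned. A short C sketch, with made-up helper names, shows the conversion and the check a program would do after a GET in scanning mode:

```c
#include <stdint.h>

/* Interpret the Y register as a signed CIO status. */
int cio_status_signed(uint8_t y)
{
    return y >= 0x80 ? (int)y - 256 : (int)y;
}

/* Non-zero when the GET delivered a key code in the accumulator. */
int key_was_read(uint8_t y)
{
    return cio_status_signed(y) == 1;
}
```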
Enabling this mode for a "K:" channel does not affect other channels and does not affect the console device ("E:") either.
Unfortunately, this mode cannot be easily used with BASIC, because the status -78 is considered an error by the BASIC interpreter, and it aborts program execution.
VII. The "N:" device
As of version 1.93 the ROM contains a handler for a new device - "N:" (null). It works as follows:
VIII. The "@:" device
"@:" is a pseudo-disk which contains only one file named "SYSDEF". This file's length is 128 bytes. The file contains some information to be used by programs which would like to know e.g. whether it is safe to switch into the 65C816 native mode.
The contents of "@:SYSDEF" are as follows:
If there's no such device or no such file (error 138, 170 or similar while opening "@:SYSDEF"), bytes 10-127 shall be assumed to be zero.
The remaining bytes are for now reserved and are read as zeros.
You can set the read position inside this file using XIO 37, which works similarly to the one in SpartaDOS X (this function is mandatory for the "@:" device). Setting the position beyond the EOF causes errors at subsequent reads.
IX. Memory management
As of version 02.04 the system defines four memory management functions, which make up a subsystem called "kmem" for short, standing for "kernel memory manager". The first one, kpsize, is explained below. The purpose of the second one, kmalloc, is to allocate memory blocks, and that of the third one, kfree, is to release them.
If it was C, the function prototypes would look as follows:
unsigned int kpsize(void);
int kmalloc(int size, unsigned int mode);
int kfree(unsigned int page);
The smallest memory unit known to these functions is a page. The size of this page, in bytes, is returned by the kpsize() function. The current implementation returns 256 here, which is quite a small amount compared to page sizes on larger systems; for example m68k systems use 8192 or even 16384 bytes per page.
Thus, the kmalloc()'s size argument is not a number of bytes to allocate, but a number of pages. The returned value is not the address of the allocated block either, but rather the number of the page boundary where the block starts. Similarly, such a page number must be given to the kfree() function as its page argument. In the current implementation, where a page consists of 256 bytes, the page number is equal to the two highest bytes of the full 24-bit address of the allocated memory block.
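The page arithmetic can be written down in C. This is a sketch under the assumption that kpsize() returns 256, as the current implementation does; the helper names are made up and are not part of the OS.

```c
#include <stdint.h>

#define KMEM_PAGE_SIZE 256u  /* what kpsize() currently returns */

/* Full 24-bit address of the first byte of a page. */
uint32_t kmem_page_to_address(uint16_t page)
{
    return (uint32_t)page * KMEM_PAGE_SIZE;
}

/* Number of pages to request from kmalloc() for a given byte count,
   rounded up to whole pages. */
uint16_t kmem_pages_for_bytes(uint32_t bytes)
{
    return (uint16_t)((bytes + KMEM_PAGE_SIZE - 1) / KMEM_PAGE_SIZE);
}
```

A returned page number of $0100 thus corresponds to address $010000, and a request for 257 bytes needs 2 pages, of which 255 bytes stay unused.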
If we pass -1 (or $FFFF) to the kmalloc() as its size parameter, it won't allocate anything, but it will calculate and return the actual number of free pages instead.
Such memory management costs some memory per block (up to 255 bytes lost per allocated block), but its main advantage is that the address fits in 16 bits; this in turn makes passing parameters and receiving results easier, not to mention the functions' internal operation; it also simplifies and shortens the global memory map.
Speaking of the map, it currently contains 64 slots (or 42 slots in older versions of the OS). This means that the system can dynamically manage 64 independent blocks of memory, of any number of pages each. If a program tried to allocate 32 kB (128 pages) by calling kmalloc() 128 times, it would succeed 64 times, but the 65th call would fail with an "Out of memory" error, despite the fact that only 16 kB had been allocated! Thus kmalloc() has to be used sparingly: in this example the best approach would be to allocate the entire 32k at once, that is, with a single kmalloc() call.
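The slot limit is easy to reproduce with a toy model in C. This is not the real kmalloc(): it only counts map slots and pages, ignores the amount of free memory, and all names in it are invented for the illustration.

```c
#define KMEM_SLOTS 64   /* size of the memory map, as given above */

struct kmem_model {
    unsigned used_slots;
    unsigned long pages_allocated;
};

/* One allocation takes one map slot, whatever its size in pages.
   Returns 1 on success, 0 when the map is full. */
int model_kmalloc(struct kmem_model *m, unsigned pages)
{
    if (m->used_slots >= KMEM_SLOTS)
        return 0;
    m->used_slots++;
    m->pages_allocated += pages;
    return 1;
}

/* Calls model_kmalloc() `calls` times; returns the number of successes. */
unsigned model_run(struct kmem_model *m, unsigned calls, unsigned pages_each)
{
    unsigned ok = 0;
    for (unsigned i = 0; i < calls; i++)
        ok += (unsigned)model_kmalloc(m, pages_each);
    return ok;
}
```

Running 128 one-page requests on an empty model succeeds 64 times and leaves 64 pages (16 kB) allocated, whereas a single 128-page request takes one slot and yields the full 32 kB.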
The mode argument of the kmalloc() modifies its operation according to current requirements. It is a 16 bit word, each bit being a flag switching internal kmalloc() functions on and off. Currently the following bits are defined:
The remaining bits are reserved and should be kept zeros for upward compatibility.
The kfree() function accepts the kmalloc()-returned page number as an argument. If there's an allocated and non-resident memory block, it will be deallocated.
The functions explained above are called with COP #$01 instruction, passing arguments on the stack:
        rep #$30
        pea $0000       ;mode
        pea $0100       ;size (256 pages = 64k)
        pea $0001       ;kmalloc() function code
        cop #$01
        plx             ;remove arguments
        plx
        plx
The function returns a status (error code) in the Y register. If it is positive, the accumulator contains the number of the first page of the allocated memory block (the page number, i.e. the two higher bytes of its address). If Y contains a negative value, an error has occurred and the accumulator contents are meaningless.
Other functions are used similarly, except that kpsize() doesn't expect any arguments. The function codes are as follows:
The remaining function codes for COP #$01 are reserved for now. Calling them returns error -110 (in older OS revisions $0003 was assigned to a function which was removed as of version 2.12).
CAUTION: All kmem functions work (1) only in the native mode, (2) only in the memory above the address 65535 and (3) only when this memory does exist. Emulation mode calls have no effect (the COP handler does exist, but consists of loading an error code to Y, and RTI). When no additional memory exists, all functions return negative error code in the Y register.
X. Memory map changes and enhancements
XI. Other changes
As of version 2.10 the CHARSET 2 has been modified, and the characters assigned to ATASCII codes 125, 126 and 127 changed their shapes. These are console control characters, CLR/HOME, DEL and TAB respectively.
The new shapes are:
The characters retain their functions as control codes, and still cannot be displayed except in an escape sequence (directly after the ESC character, ATASCII 27).
Unfortunately, the ATASCII codes don't match the ASCII codes of these characters (left brace - 123, right brace - 125, tilde - 126). The reason is that ATASCII 123 in CHARSET 2 is already assigned to an international character, namely the German "A umlaut". To properly transfer a text file containing these characters to a PC, a conversion must be done (and it has to be done anyway because of the different EOL character codes in ASCII and ATASCII).
This OS will only work with XL/XE hardware; it is NOT compatible with the Atari 400/800 series (it won't even start on such a machine).
From the user's point of view there are the following changes:
At the same time, such things as the international character set (CHARSET2) and the routines that handle the 1090XL module have been preserved, for I consider that they may be useful in the (near) future.
From a program's point of view, 100% compatibility is kept for those (and only those) programs which use legal system calls via the jump table or vector tables. Programs which use ROM locations directly won't work correctly (or won't work at all).
To get such a program to work you have to patch it so that it uses legal calls, thus making it generally more compatible with the various Atari ROMs around.
In the XL/XE ROM there were some locations kept only for compatibility with old 400/800 programs which use illegal system calls. I think this is a good occasion to make a clean cut: all such stuff has been removed. If a program wants to run equally well on the 400/800 OS, the XL/XE OS and this 65C816 ROM, it must use legal system calls only.
A compatibility list is available here.