Inside - INTRODUCTION

Section 1

FOREWORD

Around the middle of February, Frank Hogg asked me to do a "little something" on Level 2 OS-9 for the CoCo-3. This is the result, a compilation of old and new notes I and others had made for ourselves.

Organizing anything about OS-9 is tough, since each part of it interacts closely with the rest. In the end, I decided to simply present information as a series of essays and tables. Some of these are ones that I had made for Level 1, but apply equally well to Level 2. Maybe in a half year or so we'll come out with a second edition, but we really wanted to help people out NOW.

To me, at least, it is very like being blind not knowing exactly what occurs during the execution of a program that I have written. For that reason, I have taken a look at OS-9 on the CoCo from the inside out.

The idea is that if you can figure out what's happening on the inside, you have a better chance of knowing what to do from the user level. In essence, this whole collection is a reference work for myself and my friends out there like you.

Level 2 wasn't out yet at the beginning of this writing, and I had not seen the Tandy manual until the end, so please bear with me if things have changed somewhat.

In general, I will not duplicate explanations provided by the Tandy manuals, Microware manuals or the Rainbow Guide. Instead, my intention is to enhance them. You should get them, too. Dale Puckett and Peter Dibble are working now on a book about windows for the user. I will be doing more on drivers soon.

This reference work is the result of many hours of studying and probing by myself and others. Hopefully, it will save you at least some of the time and trouble that we have had. Since this is meant as part tutorial, part quick reference, some tables may occur more than once as I felt necessary.

Special thanks are due to Frank Hogg, for publishing this and for being "patient" with delays. I also owe a lot to the many people on CompuServe's OS-9 Forum, who keep asking the right questions.

Thanks also to Pete Lyall for letting me use his excerpts on login, Kent Meyers for much help on internals, and to Chris Babcock for delving into the fonts for us.

And, of course, none of this would have been done without the support and love of my dearest friend and sweetheart, Marsha. Thank you, Sweet Thang!

I hope it helps. Best wishes, and Have Fun. Kevin K Darling - 30 March 1987

OVERVIEW OF OS-9

The following is all of OS-9 in one spot:

UNIVERSAL SYSTEM TABLES
Direct page vars	table pointers, interrupt vectors
Memory bitmaps	maps of free/in-use memory
Service dispatch tables	vectors for SWI2 system calls
Module directory	pointers to in-memory modules
Device table	info on used devices (/D0, /P etc)
IRQ polling table	vectors interrupts to drivers

PROCESS INFORMATION
Process descriptors	process specific information
Path descriptors	I/O open file information
Driver static storage	device driver constant memory

PROGRAM MODULES
User programs	your program
Kernel	handles in-memory processing
Ioman	controls I/O resources
File Mgr	file handling and editing
Drivers	data storage and transfer
Device descriptors	device characteristics

SIMPLE SYSTEM MEMORY MAP

00000-01FFF    System Variables
02000-7DFFF    Free memory, bootfile, video memory
7E000-7FEFF    Kernel
7FF00-7FFFF    I/O and GIME

THE MAIN PLAYERS

Modules              Responsibilities
-----------------------------------------------------------------
REL, BOOT            Reset hardware and boot

OS9P1                Initialization of system
OS9P2                Handling of most SWI2 service calls (except I/O)
                     Memory management and process control
                     Module directory upkeep, module searching
                     Allocation of process descriptors

IOMAN                Handling I/O related SWI2 service calls
                     Allocation of path descriptors
                     IRQ polling table entries
                     Device IRQ polling
                     Device table entries for desc, driver, file manager
                     Queuing processes trying to use same path desc
                     Allocation of driver static memory
                     Copying device desc init table to path desc
                     Calling file manager for I/O calls

RBFMAN               Allocation of data buffers
SCFMAN               File & directory allocation and management
PIPEMAN              Edit, seek, read, write of file
                     Queuing processes trying to use same device

CC3DISK              Allocation of verify buffer
CC3IO                Read / write of data buffers from / to device
PRINTER              Device interrupt handling
RS232                Device status / error monitoring
------------------------------------------------------------------

REL                  Resets hardware, calls OS9p1
INIT                 Data module containing system constants
BOOT                 Load OS9Boot if initial dir's, paths fail
CC3GO                CHX CMDS, Startup, Autoex, Shell
CLOCK                System timekeeping, VIRQ's, Alarm calls

------------------------------------------------------------------

Process Descriptors  Info on each process
Path Descriptors     Info local to each I/O path
Device Table         Device memory, desc, filmgr, driver
Polling Table        Device status address, driver IRQ vector
Module Directory     Address, user count of program modules

Section 2

MULTI-TASKING PRINCIPLES

The power of the 6809's addressing modes enables the m/l (machine language) programmer to easily write code that will execute at any memory address. Furthermore, if the code is written to access program variables by offsets to the index registers, more than one user can execute that code as long as he has his own data area.

The point of all this is that the 6809 made it easy for Microware to write an operating system that can load a program anywhere there is enough contiguous memory, assign the user a data space, and through SWI2 (trap) calls, access system I/O and memory resources.

Now, since we know that we can be processing code and sharing the 64K memory space with other programs, we can allow more than one program / user at more or less the same time by switching between the processes fast enough to appear to each user that he has his own computer.

How often is fast? In some other multi-tasking systems, each process is responsible for signaling to the operating system kernel that it is ready to give up some of its CPU time. The advantage of this method is that time-critical code isn't interrupted. (OS-9 users can simply shut off interrupts if this is necessary.) But this method depends on the user to write the switching signal into his code so that it is hit often enough to give other processes a chance to run.

In OS-9, there is always a system clock that interrupts the 6809 about 10 times a second, and causes the next process to be given a CPU time slice (Actually 60 times/second on the CoCo, but a process time slice is considered to be 6 'ticks', or 1/10th second). Other interrupts from any I/O devices needing service cause the system to execute the interrupt service routine in the driver for that device, and quickly resume the original process.

Switching between processes is the easy part. Each process has a process descriptor, holding information about it. When the 6809 is interrupted, the current address it is at in the program, and the CPU's registers are saved on the system stack in the process's data area. The stack pointer's value is saved in the current program's process descriptor for later retrieval.

The Kernel then determines who gets the next time slice according to age and priority. The stack pointer of the new main process is loaded from its process descriptor, and since the stack pointer is now pointing to a 'snapshot' of its process's registers, a RTI instruction will cause the program to continue as if nothing had ever stopped it.
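
A rough Python sketch of the selection step, for illustration only. The aging rule used here (every passed-over process gains one age point, and the chosen one restarts at its priority) is an assumption made for the sketch, not a statement of the Kernel's exact algorithm. Proc and run_slices are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    priority: int
    age: int  # starts equal to priority in this sketch (assumption)


def run_slices(active, count):
    """Return the process IDs chosen for each of `count` time slices."""
    order = []
    for _ in range(count):
        # Highest age wins; on a tie the earlier list entry is kept.
        chosen = max(active, key=lambda p: p.age)
        for p in active:
            if p is not chosen:
                p.age += 1          # passed over, so it grows older
        chosen.age = chosen.priority  # assumed reset after running
        order.append(chosen.pid)
    return order
```

With priorities 3 and 1, the lower-priority process still gets a slice once its age overtakes the other's.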

So, in essence, each process thinks that it is alone in the machine with its own program and data area limits defined, although if needed, it can find limited info on the others. Besides device interrupts and normal task-switching, two other events may have an effect on a program's running without its knowing about it: I/O queuing and untrapped signals.

MULTI-TASKING PRINCIPLES: PROCESS QUEUES

These are just what they sound like - an ordered arrangement of programs. They are kept in a linked list, that is, each has a pointer to the next in line. When a process changes queues, the process descriptor itself isn't moved, just the pointers are.

A process is always in one of three major queues (except for the current process):

Active - Normal running; gets its turn in varying amounts of the total processor time according to its age, priority, and state.
Sleeping - A program has put itself to Sleep for a specified tick count, or until it gets a signal. (As in waiting for its I/O turn)
Waiting - Special Sleep state that terminates on a signal or child's death / F$Exit. Entered via F$Wait.
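
The "only the pointers move" idea can be sketched in Python as below. This is a simplified model, not the Kernel's data layout: ProcDesc, Queue and move are hypothetical names, and a single `next` link stands in for the descriptor's queue pointer.

```python
class ProcDesc:
    def __init__(self, pid):
        self.pid = pid
        self.next = None  # link to the next descriptor in the same queue


class Queue:
    def __init__(self):
        self.head = None

    def append(self, proc):
        proc.next = None
        if self.head is None:
            self.head = proc
            return
        node = self.head
        while node.next is not None:
            node = node.next
        node.next = proc

    def remove(self, proc):
        if self.head is proc:
            self.head = proc.next
        else:
            node = self.head
            while node is not None and node.next is not proc:
                node = node.next
            if node is None:
                raise ValueError("process not in this queue")
            node.next = proc.next
        proc.next = None

    def pids(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.pid)
            node = node.next
        return out


def move(proc, src, dst):
    """Change queues by relinking; the descriptor object itself stays put."""
    src.remove(proc)
    dst.append(proc)
```

For example, moving process 2 from an Active queue holding 1, 2, 3 to an empty Sleeping queue leaves 1, 3 in the first and 2 in the second, with the same descriptor object in use throughout.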

MULTI-TASKING PRINCIPLES: STATES

The P$State byte in a process's descriptor has different bits set depending on what the program is doing, where it is currently executing, and what external occurrences have affected it.

A process has one or more of these state attributes:

SysState (%1000 0000) Is using system resources, or is being started/aborted by the kernel.
TimSleep (%0100 0000) Asleep: awaiting signal, sleep over.
TimOut (%0010 0000) Has used up its time slice. This is a temporary flag used by the kernel.
Suspend (%0000 1000) Continues to age in active queue, but is passed over for execution. Used in place of Sleep and Signal calls in some Level 2 drivers.
Condem (%0000 0010) Has received a deadly signal; dies by a forced F$Exit call as soon as it is no longer in a system state.
Dead (%0000 0001) Is already unexecutable, as its data and program areas have been relinquished by an F$Exit call. The process descriptor is kept so that the death signal code may be passed to the parent on F$Wait.
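
As a quick illustration, the bit values listed above can be decoded like this in Python. The masks come from the list; the label strings are only for display (the condemned bit is labelled Condem here), and describe_state is a hypothetical helper.

```python
STATE_BITS = {
    "SysState": 0b1000_0000,
    "TimSleep": 0b0100_0000,
    "TimOut":   0b0010_0000,
    "Suspend":  0b0000_1000,
    "Condem":   0b0000_0010,
    "Dead":     0b0000_0001,
}


def describe_state(p_state):
    """List the names of all state bits set in a P$State byte."""
    return [name for name, mask in STATE_BITS.items() if p_state & mask]
```

A value of $82, for instance, decodes as a process that is in the System State and has also been condemned.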

The System State is a privileged mode, as the kernel doesn't make the process give up the next time slice, but instead lets it run continuously until it leaves the System state.

The reason for this is that the process is servicing an interrupt, changing the amount of free memory, or doing I/O to a device, and thus should be allowed to run until it is safe to change programs, or it has released the device for other use.

It is because of the System State that interrupts are allowed almost always. Any driver interrupt code acts as an "outside" program that temporarily takes over the CPU, but the current process is not changed and will continue when the driver is finished taking care of the interrupt source.

MULTI-TASKING PRINCIPLES: I/O

If two or more processes want to do input/output/status operations on the same device, all except the first will have to wait in line (queue). Under OS-9, IOMan and the file managers are responsible for this control.

Each open path has a path descriptor associated with it. This is a 64-byte packet of information about the file. Because OS-9 allows a path that has been opened to a file or device to be duplicated, and used by another process, several programs may be talking about the same path (and path descriptor). Provision must be made to queue an I/O attempt using the same path. (The most common instance of this is with /TERM.)

All I/O calls pass through the system module IOMAN, the I/O manager, which checks a path descriptor variable called PD.CPR to see if it is clear (not in use). If it is in use, the process is inserted in a queue to await its turn.

Here the process descriptor plays a part. Two of its pointers are used here: P$IOQP (previous link) and P$IOQN (next link). P$IOQP is set to the ID of the process just ahead of this one, and the P$IOQN of the process ahead in line is set to this one's ID, forming a chain (linked list) of process ID pointers waiting to use this particular device.
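
A small Python model of that chain, as a sketch only. The two fields mirror P$IOQP and P$IOQN, with 0 assumed to mean "no link"; Proc, the procs dictionary and enqueue_behind are hypothetical names, not IOMAN's actual routine.

```python
class Proc:
    def __init__(self, pid):
        self.pid = pid
        self.ioqp = 0  # like P$IOQP: ID of the process ahead (0 = none)
        self.ioqn = 0  # like P$IOQN: ID of the process behind (0 = none)


def enqueue_behind(procs, owner_id, new_id):
    """Link new_id at the tail of the chain that starts at owner_id."""
    tail = procs[owner_id]
    while tail.ioqn != 0:
        tail = procs[tail.ioqn]
    tail.ioqn = new_id
    procs[new_id].ioqp = tail.pid
```

If process 1 holds the path and processes 2 and then 3 ask for it, the chain reads 1 -> 2 -> 3 going forward and 3 -> 2 -> 1 going back.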

When a process has made it through a manager to the point that the manager must do I/O through a device driver, it checks a flag in the driver's static (permanent) storage called V.BUSY. If it is clear, no one is using the device at that instant, and V.BUSY is set to the process's internal ID number.

If V.BUSY is not clear (another process got there first and is waiting for its call to finish), the manager inserts the process in an I/O queue to wait its turn.

When the process (executing the file manager) is through with the device, it clears V.BUSY, and all the processes waiting in line are woken up to try again. As far as I know, V.BUSY only becomes very important if a driver has put its process to sleep, as otherwise the program would have exclusive access while within a system call anyway.

Thus a process seeking use of a device and its driver must wait FIRST for the path to be clear, and THEN for the device used by that particular path. If two processes are talking to two different files, or have each opened their own paths and the file is considered shareable, they will only have to wait in line for device use.

Again, it should be noted that once one process has started I/O operations, it has near-total use of the CPU time, except of course for interrupt routines or if it goes to sleep in the driver or a queue.

MULTI-TASKING PRINCIPLES: SIGNALS

Signals are communication flags, as the name implies. Since processes operate isolated from each other, signals provide an asynchronous method of inter-process flagging and control.

Commonly used signals include the Kill and the Wakeup codes. Wakeup is essential to let the next process in an I/O queue get its turn in line at a path or device.

OS-9 has a signal-sending call, F$Send, which sends a one byte signal to the process ID specified, and causes the recipient to be inserted in the active process queue. Any signal other than Kill or Wake is put in the P$Signal byte of its process descriptor.

If it was the Kill signal, the P$State byte in the process descriptor has the Condemned bit set to alert the Kernel to kill that process. A Wake signal clears the P$Signal byte, since just making the destination an active process was enough.

Signals are not otherwise acted upon until the destination process returns to the User state. (It'd be unwise to bury a process in the midst of using the floppy drives, for instance.) However, drivers and the Kernel may take note of any pending signals and alter their behavior accordingly.

When the Kernel brings a process to the active state, the P$Signal byte in the descriptor is checked for a non-zero value (Kill=0, but the Condemned bit was set instead, causing a rerouting to the F$Exit 'good-bye' call as soon as the killed process enters a non-system state). The process is given a chance to use the signal right off.

If the program has done a F$Icpt call to set a signal trap, a fake register stack is set up below the process's real one, holding the signal, data area and trap vector: P$Signal, P$SigDat, P$SigVec. The Kernel then does its usual RTI to continue the program where it left off.

Instead, the program picks up at the signal vector where it usually stores the signal in the data area for later checking when convenient (totally up to the programmer, though). The trap routine is itself expected to end with a RTI, thus finally getting back to the normal flow of execution by pulling the real registers that are next on the stack.

If the program has NOT done a F$Icpt call, the Kernel drop-kicks it into F$Exit, the same as a Kill signal does.
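
The send and delivery rules described above can be modelled roughly as follows. This is a Python sketch, not Kernel code: Proc, send and deliver are hypothetical names, a Python callable stands in for the F$Icpt trap vector, and clearing the pending signal after delivery is an assumption.

```python
S_KILL, S_WAKE = 0, 1
CONDEM = 0b0000_0010  # condemned bit in the state byte


class Proc:
    def __init__(self):
        self.state = 0
        self.signal = 0      # like P$Signal
        self.sig_vec = None  # trap routine, set by an F$Icpt-like call
        self.active = False
        self.exited = False


def send(proc, code):
    """Model of F$Send: record the signal, then make the target active."""
    if code == S_KILL:
        proc.state |= CONDEM
    elif code == S_WAKE:
        proc.signal = 0      # waking up is the whole message
    else:
        proc.signal = code
    proc.active = True


def deliver(proc):
    """Model of what happens as the process resumes in the user state."""
    if proc.state & CONDEM:
        proc.exited = True   # forced exit, cannot be trapped
        return
    if proc.signal == 0:
        return
    code, proc.signal = proc.signal, 0   # assumed: pending signal is cleared
    if proc.sig_vec is None:
        proc.exited = True   # no trap set, same fate as a Kill
    else:
        proc.sig_vec(code)   # trap routine sees the signal code
```

A process with a trap installed gets the code handed to its routine and keeps running; one without a trap is ended by any signal other than Wake.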

SIGNALS :

0      S$Kill    Abort process (cannot be trapped)
1      S$Wake    Insert process in Active process queue
2      S$Abort   Keyboard abort (Break Key)
3      S$Intrpt  Keyboard interrupt (Shift-Break)
4      S$Window  Window has changed
5-255            User-definable so far

The sequence for OS-9 FORK (initiating a process) is summarised below.

P$ -- process descriptor
D. -- Direct Page Variable
    
#  VAR     MODULE     ACTION

1  P$ID    OS9        Allocates a 64-byte process descriptor.
   P$User             Copy parent's user index
   P$Prior            ..and priority
   P$Age              Age set to zero.      
   P$State            State of process is System State.
   D.Proc             Current process descriptor is now this one. 
   P$DIO              Copies parent's default directory pointers.

2  P$PATH  IOMAN      Called three times to I$Dup the first 3
                      paths of the parent (std in, out, error).

3  P$SWI   OS9        Make these 3 vectors = D.UsrSvc (0040).
   P$SWI2
   P$SWI3

   P$Signal           Clear process's signal and signal vector.
   P$SigVec

4  P$PModul           F$Link to desired program module.
4a P$PModul IOMAN     F$Load from xdir if not in memory.

5           OS9       Error end if not Program/System module.

6  P$ADDR   OS9P2     F$Mem request to >= data area needed.
   P$PagCnt

7           OS9       Copy parameters to top of new data area.
   P$SP               Set stack pointer to RTI stack registers.
                      Set up RTI stack with register values:
                           PC - module entry point
                            U - start of data area
                            Y - top of data area
                            X - parameters pointer
                           DP - start of data area
                            D - length of parameters passed
                      SP-> CC - interrupts okay, E flag for RTI

8  D.Proc             Put back parent as current process.
   P$CID              Get PARENT's other child, and
   P$SID              make it new proc's sibling link.
                      (PARENT's new P$CID = new P$ID)
   P$PID              Copy parent's ID to new proc desc.

9  P$State            State of new is no longer System State.
                      Return new child's ID to parent.
   P$Queue            F$AProc - insert process in active queue.
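
The register stack built in step 7 can be pictured with a short Python sketch. The byte order is the 6809's standard full-state order as pulled by RTI (CC, A, B, DP, X, Y, U, PC, lowest address first). The function name, the sample addresses, and the assumption that the data area starts on a page boundary (so DP is just its high byte) are illustrative only.

```python
import struct

E_FLAG = 0x80  # "entire state" bit in the 6809 condition code register


def build_rti_frame(entry, data_start, data_top, param_ptr, param_len):
    """Pack the 12 bytes that RTI pulls, lowest address (SP) first."""
    cc = E_FLAG                                # E set, IRQ/FIRQ masks clear
    a, b = param_len >> 8, param_len & 0xFF    # D = A:B = parameter length
    dp = data_start >> 8                       # DP holds only the page byte
    return struct.pack(">BBBBHHHH", cc, a, b, dp,
                       param_ptr,    # X
                       data_top,     # Y
                       data_start,   # U
                       entry)        # PC
```

The new process's saved stack pointer would then point at the first byte (CC) of this frame, so that an RTI "returns" into the module's entry point.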

Opening an OS-9 device/file takes the following general steps:

PD. -- path descriptor vars     V$  -- device table
 V. -- device static storage    Q$  -- IRQ poll table
P$  -- process descriptor

 
#  VAR     MODULE     ACTION
1  PD.PD   IOMAN      Allocates a 64-byte block path descriptor
   PD.MOD             Sets access mode desired.
   PD.CNT             Sets user cnt=1 for this path desc.

2  PD.DEV  IOMAN      Attaches the device (drive) used.
   V$STAT             Allocates memory for device driver (CCDisk)
   V.PORT             Sets device address in driver static memory

3  V.xxxx  DRVER      The driver's init subroutine is called to
   V.xxxx             initialize the device and static memory.
                      If device uses IRQ's, uses F$IRQ call:

4  Q$POLL  OS9        Sets up IRQ polling table entry.
   ...                ( address, flip & mask bytes, service addr,
   Q$PRTY               static storage, priority of IRQ)

5  V$DRIV  IOMAN      Sets up rest of device table.
   V$DESC             ( module addresses of desc, driver, mgr)
   V$FMGR
   V$USRS             Sets user count of device=1

6  PD.OPT  IOMAN      Copies device desc info to path desc.
   ...                ( default values: drive #, step rate,
   PD.SAS               sides, baud rate, lines/page, etc.)
                      Calls file manager Open subroutine:

7  PD.BUF  FLMGR      Allocates buffer for file use.
   PD.DVT             Copies device table entry for user.
   PD.FST-            Opens file for use, and sets up
   PD.xxx             file mgr pointers and variables.

8  P$PATH  IOMAN      Puts path desc in proc desc I/O table.
                      Returns table pointer to user as path number.

2, 3, 4, 5 only if first time for that device,
              else V$USRS = V$USRS + 1
                   PD.DEV = Device table entry
4.         only if device uses IRQ's
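
The "first time only" rule in the notes above can be sketched in Python. This is a simplified model: the dictionary stands in for the device table, "users" for V$USRS, and attach and init_driver are hypothetical names.

```python
def attach(device_table, name, init_driver):
    """Return the device table entry, initializing it on first use only."""
    entry = device_table.get(name)
    if entry is None:
        # Steps 2-5: allocate static storage, run driver init, fill table.
        entry = {"users": 1, "static": init_driver()}
        device_table[name] = entry
    else:
        entry["users"] += 1   # later opens just bump the user count
    return entry
```

Opening two paths to the same device runs the driver's init once and leaves a user count of 2.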

Section 3

GIME DAT

The memory management abilities of the CoCo-3 are the source of its ability to run Level 2. To help explain what a DAT is, and its usefulness, here's a text file I first posted on the OS-9 Forum on 5 August 86.

Q: What is the difference between the 512K boards that are sold now and the 512K CoCo-3?

LOGICAL VS PHYSICAL ADDRESSES

To understand the difference, you must first keep in mind that the 6809, having 16 address lines, can only DIRECTLY access 64K of RAM. The only way for the CPU to use any extra memory is to externally change the address going to the RAM.

The address coming from the CPU itself is called the Logical Address. The converted address presented to the RAM is called the Physical Address.

For instance, the CPU could read a byte from $E003 in its 64K Logical Address space, but external hardware could translate the $E003 into, say, a Physical Address of $1B003, by looking up the entry for the 4K block $E in a fast RAM table.

A coarser, but more familiar, example to CC owners is the $FFDF (64K RAM) "poke". The SAM chip can address 96K of Physical memory (64K RAM and 32K ROM). When that register was written to, the SAM translated all accesses to memory in the Logical (CPU) range of $8000-$FEFF to Physically point to the other 32K bank of RAM, instead of the ROM. A similar example is the use of the Page Bit register, to translate Logical accesses to $0000-$7FFF into using the other Physical 32K bank of RAM.

MEMORY MANAGEMENT

The hardware that does the actual translation between the Logical --> Physical addresses is called a Memory Management Unit (MMU). In the case above, the SAM was the MMU. One common type of hardware MMU is called a DAT, for Dynamic Address Translation. A DAT consists of a Task Register and some fast look-up RAM. It's called Dynamic partly because the translation table is not fixed, but can be modified. I'll go into more detail on a DAT later.

THE COCO-2 BOARDS

The memory expansions sold for the CC2 are an extremely simple form of a DAT. Most only allow the upper or lower 32K of Logical Addresses to access a different upper or lower 32K bank of Physical Memory. Leaving out I/O addresses and ROM for the moment, their 64K modes simplistically look like: (for 256K)

   v-- Logical (CPU) addresses
$FFFF   +------+------+------+------+
        |      |      |      |      |
        | U0   |  U1  |  U2  |  U3  | Upper 32K Banks
        |      |      |      |      |
$8000   +------+------+------+------+
        |      |      |      |      |
        | L0   |  L1  |  L2  |  L3  | Lower 32K Banks
        |      |      |      |      |
$0000   +------+------+------+------+
         $0xxxx $1xxxx $2xxxx $3xxxx <-- Physical (RAM) hex addresses

example: CPU access of $0100 using Bank 2 = L2 + $0100 = RAM address $20100.

The Physical memory that the CPU addressed is chosen from a combination of (L0 or L1 or L2 or L3) AND (U0 or U1 or U2 or U3). Some boards would mostly only allow the selection of Banks in number pairs (eg.: L1+U1, L2+U2), or keeping L0 constant, and varying the Upper (U0-U3).
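
Going by the diagram above, the bank arithmetic can be sketched in Python. The function name and the idea of passing the lower and upper bank numbers as separate arguments are illustrative assumptions, since the real boards differed in which combinations they allowed.

```python
def bank_physical(logical, lower_bank, upper_bank):
    """Physical address for a 16-bit CPU address under 32K bank switching."""
    # A15 decides whether the upper or the lower bank selection applies.
    bank = upper_bank if logical & 0x8000 else lower_bank
    return (bank << 16) | logical
```

This reproduces the example: CPU address $0100 with lower Bank 2 selected comes out as RAM address $20100.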

The important point here is that you could not 'mix & match' the Banks (Upper appear as Lower, Lower as Upper, or say, map U2 from $0000-$7FFF and U3 as $8000-$FFFF).

To use data from one bank to another generally required the copying of that data. This is why most applications of the extra memory were as Ram Disks, or extra data storage, NOT as programs. (Tho you could have four different copies of the Color Basic ROMS for example, or four different OS-9 '64K machines' running one at a time.)

THE COCO-3 DAT

To make the most economical use of the available RAM, and make the most use of reentrant (used by more than one process at a time) and position-independent (runnable at any address, possibly using a different data area) programs or sections of data, the DAT has to be much more flexible than the Bank switching schemes above.

For instance, in the example given of four copies of the Basic ROMS, what if you had not modified the Extended Color ROM? You would have wasted 24K of RAM (3 banks x 8K) on extra copies. (Actually, you wasted 32K, since it'd be even better just to keep the original ROM in place.) Or what if you really wanted one ROM copy and seven 32K RAM program spaces? Or you need to temporarily map in 32K of video RAM? Or keep seven different variations of the Disk ROM, which would all (at least on a CoCo-2) need to be made to appear at $C000 up?

And we haven't even discussed OS-9 yet!

What have we figured out? We need both smaller translation 'blocks' and a way of making those physical blocks appear to the CPU at any logical block size boundary.

What size should a block be? So far, it seems that the smaller the better for a programmer or operating system, because that could leave more 'free blocks' left over for other use. This will become apparent later, in the Level 2 discussion. Many Level 2 machines use a 4K block. The CoCo-3 uses an 8K block size. In most cases, this may not be restrictive, except perhaps on a base 128K machine.

And so we come to the CoCo DAT. Here's a simple diagram:

                                +--------+                  Video Addr (19 lines)
+--------+                      | Task#  |                     |
|        | A15:13      R2:0     +--------+                 +---+---+
|  CPU   +------/------------>--+        |  P18:13         |       |
|        | (3 address lines)    |  DAT   +-----/------->---+ RAM   |
|        |                      |  RAM   |  (6 lines)      | ADDR  |    512K
|        |                      +--------+                 | MUX   +--> RAM
|        | A12:0                            P12:0          |       |
|        +-------------------------------------/--------->-+       |
|        | (13 address lines)                              |       |
+--------+                                                 +-------+
                                <============ GIME ================>

As shown, the DAT RAM would be 8 six-bit words x 2 tasks (explained below).

From left to right, the Logical Addresses from the CPU are translated into an extended Physical Address to access the RAM.

The upper 3 CPU lines (A13-A15) are used to tell the DAT which 8K Logical Block is being used (1 of 8 in a 64K map) and act as DAT RAM address (R0–R2) lines. At that Logical Block address in the DAT is a 6-bit data word, which forms the extended Physical Address lines P13-P18. The lower CPU address lines are passed thru as is to point within the 8K RAM block (out of the 512K RAM) selected by P13-P18.

Note that 6 bits can form 64 block select words. Multiply 64 possible blocks by 8K per block, and there's your 512K RAM. You may write any 6-bit value to each of the 8 DAT RAM locations, thus choosing which of the 64 8K-blocks you wish to appear within the 8K address block the CPU wishes to access. You could even write the same value several times, making the same 8K physical RAM show up at different logical CPU addresses.

The Task number acts as the DAT R3 address line, and simply allows selection between 2 sets of eight DAT RAM words. This makes it simpler to change between 64K maps. Normally, you can software select the Task number.
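
Put together, the translation described above amounts to a table lookup plus some bit shuffling. Here is a Python sketch of it. The DAT RAM is modelled as a plain list of 16 six-bit values (8 per task), and dat_translate is a hypothetical name; the real lookup is done by the GIME hardware on every access.

```python
def dat_translate(logical, dat_ram, task=0):
    """Translate a 16-bit logical address to a 19-bit physical address."""
    slot = (task << 3) | (logical >> 13)   # R3 = task, R2:0 = A15:A13
    block = dat_ram[slot] & 0x3F           # 6-bit block number -> P18:P13
    offset = logical & 0x1FFF              # A12:A0 pass straight through
    return (block << 13) | offset
```

For example, with block $3F stored in task 0's slot 7, the CPU address $E003 lands at physical $7E003; switching the task number picks the other set of eight entries without rewriting any of them.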

AN ANALOGY

Okay, this has been rough on some of you, and my explanation may need some explaining (grin) so a simpler analogy is in order:

Let's say you have a fancy new TV cabinet with 8 sets from bottom to top in it. You can watch all 8 at a time. (This makes you the CPU, and each screen is 8K of your logical 64K address space.)

Ah, but each set also has 64 channels. So you can tune each set to ANY of the channels, or several to the SAME channel. (Each channel is like one 8K block out of the 64 available to you in a 512K machine.) When you tune in a program, you are said to have "mapped it in".

An analogy to the Task Register would be if each set had TWO channel selectors A and B, and you had one switch to select whether ALL the sets used their A or B setting. This is generally called "task switching". If you wanted to switch to a C,D, or E task, you'd have to get up and retune all 8 sets on their A or B selectors (all A or all B), possibly from a list (called a "DAT Image") you had made from TV Guide.

Get it now? The CoCo-2 512K expansions would then be like the same cabinet, only the top or bottom four sets always tune together and only have 8 selector positions; the same eight channels per same position. Which would you buy?

NOW HAVE IT -- BUT WHAT USE IS ALL THIS?

So far, we've seen that the 64 8K blocks can be arranged any which way that you'd like to see them, 8 at a time. As a quick example of what could be done, let's see how a text editor might work. We'll assume the upper 32K is RSDOS always, and not to be touched, to keep this simple.

This leaves us with 32K, or four 8K blocks for our program and data (the text). In our example, we'll make the editor code itself just under 24K long, which leaves us only 8K for text. So, here's the map:

E000-FFFF logical block   7    hires cmds & I/O
C000-DFFF                 6    disk basic
A000-BFFF                 5    color basic
8000-9FFF                 4    extended basic
6000-7FFF                 3    editor
4000-5FFF                 2    editor
2000-3FFF                 1    editor
0000-1FFF                 0    text

(Note that this is kind of unrealistic, since you'd probably not want to have the text down in RSDOS variable territory, but this is just an extremely simple example, okay?)

Okay, you type in 8K of text. Normally, that'd be all you could do, but remember that we can make any Physical 8K Block map into any Logical 8K Block. So the editor, when it realizes that its buffer is almost full, could tell the GIME MMU to make a different RAM block (out of the 64, minus those used by Basic for text, etc) appear to the CPU in our logical block 0 (from $0000-$1FFF).

Even if Basic uses up 8 actual RAM blocks for its own use, and the editor uses 3, we still could use (64-11) or 53x8K blocks. That's over 400K of text space. By swapping real (physical) RAM into our 64K (logical) map like this, the only limitation on spreadsheets, editors, etc, is that the programmer must respect the 8K block boundaries.
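
The arithmetic, and the bookkeeping such an editor would need, can be sketched in Python. The function names and the spare_blocks list are hypothetical; a real editor would also have to write the chosen block number to the MMU, which is not shown.

```python
BLOCK_SIZE = 0x2000   # 8K
TOTAL_BLOCKS = 64     # 512K machine


def spare_text_k(basic_blocks, editor_blocks):
    """Kilobytes of RAM left over for text after Basic and the editor."""
    spare = TOTAL_BLOCKS - basic_blocks - editor_blocks
    return spare * BLOCK_SIZE // 1024


def locate(text_offset, spare_blocks):
    """Which block to map into logical block 0, and the CPU address in it."""
    index, cpu_address = divmod(text_offset, BLOCK_SIZE)
    return spare_blocks[index], cpu_address
```

With 8 blocks for Basic and 3 for the editor this gives 424K, and a text position 5 bytes into the second 8K of text is found at CPU address $0005 once the second spare block is mapped in.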

Hmmm... you say. I could even swap in different editor programs, if I had to, couldn't I? You bet. Now you're starting to get an inkling of how Microware did Level 2.

OK, WHAT ABOUT OS-9 LEVEL 2?

Level 2 gives each process up to 64K to work with. It allocates blocks of memory (you got it - up to eight 8K blocks!) for that process to use as program or data areas.

Having 512K of memory does NOT mean you could do a "basic09 #200k" command line. The CPU can still only access 64K at a time, but the space not used by Basic09 (which itself is about 24K long) is usable for data. So about 64K minus 24K is about 40K, which is very big for a Basic09 program.

Notice a gotcha here, though. If Basic09 was 25K long, then you'd have much less data area possible. Why? Remember the 8K blocks. A 25K program would map in using four 8K blocks (three wouldn't be enough), using up 32K of your 64K map. The same goes if you asked for 9K of data space. You'd get two 8K blocks of RAM mapped in, taking up 16K of CPU space. Aha! Now you understand why the smaller the block size the better.
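
The rounding involved is easy to check with a few lines of Python (an illustration of the arithmetic only; the function names are made up for the example):

```python
BLOCK_SIZE = 8 * 1024   # CoCo-3 block size
MAP_SIZE = 64 * 1024    # one process's logical address space


def blocks_needed(size_bytes):
    """Number of whole 8K blocks needed to hold size_bytes."""
    return (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def room_left_k(program_bytes):
    """Kilobytes of the 64K map left after the program's blocks are mapped."""
    used = blocks_needed(program_bytes) * BLOCK_SIZE
    return (MAP_SIZE - used) // 1024
```

A 24K program leaves 40K of the map, while a 25K program leaves only 32K, because the extra 1K costs a whole block.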

Back to the good parts. Remember that most OS-9 programs are reentrant and position independent. This means that no matter how many processes or terminal-users want to use a certain program, only ONE copy needs to be in memory. (Check the difference: if you had 10 Basic09 programs running, each needing 30K of data space - they'd need only 24K for B09 + 10*30K, versus 10*(24K+30K), a 216K savings.) The Amiga's programs, for example, aren't reentrant. It'd need 540K.
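
The savings figure can be checked with a couple of one-line Python functions (names invented for this example; sizes are in K, as in the text):

```python
def shared_k(code_k, data_k, users):
    """Memory needed when one reentrant copy of the code is shared."""
    return code_k + users * data_k


def separate_k(code_k, data_k, users):
    """Memory needed when every user loads a private copy of the code."""
    return users * (code_k + data_k)
```

For ten users of a 24K program with 30K of data each, that is 324K shared versus 540K separate, a difference of 216K.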

As far as making 200K virtual programs, there ARE ways of doing that. You could start other processes (Forking), or map in different data modules. Even better, you can pre-Load modules, and by Linking and Unlinking them, they will swap in and out of your 64K address space, a technique much faster than using RamDisks. (A Loaded module is off in RAM somewhere, but not in your map until Linked to.) This is what Basic09 does, by the way, so by writing a program that calls lots of small subprograms, each would get swapped in automatically as you needed them. Instant 400K basic!

TOO MUCH TO SAY

Well, there's about a zillion other things I wanted to put in here, like how the page at $FE00-$FEFF is across all maps, to make moving data easier (some move code is there); or how each Level 2 process or block of programs has a DAT Image associated with it, that can be swapped into the DAT RAM; or that up to 64K is allocated to the System Task, where the Kernel and Drivers and buffers are; or the neat tricks you could do using the DAT; or show you a possible memory map using the DAT; or about how interrupts switch to the System Task.

(Some of this IS covered in this new collection - Kevin)

Section 4

DAT IMAGES and TASKS

It may seem that we're spending a lot of space on the DAT, but it's very important to the whole of Level 2. So...

As you now know, the DAT in the CoCo-3 allows you to specify which of up to eight blocks will appear in the 6809's logical address map when their numbers are stored and enabled in the GIME's MMU or DAT.

Ideally, an MMU would have enough ram to handle the maps for any conceivable number of programs, modules or movement. But ram that fast is expensive and uses lots of power. So a compromise was made -- in the GIME's case, two sets of DAT registers. That is, two complete 64K maps can be stored and switched between at will.

You will surely need one map for the system plus another for a shell at least. So how does OS-9 handle the needs of all the other programs you want to run? By swapping sets of block numbers into the DAT as needed.

The set of block numbers is stored in a packet of information called a DAT Image. Because various OS-9 machines use different size blocks (2K, 4K and 8K are the most common) and have differing numbers of memory blocks available, a DAT Image can vary in size even though a process descriptor has 64 bytes available for one.

On the CoCo-3, it's 16 bytes long, made up of 8 two-byte entries. The first byte of each entry is usually zero, while the second byte is the physical block number. The exception is when an entry contains a special value of $333E, which is used to indicate that that logical block is unused as memory for that map.

When expanding the number of blocks allocated to a map, OS-9 checks for the special $333E flag bytes. That's how it knows where to place new blocks in the DAT Image.
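To make the layout concrete, here is a small Python sketch of a CoCo-3 DAT Image: 8 two-byte entries, high byte first as on the 6809, with $333E marking an unused slot. The helper names are hypothetical, and scanning from slot 0 upward for the first free slot is my assumption, not a statement of how the kernel searches.

```python
import struct

DAT_FREE = 0x333E   # flag value for an unused logical block
SLOTS = 8           # eight 8K slots in a 64K map

def pack_image(blocks):
    """Build the 16-byte image from 8 entries (block number or None)."""
    values = [DAT_FREE if b is None else b for b in blocks]
    return struct.pack(">8H", *values)

def unpack_image(raw):
    """Return 8 entries, with None wherever the slot holds $333E."""
    return [None if v == DAT_FREE else v for v in struct.unpack(">8H", raw)]

def first_free_slot(raw):
    """Index of the lowest unused slot, or None if the map is full."""
    for slot, value in enumerate(struct.unpack(">8H", raw)):
        if value == DAT_FREE:
            return slot
    return None

# A map with data block 05 at the bottom and module block 06 in slot 6.
image = pack_image([0x05, None, None, None, None, None, 0x06, None])
print(len(image), image[:4].hex(), first_free_slot(image))
```

The first four bytes come out as 00 05 33 3E: one used entry followed by one free entry.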

DAT Images are created for several purposes. The one that affects you the most is the image stored in a process descriptor. Whenever a process comes up in the queue for running, its DAT image is copied to one of the two sets of GIME task map registers. Then that set is enabled by setting the task register select. Instantly the new logical map is the one seen by the CPU. When a process' timeslice is up, it also gives up the use of the task number.

The task register number used for the process DAT image is usually the same number stored in the P$Task byte in other Level 2 computers. On the CoCo-3 however, P$Task contains the number of a virtual or fake DAT task map. There are 32 of these, which make it appear as though the GIME had 32 sets of map registers.

If the images are already in the process descriptors, why have virtual tasks? Because it's simpler for the system to look them up in a known table versus searching all over.

The first two virtual DAT tasks (0 and 1) are reserved for the system's use. The first is for the usual kernel, drivers, descriptors, buffers. The second is for GrfDrv's screen and buffer access.

So on the CoCo-3, the task number refers to a table entry that points to the DAT Image to be used. Except for special cases, the pointer is to the image within a process descriptor.
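The idea of a task number being an index into a table of image pointers can be sketched as below. This is a toy model in Python, not the kernel's actual data structure: the class and method names are invented, and handing out the lowest free number is an assumption.

```python
MAX_TASKS = 32   # virtual DAT tasks on the CoCo-3
RESERVED = 2     # tasks 0 and 1 belong to the system

class VirtualTaskTable:
    """Maps a task number to the DAT Image it should use."""

    def __init__(self, system_image, grfdrv_image):
        self.entries = [None] * MAX_TASKS
        self.entries[0] = system_image    # kernel, drivers, buffers
        self.entries[1] = grfdrv_image    # GrfDrv screen/buffer access

    def reserve(self, image):
        """Give a process image the lowest unused task number."""
        for number in range(RESERVED, MAX_TASKS):
            if self.entries[number] is None:
                self.entries[number] = image
                return number
        raise RuntimeError("no free task numbers")

    def release(self, number):
        """Free a user task number; system tasks are never released."""
        if number < RESERVED:
            raise ValueError("tasks 0 and 1 are reserved")
        self.entries[number] = None

    def image_for(self, number):
        return self.entries[number]

table = VirtualTaskTable(system_image=["sys"], grfdrv_image=["grf"])
shell_image = ["shell"]
task = table.reserve(shell_image)
print(task, table.image_for(task) is shell_image)
```

A lookup is then a single index into a known table, which is the point the text makes about not having to search the process descriptors.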

Another use for the images is in the module directory. Unlike Level 1, where the entry could simply contain the module's address within the 64K you had, Level 2 entries point to a DAT Image of the block or blocks containing the module and any others loaded with it.

While a module file is being loaded, OS-9 temporarily allocates a process descriptor and a task number for it. The file is then read into blocks of memory that F$Load has requested. Then the descriptor & task are released, leaving the modules in a kind of "no-man's-land", waiting to be mapped into a program's space.

The visible residue of loading a file of modules is that the free memory count goes down, and any new modules found are entered into the system map's module directory. Otherwise, they don't directly affect a process map until linked into it.

Each Module Directory entry is made up of:

00-01 MD$MPDAT - Module DAT Image Pointer
02-03 MD$MBSiz - Block size total
04-05 MD$MPtr - Module offset within Image
06-07 MD$Link - Module link count

A program such as Mdir can use these to display its information about the modules in memory. First, it gets the module directory using F$GModDr. Then, by using the DAT Image and offset associated with an entry, Mdir F$Move's the header and name from the blocks where the module has been loaded.
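As a rough illustration of the entry layout listed above, this Python sketch unpacks one 8-byte directory entry into its four two-byte fields (high byte first, as the 6809 stores them). The sample bytes are invented purely for the demonstration and do not come from a real directory.

```python
import struct
from collections import namedtuple

# Field names follow MD$MPDAT, MD$MBSiz, MD$MPtr and MD$Link.
ModDirEntry = namedtuple("ModDirEntry", "mpdat mbsiz mptr link")

def parse_entry(raw8):
    """Split one 8-byte module directory entry into its fields."""
    if len(raw8) != 8:
        raise ValueError("a directory entry is 8 bytes long")
    return ModDirEntry(*struct.unpack(">HHHH", raw8))

# Hypothetical entry: image pointer $0A40, block size total $2000,
# module offset $05FC within the image, link count 0.
sample = bytes([0x0A, 0x40, 0x20, 0x00, 0x05, 0xFC, 0x00, 0x00])
entry = parse_entry(sample)
print("%04X %04X %04X %d" % entry)
```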

The Mdir example illustrates a third common usage of images, moving data into your program's map for inspection.

Anytime you need to "see" memory external to your process (sorry, you can only legally read it; no writes), you can create a DAT image of your own and use it with F$Move. OS-9 will take the offset and amount you pass, and copy that amount over to your map from the offset within the image you made.
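What "the offset within the image" means can be shown with a small model. The Python sketch below treats all 512K as one byte array and copies data out through an image, crossing block boundaries as needed. It models the idea only: the function names are hypothetical and this is not the register interface of the real F$Move call.

```python
BLOCK = 0x2000      # 8K per block
DAT_FREE = 0x333E   # unused-slot flag

def physical_address(image, offset):
    """Translate an offset within an 8-slot image to a physical address."""
    slot, within = divmod(offset, BLOCK)
    block = image[slot]
    if block == DAT_FREE:
        raise ValueError("no block mapped at offset %04X" % offset)
    return block * BLOCK + within

def move_from_image(ram, image, offset, count):
    """Copy count bytes starting at offset, one block-sized piece at a time."""
    out = bytearray()
    while count > 0:
        addr = physical_address(image, offset)
        chunk = min(count, BLOCK - offset % BLOCK)   # stop at block end
        out += ram[addr:addr + chunk]
        offset += chunk
        count -= chunk
    return bytes(out)

ram = bytearray(512 * 1024)
ram[6 * BLOCK + BLOCK - 2:6 * BLOCK + BLOCK] = b"AB"   # end of block 06
ram[9 * BLOCK:9 * BLOCK + 2] = b"CD"                   # start of block 09
image = [0x06, 0x09] + [DAT_FREE] * 6

print(move_from_image(ram, image, BLOCK - 2, 4))   # b'ABCD'
```

Blocks 06 and 09 are nowhere near each other physically, yet through the image they read as one contiguous run.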

In the case of Mdir, the image was moved over by F$GModDr along with the module directory entries. So there's no need to build an image in that case. Just use the MD$MPDAT pointer.

You may also in some cases request movement of data between maps using a reference to a Task number instead. OS-9 itself will internally index off the tasks' images for you.

Notice that throughout this section, the image is used over and over simply to allow the cpu to read or write to extended memory.

In the next section, we'll see some examples of DAT Images and maps.

Section 5

LEVEL 2 IN MORE DETAIL

I will be using "Level 1" and "Level 2" for the two different versions. Other word definitions I use here are (loosely):

space - any 6809 logical 64K address area.
mapping, mapped in - causing blocks to appear in a space.
a map - a space containing mapped-in modules/RAM blocks.
system map - the 64K map containing the system code.
task - a particular map with a certain program and data area.
task number - number of a particular task map.
DAT map - a task ready to use thru the hardware/software enable of the task number's map.
task register - task number stored here to enable a DAT map.
user code - the programs/data you use (applications).
system code - the programs/data the system uses (file mgrs, drivers, descriptors, and the kernel F$ & I$Calls, IRQ handlers, and scheduling codes).

LEVEL 1 VS LEVEL 2: General

The core of understanding Level 2 is in understanding the separation and handling of 8K blocks, and their use in logical 64K spaces. And why.

DAT

Under Level 1, you only had 64K of contiguous physical RAM in one 64K logical map. Level 2 uses the DAT to map any physical 8K blocks of RAM containing program and data modules into a 64K logical address map. When a program's turn to run comes up, the block map data (called a DAT Image) for its 64K space is copied to and/or enabled in the GIME's DAT.

SWI's

Level 2 was designed to run most programs written for One, which is possible since system calls are made using a software interrupt call, passing parameters (via cpu registers pushed on a memory stack) that are pointed to by the 6809's SP register. This gives two advantages over Level 1:

Virtually none of the system code has to reside in the 64K space containing the user's program and data areas. The system map is switched in place of the caller's map.
OS-9 needs only to know the caller's SP and task number (both kept in the caller's process descriptor in the system map) to access the parameters passed, or to move data between the two maps.

(Note that a kernel could be written to do simply this on any CoCo that had the Banker or DSL Ram expansion, etc. But you'd lose the advantage of the smaller flexibly-mapped blocks provided by the GIME's DAT.)

The corollary advantage, and the "why" of Level 2, is that each user program can have almost an entire 64K space to itself and its data area, as can also the system code.

THE SYSTEM TASK MAP:

Up to 63.75K of kernel, bootfile (drivers, mgrs, etc)
I/O buffers.
Descriptors.
System vars & tables.

System calls and other interrupts temporarily "flip" the program flow into this task map. User parameters and R/W data are copied from/to system ram for drivers and file managers to act upon.

EACH USER TASK MAP:

Up to 63.5K total for each program and its pgmdata area. Each task map is made out of up to 8 module or pgmdata blocks (8K each) that are mapped in from the 64 blocks available in a 512K machine (minus those used by the system task or other user tasks).

THE SYSTEM MAP

Oddly enough, the system map is close to what you're used to under Level 1. Memory is allocated for buffers and descriptors in pages just as before. The main difference is that no user programs (should) share space here, as they did under Level 1.

You still have the DirectPage variables from $0000-00FF along with other system global memory just above it up to $1FFF. Towards the top (????-FEFF) we run into descriptors, buffers, polling tables, and finally the I/O modules and the kernel. A CoCo-3 Level 2 System Map looks like this:

0000-0FFF  Normal Level 2 System Variables
1000-1FFF  New CC3 global mem and CC3IO tables
2000-xxxx  free ram
xxxx-DFFF  Buffers, proc descs, bootfile
E000-FDFF  REL, Boot, OS9
FE00-FEFF  Vector page (top of OS9p1)
FF00-FFFF  I/O and GIME registers

Some areas of special interest include:

The Vector Page RAM: This page of RAM is mapped across ALL 64K maps. This "map-global" RAM is necessary so that no matter what other blocks are mapped in place of the system code, there is always a place for interrupts (hardware or software) to go and execute the special code in OS9p1 that switches over to the system task.
BlockMap: In a 512K CoCo, OS-9 has 64 RAM blocks of 8K each to choose from (8K x 64 = 512K). Each is known by a number from 00-3F. The blockmap is a table of flags indicating the current status of each of these blocks, which could be ...
- FREE RAM = Ram blocks not in use as Module/PgmData areas.
- RAM IN USE = Ram blocks in use for either:
  - Modules - Blocks that contain program, subroutine, or data modules. MDIR will show these. Before a module is used, it will have been loaded into free ram blocks. On link or run, those blocks are then mapped into (made to appear in) any task's space. A data module mapped into several maps can provide inter-task vars. Subroutine mods (like for RUNB) can be linked/unlinked, in/out of a task map.
  - Data - Free ram that has been mapped into a task space for use as pgm data areas. Normally these blocks are only mapped into one task space (unlike module blocks). These blocks will be released to the free RAM pool when the program using them exits.
DAT Images: Since each task map requires knowing which (of up to 8) blocks are to be mapped in for that process (yes - system code execution is also a process), AND since OS-9 must know which blocks program modules have been loaded into, OS-9 keeps individual tables or "images" of those block numbers.

Each Image has 8 slots, two bytes each. A special block number, $333E, is used to designate an unused logical block for that task.

Module Directory: In Level 1, the module directory simply had to point to the module's address. Under Level 2, it points to the DAT Image table showing the block(s) the module is physically in and its beginning offset within the DAT Image logical 64K map.
Process Descriptors: A descriptor contains pretty much the same info as it did under Level 1, but adds the DAT Image for that process, which will be set into the DAT when its turn to run comes up.

There is also a local process stack area, used while in the system state (executing system code after a system call). This is because the process's real stack is of course in another map, and a local stack is needed if the process were interrupted or went to sleep.

SYSTEM MEMORY ALLOCATION

As I said above, the system map is still allocated internally in pages. However, when you first boot up, it usually will only have about 5 blocks mapped in. Something like:

    Logical    Physical
    Address    Block (s)
    0000-1FFF  00              - block 00 is always here
    2000-7FFF                  - no ram needed here yet
    8000-DFFF  01, 02, 03      - this is your boot file, first vars
    E000-FEFF  3F              - block 3F always contains the kernel

The system process descriptor of course has the DAT Image that corresponds to this block map.

Any RAM left over in blocks allocated for loading the bootfile is taken by page for system use. For instance, the device table normally is just below the bottom of the boot.

Once you begin running several processes and opening files, the system must allocate more RAM for descriptors and buffers. When all the pages that are free in the blocks already mapped in are used up, OS-9 maps in another block, which is then also sub-allocated by page.

Page allocation is still used because buffers, descriptors and tables usually are a page or two in size, just as under Level 1. So it's still the best use of available memory.

USER MAPS

MODULE and DATA AREAS

Each user process has the use of a map made up of up to eight 8K blocks. However, it is seldom that all eight are in use (certain basic09 and graphics programs excepted).

More likely, each task map will look like:

Logical     Physical
Address     Block(s)
-------     -------
0000-1FFF   ??       - 8K data area
2000-DFFF            - no ram needed here yet
E000-FEFF   ??       - block containing program

Again, the process descriptor DAT Image has a copy of the block numbers actually used (instead of ??).

Unlike Level 1, RAM for a user process is NOT allocated by page. There's no need to, for two reasons. First, the data area is not shared with any other process.

Second, no memory can be used from any left over in the program block. Many people ask why not? Hey, they say, since you can map a block anywhere, why can't some other program take advantage of the unused RAM? The answer is basically that it would just take too many resources to keep track of what module should stay because part of the block was being used for data.

Even more importantly, what if a program requested more memory while it was running? You'd be stuck, as data areas must be contiguous and any modules within that block would be in the way. One more reason: Level 2 was designed to take advantage of modules in ROM. So there's no way to assume that RAM is available in that block.

So, the upshot is that data areas are allocated from any free RAM blocks in the machine, and always 8K at a time. Even if your program only needed two pages to run in, it still gets a block. Now you can see that the smaller the block the better, as in this case having 4K blocks would leave more free RAM for other programs to use.

Just like in Level 1, programs end up at the highest logical address possible in a map, and data areas at the bottom. For the same reason as in One, this is done to allow the data area to grow as much as possible if needed.

One very important point to make at this time: since all modules that were loaded together are also mapped into spaces together, it pays to keep module files close to an 8K boundary. More details on this are in the MISC TIPS section at the end of the book.

SWITCHING BETWEEN MAPS

Okay, now we come to the nitty-gritty of Level 2. This is where we tie together all we've talked about so far. But it's not tough, so don't worry. Let's say that a program is running in its own map, and wishes to use a system call for I/O. How does the code get over to the system map where the drivers are?

An OS-9 system call is simply a software interrupt. What that means is that what the program is doing and where it's at is saved in the process' memory on a stack of variables.

Then, like all interrupts, program flow is redirected (by reading the CoCo's BASIC ROM, specially mapped in just long enough to get the addresses) to the vector page at logical address FE00 which is at the top of all maps.

The code within that page is part of OS9p1 and it knows that it should change the GIME task register select to task 0, which is always the system map. As soon as it does that, all the kernel, file managers, drivers etc are accessible to the CPU, which will come down out of the vector page to complete your system call. If needed, OS-9 will go back to code located in the vector page where it can map in your user task long enough to get and put data.

At the end of the call, the system code jumps back up into the vector page, maps your process DAT Image back into the GIME's task map 1, then enables task register 1 which allows your program space to reappear to the CPU.

Then the saved registers are taken back off the stack in your map, and your program continues.
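The flip between maps can be imitated with a toy model. The Python sketch below pretends the GIME is just two sets of eight block registers plus a task select; the class is invented for illustration, the block numbers are sample values, and it leaves out the vector page and I/O area that stay visible at the top of every map.

```python
BLOCK = 0x2000      # 8K per block
DAT_FREE = 0x333E   # unused-slot flag

class GimeModel:
    """Two sets of eight block registers and a task select."""

    def __init__(self):
        self.task_regs = [[DAT_FREE] * 8, [DAT_FREE] * 8]
        self.task_select = 0

    def load_image(self, task, image):
        self.task_regs[task] = list(image)

    def translate(self, logical):
        """Logical address -> physical address under the enabled task."""
        slot, within = divmod(logical, BLOCK)
        block = self.task_regs[self.task_select][slot]
        if block == DAT_FREE:
            raise ValueError("nothing mapped at %04X" % logical)
        return block * BLOCK + within

F = DAT_FREE
system_image = [0x00, F, F, 0x04, 0x01, 0x02, 0x03, 0x3F]
shell_image = [0x05, F, F, F, F, F, 0x06, F]

gime = GimeModel()
gime.load_image(0, system_image)       # task 0 is always the system map
gime.load_image(1, shell_image)

gime.task_select = 1                   # user program is running
user_view = gime.translate(0x0000)     # lands in block 05
gime.task_select = 0                   # system call: flip to task 0
system_view = gime.translate(0x0000)   # same address, now block 00
gime.load_image(1, shell_image)        # end of call: restore caller's image
gime.task_select = 1                   # and re-enable it
print("%05X %05X" % (user_view, system_view))   # 0A000 00000
```

The same logical address $0000 reaches two different places in physical RAM depending only on which task is selected.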

If you want to, you can think of Level 2 as really giving your program 128K of RAM, as the net effect compared to Level 1 is just that... under Level 1, your program had to share space with the drivers and kernel, and any system calls stayed within the same old 64K map. Under Level 2, your program jumps between 64K maps when you make a system call.

One side note: because of the manipulation of the GIME's MMU and the necessity of copying much data between maps, Level 2 is normally slower than Level 1. However, the CoCo-3 makes up for this as it runs at twice the speed of our older CoCo's.

EXAMPLE MAPS

Here are some example process, module and memory maps generated by the programs I've included in the back of this book. Study them and you can see the relationship between what is reported by each utility. They should help give you a better feel as to what's going on in your machine.

EXAMPLE ONE:

I had two shells running, and of course the particular utility that was printing out at the time.

ID  Prnt User Pty Age St Sig .. Module  Std in/out
=== === ==== === ===  == === == ======= ========
2     1    0 128 129  80   0 00 Shell   <TERM >TERM
3     2    0 128 129  80   0 00 Shell   <W7 >W7
4     3    0 128 128  80   0 00 Proc    <W7 >D1

Below's my PMAP output. The numbers across the top (0123 etc) are short forms of the addresses (0000-1FFF, 2000-3FFF, etc) in each task's logical map. Notice that there are indeed eight 8K block places in each map, but only those blocks that are needed are mapped in (and are in the DAT Image of that process, which by the way, is where the map information is gotten by PMAP).

ID     01 23 45 67 89 AB CD EF Program
----   -- -- -- -- -- -- -- -- -------
  1    00 .. .. 04 01 02 03 3F SYSTEM
  2    05 .. .. .. .. .. 06 .. Shell
  3    07 .. .. .. .. .. 06 .. Shell
  4    0A .. .. .. .. .. .. 08 PMap

Now, notice that in the SYSTEM map is Block 00 = system global variables, Block 3F = kernel, Blocks 01,02,03 = bootfile, and Block 04 plus probably part of 01, = system data and tables.

In the shell and pmap lines, we see that Blocks 05,07,0A are being used for data. Block 06 must contain the Shell, and Block 08 must contain Pmap. We can confirm all this by looking at the module directory output below and comparing block numbers:

 Module Directory at 00:03:51
Blk Ofst Size Ty Rv At Uc Name
 3F  D06  12A C1  1 r  00 REL    - the kernel
 3F  E30  1D0 C1  1 r  01 Boot
 3F 1000  ED9 C0  8 r  00 OS9p1
 01  300  CAE C0  2 r  01 OS9p2  - boot modules
 01  FAE   2E C0  1 r  01 Init
 .. ....  ... ..  . .  .. ....
 01 6947  1EE C1  1 e  02 Clock
 01 6B35  1AE 11  1 .  01 CC3Go
 06    0  5FC 11  1 r  03 Shell  - the Shell file
 06  5FC  2E7 11  1 r  00 Copy
 .. ....  ... ..  . .  .. ....
 06 1E10   2D 11  1 r  00 Unlink
 08    0  28E 11  1 r  01 Proc   - my cmds file
 08  435  1B1 11  1 r  00 MMap
 08  5E6  1F8 11  1 r  00 PMap
 08  7DE  1D5 11  1 r  00 SMap
 08  9B3  136 11  1 r  00 DMem
 08  AE9  240 11  1 r  00 Dump
 09    0 1FFC C1  1 r  01 GrfDrv - grfdrv is alone
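The comparison of block numbers can also be done mechanically. In this Python sketch I have typed in the three user rows from the PMAP output and the distinct values from the Blk column of the module directory listing; the function name is mine.

```python
# User task rows from the PMAP output, keyed by process ID.
pmap = {
    2: "05 .. .. .. .. .. 06 ..",   # Shell
    3: "07 .. .. .. .. .. 06 ..",   # Shell
    4: "0A .. .. .. .. .. .. 08",   # PMap
}

# Distinct block numbers from the Blk column of the module directory.
module_blocks = {0x01, 0x06, 0x08, 0x09, 0x3F}

def classify(row):
    """Split a PMAP row into (data blocks, module blocks)."""
    blocks = [int(tok, 16) for tok in row.split() if tok != ".."]
    data = [b for b in blocks if b not in module_blocks]
    modules = [b for b in blocks if b in module_blocks]
    return data, modules

for pid in sorted(pmap):
    data, modules = classify(pmap[pid])
    print(pid,
          "data:", ["%02X" % b for b in data],
          "modules:", ["%02X" % b for b in modules])
```

Both shells report block 06 as their module block, which is the single shared copy of the Shell file, while each has its own data block.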

Using my MMAP command, we can see below how many blocks are left for the OS-9 system to use. Take notice of the block 3E being allocated... that's the video display ram block.

RAM for video is allocated from higher numbered blocks, since there is a better chance of finding contiguous RAM that way. Normally, blocks don't have to be together for OS-9 to use them, but the GIME requires that screen memory be that way for display.

   0 1 2 3 4 5 6 7 8 9 A B C D E F
#  = = = = = = = = = = = = = = = =
0  U U U U U U M U M M U _ _ _ _ _   <-- blocks 00-0A
1  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3  _ _ _ _ _ _ _ _ _ _ _ _ _ _ U U   <-- 3E, 3F

Number of Free Blocks: 51
 Ram Free in KBytes: 408
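The two totals follow directly from the grid. A short Python sketch, with the four rows typed in from the output above, recounts them:

```python
BLOCK_K = 8   # each block is 8K

# One string per row of the MMAP grid: U = in use, M = module, _ = free.
mmap_rows = [
    "U U U U U U M U M M U _ _ _ _ _",
    "_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _",
    "_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _",
    "_ _ _ _ _ _ _ _ _ _ _ _ _ _ U U",
]

flags = [flag for row in mmap_rows for flag in row.split()]
free_blocks = flags.count("_")
in_use = ["%02X" % n for n, flag in enumerate(flags) if flag != "_"]

print(free_blocks, free_blocks * BLOCK_K)   # 51 408
print(in_use)                               # 00 through 0A, then 3E and 3F
```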

EXAMPLE TWO:

This real example I ran off the other day. I had five shells, all of which had started another process (by me typing it in).

 ID Prnt User Pty Age  St Sig .. Module  Std in/out
  2    1    0 128 129  80   0 00 Shell   <TERM >TERM
  3    2    0 128 130  80   0 00 Shell   <W7   >W7
  4    3    0 128 129  80   0 00 Shell   <W4   >W4
  5    4    0 128 129  80   0 00 pix     <W4   >W4
  6    2    0 128 129  80   0 00 pix     <TERM >TERM
  7    3    0 128 129  80   0 00 Shell   <W5   >W5
  8    7    0 128 128  80   0 00 pix     <W5   >W5
  9    3    0 128 129  80   0 00 Shell   <W6   >W6
 10    3    0 128 128  80   0 00 Proc    <W7   >D1
 11    9    0 128 129  C0   0 00 Ball    <W6   >W6

Note the high block numbers in most of the programs. Each window was showing an Atari ST picture in it, and process #11 had Steve Bjork's bouncing ball demo running.

True windows that use GrfInt and Grfdrv are NOT mapped into a program's space. But this was special, as I was running many VDGInt screens, which usually ARE mapped in (on purpose) so that the programs could directly access the video display.

Notice also that my System task had fully been allocated by block. The SMAP later shows what part of them was free.

ID      01 23 45 67 89 AB CD EF   Program
--      -- -- -- -- -- -- -- --   -------
 1      00 31 11 04 01 02 03 3F   SYSTEM
 2      05 .. .. .. .. .. 06 ..   Shell - see note below
 3      07 .. .. .. .. .. 06 ..   Shell
 4      09 .. .. .. .. .. 06 ..   Shell
 5      0E .. .. 3A 3B 3C 3D 0D   pix
 6      0F .. .. 36 37 38 39 0D   pix
 7      10 .. .. .. .. .. 06 ..   Shell
 8      12 .. .. 32 33 34 35 0D   pix
 9      13 .. .. .. .. .. 06 ..   Shell
10      18 .. .. .. .. .. .. 19   PMap
11      14 16 17 .. .. .. 31 15   Ball

The other point to note is that the Tandy-provided shell file (block 06) is longer than one block minus 512 bytes (8K - 512 = 7680 bytes), and thus cannot be mapped into the top block slot, because it would fall on top of the vector page and I/O area from FE00-FFFF.
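You can check the limit against the module directory listing in Example One. The Python sketch below assumes that the last module shown for each block (Unlink in block 06, Dump in block 08) really is the last one in its file, so that its offset plus size gives the file length; the function name is mine.

```python
BLOCK = 0x2000          # 8K block
TOP_RESERVED = 0x200    # FE00-FFFF: vector page plus I/O area
TOP_SLOT_LIMIT = BLOCK - TOP_RESERVED   # $1E00 = 7680 bytes

def fits_top_slot(file_length):
    """True if a module file can sit in the E000-FDFF slot."""
    return file_length <= TOP_SLOT_LIMIT

shell_file_end = 0x1E10 + 0x2D    # Unlink: offset + size in block 06
cmds_file_end = 0x0AE9 + 0x240    # Dump: offset + size in block 08

print("%04X" % shell_file_end, fits_top_slot(shell_file_end))   # 1E3D False
print("%04X" % cmds_file_end, fits_top_slot(cmds_file_end))     # 0D29 True
```

The shell file runs $3D bytes past the limit, which matches its showing up in the CD slot rather than EF in the PMAP output.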

Here's the MMAP output. Lots of video ram allocated, huh?

  0 1 2 3 4 5 6 7 8 9 A B C D E F
# = = = = = = = = = = = = = = = =
0 U U U U U U M U M U M M M M U U
1 U U U U U M U U U M _ _ _ _ _ _ 
2 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 _ U U U U U U U U U U U U U U U

Number of Free Blocks: 23
Ram Free in KBytes: 184

And just to show how close I was to a real limit, here's the SMAP utility output. It shows in pages how much memory is left in the system task map. The 32x16 old-style VDG text screens and all the process descriptors (two pages each!), plus a page for each window's SCF input buffer made things rather tight.

  0 1 2 3 4 5 6 7 8 9 A B C D E F
# = = = = = = = = = = = = = = = =
0 U U U U U U U U U U U U U U U U
1 U U U U U U U U U U U U U U U U
2 U U U U U U U U U U U U U U U U
3 U U U U U U U U U U U U U U U U
4 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5 _ _ U U U U _ U U U U U U U U U
6 U U U U U U U U U U U U U U U U
7 U U U U U U U U U U U U U U U U
8 U U U U U U U U U U U U U U U U
9 U U U U U U U U U U U U U U U U
A U U U U U U U U U U U U U U U U
B U U U U U U U U U U U U U U U U
C U U U U U U U U U U U U U U U U
D U U U U U U U U U U U U U U U U
E U U U U U U U U U U U U U U U U
F U U U U U U U U U U U U U U U .

Number of Free Pages: 19
Ram Free in KBytes: 4