This page, which is still being written, explains how Open744 works. It is aimed at the programmer/hacker -- I assume you know C and some assembly language.
I start by introducing the PS1 program as it was designed to run, out of the box, in a DOS/Windows environment.
PS1 is a 16-bit real-mode application written for the DOS environment in Turbo Pascal. It uses the x86 segmented memory model. The core program code is packaged in the executable file PS.EXE; additional subroutines are offloaded to the file PS.OVR, formatted as a Borland overlay file. These subroutines are loaded into memory on demand by the overlay manager.
The core program code is organised into 11 segments, starting at offset 0x0 (all offsets in this document are given relative to the start of the first code segment). Code in the overlay file is organised into 15 segments. Unlike the main code segments, these do not have a fixed location in memory, so their segment values are unknown until each overlay is loaded. For each segment in the overlay file there is an overlay header in core memory.
All procedure calls in PS1 are 'far', which means that the caller's return address, as a segment word and an offset word, is pushed onto the stack during the call and popped off the stack by 'retf' upon return. A far procedure called from within its own segment is reached with a 'push cs' (push the current code segment word) followed by a near call, which together build the same far return frame. Inter-segment calls supply the full segment:offset address. Calls into and between overlay code segments go through the overlay header, which contains a table of exported subroutines for each segment. Each table entry contains a software trap instruction (bytes 0xcd 0x3f, i.e. int 0x3f) and a 16-bit offset word. The overlay manager 'snaps' the trap into a far jump instruction, using the offset word and supplying the (now-known) segment word.
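A minimal sketch of what 'snapping' one export-table entry amounts to. It assumes each entry occupies 5 bytes (the 2-byte trap, the offset word and one spare byte), so that the 5-byte far jump (opcode 0xEA, offset word, then segment word) fits in place; the function name snap_stub() is hypothetical, and a real overlay manager must of course load the overlay and choose its segment first.

```c
#include <stdint.h>

/* Assumed entry layout before snapping (5 bytes): CD 3F lo hi xx
     CD 3F : int 0x3f, the overlay manager trap
     lo hi : 16-bit offset of the routine within its overlay segment
     xx    : spare byte, so that a 5-byte far jump fits in the entry
   After snapping:                                  EA lo hi sl sh
   Returns 0 on success, -1 if the entry is not an unsnapped trap. */
int snap_stub(uint8_t entry[5], uint16_t segment)
{
    if (entry[0] != 0xCD || entry[1] != 0x3F)
        return -1;                          /* already snapped, or not a stub */

    /* Read the offset first: it moves from bytes 2-3 to bytes 1-2. */
    uint16_t offset = (uint16_t)(entry[2] | (entry[3] << 8));

    entry[0] = 0xEA;                        /* jmp far ptr16:16 */
    entry[1] = (uint8_t)(offset & 0xFF);
    entry[2] = (uint8_t)(offset >> 8);
    entry[3] = (uint8_t)(segment & 0xFF);
    entry[4] = (uint8_t)(segment >> 8);
    return 0;
}
```

Once snapped, a caller's far call into the table lands on a direct far jump, so the overlay manager is no longer involved for as long as the overlay stays resident.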
There is a single data segment, located at offset 0x5fff0, for global variables. There is a single stack segment, which is allocated by DOS at startup. Static constants are compiled into the code segments, immediately preceding the procedures in which they are used. Local variables use the per-procedure stack frame (BP-based, in assembly-speak).
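For readers less used to real mode, offsets such as 0x5fff0 follow the usual segment arithmetic: a segment value names a 16-byte paragraph. A small sketch, assuming segment values are counted relative to the first code segment (the function name linear_offset() is illustrative only):

```c
#include <stdint.h>

/* Real-mode address arithmetic: segment:offset maps to segment*16 + offset.
   'segment' is taken relative to the first code segment, matching the
   convention that all offsets in this document start there. */
uint32_t linear_offset(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So the data segment corresponds to segment 0x5fff, offset 0; note that many different segment:offset pairs (for example 0x5ffe:0x0010) name the same location.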
During the initialisation phase of program execution, PS1 allocates memory for the audio buffer, track plotter buffer, and the graphics cut-buffer (used for backing up small areas of the video screen). The overlay manager also allocates memory for loading overlay segments. The VGA 640x480x16 video mode is set using the standard BGI (Borland Graphics Interface) InitGraph routine.
Next, the floating point emulation module is initialised. This involves pointing the vectors of software interrupts 0x34 to 0x3e at the FP emulator's handler. Floating point operations in PS1 are compiled as emulator traps that use this set of interrupts. Except for int 0x3e, these traps are 'snapped' into actual x87 FPU opcodes at runtime by the emulator if a hardware FPU is present (as it is on the 486DX and on every x86 processor since the Pentium), in a manner similar to the overlay stubs discussed above. There is thus no performance penalty when executing FPU code after the first pass.
The 747PS13.CFG file is then parsed, and internal variables for the soundcard, joystick, airline configuration and initial situation files are set. If 747PS13.CFG is not found, the user is asked a series of questions about soundcard and joystick settings, and the file is then created, with default values filled in for the other variables.
The 'main()' procedure is then called; it contains the main program loop, which handles all simulator logic. The main program loop comprises an outer 'fast' section, whose frequency is not regulated (it typically runs 30,000 to 70,000 times a second), and an inner 'slow' section, whose frequency is held at 18.2 Hz by reference to the system timestamp returned by the GetTime function. The 'slow' section is further divided into subsections that run on even timer ticks, odd timer ticks, every 4 ticks, etc.
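The loop structure can be pictured roughly as follows. This is a sketch only: PS1's source is not available, so the struct, the function name run_loop() and the way the tick number is obtained (passed in as an array here, standing in for GetTime) are all assumptions. It shows just the dispatch pattern: an unregulated outer section, and a tick-driven inner section with even/odd and every-fourth-tick subsections.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters standing in for the real per-section work. */
struct loop_stats {
    unsigned fast;      /* every pass: keyboard/mouse polling only */
    unsigned slow;      /* once per timer tick (18.2 Hz) */
    unsigned even, odd; /* alternate ticks */
    unsigned every4;    /* every fourth tick */
};

/* now_tick[i] is the timer tick observed on pass i of the outer loop. */
void run_loop(const uint32_t *now_tick, size_t passes, struct loop_stats *st)
{
    if (passes == 0)
        return;
    uint32_t last_tick = now_tick[0];      /* no slow work on the first pass */

    for (size_t i = 0; i < passes; i++) {
        st->fast++;                        /* 'fast' section: unregulated */
        if (now_tick[i] == last_tick)
            continue;                      /* no new tick yet */
        last_tick = now_tick[i];

        st->slow++;                        /* 'slow' section */
        if (last_tick % 2 == 0)
            st->even++;                    /* even-tick subsection */
        else
            st->odd++;                     /* odd-tick subsection */
        if (last_tick % 4 == 0)
            st->every4++;                  /* every-4-ticks subsection */
    }
}
```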
[The rates of the program loop are shown within PS1 on the "SIMULATOR INTERNAL" CDU page under "CPU PERF". The first figure is the rate of the 'fast' section (internal variable Frame), and the second is the rate of the 'slow' section (internal variable Frame2). For optimal sim performance Frame2 should read 18, rising to 19 once every 5 seconds, because the extra 0.2 tick per second adds up to a whole tick (0.2 x 5 = 1). Frame is effectively a measure of spare CPU cycles, as no meaningful logic is processed in this loop other than reading keyboard and mouse events.]
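The 18/19 pattern follows from the tick rate alone. A small check, assuming a nominal rate of exactly 18.2 Hz (the true PC timer rate is about 18.2065 Hz, so over long runs the real pattern drifts very slightly); ticks_in_second() is an illustrative name. Over five seconds there are 18.2 x 5 = 91 ticks, i.e. four seconds of 18 and one of 19.

```c
/* Number of 18.2 Hz timer ticks that complete during second s (s = 0, 1, 2, ...).
   Integer arithmetic avoids rounding trouble: floor(18.2 * t) == (182 * t) / 10
   for whole seconds t. */
unsigned ticks_in_second(unsigned s)
{
    return (182u * (s + 1)) / 10 - (182u * s) / 10;
}
```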
Before going on to describe the execution environment of Open744, I will explain its two-process design and the inter-process communication (IPC) between the processes.
Very early after starting, Open744 allocates an area of shared memory using mmap(), then forks into two processes. The parent process, proc_PS1, is the systems engine which executes the PS1 code. The child process, proc_SDL, is concerned with the user interface -- video, audio, keyboard, mouse, joystick and other devices. proc_SDL, as its name implies, loads the SDL library and sets up an event handling loop. Input from devices is immediately placed in the shared memory buffer (ipc_mapbuf), where proc_PS1 can read it during its 'fast' loop.
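A minimal sketch of the shared-memory-then-fork pattern, using only POSIX calls (mmap(), fork(), waitpid()). The struct, its field and the function name are hypothetical; the real ipc_mapbuf layout is defined in the Open744 source and is much larger. Because the mapping is MAP_SHARED and is created before fork(), both processes see the same physical pages.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fragment of the shared buffer. */
struct shared_area {
    volatile int32_t last_key;     /* written by proc_SDL, read by proc_PS1 */
};

/* Returns the key value the parent sees, or -1 on error. */
int32_t shared_memory_demo(int32_t key)
{
    struct shared_area *buf = mmap(NULL, sizeof *buf, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf->last_key = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, sizeof *buf);
        return -1;
    }
    if (pid == 0) {                /* child: plays the role of proc_SDL */
        buf->last_key = key;       /* an input event lands in shared memory */
        _exit(0);
    }

    /* Parent: plays the role of proc_PS1. The real program polls the buffer
       in its fast loop; here we simply wait for the child to finish. */
    waitpid(pid, NULL, 0);
    int32_t seen = buf->last_key;
    munmap(buf, sizeof *buf);
    return seen;
}
```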
proc_SDL receives BGI drawing commands from proc_PS1 through an unnamed pipe created prior to the fork. [It is a simple matter to replace the pipe with a socket, which would allow the video output to be sent to a remote machine on the network.] The use of a pipe keeps the two processes loosely synchronised without extra code, since a write to a full pipe blocks the writer until the reader catches up. Using the shared memory buffer for this purpose would require some form of mutex or semaphore to synchronise the two processes.
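The pipe arrangement can be sketched as below. The command struct and the function name are hypothetical (Open744's own command format is not described here); the point is that the writer needs no explicit synchronisation, because write() blocks when the pipe is full and read() blocks when it is empty.

```c
#include <stdint.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size drawing command. */
struct bgi_cmd {
    uint8_t op;                    /* e.g. line, bar, outtext ... */
    uint8_t color;
    int16_t x1, y1, x2, y2;
};

/* Parent (proc_PS1 role) writes n commands; child (proc_SDL role) reads them
   until EOF and reports the count through its exit status.
   Returns that count, or -1 on error. n must be below 256. */
int pipe_demo(int n)
{
    int fd[2];
    if (pipe(fd) != 0)             /* created before the fork, as in Open744 */
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                        /* reader */
        close(fd[1]);
        struct bgi_cmd c;
        int count = 0;
        /* read() blocks while the pipe is empty and returns 0 at EOF. */
        while (read(fd[0], &c, sizeof c) == (ssize_t)sizeof c)
            count++;
        _exit(count);
    }

    close(fd[0]);                          /* writer */
    for (int i = 0; i < n; i++) {
        struct bgi_cmd c = { 1, 15, 0, 0, (int16_t)i, (int16_t)i };
        /* write() blocks if the pipe is full: this is the flow control. */
        if (write(fd[1], &c, sizeof c) != (ssize_t)sizeof c)
            break;
    }
    close(fd[1]);                          /* reader now sees EOF */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Fixed-size records keep the reader simple: a write() of fewer than PIPE_BUF bytes is atomic, so a command is never interleaved with another.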
proc_SDL receives audio commands from proc_PS1 through the shared memory buffer (at MAP_AUDIOCMD). Because audio events are relatively infrequent (no more than 4 or 5 per second) and the chance of the buffer being full is almost zero, the shared memory buffer is adequate here. [XXX edit later: Care needs to be taken re: sequencing of sound bites in ATC output.]
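A sketch of a single-slot audio mailbox, as one way such a shared-memory command area could work. The source does not say how the slot at MAP_AUDIOCMD is guarded; this version is an assumption that uses C11 atomics, and the struct and function names are hypothetical. A value of 0 means "empty".

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox standing in for the area at MAP_AUDIOCMD.
   0 means "empty"; any other value is a pending sound command. */
struct audio_mailbox {
    _Atomic uint16_t cmd;
};

/* proc_PS1 side: returns 1 if posted, 0 if the previous command has not
   been collected yet (the rare "full buffer" case). */
int post_audio(struct audio_mailbox *mb, uint16_t cmd)
{
    uint16_t expected = 0;
    return atomic_compare_exchange_strong(&mb->cmd, &expected, cmd);
}

/* proc_SDL side: returns the pending command and empties the slot,
   or 0 if there is nothing to play. */
uint16_t take_audio(struct audio_mailbox *mb)
{
    return atomic_exchange(&mb->cmd, 0);
}
```

The compare-and-exchange means a new command can never overwrite one that proc_SDL has not yet collected. Atomics like these work across processes only when they are lock-free, which 16-bit atomics are on x86.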
[Future feature: proc_SDL will also handle network connections. It will peek and poke data (a la 747IPC) in the shared memory buffer (which is mapped to PS1's data segment).]
I now describe how Open744, through proc_PS1, provides the execution environment that PS1 expects, and explain why PS1 runs more efficiently here than in the original DOS environment.
Linux/FreeBSD/Windows programs run in x86 protected mode. Whether a piece of code or data is 16-bit or 32-bit is determined by the descriptor of its memory segment. Native programs use 32-bit segments, but on all these platforms it is possible to craft descriptors for 16-bit code and data segments. A descriptor specifies the base address at which a segment begins and its limit, which is the highest valid offset within the segment (the segment size minus one). Segments are referenced using a selector, which is essentially an index into a table of descriptors (the GDT, or the process's LDT).
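To make the descriptor/selector relationship concrete, here is a sketch that packs the standard 8-byte x86 descriptor format and builds a selector. It illustrates the data structure only: user programs do not write descriptor tables directly but ask the kernel to install entries (for example via modify_ldt(2) on Linux or i386_set_ldt(2) on FreeBSD), and the function names here are illustrative, not Open744's.

```c
#include <stdint.h>

/* Pack an 8-byte x86 segment descriptor (the format held in the GDT/LDT).
   access: P, DPL, S and type bits, e.g.
           0xFA = present, DPL 3, code, execute/read
           0xF2 = present, DPL 3, data, read/write
   flags:  nibble placed in the top of byte 6: G = 0x8, D/B = 0x4.
           Leaving D/B clear gives a 16-bit segment, as PS1's code needs. */
void pack_descriptor(uint8_t d[8], uint32_t base, uint32_t limit,
                     uint8_t access, uint8_t flags)
{
    d[0] = (uint8_t)(limit & 0xFF);                 /* limit 7:0 */
    d[1] = (uint8_t)((limit >> 8) & 0xFF);          /* limit 15:8 */
    d[2] = (uint8_t)(base & 0xFF);                  /* base 7:0 */
    d[3] = (uint8_t)((base >> 8) & 0xFF);           /* base 15:8 */
    d[4] = (uint8_t)((base >> 16) & 0xFF);          /* base 23:16 */
    d[5] = access;
    d[6] = (uint8_t)(((flags & 0xF) << 4) | ((limit >> 16) & 0xF));
    d[7] = (uint8_t)((base >> 24) & 0xFF);          /* base 31:24 */
}

/* A selector is the descriptor's table index shifted left by 3, plus the
   table indicator (bit 2: 1 = LDT) and the requested privilege level. */
uint16_t make_selector(uint16_t index, int in_ldt, uint8_t rpl)
{
    return (uint16_t)((index << 3) | (in_ldt ? 4 : 0) | (rpl & 3));
}
```

For a 64 KB segment the limit is 0xFFFF, the highest valid offset.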
Open744 first reads PS.EXE and PS.OVR into memory, then allocates descriptors for each of the 11 segments in PS.EXE and the 15 segments in PS.OVR. Descriptors are also allocated for the global data segment, the stack segment, the system segment (holding the Pascal library routines that can be called unchanged) and the thunk segment (see below). The data segment, unlike the others, resides in the mmap()'d shared memory area (ipc_mapbuf), which is accessible to proc_SDL.
We have thus done away completely with code overlays, a mechanism for swapping code in and out of memory that was a legacy of DOS memory limitations. All code now remains in memory and is accessed efficiently and directly by far calls. Further efficiency could in principle be gained by doing away with segments altogether, so that all calls are near; however, this is impossible with 16-bit segments, which are limited to 64 KB, while PS1's code is far larger than that.
After all segments are loaded into memory, Open744 performs an FPU un-emulation pass over a specified list of functions (see funcs.c). This involves translating every int 0x34 to 0x3d software trap (together with its trailing 1 or 2 bytes) into native x87 FPU opcodes (see fpu_conv()). int 0x3e is the Borland 'shortcut call'; it is not used directly in PS1 code, but it is used by the Pascal FP math library, for which I have written native routines (sin, cos, ln, exp).
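A sketch of converting a single emulator trap in place. It assumes PS1 follows the emulator convention shared by Borland and Microsoft DOS compilers (int 0x34 to 0x3b stand in for x87 escape opcodes 0xd8 to 0xdf, int 0x3d for a bare FWAIT). The segment-override form (int 0x3c) is deliberately left out, as is the job of locating traps by walking each listed function's instructions. convert_fpu_trap() is a hypothetical name, not the real fpu_conv().

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one emulator trap at p[0..1] into native x87 code, in place.
     int 0x34..0x3b  ->  fwait; ESC opcode 0xd8..0xdf   (9B D8..DF)
     int 0x3d        ->  nop; fwait                     (90 9B)
   The ModR/M and displacement bytes that follow the trap are already in
   native form and are left untouched. int 0x3c (segment-override form)
   and int 0x3e (shortcut calls) are not handled by this sketch.
   Returns 0 if converted, -1 if p does not hold a convertible trap. */
int convert_fpu_trap(uint8_t *p, size_t avail)
{
    if (avail < 2 || p[0] != 0xCD)          /* 0xCD = int imm8 */
        return -1;

    uint8_t vec = p[1];
    if (vec >= 0x34 && vec <= 0x3B) {
        p[0] = 0x9B;                            /* fwait */
        p[1] = (uint8_t)(0xD8 + (vec - 0x34));  /* ESC 0..7 */
        return 0;
    }
    if (vec == 0x3D) {
        p[0] = 0x90;                            /* nop */
        p[1] = 0x9B;                            /* fwait */
        return 0;
    }
    return -1;
}
```

Because every replacement is exactly as long as the trap it replaces, no code moves and no jump targets change.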
FPU un-emulation gives only a very slight performance improvement over the original PS1 code, and only on the first execution of each FPU instruction, since the emulator no longer has to trap and snap it.
The next task is that of segment fixups. Every reference in the original code to a memory segment must be converted from a real-mode segment value to a protected-mode selector. For PS.EXE, there is a large table near the beginning of the file which contains segment:offset vectors to the locations in PS.EXE code that require fixup. For each overlay in PS.OVR, the list of fixups is located after the code and contains only offset vectors (no segment) to the locations within that overlay that need fixing up.
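A minimal sketch of the fixup pass for PS.EXE, assuming the standard MZ relocation table format: each entry is an offset word followed by a segment word, locating a 16-bit word in the load image that holds a segment value. The seg_map table and all function names are hypothetical, and the sketch covers only the plain segment-to-selector case, not fixups that are redirected to replacement routines.

```c
#include <stddef.h>
#include <stdint.h>

/* One MZ relocation entry: offset word, then segment word. */
struct mz_reloc {
    uint16_t offset;
    uint16_t segment;
};

/* Hypothetical mapping from a real-mode segment value (relative to the
   image start) to the protected-mode selector allocated for it. */
struct seg_map {
    uint16_t segment;
    uint16_t selector;
};

static int lookup_selector(const struct seg_map *map, size_t nmap,
                           uint16_t segment, uint16_t *selector)
{
    for (size_t i = 0; i < nmap; i++)
        if (map[i].segment == segment) {
            *selector = map[i].selector;
            return 0;
        }
    return -1;
}

/* Rewrite every fixup location in 'image' so that it holds a selector
   instead of a segment value. Returns the number of unresolved fixups. */
size_t apply_fixups(uint8_t *image, size_t image_len,
                    const struct mz_reloc *rel, size_t nrel,
                    const struct seg_map *map, size_t nmap)
{
    size_t failed = 0;
    for (size_t i = 0; i < nrel; i++) {
        /* Location of the word to patch: segment*16 + offset. */
        uint32_t at = ((uint32_t)rel[i].segment << 4) + rel[i].offset;
        if (at + 2 > image_len) { failed++; continue; }

        uint16_t seg = (uint16_t)(image[at] | (image[at + 1] << 8));
        uint16_t sel;
        if (lookup_selector(map, nmap, seg, &sel) != 0) { failed++; continue; }

        image[at]     = (uint8_t)(sel & 0xFF);
        image[at + 1] = (uint8_t)(sel >> 8);
    }
    return failed;
}
```

A DOS loader does the same walk but simply adds the load segment to each word; here each word is replaced by a selector instead.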
Many of the fixups are actually far calls to Pascal library routines, some of which are safe to use in protected mode because they involve no 'segment arithmetic', DOS calls or direct hardware access. For these, a simple segment fixup (with the 'system' selector) is enough, and the original Pascal routine is called. Many other Pascal library routines obviously will not work in a protected mode environment; calls to these must be redirected to routines that we write ourselves, in C of course. Because PS1 code runs in a 16-bit segment and our routines reside in a 32-bit segment, execution must pass through a thunk: a small piece of 16-bit interface code that switches the stack, copies as many arguments as the routine needs from the 16-bit stack to the 32-bit stack, then performs a far call to the 32-bit routine. Thunk routines are written in assembly language (see thunk.s). Upon return from the 32-bit routine, the thunk switches back to the 16-bit stack, clears the arguments off the stack, then returns to the PS1 caller, preserving the routine's return value in register AX.
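The part of a thunk that is easiest to get wrong is the argument order. The sketch below (in C for readability; the real thunks in thunk.s are assembly) shows where a thunk would find each argument, assuming the Turbo Pascal far calling convention (arguments pushed left to right, callee pops them with retf n) and, for simplicity, that every argument is a single 16-bit word. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the i-th (0-based, declaration order, i < nargs) 16-bit argument of
   a Pascal far call, given a pointer to the 16-bit stack at SS:SP on entry
   to the callee. Layout at entry:
     sp[0..1]  return offset
     sp[2..3]  return segment
     sp[4..]   arguments; Pascal pushes left to right, so the LAST argument
               is nearest the return address and the first is deepest. */
uint16_t pascal_arg16(const uint8_t *sp, size_t nargs, size_t i)
{
    size_t pos = 4 + 2 * (nargs - 1 - i);
    return (uint16_t)(sp[pos] | (sp[pos + 1] << 8));
}

/* The callee removes its own arguments (retf imm16), so a thunk returning
   to PS1 must pop this many bytes in addition to the far return address. */
uint16_t pascal_cleanup_bytes(size_t nargs)
{
    return (uint16_t)(2 * nargs);
}
```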
[Segment fixups are called 'relocs' in xwinps1.c because, in DOS, the segments were originally 'relocated' by simply adding a fixed value (the load segment) to each segment reference -- segment arithmetic.]
The basic framework for executing PS1 is now in place. In order to extend the functionality of PS1 beyond its original design, we need to lay traps, or hooks, in PS1 code at strategic locations, so that execution is diverted into our new code and we can alter the outcome as we wish. Setting traps requires knowledge of the PS1 source code, which is not available to us. You are, however, free to disassemble PS.EXE and PS.OVR and explore their contents. Doing so will let you find suitable locations for laying traps that do not otherwise disturb program flow (e.g. by messing up jump destinations or corrupting the next instruction).
Because a trap is a far call that takes 5 bytes, we generally look for pre-existing far calls that we can hijack. This has the added convenience that, when the trap routine has done its work, execution can simply continue at the original far call's target address. It is also possible to set a trap over any sequence of instructions totalling 5 or more bytes (extra bytes being padded with 'nop'), provided that your routine eventually performs the operations that were obliterated by your trap, and that no jump lands inside the overwritten bytes.
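Both cases come down to a few bytes. A sketch with hypothetical helper names (Open744's own code is in set_traps()): read_far_call() recovers the original target of an existing far call (opcode 0x9A) before it is hijacked, and write_trap() lays a 5-byte far call to the trap's selector:offset, padding any leftover bytes with nop (0x90).

```c
#include <stddef.h>
#include <stdint.h>

#define OP_CALL_FAR 0x9A   /* call ptr16:16 */
#define OP_NOP      0x90

/* If p holds a far call, return its target through *off and *seg, so that
   the trap routine can later continue at the original destination.
   Returns 0 on success, -1 if p is not a far call. */
int read_far_call(const uint8_t *p, uint16_t *off, uint16_t *seg)
{
    if (p[0] != OP_CALL_FAR)
        return -1;
    *off = (uint16_t)(p[1] | (p[2] << 8));
    *seg = (uint16_t)(p[3] | (p[4] << 8));
    return 0;
}

/* Overwrite 'len' bytes (len >= 5) at p with a far call to sel:off,
   padding the remainder with nops. The caller must make sure the trap
   routine performs whatever the overwritten bytes did. */
int write_trap(uint8_t *p, size_t len, uint16_t sel, uint16_t off)
{
    if (len < 5)
        return -1;
    p[0] = OP_CALL_FAR;
    p[1] = (uint8_t)(off & 0xFF);
    p[2] = (uint8_t)(off >> 8);
    p[3] = (uint8_t)(sel & 0xFF);
    p[4] = (uint8_t)(sel >> 8);
    for (size_t i = 5; i < len; i++)
        p[i] = OP_NOP;
    return 0;
}
```

When a trap is laid over ordinary instructions, the far call's return address points just past its 5 bytes, so on return any nop padding executes and falls through to the next original instruction.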
Byte-patching is a simpler way of altering PS1 code, usually to bypass a function completely by patching in a 'retf', or to alter single byte values.
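One detail worth keeping in mind when bypassing a procedure this way: Turbo Pascal procedures remove their own arguments from the stack, so a plain 'retf' (0xCB) is only safe for a procedure without parameters; one that takes n bytes of parameters needs 'retf n' (0xCA followed by n as a word). A sketch, with the hypothetical name patch_return() (the real patches are in patch_code()):

```c
#include <stddef.h>
#include <stdint.h>

/* Make a far Pascal procedure return immediately, at its entry point.
   arg_bytes is the total size of its parameters on the stack.
   Returns the number of bytes written, or 0 if there is not enough room. */
size_t patch_return(uint8_t *proc, size_t room, uint16_t arg_bytes)
{
    if (arg_bytes == 0) {
        if (room < 1)
            return 0;
        proc[0] = 0xCB;                         /* retf */
        return 1;
    }
    if (room < 3)
        return 0;
    proc[0] = 0xCA;                             /* retf imm16 */
    proc[1] = (uint8_t)(arg_bytes & 0xFF);
    proc[2] = (uint8_t)(arg_bytes >> 8);
    return 3;
}
```

If the bypassed routine is a function rather than a procedure, whatever happens to be in AX becomes its result, so the patch may also need to load a suitable value first.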
[See set_traps() for examples of trap-setting, and patch_code() for examples of byte-patching.]
In the final step of Open744 startup, the 32-bit segment registers are saved for later use by the thunks. Open744 is now fully set up and ready to jump into PS1's entry point.