The MZ executable image is a DOS program format that has existed since PC-DOS 1.0 was released. A PC-DOS .EXE program always starts with the MZ or ZM signature (two ASCII characters stored as one 16-bit WORD). This signature is the main marker for image detection.
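The signature check described above can be sketched as follows. This is a minimal illustration, not the detection code of any particular tool; the helper name `has_mz_signature` and the constant names are assumptions made for this example.

```rust
/// The two accepted signatures, read as a little-endian 16-bit WORD.
const SIG_MZ: u16 = 0x5A4D; // bytes 'M' (0x4D), 'Z' (0x5A)
const SIG_ZM: u16 = 0x4D5A; // bytes 'Z' (0x5A), 'M' (0x4D)

/// Returns true if `image` starts with the MZ or ZM signature.
/// `has_mz_signature` is a hypothetical helper name, not part of any API.
fn has_mz_signature(image: &[u8]) -> bool {
    match image {
        [lo, hi, ..] => {
            let sign = u16::from_le_bytes([*lo, *hi]);
            sign == SIG_MZ || sign == SIG_ZM
        }
        // Fewer than two bytes: this cannot be an MZ image.
        _ => false,
    }
}
```

Comparing the WORD rather than the two characters mirrors how the `e_sign` field is declared in the header structure.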

Every DOS program stored as an .EXE binary has two parts that are critical for the OS loader.

MZ Header

Here is the structure, written in Rust as an example.

struct MzHeader{
    // Standard "DOS Header"
    pub e_sign: Lu16,       // Signature: ZM or MZ
    pub e_cblp: Lu16,       // Bytes used in the last page  (*)
    pub e_cp: Lu16,         // Count of 512-byte pages      (*)
    pub e_relc: Lu16,       // Count of relocations         (*)
    pub e_cparhdr: Lu16,    // Header size (16-byte paragraphs) (*)
    pub e_minep: Lu16,      // Minimum extra RAM in paragraphs
    pub e_maxep: Lu16,      // Maximum extra RAM in paragraphs
    pub ss: Lu16,           // (I8086) Stack Segment        (*)
    pub sp: Lu16,           // (I8086) Stack Pointer        (*)
    pub e_check: Lu16,      // Checksum of image [ignores]
    pub ip: Lu16,           // (I8086) Instruction Pointer  (*)
    pub cs: Lu16,           // (I8086) Code Segment         (*)
    pub e_lfarlc: Lu16,     // File offset of the relocation table

    // "Extended DOS Header" or "MZ-Header"
    pub e_ovno: Lu16,        // Current overlay's number [ignores]
    pub e_res0x1c: [Lu16; 4],// Reserved [ignores]
    pub e_oemid: Lu16,       // [ignores]
    pub e_oeminfo: Lu16,     // [ignores]
    pub e_res_0x28: [Lu16; 10], // Reserved [ignores]
    pub e_lfanew: Lu32,         // Non-zero in applications for MS-DOS 4.+
}

“Pure” DOS images keep the e_lfanew field at zero. Sunflower relies on this to classify an executable image as a true MS-DOS image.
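A check of this kind might look like the sketch below. It is not Sunflower's actual code: the function names are hypothetical, and treating an image that is too short to contain e_lfanew as a pure DOS image is an assumption made for this example. The offset 0x3C follows from the field order of the structure above.

```rust
/// File offset of `e_lfanew` inside the MZ header (last field, 4 bytes).
const E_LFANEW_OFFSET: usize = 0x3C;

/// Reads `e_lfanew` from a raw image, or returns None if the image is
/// too short to contain the extended header.
fn read_e_lfanew(image: &[u8]) -> Option<u32> {
    let bytes = image.get(E_LFANEW_OFFSET..E_LFANEW_OFFSET + 4)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Hypothetical classification helper: e_lfanew == 0 means a pure DOS image.
/// Assumption: an image too short to hold the field also counts as pure DOS.
fn is_pure_dos_image(image: &[u8]) -> bool {
    read_e_lfanew(image).map_or(true, |value| value == 0)
}
```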

The DOS loader does not use the (*)-marked values exactly as they are stored in the data structure! MS-DOS takes those values and computes the actual system information about the program from them.

All [ignores]-marked values are ignored by the PC-DOS and MS-DOS loaders.

Overlays

Overlays for DOS programs also have an MZ header, but the fields that the loader ignores, as well as the e_lfanew field, are zero.

The “Perfect Binary” concept means a binary image that holds correct values in the optional fields (the ones the loader ignores).

If the e_ovno field has a non-zero value in the main .EXE part, this may indicate a malware-infected or deliberately corrupted image. (???)

So, how does DOS define overlays and manipulate them?

          ...
          ...
          ...
          ; Allocate overlay memory
          mov     bx,1000h        ; 64 KB (4096 paragraphs)
          mov     ah,48h          ; 48h = allocate block
          int     21h             
          jc      __error         ; if allocation error
 
          mov     pars,ax         ; load segment address
          mov     pars+2,ax       ; relocation factor
 
                                  ; entry point segment
          mov     word ptr entry+2,ax
 
          mov     stkseg,ss
          mov     stkptr,sp
 
          mov     ax,ds           ; ES = DS
          mov     es,ax
 
          mov     dx,offset oname ; DS:DX = filename
          mov     bx,offset pars  ; ES:BX = parameter block
          
          ; Special DOS API call here
          mov     ax,4b03h        ; <-- EXEC 0x03
          int     21h
 
          mov     ax,_DATA        ; new "_DATA segment"
          mov     ds,ax
          mov     es,ax
 
          cli
          mov     ss,stkseg       ; restore .STACK
          mov     sp,stkptr
          sti
 
          jc      __error
 
                                  ; otherwise, no errors
          push    ds              ; save data segment
          
          ; Calling overlay
          ; call of overlay's API always FAR
          call    dword ptr entry
          pop     ds              ; restore .DATA
          ...
          ...
          ...
  oname   db      'OVERLAY.OVL',0 ; filename
 
  pars    dw      0               ; load segment address
          dw      0               ; relocation factor
 
  entry   dd      0               ; overlay's API (entry point)
 
  stkseg  dw      0               ; save Stack Segment
  stkptr  dw      0               ; save Stack Pointer

MS-DOS provides the EXEC function for loading programs into memory. It is function 0x4B of INT 21h (AH=0x4B). If the BYTE parameter for EXEC is zero (AL=0), the function loads an independent program module instead of an overlay.
If the BYTE parameter for EXEC is 3 (AL=0x3), the function loads an overlay. An overlay is not a program! It is a part of a program module, which is why MS-DOS skips creating a PSP segment for it.
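To make the register and parameter layout concrete, here is a small Rust model of the two EXEC modes and of the overlay parameter block (the `pars` data in the listing). The type and function names are invented for this sketch; only the numeric values come from the text and the assembly above.

```rust
/// EXEC (INT 21h, AH=0x4B) subfunction codes mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecMode {
    LoadAndExecute = 0x00, // AL=0: independent program module
    LoadOverlay = 0x03,    // AL=3: overlay, no PSP is created
}

/// Value to place in AX before `int 21h`: AH = 0x4B, AL = subfunction.
fn exec_ax(mode: ExecMode) -> u16 {
    (0x4B << 8) | mode as u16
}

/// Parameter block for AL=3: two little-endian WORDs, as in `pars`.
struct OverlayParams {
    load_segment: u16,      // segment where the overlay is loaded
    relocation_factor: u16, // value added to relocated segment references
}

impl OverlayParams {
    /// Serializes the block exactly as it would sit in memory at ES:BX.
    fn to_bytes(&self) -> [u8; 4] {
        let s = self.load_segment.to_le_bytes();
        let r = self.relocation_factor.to_le_bytes();
        [s[0], s[1], r[0], r[1]]
    }
}
```

In the assembly listing both WORDs receive the same allocated segment, so the overlay is loaded and relocated relative to that segment.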

MZ Relocation Table

The e_relc field holds the number of relocations; it is the key value for reading the relocations of a DOS image correctly.

Relocations use the 16:16 format, better known as the FAR pointer format.

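use std::io::{self, Read, Seek, SeekFrom};

struct MZRelocation {
    pub r_offset: u16,
    pub r_segment: u16,
}

/// Rust sketch for reading relocations:
/// reads `e_relc` entries starting at file offset `e_lfarlc`.
fn read_relocs<R: Read + Seek>(
    reader: &mut R,
    e_lfarlc: u16,
    e_relc: u16,
) -> io::Result<Vec<MZRelocation>> {
    let mut rel_vec = Vec::with_capacity(e_relc as usize);

    // No relocation table: nothing to read, so do not seek at all.
    if e_lfarlc == 0 {
        return Ok(rel_vec);
    }
    reader.seek(SeekFrom::Start(e_lfarlc as u64))?;

    for _ in 0..e_relc {
        // Each entry is 4 bytes: little-endian offset, then segment.
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        rel_vec.push(MZRelocation {
            r_offset: u16::from_le_bytes([buf[0], buf[1]]),
            r_segment: u16::from_le_bytes([buf[2], buf[3]]),
        });
    }

    Ok(rel_vec)
}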

In reality, the MZRelocation data structure is only a convenient view of the relocations in the image. PC-DOS and MS-DOS treat each relocation as a raw 16:16 value (r_segoff: u32).
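The two views can be converted into each other. The sketch below assumes the raw value is read as a little-endian u32, so the offset is the low WORD and the segment is the high WORD; the method names are illustrative only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MZRelocation {
    pub r_offset: u16,
    pub r_segment: u16,
}

impl MZRelocation {
    /// Splits a raw 16:16 value into its segment and offset halves.
    fn from_raw(r_segoff: u32) -> Self {
        MZRelocation {
            r_offset: (r_segoff & 0xFFFF) as u16,
            r_segment: (r_segoff >> 16) as u16,
        }
    }

    /// Packs the structure back into the raw 16:16 value.
    fn to_raw(&self) -> u32 {
        ((self.r_segment as u32) << 16) | self.r_offset as u32
    }

    /// Real-mode linear address of the pair: segment * 16 + offset.
    fn linear(&self) -> u32 {
        self.r_segment as u32 * 16 + self.r_offset as u32
    }
}
```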

Sunflower's PC/MS-DOS MZ plugin uses the data-structure view to show the raw relocations as stored in the file, before DOS has processed them.

Loading process

DOS has a special memory area for loading transient programs. Transient programs are user applications.

+----------------------------+ <--Transient segment
| Upper Memory Block / Video |      |
+----------------------------+      | Used by user applications
|                            |      | and programs which the system
|   Program Memory           |      | does not need when it boots.
|                            |      | 
|                            | <----+
+----------------------------+ <-Resident segment
| (COMMAND.COM)              |   |
+----------------------------+   | 
| MS-DOS (MSDOS.SYS)         |   | Used by system parts which must load
+----------------------------+   | when the PC starts
| BIOS (IO.SYS)              |   |
+----------------------------+   |-0x0400
| Interrupt vectors          |   |
+----------------------------+ <-Begin

Programs that need more than one segment for code or data start with the ASCII string MZ (not null-terminated). While a program is loading, the operating system sets up the I8086 registers based on the MZ header.
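As an illustration of that register setup, the sketch below derives the initial register values from the header fields. It assumes the usual rule that the segment values stored in the header (cs, ss) are relative to the segment where the image was loaded, while ip and sp are used as stored. `InitialRegs`, `initial_regs` and `base_segment` are names introduced for this example.

```rust
/// Register values the loader hands to the program at entry.
#[derive(Debug, PartialEq, Eq)]
struct InitialRegs {
    cs: u16,
    ip: u16,
    ss: u16,
    sp: u16,
}

/// Computes the entry registers from the header fields.
/// `base_segment` (hypothetical name) is the segment where the image
/// was loaded; header segments are relative to it.
fn initial_regs(
    base_segment: u16,
    hdr_cs: u16,
    hdr_ip: u16,
    hdr_ss: u16,
    hdr_sp: u16,
) -> InitialRegs {
    InitialRegs {
        // 16-bit segment arithmetic wraps around, as on the I8086.
        cs: base_segment.wrapping_add(hdr_cs),
        ip: hdr_ip,
        ss: base_segment.wrapping_add(hdr_ss),
        sp: hdr_sp,
    }
}
```

This is one reason the (*)-marked header values cannot be copied into registers verbatim: the segment parts only become meaningful once the load address is known.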

First, DOS computes the basic parameters needed for further memory management:

  • image’s size (image_size);
  • image’s base (image_base);
  • expected (by program) CPU register values;
  • relocations (rel_vec).

// Sizes are in bytes; widen to u32 so that e_cp * 512 cannot overflow u16.
let image_size = if e_cblp == 0 {
    e_cp as u32 * 512
} else {
    (e_cp as u32 - 1) * 512 + e_cblp as u32
};

let image_base = psp_offset + e_cparhdr + 0x10;
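The size formula can be wrapped in a small function and checked with concrete numbers. This is a sketch with invented function names; the saturating subtraction for a malformed header with e_cp = 0 is an assumption of this example, not documented DOS behaviour.

```rust
const PAGE_SIZE: u32 = 512; // bytes per page counted by e_cp
const PARAGRAPH_SIZE: u32 = 16; // bytes per paragraph counted by e_cparhdr

/// Image size in bytes, following the formula above.
fn image_size(e_cp: u16, e_cblp: u16) -> u32 {
    if e_cblp == 0 {
        // The last page is completely used.
        e_cp as u32 * PAGE_SIZE
    } else {
        // The last page holds only e_cblp bytes.
        // Assumption: e_cp == 0 is clamped instead of underflowing.
        (e_cp as u32).saturating_sub(1) * PAGE_SIZE + e_cblp as u32
    }
}

/// Header size in bytes, from the paragraph count in e_cparhdr.
fn header_size(e_cparhdr: u16) -> u32 {
    e_cparhdr as u32 * PARAGRAPH_SIZE
}
```

For example, e_cp = 3 and e_cblp = 0x50 give (3 - 1) * 512 + 80 = 1104 bytes, while e_cp = 3 and e_cblp = 0 give 1536 bytes.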

DOS never uses the raw values from the MZ relocation table as they are! Taken directly, those values point to unknown or zeroed locations, because PC-DOS/MS-DOS processes this array of addresses with the following logic:

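for rel in rel_vec {
    // The entry is a segment:offset pair relative to the loaded image.
    let target_ptr = base_address
        + rel.r_segment as usize * 16
        + rel.r_offset as usize;
    let value = read_u16(target_ptr);           // <-- current position
    write_u16(target_ptr, value.wrapping_add(base_segment)); // <-- corrected position
}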
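A self-contained version of this fix-up loop, working on an in-memory copy of the loaded image, could look like the sketch below. It assumes `image` holds the program bytes that follow the header, so each relocation address is taken relative to the start of the buffer. The function name and the error convention (returning the index of the first out-of-range entry) are choices made for this example.

```rust
struct MZRelocation {
    r_offset: u16,
    r_segment: u16,
}

/// Adds `base_segment` to every 16-bit word that a relocation points at.
/// Returns Err(index) for the first entry that falls outside `image`.
fn apply_relocations(
    image: &mut [u8],
    relocs: &[MZRelocation],
    base_segment: u16,
) -> Result<(), usize> {
    for (index, rel) in relocs.iter().enumerate() {
        // Position inside the loaded image: segment * 16 + offset.
        let pos = rel.r_segment as usize * 16 + rel.r_offset as usize;
        let word = image.get_mut(pos..pos + 2).ok_or(index)?;

        // Read the stored segment value, correct it, and write it back.
        let value = u16::from_le_bytes([word[0], word[1]]);
        let fixed = value.wrapping_add(base_segment);
        word.copy_from_slice(&fixed.to_le_bytes());
    }
    Ok(())
}
```

Wrapping addition is used because segment arithmetic on the I8086 is 16-bit; a checked variant would be equally reasonable for a file analyzer.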