MZ-Executable | DOS Executable
An MZ executable image is a type of DOS program that has existed since the release of PC-DOS 1.0.
Every PC-DOS .EXE program starts with the MZ (or ZM) signature, a two-character ASCII WORD.
This signature is the main marker used for image detection.
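The detection step can be sketched as a two-byte comparison. The helper name has_mz_signature and the byte-slice input are assumptions made for illustration; they are not taken from Sunflower or from any DOS API.

```rust
/// Returns true if `image` begins with the "MZ" or "ZM" signature.
/// Hypothetical helper: the name and signature are illustrative only.
fn has_mz_signature(image: &[u8]) -> bool {
    // The signature is the first WORD of the file, i.e. two ASCII bytes.
    image.starts_with(b"MZ") || image.starts_with(b"ZM")
}
```

Buffers shorter than two bytes simply fail the check, so the caller does not need a separate length test.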
Every DOS program stored as an .EXE (executable) binary has two parts that are critical for the OS loader.
MZ Header
Here is the structure, written in Rust as an example.
struct MzHeader{
// Standard "DOS Header"
pub e_sign: Lu16, // ZM or MZ
pub e_cblp: Lu16, // Bytes used in the last page (*)
pub e_cp: Lu16, // Count of 512-byte pages (*)
pub e_relc: Lu16, // Count of relocations (*)
pub e_cparhdr: Lu16, // Header size in 16-byte paragraphs (*)
pub e_minep: Lu16, // Minimum extra RAM required, in paragraphs
pub e_maxep: Lu16, // Maximum extra RAM requested, in paragraphs
pub ss: Lu16, // (I8086) Stack Segment (*)
pub sp: Lu16, // (I8086) Stack Pointer (*)
pub e_check: Lu16, // Checksum of image [ignores]
pub ip: Lu16, // (I8086) Instruction Pointer (*)
pub cs: Lu16, // (I8086) Code Segment (*)
pub e_lfarlc: Lu16, // File offset of the relocation table
// "Extended DOS Header" or "MZ-Header"
pub e_ovno: Lu16, // Current overlay's number [ignores]
pub e_res0x1c: [Lu16; 4],// [ignores]
pub e_oemid: Lu16, // [ignores]
pub e_oeminfo: Lu16, // [ignores]
pub e_res_0x28: [Lu16; 10], // [ignores]
pub e_lfanew: Lu32, // Non-zero starting with MS-DOS 4.x applications
}
Plain ("clean") DOS images keep the e_lfanew field at zero. Sunflower relies on this
to classify an executable image as a true MS-DOS image.
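A minimal sketch of such a check follows, assuming the file is available as a byte slice and that the header is long enough to contain the extended part. The function name is_plain_dos_image is hypothetical, and Sunflower's real check may differ; the offset 0x3C follows from the field sizes in the structure above.

```rust
/// File offset of `e_lfanew`, derived from the field sizes in MzHeader.
const E_LFANEW_OFFSET: usize = 0x3C;

/// Returns Some(true) when `e_lfanew` is zero (plain DOS image),
/// Some(false) when it is non-zero, and None when the buffer is too
/// short to contain the field. Hypothetical helper for illustration.
fn is_plain_dos_image(image: &[u8]) -> Option<bool> {
    let raw = image.get(E_LFANEW_OFFSET..E_LFANEW_OFFSET + 4)?;
    // All header fields are stored little-endian.
    let e_lfanew = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    Some(e_lfanew == 0)
}
```

Returning an Option keeps the "too short to tell" case separate from a definite answer, which matters for very small images.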
The DOS loader does not use the values marked (*) exactly as they are stored in the data structure!
MS-DOS takes those values and computes the actual system information about the program from them.
All values marked [ignores] are ignored by the PC-DOS and MS-DOS loaders.
Overlays
Overlays for DOS programs also have an MZ header, but the fields that the loader ignores,
including e_lfanew, are zero.
The "Perfect Binary" concept means a binary image whose optional information (the fields ignored by the loader) is also filled in correctly.
If the e_ovno field has a non-zero value in the main .EXE part,
this may indicate that the image was infected by malware or deliberately corrupted. (???)
So, how does DOS define overlays and manipulate them?
...
...
...
; Allocate overlay memory
mov bx,1000h ; 64 KB (4096 paragraphs)
mov ah,48h ; 48h = allocate block
int 21h
jc __error ; if allocation error
mov pars,ax ; load segment for the overlay
mov pars+2,ax ; relocation factor for the overlay
; entry point segment
mov word ptr entry+2,ax
mov stkseg,ss
mov stkptr,sp
mov ax,ds ; ES = DS
mov es,ax
mov dx,offset oname ; DS:DX = filename
mov bx,offset pars ; ES:BX = parameter block
; Special DOS API call here
mov ax,4b03h ; <-- EXEC 0x03
int 21h
mov ax,_DATA ; restore our "_DATA" segment
mov ds,ax
mov es,ax
cli
mov ss,stkseg ; restore .STACK
mov sp,stkptr
sti
jc __error
; otherwise, no errors
push ds ; save our data segment
; Calling overlay
; a call into the overlay is always FAR
call dword ptr entry
pop ds ; restore .DATA
...
...
...
oname db 'OVERLAY.OVL',0 ; filename
pars dw 0 ; segment address
dw 0 ; relocations
entry dd 0 ; overlay's API (entry point)
stkseg dw 0 ; save Stack Segment
stkptr dw 0 ; save Stack Pointer
MS-DOS provides the EXEC function for loading programs into memory. It is function 0x4B of INT 21h (passed in AH).
If the BYTE parameter for EXEC is zero (AL=0), the function loads an independent program
module instead of an overlay.
If the BYTE parameter for EXEC is 3 (AL=0x3), the function loads an overlay.
An overlay is not a program! It is a part of a program module, which is why MS-DOS skips creating a PSP segment for it.
MZ Relocation Table
The e_relc field holds the number of relocations. It is the key value
for reading the relocations of a DOS image correctly.
Each relocation uses the 16:16 format, better known as the FAR pointer format.
struct MZRelocation {
pub r_offset: u16,
pub r_segment: u16,
}
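use std::io::{self, Read, Seek, SeekFrom};

/// Rust sketch for reading relocations:
/// reads `e_relc` entries starting at file offset `e_lfarlc`.
fn read_relocs<R: Read + Seek>(
    reader: &mut R,
    e_lfarlc: u16,
    e_relc: u16,
) -> io::Result<Vec<MZRelocation>> {
    let mut rel_vec = Vec::with_capacity(e_relc as usize);
    // No relocation table: nothing to read.
    if e_lfarlc == 0 {
        return Ok(rel_vec);
    }
    reader.seek(SeekFrom::Start(e_lfarlc as u64))?;
    for _ in 0..e_relc {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        // Each entry is little-endian: offset first, then segment.
        rel_vec.push(MZRelocation {
            r_offset: u16::from_le_bytes([buf[0], buf[1]]),
            r_segment: u16::from_le_bytes([buf[2], buf[3]]),
        });
    }
    Ok(rel_vec)
}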
In reality, the MZRelocation data structure is just a convenient view of the
relocations in an image. PC-DOS and MS-DOS treat each relocation as a raw 16:16 value (r_segoff: u32).
Sunflower uses the data structure form to show the raw relocations, not yet processed by DOS, when the PC/MS-DOS MZ plugin is active.
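The relationship between the raw u32 value and the structure view can be sketched as follows. The helper names split_segoff and linear are assumptions for illustration. The sketch relies on the little-endian layout in which the offset WORD precedes the segment WORD, so the offset occupies the low half of the u32.

```rust
/// Same layout as the MZRelocation structure shown above.
#[derive(Debug, PartialEq)]
struct MZRelocation {
    r_offset: u16,
    r_segment: u16,
}

/// Splits a raw little-endian 16:16 value into its two halves.
/// Hypothetical helper for illustration.
fn split_segoff(r_segoff: u32) -> MZRelocation {
    MZRelocation {
        r_offset: (r_segoff & 0xFFFF) as u16, // low WORD, first in the file
        r_segment: (r_segoff >> 16) as u16,   // high WORD, second in the file
    }
}

/// Real-mode linear address of the pair: segment * 16 + offset.
fn linear(rel: &MZRelocation) -> u32 {
    ((rel.r_segment as u32) << 4) + rel.r_offset as u32
}
```

The linear value is relative to the start of the load module; it becomes a real memory address only after the loader adds its own base.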
Loading process
DOS has a special memory area for loading transient programs. Transient programs are user applications.
+----------------------------+ <--Transient segment
| Upper Memory Block / Video | |
+----------------------------+ | Needed for user applications
| | | and programs that the system does not need
| Program Memory | | when it loads.
| | |
| | <----+
+----------------------------+ <-Resident segment
| (COMMAND.COM) | |
+----------------------------+ |
| MS-DOS (MSDOS.SYS) | | Needed for system parts that must be loaded
+----------------------------+ | when the PC starts
| BIOS (IO.SYS) | |
+----------------------------+ |-0x0400
| Interrupt vectors | |
+----------------------------+ <-Begin
Programs that need more than one segment for code or data
start with the MZ ASCII string (that is, a non-terminated ASCII string).
While a program is loading, the operating system sets up
the I8086 registers based on the MZ header.
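As a rough sketch of that register setup: the segment values stored in the header (cs and ss) are relative to the start of the load module, so the loader rebases them, while the 16-bit offsets (ip and sp) are used unchanged. The InitialRegs type, the function name, and the start_seg parameter are assumptions introduced here for illustration.

```rust
/// Register values handed to the program (illustrative type).
#[derive(Debug, PartialEq)]
struct InitialRegs {
    cs: u16,
    ip: u16,
    ss: u16,
    sp: u16,
}

/// `start_seg` is the segment where the load module begins (assumed input);
/// the other arguments are the raw header fields of the same name.
fn initial_regs(start_seg: u16, cs: u16, ip: u16, ss: u16, sp: u16) -> InitialRegs {
    InitialRegs {
        // Header segments are relative to the load module, so rebase them.
        cs: start_seg.wrapping_add(cs),
        ip,
        ss: start_seg.wrapping_add(ss),
        sp,
    }
}
```

The wrapping addition mirrors 16-bit segment arithmetic, where an overflow wraps around instead of failing.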
First, DOS computes the basic parameters needed for the memory management that follows:
- the image's size (image_size);
- the image's base (image_base);
- the CPU register values the program expects;
- the relocations (rel_vec).
let image_size: u32 = if e_cblp == 0 {
    e_cp as u32 * 512
} else {
    (e_cp as u32 - 1) * 512 + e_cblp as u32
};
let image_base = psp_offset + e_cparhdr + 0x10;
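The size computation can be written as a small standalone function. This is a sketch under the assumption that e_cp counts 512-byte pages and e_cblp counts the bytes used in the last page; the function name and the PAGE_SIZE constant are illustrative.

```rust
/// Size of one page in bytes.
const PAGE_SIZE: u32 = 512;

/// Computes the image size in bytes from the two header fields.
/// Hypothetical helper for illustration.
fn image_size(e_cp: u16, e_cblp: u16) -> u32 {
    if e_cblp == 0 {
        // The last page is completely filled.
        e_cp as u32 * PAGE_SIZE
    } else {
        // The last page holds only `e_cblp` bytes.
        // saturating_sub guards against a malformed header with e_cp == 0.
        (e_cp as u32).saturating_sub(1) * PAGE_SIZE + e_cblp as u32
    }
}
```

For example, three pages with 100 bytes in the last one give 2 * 512 + 100 = 1124 bytes. The widening to u32 avoids overflow, since the product does not fit in 16 bits for large images.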
DOS never uses the raw values from the MZ relocation table as they are! Taken literally, those values point to unknown or zeroed locations, because PC-DOS/MS-DOS processes the array of addresses using the following logic:
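for rel in rel_vec {
    // The segment part selects the paragraph, the offset part the byte within it
    let target_ptr = base_address + ((rel.r_segment as usize) << 4) + rel.r_offset as usize;
    let value = read_u16(target_ptr); // <-- current value
    write_u16(target_ptr, value.wrapping_add(base_segment)); // <-- corrected value
}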
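A self-contained version of this fix-up can be sketched over a byte buffer. It assumes that module holds the load module (the file contents after the header) and that each relocation is given as an (offset, segment) pair. When locating the word to patch, the sketch uses both halves of the entry. All names are illustrative, and out-of-range entries are simply skipped rather than reported.

```rust
/// Adds `base_segment` to every WORD addressed by a relocation entry.
/// `relocs` holds (offset, segment) pairs relative to the load module.
/// Hypothetical helper for illustration.
fn apply_relocs(module: &mut [u8], relocs: &[(u16, u16)], base_segment: u16) {
    for &(offset, segment) in relocs {
        // Position of the WORD to patch: segment * 16 + offset.
        let pos = ((segment as usize) << 4) + offset as usize;
        if pos + 2 > module.len() {
            continue; // entry points outside the buffer, skip it
        }
        let value = u16::from_le_bytes([module[pos], module[pos + 1]]);
        // 16-bit segment arithmetic wraps around on overflow.
        let fixed = value.wrapping_add(base_segment);
        module[pos..pos + 2].copy_from_slice(&fixed.to_le_bytes());
    }
}
```

Working on a slice instead of raw pointers keeps the example safe to run on any machine, at the cost of not modelling real-mode memory directly.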