XPT File Structure

This page provides a detailed overview of the XPT V5 file structure.

Overall Structure

An XPT file consists of a library (file) level and one or more member (dataset) levels:

graph TB
    subgraph "XPT V5 File"
        LH["Library Header<br/>80 bytes"]
        
        subgraph "Member 1 (Dataset)"
            MH1["Member Header<br/>80 bytes"]
            DC1["DSCRPTR Header<br/>80 bytes"]
            DD1["Dataset Descriptor<br/>160 bytes"]
            NSH1["NAMESTR Header<br/>80 bytes"]
            NS1["NAMESTR Records<br/>140 bytes × n"]
            OH1["OBS Header<br/>80 bytes"]
            OBS1["Observation Data"]
        end
        
        subgraph "Member 2 (Optional)"
            MH2["Member Header"]
            MORE2["..."]
        end
        
        LH --> MH1
        MH1 --> DC1
        DC1 --> DD1
        DD1 --> NSH1
        NSH1 --> NS1
        NS1 --> OH1
        OH1 --> OBS1
        OBS1 --> MH2
        MH2 --> MORE2
    end

Header Records

All headers are exactly 80 bytes with a distinctive pattern:

HEADER RECORD*******<type> HEADER RECORD!!!!!!!<numbers>
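
As a rough illustration of the pattern above, the sketch below checks the two fixed markers and returns the type field that sits between them. The helper name `header_kind` is hypothetical and not part of xportrs; the byte offsets (20-27 for the type, 28-47 for the second marker) come from counting characters in the pattern.

```rust
/// Length of every XPT header record, per the text above.
const HEADER_LEN: usize = 80;

/// Returns the header type (for example "LIBRARY" or "OBS") if `record`
/// matches the fixed pattern, or `None` otherwise.
/// Hypothetical helper for illustration only.
fn header_kind(record: &[u8]) -> Option<&str> {
    if record.len() != HEADER_LEN {
        return None;
    }
    // Bytes 0-19: first fixed marker.
    if !record.starts_with(b"HEADER RECORD*******") {
        return None;
    }
    // Bytes 28-47: second fixed marker.
    if !record[28..].starts_with(b"HEADER RECORD!!!!!!!") {
        return None;
    }
    // Bytes 20-27: type name, right-padded with spaces.
    std::str::from_utf8(&record[20..28]).ok().map(str::trim_end)
}
```

Returning `None` for anything that is not exactly 80 bytes keeps the check strict, so a reader can use it to tell header records apart from data records.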

Library Header

HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000  

This header identifies the file as an XPT transport file.

Member Header

HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140  

The numbers indicate:

  • 0160 (decimal) = 160 bytes, matching the two-record dataset descriptor described below
  • 0140 (decimal) = 140 bytes per NAMESTR record

DSCRPTR Header

HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000  

Introduces the dataset descriptor records.

NAMESTR Header

HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000000000000000000000000  

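The variable count is stored as four ASCII digits at byte offsets 54-57 (zero-based). The line above shows this field zeroed; a dataset with 5 variables would carry 0005 there.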
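
A minimal sketch of reading that count, assuming the four bytes at offsets 54-57 are ASCII decimal digits. The function name `namestr_var_count` is made up for this example and is not an xportrs API.

```rust
/// Reads the variable count from an 80-byte NAMESTR header record.
/// Hypothetical helper; offsets 54-57 are zero-based.
fn namestr_var_count(header: &[u8]) -> Option<u16> {
    if header.len() != 80 || !header.starts_with(b"HEADER RECORD*******NAMESTR ") {
        return None;
    }
    // The count is expected as four ASCII decimal digits, e.g. b"0005".
    let digits = std::str::from_utf8(&header[54..58]).ok()?;
    if !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}
```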

OBS Header

HEADER RECORD*******OBS     HEADER RECORD!!!!!!!000000000000000000000000000000  

Introduces the observation data section.

Dataset Descriptor

The dataset descriptor spans two 80-byte records (160 bytes total):

First Record (80 bytes)

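| Offset | Size | Field    | Example          |
|--------|------|----------|------------------|
| 0-7    | 8    | sas1     | SAS              |
| 8-15   | 8    | sas2     | SAS              |
| 16-23  | 8    | saslib   | SASLIB           |
| 24-31  | 8    | version  | 9.4              |
| 32-39  | 8    | os       | X64_10HO         |
| 40-47  | 8    | blanks   |                  |
| 48-63  | 16   | created  | 01JAN24:00:00:00 |
| 64-79  | 16   | modified | 01JAN24:00:00:00 |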

Second Record (80 bytes)

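| Offset | Size | Field   | Example        |
|--------|------|---------|----------------|
| 0-7    | 8    | dsname  | AE             |
| 8-15   | 8    | sasdata | SASDATA        |
| 16-23  | 8    | version | 9.4            |
| 24-31  | 8    | os      | X64_10HO       |
| 32-39  | 8    | blanks  |                |
| 40-79  | 40   | label   | Adverse Events |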
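
The created and modified fields use a fixed 16-character `DDMMMYY:HH:MM:SS` form, as in the example value `01JAN24:00:00:00`. The sketch below splits such a value into its parts using only the standard library. The `XptTimestamp` struct and `parse_timestamp` function are illustrative names, and the two-digit year is returned as-is because the century is not encoded in the field.

```rust
/// Parts of a 16-character XPT timestamp such as "01JAN24:00:00:00".
/// Illustrative type, not taken from xportrs.
#[derive(Debug, PartialEq)]
struct XptTimestamp {
    day: u8,
    month: u8, // 1 = January ... 12 = December
    year2: u8, // two-digit year exactly as stored
    hour: u8,
    minute: u8,
    second: u8,
}

fn parse_timestamp(s: &str) -> Option<XptTimestamp> {
    const MONTHS: [&str; 12] = [
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
    ];
    // ASCII-only input guarantees that byte-offset slicing is safe.
    if s.len() != 16 || !s.is_ascii() {
        return None;
    }
    // Separators sit at offsets 7, 10 and 13.
    if &s[7..8] != ":" || &s[10..11] != ":" || &s[13..14] != ":" {
        return None;
    }
    let month = MONTHS.iter().position(|m| *m == &s[2..5])? as u8 + 1;
    Some(XptTimestamp {
        day: s[0..2].parse().ok()?,
        month,
        year2: s[5..7].parse().ok()?,
        hour: s[8..10].parse().ok()?,
        minute: s[11..13].parse().ok()?,
        second: s[14..16].parse().ok()?,
    })
}
```

This sketch checks only the shape of the value; range checks such as day 1-31 are left out for brevity.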

NAMESTR Section

After the NAMESTR header, each variable is described by a 140-byte NAMESTR record:

graph LR
    subgraph "NAMESTR Layout (140 bytes)"
        A["Type Info<br/>0-7"] --> B["Name<br/>8-15"]
        B --> C["Label<br/>16-55"]
        C --> D["Format<br/>56-69"]
        D --> E["Informat<br/>72-83"]
        E --> F["Position<br/>84-87"]
        F --> G["Reserved<br/>88-139"]
    end

See NAMESTR Records for the complete byte-by-byte layout.
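
To make the diagram concrete, here is a small sketch that pulls the name, label and position out of one 140-byte record using the byte ranges shown above. It assumes the text fields are space-padded and that the position is a big-endian 32-bit integer occupying bytes 84-87; `NamestrSummary` and `summarize_namestr` are hypothetical names, not the xportrs API.

```rust
use std::ops::Range;

/// A few fields from one NAMESTR record (illustrative type).
#[derive(Debug, PartialEq)]
struct NamestrSummary {
    name: String,
    label: String,
    position: i32,
}

fn summarize_namestr(rec: &[u8]) -> Option<NamestrSummary> {
    if rec.len() != 140 {
        return None;
    }
    // Decode a text field and drop the trailing space padding.
    let text = |range: Range<usize>| -> String {
        String::from_utf8_lossy(&rec[range]).trim_end().to_string()
    };
    Some(NamestrSummary {
        name: text(8..16),   // Name: bytes 8-15
        label: text(16..56), // Label: bytes 16-55
        // Position: bytes 84-87, assumed big-endian.
        position: i32::from_be_bytes([rec[84], rec[85], rec[86], rec[87]]),
    })
}
```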

NAMESTR Padding

NAMESTR records are packed back to back into 80-byte physical records. Because 140 is not a multiple of 80, the section is padded up to the next 80-byte boundary:

  • 5 NAMESTRs = 700 bytes = 8.75 records → pad to 720 bytes (9 records)
  • Formula: ceil(n_vars * 140 / 80) * 80
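
The formula can be written with integer arithmetic only, which avoids floating-point rounding. This is a sketch; the constant and function names are chosen for the example.

```rust
const PHYSICAL_RECORD_LEN: usize = 80;
const NAMESTR_RECORD_LEN: usize = 140;

/// Size in bytes of the NAMESTR section for `n_vars` variables,
/// rounded up to a whole number of 80-byte physical records.
fn namestr_section_len(n_vars: usize) -> usize {
    let raw = n_vars * NAMESTR_RECORD_LEN;
    // Integer form of ceil(raw / 80) * 80.
    (raw + PHYSICAL_RECORD_LEN - 1) / PHYSICAL_RECORD_LEN * PHYSICAL_RECORD_LEN
}
```

For 5 variables this gives 720, matching the worked example above; for 4 variables the raw size of 560 is already a multiple of 80, so no padding is added.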

Observation Data

After the OBS header, data is stored in row-major order:

[Row 1]──[Var 1][Var 2][Var 3]...[Var N]
[Row 2]──[Var 1][Var 2][Var 3]...[Var N]
...
[Row M]──[Var 1][Var 2][Var 3]...[Var N]
[Padding to 80-byte boundary]

Row Length Calculation

fn row_length(variables: &[Variable]) -> usize {
    variables.iter().map(|v| {
        if v.is_numeric() {
            8  // Always 8 bytes for numerics
        } else {
            v.length  // 1-200 bytes for characters
        }
    }).sum()
}
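
Once the per-variable widths are known, a row can be cut into fields by walking those widths in order. The sketch below assumes the caller supplies the widths as a plain slice of byte counts; `split_row` is a hypothetical helper, not an xportrs function.

```rust
/// Splits one observation row into per-variable byte slices.
/// `widths` holds each variable's stored width in bytes, in column order.
/// Returns `None` if the row length does not equal the sum of the widths.
fn split_row<'a>(row: &'a [u8], widths: &[usize]) -> Option<Vec<&'a [u8]>> {
    if row.len() != widths.iter().sum::<usize>() {
        return None;
    }
    let mut fields = Vec::with_capacity(widths.len());
    let mut rest = row;
    for &w in widths {
        // Cannot panic: the length check above guarantees enough bytes remain.
        let (field, tail) = rest.split_at(w);
        fields.push(field);
        rest = tail;
    }
    Some(fields)
}
```

The returned slices borrow from the row, so no bytes are copied.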

End-of-File Padding

The file ends with space padding (0x20) to reach an 80-byte boundary.
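
A writer might apply this padding as sketched below; the function name is chosen for the example.

```rust
/// Appends ASCII spaces (0x20) until `buf.len()` is a multiple of 80.
/// A buffer that is already aligned is left unchanged.
fn pad_to_record_boundary(buf: &mut Vec<u8>) {
    let remainder = buf.len() % 80;
    if remainder != 0 {
        buf.resize(buf.len() + (80 - remainder), 0x20);
    }
}
```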

Byte Order

All multi-byte integers are big-endian:

// Reading a 16-bit integer from XPT (sample bytes 0x00 0x08 decode to 8)
let bytes: &[u8] = &[0x00, 0x08];
let value = i16::from_be_bytes([bytes[0], bytes[1]]);

// Writing a 16-bit integer to XPT
let encoded: [u8; 2] = value.to_be_bytes();

Character Encoding

| Region       | Encoding  | Notes    |
|--------------|-----------|----------|
| FDA (US)     | ASCII     | Required |
| PMDA (Japan) | Shift-JIS | Extended |
| General      | Latin-1   | Common   |

[!IMPORTANT] For FDA submissions, use ASCII only. xportrs validates this when Agency::FDA is specified.
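
For a quick pre-check outside the library, the standard library is enough to locate the first offending character. This sketch is independent of xportrs and does not reproduce its validation; `first_non_ascii` is a name chosen for the example.

```rust
/// Returns the byte offset and value of the first non-ASCII character
/// in `text`, or `None` if the whole string is ASCII.
fn first_non_ascii(text: &str) -> Option<(usize, char)> {
    text.char_indices().find(|(_, c)| !c.is_ascii())
}
```

Reporting the offset alongside the character makes it easier to point a user at the exact label or value that needs fixing.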

Example File (Hex Dump)

00000000: 4845 4144 4552 2052 4543 4f52 442a 2a2a  HEADER RECORD***
00000010: 2a2a 2a2a 4c49 4252 4152 5920 4845 4144  ****LIBRARY HEAD
00000020: 4552 2052 4543 4f52 4421 2121 2121 2121  ER RECORD!!!!!!!
00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 2020  00000000000000  

Multi-Member Files

XPT files can contain multiple datasets (members). Each member has its own:

  • Member header
  • Dataset descriptor
  • NAMESTR section
  • Observation data

use xportrs::Xpt;

// Reading all members
let datasets = Xpt::read_all("multi.xpt")?;
for ds in datasets {
    println!("Dataset: {}", ds.domain_code());
}

[!NOTE] For FDA submissions, it’s common practice to use one dataset per file, but the format supports multiple.
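
Because every section is padded to an 80-byte boundary, member headers always start at a multiple of 80. The sketch below uses that to list candidate member offsets in a raw byte buffer. It is a simplification: it assumes no observation data happens to begin an 80-byte record with the same marker text, and `member_offsets` is a hypothetical helper rather than part of xportrs.

```rust
/// Fixed text that opens every member header record.
const MEMBER_PREFIX: &[u8] = b"HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!";

/// Returns the byte offset of each 80-byte record that starts with the
/// member header marker.
fn member_offsets(file: &[u8]) -> Vec<usize> {
    file.chunks(80)
        .enumerate()
        .filter(|(_, rec)| rec.starts_with(MEMBER_PREFIX))
        .map(|(i, _)| i * 80)
        .collect()
}
```

A file with a single dataset yields one offset; more than one entry indicates a multi-member file.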