
Introduction

xportrs

xportrs is a Rust library for reading and writing SAS Transport (XPT) files, the standard format for regulatory submissions to the FDA, PMDA, and other health authorities.

Why xportrs?

Clinical trial data submitted to regulatory agencies must be in XPT V5 format. While SAS has traditionally been the tool of choice, modern data pipelines increasingly use Python, R, and Rust. xportrs provides:

  • Full CDISC/FDA compliance — Correct NAMESTR structure, IBM floating-point encoding, and metadata handling
  • Type safety — Rust’s type system prevents common errors at compile time
  • Performance — Zero-copy parsing where possible, efficient memory usage
  • Validation — Built-in checks for FDA, PMDA, and NMPA requirements
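
XPT V5 stores every numeric as an 8-byte IBM System/360 hexadecimal float rather than IEEE 754. As a rough illustration of what the library handles for you — this is a simplified sketch, not xportrs's actual implementation; range checks and rounding are omitted:

```rust
/// Encode an f64 as an 8-byte IBM hexadecimal float (big-endian), the
/// numeric representation XPT V5 requires. Illustrative sketch only:
/// out-of-range exponents and rounding are not handled.
fn to_ibm_double(value: f64) -> [u8; 8] {
    if value == 0.0 {
        return [0u8; 8];
    }
    let sign = if value < 0.0 { 0x80u8 } else { 0 };
    let mut v = value.abs();
    // Normalize so that v = fraction * 16^exp with fraction in [1/16, 1).
    let mut exp: i32 = 0;
    while v >= 1.0 { v /= 16.0; exp += 1; }
    while v < 1.0 / 16.0 { v *= 16.0; exp -= 1; }
    // 56-bit fraction, then a biased (excess-64) exponent in the first byte.
    let frac = (v * (1u64 << 56) as f64) as u64;
    let mut out = [0u8; 8];
    out[0] = sign | ((exp + 64) as u8);
    for i in 0..7 {
        out[i + 1] = ((frac >> (8 * (6 - i))) & 0xFF) as u8;
    }
    out
}

fn main() {
    // 1.0 = 0.0625 * 16^1 → exponent byte 0x41, fraction 0x10 00 ...
    assert_eq!(to_ibm_double(1.0)[0], 0x41);
    assert_eq!(to_ibm_double(1.0)[1], 0x10);
}
```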

Quick Example

use xportrs::{Column, ColumnData, Dataset, Format, Xpt};
fn main() -> xportrs::Result<()> {
// Create a dataset with full CDISC metadata
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
    Column::new("STUDYID", ColumnData::String(vec![Some("ABC123".into())]))
        .with_label("Study Identifier")
        .with_format(Format::character(20)),

    Column::new("USUBJID", ColumnData::String(vec![Some("ABC123-001".into())]))
        .with_label("Unique Subject Identifier")
        .with_format(Format::character(40)),

    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)]))
        .with_label("Sequence Number")
        .with_format(Format::numeric(8, 0)),
])?;

// Write with FDA validation
Xpt::writer(dataset)
    .agency(xportrs::Agency::FDA)
    .finalize()?
    .write_path("ae.xpt")?;
Ok(())
}

Compliance Matrix

Requirement                           Implementation
Variable names ≤8 bytes, uppercase    Validated
Variable labels ≤40 bytes             Validated
Dataset names ≤8 bytes                Validated
Character length 1–200 bytes          Validated
Numeric = 8 bytes IBM float           Enforced
ASCII-only for FDA                    Agency rules
File splitting at 5GB                 Automatic
SAS epoch (1960) dates                Handled
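
The last row deserves a note: SAS dates count days from 1960-01-01, not the Unix epoch. A minimal sketch of the offset arithmetic (illustrative only; xportrs handles this internally):

```rust
// Days from the SAS epoch (1960-01-01) to the Unix epoch (1970-01-01):
// ten years, three of them leap (1960, 1964, 1968) → 3*366 + 7*365 = 3653.
const UNIX_EPOCH_AS_SAS_DAYS: i64 = 3653;

/// Convert a Unix timestamp (seconds, UTC) to a SAS date value
/// (days since 1960-01-01).
fn unix_to_sas_date(unix_secs: i64) -> i64 {
    unix_secs.div_euclid(86_400) + UNIX_EPOCH_AS_SAS_DAYS
}

fn main() {
    assert_eq!(unix_to_sas_date(0), 3653);                // 1970-01-01
    assert_eq!(unix_to_sas_date(1_705_276_800), 23_390);  // 2024-01-15
}
```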

Installation

Add to your Cargo.toml:

[dependencies]
xportrs = "0.0.8"

Next Steps

Quick Start

Get up and running with xportrs in 5 minutes.

Installation

Add xportrs to your Cargo.toml:

[dependencies]
xportrs = "0.0.8"

Reading an XPT File

use xportrs::Xpt;

fn main() -> xportrs::Result<()> {
    // Read an XPT file
    let dataset = Xpt::read("ae.xpt")?;
    
    // Basic info
    println!("Domain: {}", dataset.domain_code());
    println!("Rows: {}", dataset.nrows());
    println!("Columns: {}", dataset.ncols());
    
    // List columns
    for col in dataset.columns() {
        println!("  - {}", col.name());
    }
    
    Ok(())
}

Creating a Dataset

use xportrs::{Column, ColumnData, Dataset};

fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
    Column::new("USUBJID", ColumnData::String(vec![
        Some("001".into()),
        Some("002".into()),
        Some("003".into()),
    ])),
    Column::new("AESEQ", ColumnData::F64(vec![
        Some(1.0),
        Some(1.0),
        Some(2.0),
    ])),
    Column::new("AETERM", ColumnData::String(vec![
        Some("HEADACHE".into()),
        Some("NAUSEA".into()),
        Some("FATIGUE".into()),
    ])),
])?;

println!("Created {} with {} rows", dataset.domain_code(), dataset.nrows());
Ok(())
}

Writing an XPT File

use xportrs::{Column, ColumnData, Dataset, Xpt};

fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;

// Write to file
Xpt::writer(dataset)
    .finalize()?
    .write_path("ae_output.xpt")?;

println!("Wrote ae_output.xpt");
Ok(())
}

Adding Metadata

For regulatory submissions, include metadata:

use xportrs::{Column, ColumnData, Dataset, Format, Xpt};

fn main() -> xportrs::Result<()> {
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())]))
        .with_label("Unique Subject Identifier")
        .with_format(Format::character(40)),
    
    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)]))
        .with_label("Sequence Number")
        .with_format(Format::numeric(8, 0)),
    
    Column::new("AETERM", ColumnData::String(vec![Some("HEADACHE".into())]))
        .with_label("Reported Term for the Adverse Event")
        .with_format(Format::character(200))
        .with_length(200),
])?;

Xpt::writer(dataset)
    .finalize()?
    .write_path("ae_metadata.xpt")?;
Ok(())
}

FDA Validation

Validate for FDA submission:

use xportrs::{Agency, Column, ColumnData, Dataset, Xpt};

fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
])?;
let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;

// Check for issues
if validated.has_errors() {
    eprintln!("Validation errors:");
    for issue in validated.issues() {
        eprintln!("  {}", issue);
    }
    return Ok(());
}

if validated.has_warnings() {
    println!("Warnings (proceeding anyway):");
    for issue in validated.issues() {
        println!("  {}", issue);
    }
}

validated.write_path("ae.xpt")?;
Ok(())
}

Round-Trip (Read → Modify → Write)

use xportrs::Xpt;

fn main() -> xportrs::Result<()> {
    // Read existing file
    let dataset = Xpt::read("ae.xpt")?;
    
    // Modify (example: add column)
    // dataset.extend([new_column]);
    
    // Write back
    Xpt::writer(dataset)
        .finalize()?
        .write_path("ae_modified.xpt")?;
    
    Ok(())
}

Common Patterns

Using From Conversions

use xportrs::{Column, ColumnData, Dataset};

fn main() -> xportrs::Result<()> {
// Simpler syntax with From implementations
let dataset = Dataset::new("LB", vec![
    Column::new("LBSEQ", vec![1.0, 2.0, 3.0].into()),  // Vec<f64> → ColumnData
    Column::new("LBTEST", vec!["HGB", "WBC", "PLT"].into()),  // Vec<&str> → ColumnData
])?;
Ok(())
}

Accessing Column Data

use xportrs::{ColumnData, Xpt};

fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;

// By name
let col = &dataset["USUBJID"];

// Match on data type
match col.data() {
    ColumnData::String(values) => {
        for (i, val) in values.iter().enumerate() {
            match val {
                Some(s) => println!("Row {}: {}", i, s),
                None => println!("Row {}: <missing>", i),
            }
        }
    }
    ColumnData::F64(values) => {
        for (i, val) in values.iter().enumerate() {
            match val {
                Some(n) => println!("Row {}: {}", i, n),
                None => println!("Row {}: <missing>", i),
            }
        }
    }
    _ => {}
}
Ok(())
}

Handling Errors

use xportrs::{Error, Xpt};

fn main() {
    match Xpt::read("missing.xpt") {
        Ok(_dataset) => println!("Loaded"),
        Err(Error::Io(e)) => eprintln!("File error: {}", e),
        Err(e) => eprintln!("Error: {}", e),
    }
}

Next Steps

FDA Submission Workflow

This guide walks through creating FDA-compliant XPT files for regulatory submissions.

Prerequisites

  • Understanding of CDISC SDTM/ADaM standards
  • Access to define.xml for your study
  • Clinical trial data in a structured format

Step 1: Design Your Dataset

Plan your dataset structure based on SDTM/ADaM:

// Example: Adverse Events (AE) domain
// Required SDTM variables: STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, ...

use xportrs::{Column, ColumnData, Dataset, Format, VariableRole};

Step 2: Create the Dataset with Full Metadata

use xportrs::{Column, ColumnData, Dataset, Format, VariableRole};
struct YourDataSource {
    studyid: Vec<Option<String>>,
    usubjid: Vec<Option<String>>,
    aeseq: Vec<Option<f64>>,
    aeterm: Vec<Option<String>>,
    aedecod: Vec<Option<String>>,
    aesev: Vec<Option<String>>,
    aestdtc: Vec<Option<String>>,
    aeendtc: Vec<Option<String>>,
}
impl YourDataSource {
    fn len(&self) -> usize { self.studyid.len() }
}
fn create_ae_dataset(data: &YourDataSource) -> xportrs::Result<Dataset> {
    let dataset = Dataset::with_label("AE", "Adverse Events", vec![
        // Identifier variables
        Column::with_role(
            "STUDYID",
            VariableRole::Identifier,
            ColumnData::String(data.studyid.clone()),
        )
            .with_label("Study Identifier")
            .with_format(Format::character(20)),
        Column::new("DOMAIN", ColumnData::String(
            vec![Some("AE".into()); data.len()]
        ))
            .with_label("Domain Abbreviation")
            .with_format(Format::character(2))
            .with_length(2),
        Column::with_role(
            "USUBJID",
            VariableRole::Identifier,
            ColumnData::String(data.usubjid.clone()),
        )
            .with_label("Unique Subject Identifier")
            .with_format(Format::character(40)),
        Column::with_role(
            "AESEQ",
            VariableRole::Topic,
            ColumnData::F64(data.aeseq.clone()),
        )
            .with_label("Sequence Number")
            .with_format(Format::numeric(8, 0)),

        // Qualifier variables
        Column::with_role(
            "AETERM",
            VariableRole::Qualifier,
            ColumnData::String(data.aeterm.clone()),
        )
            .with_label("Reported Term for the Adverse Event")
            .with_format(Format::character(200))
            .with_length(200),
        Column::new("AEDECOD", ColumnData::String(data.aedecod.clone()))
            .with_label("Dictionary-Derived Term")
            .with_format(Format::character(200))
            .with_length(200),
        Column::new("AESEV", ColumnData::String(data.aesev.clone()))
            .with_label("Severity/Intensity")
            .with_format(Format::character(10))
            .with_length(10),

        // Timing variables
        Column::with_role(
            "AESTDTC",
            VariableRole::Timing,
            ColumnData::String(data.aestdtc.clone()),
        )
            .with_label("Start Date/Time of Adverse Event")
            .with_format(Format::character(19))
            .with_length(19),
        Column::new("AEENDTC", ColumnData::String(data.aeendtc.clone()))
            .with_label("End Date/Time of Adverse Event")
            .with_format(Format::character(19))
            .with_length(19),
    ])?;

    Ok(dataset)
}

Step 3: Validate for FDA Compliance

use xportrs::{Agency, Dataset, Severity, Xpt};
fn validate_for_fda(dataset: Dataset) -> xportrs::Result<xportrs::ValidatedWrite> {
    let validated = Xpt::writer(dataset)
        .agency(Agency::FDA)
        .finalize()?;

    // Report all issues
    println!("Validation Results:");
    println!("  Errors: {}", validated.issues().iter()
        .filter(|i| i.severity() == Severity::Error).count());
    println!("  Warnings: {}", validated.issues().iter()
        .filter(|i| i.severity() == Severity::Warning).count());

    // Detail issues
    for issue in validated.issues() {
        let prefix = match issue.severity() {
            Severity::Error => "ERROR",
            Severity::Warning => "WARN",
            Severity::Info => "INFO",
        };
        println!("  [{}] {}: {}", prefix, issue.target(), issue);
    }

    // Fail on errors
    if validated.has_errors() {
        return Err(xportrs::Error::invalid_data(
            "FDA validation failed with errors"
        ));
    }

    Ok(validated)
}

Step 4: Write the XPT File

use std::path::Path;
fn write_submission_file(
    validated: xportrs::ValidatedWrite,
    output_dir: &Path,
) -> xportrs::Result<()> {
    let output_path = output_dir.join("ae.xpt");

    // Write (may split if >5GB)
    let paths = validated.write_path(&output_path)?;

    for path in &paths {
        println!("Wrote: {}", path.display());

        // Verify file size
        let size = std::fs::metadata(path)?.len();
        println!("  Size: {} bytes ({:.2} GB)",
                 size, size as f64 / 1_073_741_824.0);
    }

    Ok(())
}

Step 5: Verify the Output

use xportrs::Xpt;
fn verify_output(path: &str) -> xportrs::Result<()> {
    // Read back
    let dataset = Xpt::read(path)?;

    // Verify structure
    println!("\nVerification:");
    println!("  Domain: {}", dataset.domain_code());
    println!("  Label: {:?}", dataset.dataset_label());
    println!("  Rows: {}", dataset.nrows());
    println!("  Columns: {}", dataset.ncols());

    // Check metadata preserved
    for col in dataset.columns() {
        print!("  {} ", col.name());
        if col.label().is_some() { print!("[label] "); }
        if col.format().is_some() { print!("[format] "); }
        if col.explicit_length().is_some() { print!("[length] "); }
        println!();
    }

    Ok(())
}

Complete Example

use xportrs::{Agency, Column, ColumnData, Dataset, Format, Severity, Xpt};
use std::path::PathBuf;
fn main() -> xportrs::Result<()> {
fn create_submission() -> xportrs::Result<()> {
    // 1. Create dataset
    let dataset = Dataset::with_label("AE", "Adverse Events", vec![
        Column::new("STUDYID", ColumnData::String(vec![
            Some("ABC-123".into()),
            Some("ABC-123".into()),
        ]))
            .with_label("Study Identifier")
            .with_format(Format::character(20)),
        Column::new("DOMAIN", ColumnData::String(vec![
            Some("AE".into()),
            Some("AE".into()),
        ]))
            .with_label("Domain Abbreviation")
            .with_format(Format::character(2)),
        Column::new("USUBJID", ColumnData::String(vec![
            Some("ABC-123-001".into()),
            Some("ABC-123-002".into()),
        ]))
            .with_label("Unique Subject Identifier")
            .with_format(Format::character(40)),
        Column::new("AESEQ", ColumnData::F64(vec![
            Some(1.0),
            Some(1.0),
        ]))
            .with_label("Sequence Number"),
        Column::new("AETERM", ColumnData::String(vec![
            Some("HEADACHE".into()),
            Some("NAUSEA".into()),
        ]))
            .with_label("Reported Term for the Adverse Event")
            .with_format(Format::character(200))
            .with_length(200),
        Column::new("AESTDTC", ColumnData::String(vec![
            Some("2024-01-15".into()),
            Some("2024-01-16".into()),
        ]))
            .with_label("Start Date/Time of Adverse Event")
            .with_format(Format::character(19)),
    ])?;

    // 2. Validate for FDA
    let validated = Xpt::writer(dataset)
        .agency(Agency::FDA)
        .finalize()?;

    // 3. Report issues
    if !validated.issues().is_empty() {
        println!("Validation Issues:");
        for issue in validated.issues() {
            println!("  [{}] {}", issue.severity(), issue);
        }
    }

    // 4. Check for blocking errors
    if validated.has_errors() {
        eprintln!("Cannot proceed due to validation errors");
        return Err(xportrs::Error::invalid_data("Validation failed"));
    }

    // 5. Write file
    let output = PathBuf::from("output/ae.xpt");
    std::fs::create_dir_all(output.parent().unwrap())?;
    validated.write_path(&output)?;

    // 6. Verify
    let loaded = Xpt::read(&output)?;
    assert_eq!(loaded.domain_code(), "AE");
    assert_eq!(loaded.nrows(), 2);

    println!("\nSuccessfully created ae.xpt for FDA submission");

    Ok(())
}
create_submission()
}

Checklist

Before submission, verify:

  • Dataset name ≤8 characters, uppercase
  • Variable names ≤8 characters, uppercase
  • Variable labels ≤40 characters, ASCII only
  • Character variables ≤200 bytes
  • All variables have labels
  • Dataset has a label
  • File size ≤5GB (or properly split)
  • Pinnacle 21 validation passed
  • Labels match define.xml
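
The byte-length and ASCII items above are mechanical to spot-check by hand. A stand-alone sketch of those checks (xportrs performs its own validation during finalize(); these hypothetical helpers are illustrative only):

```rust
/// Illustrative pre-flight check mirroring the checklist: name ≤8 bytes,
/// uppercase ASCII, starting with a letter.
fn check_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 8
        && name.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
        && name.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

/// Label ≤40 bytes, ASCII only. `len()` is in bytes, which is what matters here.
fn check_label(label: &str) -> bool {
    label.len() <= 40 && label.is_ascii()
}

fn main() {
    assert!(check_name("USUBJID"));
    assert!(!check_name("MYLONGVARNAME"));  // 13 bytes
    assert!(!check_name("my-var"));         // lowercase + hyphen
    assert!(check_label("Unique Subject Identifier"));
    assert!(!check_label("Événement indésirable"));  // non-ASCII
}
```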

Common Issues

Missing Labels

[WARN] MYVAR: Variable 'MYVAR' is missing a label

Fix: Add .with_label("...") to all columns.

Non-ASCII Characters

[ERROR] AETERM: Variable label contains non-ASCII characters

Fix: Replace accented characters (é→e, ñ→n) and special symbols.

Variable Name Too Long

[ERROR] MYLONGNAME: Variable name exceeds 8 bytes

Fix: Shorten variable names to ≤8 characters.
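
When shortening names systematically, collisions are the main hazard: two long names can truncate to the same 8 bytes. A hypothetical helper (not part of xportrs) that truncates and appends a numeric suffix on collision:

```rust
use std::collections::HashSet;

/// Illustrative only: uppercase, strip non-alphanumerics, truncate to
/// 8 bytes, and disambiguate collisions with a numeric suffix.
fn shorten(name: &str, taken: &mut HashSet<String>) -> String {
    let base: String = name
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_ascii_uppercase();
    let mut candidate: String = base.chars().take(8).collect();
    let mut n = 1u32;
    // insert() returns false if the name is already taken.
    while !taken.insert(candidate.clone()) {
        let suffix = n.to_string();
        candidate = base.chars().take(8 - suffix.len()).collect::<String>() + &suffix;
        n += 1;
    }
    candidate
}

fn main() {
    let mut taken = HashSet::new();
    assert_eq!(shorten("MYLONGVARNAME", &mut taken), "MYLONGVA");
    assert_eq!(shorten("MYLONGVARNAME", &mut taken), "MYLONGV1");
}
```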

Next Steps

  • Run Pinnacle 21 validation on generated files
  • Verify define.xml consistency
  • Package with eCTD structure

Read-Modify-Write Workflows

This guide covers common patterns for reading, modifying, and writing XPT files.

Basic Roundtrip

use xportrs::Xpt;
fn basic_roundtrip(input: &str, output: &str) -> xportrs::Result<()> {
    // Read
    let dataset = Xpt::read(input)?;

    // (Modify here if needed)

    // Write
    Xpt::writer(dataset)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

Preserving Metadata

xportrs automatically preserves metadata during roundtrip:

use xportrs::Xpt;
fn verify_metadata_preservation(path: &str) -> xportrs::Result<()> {
    // Read original
    let original = Xpt::read(path)?;

    // Write to temp
    let temp_path = "/tmp/roundtrip.xpt";
    Xpt::writer(original.clone())
        .finalize()?
        .write_path(temp_path)?;

    // Read back
    let reloaded = Xpt::read(temp_path)?;

    // Verify metadata preserved
    assert_eq!(original.domain_code(), reloaded.domain_code());
    assert_eq!(original.dataset_label(), reloaded.dataset_label());

    for (orig_col, new_col) in original.columns().iter()
        .zip(reloaded.columns().iter())
    {
        assert_eq!(orig_col.name(), new_col.name());
        assert_eq!(
            orig_col.label().map(|l| l.to_string()),
            new_col.label().map(|l| l.to_string())
        );
        // Format, length, etc. also preserved
    }

    Ok(())
}

Adding Columns

use xportrs::{Column, ColumnData, Format, Xpt};
fn add_derived_column(input: &str, output: &str) -> xportrs::Result<()> {
    let mut dataset = Xpt::read(input)?;

    // Get row count
    let nrows = dataset.nrows();

    // Create new column
    let new_column = Column::new(
        "DERIVED",
        ColumnData::F64(vec![Some(1.0); nrows]),
    )
        .with_label("Derived Variable")
        .with_format(Format::numeric(8, 0));

    // Add to dataset
    dataset.extend([new_column]);

    // Write
    Xpt::writer(dataset)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

Modifying Column Data

use xportrs::{Column, ColumnData, Xpt};
fn modify_column_data(input: &str, output: &str) -> xportrs::Result<()> {
    let dataset = Xpt::read(input)?;

    // Create modified columns
    let modified_columns: Vec<Column> = dataset.columns().iter()
        .map(|col| {
            if col.name() == "AESEQ" {
                // Modify AESEQ: multiply by 10
                if let ColumnData::F64(values) = col.data() {
                    let new_values: Vec<Option<f64>> = values.iter()
                        .map(|v| v.map(|x| x * 10.0))
                        .collect();

                    let mut new_col = Column::new(col.name(), ColumnData::F64(new_values));

                    // Preserve metadata
                    if let Some(label) = col.label() {
                        new_col = new_col.with_label(label.to_string());
                    }
                    if let Some(format) = col.format() {
                        new_col = new_col.with_format(format.clone());
                    }

                    return new_col;
                }
            }
            col.clone()
        })
        .collect();

    // Create new dataset with modified columns
    let mut new_dataset = xportrs::Dataset::new(
        dataset.domain_code(),
        modified_columns,
    )?;

    if let Some(label) = dataset.dataset_label() {
        new_dataset.set_label(label);
    }

    Xpt::writer(new_dataset)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

Filtering Rows

use xportrs::{Column, ColumnData, Dataset, Xpt};
fn filter_rows(input: &str, output: &str, keep_indices: &[usize]) -> xportrs::Result<()> {
    let dataset = Xpt::read(input)?;

    // Filter each column
    let filtered_columns: Vec<Column> = dataset.columns().iter()
        .map(|col| {
            let filtered_data = match col.data() {
                ColumnData::F64(values) => {
                    let filtered: Vec<_> = keep_indices.iter()
                        .map(|&i| values[i].clone())
                        .collect();
                    ColumnData::F64(filtered)
                }
                ColumnData::String(values) => {
                    let filtered: Vec<_> = keep_indices.iter()
                        .map(|&i| values[i].clone())
                        .collect();
                    ColumnData::String(filtered)
                }
                // Handle other types...
                _ => col.data().clone(),
            };

            let mut new_col = Column::new(col.name(), filtered_data);
            if let Some(label) = col.label() {
                new_col = new_col.with_label(label.to_string());
            }
            if let Some(format) = col.format() {
                new_col = new_col.with_format(format.clone());
            }
            new_col
        })
        .collect();

    let mut filtered_dataset = Dataset::new(
        dataset.domain_code(),
        filtered_columns,
    )?;

    if let Some(label) = dataset.dataset_label() {
        filtered_dataset.set_label(label);
    }

    Xpt::writer(filtered_dataset)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

Merging Datasets

use xportrs::{Column, ColumnData, Dataset, Xpt};
fn merge_datasets(input1: &str, input2: &str, output: &str) -> xportrs::Result<()> {
    let ds1 = Xpt::read(input1)?;
    let ds2 = Xpt::read(input2)?;

    // Verify same structure
    assert_eq!(ds1.ncols(), ds2.ncols(), "Column count mismatch");

    // Concatenate data
    let merged_columns: Vec<Column> = ds1.columns().iter()
        .zip(ds2.columns().iter())
        .map(|(col1, col2)| {
            let merged_data = match (col1.data(), col2.data()) {
                (ColumnData::F64(v1), ColumnData::F64(v2)) => {
                    let mut merged = v1.clone();
                    merged.extend(v2.clone());
                    ColumnData::F64(merged)
                }
                (ColumnData::String(v1), ColumnData::String(v2)) => {
                    let mut merged = v1.clone();
                    merged.extend(v2.clone());
                    ColumnData::String(merged)
                }
                _ => panic!("Type mismatch"),
            };

            let mut col = Column::new(col1.name(), merged_data);
            if let Some(label) = col1.label() {
                col = col.with_label(label.to_string());
            }
            if let Some(format) = col1.format() {
                col = col.with_format(format.clone());
            }
            col
        })
        .collect();

    let mut merged = Dataset::new(ds1.domain_code(), merged_columns)?;
    if let Some(label) = ds1.dataset_label() {
        merged.set_label(label);
    }

    Xpt::writer(merged)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

Updating Labels

use xportrs::{Column, Dataset, Xpt};
use std::collections::HashMap;
fn update_labels(
    input: &str,
    output: &str,
    label_updates: &HashMap<&str, &str>,
) -> xportrs::Result<()> {
    let dataset = Xpt::read(input)?;

    let updated_columns: Vec<Column> = dataset.columns().iter()
        .map(|col| {
            let mut new_col = Column::new(col.name(), col.data().clone());

            // Apply label update if specified
            if let Some(&new_label) = label_updates.get(col.name()) {
                new_col = new_col.with_label(new_label);
            } else if let Some(label) = col.label() {
                new_col = new_col.with_label(label.to_string());
            }

            if let Some(format) = col.format() {
                new_col = new_col.with_format(format.clone());
            }

            new_col
        })
        .collect();

    let mut updated = Dataset::new(dataset.domain_code(), updated_columns)?;
    if let Some(label) = dataset.dataset_label() {
        updated.set_label(label);
    }

    Xpt::writer(updated)
        .finalize()?
        .write_path(output)?;

    Ok(())
}

// Usage
fn main() -> xportrs::Result<()> {
    let mut updates = HashMap::new();
    updates.insert("USUBJID", "Unique Subject Identifier");
    updates.insert("AETERM", "Reported Adverse Event Term");
    update_labels("ae.xpt", "ae_updated.xpt", &updates)
}

Batch Processing

use xportrs::Xpt;
use std::path::Path;
fn process_directory(input_dir: &Path, output_dir: &Path) -> xportrs::Result<()> {
    std::fs::create_dir_all(output_dir)?;

    for entry in std::fs::read_dir(input_dir)? {
        let entry = entry?;
        let path = entry.path();

        if path.extension().map_or(false, |e| e == "xpt") {
            let filename = path.file_name().unwrap();
            let output_path = output_dir.join(filename);

            println!("Processing: {}", path.display());

            let dataset = Xpt::read(&path)?;

            // Process...

            Xpt::writer(dataset)
                .finalize()?
                .write_path(&output_path)?;

            println!("  Wrote: {}", output_path.display());
        }
    }

    Ok(())
}

Error Handling in Roundtrips

use xportrs::{Error, Xpt};
fn safe_roundtrip(input: &str, output: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Read with error handling
    let dataset = match Xpt::read(input) {
        Ok(ds) => ds,
        Err(Error::Io(e)) => {
            eprintln!("Failed to read {}: {}", input, e);
            return Err(e.into());
        }
        Err(e) => return Err(e.into()),
    };

    // Validate
    let validated = Xpt::writer(dataset).finalize()?;

    if validated.has_errors() {
        for issue in validated.issues() {
            eprintln!("Validation error: {}", issue);
        }
        return Err("Validation failed".into());
    }

    // Write
    validated.write_path(output)?;

    // Verify
    let _ = Xpt::read(output)?;

    Ok(())
}

Troubleshooting

This guide covers common issues and their solutions when working with xportrs.

Validation Errors

Variable Name Too Long

[ERROR] MYLONGVARNAME: Variable name exceeds 8 bytes

Cause: XPT V5 limits variable names to 8 bytes.

Solution: Shorten the variable name to ≤8 characters.

use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("MYLONGVARNAME", data.clone());

// Correct
Column::new("MYVAR", data);

Variable Label Too Long

[ERROR] USUBJID: Variable label exceeds 40 bytes

Cause: XPT V5 limits labels to 40 bytes.

Solution: Shorten the label.

use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong (41 characters)
Column::new("VAR", data.clone())
    .with_label("This is a very long label that exceeds 40");

// Correct (40 characters max)
Column::new("VAR", data)
    .with_label("Unique Subject Identifier");

Non-ASCII Characters (FDA)

[ERROR] AETERM: Variable label contains non-ASCII characters

Cause: FDA requires ASCII-only text.

Solution: Replace non-ASCII characters.

use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("VAR", data.clone())
    .with_label("Événement indésirable");

// Correct
Column::new("VAR", data)
    .with_label("Adverse Event");

// Or use a helper function
fn to_ascii(s: &str) -> String {
    s.chars().map(|c| match c {
        'é' | 'è' | 'ê' | 'ë' => 'e',
        'à' | 'â' | 'ä' => 'a',
        // ... more mappings
        c if c.is_ascii() => c,
        _ => '?',
    }).collect()
}

Column Length Mismatch

Error: Column length mismatch: expected 100, got 99

Cause: Columns have different numbers of rows.

Solution: Ensure all columns have the same length.

use xportrs::{Column, ColumnData, Dataset};
// Wrong
Dataset::new("AE", vec![
    Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])),  // 2 rows
    Column::new("B", ColumnData::F64(vec![Some(1.0)])),              // 1 row!
]);

// Correct - same length
Dataset::new("AE", vec![
    Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
    Column::new("B", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
]);

Warnings

Missing Variable Label

[WARN] MYVAR: Variable 'MYVAR' is missing a label

Cause: Variable has no label defined.

Solution: Add a label.

use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
Column::new("MYVAR", data)
    .with_label("My Variable Description");

Missing Dataset Label

[WARN] AE: Dataset is missing a label

Cause: Dataset has no label defined.

Solution: Use with_label or set_label.

use xportrs::{Column, ColumnData, Dataset};

fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// At construction
let _ds = Dataset::with_label("AE", "Adverse Events", columns.clone())?;

// Or after
let mut ds = Dataset::new("AE", columns)?;
ds.set_label("Adverse Events");
Ok(())
}

Reading Errors

File Not Found

Error: No such file or directory (os error 2)

Solution: Verify the file path exists.

use std::path::Path;

let path = "data.xpt";
if !Path::new(path).exists() {
    eprintln!("File not found: {}", path);
}

Invalid XPT Format

Error: Invalid header record

Cause: File is not a valid XPT V5 file.

Solution: Verify the file:

  • Check it’s an XPT V5 file (not XPT V8, SAS7BDAT, etc.)
  • Ensure it’s not corrupted
  • Verify with hex dump that it starts with HEADER RECORD
# Check file header
xxd -l 80 suspect.xpt
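
The same check can be done programmatically: a V5 transport file begins with a fixed 48-byte ASCII signature. A small stand-alone sketch (the looks_like_xpt helper is illustrative, not an xportrs API):

```rust
use std::fs::File;
use std::io::Read;

/// The fixed 48-byte signature that opens every XPT V5 library header record.
const XPT_MAGIC: &[u8; 48] = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!";

/// Illustrative helper: does the file start with the V5 signature?
fn looks_like_xpt(path: &str) -> std::io::Result<bool> {
    let mut buf = [0u8; 48];
    File::open(path)?.read_exact(&mut buf)?;
    Ok(&buf == XPT_MAGIC)
}

fn main() -> std::io::Result<()> {
    // Self-test: write the signature out, then verify detection.
    std::fs::write("signature_check.xpt", &XPT_MAGIC[..])?;
    assert!(looks_like_xpt("signature_check.xpt")?);
    Ok(())
}
```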

Member Not Found

Error: MemberNotFound { domain_code: "XX" }

Cause: Requested member doesn’t exist in the file.

Solution: Check available members.

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let info = Xpt::inspect("multi.xpt")?;
for name in info.member_names() {
    println!("Available: {}", name);
}
Ok(())
}

Writing Errors

Write Permission Denied

Error: Permission denied (os error 13)

Solution: Check file/directory permissions.

use std::fs;

fn main() -> std::io::Result<()> {
let dir = "/output";
fs::create_dir_all(dir)?;  // Create if missing

// Check write permission
let test_file = format!("{}/test.tmp", dir);
match fs::write(&test_file, "test") {
    Ok(_) => { fs::remove_file(&test_file)?; }
    Err(e) => eprintln!("Cannot write to {}: {}", dir, e),
}
Ok(())
}

Disk Full

Error: No space left on device (os error 28)

Solution: Free disk space or write to a different location.

Data Issues

Precision Loss

// Original: 3.141592653589793
// After roundtrip: 3.141592653589792

Cause: IBM floating-point has slightly less precision than IEEE 754.

Solution: For critical values, store as strings or accept minor precision loss (~14-16 digits).

use xportrs::{Column, ColumnData};
// Store as string for exact preservation
Column::new("EXACTVAL", ColumnData::String(vec![
    Some("3.141592653589793".into()),
]));

Missing Values Handling

use xportrs::ColumnData;
let col_data = ColumnData::F64(vec![Some(1.0), None]);
// Check for missing values
if let ColumnData::F64(values) = &col_data {
    for (i, val) in values.iter().enumerate() {
        if val.is_none() {
            println!("Row {} is missing", i);
        }
    }
}

Format Issues

Invalid Format String

Error: Invalid format syntax: "DATE"

Cause: Format string missing trailing period.

Solution: SAS formats end with a period.

use xportrs::Format;
// Wrong
Format::parse("DATE9");

// Correct
Format::parse("DATE9.");

Format Not Preserved

Cause: Format might not be written if name is empty.

Solution: Use named formats.

use xportrs::Format;
// May not be preserved (bare numeric format)
Format::parse("8.2");

// Will be preserved (named format)
Format::parse("BEST12.");
Format::parse("DATE9.");
Format::character(200);

Performance Issues

Slow Reading Large Files

Solution: Use row limiting for previews.

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Preview first 100 rows
let preview = Xpt::reader("large.xpt")
    .row_limit(100)
    .read()?;
Ok(())
}

Memory Usage

Solution: Process in chunks for very large datasets.

use xportrs::{Dataset, Xpt};
fn process(_ds: &Dataset) {}
fn main() -> xportrs::Result<()> {
// Read, process, and release
{
    let dataset = Xpt::read("chunk1.xpt")?;
    process(&dataset);
} // dataset dropped, memory freed

{
    let dataset = Xpt::read("chunk2.xpt")?;
    process(&dataset);
}
Ok(())
}

Pinnacle 21 Validation Failures

SD0063: Label Mismatch

Cause: XPT label doesn’t match define.xml.

Solution: Ensure labels are consistent.

use xportrs::{Column, ColumnData};
let data = ColumnData::String(vec![Some("001".into())]);
// Label should match define.xml exactly
Column::new("USUBJID", data)
    .with_label("Unique Subject Identifier");  // As in define.xml

SD1001: Variable Name Invalid

Cause: Variable name doesn’t follow SAS naming rules.

Solution: Use uppercase alphanumeric names (underscores allowed) that start with a letter.

use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("1stVar", data.clone());   // Starts with number
Column::new("my-var", data.clone());   // Contains hyphen

// Correct
Column::new("FIRSTVAR", data.clone());
Column::new("MYVAR", data);

Getting Help

If you encounter issues not covered here:

  1. Check the API documentation
  2. Review the XPT format specification
  3. Open an issue on GitHub

When reporting issues, include:

  • xportrs version
  • Rust version
  • Minimal code to reproduce
  • Error messages
  • Sample data (if not confidential)

Dataset and Column

The Dataset and Column types are the core data structures in xportrs for representing XPT datasets.

Dataset

A Dataset represents a single SAS dataset (domain) with columns of data.

Creating a Dataset

use xportrs::{Dataset, Column, ColumnData};

fn main() -> xportrs::Result<()> {
// Basic creation
let dataset = Dataset::new("AE", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;

// With dataset label
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;
Ok(())
}

Dataset Properties

// Domain code (dataset name)
let code: &str = dataset.domain_code();

// Dataset label (optional)
let label: Option<&str> = dataset.dataset_label();

// Dimensions
let rows: usize = dataset.nrows();
let cols: usize = dataset.ncols();

// Access columns
let columns: &[Column] = dataset.columns();

Setting the Label

use xportrs::{Dataset, Column, ColumnData};

fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// Using with_label at construction
let dataset = Dataset::with_label("AE", "Adverse Events", columns.clone())?;

// Or set later
let mut dataset = Dataset::new("AE", columns)?;
dataset.set_label("Adverse Events");
Ok(())
}

Accessing Columns

// By index
let first_col: &Column = &dataset[0];

// By name
let usubjid: &Column = &dataset["USUBJID"];

// Find column (returns Option)
let col: Option<&Column> = dataset.column("AESEQ");

Iterating

// Iterate over columns
for col in dataset.iter() {
    println!("{}: {}", col.name(), col.len());
}

// Column names only
for name in dataset.column_names() {
    println!("{}", name);
}

// Consuming iterator
for col in dataset {
    // col is owned Column
}

Extending a Dataset

use xportrs::{Dataset, Column, ColumnData};

fn main() -> xportrs::Result<()> {
let mut dataset = Dataset::new("AE", vec![
    Column::new("A", ColumnData::F64(vec![Some(1.0)])),
])?;

// Add more columns
dataset.extend([
    Column::new("B", ColumnData::F64(vec![Some(2.0)])),
    Column::new("C", ColumnData::F64(vec![Some(3.0)])),
]);

assert_eq!(dataset.ncols(), 3);
Ok(())
}

Column

A Column represents a single variable with its data and metadata.

Creating a Column

use xportrs::{Column, ColumnData, Format, VariableRole};

fn main() {
// Basic column
let col = Column::new("USUBJID", ColumnData::String(vec![
    Some("001".into()),
    Some("002".into()),
]));

// With full metadata
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
    .with_label("Start Date/Time of Adverse Event")
    .with_format(Format::character(19))
    .with_length(19);

// With role
let col = Column::with_role(
    "USUBJID",
    VariableRole::Identifier,
    ColumnData::String(vec![Some("001".into())]),
);
}

Column Properties

// Name
let name: &str = col.name();

// Label (optional)
let label: Option<&xportrs::Label> = col.label();

// Data
let data: &ColumnData = col.data();

// Length
let len: usize = col.len();

// Explicit length override
let explicit_len: Option<usize> = col.explicit_length();

// Role
let role: Option<VariableRole> = col.role();

// Format
let format: Option<&Format> = col.format();

// Informat
let informat: Option<&Format> = col.informat();

Builder Methods

use xportrs::{Column, ColumnData, Format};

fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("VAR", data)
    .with_label("Variable Label")
    .with_format(Format::numeric(8, 2))
    .with_informat(Format::numeric(8, 2))
    .with_length(200);

// Parse format from string
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("DATE", data)
    .with_format_str("DATE9.")?;
Ok(())
}

ColumnData

ColumnData is an enum representing the typed data within a column.

Variants

use xportrs::ColumnData;

fn main() {
// Floating-point numbers
let floats = ColumnData::F64(vec![Some(1.0), Some(2.0), None]);

// Integers (converted to f64 on write)
let ints = ColumnData::I64(vec![Some(1), Some(2), None]);

// Booleans (converted to f64: 1.0/0.0)
let bools = ColumnData::Bool(vec![Some(true), Some(false), None]);

// Strings
let strings = ColumnData::String(vec![Some("hello".into()), None]);

// Binary data
let bytes = ColumnData::Bytes(vec![Some(vec![0x01, 0x02]), None]);
}

From Conversions

use xportrs::ColumnData;

fn main() {
// From Vec<f64>
let data: ColumnData = vec![1.0, 2.0, 3.0].into();

// From Vec<&str>
let data: ColumnData = vec!["a", "b", "c"].into();

// From Vec<String>
let data: ColumnData = vec!["a".to_string(), "b".to_string()].into();

// From Vec<i64>
let data: ColumnData = vec![1i64, 2, 3].into();

// From Vec<bool>
let data: ColumnData = vec![true, false, true].into();
}

Accessing Data

match col.data() {
    ColumnData::F64(values) => {
        for value in values {
            match value {
                Some(v) => println!("Value: {}", v),
                None => println!("Missing"),
            }
        }
    }
    ColumnData::String(values) => {
        for value in values {
            if let Some(s) = value {
                println!("Value: {}", s);
            }
        }
    }
    // ... handle other variants
    _ => {}
}

Common Traits

Both Dataset and Column implement standard Rust traits:

use xportrs::{Dataset, Column};

// Clone
let dataset2 = dataset.clone();
let col2 = col.clone();

// Debug
println!("{:?}", dataset);
println!("{:?}", col);

// Display
println!("{}", dataset);  // "AE (10 rows, 5 cols)"
println!("{}", col);      // "USUBJID: String[10]"

// PartialEq
assert_eq!(dataset1, dataset2);
assert_eq!(col1, col2);

// Send + Sync (thread-safe)
std::thread::spawn(move || {
    println!("{}", dataset.nrows());
});

Error Handling

Dataset creation can fail:

use xportrs::{Dataset, Column, ColumnData};

fn main() {
// Column length mismatch
let result = Dataset::new("AE", vec![
    Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
    Column::new("B", ColumnData::F64(vec![Some(1.0)])),  // Different length!
]);

match result {
    Ok(ds) => println!("Created dataset"),
    Err(e) => eprintln!("Error: {}", e),
}
}

Example: Complete Dataset

use xportrs::{Column, ColumnData, Dataset, Format, VariableRole, Xpt};

fn create_ae_dataset() -> xportrs::Result<Dataset> {
    let dataset = Dataset::with_label("AE", "Adverse Events", vec![
        Column::with_role(
            "STUDYID",
            VariableRole::Identifier,
            ColumnData::String(vec![Some("ABC-123".into())]),
        )
        .with_label("Study Identifier")
        .with_format(Format::character(20)),

        Column::with_role(
            "USUBJID",
            VariableRole::Identifier,
            ColumnData::String(vec![Some("ABC-123-001".into())]),
        )
        .with_label("Unique Subject Identifier")
        .with_format(Format::character(40)),

        Column::with_role(
            "AESEQ",
            VariableRole::Topic,
            ColumnData::F64(vec![Some(1.0)]),
        )
        .with_label("Sequence Number")
        .with_format(Format::numeric(8, 0)),

        Column::new("AETERM", ColumnData::String(vec![Some("HEADACHE".into())]))
            .with_label("Reported Term for the Adverse Event")
            .with_format(Format::character(200))
            .with_length(200),

        Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
            .with_label("Start Date/Time of Adverse Event")
            .with_format(Format::character(19))
            .with_length(19),
    ])?;

    Ok(dataset)
}
fn main() { let _ = create_ae_dataset(); }

Format Type

The Format type represents a SAS display format or informat. It provides parsing and construction of format specifications.

Overview

SAS formats control how values are displayed or read:

| Format | Description | Example Output |
|--------|-------------|----------------|
| DATE9. | Date format | 15JAN2024 |
| 8.2 | Numeric with decimals | 123.45 |
| $CHAR200. | Character format | Hello World |
| BEST12. | Best numeric representation | 123456789012 |

Creating Formats

Parsing from String

use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Date format
let date_fmt = Format::parse("DATE9.")?;
assert_eq!(date_fmt.name(), "DATE");
assert_eq!(date_fmt.length(), 9);

// Numeric format with decimals
let num_fmt = Format::parse("8.2")?;
assert_eq!(num_fmt.name(), "");
assert_eq!(num_fmt.length(), 8);
assert_eq!(num_fmt.decimals(), 2);

// Character format
let char_fmt = Format::parse("$CHAR200.")?;
assert_eq!(char_fmt.name(), "$CHAR");
assert_eq!(char_fmt.length(), 200);
assert!(char_fmt.is_character());
Ok(())
}

Using Constructors

use xportrs::Format;
// Numeric format
let num = Format::numeric(8, 2);
assert_eq!(num.length(), 8);
assert_eq!(num.decimals(), 2);

// Character format
let char_fmt = Format::character(200);
assert_eq!(char_fmt.name(), "$CHAR");
assert_eq!(char_fmt.length(), 200);

From NAMESTR Fields

When reading XPT files, formats are reconstructed from NAMESTR fields:

use xportrs::Format;
// Reconstruct from XPT fields
let format = Format::from_namestr(
    "DATE    ",  // nform (8 bytes, space-padded)
    9,           // nfl (format length)
    0,           // nfd (format decimals)
    1,           // nfj (justification: 0=left, 1=right)
);

assert_eq!(format.name(), "DATE");
assert_eq!(format.length(), 9);

Format Properties

use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("$CHAR200.")?;

// Format name (may include $ prefix)
let name: &str = format.name();  // "$CHAR"

// Name without $ prefix
let stripped: &str = format.name_without_prefix();  // "CHAR"

// Total display width
let length: usize = format.length();  // 200

// Decimal places
let decimals: usize = format.decimals();  // 0

// Is it a character format?
let is_char: bool = format.is_character();  // true

// Display representation
println!("{}", format);  // "$CHAR200."
Ok(())
}

Common Format Patterns

Date Formats

use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Standard date formats
let date9 = Format::parse("DATE9.")?;      // 15JAN2024
let date7 = Format::parse("DATE7.")?;      // 15JAN24
let yymmdd = Format::parse("YYMMDD10.")?;  // 2024-01-15
let e8601 = Format::parse("E8601DA10.")?;  // 2024-01-15
Ok(())
}

DateTime Formats

use xportrs::Format;
fn main() -> xportrs::Result<()> {
let datetime = Format::parse("DATETIME20.")?;  // 15JAN2024:14:30:00
let e8601dt = Format::parse("E8601DT19.")?;    // 2024-01-15T14:30:00
Ok(())
}

Numeric Formats

use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Bare numeric format
let bare = Format::parse("8.")?;    // 8 characters, 0 decimals
let decimal = Format::parse("8.2")?;  // 8 characters, 2 decimals

// Named numeric formats
let best = Format::parse("BEST12.")?;    // Best representation
let comma = Format::parse("COMMA10.2")?; // Comma-separated
Ok(())
}

Character Formats

use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Character formats start with $
let char200 = Format::parse("$CHAR200.")?;
let char40 = Format::parse("$40.")?;  // Shorthand for $CHAR40.
Ok(())
}

Using Formats with Columns

Setting Format on Column

use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
// Using Format object
let col = Column::new("AESTDTC", data.clone())
    .with_format(Format::character(19));

// Parsing from string
let col = Column::new("AESTDT", data.clone())
    .with_format_str("DATE9.")?;

// Using constructor
let col = Column::new("VALUE", data)
    .with_format(Format::numeric(8, 2));
Ok(())
}

Setting Informat

Informats control how data is read:

use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("RAWDATE", data)
    .with_informat(Format::parse("DATE9.")?);
Ok(())
}

Format in XPT Files

When written to XPT, formats are stored in the NAMESTR record:

| Field | Size | Description |
|-------|------|-------------|
| nform | 8 bytes | Format name (space-padded) |
| nfl | 2 bytes | Format length |
| nfd | 2 bytes | Format decimals |
| nfj | 2 bytes | Justification (0=left, 1=right) |

use xportrs::{Column, ColumnData, Format, Xpt};
fn main() -> xportrs::Result<()> {
let col = Column::new("AESTDT", ColumnData::F64(vec![Some(23391.0)]))
    .with_format_str("DATE9.")?;

// When written, NAMESTR will contain:
// nform = "DATE    "
// nfl = 9
// nfd = 0
// nfj = 1 (right-justified)
Ok(())
}

Format Validation

Invalid format strings return errors:

use xportrs::Format;
// Missing period
let result = Format::parse("DATE9");
assert!(result.is_err());

// Invalid syntax
let result = Format::parse("INVALID");
assert!(result.is_err());

// Empty string
let result = Format::parse("");
assert!(result.is_err());

Display and Debug

use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("DATE9.")?;

// Display: canonical format string
println!("{}", format);  // "DATE9."

// Debug: detailed representation
println!("{:?}", format);  // Format { name: "DATE", length: 9, ... }
Ok(())
}

Common Traits

use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("DATE9.")?;

// Clone
let format2 = format.clone();

// PartialEq
assert_eq!(Format::parse("DATE9.")?, Format::parse("DATE9.")?);

// Debug
println!("{:?}", format);

// Display
println!("{}", format);
Ok(())
}

FDA Format Recommendations

[!TIP] The FDA recommends avoiding custom SAS formats. Use standard formats like DATE9., DATETIME20., or simple numeric formats.

Recommended formats:

| Type | Recommended Format |
|------|--------------------|
| Date (numeric) | DATE9. |
| DateTime (numeric) | DATETIME20. |
| Time (numeric) | TIME8. |
| Numeric | 8., 8.2 |
| Character | $CHAR200., $40. |

Avoid:

  • Custom user-defined formats
  • Formats requiring external catalogs
  • Regional-specific formats
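One way to honor this guidance is to pre-screen format names against a small whitelist before writing. The helper below is a hypothetical sketch, not xportrs behavior: both the whitelist and the name-extraction logic are simplifications for illustration.

```rust
// Illustrative pre-check: flag format specs whose name is not in a
// small whitelist of standard SAS formats, since anything else would
// likely require a user-defined format catalog.
fn is_standard_format(spec: &str) -> bool {
    const STANDARD: &[&str] = &["DATE", "DATETIME", "TIME", "BEST", "$CHAR", "$", ""];
    // Recover the name by stripping the trailing "w.d" part,
    // e.g. "DATE9." -> "DATE", "8.2" -> "" (bare numeric)
    let name: String = spec.chars().take_while(|c| !c.is_ascii_digit()).collect();
    STANDARD.contains(&name.trim_end_matches('.'))
}

fn main() {
    assert!(is_standard_format("DATE9."));
    assert!(is_standard_format("8.2"));       // bare numeric
    assert!(is_standard_format("$CHAR200."));
    assert!(!is_standard_format("MYFMT8."));  // would need a catalog
}
```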

Reading XPT Files

xportrs provides multiple ways to read XPT files, from simple one-liners to detailed inspection.

Quick Read

The simplest way to read an XPT file:

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;

println!("Domain: {}", dataset.domain_code());
println!("Rows: {}", dataset.nrows());
println!("Columns: {}", dataset.ncols());
Ok(())
}

Reading Multiple Members

XPT files can contain multiple datasets (members):

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read all members
let datasets = Xpt::read_all("multi.xpt")?;

for dataset in datasets {
    println!("{}: {} rows", dataset.domain_code(), dataset.nrows());
}

// Read specific member
let ae = Xpt::read_member("multi.xpt", "AE")?;
Ok(())
}

Inspecting Files

Get file metadata without loading all data:

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let info = Xpt::inspect("data.xpt")?;

// File timestamps
if let Some(created) = &info.created {
    println!("Created: {}", created);
}

// List members
for name in info.member_names() {
    println!("Member: {}", name);
}

// Find specific member
if let Some(member) = info.find_member("AE") {
    println!("AE has {} variables", member.variables.len());
}
Ok(())
}

Builder API

For more control, use the reader builder:

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::reader("data.xpt")
    .row_limit(1000)     // Read only first 1000 rows
    .read()?;            // Read first/only member
Ok(())
}

Row Limiting

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read only first 100 rows (useful for previews)
let preview = Xpt::reader("large.xpt")
    .row_limit(100)
    .read()?;

println!("Preview: {} rows", preview.nrows());
Ok(())
}

Reading from Buffers

Read from in-memory data:

use std::io::Cursor;
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let xpt_bytes: Vec<u8> = vec![]; // Your XPT data here
let cursor = Cursor::new(xpt_bytes);

let dataset = Xpt::reader_from(cursor).read()?;
Ok(())
}

Accessing Data

Once loaded, access the data through the Dataset API:

use xportrs::{ColumnData, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;

// Access by column name
let usubjid = &dataset["USUBJID"];
let aeseq = &dataset["AESEQ"];

// Iterate over column data
if let ColumnData::String(values) = usubjid.data() {
    for (i, value) in values.iter().enumerate() {
        match value {
            Some(s) => println!("Row {}: {}", i, s),
            None => println!("Row {}: <missing>", i),
        }
    }
}

if let ColumnData::F64(values) = aeseq.data() {
    for (i, value) in values.iter().enumerate() {
        match value {
            Some(v) => println!("Row {}: {}", i, v),
            None => println!("Row {}: <missing>", i),
        }
    }
}
Ok(())
}

Metadata Preservation

xportrs preserves metadata when reading:

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;

// Dataset label
if let Some(label) = dataset.dataset_label() {
    println!("Dataset label: {}", label);
}

// Column metadata
for col in dataset.columns() {
    println!("Variable: {}", col.name());

    if let Some(label) = col.label() {
        println!("  Label: {}", label);
    }

    if let Some(format) = col.format() {
        println!("  Format: {}", format);
    }

    if let Some(len) = col.explicit_length() {
        println!("  Length: {}", len);
    }
}
Ok(())
}

Error Handling

use xportrs::{Error, Xpt};
match Xpt::read("missing.xpt") {
    Ok(dataset) => println!("Loaded {} rows", dataset.nrows()),
    Err(Error::Io(e)) => eprintln!("IO error: {}", e),
    Err(Error::MemberNotFound { domain_code }) => {
        eprintln!("Member not found: {}", domain_code);
    }
    Err(e) => eprintln!("Error: {}", e),
}

Reading Large Files

For large files, consider:

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// 1. Preview first to understand structure
let info = Xpt::inspect("large.xpt")?;
println!("File has {} members", info.members.len());

// 2. Read with row limit for preview
let preview = Xpt::reader("large.xpt")
    .row_limit(100)
    .read()?;

// 3. Read specific columns of interest
let full = Xpt::read("large.xpt")?;
let columns_of_interest = ["USUBJID", "AETERM", "AESTDTC"];
for name in columns_of_interest {
    if let Some(col) = full.column(name) {
        println!("{}: {} values", name, col.len());
    }
}
Ok(())
}

Thread Safety

Datasets are Send + Sync, allowing concurrent access:

use std::sync::Arc;
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Arc::new(Xpt::read("ae.xpt")?);

let handles: Vec<_> = (0..4).map(|i| {
    let ds = Arc::clone(&dataset);
    std::thread::spawn(move || {
        println!("Thread {}: {} rows", i, ds.nrows());
    })
}).collect();

for handle in handles {
    handle.join().unwrap();
}
Ok(())
}

Example: Read and Process

use xportrs::{ColumnData, Xpt};

fn process_adverse_events(path: &str) -> xportrs::Result<()> {
    let dataset = Xpt::read(path)?;

    // Verify expected columns
    let required = ["USUBJID", "AETERM", "AESEV"];
    for name in required {
        if dataset.column(name).is_none() {
            return Err(xportrs::Error::invalid_data(
                format!("Missing required column: {}", name)
            ));
        }
    }

    // Process data
    let usubjid = &dataset["USUBJID"];
    let aeterm = &dataset["AETERM"];
    let aesev = &dataset["AESEV"];

    if let (
        ColumnData::String(subjects),
        ColumnData::String(terms),
        ColumnData::String(severities),
    ) = (usubjid.data(), aeterm.data(), aesev.data()) {
        for i in 0..dataset.nrows() {
            let subj = subjects[i].as_deref().unwrap_or("?");
            let term = terms[i].as_deref().unwrap_or("?");
            let sev = severities[i].as_deref().unwrap_or("?");
            println!("{}: {} ({})", subj, term, sev);
        }
    }

    Ok(())
}

Writing XPT Files

xportrs provides a builder API for writing XPT files with validation.

Basic Writing

The simplest way to write an XPT file:

use xportrs::{Column, ColumnData, Dataset, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
    Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
    Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;

Xpt::writer(dataset)
    .finalize()?
    .write_path("ae.xpt")?;
Ok(())
}

Writer Builder

The writer builder provides options for validation and output:

use xportrs::{Agency, Dataset, Xpt, Column, ColumnData};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)           // Agency-specific validation
    .finalize()?;                  // Validate and prepare

// Check validation results
if validated.has_errors() {
    for issue in validated.issues() {
        eprintln!("{}", issue);
    }
    return Err("Validation failed".into());
}

// Write if valid
validated.write_path("output.xpt")?;
Ok(())
}

Validation Workflow

graph LR
    A[Dataset] --> B[Xpt::writer]
    B --> C[Configure]
    C --> D[finalize]
    D --> E{Valid?}
    E --> |Yes| F[write_path]
    E --> |No| G[Handle Errors]

Checking Issues

use xportrs::{Severity, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;

// Check for any issues
println!("Has errors: {}", validated.has_errors());
println!("Has warnings: {}", validated.has_warnings());

// Get all issues
for issue in validated.issues() {
    match issue.severity() {
        Severity::Error => eprintln!("ERROR: {}", issue),
        Severity::Warning => eprintln!("WARNING: {}", issue),
        Severity::Info => println!("INFO: {}", issue),
    }
}
Ok(())
}

Agency Validation

Different agencies have different requirements:

use xportrs::{Agency, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
// FDA (strict ASCII)
let fda_result = Xpt::writer(dataset.clone())
    .agency(Agency::FDA)
    .finalize()?;

// PMDA (allows extended characters)
let pmda_result = Xpt::writer(dataset.clone())
    .agency(Agency::PMDA)
    .finalize()?;

// NMPA
let nmpa_result = Xpt::writer(dataset)
    .agency(Agency::NMPA)
    .finalize()?;
Ok(())
}

Writing to Different Destinations

Write to File Path

let validated = todo!();  // a ValidatedWrite from Xpt::writer(dataset).finalize()?
validated.write_path("output.xpt")?;

Write to Buffer

let validated = todo!();  // a ValidatedWrite from Xpt::writer(dataset).finalize()?
let mut buffer = Vec::new();
validated.write_to(&mut buffer)?;

// buffer now contains the XPT bytes
println!("Wrote {} bytes", buffer.len());

Write to Any Writer

use std::fs::File;
use std::io::BufWriter;
let validated = todo!();  // a ValidatedWrite from Xpt::writer(dataset).finalize()?
let file = File::create("output.xpt")?;
let mut writer = BufWriter::new(file);
validated.write_to(&mut writer)?;

File Splitting

Large datasets are automatically split:

use xportrs::{Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let large_dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let paths = Xpt::writer(large_dataset)
    .max_file_size_gb(5.0)  // Default is 5.0
    .finalize()?
    .write_path("ae.xpt")?;  // May create ae_001.xpt, ae_002.xpt, etc.

for path in paths {
    println!("Wrote: {}", path.display());
}
Ok(())
}

Complete Example

use xportrs::{Agency, Column, ColumnData, Dataset, Format, Xpt};

fn write_adverse_events() -> xportrs::Result<()> {
    // Create dataset with full metadata
    let dataset = Dataset::with_label("AE", "Adverse Events", vec![
        Column::new("STUDYID", ColumnData::String(vec![
            Some("ABC-123".into()),
            Some("ABC-123".into()),
        ]))
        .with_label("Study Identifier")
        .with_format(Format::character(20)),

        Column::new("USUBJID", ColumnData::String(vec![
            Some("ABC-123-001".into()),
            Some("ABC-123-002".into()),
        ]))
        .with_label("Unique Subject Identifier")
        .with_format(Format::character(40)),

        Column::new("AESEQ", ColumnData::F64(vec![
            Some(1.0),
            Some(1.0),
        ]))
        .with_label("Sequence Number")
        .with_format(Format::numeric(8, 0)),

        Column::new("AETERM", ColumnData::String(vec![
            Some("HEADACHE".into()),
            Some("NAUSEA".into()),
        ]))
        .with_label("Reported Term for the Adverse Event")
        .with_format(Format::character(200))
        .with_length(200),

        Column::new("AESTDTC", ColumnData::String(vec![
            Some("2024-01-15".into()),
            Some("2024-01-16".into()),
        ]))
        .with_label("Start Date/Time of Adverse Event")
        .with_format(Format::character(19))
        .with_length(19),
    ])?;

    // Validate with FDA rules
    let validated = Xpt::writer(dataset)
        .agency(Agency::FDA)
        .finalize()?;

    // Report validation issues
    if validated.has_warnings() {
        println!("Warnings:");
        for issue in validated.issues() {
            if issue.severity() == xportrs::Severity::Warning {
                println!("  - {}", issue);
            }
        }
    }

    if validated.has_errors() {
        eprintln!("Cannot write due to errors:");
        for issue in validated.issues() {
            if issue.severity() == xportrs::Severity::Error {
                eprintln!("  - {}", issue);
            }
        }
        return Err(xportrs::Error::invalid_data("Validation failed"));
    }

    // Write the file
    validated.write_path("ae.xpt")?;
    println!("Successfully wrote ae.xpt");

    Ok(())
}

Error Handling

use xportrs::{Error, Xpt, Dataset, Column, ColumnData};
fn main() {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))]).unwrap();
let result = Xpt::writer(dataset)
    .finalize()
    .and_then(|v| v.write_path("output.xpt"));

match result {
    Ok(paths) => {
        for path in paths {
            println!("Wrote: {}", path.display());
        }
    }
    Err(Error::Io(e)) => eprintln!("IO error: {}", e),
    Err(Error::InvalidSchema { message }) => {
        eprintln!("Schema error: {}", message);
    }
    Err(e) => eprintln!("Error: {}", e),
}
}

Best Practices

[!TIP] Always check validation results before deploying files to production or submission.

  1. Add metadata: Include labels and formats for all variables
  2. Use agency validation: Specify the target agency for appropriate checks
  3. Handle warnings: Review warnings even if they don’t block writing
  4. Test roundtrip: Verify files can be read back correctly
  5. Check file size: Ensure files don’t exceed agency limits
use xportrs::{Agency, Dataset, Error, Xpt};

// Production-ready writing pattern
fn write_submission_file(dataset: Dataset, path: &str) -> xportrs::Result<()> {
    let validated = Xpt::writer(dataset)
        .agency(Agency::FDA)
        .finalize()?;

    // Log all issues
    for issue in validated.issues() {
        log::info!("{}: {}", issue.severity(), issue);
    }

    // Fail on errors
    if validated.has_errors() {
        return Err(Error::invalid_data("Validation errors present"));
    }

    // Write and verify
    let paths = validated.write_path(path)?;

    // Verify by reading back
    for path in &paths {
        let _ = Xpt::read(path)?;
    }

    Ok(())
}

Validation API

xportrs provides comprehensive validation for XPT files. This page details the validation API.

Validation Overview

graph TB
    subgraph "Validation Pipeline"
        A[Dataset] --> B[Agency Rules]
        B --> C[Format Rules]
        C --> D[CDISC Rules]
        D --> E[Issue Collection]
    end
    
    subgraph "Issue Types"
        E --> F[Errors]
        E --> G[Warnings]
        E --> H[Info]
    end

ValidatedWrite

The ValidatedWrite type represents a validated dataset ready for writing:

use xportrs::{Severity, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;

// Check for issues
println!("Has errors: {}", validated.has_errors());
println!("Has warnings: {}", validated.has_warnings());

// Get all issues
let issues = validated.issues();

// Only write if no errors
if !validated.has_errors() {
    validated.write_path("output.xpt")?;
}
Ok(())
}

Issue Type

The Issue enum represents validation issues:

Issue Variants

use xportrs::Issue;
let issue: Issue = todo!();  // e.g. taken from validated.issues()
match issue {
    Issue::VariableNameTooLong { variable, length } => {
        println!("Variable {} name is {} bytes (max 8)", variable, length);
    }
    Issue::VariableLabelTooLong { variable, length } => {
        println!("Variable {} label is {} bytes (max 40)", variable, length);
    }
    Issue::MissingVariableLabel { variable } => {
        println!("Variable {} is missing a label", variable);
    }
    Issue::MissingDatasetLabel { dataset } => {
        println!("Dataset {} is missing a label", dataset);
    }
    Issue::InvalidFormatSyntax { variable, format, reason } => {
        println!("Variable {} has invalid format '{}': {}", variable, format, reason);
    }
    // ... other variants
    _ => {}
}

Issue Properties

use xportrs::{Severity, Issue};
let issue: Issue = todo!();  // e.g. taken from validated.issues()
// Severity level
let severity: Severity = issue.severity();

// Target (variable name, dataset name, etc.)
let target: &str = issue.target();

// Display representation
println!("{}", issue);

// Debug representation
println!("{:?}", issue);

Severity Levels

use xportrs::Severity;
let severity = Severity::Error;
match severity {
    Severity::Error => {
        // Blocks file writing
        // File would be rejected by agency
    }
    Severity::Warning => {
        // Does not block writing
        // Review recommended
    }
    Severity::Info => {
        // Informational only
        // Best practice suggestion
    }
}

// Severity is ordered
assert!(Severity::Info < Severity::Warning);
assert!(Severity::Warning < Severity::Error);

Filtering Issues

use xportrs::{Xpt, Dataset, Column, ColumnData, Severity};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;

// Get only errors
let errors: Vec<_> = validated.issues()
    .iter()
    .filter(|i| i.severity() == Severity::Error)
    .collect();

// Get issues for specific variable
let usubjid_issues: Vec<_> = validated.issues()
    .iter()
    .filter(|i| i.target() == "USUBJID")
    .collect();

// Count by severity
let error_count = validated.issues()
    .iter()
    .filter(|i| i.severity() == Severity::Error)
    .count();
Ok(())
}

Agency-Specific Validation

use xportrs::{Agency, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
// FDA: Strict ASCII validation
let fda_result = Xpt::writer(dataset.clone())
    .agency(Agency::FDA)
    .finalize()?;

// Check for ASCII violations
for issue in fda_result.issues() {
    if format!("{}", issue).contains("ASCII") {
        println!("ASCII issue: {}", issue);
    }
}
Ok(())
}

Validation Rules

Variable Name Rules

Rule                Severity  Trigger
Empty name          Error     Name is empty string
Name too long       Error     Name > 8 bytes
Invalid characters  Error     Non-alphanumeric (except _)
Starts with number  Error     First char is digit
Non-uppercase       Info      Lowercase letters present

Variable Label Rules

Rule             Severity  Trigger
Missing label    Warning   Label is None or empty
Label too long   Error     Label > 40 bytes
Non-ASCII (FDA)  Error     Non-ASCII characters

Dataset Rules

Rule            Severity  Trigger
Empty name      Error     Domain code is empty
Name too long   Error     Domain code > 8 bytes
Missing label   Warning   Dataset label is None
Label too long  Error     Label > 40 bytes

Data Rules

Rule                    Severity  Trigger
Column length mismatch  Error     Columns have different lengths
Character too long      Error     Character value > 200 bytes
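The two data rules above are simple enough to pre-check outside the library. A standalone sketch in plain Rust (not the xportrs implementation; `data_rule_issues` is a hypothetical helper whose messages mirror the table):

```rust
// Pre-check the two XPT data rules: equal column lengths, and no
// character value longer than 200 bytes.
fn data_rule_issues(columns: &[(&str, Vec<String>)]) -> Vec<String> {
    let mut issues = Vec::new();

    // Rule: all columns must have the same number of rows.
    if let Some((_, first)) = columns.first() {
        if columns.iter().any(|(_, values)| values.len() != first.len()) {
            issues.push("Columns have different lengths".to_string());
        }
    }

    // Rule: no character value may exceed 200 bytes.
    for (name, values) in columns {
        if values.iter().any(|v| v.len() > 200) {
            issues.push(format!("Character value exceeds 200 bytes in {name}"));
        }
    }

    issues
}
```

Running such a check before handing data to the writer gives earlier, cheaper feedback on large datasets.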

Custom Pre-Validation

Add custom validation before xportrs validation:

use xportrs::{Dataset, Xpt};

fn custom_validate(dataset: &Dataset) -> Result<(), String> {
    // Check for required variables
    let required = ["STUDYID", "USUBJID"];
    for var in required {
        if dataset.column(var).is_none() {
            return Err(format!("Missing required variable: {}", var));
        }
    }

    // Check STUDYID consistency
    // ... additional checks ...

    Ok(())
}

fn write_with_validation(dataset: Dataset, path: &str) -> xportrs::Result<()> {
    // Custom validation first
    custom_validate(&dataset)
        .map_err(|e| xportrs::Error::invalid_data(e))?;

    // Then xportrs validation
    let validated = Xpt::writer(dataset).finalize()?;

    if validated.has_errors() {
        return Err(xportrs::Error::invalid_data("Validation failed"));
    }

    validated.write_path(path)?;
    Ok(())
}

Validation Reporting

use xportrs::{Severity, Xpt};

fn report_validation(dataset: xportrs::Dataset) {
    let validated = Xpt::writer(dataset).finalize().unwrap();

    // Summary
    let errors = validated.issues().iter()
        .filter(|i| i.severity() == Severity::Error).count();
    let warnings = validated.issues().iter()
        .filter(|i| i.severity() == Severity::Warning).count();
    let infos = validated.issues().iter()
        .filter(|i| i.severity() == Severity::Info).count();

    println!("Validation Summary:");
    println!("  Errors:   {}", errors);
    println!("  Warnings: {}", warnings);
    println!("  Info:     {}", infos);

    // Detailed report
    if !validated.issues().is_empty() {
        println!("\nDetails:");
        for issue in validated.issues() {
            let prefix = match issue.severity() {
                Severity::Error => "ERROR",
                Severity::Warning => "WARN ",
                Severity::Info => "INFO ",
            };
            println!("  [{}] {} - {}", prefix, issue.target(), issue);
        }
    }
}

Integration with Pinnacle 21

[!NOTE] xportrs validation covers XPT-level rules. For complete CDISC validation, use Pinnacle 21 or similar tools.

Validation Area            xportrs  Pinnacle 21
Variable names             ✓        ✓
Variable labels            ✓        ✓
Format metadata            ✓        ✓
Controlled terminology     —        ✓
Required variables         —        ✓
Cross-dataset consistency  —        ✓
define.xml matching        —        ✓

Best Practices

  1. Validate early: Check validation before processing large datasets
  2. Log all issues: Keep records of validation results
  3. Fail on errors: Don’t write files with validation errors
  4. Review warnings: Warnings may indicate data quality issues
  5. Document exceptions: If shipping with warnings, document why

Metadata

xportrs provides rich metadata support for XPT files, ensuring CDISC compliance and data clarity.

Metadata Overview

graph TB
    subgraph "Dataset Level"
        A[Domain Code] --> B[Dataset Label]
    end
    
    subgraph "Variable Level"
        C[Variable Name] --> D[Variable Label]
        D --> E[Format]
        E --> F[Informat]
        F --> G[Length]
        G --> H[Role]
    end

Dataset Metadata

Domain Code

The domain code is the dataset name (1-8 characters):

use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
let dataset = Dataset::new("AE", columns)?;

// Access domain code
let code: &str = dataset.domain_code();  // "AE"
Ok(())
}

Dataset Label

The dataset label provides a description (0-40 characters):

use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// Set at construction
let dataset = Dataset::with_label("AE", "Adverse Events", columns.clone())?;

// Or set later
let mut dataset = Dataset::new("AE", columns)?;
dataset.set_label("Adverse Events");

// Access
let label: Option<&str> = dataset.dataset_label();
Ok(())
}

Variable Metadata

Variable Name

Variable names follow SAS naming rules:

use xportrs::{Column, ColumnData, VariableName};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
// Name is set at construction
let col = Column::new("USUBJID", data);

// Access name
let name: &str = col.name();

// VariableName type for validation
let var_name = VariableName::new("USUBJID");
assert_eq!(var_name.as_str(), "USUBJID");
}

Variable Label

Labels describe the variable (0-40 characters):

use xportrs::{Column, ColumnData, Label};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
let col = Column::new("USUBJID", data)
    .with_label("Unique Subject Identifier");

// Access label
if let Some(label) = col.label() {
    println!("Label: {}", label);
}

// Label type
let label = Label::new("Unique Subject Identifier");
assert_eq!(label.as_str(), "Unique Subject Identifier");
}

Format

Display formats control how values are shown:

use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
// Using Format object
let col = Column::new("AESTDT", data.clone())
    .with_format(Format::parse("DATE9.")?);

// Using format string
let col = Column::new("AESTDT", data)
    .with_format_str("DATE9.")?;

// Access format
if let Some(format) = col.format() {
    println!("Format: {}", format);
}
Ok(())
}

Informat

Input formats control how values are read:

use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("RAWDATE", data)
    .with_informat(Format::parse("DATE9.")?);

if let Some(informat) = col.informat() {
    println!("Informat: {}", informat);
}
Ok(())
}

Length

Explicit length for character variables:

use xportrs::{Column, ColumnData};
fn main() {
// Auto-derived from data
let col = Column::new("VAR", ColumnData::String(vec![
    Some("Hello".into()),  // 5 characters
    Some("World".into()),  // 5 characters
]));
// Length will be 5

// Explicit override
let data = ColumnData::String(vec![Some("text".into())]);
let col = Column::new("VAR", data)
    .with_length(200);  // Force 200 bytes

// Access
if let Some(len) = col.explicit_length() {
    println!("Explicit length: {}", len);
}
}

Variable Role

Roles categorize variables per CDISC:

use xportrs::{Column, ColumnData, VariableRole};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
let col = Column::with_role(
    "USUBJID",
    VariableRole::Identifier,
    data,
);

// Available roles
let roles = [
    VariableRole::Identifier,
    VariableRole::Topic,
    VariableRole::Timing,
    VariableRole::Qualifier,
    VariableRole::Rule,
    VariableRole::Synonym,
    VariableRole::Record,
];

// Access role
if let Some(role) = col.role() {
    println!("Role: {:?}", role);
}
}

Metadata Types

DomainCode

use xportrs::DomainCode;
fn main() {
let code = DomainCode::new("AE");

// Access
let s: &str = code.as_str();
let code2 = DomainCode::new("AE");
let owned: String = code2.into_inner();

// Traits
assert_eq!(code, DomainCode::new("AE"));
println!("{}", code);  // "AE"
}

Label

use xportrs::Label;
fn main() {
let label = Label::new("Adverse Events");

// Access
let s: &str = label.as_str();
let label2 = Label::new("AE");
let owned: String = label2.into_inner();

// From string
let label: Label = "Test".into();
}

VariableName

use xportrs::VariableName;
fn main() {
let name = VariableName::new("USUBJID");

// Access
let s: &str = name.as_str();
let name2 = VariableName::new("TEST");
let owned: String = name2.into_inner();

// Validation (at construction or later)
// Names are uppercased automatically
let name = VariableName::new("usubjid");
assert_eq!(name.as_str(), "USUBJID");
}

Metadata in XPT Files

NAMESTR Record Storage

Field     Offset  Size  Description
nname     8-15    8     Variable name
nlabel    16-55   40    Variable label
nform     56-63   8     Format name
nfl       64-65   2     Format length
nfd       66-67   2     Format decimals
nfj       68-69   2     Format justification
niform    72-79   8     Informat name
nifl      80-81   2     Informat length
nifd      82-83   2     Informat decimals

Reading Metadata

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;

// Dataset metadata
println!("Domain: {}", dataset.domain_code());
if let Some(label) = dataset.dataset_label() {
    println!("Label: {}", label);
}

// Variable metadata
for col in dataset.columns() {
    println!("\n{}", col.name());
    if let Some(label) = col.label() {
        println!("  Label: {}", label);
    }
    if let Some(format) = col.format() {
        println!("  Format: {}", format);
    }
    if let Some(informat) = col.informat() {
        println!("  Informat: {}", informat);
    }
    if let Some(len) = col.explicit_length() {
        println!("  Length: {}", len);
    }
    if let Some(role) = col.role() {
        println!("  Role: {:?}", role);
    }
}
Ok(())
}

Preserving Metadata on Roundtrip

use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read
let original = Xpt::read("ae.xpt")?;

// Modify (metadata preserved)
// ...

// Write
Xpt::writer(original.clone())
    .finalize()?
    .write_path("ae_modified.xpt")?;

// Verify
let reloaded = Xpt::read("ae_modified.xpt")?;
assert_eq!(reloaded.dataset_label(), original.dataset_label());
Ok(())
}

Metadata and Define-XML

[!IMPORTANT] Variable labels in XPT files should match those in define.xml. Pinnacle 21 validates this consistency.

use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let data = ColumnData::String(vec![Some("test".into())]);
// Create dataset with labels matching define.xml
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
    Column::new("STUDYID", data.clone())
        .with_label("Study Identifier"),  // Must match define.xml
    Column::new("USUBJID", data)
        .with_label("Unique Subject Identifier"),  // Must match define.xml
    // ...
])?;
Ok(())
}

Best Practices

  1. Always include labels: Labels help reviewers understand data
  2. Use standard formats: DATE9., DATETIME20., $CHARn.
  3. Set explicit lengths: Control character variable lengths
  4. Assign roles: Categorize variables per CDISC
  5. Verify roundtrip: Ensure metadata survives read/write cycles

use xportrs::{Column, ColumnData, Format, VariableRole};
// Complete metadata example
let col = Column::with_role(
    "AESTDTC",
    VariableRole::Timing,
    ColumnData::String(vec![Some("2024-01-15".into())]),
)
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19);

Regulatory Compliance Overview

xportrs is designed to produce XPT files that meet the requirements of major regulatory agencies for clinical trial data submissions.

Supported Agencies

Agency  Region         Standards                   xportrs Support
FDA     United States  CDISC SDTM/ADaM             Full validation
PMDA    Japan          CDISC + J-SDTM extensions   Full validation
NMPA    China          CDISC + local requirements  Full validation
EMA     Europe         CDISC SDTM/ADaM             Full validation

Key Requirements

All agencies require XPT V5 format files that conform to the SAS Transport specification (TS-140). The key requirements are:

Variable Requirements

graph LR
    subgraph "Variable Constraints"
        A[Name ≤8 bytes] --> B[Uppercase A-Z, 0-9, _]
        B --> C[Must start with letter]
        D[Label ≤40 bytes] --> E[ASCII only for FDA]
        F[Char length ≤200] --> G[Numeric = 8 bytes]
    end

Dataset Requirements

  • Dataset name: 1-8 bytes, uppercase alphanumeric
  • Dataset label: 0-40 bytes (recommended for reviewer clarity)
  • File size: ≤5GB per file (auto-split supported)
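The 5GB limit makes it easy to estimate up front how many parts a large dataset will split into. A back-of-envelope sketch (plain Rust, not xportrs internals; it ignores header and padding overhead):

```rust
// Estimate how many XPT parts a dataset needs under a per-file byte limit.
// Uses ceiling division; real files also carry header/padding overhead.
fn num_parts(rows: u64, row_len_bytes: u64, max_bytes: u64) -> u64 {
    let total = rows * row_len_bytes;
    (total + max_bytes - 1) / max_bytes
}
```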

Format Requirements

XPT files must use:

  • XPT V5 (TS-140) transport structure with 80-byte records
  • 8-byte IBM floating-point encoding for all numeric values
  • Fixed-width, space-padded character fields of 1-200 bytes

Validation Levels

xportrs provides three severity levels for validation issues:

Severity  Meaning                    Example
Error     File will not be accepted  Variable name >8 bytes
Warning   Review recommended         Missing variable label
Info      Best practice suggestion   Non-standard format

[!IMPORTANT] Only Error severity issues block file writing. Warnings and info messages are advisory.

Agency-Specific Rules

FDA (United States)

The FDA requires strict ASCII compliance for all text:

#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};

let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;

// Check for FDA-specific issues
for issue in validated.issues() {
    println!("[{}] {}", issue.severity(), issue);
}
}

PMDA (Japan)

PMDA allows Shift-JIS encoding for Japanese text in certain fields:

#![allow(unused)]
fn main() {
use xportrs::{Agency, TextMode, Xpt};

let validated = Xpt::writer(dataset)
    .agency(Agency::PMDA)
    .text_mode(TextMode::Latin1)  // Extended character support
    .finalize()?;
}

NMPA (China)

NMPA follows CDISC standards with additional local requirements:

#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};

let validated = Xpt::writer(dataset)
    .agency(Agency::NMPA)
    .finalize()?;
}

Compliance Verification

Using xportrs Validation

#![allow(unused)]
fn main() {
let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;

// Count issues by severity
let errors = validated.issues().iter()
    .filter(|i| i.severity() == Severity::Error)
    .count();

if errors > 0 {
    eprintln!("{} blocking errors found", errors);
}
}

External Validation (Pinnacle 21)

After generating XPT files, we recommend validation with Pinnacle 21 Community:

  1. Download from Pinnacle 21
  2. Run validation against your XPT files and define.xml
  3. Review any SD (Study Data) rule violations

[!NOTE] xportrs handles XPT-level compliance. Dataset content validation (controlled terminology, required variables) requires external tools like Pinnacle 21.

Official Sources

CDISC Standards

The Clinical Data Interchange Standards Consortium (CDISC) defines the data models and metadata standards used in clinical trial submissions.

CDISC Data Models

graph TB
    subgraph "CDISC Standards Hierarchy"
        CDASH[CDASH<br/>Data Collection] --> SDTM[SDTM<br/>Tabulation]
        SDTM --> ADaM[ADaM<br/>Analysis]
        ADaM --> TFL[Tables, Figures,<br/>Listings]
    end

    subgraph "Submission Package"
        SDTM --> XPT1[SDTM XPT Files]
        ADaM --> XPT2[ADaM XPT Files]
        XPT1 --> DEFINE[define.xml]
        XPT2 --> DEFINE
    end

SDTM (Study Data Tabulation Model)

SDTM is the standard for organizing clinical trial tabulation data. Each domain (dataset) represents a specific type of data:

Domain  Description              Common Variables
DM      Demographics             STUDYID, USUBJID, AGE, SEX
AE      Adverse Events           AETERM, AESTDTC, AESEV
CM      Concomitant Medications  CMTRT, CMDOSE
LB      Laboratory Results       LBTESTCD, LBORRES
VS      Vital Signs              VSTESTCD, VSORRES
EX      Exposure                 EXTRT, EXDOSE

Creating SDTM Datasets

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Dataset, Format, Xpt};

// Demographics (DM) domain
let dm = Dataset::with_label("DM", "Demographics", vec![
    Column::new("STUDYID", ColumnData::String(vec![Some("ABC-123".into())]))
        .with_label("Study Identifier"),
    Column::new("USUBJID", ColumnData::String(vec![Some("ABC-123-001".into())]))
        .with_label("Unique Subject Identifier"),
    Column::new("AGE", ColumnData::F64(vec![Some(45.0)]))
        .with_label("Age"),
    Column::new("SEX", ColumnData::String(vec![Some("M".into())]))
        .with_label("Sex"),
    Column::new("RACE", ColumnData::String(vec![Some("WHITE".into())]))
        .with_label("Race"),
])?;
}

ADaM (Analysis Data Model)

ADaM is the standard for analysis datasets derived from SDTM:

Dataset  Description              Purpose
ADSL     Subject-Level Analysis   One row per subject
ADAE     Adverse Events Analysis  One row per event
ADLB     Laboratory Analysis      Derived lab values
ADTTE    Time-to-Event            Survival analysis

Creating ADaM Datasets

#![allow(unused)]
fn main() {
// Subject-Level Analysis Dataset (ADSL)
let adsl = Dataset::with_label("ADSL", "Subject Level Analysis", vec![
    Column::new("STUDYID", ColumnData::String(vec![Some("ABC-123".into())]))
        .with_label("Study Identifier"),
    Column::new("USUBJID", ColumnData::String(vec![Some("ABC-123-001".into())]))
        .with_label("Unique Subject Identifier"),
    Column::new("TRT01P", ColumnData::String(vec![Some("DRUG A".into())]))
        .with_label("Planned Treatment for Period 01"),
    Column::new("TRT01A", ColumnData::String(vec![Some("DRUG A".into())]))
        .with_label("Actual Treatment for Period 01"),
    Column::new("SAFFL", ColumnData::String(vec![Some("Y".into())]))
        .with_label("Safety Population Flag"),
])?;
}

Variable Metadata

CDISC requires specific metadata for each variable:

Required Metadata

Metadata         XPT Field  xportrs Method
Variable Name    nname      Column::new(name, ...)
Variable Label   nlabel     .with_label(...)
Variable Type    ntype      Inferred from ColumnData
Display Format   nform      .with_format(...)
Variable Length  nlng       .with_length(...)

Example with Full Metadata

#![allow(unused)]
fn main() {
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
    .with_label("Start Date/Time of Adverse Event")
    .with_format(Format::character(19))
    .with_length(19);
}

Controlled Terminology

CDISC defines controlled terminology for many variables:

[!WARNING] xportrs does not validate controlled terminology values. Use Pinnacle 21 or similar tools to verify that coded values match CDISC controlled terminology.

Common controlled terminology:

  • AESEV: MILD, MODERATE, SEVERE
  • SEX: M, F, U, UNDIFFERENTIATED
  • RACE: WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, etc.
  • NY (Yes/No): Y, N
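Since xportrs does not validate controlled terminology, a submission pipeline can pre-screen coded values itself. A minimal sketch for the AESEV terms listed above (`check_aesev` is a hypothetical helper, not part of the xportrs API):

```rust
use std::collections::HashSet;

// Return the row indices whose AESEV value is outside the CDISC CT
// terms MILD / MODERATE / SEVERE. Missing (None) values are skipped.
fn check_aesev(values: &[Option<String>]) -> Vec<usize> {
    let allowed: HashSet<&str> = ["MILD", "MODERATE", "SEVERE"].into_iter().collect();
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| matches!(v, Some(s) if !allowed.contains(s.as_str())))
        .map(|(i, _)| i)
        .collect()
}
```

The same pattern extends to any codelist; a real pipeline would load the terms from the current CDISC CT package rather than hard-coding them.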

SDTM-IG Versions

xportrs supports the latest SDTM-IG metadata requirements:

Version      Release Date  Key Changes
SDTM-IG 3.4  2023          Current recommended
SDTM-IG 3.3  2021          Labels no longer conformance criteria
SDTM-IG 3.2  2013          Legacy support

[!NOTE] As of SDTM-IG 3.3, variable labels are **recommended** but not required for conformance. However, xportrs still generates warnings for missing labels since they are important for data reviewers.

Define-XML Integration

The define.xml file provides metadata that complements XPT files:

graph LR
    subgraph "Submission Package"
        XPT[XPT Files] -->|" Data "| FDA[FDA Review]
        DEFINE[define.xml] -->|" Metadata "| FDA
        XPT -.->|" Must match "| DEFINE
    end

[!IMPORTANT] Variable labels in XPT files should match those in define.xml. Pinnacle 21 rule SD0063 checks for mismatches.

Resources

FDA Technical Conformance Guide

The FDA Study Data Technical Conformance Guide (TCG) defines requirements for electronic study data submissions. This page covers XPT-specific requirements.

Submission Types

XPT files are required for these FDA submission types:

Submission Type  Description                    XPT Required
NDA              New Drug Application           Yes
ANDA             Abbreviated NDA (Generics)     Yes
BLA              Biologics License Application  Yes
IND              Investigational New Drug       Conditional

File Size Requirements

graph LR
    subgraph "File Size Limits"
        A[Single XPT] --> B{">5 GB?"}
        B -->|Yes| C[Split into parts]
        B -->|No| D[Single file OK]
        C --> E[ae_001.xpt<br/>ae_002.xpt<br/>...]
    end

Automatic File Splitting

xportrs automatically handles file splitting:

#![allow(unused)]
fn main() {
use xportrs::Xpt;

// Automatically splits if dataset would exceed 5GB
Xpt::writer(large_dataset)
    .max_file_size_gb(5.0)  // Optional, 5.0 is default
    .finalize()?
    .write_path("ae.xpt")?;  // May create ae_001.xpt, ae_002.xpt, ...
}

Character Encoding

[!IMPORTANT] FDA requires **ASCII-only** characters in variable names and labels. Extended characters may cause validation failures.

ASCII Validation

#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};

let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)  // Enforces ASCII validation
    .finalize()?;

// Non-ASCII characters will generate errors
for issue in validated.issues() {
    if format!("{}", issue).contains("ASCII") {
        eprintln!("ASCII violation: {}", issue);
    }
}
}

Variable Requirements

Naming Conventions

Requirement                FDA TCG Section  xportrs Validation
1-8 characters             Section 4.1.5    Error if violated
Uppercase only             Section 4.1.5    Auto-converted
Start with letter          Section 4.1.5    Error if violated
A-Z, 0-9, underscore only  Section 4.1.5    Error if violated
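The four naming rules above collapse into a single predicate. A standalone sketch in plain Rust (`name_ok` is a hypothetical helper, not the xportrs implementation):

```rust
// True if a variable name satisfies the FDA TCG naming rules:
// non-empty, ≤8 bytes, starts with a letter, and uses only
// uppercase A-Z, digits, and underscore.
fn name_ok(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 8
        && name.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
        && name
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}
```

Note that because the `all` check only admits uppercase letters, a lowercase name fails here; xportrs itself auto-converts case rather than rejecting it.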

Label Requirements

Requirement                    FDA TCG Section  xportrs Validation
0-40 characters                Section 4.1.5    Error if >40
ASCII only                     Section 4.1.5    Error if non-ASCII
Recommended for all variables  Section 4.1.5    Warning if missing

Numeric Precision

XPT files use IBM floating-point format with specific precision limits:

Data Type  IEEE 754 Precision  IBM Float Precision  Notes
Integer    Exact to 2^53       Exact to ~10^14      Safe for IDs
Decimal    ~15-17 digits       ~14-16 digits        Slight loss
Date       Varies              SAS epoch-based      Use date formats

[!NOTE] For maximum precision, consider using the DATE9. or DATETIME20. formats for date/time values rather than storing as plain numerics.

Date Handling

FDA expects dates in specific formats:

ISO 8601 Character Dates

#![allow(unused)]
fn main() {
// Preferred: Store as ISO 8601 character string
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
    .with_label("Start Date/Time of Adverse Event")
    .with_format(Format::character(19));
}

SAS Numeric Dates

#![allow(unused)]
fn main() {
// Alternative: Store as SAS date number
// Days since January 1, 1960
let sas_date = 23_390.0;  // 2024-01-15

let col = Column::new("AESTDT", ColumnData::F64(vec![Some(sas_date)]))
    .with_label("Start Date")
    .with_format_str("DATE9.")?;
}
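Computing the SAS day number by hand is straightforward: count days in the proleptic Gregorian calendar, then shift by the 3,653 days between the SAS epoch (1960-01-01) and the Unix epoch (1970-01-01). A self-contained sketch using Howard Hinnant's civil-date algorithm (not part of xportrs):

```rust
// Days from 1970-01-01 for a Gregorian calendar date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;                  // year of era, [0, 399]
    let mp = (m + 9) % 12;                    // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;     // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

// SAS date number: days since 1960-01-01 (which is 3653 days before 1970-01-01).
fn sas_date(y: i64, m: i64, d: i64) -> f64 {
    (days_from_civil(y, m, d) + 3653) as f64
}
```

By this count, 2024-01-15 is SAS day 23,390.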

Study Data Reviewer’s Guide

FDA recommends including a Reviewer’s Guide with submissions. The guide should reference:

  • Dataset locations and naming conventions
  • Variable naming patterns
  • Any deviations from CDISC standards
  • Data transformation documentation

eCTD Placement

XPT files are placed in specific eCTD module locations:

m5/
├── datasets/
│   ├── tabulations/
│   │   ├── sdtm/
│   │   │   ├── ae.xpt
│   │   │   ├── dm.xpt
│   │   │   └── define.xml
│   │   └── send/  (nonclinical)
│   └── analysis/
│       └── adam/
│           ├── adsl.xpt
│           ├── adae.xpt
│           └── define.xml

Validation Checklist

Before submission, verify:

  • All XPT files ≤5GB (or properly split)
  • Variable names ≤8 characters, uppercase
  • Variable labels ≤40 characters, ASCII only
  • Dataset names ≤8 characters
  • Character variable lengths ≤200 bytes
  • define.xml present and valid
  • Pinnacle 21 validation passed (or issues documented)

Resources

XPT V5 Specification

The XPT V5 format is defined by the SAS Technical Note TS-140. This page provides a comprehensive overview of the format.

Format Overview

XPT V5 (also known as SAS Transport Version 5) is a binary file format with:

graph TB
    subgraph "XPT V5 File Structure"
        LH[Library Header<br/>80 bytes] --> FD[First Dataset]
        FD -->|" More datasets "| ND[Next Dataset...]
    end

    subgraph "Dataset Structure"
        MH[Member Header<br/>80 bytes] --> DH[DSCRPTR Header<br/>80 bytes]
        DH --> DD[Dataset Descriptor<br/>160 bytes]
        DD --> NH[NAMESTR Header<br/>80 bytes]
        NH --> NR[NAMESTR Records<br/>140 bytes × n]
        NR --> OH[OBS Header<br/>80 bytes]
        OH --> OD[Observation Data]
    end

Library Header

The file begins with a library header identifying the format:

Offset  Size  Content
0-79    80    HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000

#![allow(unused)]
fn main() {
const LIBRARY_HEADER: &[u8; 80] =
    b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000  ";
}

Member Header

Each dataset (member) begins with a member header:

Offset  Size  Content
0-79    80    HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140

The numbers at the end indicate:

  • 0160 = dataset descriptor size in bytes (two 80-byte records)
  • 0140 = length of each NAMESTR record in bytes

Dataset Descriptor

The dataset descriptor contains:

Offset  Size  Field     Description
0-7     8     SAS       "SAS"
8-15    8     SAS       "SAS"
16-23   8     SASLIB    "SASLIB"
24-31   8     Version   e.g. "9.4"
32-39   8     OS        Operating system
40-47   8     Blanks    Padding
48-63   16    Created   ddMMMyy:hh:mm:ss
64-79   16    Modified  ddMMMyy:hh:mm:ss

Second Descriptor Record

Offset  Size  Field    Description
0-7     8     DSNAME   Dataset name
8-15    8     SASDATA  "SASDATA"
16-23   8     Version  e.g. "9.4"
24-31   8     OS       Operating system
32-39   8     Blanks   Padding
40-79   40    Label    Dataset label

NAMESTR Records

The NAMESTR header introduces the variable metadata:

Offset  Size  Content
0-53    54    HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!
54-57   4     Number of variables (zero-padded)
58-79   22    Padding

Each variable is described by a 140-byte NAMESTR record. See NAMESTR Records for detailed byte layout.

Observation Data

The observation header introduces the data:

Offset  Size  Content
0-79    80    HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000

After this, raw observation data follows in row-major order:

[Row 1: Var1][Row 1: Var2]...[Row 1: VarN]
[Row 2: Var1][Row 2: Var2]...[Row 2: VarN]
...
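Because every field is fixed-width, a value's position in the observation area is pure arithmetic. A sketch (`field_offset` is a hypothetical helper, not an xportrs API):

```rust
// Byte offset of a value in the row-major observation area:
// rows are the concatenation of fixed-width fields, so the offset is
// row * row_length plus the widths of the columns before `col`.
fn field_offset(widths: &[usize], row: usize, col: usize) -> usize {
    let row_len: usize = widths.iter().sum();
    row * row_len + widths[..col].iter().sum::<usize>()
}
```

This is what makes random access and zero-copy slicing of XPT observation data possible.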

Numeric Variables

All numeric variables are stored as 8-byte IBM floating-point:

  • 8 bytes per value
  • Big-endian byte order
  • IBM base-16 exponent (not IEEE 754)
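The IEEE 754 → IBM conversion can be sketched as follows: a sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction normalized into [1/16, 1). This simplified version ignores rounding and exponent-overflow edge cases and is not the xportrs implementation:

```rust
// Convert an f64 to the 8-byte big-endian IBM hexadecimal float used by XPT.
fn ieee_to_ibm(v: f64) -> [u8; 8] {
    if v == 0.0 {
        return [0u8; 8];
    }
    let sign = if v < 0.0 { 0x80u8 } else { 0x00 };
    let mut frac = v.abs();
    let mut exp = 0i32;
    // Normalize so that frac is in [1/16, 1), counting base-16 exponent steps.
    while frac >= 1.0 {
        frac /= 16.0;
        exp += 1;
    }
    while frac < 0.0625 {
        frac *= 16.0;
        exp -= 1;
    }
    let mantissa = (frac * (1u64 << 56) as f64) as u64; // 56-bit fraction
    let mut out = [0u8; 8];
    out[0] = sign | (exp + 64) as u8; // sign bit + excess-64 exponent
    for i in 0..7 {
        out[i + 1] = ((mantissa >> (8 * (6 - i))) & 0xFF) as u8;
    }
    out
}
```

For example, 1.0 normalizes to fraction 1/16 with exponent 1, giving the well-known IBM pattern 0x4110000000000000.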

Character Variables

Character variables are stored as fixed-width text:

  • 1-200 bytes per value (as defined in NAMESTR)
  • Space-padded on the right
  • No null terminators
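Encoding a character value is therefore truncate-then-pad (a sketch, not the xportrs implementation):

```rust
// Encode a character value into a fixed-width field as XPT requires:
// cut to the declared width, then pad on the right with ASCII spaces.
fn encode_char(value: &str, width: usize) -> Vec<u8> {
    let mut field = value.as_bytes().to_vec();
    field.truncate(width);   // values longer than the declared length are cut
    field.resize(width, b' '); // right-pad with spaces, no null terminator
    field
}
```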

Missing Values

Type                     Encoding
Numeric missing (.)      0x2E in first byte, zeros elsewhere
Numeric missing (.A-.Z)  0x41-0x5A in first byte
Character missing        All spaces
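The numeric missing-value encodings in the table are just a tag byte followed by zeros. A sketch (`missing_numeric` is a hypothetical helper, not an xportrs API):

```rust
// Build the 8-byte field for a SAS numeric missing value:
// None  → plain missing "."  (0x2E in the first byte)
// Some('A'..='Z') → special missing ".A"-".Z" (0x41-0x5A in the first byte)
fn missing_numeric(special: Option<char>) -> [u8; 8] {
    let mut field = [0u8; 8];
    field[0] = match special {
        None => 0x2E,
        Some(c) => c.to_ascii_uppercase() as u8,
    };
    field
}
```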

Record Padding

XPT uses 80-byte record alignment:

  • NAMESTR records: 140 bytes (not aligned)
  • Multiple NAMESTRs fill to 80-byte boundary
  • Observation rows: variable length (row_length × n)
  • File ends with space padding to 80 bytes
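The trailing space padding can be sketched as a single helper (hypothetical, not an xportrs API):

```rust
// Pad a byte buffer with ASCII spaces up to the next 80-byte record boundary.
fn pad_to_record(buf: &mut Vec<u8>) {
    let rem = buf.len() % 80;
    if rem != 0 {
        buf.extend(std::iter::repeat(b' ').take(80 - rem));
    }
}
```

Already-aligned buffers are left untouched, which is why a file whose data happens to end on a record boundary carries no extra padding.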

Version Differences

Feature               V5 (TS-140)    V8+
Variable name length  8 bytes        32 bytes
Label length          40 bytes       256 bytes
Number encoding       IBM float      IEEE 754
Max observations      ~2 billion     Unlimited
Regulatory support    FDA/PMDA/NMPA  Limited

[!IMPORTANT] For regulatory submissions, only V5 format is accepted. xportrs focuses on V5 compliance.

Official Specification

The authoritative source for XPT V5 format is:

SAS Technical Note TS-140: Record Layout of a SAS Version 5 or 6 Data Set in SAS Transport (XPORT) Format

Download PDF | View on SAS Support

Format Family

The Library of Congress maintains format documentation:

XPT V8/V9 Specification

The XPT V8/V9 format extends the original V5 format with support for longer variable names and labels.

[!WARNING] XPT V8/V9 format is not accepted for FDA regulatory submissions. For regulatory submissions, use XPT V5 format only.

Key Differences from V5

Feature               V5 (TS-140)    V8/V9
Variable name length  8 bytes        32 bytes
Label length          40 bytes       256 bytes
Number encoding       IBM float      IEEE 754
Max observations      ~2 billion     Unlimited
Regulatory support    FDA/PMDA/NMPA  Not accepted

Format Overview

XPT V8/V9 maintains the same basic structure as V5:

  • 80-byte records for headers
  • Big-endian byte order
  • Fixed-width text fields (space-padded)

However, it differs in:

  • Variable names: Extended from 8 to 32 characters
  • Labels: Extended from 40 to 256 characters
  • Numeric encoding: Uses IEEE 754 instead of IBM floating-point

Use Cases

V8/V9 format may be appropriate for:

  • Internal data storage where longer names improve readability
  • Non-regulatory data exchange between systems
  • Archival purposes where V5 limitations are problematic
  • Academic or research datasets not intended for regulatory submission

Regulatory Considerations

FDA Submissions

The FDA Data Standards Catalog explicitly requires XPT V5 format. Files in V8/V9 format will be rejected during technical validation.

CDISC Standards

CDISC standards (SDTM, ADaM) are designed around V5 limitations:

  • Variable names: 8 characters maximum
  • Labels: 40 characters maximum

Using V8/V9 format with CDISC data defeats the purpose of standardization.

Best Practice

If your data requires longer names or labels:

  1. Use V5-compliant short names in the XPT file
  2. Document full names in define.xml metadata
  3. Use controlled terminology for consistency

Official Specification

SAS Technical Note: Record Layout of a SAS Version 8 or 9 Data Set in SAS Transport Format

Download PDF | View on SAS Support

xportrs Support

xportrs currently focuses on V5 format for regulatory compliance. V8/V9 support is not a priority as it cannot be used for regulatory submissions.

If you need V8/V9 support for non-regulatory purposes, please open an issue to discuss your use case.

Validation Rules

xportrs provides built-in validation to catch compliance issues before file writing. This page documents the validation rules and their severity levels.

Validation Overview

graph LR
    subgraph "Validation Pipeline"
        A[Dataset] --> B[Agency Rules]
        B --> C[V5 Format Rules]
        C --> D[Issues Collection]
        D --> E{Has Errors?}
        E -->|Yes| F[Block Write]
        E -->|No| G[Allow Write]
    end

Severity Levels

Severity  Meaning                   Blocks Write?
Error     File would be rejected    Yes
Warning   Review recommended        No
Info      Best practice suggestion  No

Built-in Validation Rules

Variable Name Rules

Rule                Severity  Message
Name empty          Error     “Variable name cannot be empty”
Name >8 bytes       Error     “Variable name exceeds 8 bytes”
Invalid characters  Error     “Variable name contains invalid characters”
Starts with number  Error     “Variable name must start with a letter”

Variable Label Rules

Rule             Severity  Message
Label missing    Warning   “Variable ‘X’ is missing a label”
Label >40 bytes  Error     “Variable label exceeds 40 bytes”
Non-ASCII (FDA)  Error     “Variable label contains non-ASCII characters”

Dataset Rules

Rule             Severity  Message
Name empty       Error     “Dataset name cannot be empty”
Name >8 bytes    Error     “Dataset name exceeds 8 bytes”
Label missing    Warning   “Dataset is missing a label”
Label >40 bytes  Error     “Dataset label exceeds 40 bytes”

Data Rules

Rule                    Severity  Message
Column length mismatch  Error     “Columns have different lengths”
Character >200 bytes    Error     “Character value exceeds 200 bytes”

Using Validation

Basic Validation

#![allow(unused)]
fn main() {
use xportrs::Xpt;

let validated = Xpt::writer(dataset).finalize()?;

// Check for any issues
if validated.has_errors() {
    eprintln!("Cannot write file due to errors:");
    for issue in validated.issues() {
        if issue.severity() == xportrs::Severity::Error {
            eprintln!("  ERROR: {}", issue);
        }
    }
    return Err("Validation failed".into());
}

// Proceed with write
validated.write_path("output.xpt")?;
}

Agency-Specific Validation

#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};

// FDA validation (strict ASCII)
let fda_validated = Xpt::writer(dataset.clone())
    .agency(Agency::FDA)
    .finalize()?;

// PMDA validation (allows extended characters)
let pmda_validated = Xpt::writer(dataset)
    .agency(Agency::PMDA)
    .finalize()?;
}

Filtering Issues

#![allow(unused)]
fn main() {
use xportrs::{Severity, Xpt};

let validated = Xpt::writer(dataset).finalize()?;

// Get only errors
let errors: Vec<_> = validated.issues()
    .iter()
    .filter(|i| i.severity() == Severity::Error)
    .collect();

// Get only warnings
let warnings: Vec<_> = validated.issues()
    .iter()
    .filter(|i| i.severity() == Severity::Warning)
    .collect();
}

Checking Specific Variables

#![allow(unused)]
fn main() {
let validated = Xpt::writer(dataset).finalize()?;

for issue in validated.issues() {
    // Check what the issue targets
    match issue.target() {
        "USUBJID" => println!("Issue with USUBJID: {}", issue),
        "AESEQ" => println!("Issue with AESEQ: {}", issue),
        _ => {}
    }
}
}

Pinnacle 21 Rules

xportrs validation covers XPT-level rules. For full CDISC compliance, use Pinnacle 21:

Rules Covered by xportrs

| Pinnacle 21 Rule | Description | xportrs |
|------------------|-------------|---------|
| SD1001 | Variable name >8 characters | ✅ Error |
| SD1002 | Variable label >40 characters | ✅ Error |
| SD0063 | Missing/mismatched variable label | ✅ Warning |
| SD0063A | Missing/mismatched dataset label | ✅ Warning |

Rules Requiring External Validation

| Pinnacle 21 Rule | Description | Why External |
|------------------|-------------|--------------|
| SD0001 | Missing required variable | Domain-specific |
| SD0002 | Null value in required field | Data content |
| SD0060 | Variable not in define.xml | Requires define.xml |
| CT2002 | Invalid controlled terminology | Requires CDISC CT |
| SE0063 | Label doesn’t match SDTM standard | Requires SDTM metadata |

Custom Validation

You can add custom validation before writing:

use xportrs::{Dataset, Xpt};

fn validate_custom(dataset: &Dataset) -> Vec<String> {
    let mut issues = vec![];

    // Check for required variables
    let required = ["STUDYID", "USUBJID"];
    for var in required {
        if dataset.column(var).is_none() {
            issues.push(format!("Missing required variable: {}", var));
        }
    }

    // Check STUDYID consistency
    if let Some(col) = dataset.column("STUDYID") {
        if let xportrs::ColumnData::String(values) = col.data() {
            let first = values.first().and_then(|v| v.as_ref());
            for (i, value) in values.iter().enumerate() {
                if value.as_ref() != first {
                    issues.push(format!("STUDYID inconsistent at row {}", i));
                }
            }
        }
    }

    issues
}

fn main() -> xportrs::Result<()> {
    let dataset = /* ... */;

    // Custom validation
    let custom_issues = validate_custom(&dataset);
    if !custom_issues.is_empty() {
        for issue in custom_issues {
            eprintln!("Custom validation: {}", issue);
        }
        return Err(xportrs::Error::invalid_data("Custom validation failed"));
    }

    // xportrs validation
    let validated = Xpt::writer(dataset).finalize()?;
    validated.write_path("output.xpt")?;

    Ok(())
}

Validation Best Practices

[!TIP] Run validation early in your pipeline to catch issues before processing large datasets.

  1. Validate incrementally: Check validation after each transformation step
  2. Log all issues: Even warnings may indicate data quality problems
  3. Use agency-specific validation: Different agencies have different requirements
  4. Combine with Pinnacle 21: xportrs + Pinnacle 21 provides comprehensive coverage
  5. Document exceptions: If you must ship with warnings, document why

XPT File Structure

This page provides a detailed overview of the XPT V5 file structure.

Overall Structure

An XPT file consists of a library (file) level and one or more member (dataset) levels:

graph TB
    subgraph "XPT V5 File"
        LH["Library Header<br/>80 bytes"]
        
        subgraph "Member 1 (Dataset)"
            MH1["Member Header<br/>80 bytes"]
            DC1["DSCRPTR Header<br/>80 bytes"]
            DD1["Dataset Descriptor<br/>160 bytes"]
            NSH1["NAMESTR Header<br/>80 bytes"]
            NS1["NAMESTR Records<br/>140 bytes × n"]
            OH1["OBS Header<br/>80 bytes"]
            OBS1["Observation Data"]
        end
        
        subgraph "Member 2 (Optional)"
            MH2["Member Header"]
            MORE2["..."]
        end
        
        LH --> MH1
        MH1 --> DC1
        DC1 --> DD1
        DD1 --> NSH1
        NSH1 --> NS1
        NS1 --> OH1
        OH1 --> OBS1
        OBS1 --> MH2
        MH2 --> MORE2
    end

Header Records

All headers are exactly 80 bytes with a distinctive pattern:

HEADER RECORD*******<type> HEADER RECORD!!!!!!!<numbers>

Library Header

HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000  

This header identifies the file as an XPT transport file.

Member Header

HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140  

The numbers indicate:

  • 00000016 (hex) = 22 bytes for version information
  • 00000140 (decimal) = 140 bytes per NAMESTR record

DSCRPTR Header

HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000  

Introduces the dataset descriptor records.

NAMESTR Header

HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000000000000000000000000  

The variable count is embedded in positions 54-57.
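
Assuming the count is stored as ASCII digits at those positions, extraction might look like this (`namestr_var_count` is a hypothetical helper, not the real parser):

```rust
/// Read the variable count from an 80-byte NAMESTR header.
/// Assumes positions 54-57 hold the count as ASCII digits.
fn namestr_var_count(header: &[u8; 80]) -> Option<usize> {
    std::str::from_utf8(&header[54..58]).ok()?.trim().parse().ok()
}

fn main() {
    let mut header = [b'0'; 80];
    header[54..58].copy_from_slice(b"0003");
    assert_eq!(namestr_var_count(&header), Some(3));
}
```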

OBS Header

HEADER RECORD*******OBS     HEADER RECORD!!!!!!!000000000000000000000000000000  

Introduces the observation data section.

Dataset Descriptor

The dataset descriptor spans two 80-byte records (160 bytes total):

First Record (80 bytes)

| Offset | Size | Field | Example |
|--------|------|-------|---------|
| 0-7 | 8 | sas1 | SAS |
| 8-15 | 8 | sas2 | SAS |
| 16-23 | 8 | saslib | SASLIB |
| 24-31 | 8 | version | 9.4 |
| 32-39 | 8 | os | X64_10HO |
| 40-47 | 8 | blanks | |
| 48-63 | 16 | created | 01JAN24:00:00:00 |
| 64-79 | 16 | modified | 01JAN24:00:00:00 |

Second Record (80 bytes)

| Offset | Size | Field | Example |
|--------|------|-------|---------|
| 0-7 | 8 | dsname | AE |
| 8-15 | 8 | sasdata | SASDATA |
| 16-23 | 8 | version | 9.4 |
| 24-31 | 8 | os | X64_10HO |
| 32-39 | 8 | blanks | |
| 40-79 | 40 | label | Adverse Events |

NAMESTR Section

After the NAMESTR header, each variable is described by a 140-byte NAMESTR record:

graph LR
    subgraph "NAMESTR Layout (140 bytes)"
        A["Type Info<br/>0-7"] --> B["Name<br/>8-15"]
        B --> C["Label<br/>16-55"]
        C --> D["Format<br/>56-69"]
        D --> E["Informat<br/>72-83"]
        E --> F["Position<br/>84-87"]
        F --> G["Reserved<br/>88-139"]
    end

See NAMESTR Records for the complete byte-by-byte layout.

NAMESTR Padding

NAMESTR records are packed into 80-byte physical records. Since the combined length (140 bytes per variable) is usually not a multiple of 80, the section is padded to the next record boundary:

  • 5 NAMESTRs = 700 bytes = 8.75 records → pad to 720 bytes (9 records)
  • Formula: ceil(n_vars * 140 / 80) * 80
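
The formula translates directly to Rust:

```rust
/// Padded NAMESTR section length: ceil(n_vars * 140 / 80) * 80
fn namestr_section_len(n_vars: usize) -> usize {
    (n_vars * 140).div_ceil(80) * 80
}

fn main() {
    assert_eq!(namestr_section_len(5), 720); // 700 bytes → 9 records
    assert_eq!(namestr_section_len(4), 560); // exact multiple, no padding
}
```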

Observation Data

After the OBS header, data is stored in row-major order:

[Row 1]──[Var 1][Var 2][Var 3]...[Var N]
[Row 2]──[Var 1][Var 2][Var 3]...[Var N]
...
[Row M]──[Var 1][Var 2][Var 3]...[Var N]
[Padding to 80-byte boundary]

Row Length Calculation

#![allow(unused)]
fn main() {
fn row_length(variables: &[Variable]) -> usize {
    variables.iter().map(|v| {
        if v.is_numeric() {
            8  // Always 8 bytes for numerics
        } else {
            v.length  // 1-200 bytes for characters
        }
    }).sum()
}
}

End-of-File Padding

The file ends with space padding (0x20) to reach an 80-byte boundary.
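
A minimal sketch of that padding step:

```rust
/// Pad a record buffer to the next 80-byte boundary with spaces
/// (0x20), as the format requires.
fn pad_to_boundary(buf: &mut Vec<u8>) {
    let rem = buf.len() % 80;
    if rem != 0 {
        buf.resize(buf.len() + (80 - rem), b' ');
    }
}

fn main() {
    let mut buf = vec![b'x'; 85];
    pad_to_boundary(&mut buf);
    assert_eq!(buf.len(), 160);
    assert!(buf[85..].iter().all(|&b| b == b' '));
}
```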

Byte Order

All multi-byte integers are big-endian:

#![allow(unused)]
fn main() {
// Reading a 16-bit integer from XPT
let value = i16::from_be_bytes([bytes[0], bytes[1]]);

// Writing a 16-bit integer to XPT
let bytes = value.to_be_bytes();
}

Character Encoding

| Region | Encoding | Notes |
|--------|----------|-------|
| FDA (US) | ASCII | Required |
| PMDA (Japan) | Shift-JIS | Extended |
| General | Latin-1 | Common |

[!IMPORTANT] For FDA submissions, use ASCII only. xportrs validates this when Agency::FDA is specified.

Example File (Hex Dump)

00000000: 4845 4144 4552 2052 4543 4f52 442a 2a2a  HEADER RECORD***
00000010: 2a2a 2a2a 4c49 4252 4152 5920 4845 4144  ****LIBRARY HEAD
00000020: 4552 2052 4543 4f52 4421 2121 2121 2121  ER RECORD!!!!!!!
00000030: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 2020  00000000000000  

Multi-Member Files

XPT files can contain multiple datasets (members). Each member has its own:

  • Member header
  • Dataset descriptor
  • NAMESTR section
  • Observation data

#![allow(unused)]
fn main() {
use xportrs::Xpt;

// Reading all members
let datasets = Xpt::read_all("multi.xpt")?;
for ds in datasets {
    println!("Dataset: {}", ds.domain_code());
}
}

[!NOTE] For FDA submissions, it’s common practice to use one dataset per file, but the format supports multiple.

NAMESTR Records

The NAMESTR (Name String) record describes each variable in the dataset. Each record is exactly 140 bytes.

NAMESTR Layout

%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '11px'}}}%%
graph LR
    subgraph "NAMESTR Record (140 bytes)"
        A["0-1<br/>ntype"] --> B["2-3<br/>nhfun"]
        B --> C["4-5<br/>nlng"]
        C --> D["6-7<br/>nvar0"]
        D --> E["8-15<br/>nname"]
        E --> F["16-55<br/>nlabel"]
        F --> G["56-63<br/>nform"]
        G --> H["64-65<br/>nfl"]
        H --> I["66-67<br/>nfd"]
        I --> J["68-69<br/>nfj"]
        J --> K["70-71<br/>nfill"]
        K --> L["72-79<br/>niform"]
        L --> M["80-81<br/>nifl"]
        M --> N["82-83<br/>nifd"]
        N --> O["84-87<br/>npos"]
        O --> P["88-139<br/>rest"]
    end

Complete Field Reference

| Offset | Size | Field | Type | Description |
|--------|------|-------|------|-------------|
| 0-1 | 2 | ntype | i16 | Variable type: 1=numeric, 2=character |
| 2-3 | 2 | nhfun | i16 | Hash function (always 0) |
| 4-5 | 2 | nlng | i16 | Variable length in bytes |
| 6-7 | 2 | nvar0 | i16 | Variable number (1-based) |
| 8-15 | 8 | nname | char[8] | Variable name (space-padded) |
| 16-55 | 40 | nlabel | char[40] | Variable label (space-padded) |
| 56-63 | 8 | nform | char[8] | Display format name |
| 64-65 | 2 | nfl | i16 | Format length |
| 66-67 | 2 | nfd | i16 | Format decimal places |
| 68-69 | 2 | nfj | i16 | Format justification (0=left, 1=right) |
| 70-71 | 2 | nfill | i16 | Unused padding |
| 72-79 | 8 | niform | char[8] | Input format name |
| 80-81 | 2 | nifl | i16 | Informat length |
| 82-83 | 2 | nifd | i16 | Informat decimal places |
| 84-87 | 4 | npos | i32 | Position in observation |
| 88-139 | 52 | rest | char[52] | Reserved (zeros/spaces) |

Field Details

ntype (Variable Type)

| Value | Meaning | Storage |
|-------|---------|---------|
| 1 | Numeric | 8 bytes, IBM float |
| 2 | Character | 1-200 bytes, space-padded |

nlng (Variable Length)

| Type | Valid Range | Notes |
|------|-------------|-------|
| Numeric | Always 8 | IBM float requires 8 bytes |
| Character | 1-200 | FDA maximum is 200 bytes |

nname (Variable Name)

  • 8 bytes, right-padded with spaces
  • Uppercase letters A-Z, digits 0-9, underscore
  • Must start with a letter
  • Example: USUBJID (note trailing space)

nlabel (Variable Label)

  • 40 bytes, right-padded with spaces
  • Should be descriptive for data reviewers
  • Example: Unique Subject Identifier

Format Fields (nform, nfl, nfd, nfj)

The display format is stored across four fields:

#![allow(unused)]
fn main() {
// Example: DATE9. format
nform = "DATE    "   // Format name (8 bytes, space-padded)
nfl = 9              // Total width
nfd = 0              // Decimal places
nfj = 0              // Justification (0=left, 1=right)

// Example: 8.2 format (numeric with 2 decimals)
nform = "        "   // No named format
nfl = 8              // Total width
nfd = 2              // Decimal places
nfj = 1              // Right-justified (typical for numbers)

// Example: $CHAR200. format
nform = "$CHAR   "   // Format name with $ prefix
nfl = 200            // Total width
nfd = 0              // Not applicable for character
nfj = 0              // Left-justified (typical for text)
}

Informat Fields (niform, nifl, nifd)

Input format mirrors the display format structure but without justification:

| Field | Size | Description |
|-------|------|-------------|
| niform | 8 | Input format name |
| nifl | 2 | Input format length |
| nifd | 2 | Input format decimals |

npos (Position in Observation)

The byte offset of this variable within each observation row:

Observation Row:
[STUDYID      ][USUBJID      ][AGE     ][SEX]
^              ^              ^         ^
npos=0         npos=20        npos=60   npos=68
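
The offsets accumulate as each variable's storage width is added; a sketch using `(is_numeric, length)` pairs instead of the real variable type:

```rust
/// Compute npos for each variable: numerics occupy 8 bytes,
/// characters occupy their declared length.
fn npos_offsets(vars: &[(bool, usize)]) -> Vec<usize> {
    let mut pos = 0;
    vars.iter()
        .map(|&(is_numeric, length)| {
            let p = pos;
            pos += if is_numeric { 8 } else { length };
            p
        })
        .collect()
}

fn main() {
    // STUDYID (char 20), USUBJID (char 40), AGE (numeric), SEX (char 1)
    let vars = [(false, 20), (false, 40), (true, 8), (false, 1)];
    assert_eq!(npos_offsets(&vars), vec![0, 20, 60, 68]);
}
```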

Parsing NAMESTR in Rust

#![allow(unused)]
fn main() {
use std::io::{Read, Cursor};
use byteorder::{BigEndian, ReadBytesExt};

struct Namestr {
    ntype: i16,
    nlng: i16,
    nvar0: i16,
    nname: String,
    nlabel: String,
    nform: String,
    nfl: i16,
    nfd: i16,
    nfj: i16,
    niform: String,
    nifl: i16,
    nifd: i16,
    npos: i32,
}

fn parse_namestr(bytes: &[u8; 140]) -> Namestr {
    let mut cursor = Cursor::new(bytes);
    
    let ntype = cursor.read_i16::<BigEndian>().unwrap();
    let _nhfun = cursor.read_i16::<BigEndian>().unwrap();
    let nlng = cursor.read_i16::<BigEndian>().unwrap();
    let nvar0 = cursor.read_i16::<BigEndian>().unwrap();
    
    let mut nname = [0u8; 8];
    cursor.read_exact(&mut nname).unwrap();
    let nname = String::from_utf8_lossy(&nname).trim_end().to_string();
    
    let mut nlabel = [0u8; 40];
    cursor.read_exact(&mut nlabel).unwrap();
    let nlabel = String::from_utf8_lossy(&nlabel).trim_end().to_string();
    
    // ... continue for remaining fields
    
    Namestr { ntype, nlng, nvar0, nname, nlabel, /* ... */ }
}
}

Writing NAMESTR in Rust

#![allow(unused)]
fn main() {
use std::io::Write;
use byteorder::{BigEndian, WriteBytesExt};

fn write_namestr<W: Write>(w: &mut W, var: &Variable, pos: i32) -> std::io::Result<()> {
    // ntype
    w.write_i16::<BigEndian>(if var.is_numeric { 1 } else { 2 })?;
    
    // nhfun (always 0)
    w.write_i16::<BigEndian>(0)?;
    
    // nlng
    w.write_i16::<BigEndian>(var.length as i16)?;
    
    // nvar0 (1-based variable number)
    w.write_i16::<BigEndian>(var.index as i16 + 1)?;
    
    // nname (8 bytes, space-padded)
    let mut name = [b' '; 8];
    name[..var.name.len().min(8)].copy_from_slice(var.name.as_bytes());
    w.write_all(&name)?;
    
    // nlabel (40 bytes, space-padded)
    let mut label = [b' '; 40];
    label[..var.label.len().min(40)].copy_from_slice(var.label.as_bytes());
    w.write_all(&label)?;
    
    // Format fields...
    // Informat fields...
    // npos
    w.write_i32::<BigEndian>(pos)?;
    
    // rest (52 bytes of zeros)
    w.write_all(&[0u8; 52])?;
    
    Ok(())
}
}

xportrs Format API

xportrs provides a high-level API for format handling:

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};

// Create column with format metadata
let col = Column::new("AESTDTC", ColumnData::F64(vec![Some(23391.0)]))
    .with_label("Start Date/Time")
    .with_format(Format::parse("DATE9.").unwrap());

// The Format struct extracts:
// - name: "DATE"
// - length: 9
// - decimals: 0
// - justification: Right (default for formats)
}

Common Formats

| Format | nform | nfl | nfd | Description |
|--------|-------|-----|-----|-------------|
| DATE9. | DATE | 9 | 0 | Date (01JAN2024) |
| DATETIME20. | DATETIME | 20 | 0 | Date and time |
| 8.2 | | 8 | 2 | Numeric with 2 decimals |
| BEST12. | BEST | 12 | 0 | Best representation |
| $CHAR200. | $CHAR | 200 | 0 | Character (200 bytes) |
| $200. | $ | 200 | 0 | Character shorthand |

[!TIP] For FDA submissions, avoid custom formats. Use standard SAS formats like DATE9., DATETIME20., and simple numeric formats.

IBM Floating Point

XPT files use IBM System/360 floating-point format, not IEEE 754. This page explains the format and conversion process.

Format Overview

IBM floating-point uses base-16 (hexadecimal) exponent instead of base-2:

graph LR
    subgraph "IBM Float (8 bytes = 64 bits)"
        A["Bit 0<br/>Sign"] --> B["Bits 1-7<br/>Exponent<br/>(excess-64)"]
        B --> C["Bits 8-63<br/>Mantissa<br/>(56 bits)"]
    end

| Field | Bits | Range | Description |
|-------|------|-------|-------------|
| Sign | 1 | 0-1 | 0=positive, 1=negative |
| Exponent | 7 | 0-127 | Power of 16, biased by 64 |
| Mantissa | 56 | | Fractional part in hex |

Key Differences from IEEE 754

| Aspect | IEEE 754 (double) | IBM Float |
|--------|-------------------|-----------|
| Exponent base | 2 | 16 |
| Exponent bias | 1023 | 64 |
| Mantissa bits | 52 | 56 |
| Implied bit | Yes (1.xxx) | No |
| Precision | ~15-17 digits | ~14-16 digits |
| Special values | NaN, ±Inf | Missing values |

Value Calculation

The value of an IBM float is:

value = sign × (0.mantissa) × 16^(exponent - 64)

Where:

  • sign = +1 if bit 0 is 0, -1 if bit 0 is 1
  • mantissa = fractional value in hexadecimal (0.xxxxxx…)
  • exponent = 7-bit integer from bits 1-7
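
The formula can be checked directly in Rust (`ibm_value` is an illustrative helper that ignores the sign bit and missing values):

```rust
/// Decode an IBM-float byte pattern by the formula above:
/// value = (mantissa / 2^56) * 16^(exponent - 64)
fn ibm_value(first_byte: u8, mantissa: u64) -> f64 {
    let exponent = (first_byte & 0x7F) as i32 - 64;
    (mantissa as f64 / (1u64 << 56) as f64) * 16f64.powi(exponent)
}

fn main() {
    // 41 10 00 00 00 00 00 00 → 1.0
    assert_eq!(ibm_value(0x41, 0x0010_0000_0000_0000), 1.0);
    // 42 64 00 00 00 00 00 00 → 100.0
    assert_eq!(ibm_value(0x42, 0x0064_0000_0000_0000), 100.0);
}
```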

Conversion Examples

Example 1: Encoding 1.0

1.0 in hex: 0.1 × 16^1

Exponent = 1 + 64 = 65 = 0x41
Mantissa = 0x1000000000000 (1 in top nibble)

Bytes: 41 10 00 00 00 00 00 00

Example 2: Encoding 100.0

100.0 = 0x64 = 0.64 × 16^2

Exponent = 2 + 64 = 66 = 0x42
Mantissa = 0x6400000000000

Bytes: 42 64 00 00 00 00 00 00

Example 3: Encoding -3.14159

3.14159 ≈ 0.3243F6A8885A3 × 16^1

Sign = 1 (negative)
Exponent = 1 + 64 = 65 = 0x41
With sign: 0xC1

Bytes: C1 32 43 F6 A8 88 5A 30

Rust Implementation

Encoding (IEEE → IBM)

#![allow(unused)]
fn main() {
fn ieee_to_ibm(value: f64) -> [u8; 8] {
    if value == 0.0 {
        return [0u8; 8];
    }

    let sign = if value.is_sign_negative() { 0x80u8 } else { 0x00u8 };

    // Get IEEE 754 components
    let bits = value.abs().to_bits();
    let ieee_exp = ((bits >> 52) & 0x7FF) as i32 - 1023;
    // Restore the implicit leading 1 for the full 53-bit significand
    let ieee_mant = (bits & 0x000F_FFFF_FFFF_FFFF) | 0x0010_0000_0000_0000;

    // IBM exponent is a power of 16, so shift the significand left
    // until the binary exponent is a multiple of 4
    let shift = ieee_exp.rem_euclid(4) as u32;
    let ibm_mant = ieee_mant << shift;                  // at most 56 bits
    let ibm_exp = (ieee_exp - shift as i32 + 260) / 4;  // excess-64 exponent

    // Note: overflow, underflow, and subnormals are not handled here
    let mut result = [0u8; 8];
    result[0] = sign | (ibm_exp as u8 & 0x7F);
    result[1..8].copy_from_slice(&ibm_mant.to_be_bytes()[1..8]);

    result
}
}

Decoding (IBM → IEEE)

#![allow(unused)]
fn main() {
fn ibm_to_ieee(bytes: [u8; 8]) -> f64 {
    // Check for zero
    if bytes == [0u8; 8] {
        return 0.0;
    }

    // Check for a missing value: '.', 'A'-'Z', or '_' in the first
    // byte with all mantissa bytes zero. The mantissa check matters:
    // 0x41-0x5A are also valid exponent bytes for ordinary numbers.
    if bytes[1..].iter().all(|&b| b == 0)
        && (bytes[0] == 0x2E || (0x41..=0x5A).contains(&bytes[0]) || bytes[0] == 0x5F)
    {
        return f64::NAN;  // Represent as NaN
    }

    let sign = if bytes[0] & 0x80 != 0 { -1.0 } else { 1.0 };
    let exp = (bytes[0] & 0x7F) as i32 - 64;

    // Extract 56-bit mantissa
    let mut mant: u64 = 0;
    for &b in &bytes[1..] {
        mant = (mant << 8) | b as u64;
    }

    // Convert to IEEE
    let value = mant as f64 / (1u64 << 56) as f64;
    sign * value * 16.0_f64.powi(exp)
}
}

Missing Values

XPT uses special byte patterns for missing values:

| Missing Type | First Byte | Description |
|--------------|------------|-------------|
| . | 0x2E | Standard missing |
| .A | 0x41 | Missing A |
| .B | 0x42 | Missing B |
| .Z | 0x5A | Missing Z |
| ._ | 0x5F | Missing underscore |

Detecting Missing Values

#![allow(unused)]
fn main() {
/// Identify the missing-value letter from the first byte. Callers
/// should also check that the remaining seven bytes are zero, since
/// 0x41-0x5A are valid exponent bytes for ordinary values.
fn is_missing(bytes: [u8; 8]) -> Option<char> {
    match bytes[0] {
        0x2E => Some('.'),                   // Standard missing
        b @ 0x41..=0x5A => Some(b as char),  // .A through .Z
        0x5F => Some('_'),                   // ._
        _ => None,
    }
}
}

Precision Considerations

Due to the base-16 exponent, IBM float has variable precision:

| Value Range | Approximate Precision |
|-------------|-----------------------|
| 0.0001 - 0.001 | ~14 digits |
| 0.001 - 1.0 | ~15 digits |
| 1.0 - 1000.0 | ~15-16 digits |
| Large values | ~14 digits |

[!WARNING] When converting from IEEE 754 to IBM float, some precision loss may occur. For critical values, consider storing as character strings.

xportrs Handling

xportrs handles IBM float conversion automatically:

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Dataset, Xpt};

// Numeric values are automatically converted to IBM float on write
let dataset = Dataset::new("LB", vec![
    Column::new("LBSTRESN", ColumnData::F64(vec![
        Some(3.14159265358979),
        Some(100.0),
        None,  // Becomes SAS missing value
    ])),
])?;

Xpt::writer(dataset)
    .finalize()?
    .write_path("lb.xpt")?;

// On read, IBM floats are automatically converted back to f64
let loaded = Xpt::read("lb.xpt")?;
}

Testing Conversion

#![allow(unused)]
fn main() {
#[test]
fn test_roundtrip() {
    let values = [1.0, -1.0, 100.0, 0.001, 3.14159, 1e10, 1e-10];

    for &v in &values {
        let ibm = ieee_to_ibm(v);
        let back = ibm_to_ieee(ibm);

        // Allow for small precision loss
        let rel_error = ((v - back) / v).abs();
        assert!(rel_error < 1e-14, "Value {} roundtrip error: {}", v, rel_error);
    }
}
}

Timestamps and Dates

XPT files use the SAS date system for timestamps and dates. This page explains date handling in xportrs.

SAS Epoch

SAS uses January 1, 1960 as its epoch (day zero), different from Unix (1970):

graph LR
    subgraph "Date Epochs"
        SAS["SAS Epoch<br/>1960-01-01<br/>Day 0"]
        UNIX["Unix Epoch<br/>1970-01-01<br/>Day 3653"]
        TODAY["2024-01-15<br/>Day 23391"]
    end
    
    SAS --> |"3653 days"| UNIX
    UNIX --> |"19738 days"| TODAY

Date Types

| Type | Storage | Unit | Example Format |
|------|---------|------|----------------|
| Date | f64 | Days since 1960-01-01 | DATE9. |
| Time | f64 | Seconds since midnight | TIME8. |
| DateTime | f64 | Seconds since 1960-01-01 00:00:00 | DATETIME20. |

Conversion Formulas

Date Conversions

#![allow(unused)]
fn main() {
use chrono::{NaiveDate, Datelike};

// SAS epoch
const SAS_EPOCH: NaiveDate = NaiveDate::from_ymd_opt(1960, 1, 1).unwrap();

/// Convert NaiveDate to SAS date number
fn to_sas_date(date: NaiveDate) -> f64 {
    (date - SAS_EPOCH).num_days() as f64
}

/// Convert SAS date number to NaiveDate
fn from_sas_date(sas_date: f64) -> NaiveDate {
    SAS_EPOCH + chrono::Duration::days(sas_date as i64)
}

// Examples:
// 1960-01-01 → 0
// 1970-01-01 → 3653
// 2024-01-15 → 23391
}

DateTime Conversions

#![allow(unused)]
fn main() {
use chrono::{NaiveDateTime, NaiveDate, NaiveTime};

/// Convert NaiveDateTime to SAS datetime number
fn to_sas_datetime(dt: NaiveDateTime) -> f64 {
    let epoch = NaiveDateTime::new(
        NaiveDate::from_ymd_opt(1960, 1, 1).unwrap(),
        NaiveTime::from_hms_opt(0, 0, 0).unwrap(),
    );
    (dt - epoch).num_seconds() as f64
}

/// Convert SAS datetime number to NaiveDateTime
fn from_sas_datetime(sas_dt: f64) -> NaiveDateTime {
    let epoch = NaiveDateTime::new(
        NaiveDate::from_ymd_opt(1960, 1, 1).unwrap(),
        NaiveTime::from_hms_opt(0, 0, 0).unwrap(),
    );
    epoch + chrono::Duration::seconds(sas_dt as i64)
}
}

Time Conversions

#![allow(unused)]
fn main() {
use chrono::{NaiveTime, Timelike};

/// Convert NaiveTime to SAS time number
fn to_sas_time(time: NaiveTime) -> f64 {
    time.num_seconds_from_midnight() as f64
}

/// Convert SAS time number to NaiveTime
fn from_sas_time(sas_time: f64) -> NaiveTime {
    let seconds = sas_time as u32;
    NaiveTime::from_num_seconds_from_midnight_opt(seconds, 0).unwrap()
}
}

Date Formats

Common Date Formats

| Format | Example Output | Description |
|--------|----------------|-------------|
| DATE9. | 15JAN2024 | Standard SAS date |
| DATE7. | 15JAN24 | Short year |
| MMDDYY10. | 01/15/2024 | US format |
| DDMMYY10. | 15/01/2024 | European format |
| YYMMDD10. | 2024-01-15 | ISO format |
| E8601DA. | 2024-01-15 | ISO 8601 |
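
For illustration, the DATE9. and YYMMDD10. renderings can be reproduced from plain year/month/day components (`date9` and `yymmdd10` are hypothetical helpers; real pipelines would use a date library):

```rust
const MONTHS: [&str; 12] = [
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

/// DATE9.-style rendering (display only)
fn date9(y: u32, m: u32, d: u32) -> String {
    format!("{:02}{}{}", d, MONTHS[(m - 1) as usize], y)
}

/// YYMMDD10. / E8601DA.-style rendering
fn yymmdd10(y: u32, m: u32, d: u32) -> String {
    format!("{:04}-{:02}-{:02}", y, m, d)
}

fn main() {
    assert_eq!(date9(2024, 1, 15), "15JAN2024");
    assert_eq!(yymmdd10(2024, 1, 15), "2024-01-15");
}
```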

DateTime Formats

| Format | Example Output |
|--------|----------------|
| DATETIME20. | 15JAN2024:14:30:00 |
| E8601DT. | 2024-01-15T14:30:00 |

Time Formats

| Format | Example Output |
|--------|----------------|
| TIME8. | 14:30:00 |
| TIME5. | 14:30 |
| HHMM. | 14:30 |

Using Dates in xportrs

Storing as Numeric with Format

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};

// Calculate SAS date for 2024-01-15
let sas_date = 23391.0;  // Days since 1960-01-01

Column::new("AESTDT", ColumnData::F64(vec![Some(sas_date)]))
    .with_label("Start Date")
    .with_format_str("DATE9.")?
}

For SDTM submissions, dates are typically stored as ISO 8601 character strings:

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};

// ISO 8601 date string
Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
    .with_label("Start Date/Time of Adverse Event")
    .with_format(Format::character(19))
    .with_length(19)
}

[!TIP] SDTM uses --DTC variables (character) for dates/times, while ADaM often uses --DT/--TM (numeric) variables with date formats.

Partial Dates

SDTM allows partial dates in character variables:

| Precision | Example | Description |
|-----------|---------|-------------|
| Complete | 2024-01-15 | Full date |
| Month | 2024-01 | Unknown day |
| Year | 2024 | Unknown month/day |

#![allow(unused)]
fn main() {
// Partial date examples
let dates = vec![
    Some("2024-01-15".to_string()),  // Complete
    Some("2024-01".to_string()),     // Month only
    Some("2024".to_string()),        // Year only
    None,                             // Missing
];

Column::new("AESTDTC", ColumnData::String(dates))
    .with_label("Start Date/Time")
    .with_format(Format::character(19))
}

File Timestamps

XPT files contain creation and modification timestamps in the dataset descriptor:

Position 48-63: Creation timestamp (ddMMMyy:hh:mm:ss)
Position 64-79: Modified timestamp (ddMMMyy:hh:mm:ss)

Example: "01JAN24:14:30:00"
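
That layout can be reproduced from plain components (a sketch; xportrs writes these fields itself):

```rust
const MONTHS: [&str; 12] = [
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

/// Render the ddMMMyy:hh:mm:ss timestamp layout from plain
/// date/time components (uppercase month, two-digit year).
fn sas_timestamp(y: u32, mo: u32, d: u32, h: u32, mi: u32, s: u32) -> String {
    format!(
        "{:02}{}{:02}:{:02}:{:02}:{:02}",
        d, MONTHS[(mo - 1) as usize], y % 100, h, mi, s
    )
}

fn main() {
    assert_eq!(sas_timestamp(2024, 1, 1, 14, 30, 0), "01JAN24:14:30:00");
}
```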

Reading File Timestamps

#![allow(unused)]
fn main() {
use xportrs::Xpt;

let info = Xpt::inspect("ae.xpt")?;
if let Some(created) = &info.created {
    println!("Created: {}", created);
}
if let Some(modified) = &info.modified {
    println!("Modified: {}", modified);
}
}

Time Zone Considerations

[!WARNING] XPT files do not store time zone information. All times are assumed to be in the local time zone where the data was collected.

For SDTM submissions:

  • Store times in ISO 8601 format with explicit time zone when known
  • Document time zone assumptions in the Reviewer’s Guide

Best Practices

  1. Use ISO 8601 for SDTM: Store dates as character strings (AESTDTC) rather than numeric
  2. Use numeric for ADaM: ADaM analysis dates (ASTDT) are typically numeric with formats
  3. Document partial dates: Use imputation flags (AESTDTF) to indicate partial date handling
  4. Consider precision: Numeric dates have ~15 digit precision; sub-second precision may be lost

Text Encoding

XPT files store text as fixed-width byte strings. This page covers character encoding considerations.

Encoding Overview

graph LR
    subgraph "Text Encoding Flow"
        A[Rust String<br/>UTF-8] --> B{Agency?}
        B -->|FDA| C[ASCII Only]
        B -->|PMDA| D[Shift-JIS/Latin-1]
        B -->|Other| E[Latin-1]
        C --> F[XPT File]
        D --> F
        E --> F
    end

Supported Encodings

| Encoding | xportrs Support | Use Case |
|----------|-----------------|----------|
| ASCII | Full | FDA submissions |
| Latin-1 (ISO-8859-1) | Full | Extended European |
| UTF-8 | Input only | Converted to target |

FDA ASCII Requirements

For FDA submissions, all text must be ASCII (bytes 0x00-0x7F):

#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};

// ASCII validation is automatic with FDA agency
let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;

// Non-ASCII characters will generate errors
for issue in validated.issues() {
    println!("{}", issue);
}
}

Valid ASCII Characters

| Category | Characters |
|----------|------------|
| Letters | A-Z, a-z |
| Digits | 0-9 |
| Punctuation | !"#$%&'()*+,-./:;<=>?@[\]^_\`{\|}~ |
| Space | (0x20) |

Common Non-ASCII Issues

| Character | Unicode | Issue |
|-----------|---------|-------|
| é (e-acute) | U+00E9 | Not ASCII |
| ° (degree) | U+00B0 | Not ASCII |
| µ (micro) | U+00B5 | Not ASCII |
| ® (registered) | U+00AE | Not ASCII |
| — (em dash) | U+2014 | Not ASCII |
| “ ” (smart quotes) | U+201C/U+201D | Not ASCII |

Handling Non-ASCII in FDA Submissions

#![allow(unused)]
fn main() {
/// Replace common non-ASCII characters with ASCII equivalents
fn ascii_safe(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            'é' | 'è' | 'ê' | 'ë' => out.push('e'),
            'á' | 'à' | 'â' | 'ä' => out.push('a'),
            'ó' | 'ò' | 'ô' | 'ö' => out.push('o'),
            'ú' | 'ù' | 'û' | 'ü' => out.push('u'),
            'í' | 'ì' | 'î' | 'ï' => out.push('i'),
            'ñ' => out.push('n'),
            'ç' => out.push('c'),
            '°' => out.push(' '),                       // or "deg"
            'µ' => out.push('u'),                       // or "micro"
            '®' => out.push_str("(R)"),
            '™' => out.push_str("(TM)"),
            '\u{201C}' | '\u{201D}' => out.push('"'),   // smart double quotes
            '\u{2018}' | '\u{2019}' => out.push('\''),  // smart single quotes
            '—' | '–' => out.push('-'),
            c if c.is_ascii() => out.push(c),
            _ => out.push('?'),  // Unknown non-ASCII
        }
    }
    out
}
}

Latin-1 Encoding

For non-FDA submissions, Latin-1 (ISO-8859-1) provides extended character support:

#![allow(unused)]
fn main() {
use xportrs::{TextMode, Xpt};

let validated = Xpt::writer(dataset)
    .text_mode(TextMode::Latin1)
    .finalize()?;
}

Latin-1 Character Range

| Range | Description |
|-------|-------------|
| 0x00-0x7F | ASCII (same as UTF-8) |
| 0x80-0x9F | Control characters (avoid) |
| 0xA0-0xFF | Extended Latin (accents, symbols) |
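
Because Latin-1 maps bytes 0x00-0xFF to the first 256 Unicode scalar values, a representability check is one line (`is_latin1` is a hypothetical helper, not the xportrs API):

```rust
/// A string fits Latin-1 iff every scalar value is <= U+00FF
/// (the 0x80-0x9F control range is still best avoided).
fn is_latin1(s: &str) -> bool {
    s.chars().all(|c| (c as u32) <= 0xFF)
}

fn main() {
    assert!(is_latin1("Héllo Wörld")); // é, ö are within 0xA0-0xFF
    assert!(!is_latin1("日本語"));     // outside Latin-1
}
```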

Character Variable Length

XPT character variables have a fixed length (1-200 bytes):

graph LR
    subgraph "Character Field (20 bytes)"
        A["H"] --> B["e"] --> C["l"] --> D["l"] --> E["o"]
        E --> F[" "] --> G[" "] --> H["..."] --> I[" "]
    end

  • Values shorter than the field length are right-padded with spaces
  • Values longer than the field length are truncated

Explicit Length Control

#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};

// Set explicit length to 200 bytes for long text
Column::new("AETERM", ColumnData::String(vec![Some("Headache".into())]))
    .with_label("Reported Term")
    .with_format(Format::character(200))
    .with_length(200)
}

Auto-Derived Length

When no explicit length is set, xportrs derives the length from the data:

#![allow(unused)]
fn main() {
// Length will be max(len("Hello"), len("World")) = 5
let data = vec![Some("Hello".into()), Some("World".into())];
Column::new("VAR", ColumnData::String(data))
}

UTF-8 to Encoding Conversion

xportrs accepts UTF-8 strings and converts to the target encoding:

#![allow(unused)]
fn main() {
// UTF-8 input (Rust default)
let utf8_string = "Héllo Wörld";  // Contains non-ASCII

// With ASCII mode (FDA)
// Error: contains non-ASCII characters

// With Latin-1 mode
// Converted: "Héllo Wörld" → Latin-1 bytes
}

Conversion Errors

Non-representable characters cause errors:

#![allow(unused)]
fn main() {
// Japanese text cannot be represented in Latin-1
let japanese = "日本語";

// This will fail with Latin-1 encoding
// Use ASCII transliteration or Shift-JIS for PMDA
}

Space Padding

XPT uses space (0x20) for padding, not null (0x00):

#![allow(unused)]
fn main() {
fn pad_to_length(s: &str, len: usize) -> Vec<u8> {
    let mut bytes = s.as_bytes().to_vec();
    bytes.resize(len, b' ');  // Space padding
    bytes
}

// "Hi" with length 8 → [72, 105, 32, 32, 32, 32, 32, 32]
//                        'H' 'i'  ' '  ' '  ' '  ' '  ' '  ' '
}

Reading Encoded Text

When reading XPT files, xportrs trims trailing spaces and converts to UTF-8:

#![allow(unused)]
fn main() {
use xportrs::{ColumnData, Xpt};

let dataset = Xpt::read("data.xpt")?;

for col in dataset.columns() {
    if let ColumnData::String(values) = col.data() {
        for value in values {
            if let Some(s) = value {
                // s is a Rust String (UTF-8)
                println!("{}", s);
            }
        }
    }
}
}

Best Practices

  1. Use ASCII for FDA submissions: Avoid accented characters and symbols
  2. Validate early: Check for encoding issues before building datasets
  3. Document character sets: Note any extended character usage in metadata
  4. Prefer explicit lengths: Set character lengths explicitly for predictable behavior
  5. Test roundtrip: Verify that read → write → read preserves text correctly

[!IMPORTANT] The FDA Technical Conformance Guide requires ASCII text. Non-ASCII characters may cause validation failures or data integrity issues during regulatory review.

Architecture Overview

This page provides a high-level view of xportrs internal architecture.

Module Structure

graph TB
    subgraph "Public API"
        XPT[Xpt] --> READER[XptReaderBuilder]
        XPT --> WRITER[XptWriterBuilder]
        DATASET[Dataset] --> COLUMN[Column]
        COLUMN --> COLDATA[ColumnData]
        COLUMN --> FORMAT[Format]
    end
    
    subgraph "Core Modules"
        SCHEMA[schema] --> DERIVE[derive.rs]
        SCHEMA --> PLAN[plan.rs]
        VALIDATE[validate] --> CHECKS[checks_v5.rs]
        VALIDATE --> ISSUES[issues.rs]
    end
    
    subgraph "XPT V5 Implementation"
        V5[xpt/v5] --> READ[read/]
        V5 --> WRITE[write/]
        READ --> PARSER[parse.rs]
        READ --> OBS[obs.rs]
        WRITE --> NAMESTR[namestr.rs]
        WRITE --> SPLIT[split.rs]
    end
    
    subgraph "Low-Level"
        IBM[ibm_float.rs]
        RECORD[record.rs]
        TIMESTAMP[timestamp.rs]
    end

Key Components

Public API Layer

| Component | Purpose |
|-----------|---------|
| Xpt | Entry point for reading/writing |
| Dataset | Collection of columns with metadata |
| Column | Variable data and metadata |
| ColumnData | Typed data storage |
| Format | SAS format parsing and representation |

Schema Layer

| Component | Purpose |
|-----------|---------|
| DatasetSchema | Computed schema for writing |
| VariableSpec | Per-variable write plan |
| derive_schema_plan() | Computes schema from Dataset |

Validation Layer

| Component | Purpose |
|-----------|---------|
| ValidatedWrite | Validated dataset ready to write |
| Issue | Validation problem description |
| Severity | Error/Warning/Info classification |

XPT V5 Layer

| Component | Purpose |
|-----------|---------|
| XptReader | Reads XPT files |
| XptWriter | Writes XPT files |
| SplitWriter | Handles file splitting |
| pack_namestr() | Creates NAMESTR records |

Low-Level Layer

| Component | Purpose |
|-----------|---------|
| ibm_float | IBM float encoding/decoding |
| record | 80-byte record handling |
| timestamp | SAS epoch date handling |
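The `ibm_float` module's job can be illustrated with a minimal, self-contained sketch of the classic IEEE-to-IBM conversion (sign bit, 7-bit excess-64 base-16 exponent, 56-bit fraction). This is an illustration of the format, not the crate's actual code; it ignores rounding at the fraction boundary and the XPT missing-value byte patterns.

```rust
/// Encode an f64 as an 8-byte big-endian IBM System/360 hexadecimal float.
/// Illustrative sketch only.
fn encode_ibm_float(v: f64) -> [u8; 8] {
    if v == 0.0 {
        return [0u8; 8];
    }
    let sign: u8 = if v < 0.0 { 0x80 } else { 0x00 };
    let mut m = v.abs();
    // Normalize so that v.abs() == m * 16^e with m in [1/16, 1)
    let mut e: i32 = 0;
    while m >= 1.0 {
        m /= 16.0;
        e += 1;
    }
    while m < 1.0 / 16.0 {
        m *= 16.0;
        e -= 1;
    }
    // Scale the fraction into a 56-bit integer
    let frac = (m * (1u64 << 56) as f64) as u64;
    let mut out = [0u8; 8];
    // Byte 0: sign bit plus excess-64 exponent
    out[0] = sign | ((e + 64) as u8);
    // Bytes 1..8: the 56 fraction bits
    out[1..8].copy_from_slice(&frac.to_be_bytes()[1..8]);
    out
}
```

For example, 1.0 encodes as `41 10 00 00 00 00 00 00`, and −1.0 flips only the sign bit (`C1 10 …`).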

Design Principles

1. Type Safety

Rust’s type system prevents common errors:

#![allow(unused)]
fn main() {
// DomainCode, Label, VariableName are distinct types
let domain = DomainCode::new("AE");
let label = Label::new("Adverse Events");
// Can't accidentally swap them

// ColumnData enforces type consistency
let data = ColumnData::F64(vec![Some(1.0)]);
// Can't mix types within a column
}

2. Builder Pattern

Complex objects use builders for ergonomic construction:

#![allow(unused)]
fn main() {
// Reader builder
let dataset = Xpt::reader("file.xpt")
    .row_limit(100)
    .read()?;

// Writer builder
let validated = Xpt::writer(dataset)
    .agency(Agency::FDA)
    .finalize()?;
}

3. Validation Pipeline

Validation happens before writing:

graph LR
    A[Dataset] --> B[XptWriterBuilder]
    B --> C[finalize]
    C --> D[validate_v5_schema]
    D --> E[ValidatedWrite]
    E --> F{has_errors?}
    F --> |No| G[write_path]
    F --> |Yes| H[Return Issues]

4. Metadata Preservation

Metadata flows through all operations:

graph LR
    subgraph "Read Path"
        XPT1[XPT File] --> NS1[NAMESTR]
        NS1 --> COL1[Column]
    end
    
    subgraph "Storage"
        COL1 --> DS[Dataset]
    end
    
    subgraph "Write Path"
        DS --> VS[VariableSpec]
        VS --> NS2[NAMESTR]
        NS2 --> XPT2[XPT File]
    end

5. Zero-Copy Where Possible

String data uses references where safe:

#![allow(unused)]
fn main() {
// Reading: borrows from buffer where possible
// Writing: uses slices directly when aligned
}

Error Handling

xportrs uses a unified Error type:

#![allow(unused)]
fn main() {
pub enum Error {
    Io(std::io::Error),
    InvalidHeader { message: String },
    InvalidData { message: String },
    InvalidSchema { message: String },
    MemberNotFound { domain_code: String },
    // ...
}
}

Errors implement std::error::Error and are Send + Sync + 'static.

Thread Safety

All public types are Send + Sync:

#![allow(unused)]
fn main() {
// Can be shared across threads
let dataset = Arc::new(Xpt::read("data.xpt")?);

// Can be sent to other threads
std::thread::spawn(move || {
    for col in dataset.columns() {
        println!("{}", col.name());
    }
});
}

Memory Layout

Dataset

Dataset {
    domain_code: DomainCode(String),
    dataset_label: Option<Label>,
    columns: Vec<Column>,
}

Column

Column {
    name: VariableName(String),
    role: Option<VariableRole>,
    data: ColumnData,
    label: Option<Label>,
    format: Option<Format>,
    informat: Option<Format>,
    length: Option<usize>,
}

ColumnData

enum ColumnData {
    F64(Vec<Option<f64>>),
    I64(Vec<Option<i64>>),
    Bool(Vec<Option<bool>>),
    String(Vec<Option<String>>),
    Bytes(Vec<Option<Vec<u8>>>),
    Date(Vec<Option<NaiveDate>>),
    DateTime(Vec<Option<NaiveDateTime>>),
    Time(Vec<Option<NaiveTime>>),
}

Extension Points

Adding New Validation Rules

  1. Add variant to Issue enum
  2. Implement severity() and Display
  3. Add check in validate_v5_schema()
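The three steps above can be sketched with stand-in types — the real Issue and Severity live in the crate, and the lowercase-name rule here is purely hypothetical:

```rust
#[derive(Debug)]
#[allow(dead_code)]
enum Severity {
    Error,
    Warning,
    Info,
}

#[derive(Debug)]
#[allow(dead_code)]
enum Issue {
    VariableNameTooLong { variable: String },
    // Step 1: the new rule added as a variant (hypothetical example)
    LowercaseVariableName { variable: String },
}

impl Issue {
    // Step 2: classify the new variant (a real impl also adds Display)
    fn severity(&self) -> Severity {
        match self {
            Issue::VariableNameTooLong { .. } => Severity::Error,
            Issue::LowercaseVariableName { .. } => Severity::Warning,
        }
    }
}

// Step 3: the check itself, as it might appear in validate_v5_schema()
fn check_name(name: &str) -> Option<Issue> {
    if name.bytes().any(|b| b.is_ascii_lowercase()) {
        Some(Issue::LowercaseVariableName { variable: name.to_string() })
    } else {
        None
    }
}
```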

Supporting New Agencies

  1. Add variant to Agency enum
  2. Add agency-specific validation in checks_v5.rs

Adding Column Types

  1. Add variant to ColumnData
  2. Handle in reader/writer
  3. Add From implementation

Data Flow

This page details how data flows through xportrs during reading and writing.

Reading Flow

flowchart TB
    subgraph "1. File Parsing"
        A[XPT File] --> B[parse_header]
        B --> C[XptMemberInfo]
        C --> D[NamestrV5 records]
    end
    
    subgraph "2. Data Reading"
        D --> E[ObservationReader]
        E --> F[decode_ibm_float]
        E --> G[decode_text]
        F --> H[ObsValue::Numeric]
        G --> I[ObsValue::Character]
    end
    
    subgraph "3. Type Conversion"
        H --> J[ColumnData::F64]
        I --> K[ColumnData::String]
    end
    
    subgraph "4. Assembly"
        J --> L[Column]
        K --> L
        D --> |metadata| L
        L --> M[Dataset]
    end

Step-by-Step Reading

1. Parse File Header

#![allow(unused)]
fn main() {
// In parse.rs
pub fn parse_header<R: Read + Seek>(reader: &mut R) -> Result<XptInfo> {
    // Read library header (80 bytes)
    let lib_header = read_record(reader)?;
    verify_library_header(&lib_header)?;
    
    // Read each member
    let mut members = Vec::new();
    while let Some(member) = parse_member_header(reader)? {
        members.push(member);
    }
    
    Ok(XptInfo { members, ... })
}
}

2. Parse NAMESTR Records

#![allow(unused)]
fn main() {
// In namestr.rs
pub fn unpack_namestr(bytes: &[u8; 140]) -> Result<NamestrV5> {
    let ntype = i16::from_be_bytes([bytes[0], bytes[1]]);
    let nlng = i16::from_be_bytes([bytes[4], bytes[5]]);
    let nname = parse_string(&bytes[8..16]);
    let nlabel = parse_string(&bytes[16..56]);
    let nform = parse_string(&bytes[56..64]);
    let nfl = i16::from_be_bytes([bytes[64], bytes[65]]);
    // ... more fields
    
    Ok(NamestrV5 { ntype, nlng, nname, nlabel, ... })
}
}

3. Read Observations

#![allow(unused)]
fn main() {
// In obs.rs
pub fn read_observation(&mut self) -> Result<Option<Vec<ObsValue>>> {
    let mut row = Vec::with_capacity(self.variables.len());
    
    for var in &self.variables {
        if var.is_numeric() {
            let bytes = self.read_bytes(8)?;
            let value = decode_ibm_float(bytes);
            row.push(ObsValue::Numeric(value));
        } else {
            let bytes = self.read_bytes(var.length)?;
            let value = decode_text(bytes);
            row.push(ObsValue::Character(value));
        }
    }
    
    Ok(Some(row))
}
}
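The `decode_text` call above is not shown in the snippet; a plausible minimal version simply strips the trailing space padding that fixed-width XPT character fields carry. This is assumed behavior — the real helper may handle non-UTF-8 bytes differently.

```rust
/// Decode a fixed-width, space-padded XPT character field into a String.
/// Sketch: lossy UTF-8 conversion, then trailing blanks stripped.
fn decode_text(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).trim_end_matches(' ').to_string()
}
```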

4. Build Column with Metadata

#![allow(unused)]
fn main() {
// In reader.rs
let cols: Vec<Column> = member.variables.iter()
    .zip(columns)
    .map(|(var, data)| {
        let mut col = Column::new(&var.nname, data);
        
        // Transfer metadata from NAMESTR
        if !var.nlabel.is_empty() {
            col = col.with_label(var.nlabel.as_str());
        }
        if !var.nform.is_empty() {
            col = col.with_format(Format::from_namestr(
                &var.nform, var.nfl, var.nfd, var.nfj
            ));
        }
        if var.is_character() {
            col = col.with_length(var.length());
        }
        
        col
    })
    .collect();
}

Writing Flow

flowchart TB
    subgraph "1. Schema Planning"
        A[Dataset] --> B[derive_schema_plan]
        B --> C[DatasetSchema]
        C --> D[VariableSpec per column]
    end
    
    subgraph "2. Validation"
        D --> E[validate_v5_schema]
        E --> F[Issue collection]
        F --> G{has_errors?}
        G --> |Yes| H[Block write]
        G --> |No| I[ValidatedWrite]
    end
    
    subgraph "3. Writing"
        I --> J[XptWriter]
        J --> K[write_headers]
        J --> L[pack_namestr]
        J --> M[write_observations]
    end
    
    subgraph "4. Encoding"
        M --> N[encode_ibm_float]
        M --> O[encode_text]
        N --> P[XPT File]
        O --> P
    end

Step-by-Step Writing

1. Derive Schema

#![allow(unused)]
fn main() {
// In derive.rs
pub fn derive_schema_plan(
    dataset: &Dataset,
    metadata: Option<&VariableMetadata>,
) -> DatasetSchema {
    let variables: Vec<VariableSpec> = dataset.columns()
        .iter()
        .enumerate()
        .map(|(i, col)| {
            let mut spec = VariableSpec {
                name: col.name().to_uppercase(),
                is_numeric: col.data().is_numeric(),
                length: compute_length(col),
                position: 0,  // Computed later
                ...
            };
            
            // Apply Column metadata
            if let Some(label) = col.label() {
                spec.label = label.to_string();
            }
            if let Some(format) = col.format() {
                spec.format = Some(format.clone());
            }
            
            spec
        })
        .collect();
    
    DatasetSchema { variables, ... }
}
}

2. Validate

#![allow(unused)]
fn main() {
// In checks_v5.rs
pub fn validate_v5_schema(
    schema: &DatasetSchema,
    options: &WriteOptions,
) -> Vec<Issue> {
    let mut issues = Vec::new();
    
    // Dataset-level checks
    if schema.dataset_label.is_empty() {
        issues.push(Issue::MissingDatasetLabel { 
            dataset: schema.name.clone() 
        });
    }
    
    // Variable-level checks
    for var in &schema.variables {
        if var.name.len() > 8 {
            issues.push(Issue::VariableNameTooLong { ... });
        }
        if var.label.is_empty() {
            issues.push(Issue::MissingVariableLabel { 
                variable: var.name.clone() 
            });
        }
        // ... more checks
    }
    
    issues
}
}

3. Pack NAMESTR

#![allow(unused)]
fn main() {
// In namestr.rs
pub fn pack_namestr<W: Write>(
    writer: &mut W,
    var: &VariableSpec,
    position: i32,
) -> Result<()> {
    // ntype
    writer.write_i16::<BigEndian>(
        if var.is_numeric { 1 } else { 2 }
    )?;
    
    // nhfun (always 0)
    writer.write_i16::<BigEndian>(0)?;
    
    // nlng
    writer.write_i16::<BigEndian>(var.length as i16)?;
    
    // nvar0
    writer.write_i16::<BigEndian>(var.index as i16 + 1)?;
    
    // nname (8 bytes, space-padded)
    let mut name = [b' '; 8];
    let n = var.name.len().min(8);
    name[..n].copy_from_slice(&var.name.as_bytes()[..n]);
    writer.write_all(&name)?;
    
    // nlabel (40 bytes, space-padded)
    let mut label = [b' '; 40];
    let l = var.label.len().min(40);
    label[..l].copy_from_slice(&var.label.as_bytes()[..l]);
    writer.write_all(&label)?;
    
    // Format fields
    if let Some(ref format) = var.format {
        write_format_fields(writer, format)?;
    } else {
        write_empty_format_fields(writer)?;
    }
    
    // ... remaining fields
    
    Ok(())
}
}

4. Write Observations

#![allow(unused)]
fn main() {
// In writer.rs
fn write_observations<W: Write>(
    writer: &mut W,
    dataset: &Dataset,
    schema: &DatasetSchema,
) -> Result<()> {
    for row_idx in 0..dataset.nrows() {
        for (col, spec) in dataset.columns().iter()
            .zip(&schema.variables) 
        {
            if spec.is_numeric {
                let value = get_numeric_value(col, row_idx);
                let ibm = encode_ibm_float(value);
                writer.write_all(&ibm)?;
            } else {
                let value = get_string_value(col, row_idx);
                let padded = pad_to_length(&value, spec.length);
                writer.write_all(&padded)?;
            }
        }
    }
    
    // Pad to 80-byte boundary
    pad_to_record_boundary(writer)?;
    
    Ok(())
}
}
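The `pad_to_record_boundary` helper referenced above is not shown. A minimal sketch, assuming the caller supplies the running byte count and that the final record is padded with ASCII blanks per TS-140:

```rust
use std::io::Write;

/// Pad output to the next 80-byte record boundary with ASCII spaces.
/// `written` is the number of data bytes emitted so far (an assumption:
/// the real helper presumably tracks this internally).
fn pad_to_record_boundary<W: Write>(writer: &mut W, written: usize) -> std::io::Result<()> {
    let remainder = written % 80;
    if remainder != 0 {
        writer.write_all(&vec![b' '; 80 - remainder])?;
    }
    Ok(())
}
```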

Metadata Flow

sequenceDiagram
    participant User
    participant Column
    participant VariableSpec
    participant NAMESTR
    participant XPT
    
    User->>Column: with_label("Label")
    User->>Column: with_format(Format)
    
    Column->>VariableSpec: derive_schema_plan()
    Note right of VariableSpec: label, format copied
    
    VariableSpec->>NAMESTR: pack_namestr()
    Note right of NAMESTR: nlabel, nform, nfl, nfd
    
    NAMESTR->>XPT: Written to file
    
    Note over XPT,Column: Reading reverses the flow
    
    XPT->>NAMESTR: unpack_namestr()
    NAMESTR->>Column: Transfer metadata
    Note left of Column: Label, format restored

Error Flow

flowchart TB
    A[Operation] --> B{Success?}
    B --> |Yes| C[Return Ok]
    B --> |No| D[Create Error]
    D --> E[Add context]
    E --> F[Return Err]
    F --> G{Caller handles?}
    G --> |Yes| H[Recovery/Fallback]
    G --> |No| I[Propagate up]

All errors are:

  • Enriched with context
  • Send + Sync + 'static
  • Implement std::error::Error

Schema Derivation

This page explains how xportrs derives the write schema from a Dataset.

Schema Overview

The schema contains all information needed to write an XPT file:

#![allow(unused)]
fn main() {
pub struct DatasetSchema {
    pub name: String,           // Dataset name (uppercase)
    pub label: String,          // Dataset label
    pub variables: Vec<VariableSpec>,
    pub row_length: usize,      // Bytes per observation row
}

pub struct VariableSpec {
    pub name: String,           // Variable name
    pub label: String,          // Variable label
    pub is_numeric: bool,       // Type flag
    pub length: usize,          // Bytes per value
    pub position: usize,        // Offset in row
    pub format: Option<Format>, // Display format
    pub informat: Option<Format>, // Input format
}
}

Derivation Process

flowchart TB
    subgraph "Input"
        A[Dataset] --> B[Columns]
        C[VariableMetadata] --> D[Overrides]
    end
    
    subgraph "Derivation"
        B --> E[For each Column]
        E --> F[Compute base spec]
        D --> G[Apply overrides]
        F --> G
        G --> H[VariableSpec]
    end
    
    subgraph "Post-Processing"
        H --> I[Compute positions]
        I --> J[Compute row length]
        J --> K[DatasetSchema]
    end

Step 1: Compute Base Spec

For each column, derive the base specification:

#![allow(unused)]
fn main() {
fn compute_base_spec(col: &Column, index: usize) -> VariableSpec {
    // Determine type
    let is_numeric = matches!(col.data(), 
        ColumnData::F64(_) | ColumnData::I64(_) | ColumnData::Bool(_) |
        ColumnData::Date(_) | ColumnData::DateTime(_) | ColumnData::Time(_)
    );
    
    // Compute length
    let length = if is_numeric {
        8  // Always 8 bytes for numerics
    } else {
        compute_character_length(col)
    };
    
    VariableSpec {
        name: col.name().to_uppercase(),
        label: String::new(),
        is_numeric,
        length,
        position: 0,
        format: None,
        informat: None,
    }
}
}

Step 2: Character Length Computation

Character length is computed from data unless explicitly set:

#![allow(unused)]
fn main() {
fn compute_character_length(col: &Column) -> usize {
    // Priority 1: Explicit length override
    if let Some(len) = col.explicit_length() {
        return len.min(200);  // Cap at 200
    }
    
    // Priority 2: Derive from data
    if let ColumnData::String(values) = col.data() {
        let max_len = values.iter()
            .filter_map(|v| v.as_ref())
            .map(|s| s.len())
            .max()
            .unwrap_or(1);
        
        // Clamp to the valid 1-200 byte range
        max_len.clamp(1, 200)
    } else if let ColumnData::Bytes(values) = col.data() {
        let max_len = values.iter()
            .filter_map(|v| v.as_ref())
            .map(|b| b.len())
            .max()
            .unwrap_or(1);
        
        max_len.clamp(1, 200)
    } else {
        8  // Default for numeric types
    }
}
}

Step 3: Apply Column Metadata

Column metadata is applied to the spec:

#![allow(unused)]
fn main() {
fn apply_column_metadata(spec: &mut VariableSpec, col: &Column) {
    // Label from Column
    if let Some(label) = col.label() {
        spec.label = truncate_to_bytes(label.as_ref(), 40);
    }
    
    // Format from Column
    if let Some(format) = col.format() {
        spec.format = Some(format.clone());
    }
    
    // Informat from Column
    if let Some(informat) = col.informat() {
        spec.informat = Some(informat.clone());
    }
    
    // Length override from Column (for character)
    if !spec.is_numeric {
        if let Some(len) = col.explicit_length() {
            spec.length = len.min(200);
        }
    }
}
}
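The `truncate_to_bytes` helper used above must cut at a UTF-8 character boundary, or slicing would panic on multi-byte text. A hypothetical implementation:

```rust
/// Truncate a string to at most `max` bytes without splitting a UTF-8
/// character. Illustrative version of the helper named in the docs.
fn truncate_to_bytes(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string();
    }
    let mut end = max;
    // Walk back until the cut lands on a character boundary
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s[..end].to_string()
}
```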

Step 4: Apply External Metadata

Optional external metadata can override Column values:

#![allow(unused)]
fn main() {
fn apply_external_metadata(
    spec: &mut VariableSpec, 
    meta: Option<&VariableMetadata>,
) {
    if let Some(meta) = meta {
        // External metadata takes priority
        if let Some(label) = &meta.label {
            spec.label = truncate_to_bytes(label, 40);
        }
        if let Some(format) = &meta.format {
            spec.format = Some(format.clone());
        }
        if let Some(length) = meta.length {
            if !spec.is_numeric {
                spec.length = length.min(200);
            }
        }
    }
}
}

Step 5: Compute Positions

After all specs are created, compute byte positions:

#![allow(unused)]
fn main() {
fn compute_positions(specs: &mut [VariableSpec]) {
    let mut position = 0;
    
    for spec in specs {
        spec.position = position;
        position += spec.length;
    }
}

fn compute_row_length(specs: &[VariableSpec]) -> usize {
    specs.iter().map(|s| s.length).sum()
}
}
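As a worked example of the pass above, three variables of lengths 8, 40, and 8 land at offsets 0, 8, and 48, giving a 56-byte row. The `Spec` struct here is a minimal stand-in for `VariableSpec`:

```rust
/// Minimal stand-in for VariableSpec, carrying only what the pass needs.
struct Spec {
    length: usize,
    position: usize,
}

/// Assign each variable its byte offset within the observation row.
fn compute_positions(specs: &mut [Spec]) {
    let mut position = 0;
    for spec in specs {
        spec.position = position;
        position += spec.length;
    }
}
```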

Complete Flow

#![allow(unused)]
fn main() {
pub fn derive_schema_plan(
    dataset: &Dataset,
    metadata: Option<&VariableMetadata>,
) -> DatasetSchema {
    // 1. Derive base specs
    let mut variables: Vec<VariableSpec> = dataset.columns()
        .iter()
        .enumerate()
        .map(|(i, col)| compute_base_spec(col, i))
        .collect();
    
    // 2. Apply Column metadata
    for (spec, col) in variables.iter_mut()
        .zip(dataset.columns()) 
    {
        apply_column_metadata(spec, col);
    }
    
    // 3. Apply external metadata
    for spec in &mut variables {
        apply_external_metadata(spec, metadata);
    }
    
    // 4. Compute positions
    compute_positions(&mut variables);
    
    // 5. Build schema
    DatasetSchema {
        name: dataset.domain_code().to_uppercase(),
        label: dataset.dataset_label()
            .map(|l| truncate_to_bytes(l, 40))
            .unwrap_or_default(),
        row_length: compute_row_length(&variables),
        variables,
    }
}
}

Priority Order

Metadata is applied with this priority (highest to lowest):

  1. External VariableMetadata - Programmatic overrides
  2. Column metadata - .with_label(), .with_format(), etc.
  3. Computed defaults - Derived from data

graph TB
    A[Computed Default] --> B[Column Metadata]
    B --> C[External Metadata]
    C --> D[Final VariableSpec]
    
    style D fill:#90EE90
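The precedence chain amounts to an `Option::or` fold. A minimal sketch with hypothetical names (`resolve_label` is not part of the API):

```rust
/// Resolve one metadata field by priority: external override first, then the
/// Column's own metadata, then the computed default.
fn resolve_label(external: Option<&str>, column: Option<&str>, default: &str) -> String {
    external.or(column).unwrap_or(default).to_string()
}
```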

Validation

The schema is validated after derivation:

#![allow(unused)]
fn main() {
fn validate_schema(schema: &DatasetSchema) -> Vec<Issue> {
    let mut issues = Vec::new();
    
    for var in &schema.variables {
        // Name validation
        if var.name.is_empty() {
            issues.push(Issue::InvalidVariableName { ... });
        }
        if var.name.len() > 8 {
            issues.push(Issue::VariableNameTooLong { ... });
        }
        
        // Label validation
        if var.label.is_empty() {
            issues.push(Issue::MissingVariableLabel { ... });
        }
        
        // Length validation
        if !var.is_numeric && var.length > 200 {
            issues.push(Issue::CharacterTooLong { ... });
        }
    }
    
    issues
}
}

Example

#![allow(unused)]
fn main() {
// Input
let col = Column::new("USUBJID", ColumnData::String(vec![
    Some("ABC-001".into()),
    Some("ABC-002".into()),
]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40))
.with_length(40);

// Derived VariableSpec
VariableSpec {
    name: "USUBJID",
    label: "Unique Subject Identifier",
    is_numeric: false,
    length: 40,
    position: 0,
    format: Some(Format::character(40)),
    informat: None,
}
}

Official Sources

This page lists authoritative sources for XPT format and regulatory requirements.

SAS Documentation

TS-140: XPT V5 Specification

The authoritative specification for XPT V5 format.

  • Title: Record Layout of a SAS Version 5 or 6 Data Set in SAS Transport (XPORT) Format
  • Publisher: SAS Institute Inc.
  • Link: TS-140 PDF

Key contents:

  • File structure (headers, NAMESTR, observations)
  • 140-byte NAMESTR record layout
  • IBM floating-point encoding
  • Character encoding rules

FDA Documentation

Study Data Technical Conformance Guide

Requirements for electronic study data submissions.

  • Publisher: U.S. Food and Drug Administration
  • Link: TCG PDF

Key contents:

  • Required file formats (XPT V5)
  • File size limits (5GB)
  • Character encoding (ASCII)
  • eCTD placement

FDA Data Standards Catalog

Supported CDISC standards and versions.

CDISC Standards

SDTM Implementation Guide

Standard for tabulation data structure.

Key contents:

  • Domain dataset structures (e.g., DM, AE, LB)
  • Variable names, labels, and roles
  • Core variable designations (Required/Expected/Permissible)

ADaM Implementation Guide

Standard for analysis datasets.

Key contents:

  • Analysis dataset structures
  • Derived variable conventions
  • Traceability requirements

CDISC Controlled Terminology

Standard coded values for CDISC variables.

Format Registries

Library of Congress

Format documentation and preservation information.

Validation Tools

Pinnacle 21

Industry-standard CDISC validation tool.

Validates:

  • XPT file structure
  • CDISC standard compliance
  • define.xml consistency
  • Controlled terminology

OpenCDISC (Legacy)

Open-source validation (now Pinnacle 21 Community).

International Regulators

PMDA (Japan)

NMPA (China)

EMA (Europe)

Technical References

IBM Floating-Point

ISO 8601 Date/Time

Used for SDTM timing variables (--DTC).

Character Encodings

xportr (R Package)

R package for XPT file handling.

pyreadstat (Python)

Python library for reading statistical file formats.

haven (R Package)

R package for reading SAS files.

Citation

When referencing xportrs in academic or regulatory contexts:

@software{xportrs,
  title = {xportrs: SAS Transport (XPT) file format library for Rust},
  author = {xportrs contributors},
  year = {2024},
  url = {https://github.com/rubentalstra/xportrs},
  license = {MIT OR Apache-2.0},
}

Glossary

This glossary defines key terms used in xportrs and clinical trial data management.

A

ADaM (Analysis Data Model)
CDISC standard for analysis-ready datasets derived from SDTM data. Common datasets include ADSL (subject-level), ADAE (adverse events analysis), and ADLB (laboratory analysis).
Agency
Regulatory authority that reviews drug submissions. Major agencies include FDA (US), PMDA (Japan), NMPA (China), and EMA (Europe).
ANDA (Abbreviated New Drug Application)
FDA submission type for generic drugs.
ASCII
American Standard Code for Information Interchange. Character encoding required by FDA for XPT file text content. Uses bytes 0x00-0x7F.

B

BLA (Biologics License Application)
FDA submission type for biological products.
Big-endian
Byte order where the most significant byte is stored first. Used in XPT files.

C

CDASH (Clinical Data Acquisition Standards Harmonization)
CDISC standard for data collection forms. Upstream of SDTM.
CDISC (Clinical Data Interchange Standards Consortium)
Organization that develops data standards for clinical research, including SDTM, ADaM, and controlled terminology.
Column
In xportrs, represents a variable with its data and metadata. Corresponds to a variable in XPT terminology.
ColumnData
Enum in xportrs representing typed data storage (F64, String, Date, etc.).
Controlled Terminology
CDISC-defined standard values for coded variables. Example: SEX must be M, F, U, or UNDIFFERENTIATED.

D

Dataset
In xportrs, a collection of columns representing an XPT member. Also called a domain in SDTM context.
Define-XML
XML file describing the metadata for CDISC datasets. Required alongside XPT files in submissions.
Domain
SDTM term for a dataset representing a specific type of data (DM=Demographics, AE=Adverse Events, etc.).
DomainCode
In xportrs, the 1-8 character dataset identifier (e.g., “AE”, “DM”).

E

eCTD (Electronic Common Technical Document)
Standard format for regulatory submissions. XPT files are placed in specific eCTD modules.
EMA (European Medicines Agency)
Regulatory authority for the European Union.
Epoch
Reference date for date calculations. SAS uses January 1, 1960. Unix uses January 1, 1970.

F

FDA (Food and Drug Administration)
U.S. regulatory authority for drugs and medical devices.
Format
In xportrs, represents a SAS display format (e.g., DATE9., 8.2, $CHAR200.).

I

IBM Floating-Point
Hexadecimal (base-16) floating-point format used in XPT files. Different from IEEE 754.
IND (Investigational New Drug)
FDA application to begin clinical trials.
Informat
SAS input format specifying how data is read. Stored in XPT NAMESTR records.
Issue
In xportrs, represents a validation problem (Error, Warning, or Info severity).

L

Label
Descriptive text for a dataset or variable. Limited to 40 bytes in XPT V5.
Latin-1 (ISO-8859-1)
Character encoding supporting Western European characters. Allowed for non-FDA submissions.

M

Member
XPT term for a dataset within a transport file. An XPT file can contain multiple members.
Missing Value
XPT uses special byte patterns for missing data. Standard missing is 0x2E (period). Special missing values .A-.Z and ._ are also supported.

N

NAMESTR
140-byte record in XPT files describing a variable’s metadata (name, label, format, type, length).
NDA (New Drug Application)
FDA submission type for new drugs.
NMPA (National Medical Products Administration)
Regulatory authority for China.

P

Pinnacle 21
Industry-standard validation tool for CDISC compliance. Checks XPT files and define.xml.
PMDA (Pharmaceuticals and Medical Devices Agency)
Regulatory authority for Japan.

S

SAS
Statistical Analysis System. Software that created the XPT format.
SAS Epoch
January 1, 1960. Reference date for SAS date values.
SDTM (Study Data Tabulation Model)
CDISC standard for tabulation data structure. Defines domains like DM, AE, LB, VS.
SEND (Standard for Exchange of Nonclinical Data)
CDISC standard for nonclinical (animal) study data.
Severity
Validation issue classification in xportrs: Error (blocks write), Warning (review recommended), Info (suggestion).

T

TCG (Technical Conformance Guide)
FDA document specifying electronic submission requirements.
TS-140
SAS Technical Note defining the XPT V5 format specification.

U

USUBJID (Unique Subject Identifier)
Standard SDTM variable uniquely identifying a subject across all datasets.

V

ValidatedWrite
In xportrs, a validated dataset ready to be written to a file.
VariableName
In xportrs, the 1-8 character variable identifier (e.g., “USUBJID”).
VariableRole
CDISC classification of variables: Identifier, Topic, Timing, Qualifier, Rule, Synonym, Record.
VariableSpec
Internal xportrs structure containing computed write specification for a variable.

X

XPT
SAS Transport file format. XPT V5 is required for regulatory submissions.
XPT V5
Version 5 of SAS Transport format (also called Version 5/6). Uses 8-byte variable names, IBM floating-point, and 80-byte records.
XPT V8
Newer SAS Transport format with longer names and IEEE floating-point. Not accepted for FDA submissions.

Changelog

All notable changes to xportrs are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Comprehensive mdbook documentation with mermaid diagrams
  • Full regulatory compliance documentation
  • API reference documentation
  • Usage guides and troubleshooting

[0.0.6] - 2026

Added

  • Format type with parsing: New Format struct for SAS format handling

    • Format::parse("DATE9.") - Parse format strings
    • Format::numeric(8, 2) - Create numeric formats
    • Format::character(200) - Create character formats
    • Format::from_namestr() - Reconstruct from XPT fields
  • Column metadata support: Extended Column struct

    • .with_label("...") - Set variable label
    • .with_format(Format) - Set display format
    • .with_format_str("DATE9.") - Parse and set format
    • .with_informat(Format) - Set input format
    • .with_length(n) - Set explicit character length
  • Metadata preservation on read: XPT files now preserve:

    • Variable labels from NAMESTR nlabel
    • Display formats from NAMESTR nform, nfl, nfd, nfj
    • Input formats from NAMESTR niform, nifl, nifd
    • Explicit character lengths
  • Validation warnings: New validation issues

    • MissingVariableLabel - Warning when label is empty
    • MissingDatasetLabel - Warning when dataset label is empty
    • InvalidFormatSyntax - Error for malformed format strings

Changed

  • Dataset::with_label() now takes impl Into<Label> instead of Option<impl Into<Label>>
  • Added Dataset::set_label() for conditional label setting
  • Format metadata now correctly written to NAMESTR records (previously hardcoded to 0)

Fixed

  • Format fields nfl, nfd, nfj, nifl, nifd now contain actual values instead of zeros
  • Metadata roundtrip: labels and formats preserved through read → write cycle

[0.0.5] - 2026

Added

  • CITATION.cff for academic citation
  • codemeta.json for metadata
  • JSON schema support

[0.0.4] - 2026

Added

  • Initial public release
  • XPT V5 reading and writing
  • FDA/PMDA/NMPA agency validation
  • Automatic file splitting at 5GB
  • IBM floating-point encoding/decoding
  • SAS epoch date handling

Features

  • Xpt::read() - Read XPT files
  • Xpt::write() - Write XPT files
  • Xpt::inspect() - Get file metadata
  • Dataset, Column, ColumnData types
  • Agency enum for regulatory validation
  • Issue and Severity for validation results

Migration Guide

From 0.0.5 to 0.0.6

Dataset::with_label signature change

#![allow(unused)]
fn main() {
// Before (0.0.5)
Dataset::with_label("AE", Some("Adverse Events"), columns)

// After (0.0.6)
Dataset::with_label("AE", "Adverse Events", columns)

// For conditional labels
let mut ds = Dataset::new("AE", columns)?;
if let Some(label) = maybe_label {
    ds.set_label(label);
}
}

Adding metadata to columns

#![allow(unused)]
fn main() {
// New in 0.0.6
Column::new("VAR", data)
    .with_label("Variable Label")
    .with_format(Format::character(200))
    .with_length(200)
}

Checking for warnings

#![allow(unused)]
fn main() {
// New warnings in 0.0.6
let validated = Xpt::writer(dataset).finalize()?;

for issue in validated.issues() {
    match issue {
        Issue::MissingVariableLabel { variable } => {
            println!("Warning: {} missing label", variable);
        }
        Issue::MissingDatasetLabel { dataset } => {
            println!("Warning: {} missing label", dataset);
        }
        _ => {}
    }
}
}

Compatibility

| xportrs Version | Rust Version | MSRV |
|-----------------|--------------|------|
| 0.0.6 | 1.70+ | 1.70 |
| 0.0.5 | 1.70+ | 1.70 |
| 0.0.4 | 1.70+ | 1.70 |

License

xportrs is dual-licensed under MIT and Apache 2.0.

For Developers

This section contains information for contributors and developers working on xportrs.

Getting Started

Prerequisites

  • Rust 1.70 or later
  • Git

Clone and Build

git clone https://github.com/rubentalstra/xportrs.git
cd xportrs
cargo build

Run Tests

cargo test --all-features

Run Clippy

cargo clippy -- -D warnings

Project Structure

xportrs/
├── src/
│   ├── lib.rs              # Public API exports
│   ├── dataset/            # Dataset, Column, ColumnData
│   ├── schema/             # Schema derivation
│   ├── validate/           # Validation rules
│   ├── xpt/
│   │   └── v5/             # XPT V5 implementation
│   │       ├── read/       # Reading logic
│   │       └── write/      # Writing logic
│   ├── config/             # Configuration types
│   ├── error/              # Error types
│   └── metadata/           # Metadata types
├── tests/                  # Integration tests
├── docs/                   # mdbook documentation
└── benches/                # Benchmarks (if any)

Adding New Features

Adding a New Validation Rule

  1. Add variant to Issue enum in src/validate/issues.rs
  2. Implement severity() method for the new variant
  3. Implement Display for the new variant
  4. Add check in src/validate/checks_v5.rs
  5. Add tests

Adding a New Column Type

  1. Add variant to ColumnData enum in src/dataset/domain_dataset.rs
  2. Handle in reader (src/xpt/v5/read/reader.rs)
  3. Handle in writer (src/xpt/v5/write/writer.rs)
  4. Add From implementation
  5. Add tests

Supporting a New Agency

  1. Add variant to Agency enum
  2. Add agency-specific validation in src/validate/checks_v5.rs
  3. Document in regulatory section

Code Style

  • Follow Rust API Guidelines
  • Use cargo fmt before committing
  • Ensure cargo clippy -- -D warnings passes
  • Add doc comments for public items
  • Include examples in documentation

Testing

Unit Tests

Located alongside the code in mod tests blocks.

Integration Tests

Located in tests/ directory:

  • tests/v5/read.rs - Reading tests
  • tests/v5/write.rs - Writing tests
  • tests/api_guidelines.rs - API compliance tests

Test Data

Test XPT files are in tests/data/.

Documentation

Building Docs

cd docs
mdbook build

Serving Locally

cd docs
mdbook serve

Then open http://localhost:3000

Adding Pages

  1. Create .md file in appropriate directory
  2. Add entry to SUMMARY.md
  3. Use mermaid for diagrams

Pull Request Guidelines

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Ensure CI passes
  5. Submit PR with clear description

Release Process

  1. Update version in Cargo.toml
  2. Update CHANGELOG.md
  3. Create git tag
  4. CI publishes to crates.io

Contributors

Here is a list of the contributors who have helped to improve xportrs. Big shout-out to them!

  • @rubentalstra

If you feel you’re missing from this list, please open a pull request or issue to get added!