Introduction
xportrs is a Rust library for reading and writing SAS Transport (XPT) files, the standard format for regulatory submissions to the FDA, PMDA, and other health authorities.
Why xportrs?
Clinical trial data submitted to regulatory agencies must be in XPT V5 format. While SAS has traditionally been the tool of choice, modern data pipelines increasingly use Python, R, and Rust. xportrs provides:
- Full CDISC/FDA compliance — Correct NAMESTR structure, IBM floating-point encoding, and metadata handling
- Type safety — Rust’s type system prevents common errors at compile time
- Performance — Zero-copy parsing where possible, efficient memory usage
- Validation — Built-in checks for FDA, PMDA, and NMPA requirements
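The IBM floating-point encoding mentioned above is the System/360 hexadecimal double format: a sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction. As a rough standalone sketch of the idea only (this is not xportrs's internal routine, and edge cases such as NaN, infinity, and exponent overflow are ignored):

```rust
/// Encode an f64 into IBM System/360 double bits: sign, 7-bit base-16
/// exponent (bias 64), 56-bit fraction. Assumes the value fits the IBM
/// exponent range; NaN/infinity are not handled in this sketch.
fn ibm_encode(x: f64) -> u64 {
    if x == 0.0 {
        return 0;
    }
    let sign = if x.is_sign_negative() { 1u64 << 63 } else { 0 };
    let mut m = x.abs();
    let mut e: i64 = 64; // exponent bias
    // Normalize the mantissa into [1/16, 1) by stepping in powers of 16
    // (each step is a power of two, so these loops are exact).
    while m >= 1.0 {
        m /= 16.0;
        e += 1;
    }
    while m < 0.0625 {
        m *= 16.0;
        e -= 1;
    }
    let frac = (m * (1u64 << 56) as f64) as u64; // 56-bit fraction, truncated
    sign | ((e as u64) << 56) | frac
}

/// Decode IBM double bits back to f64: (-1)^s * 0.fraction * 16^(e - 64).
fn ibm_decode(bits: u64) -> f64 {
    if bits & !(1u64 << 63) == 0 {
        return 0.0;
    }
    let sign = if bits >> 63 == 1 { -1.0 } else { 1.0 };
    let e = ((bits >> 56) & 0x7F) as i32;
    let frac = (bits & ((1u64 << 56) - 1)) as f64 / (1u64 << 56) as f64;
    sign * frac * 16f64.powi(e - 64)
}

fn main() {
    // Values that fit exactly survive the round trip bit-for-bit.
    assert_eq!(ibm_decode(ibm_encode(1.0)), 1.0);
    assert_eq!(ibm_decode(ibm_encode(-2.5)), -2.5);
    println!("IBM round-trip ok");
}
```

Because normalization happens in steps of 16 rather than 2, the leading hex digit can carry up to three leading zero bits, which is why effective precision varies slightly between values.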
Quick Example
use xportrs::{Column, ColumnData, Dataset, Format, Xpt};
fn main() -> xportrs::Result<()> {
// Create a dataset with full CDISC metadata
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("STUDYID", ColumnData::String(vec![Some("ABC123".into())]))
.with_label("Study Identifier")
.with_format(Format::character(20)),
Column::new("USUBJID", ColumnData::String(vec![Some("ABC123-001".into())]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)]))
.with_label("Sequence Number")
.with_format(Format::numeric(8, 0)),
])?;
// Write with FDA validation
Xpt::writer(dataset)
.agency(xportrs::Agency::FDA)
.finalize()?
.write_path("ae.xpt")?;
Ok(())
}
Compliance Matrix
| Requirement | Status | Implementation |
|---|---|---|
| Variable names ≤8 bytes, uppercase | ✓ | Validated |
| Variable labels ≤40 bytes | ✓ | Validated |
| Dataset names ≤8 bytes | ✓ | Validated |
| Character length 1–200 bytes | ✓ | Validated |
| Numeric = 8 bytes IBM float | ✓ | Enforced |
| ASCII-only for FDA | ✓ | Agency rules |
| File splitting at 5GB | ✓ | Automatic |
| SAS epoch (1960) dates | ✓ | Handled |
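The last row deserves a note: SAS counts dates as days since 1960-01-01 (and datetimes as seconds since 1960-01-01T00:00:00), while Unix time counts from 1970-01-01. xportrs handles the conversion internally; the fixed offset itself, 3,653 days (1960 through 1969 includes three leap years), is easy to verify with plain Rust:

```rust
// The SAS and Unix epochs are a fixed 3653 days apart.
const DAYS_1960_TO_1970: i64 = 3653;

// Days since 1970-01-01 -> SAS date (days since 1960-01-01).
fn unix_days_to_sas_date(unix_days: i64) -> i64 {
    unix_days + DAYS_1960_TO_1970
}

// Seconds since the Unix epoch -> SAS datetime (seconds since 1960).
fn unix_seconds_to_sas_datetime(unix_secs: i64) -> i64 {
    unix_secs + DAYS_1960_TO_1970 * 86_400
}

fn main() {
    // The Unix epoch itself is SAS day 3653.
    assert_eq!(unix_days_to_sas_date(0), 3653);
    assert_eq!(unix_seconds_to_sas_datetime(0), 315_619_200);
    println!("epoch offsets check out");
}
```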
Installation
Add to your Cargo.toml:
[dependencies]
xportrs = "0.0.8"
Next Steps
- Quick Start Guide — Get up and running in 5 minutes
- FDA Submission Workflow — Complete walkthrough for regulatory submissions
- API Reference — Detailed API documentation
- XPT Format Specification — Understanding the file format
Quick Start
Get up and running with xportrs in 5 minutes.
Installation
Add xportrs to your Cargo.toml:
[dependencies]
xportrs = "0.0.8"
Reading an XPT File
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read an XPT file
let dataset = Xpt::read("ae.xpt")?;
// Basic info
println!("Domain: {}", dataset.domain_code());
println!("Rows: {}", dataset.nrows());
println!("Columns: {}", dataset.ncols());
// List columns
for col in dataset.columns() {
println!(" - {}", col.name());
}
Ok(())
}
Creating a Dataset
use xportrs::{Column, ColumnData, Dataset};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
Column::new("USUBJID", ColumnData::String(vec![
Some("001".into()),
Some("002".into()),
Some("003".into()),
])),
Column::new("AESEQ", ColumnData::F64(vec![
Some(1.0),
Some(1.0),
Some(2.0),
])),
Column::new("AETERM", ColumnData::String(vec![
Some("HEADACHE".into()),
Some("NAUSEA".into()),
Some("FATIGUE".into()),
])),
])?;
println!("Created {} with {} rows", dataset.domain_code(), dataset.nrows());
Ok(())
}
Writing an XPT File
use xportrs::{Column, ColumnData, Dataset, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;
// Write to file
Xpt::writer(dataset)
.finalize()?
.write_path("ae_output.xpt")?;
println!("Wrote ae_output.xpt");
Ok(())
}
Adding Metadata
For regulatory submissions, include metadata:
use xportrs::{Column, ColumnData, Dataset, Format, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)]))
.with_label("Sequence Number")
.with_format(Format::numeric(8, 0)),
Column::new("AETERM", ColumnData::String(vec![Some("HEADACHE".into())]))
.with_label("Reported Term for the Adverse Event")
.with_format(Format::character(200))
.with_length(200),
])?;
Xpt::writer(dataset)
.finalize()?
.write_path("ae_metadata.xpt")?;
Ok(())
}
FDA Validation
Validate for FDA submission:
use xportrs::{Agency, Column, ColumnData, Dataset, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
])?;
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Check for issues
if validated.has_errors() {
eprintln!("Validation errors:");
for issue in validated.issues() {
eprintln!(" {}", issue);
}
return Ok(());
}
if validated.has_warnings() {
println!("Warnings (proceeding anyway):");
for issue in validated.issues() {
println!(" {}", issue);
}
}
validated.write_path("ae.xpt")?;
Ok(())
}
Round-Trip (Read → Modify → Write)
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read existing file
let dataset = Xpt::read("ae.xpt")?;
// Modify (example: add column)
// dataset.extend([new_column]);
// Write back
Xpt::writer(dataset)
.finalize()?
.write_path("ae_modified.xpt")?;
Ok(())
}
Common Patterns
Using From Conversions
use xportrs::{Column, ColumnData, Dataset};
fn main() -> xportrs::Result<()> {
// Simpler syntax with From implementations
let dataset = Dataset::new("LB", vec![
Column::new("LBSEQ", vec![1.0, 2.0, 3.0].into()), // Vec<f64> → ColumnData
Column::new("LBTEST", vec!["HGB", "WBC", "PLT"].into()), // Vec<&str> → ColumnData
])?;
Ok(())
}
Accessing Column Data
use xportrs::{ColumnData, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;
// By name
let col = &dataset["USUBJID"];
// Match on data type
match col.data() {
ColumnData::String(values) => {
for (i, val) in values.iter().enumerate() {
match val {
Some(s) => println!("Row {}: {}", i, s),
None => println!("Row {}: <missing>", i),
}
}
}
ColumnData::F64(values) => {
for (i, val) in values.iter().enumerate() {
match val {
Some(n) => println!("Row {}: {}", i, n),
None => println!("Row {}: <missing>", i),
}
}
}
_ => {}
}
Ok(())
}
Handling Errors
use xportrs::{Error, Xpt};
fn main() {
match Xpt::read("missing.xpt") {
Ok(_dataset) => println!("Loaded"),
Err(Error::Io(e)) => eprintln!("File error: {}", e),
Err(e) => eprintln!("Error: {}", e),
}
}
Next Steps
- FDA Submission Workflow — Complete FDA submission guide
- API Reference — Full API documentation
- XPT Format — Understanding the file format
- Validation — Validation rules and handling
FDA Submission Workflow
This guide walks through creating FDA-compliant XPT files for regulatory submissions.
Prerequisites
- Understanding of CDISC SDTM/ADaM standards
- Access to define.xml for your study
- Clinical trial data in a structured format
Step 1: Design Your Dataset
Plan your dataset structure based on SDTM/ADaM:
// Example: Adverse Events (AE) domain
// Required SDTM variables: STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, ...
use xportrs::{Column, ColumnData, Dataset, Format, VariableRole};
Step 2: Create the Dataset with Full Metadata
use xportrs::{Column, ColumnData, Dataset, Format, VariableRole};
struct YourDataSource {
studyid: Vec<Option<String>>,
usubjid: Vec<Option<String>>,
aeseq: Vec<Option<f64>>,
aeterm: Vec<Option<String>>,
aedecod: Vec<Option<String>>,
aesev: Vec<Option<String>>,
aestdtc: Vec<Option<String>>,
aeendtc: Vec<Option<String>>,
}
impl YourDataSource { fn len(&self) -> usize { self.studyid.len() } }
fn create_ae_dataset(data: &YourDataSource) -> xportrs::Result<Dataset> {
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
// Identifier variables
Column::with_role(
"STUDYID",
VariableRole::Identifier,
ColumnData::String(data.studyid.clone()),
)
.with_label("Study Identifier")
.with_format(Format::character(20)),
Column::new("DOMAIN", ColumnData::String(
vec![Some("AE".into()); data.len()]
))
.with_label("Domain Abbreviation")
.with_format(Format::character(2))
.with_length(2),
Column::with_role(
"USUBJID",
VariableRole::Identifier,
ColumnData::String(data.usubjid.clone()),
)
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::with_role(
"AESEQ",
VariableRole::Topic,
ColumnData::F64(data.aeseq.clone()),
)
.with_label("Sequence Number")
.with_format(Format::numeric(8, 0)),
// Qualifier variables
Column::with_role(
"AETERM",
VariableRole::Qualifier,
ColumnData::String(data.aeterm.clone()),
)
.with_label("Reported Term for the Adverse Event")
.with_format(Format::character(200))
.with_length(200),
Column::new("AEDECOD", ColumnData::String(data.aedecod.clone()))
.with_label("Dictionary-Derived Term")
.with_format(Format::character(200))
.with_length(200),
Column::new("AESEV", ColumnData::String(data.aesev.clone()))
.with_label("Severity/Intensity")
.with_format(Format::character(10))
.with_length(10),
// Timing variables
Column::with_role(
"AESTDTC",
VariableRole::Timing,
ColumnData::String(data.aestdtc.clone()),
)
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19),
Column::new("AEENDTC", ColumnData::String(data.aeendtc.clone()))
.with_label("End Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19),
])?;
Ok(dataset)
}
Step 3: Validate for FDA Compliance
use xportrs::{Agency, Dataset, Severity, Xpt};
fn validate_for_fda(dataset: Dataset) -> xportrs::Result<xportrs::ValidatedWrite> {
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Report all issues
println!("Validation Results:");
println!(" Errors: {}", validated.issues().iter()
.filter(|i| i.severity() == Severity::Error).count());
println!(" Warnings: {}", validated.issues().iter()
.filter(|i| i.severity() == Severity::Warning).count());
// Detail issues
for issue in validated.issues() {
let prefix = match issue.severity() {
Severity::Error => "ERROR",
Severity::Warning => "WARN",
Severity::Info => "INFO",
};
println!(" [{}] {}: {}", prefix, issue.target(), issue);
}
// Fail on errors
if validated.has_errors() {
return Err(xportrs::Error::invalid_data(
"FDA validation failed with errors"
));
}
Ok(validated)
}
Step 4: Write the XPT File
use std::path::Path;
fn write_submission_file(
validated: xportrs::ValidatedWrite,
output_dir: &Path,
) -> xportrs::Result<()> {
let output_path = output_dir.join("ae.xpt");
// Write (may split if >5GB)
let paths = validated.write_path(&output_path)?;
for path in &paths {
println!("Wrote: {}", path.display());
// Verify file size
let size = std::fs::metadata(path)?.len();
println!(" Size: {} bytes ({:.2} GB)",
size, size as f64 / 1_073_741_824.0);
}
Ok(())
}
Step 5: Verify the Output
use xportrs::Xpt;
fn verify_output(path: &str) -> xportrs::Result<()> {
// Read back
let dataset = Xpt::read(path)?;
// Verify structure
println!("\nVerification:");
println!(" Domain: {}", dataset.domain_code());
println!(" Label: {:?}", dataset.dataset_label());
println!(" Rows: {}", dataset.nrows());
println!(" Columns: {}", dataset.ncols());
// Check metadata preserved
for col in dataset.columns() {
print!(" {} ", col.name());
if col.label().is_some() { print!("[label] "); }
if col.format().is_some() { print!("[format] "); }
if col.explicit_length().is_some() { print!("[length] "); }
println!();
}
Ok(())
}
Complete Example
use xportrs::{Agency, Column, ColumnData, Dataset, Format, Xpt};
use std::path::PathBuf;
fn main() -> xportrs::Result<()> {
fn create_submission() -> xportrs::Result<()> {
// 1. Create dataset
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("STUDYID", ColumnData::String(vec![
Some("ABC-123".into()),
Some("ABC-123".into()),
]))
.with_label("Study Identifier")
.with_format(Format::character(20)),
Column::new("DOMAIN", ColumnData::String(vec![
Some("AE".into()),
Some("AE".into()),
]))
.with_label("Domain Abbreviation")
.with_format(Format::character(2)),
Column::new("USUBJID", ColumnData::String(vec![
Some("ABC-123-001".into()),
Some("ABC-123-002".into()),
]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::new("AESEQ", ColumnData::F64(vec![
Some(1.0),
Some(1.0),
]))
.with_label("Sequence Number"),
Column::new("AETERM", ColumnData::String(vec![
Some("HEADACHE".into()),
Some("NAUSEA".into()),
]))
.with_label("Reported Term for the Adverse Event")
.with_format(Format::character(200))
.with_length(200),
Column::new("AESTDTC", ColumnData::String(vec![
Some("2024-01-15".into()),
Some("2024-01-16".into()),
]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19)),
])?;
// 2. Validate for FDA
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// 3. Report issues
if !validated.issues().is_empty() {
println!("Validation Issues:");
for issue in validated.issues() {
println!(" [{}] {}", issue.severity(), issue);
}
}
// 4. Check for blocking errors
if validated.has_errors() {
eprintln!("Cannot proceed due to validation errors");
return Err(xportrs::Error::invalid_data("Validation failed"));
}
// 5. Write file
let output = PathBuf::from("output/ae.xpt");
std::fs::create_dir_all(output.parent().unwrap())?;
validated.write_path(&output)?;
// 6. Verify
let loaded = Xpt::read(&output)?;
assert_eq!(loaded.domain_code(), "AE");
assert_eq!(loaded.nrows(), 2);
println!("\nSuccessfully created ae.xpt for FDA submission");
Ok(())
}
create_submission()
}
Checklist
Before submission, verify:
- Dataset name ≤8 bytes, uppercase
- Variable names ≤8 bytes, uppercase
- Variable labels ≤40 bytes, ASCII only
- Character variables ≤200 bytes
- All variables have labels
- Dataset has a label
- File size ≤5GB (or properly split)
- Pinnacle 21 validation passed
- Labels match define.xml
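Several of the checklist items are simple byte-level rules that can be checked before data ever reaches xportrs. A hypothetical pre-flight helper (the function names here are illustrative, not part of the xportrs API, which performs these checks itself during validation):

```rust
// Illustrative pre-flight checks mirroring the first few checklist items.
// xportrs's own validator enforces these; this sketch just shows the rules.
fn check_xpt_name(name: &str) -> Result<(), String> {
    if name.is_empty() || name.len() > 8 {
        return Err(format!("'{name}': name must be 1-8 bytes"));
    }
    if !name.chars().next().unwrap().is_ascii_alphabetic() {
        return Err(format!("'{name}': must start with a letter"));
    }
    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("'{name}': only letters, digits, underscore"));
    }
    if name != name.to_ascii_uppercase() {
        return Err(format!("'{name}': must be uppercase"));
    }
    Ok(())
}

fn check_label(label: &str) -> Result<(), String> {
    if label.len() > 40 {
        return Err(format!("label exceeds 40 bytes: '{label}'"));
    }
    if !label.is_ascii() {
        return Err(format!("label contains non-ASCII characters: '{label}'"));
    }
    Ok(())
}

fn main() {
    assert!(check_xpt_name("USUBJID").is_ok());
    assert!(check_xpt_name("MYLONGVARNAME").is_err()); // 13 bytes
    assert!(check_xpt_name("my-var").is_err()); // lowercase, hyphen
    assert!(check_label("Unique Subject Identifier").is_ok());
    println!("pre-flight checks ok");
}
```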
Common Issues
Missing Labels
[WARN] MYVAR: Variable 'MYVAR' is missing a label
Fix: Add .with_label("...") to all columns.
Non-ASCII Characters
[ERROR] AETERM: Variable label contains non-ASCII characters
Fix: Replace accented characters (é→e, ñ→n) and special symbols.
Variable Name Too Long
[ERROR] MYLONGNAME: Variable name exceeds 8 bytes
Fix: Shorten variable names to ≤8 characters.
Next Steps
- Run Pinnacle 21 validation on generated files
- Verify define.xml consistency
- Package with eCTD structure
Read-Modify-Write Workflows
This guide covers common patterns for reading, modifying, and writing XPT files.
Basic Roundtrip
use xportrs::Xpt;
fn basic_roundtrip(input: &str, output: &str) -> xportrs::Result<()> {
// Read
let dataset = Xpt::read(input)?;
// (Modify here if needed)
// Write
Xpt::writer(dataset)
.finalize()?
.write_path(output)?;
Ok(())
}
Preserving Metadata
xportrs automatically preserves metadata during roundtrip:
use xportrs::Xpt;
fn verify_metadata_preservation(path: &str) -> xportrs::Result<()> {
// Read original
let original = Xpt::read(path)?;
// Write to temp
let temp_path = "/tmp/roundtrip.xpt";
Xpt::writer(original.clone())
.finalize()?
.write_path(temp_path)?;
// Read back
let reloaded = Xpt::read(temp_path)?;
// Verify metadata preserved
assert_eq!(original.domain_code(), reloaded.domain_code());
assert_eq!(original.dataset_label(), reloaded.dataset_label());
for (orig_col, new_col) in original.columns().iter()
.zip(reloaded.columns().iter())
{
assert_eq!(orig_col.name(), new_col.name());
assert_eq!(
orig_col.label().map(|l| l.to_string()),
new_col.label().map(|l| l.to_string())
);
// Format, length, etc. also preserved
}
Ok(())
}
Adding Columns
use xportrs::{Column, ColumnData, Format, Xpt};
fn add_derived_column(input: &str, output: &str) -> xportrs::Result<()> {
let mut dataset = Xpt::read(input)?;
// Get row count
let nrows = dataset.nrows();
// Create new column
let new_column = Column::new(
"DERIVED",
ColumnData::F64(vec![Some(1.0); nrows]),
)
.with_label("Derived Variable")
.with_format(Format::numeric(8, 0));
// Add to dataset
dataset.extend([new_column]);
// Write
Xpt::writer(dataset)
.finalize()?
.write_path(output)?;
Ok(())
}
Modifying Column Data
use xportrs::{Column, ColumnData, Xpt};
fn modify_column_data(input: &str, output: &str) -> xportrs::Result<()> {
let dataset = Xpt::read(input)?;
// Create modified columns
let modified_columns: Vec<Column> = dataset.columns().iter()
.map(|col| {
if col.name() == "AESEQ" {
// Modify AESEQ: multiply by 10
if let ColumnData::F64(values) = col.data() {
let new_values: Vec<Option<f64>> = values.iter()
.map(|v| v.map(|x| x * 10.0))
.collect();
let mut new_col = Column::new(col.name(), ColumnData::F64(new_values));
// Preserve metadata
if let Some(label) = col.label() {
new_col = new_col.with_label(label.to_string());
}
if let Some(format) = col.format() {
new_col = new_col.with_format(format.clone());
}
return new_col;
}
}
col.clone()
})
.collect();
// Create new dataset with modified columns
let mut new_dataset = xportrs::Dataset::new(
dataset.domain_code(),
modified_columns,
)?;
if let Some(label) = dataset.dataset_label() {
new_dataset.set_label(label);
}
Xpt::writer(new_dataset)
.finalize()?
.write_path(output)?;
Ok(())
}
Filtering Rows
use xportrs::{Column, ColumnData, Dataset, Xpt};
fn filter_rows(input: &str, output: &str, keep_indices: &[usize]) -> xportrs::Result<()> {
let dataset = Xpt::read(input)?;
// Filter each column
let filtered_columns: Vec<Column> = dataset.columns().iter()
.map(|col| {
let filtered_data = match col.data() {
ColumnData::F64(values) => {
let filtered: Vec<_> = keep_indices.iter()
.map(|&i| values[i].clone())
.collect();
ColumnData::F64(filtered)
}
ColumnData::String(values) => {
let filtered: Vec<_> = keep_indices.iter()
.map(|&i| values[i].clone())
.collect();
ColumnData::String(filtered)
}
// Handle other types...
_ => col.data().clone(),
};
let mut new_col = Column::new(col.name(), filtered_data);
if let Some(label) = col.label() {
new_col = new_col.with_label(label.to_string());
}
if let Some(format) = col.format() {
new_col = new_col.with_format(format.clone());
}
new_col
})
.collect();
let mut filtered_dataset = Dataset::new(
dataset.domain_code(),
filtered_columns,
)?;
if let Some(label) = dataset.dataset_label() {
filtered_dataset.set_label(label);
}
Xpt::writer(filtered_dataset)
.finalize()?
.write_path(output)?;
Ok(())
}
Merging Datasets
use xportrs::{Column, ColumnData, Dataset, Xpt};
fn merge_datasets(input1: &str, input2: &str, output: &str) -> xportrs::Result<()> {
let ds1 = Xpt::read(input1)?;
let ds2 = Xpt::read(input2)?;
// Verify same structure
assert_eq!(ds1.ncols(), ds2.ncols(), "Column count mismatch");
// Concatenate data
let merged_columns: Vec<Column> = ds1.columns().iter()
.zip(ds2.columns().iter())
.map(|(col1, col2)| {
let merged_data = match (col1.data(), col2.data()) {
(ColumnData::F64(v1), ColumnData::F64(v2)) => {
let mut merged = v1.clone();
merged.extend(v2.clone());
ColumnData::F64(merged)
}
(ColumnData::String(v1), ColumnData::String(v2)) => {
let mut merged = v1.clone();
merged.extend(v2.clone());
ColumnData::String(merged)
}
_ => panic!("Type mismatch"),
};
let mut col = Column::new(col1.name(), merged_data);
if let Some(label) = col1.label() {
col = col.with_label(label.to_string());
}
if let Some(format) = col1.format() {
col = col.with_format(format.clone());
}
col
})
.collect();
let mut merged = Dataset::new(ds1.domain_code(), merged_columns)?;
if let Some(label) = ds1.dataset_label() {
merged.set_label(label);
}
Xpt::writer(merged)
.finalize()?
.write_path(output)?;
Ok(())
}
Updating Labels
use xportrs::{Column, Dataset, Xpt};
use std::collections::HashMap;
fn update_labels(
input: &str,
output: &str,
label_updates: &HashMap<&str, &str>,
) -> xportrs::Result<()> {
let dataset = Xpt::read(input)?;
let updated_columns: Vec<Column> = dataset.columns().iter()
.map(|col| {
let mut new_col = Column::new(col.name(), col.data().clone());
// Apply label update if specified
if let Some(&new_label) = label_updates.get(col.name()) {
new_col = new_col.with_label(new_label);
} else if let Some(label) = col.label() {
new_col = new_col.with_label(label.to_string());
}
if let Some(format) = col.format() {
new_col = new_col.with_format(format.clone());
}
new_col
})
.collect();
let mut updated = Dataset::new(dataset.domain_code(), updated_columns)?;
if let Some(label) = dataset.dataset_label() {
updated.set_label(label);
}
Xpt::writer(updated)
.finalize()?
.write_path(output)?;
Ok(())
}
// Usage
fn main() -> xportrs::Result<()> {
let mut updates = HashMap::new();
updates.insert("USUBJID", "Unique Subject Identifier");
updates.insert("AETERM", "Reported Adverse Event Term");
update_labels("ae.xpt", "ae_updated.xpt", &updates)
}
Batch Processing
use xportrs::Xpt;
use std::path::Path;
fn process_directory(input_dir: &Path, output_dir: &Path) -> xportrs::Result<()> {
std::fs::create_dir_all(output_dir)?;
for entry in std::fs::read_dir(input_dir)? {
let entry = entry?;
let path = entry.path();
if path.extension().map_or(false, |e| e == "xpt") {
let filename = path.file_name().unwrap();
let output_path = output_dir.join(filename);
println!("Processing: {}", path.display());
let dataset = Xpt::read(&path)?;
// Process...
Xpt::writer(dataset)
.finalize()?
.write_path(&output_path)?;
println!(" Wrote: {}", output_path.display());
}
}
Ok(())
}
Error Handling in Roundtrips
use xportrs::{Error, Xpt};
fn safe_roundtrip(input: &str, output: &str) -> Result<(), Box<dyn std::error::Error>> {
// Read with error handling
let dataset = match Xpt::read(input) {
Ok(ds) => ds,
Err(Error::Io(e)) => {
eprintln!("Failed to read {}: {}", input, e);
return Err(e.into());
}
Err(e) => return Err(e.into()),
};
// Validate
let validated = Xpt::writer(dataset).finalize()?;
if validated.has_errors() {
for issue in validated.issues() {
eprintln!("Validation error: {}", issue);
}
return Err("Validation failed".into());
}
// Write
validated.write_path(output)?;
// Verify
let _ = Xpt::read(output)?;
Ok(())
}
Troubleshooting
This guide covers common issues and their solutions when working with xportrs.
Validation Errors
Variable Name Too Long
[ERROR] MYLONGVARNAME: Variable name exceeds 8 bytes
Cause: XPT V5 limits variable names to 8 bytes.
Solution: Shorten the variable name to ≤8 characters.
use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("MYLONGVARNAME", data.clone());
// Correct
Column::new("MYVAR", data);
Variable Label Too Long
[ERROR] USUBJID: Variable label exceeds 40 bytes
Cause: XPT V5 limits labels to 40 bytes.
Solution: Shorten the label.
use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong (41 characters)
Column::new("VAR", data.clone())
.with_label("This is a very long label that exceeds 40");
// Correct (40 characters max)
Column::new("VAR", data)
.with_label("Unique Subject Identifier");
Non-ASCII Characters (FDA)
[ERROR] AETERM: Variable label contains non-ASCII characters
Cause: FDA requires ASCII-only text.
Solution: Replace non-ASCII characters.
use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("VAR", data.clone())
.with_label("Événement indésirable");
// Correct
Column::new("VAR", data)
.with_label("Adverse Event");
// Or use a helper function
fn to_ascii(s: &str) -> String {
s.chars().map(|c| match c {
'é' | 'è' | 'ê' | 'ë' => 'e',
'à' | 'â' | 'ä' => 'a',
// ... more mappings
c if c.is_ascii() => c,
_ => '?',
}).collect()
}
Column Length Mismatch
Error: Column length mismatch: expected 100, got 99
Cause: Columns have different numbers of rows.
Solution: Ensure all columns have the same length.
use xportrs::{Column, ColumnData, Dataset};
// Wrong
Dataset::new("AE", vec![
Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])), // 2 rows
Column::new("B", ColumnData::F64(vec![Some(1.0)])), // 1 row!
]);
// Correct - same length
Dataset::new("AE", vec![
Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
Column::new("B", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
]);
Warnings
Missing Variable Label
[WARN] MYVAR: Variable 'MYVAR' is missing a label
Cause: Variable has no label defined.
Solution: Add a label.
use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
Column::new("MYVAR", data)
.with_label("My Variable Description");
Missing Dataset Label
[WARN] AE: Dataset is missing a label
Cause: Dataset has no label defined.
Solution: Use with_label or set_label.
use xportrs::{Column, ColumnData, Dataset};
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// At construction
Dataset::with_label("AE", "Adverse Events", columns.clone());
// Or after
let mut ds = Dataset::new("AE", columns)?;
ds.set_label("Adverse Events");
Reading Errors
File Not Found
Error: No such file or directory (os error 2)
Solution: Verify the file path exists.
use std::path::Path;
let path = "data.xpt";
if !Path::new(path).exists() {
eprintln!("File not found: {}", path);
}
Invalid XPT Format
Error: Invalid header record
Cause: File is not a valid XPT V5 file.
Solution: Verify the file:
- Check it’s an XPT file (not XPT V8, SAS7BDAT, etc.)
- Ensure it’s not corrupted
- Verify with a hex dump that it starts with HEADER RECORD
# Check file header
xxd -l 80 suspect.xpt
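The same sniff test can be done from Rust. This standalone helper (not an xportrs API) only checks the literal prefix of the first 80-byte record, so treat it as a quick sanity check rather than validation:

```rust
use std::io::Read;

// Every XPT V5 file begins with an 80-byte record starting "HEADER RECORD".
// Returns true if the file's first bytes match that prefix.
fn looks_like_xpt(path: &str) -> std::io::Result<bool> {
    let mut buf = [0u8; 80];
    let mut file = std::fs::File::open(path)?;
    let n = file.read(&mut buf)?;
    Ok(n >= 13 && buf.starts_with(b"HEADER RECORD"))
}

fn main() -> std::io::Result<()> {
    // Demo with a fabricated file containing a valid-looking first record.
    std::fs::write(
        "probe.xpt",
        b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!",
    )?;
    assert!(looks_like_xpt("probe.xpt")?);
    std::fs::remove_file("probe.xpt")?;
    println!("header sniff ok");
    Ok(())
}
```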
Member Not Found
Error: MemberNotFound { domain_code: "XX" }
Cause: Requested member doesn’t exist in the file.
Solution: Check available members.
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let info = Xpt::inspect("multi.xpt")?;
for name in info.member_names() {
println!("Available: {}", name);
}
Ok(())
}
Writing Errors
Write Permission Denied
Error: Permission denied (os error 13)
Solution: Check file/directory permissions.
use std::fs;
let dir = "/output";
fs::create_dir_all(dir)?; // Create if missing
// Check write permission
let test_file = format!("{}/test.tmp", dir);
match fs::write(&test_file, "test") {
Ok(_) => { fs::remove_file(&test_file)?; }
Err(e) => eprintln!("Cannot write to {}: {}", dir, e),
}
Disk Full
Error: No space left on device (os error 28)
Solution: Free disk space or write to a different location.
Data Issues
Precision Loss
// Original: 3.141592653589793
// After roundtrip: 3.141592653589792
Cause: IBM floating-point has slightly less precision than IEEE 754.
Solution: For critical values, store as strings or accept minor precision loss (~14-16 digits).
use xportrs::{Column, ColumnData};
// Store as string for exact preservation
Column::new("EXACTVAL", ColumnData::String(vec![
Some("3.141592653589793".into()),
]));
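When exact string storage is not an option, the "accept minor precision loss" route usually means comparing round-tripped values with a relative tolerance instead of `==`. A small illustrative helper (not an xportrs API):

```rust
// Compare two floats within a relative tolerance, so a last-digit
// difference after an XPT round trip does not count as a mismatch.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    let scale = a.abs().max(b.abs()).max(1.0);
    (a - b).abs() <= rel_tol * scale
}

fn main() {
    let original = 3.141592653589793_f64;
    let roundtripped = 3.141592653589792_f64;
    assert!(original != roundtripped); // exact comparison fails
    assert!(approx_eq(original, roundtripped, 1e-14)); // tolerant comparison passes
    println!("tolerant comparison ok");
}
```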
Missing Values Handling
use xportrs::ColumnData;
let col_data = ColumnData::F64(vec![Some(1.0), None]);
// Check for missing values
if let ColumnData::F64(values) = &col_data {
for (i, val) in values.iter().enumerate() {
if val.is_none() {
println!("Row {} is missing", i);
}
}
}
Format Issues
Invalid Format String
Error: Invalid format syntax: "DATE"
Cause: Format string missing trailing period.
Solution: SAS formats end with a period.
use xportrs::Format;
// Wrong
Format::parse("DATE9");
// Correct
Format::parse("DATE9.");
Format Not Preserved
Cause: Format might not be written if name is empty.
Solution: Use named formats.
use xportrs::Format;
// May not be preserved (bare numeric format)
Format::parse("8.2");
// Will be preserved (named format)
Format::parse("BEST12.");
Format::parse("DATE9.");
Format::character(200);
Performance Issues
Slow Reading Large Files
Solution: Use row limiting for previews.
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Preview first 100 rows
let preview = Xpt::reader("large.xpt")
.row_limit(100)
.read()?;
Ok(())
}
Memory Usage
Solution: Process in chunks for very large datasets.
use xportrs::{Dataset, Xpt};
fn process(_ds: &Dataset) {}
fn main() -> xportrs::Result<()> {
// Read, process, and release
{
let dataset = Xpt::read("chunk1.xpt")?;
process(&dataset);
} // dataset dropped, memory freed
{
let dataset = Xpt::read("chunk2.xpt")?;
process(&dataset);
}
Ok(())
}
Pinnacle 21 Validation Failures
SD0063: Label Mismatch
Cause: XPT label doesn’t match define.xml.
Solution: Ensure labels are consistent.
use xportrs::{Column, ColumnData};
let data = ColumnData::String(vec![Some("001".into())]);
// Label should match define.xml exactly
Column::new("USUBJID", data)
.with_label("Unique Subject Identifier"); // As in define.xml
SD1001: Variable Name Invalid
Cause: Variable name doesn’t follow SAS naming rules.
Solution: Use uppercase, alphanumeric, start with letter.
use xportrs::{Column, ColumnData};
let data = ColumnData::F64(vec![Some(1.0)]);
// Wrong
Column::new("1stVar", data.clone()); // Starts with number
Column::new("my-var", data.clone()); // Contains hyphen
// Correct
Column::new("FIRSTVAR", data.clone());
Column::new("MYVAR", data);
Getting Help
If you encounter issues not covered here:
- Check the API documentation
- Review the XPT format specification
- Open an issue on GitHub
When reporting issues, include:
- xportrs version
- Rust version
- Minimal code to reproduce
- Error messages
- Sample data (if not confidential)
Dataset and Column
The Dataset and Column types are the core data structures in xportrs for representing XPT datasets.
Dataset
A Dataset represents a single SAS dataset (domain) with columns of data.
Creating a Dataset
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
// Basic creation
let dataset = Dataset::new("AE", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;
// With dataset label
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;
Ok(())
}
Dataset Properties
// Domain code (dataset name)
let code: &str = dataset.domain_code();
// Dataset label (optional)
let label: Option<&str> = dataset.dataset_label();
// Dimensions
let rows: usize = dataset.nrows();
let cols: usize = dataset.ncols();
// Access columns
let columns: &[Column] = dataset.columns();
Setting the Label
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// Using with_label at construction
let dataset = Dataset::with_label("AE", "Adverse Events", columns.clone())?;
// Or set later
let mut dataset = Dataset::new("AE", columns)?;
dataset.set_label("Adverse Events");
Ok(())
}
Accessing Columns
// By index
let first_col: &Column = &dataset[0];
// By name
let usubjid: &Column = &dataset["USUBJID"];
// Find column (returns Option)
let col: Option<&Column> = dataset.column("AESEQ");
Iterating
// Iterate over columns
for col in dataset.iter() {
println!("{}: {}", col.name(), col.len());
}
// Column names only
for name in dataset.column_names() {
println!("{}", name);
}
// Consuming iterator
for col in dataset {
// col is owned Column
}
Extending a Dataset
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let mut dataset = Dataset::new("AE", vec![
Column::new("A", ColumnData::F64(vec![Some(1.0)])),
])?;
// Add more columns
dataset.extend([
Column::new("B", ColumnData::F64(vec![Some(2.0)])),
Column::new("C", ColumnData::F64(vec![Some(3.0)])),
]);
assert_eq!(dataset.ncols(), 3);
Ok(())
}
Column
A Column represents a single variable with its data and metadata.
Creating a Column
use xportrs::{Column, ColumnData, Format, VariableRole};
fn main() {
// Basic column
let col = Column::new("USUBJID", ColumnData::String(vec![
Some("001".into()),
Some("002".into()),
]));
// With full metadata
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19);
// With role
let col = Column::with_role(
"USUBJID",
VariableRole::Identifier,
ColumnData::String(vec![Some("001".into())]),
);
}
Column Properties
// Name
let name: &str = col.name();
// Label (optional)
let label: Option<&xportrs::Label> = col.label();
// Data
let data: &ColumnData = col.data();
// Length
let len: usize = col.len();
// Explicit length override
let explicit_len: Option<usize> = col.explicit_length();
// Role
let role: Option<VariableRole> = col.role();
// Format
let format: Option<&Format> = col.format();
// Informat
let informat: Option<&Format> = col.informat();
Builder Methods
use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("VAR", data)
.with_label("Variable Label")
.with_format(Format::numeric(8, 2))
.with_informat(Format::numeric(8, 2))
.with_length(200);
// Parse format from string
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("DATE", data)
.with_format_str("DATE9.")?;
Ok(())
}
ColumnData
ColumnData is an enum representing the typed data within a column.
Variants
use xportrs::ColumnData;
fn main() {
// Floating-point numbers
let floats = ColumnData::F64(vec![Some(1.0), Some(2.0), None]);
// Integers (converted to f64 on write)
let ints = ColumnData::I64(vec![Some(1), Some(2), None]);
// Booleans (converted to f64: 1.0/0.0)
let bools = ColumnData::Bool(vec![Some(true), Some(false), None]);
// Strings
let strings = ColumnData::String(vec![Some("hello".into()), None]);
// Binary data
let bytes = ColumnData::Bytes(vec![Some(vec![0x01, 0x02]), None]);
}
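The comments above describe how non-f64 variants are converted when written. As a standalone sketch of those conventions (the real conversion happens inside the xportrs writer; these helper names are illustrative only), the mapping looks like:

```rust
// Standalone sketch of the documented write-time conversions;
// the real logic is internal to xportrs. Missing values stay missing.
fn i64_to_f64(v: Option<i64>) -> Option<f64> {
    v.map(|i| i as f64)
}

fn bool_to_f64(v: Option<bool>) -> Option<f64> {
    v.map(|b| if b { 1.0 } else { 0.0 })
}
```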
From Conversions
use xportrs::ColumnData;
fn main() {
// From Vec<f64>
let data: ColumnData = vec![1.0, 2.0, 3.0].into();
// From Vec<&str>
let data: ColumnData = vec!["a", "b", "c"].into();
// From Vec<String>
let data: ColumnData = vec!["a".to_string(), "b".to_string()].into();
// From Vec<i64>
let data: ColumnData = vec![1i64, 2, 3].into();
// From Vec<bool>
let data: ColumnData = vec![true, false, true].into();
}
Accessing Data
match col.data() {
ColumnData::F64(values) => {
for value in values {
match value {
Some(v) => println!("Value: {}", v),
None => println!("Missing"),
}
}
}
ColumnData::String(values) => {
for value in values {
if let Some(s) = value {
println!("Value: {}", s);
}
}
}
// ... handle other variants
_ => {}
}
Common Traits
Both Dataset and Column implement standard Rust traits:
use xportrs::{Dataset, Column};
// Clone
let dataset2 = dataset.clone();
let col2 = col.clone();
// Debug
println!("{:?}", dataset);
println!("{:?}", col);
// Display
println!("{}", dataset); // "AE (10 rows, 5 cols)"
println!("{}", col); // "USUBJID: String[10]"
// PartialEq
assert_eq!(dataset1, dataset2);
assert_eq!(col1, col2);
// Send + Sync (thread-safe)
std::thread::spawn(move || {
println!("{}", dataset.nrows());
});
Error Handling
Dataset creation can fail:
use xportrs::{Dataset, Column, ColumnData};
fn main() {
// Column length mismatch
let result = Dataset::new("AE", vec![
Column::new("A", ColumnData::F64(vec![Some(1.0), Some(2.0)])),
Column::new("B", ColumnData::F64(vec![Some(1.0)])), // Different length!
]);
match result {
Ok(ds) => println!("Created dataset: {}", ds),
Err(e) => eprintln!("Error: {}", e),
}
}
Example: Complete Dataset
use xportrs::{Column, ColumnData, Dataset, Format, VariableRole, Xpt};
fn create_ae_dataset() -> xportrs::Result<Dataset> {
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::with_role(
"STUDYID",
VariableRole::Identifier,
ColumnData::String(vec![Some("ABC-123".into())]),
)
.with_label("Study Identifier")
.with_format(Format::character(20)),
Column::with_role(
"USUBJID",
VariableRole::Identifier,
ColumnData::String(vec![Some("ABC-123-001".into())]),
)
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::with_role(
"AESEQ",
VariableRole::Topic,
ColumnData::F64(vec![Some(1.0)]),
)
.with_label("Sequence Number")
.with_format(Format::numeric(8, 0)),
Column::new("AETERM", ColumnData::String(vec![Some("HEADACHE".into())]))
.with_label("Reported Term for the Adverse Event")
.with_format(Format::character(200))
.with_length(200),
Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19),
])?;
Ok(dataset)
}
fn main() { let _ = create_ae_dataset(); }
Format Type
The Format type represents a SAS display format or informat. It provides parsing and construction of format specifications.
Overview
SAS formats control how values are displayed or read:
| Format | Description | Example Output |
|---|---|---|
| DATE9. | Date format | 15JAN2024 |
| 8.2 | Numeric with decimals | 123.45 |
| $CHAR200. | Character format | Hello World |
| BEST12. | Best numeric representation | 123456789012 |
Creating Formats
Parsing from String
use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Date format
let date_fmt = Format::parse("DATE9.")?;
assert_eq!(date_fmt.name(), "DATE");
assert_eq!(date_fmt.length(), 9);
// Numeric format with decimals
let num_fmt = Format::parse("8.2")?;
assert_eq!(num_fmt.name(), "");
assert_eq!(num_fmt.length(), 8);
assert_eq!(num_fmt.decimals(), 2);
// Character format
let char_fmt = Format::parse("$CHAR200.")?;
assert_eq!(char_fmt.name(), "$CHAR");
assert_eq!(char_fmt.length(), 200);
assert!(char_fmt.is_character());
Ok(())
}
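The examples above show the three pieces Format::parse extracts: a name, a width, and optional decimals. A simplified standalone parser (illustration only, not xportrs internals; it skips SAS edge cases) makes the split explicit:

```rust
// Simplified sketch of splitting a SAS format spec such as
// "$CHAR200.", "8.2", or "E8601DA10." into (name, length, decimals).
// The trailing digit run before the period is the width; digits after
// the period are the decimals. Not xportrs's actual parser.
fn split_format(spec: &str) -> Option<(String, usize, usize)> {
    let dot = spec.find('.')?; // formats must end in a period
    let (head, tail) = (&spec[..dot], &spec[dot + 1..]);
    // Width = the digit run at the end of the head; name = what precedes it.
    let digits_start = head
        .rfind(|c: char| !c.is_ascii_digit())
        .map_or(0, |i| i + 1);
    let name = head[..digits_start].to_string();
    let length: usize = head[digits_start..].parse().ok()?;
    let decimals: usize = if tail.is_empty() { 0 } else { tail.parse().ok()? };
    Some((name, length, decimals))
}
```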
Using Constructors
use xportrs::Format;
// Numeric format
let num = Format::numeric(8, 2);
assert_eq!(num.length(), 8);
assert_eq!(num.decimals(), 2);
// Character format
let char_fmt = Format::character(200);
assert_eq!(char_fmt.name(), "$CHAR");
assert_eq!(char_fmt.length(), 200);
From NAMESTR Fields
When reading XPT files, formats are reconstructed from NAMESTR fields:
use xportrs::Format;
// Reconstruct from XPT fields
let format = Format::from_namestr(
"DATE    ", // nform (8 bytes, space-padded)
9, // nfl (format length)
0, // nfd (format decimals)
1, // nfj (justification: 0=left, 1=right)
);
assert_eq!(format.name(), "DATE");
assert_eq!(format.length(), 9);
Format Properties
use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("$CHAR200.")?;
// Format name (may include $ prefix)
let name: &str = format.name(); // "$CHAR"
// Name without $ prefix
let stripped: &str = format.name_without_prefix(); // "CHAR"
// Total display width
let length: usize = format.length(); // 200
// Decimal places
let decimals: usize = format.decimals(); // 0
// Is it a character format?
let is_char: bool = format.is_character(); // true
// Display representation
println!("{}", format); // "$CHAR200."
Ok(())
}
Common Format Patterns
Date Formats
use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Standard date formats
let date9 = Format::parse("DATE9.")?; // 15JAN2024
let date7 = Format::parse("DATE7.")?; // 15JAN24
let yymmdd = Format::parse("YYMMDD10.")?; // 2024-01-15
let e8601 = Format::parse("E8601DA10.")?; // 2024-01-15
Ok(())
}
DateTime Formats
use xportrs::Format;
fn main() -> xportrs::Result<()> {
let datetime = Format::parse("DATETIME20.")?; // 15JAN2024:14:30:00
let e8601dt = Format::parse("E8601DT19.")?; // 2024-01-15T14:30:00
Ok(())
}
Numeric Formats
use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Bare numeric format
let bare = Format::parse("8.")?; // 8 characters, 0 decimals
let decimal = Format::parse("8.2")?; // 8 characters, 2 decimals
// Named numeric formats
let best = Format::parse("BEST12.")?; // Best representation
let comma = Format::parse("COMMA10.2")?; // Comma-separated
Ok(())
}
Character Formats
use xportrs::Format;
fn main() -> xportrs::Result<()> {
// Character formats start with $
let char200 = Format::parse("$CHAR200.")?;
let char40 = Format::parse("$40.")?; // Shorthand for $CHAR40.
Ok(())
}
Using Formats with Columns
Setting Format on Column
use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
// Using Format object
let col = Column::new("AESTDTC", data.clone())
.with_format(Format::character(19));
// Parsing from string
let col = Column::new("AESTDT", data.clone())
.with_format_str("DATE9.")?;
// Using constructor
let col = Column::new("VALUE", data)
.with_format(Format::numeric(8, 2));
Ok(())
}
Setting Informat
Informats control how data is read:
use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("RAWDATE", data)
.with_informat(Format::parse("DATE9.")?);
Ok(())
}
Format in XPT Files
When written to XPT, formats are stored in the NAMESTR record:
| Field | Size | Description |
|---|---|---|
| nform | 8 bytes | Format name (space-padded) |
| nfl | 2 bytes | Format length |
| nfd | 2 bytes | Format decimals |
| nfj | 2 bytes | Justification (0=left, 1=right) |
use xportrs::{Column, ColumnData, Format, Xpt};
fn main() -> xportrs::Result<()> {
let col = Column::new("AESTDT", ColumnData::F64(vec![Some(23391.0)]))
.with_format_str("DATE9.")?;
// When written, NAMESTR will contain:
// nform = "DATE    " (8 bytes, space-padded)
// nfl = 9
// nfd = 0
// nfj = 1 (right-justified)
Ok(())
}
Format Validation
Invalid format strings return errors:
use xportrs::Format;
// Missing period
let result = Format::parse("DATE9");
assert!(result.is_err());
// Invalid syntax
let result = Format::parse("INVALID");
assert!(result.is_err());
// Empty string
let result = Format::parse("");
assert!(result.is_err());
Display and Debug
use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("DATE9.")?;
// Display: canonical format string
println!("{}", format); // "DATE9."
// Debug: detailed representation
println!("{:?}", format); // Format { name: "DATE", length: 9, ... }
Ok(())
}
Common Traits
use xportrs::Format;
fn main() -> xportrs::Result<()> {
let format = Format::parse("DATE9.")?;
// Clone
let format2 = format.clone();
// PartialEq
assert_eq!(Format::parse("DATE9.")?, Format::parse("DATE9.")?);
// Debug
println!("{:?}", format);
// Display
println!("{}", format);
Ok(())
}
FDA Format Recommendations
[!TIP] The FDA recommends avoiding custom SAS formats. Use standard formats like DATE9., DATETIME20., or simple numeric formats.
Recommended formats:
| Type | Recommended Format |
|---|---|
| Date (numeric) | DATE9. |
| DateTime (numeric) | DATETIME20. |
| Time (numeric) | TIME8. |
| Numeric | 8., 8.2 |
| Character | $CHAR200., $40. |
Avoid:
- Custom user-defined formats
- Formats requiring external catalogs
- Regional-specific formats
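A pre-submission check against the recommendations above can be as simple as an allow-list. This helper is hypothetical, not an xportrs API; the accepted specs are taken directly from the table and should be extended to match your own standards:

```rust
// Hypothetical allow-list check against the recommended formats above;
// not part of xportrs. Extend the list as your submission standards require.
fn is_recommended(spec: &str) -> bool {
    const RECOMMENDED: [&str; 7] = [
        "DATE9.", "DATETIME20.", "TIME8.",
        "8.", "8.2", "$CHAR200.", "$40.",
    ];
    RECOMMENDED.contains(&spec)
}
```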
Reading XPT Files
xportrs provides multiple ways to read XPT files, from simple one-liners to detailed inspection.
Quick Read
The simplest way to read an XPT file:
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;
println!("Domain: {}", dataset.domain_code());
println!("Rows: {}", dataset.nrows());
println!("Columns: {}", dataset.ncols());
Ok(())
}
Reading Multiple Members
XPT files can contain multiple datasets (members):
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read all members
let datasets = Xpt::read_all("multi.xpt")?;
for dataset in datasets {
println!("{}: {} rows", dataset.domain_code(), dataset.nrows());
}
// Read specific member
let ae = Xpt::read_member("multi.xpt", "AE")?;
Ok(())
}
Inspecting Files
Get file metadata without loading all data:
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let info = Xpt::inspect("data.xpt")?;
// File timestamps
if let Some(created) = &info.created {
println!("Created: {}", created);
}
// List members
for name in info.member_names() {
println!("Member: {}", name);
}
// Find specific member
if let Some(member) = info.find_member("AE") {
println!("AE has {} variables", member.variables.len());
}
Ok(())
}
Builder API
For more control, use the reader builder:
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::reader("data.xpt")
.row_limit(1000) // Read only first 1000 rows
.read()?; // Read first/only member
Ok(())
}
Row Limiting
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read only first 100 rows (useful for previews)
let preview = Xpt::reader("large.xpt")
.row_limit(100)
.read()?;
println!("Preview: {} rows", preview.nrows());
Ok(())
}
Reading from Buffers
Read from in-memory data:
use std::io::Cursor;
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let xpt_bytes: Vec<u8> = vec![]; // Your XPT data here
let cursor = Cursor::new(xpt_bytes);
let dataset = Xpt::reader_from(cursor).read()?;
Ok(())
}
Accessing Data
Once loaded, access the data through the Dataset API:
use xportrs::{ColumnData, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;
// Access by column name
let usubjid = &dataset["USUBJID"];
let aeseq = &dataset["AESEQ"];
// Iterate over column data
if let ColumnData::String(values) = usubjid.data() {
for (i, value) in values.iter().enumerate() {
match value {
Some(s) => println!("Row {}: {}", i, s),
None => println!("Row {}: <missing>", i),
}
}
}
if let ColumnData::F64(values) = aeseq.data() {
for (i, value) in values.iter().enumerate() {
match value {
Some(v) => println!("Row {}: {}", i, v),
None => println!("Row {}: <missing>", i),
}
}
}
Ok(())
}
Metadata Preservation
xportrs preserves metadata when reading:
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;
// Dataset label
if let Some(label) = dataset.dataset_label() {
println!("Dataset label: {}", label);
}
// Column metadata
for col in dataset.columns() {
println!("Variable: {}", col.name());
if let Some(label) = col.label() {
println!(" Label: {}", label);
}
if let Some(format) = col.format() {
println!(" Format: {}", format);
}
if let Some(len) = col.explicit_length() {
println!(" Length: {}", len);
}
}
Ok(())
}
Error Handling
use xportrs::{Error, Xpt};
match Xpt::read("missing.xpt") {
Ok(dataset) => println!("Loaded {} rows", dataset.nrows()),
Err(Error::Io(e)) => eprintln!("IO error: {}", e),
Err(Error::MemberNotFound { domain_code }) => {
eprintln!("Member not found: {}", domain_code);
}
Err(e) => eprintln!("Error: {}", e),
}
Reading Large Files
For large files, consider:
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// 1. Preview first to understand structure
let info = Xpt::inspect("large.xpt")?;
println!("File has {} members", info.members.len());
// 2. Read with row limit for preview
let preview = Xpt::reader("large.xpt")
.row_limit(100)
.read()?;
// 3. Read specific columns of interest
let full = Xpt::read("large.xpt")?;
let columns_of_interest = ["USUBJID", "AETERM", "AESTDTC"];
for name in columns_of_interest {
if let Some(col) = full.column(name) {
println!("{}: {} values", name, col.len());
}
}
Ok(())
}
Thread Safety
Datasets are Send + Sync, allowing concurrent access:
use std::sync::Arc;
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Arc::new(Xpt::read("ae.xpt")?);
let handles: Vec<_> = (0..4).map(|i| {
let ds = Arc::clone(&dataset);
std::thread::spawn(move || {
println!("Thread {}: {} rows", i, ds.nrows());
})
}).collect();
for handle in handles {
handle.join().unwrap();
}
Ok(())
}
Example: Read and Process
use xportrs::{ColumnData, Xpt};
fn process_adverse_events(path: &str) -> xportrs::Result<()> {
let dataset = Xpt::read(path)?;
// Verify expected columns
let required = ["USUBJID", "AETERM", "AESEV"];
for name in required {
if dataset.column(name).is_none() {
return Err(xportrs::Error::invalid_data(
format!("Missing required column: {}", name)
));
}
}
// Process data
let usubjid = &dataset["USUBJID"];
let aeterm = &dataset["AETERM"];
let aesev = &dataset["AESEV"];
if let (
ColumnData::String(subjects),
ColumnData::String(terms),
ColumnData::String(severities),
) = (usubjid.data(), aeterm.data(), aesev.data()) {
for i in 0..dataset.nrows() {
let subj = subjects[i].as_deref().unwrap_or("?");
let term = terms[i].as_deref().unwrap_or("?");
let sev = severities[i].as_deref().unwrap_or("?");
println!("{}: {} ({})", subj, term, sev);
}
}
Ok(())
}
Writing XPT Files
xportrs provides a builder API for writing XPT files with validation.
Basic Writing
The simplest way to write an XPT file:
use xportrs::{Column, ColumnData, Dataset, Xpt};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![
Column::new("USUBJID", ColumnData::String(vec![Some("001".into())])),
Column::new("AESEQ", ColumnData::F64(vec![Some(1.0)])),
])?;
Xpt::writer(dataset)
.finalize()?
.write_path("ae.xpt")?;
Ok(())
}
Writer Builder
The writer builder provides options for validation and output:
use xportrs::{Agency, Dataset, Xpt, Column, ColumnData};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset)
.agency(Agency::FDA) // Agency-specific validation
.finalize()?; // Validate and prepare
// Check validation results
if validated.has_errors() {
for issue in validated.issues() {
eprintln!("{}", issue);
}
return Err("Validation failed".into());
}
// Write if valid
validated.write_path("output.xpt")?;
Ok(())
}
Validation Workflow
graph LR
A[Dataset] --> B[Xpt::writer]
B --> C[Configure]
C --> D[finalize]
D --> E{Valid?}
E --> |Yes| F[write_path]
E --> |No| G[Handle Errors]
Checking Issues
use xportrs::{Severity, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;
// Check for any issues
println!("Has errors: {}", validated.has_errors());
println!("Has warnings: {}", validated.has_warnings());
// Get all issues
for issue in validated.issues() {
match issue.severity() {
Severity::Error => eprintln!("ERROR: {}", issue),
Severity::Warning => eprintln!("WARNING: {}", issue),
Severity::Info => println!("INFO: {}", issue),
}
}
Ok(())
}
Agency Validation
Different agencies have different requirements:
use xportrs::{Agency, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
// FDA (strict ASCII)
let fda_result = Xpt::writer(dataset.clone())
.agency(Agency::FDA)
.finalize()?;
// PMDA (allows extended characters)
let pmda_result = Xpt::writer(dataset.clone())
.agency(Agency::PMDA)
.finalize()?;
// NMPA
let nmpa_result = Xpt::writer(dataset)
.agency(Agency::NMPA)
.finalize()?;
Ok(())
}
Writing to Different Destinations
Write to File Path
let validated: xportrs::ValidatedWrite = todo!(); // from Xpt::writer(dataset).finalize()?
validated.write_path("output.xpt")?;
Write to Buffer
let validated: xportrs::ValidatedWrite = todo!(); // from Xpt::writer(dataset).finalize()?
let mut buffer = Vec::new();
validated.write_to(&mut buffer)?;
// buffer now contains the XPT bytes
println!("Wrote {} bytes", buffer.len());
Write to Any Writer
use std::fs::File;
use std::io::BufWriter;
let validated: xportrs::ValidatedWrite = todo!(); // from Xpt::writer(dataset).finalize()?
let file = File::create("output.xpt")?;
let mut writer = BufWriter::new(file);
validated.write_to(&mut writer)?;
File Splitting
Large datasets are automatically split:
use xportrs::{Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let large_dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let paths = Xpt::writer(large_dataset)
.max_file_size_gb(5.0) // Default is 5.0
.finalize()?
.write_path("ae.xpt")?; // May create ae_001.xpt, ae_002.xpt, etc.
for path in paths {
println!("Wrote: {}", path.display());
}
Ok(())
}
Complete Example
use xportrs::{Agency, Column, ColumnData, Dataset, Format, Xpt};
fn write_adverse_events() -> xportrs::Result<()> {
// Create dataset with full metadata
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("STUDYID", ColumnData::String(vec![
Some("ABC-123".into()),
Some("ABC-123".into()),
]))
.with_label("Study Identifier")
.with_format(Format::character(20)),
Column::new("USUBJID", ColumnData::String(vec![
Some("ABC-123-001".into()),
Some("ABC-123-002".into()),
]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40)),
Column::new("AESEQ", ColumnData::F64(vec![
Some(1.0),
Some(1.0),
]))
.with_label("Sequence Number")
.with_format(Format::numeric(8, 0)),
Column::new("AETERM", ColumnData::String(vec![
Some("HEADACHE".into()),
Some("NAUSEA".into()),
]))
.with_label("Reported Term for the Adverse Event")
.with_format(Format::character(200))
.with_length(200),
Column::new("AESTDTC", ColumnData::String(vec![
Some("2024-01-15".into()),
Some("2024-01-16".into()),
]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19),
])?;
// Validate with FDA rules
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Report validation issues
if validated.has_warnings() {
println!("Warnings:");
for issue in validated.issues() {
if issue.severity() == xportrs::Severity::Warning {
println!(" - {}", issue);
}
}
}
if validated.has_errors() {
eprintln!("Cannot write due to errors:");
for issue in validated.issues() {
if issue.severity() == xportrs::Severity::Error {
eprintln!(" - {}", issue);
}
}
return Err(xportrs::Error::invalid_data("Validation failed"));
}
// Write the file
validated.write_path("ae.xpt")?;
println!("Successfully wrote ae.xpt");
Ok(())
}
Error Handling
use xportrs::{Error, Xpt, Dataset, Column, ColumnData};
fn main() {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))]).unwrap();
let result = Xpt::writer(dataset)
.finalize()
.and_then(|v| v.write_path("output.xpt"));
match result {
Ok(paths) => {
for path in paths {
println!("Wrote: {}", path.display());
}
}
Err(Error::Io(e)) => eprintln!("IO error: {}", e),
Err(Error::InvalidSchema { message }) => {
eprintln!("Schema error: {}", message);
}
Err(e) => eprintln!("Error: {}", e),
}
}
Best Practices
[!TIP] Always check validation results before deploying files to production or submission.
- Add metadata: Include labels and formats for all variables
- Use agency validation: Specify the target agency for appropriate checks
- Handle warnings: Review warnings even if they don’t block writing
- Test roundtrip: Verify files can be read back correctly
- Check file size: Ensure files don’t exceed agency limits
use xportrs::{Agency, Dataset, Error, Xpt};
// Production-ready writing pattern
fn write_submission_file(dataset: Dataset, path: &str) -> xportrs::Result<()> {
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Log all issues
for issue in validated.issues() {
log::info!("{}: {}", issue.severity(), issue);
}
// Fail on errors
if validated.has_errors() {
return Err(Error::invalid_data("Validation errors present"));
}
// Write and verify
let paths = validated.write_path(path)?;
// Verify by reading back
for path in &paths {
let _ = Xpt::read(path)?;
}
Ok(())
}
Validation API
xportrs provides comprehensive validation for XPT files. This page details the validation API.
Validation Overview
graph TB
subgraph "Validation Pipeline"
A[Dataset] --> B[Agency Rules]
B --> C[Format Rules]
C --> D[CDISC Rules]
D --> E[Issue Collection]
end
subgraph "Issue Types"
E --> F[Errors]
E --> G[Warnings]
E --> H[Info]
end
ValidatedWrite
The ValidatedWrite type represents a validated dataset ready for writing:
use xportrs::{Severity, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;
// Check for issues
println!("Has errors: {}", validated.has_errors());
println!("Has warnings: {}", validated.has_warnings());
// Get all issues
let issues = validated.issues();
// Only write if no errors
if !validated.has_errors() {
validated.write_path("output.xpt")?;
}
Ok(())
}
Issue Type
The Issue enum represents validation issues:
Issue Variants
use xportrs::Issue;
let issue: Issue = todo!();
match issue {
Issue::VariableNameTooLong { variable, length } => {
println!("Variable {} name is {} bytes (max 8)", variable, length);
}
Issue::VariableLabelTooLong { variable, length } => {
println!("Variable {} label is {} bytes (max 40)", variable, length);
}
Issue::MissingVariableLabel { variable } => {
println!("Variable {} is missing a label", variable);
}
Issue::MissingDatasetLabel { dataset } => {
println!("Dataset {} is missing a label", dataset);
}
Issue::InvalidFormatSyntax { variable, format, reason } => {
println!("Variable {} has invalid format '{}': {}", variable, format, reason);
}
// ... other variants
_ => {}
}
Issue Properties
use xportrs::{Severity, Issue};
let issue: Issue = todo!();
// Severity level
let severity: Severity = issue.severity();
// Target (variable name, dataset name, etc.)
let target: &str = issue.target();
// Display representation
println!("{}", issue);
// Debug representation
println!("{:?}", issue);
Severity Levels
use xportrs::Severity;
let severity = Severity::Error;
match severity {
Severity::Error => {
// Blocks file writing
// File would be rejected by agency
}
Severity::Warning => {
// Does not block writing
// Review recommended
}
Severity::Info => {
// Informational only
// Best practice suggestion
}
}
// Severity is ordered
assert!(Severity::Info < Severity::Warning);
assert!(Severity::Warning < Severity::Error);
Filtering Issues
use xportrs::{Xpt, Dataset, Column, ColumnData, Severity};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
let validated = Xpt::writer(dataset).finalize()?;
// Get only errors
let errors: Vec<_> = validated.issues()
.iter()
.filter(|i| i.severity() == Severity::Error)
.collect();
// Get issues for specific variable
let usubjid_issues: Vec<_> = validated.issues()
.iter()
.filter(|i| i.target() == "USUBJID")
.collect();
// Count by severity
let error_count = validated.issues()
.iter()
.filter(|i| i.severity() == Severity::Error)
.count();
Ok(())
}
Agency-Specific Validation
use xportrs::{Agency, Xpt, Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let dataset = Dataset::new("AE", vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))])?;
// FDA: Strict ASCII validation
let fda_result = Xpt::writer(dataset.clone())
.agency(Agency::FDA)
.finalize()?;
// Check for ASCII violations
for issue in fda_result.issues() {
if issue.to_string().contains("ASCII") {
println!("ASCII issue: {}", issue);
}
}
Ok(())
}
Validation Rules
Variable Name Rules
| Rule | Severity | Trigger |
|---|---|---|
| Empty name | Error | Name is empty string |
| Name too long | Error | Name > 8 bytes |
| Invalid characters | Error | Non-alphanumeric (except _) |
| Starts with number | Error | First char is digit |
| Non-uppercase | Info | Lowercase letters present |
Variable Label Rules
| Rule | Severity | Trigger |
|---|---|---|
| Missing label | Warning | Label is None or empty |
| Label too long | Error | Label > 40 bytes |
| Non-ASCII (FDA) | Error | Non-ASCII characters |
Dataset Rules
| Rule | Severity | Trigger |
|---|---|---|
| Empty name | Error | Domain code is empty |
| Name too long | Error | Domain code > 8 bytes |
| Missing label | Warning | Dataset label is None |
| Label too long | Error | Label > 40 bytes |
Data Rules
| Rule | Severity | Trigger |
|---|---|---|
| Column length mismatch | Error | Columns have different lengths |
| Character too long | Error | Character value > 200 bytes |
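The variable-name rules above can be sketched as a standalone checker. xportrs applies these checks internally during finalize(); this version is for illustration only, and the finding strings are invented here:

```rust
// Standalone sketch of the variable-name rules from the table above;
// xportrs performs the real checks during finalize().
fn check_variable_name(name: &str) -> Vec<String> {
    let mut findings = Vec::new();
    if name.is_empty() {
        findings.push("error: name is empty".to_string());
        return findings;
    }
    if name.len() > 8 {
        findings.push(format!("error: name is {} bytes (max 8)", name.len()));
    }
    if name.chars().next().is_some_and(|c| c.is_ascii_digit()) {
        findings.push("error: name starts with a digit".to_string());
    }
    if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        findings.push("error: invalid characters".to_string());
    }
    if name.chars().any(|c| c.is_ascii_lowercase()) {
        findings.push("info: name is not uppercase".to_string());
    }
    findings
}
```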
Custom Pre-Validation
Add custom validation before xportrs validation:
use xportrs::{Dataset, Xpt};
fn custom_validate(dataset: &Dataset) -> Result<(), String> {
// Check for required variables
let required = ["STUDYID", "USUBJID"];
for var in required {
if dataset.column(var).is_none() {
return Err(format!("Missing required variable: {}", var));
}
}
// Check STUDYID consistency
// ... additional checks ...
Ok(())
}
fn write_with_validation(dataset: Dataset, path: &str) -> xportrs::Result<()> {
// Custom validation first
custom_validate(&dataset)
.map_err(|e| xportrs::Error::invalid_data(e))?;
// Then xportrs validation
let validated = Xpt::writer(dataset).finalize()?;
if validated.has_errors() {
return Err(xportrs::Error::invalid_data("Validation failed"));
}
validated.write_path(path)?;
Ok(())
}
Validation Reporting
use xportrs::{Severity, Xpt};
fn report_validation(dataset: xportrs::Dataset) {
let validated = Xpt::writer(dataset).finalize().unwrap();
// Summary
let errors = validated.issues().iter()
.filter(|i| i.severity() == Severity::Error).count();
let warnings = validated.issues().iter()
.filter(|i| i.severity() == Severity::Warning).count();
let infos = validated.issues().iter()
.filter(|i| i.severity() == Severity::Info).count();
println!("Validation Summary:");
println!(" Errors: {}", errors);
println!(" Warnings: {}", warnings);
println!(" Info: {}", infos);
// Detailed report
if !validated.issues().is_empty() {
println!("\nDetails:");
for issue in validated.issues() {
let prefix = match issue.severity() {
Severity::Error => "ERROR",
Severity::Warning => "WARN ",
Severity::Info => "INFO ",
};
println!(" [{}] {} - {}", prefix, issue.target(), issue);
}
}
}
Integration with Pinnacle 21
[!NOTE] xportrs validation covers XPT-level rules. For complete CDISC validation, use Pinnacle 21 or similar tools.
| Validation Area | xportrs | Pinnacle 21 |
|---|---|---|
| Variable names | ✅ | ✅ |
| Variable labels | ✅ | ✅ |
| Format metadata | ✅ | ✅ |
| Controlled terminology | ❌ | ✅ |
| Required variables | ❌ | ✅ |
| Cross-dataset consistency | ❌ | ✅ |
| define.xml matching | ❌ | ✅ |
Best Practices
- Validate early: Check validation before processing large datasets
- Log all issues: Keep records of validation results
- Fail on errors: Don’t write files with validation errors
- Review warnings: Warnings may indicate data quality issues
- Document exceptions: If shipping with warnings, document why
Metadata
xportrs provides rich metadata support for XPT files, ensuring CDISC compliance and data clarity.
Metadata Overview
graph TB
subgraph "Dataset Level"
A[Domain Code] --> B[Dataset Label]
end
subgraph "Variable Level"
C[Variable Name] --> D[Variable Label]
D --> E[Format]
E --> F[Informat]
F --> G[Length]
G --> H[Role]
end
Dataset Metadata
Domain Code
The domain code is the dataset name (1–8 bytes):
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
let dataset = Dataset::new("AE", columns)?;
// Access domain code
let code: &str = dataset.domain_code(); // "AE"
Ok(())
}
Dataset Label
The dataset label provides a description (up to 40 bytes):
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let columns = vec![Column::new("A", ColumnData::F64(vec![Some(1.0)]))];
// Set at construction
let dataset = Dataset::with_label("AE", "Adverse Events", columns.clone())?;
// Or set later
let mut dataset = Dataset::new("AE", columns)?;
dataset.set_label("Adverse Events");
// Access
let label: Option<&str> = dataset.dataset_label();
Ok(())
}
Variable Metadata
Variable Name
Variable names follow SAS naming rules:
use xportrs::{Column, ColumnData, VariableName};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
// Name is set at construction
let col = Column::new("USUBJID", data);
// Access name
let name: &str = col.name();
// VariableName type for validation
let var_name = VariableName::new("USUBJID");
assert_eq!(var_name.as_str(), "USUBJID");
}
Variable Label
Labels describe the variable (up to 40 bytes):
use xportrs::{Column, ColumnData, Label};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
let col = Column::new("USUBJID", data)
.with_label("Unique Subject Identifier");
// Access label
if let Some(label) = col.label() {
println!("Label: {}", label);
}
// Label type
let label = Label::new("Unique Subject Identifier");
assert_eq!(label.as_str(), "Unique Subject Identifier");
}
Format
Display formats control how values are shown:
use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
// Using Format object
let col = Column::new("AESTDT", data.clone())
.with_format(Format::parse("DATE9.")?);
// Using format string
let col = Column::new("AESTDT", data)
.with_format_str("DATE9.")?;
// Access format
if let Some(format) = col.format() {
println!("Format: {}", format);
}
Ok(())
}
Informat
Input formats control how values are read:
use xportrs::{Column, ColumnData, Format};
fn main() -> xportrs::Result<()> {
let data = ColumnData::F64(vec![Some(1.0)]);
let col = Column::new("RAWDATE", data)
.with_informat(Format::parse("DATE9.")?);
if let Some(informat) = col.informat() {
println!("Informat: {}", informat);
}
Ok(())
}
Length
Explicit length for character variables:
use xportrs::{Column, ColumnData};
fn main() {
// Auto-derived from data
let col = Column::new("VAR", ColumnData::String(vec![
Some("Hello".into()), // 5 characters
Some("World".into()), // 5 characters
]));
// Length will be 5
// Explicit override
let data = ColumnData::String(vec![Some("text".into())]);
let col = Column::new("VAR", data)
.with_length(200); // Force 200 bytes
// Access
if let Some(len) = col.explicit_length() {
println!("Explicit length: {}", len);
}
}
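The auto-derivation rule can be sketched with the standard library alone. This is an illustration of the rule (max byte length of non-missing values, minimum 1), not xportrs's internal code; `derived_length` is a hypothetical helper:

```rust
// Derive a character column's XPT length: the maximum byte length of
// its non-missing values, clamped to at least 1 (illustrative sketch).
fn derived_length(values: &[Option<&str>]) -> usize {
    values
        .iter()
        .flatten()
        .map(|s| s.len())
        .max()
        .unwrap_or(1)
        .max(1)
}

fn main() {
    // Both values are 5 bytes, so the derived length is 5
    println!("{}", derived_length(&[Some("Hello"), Some("World")]));
}
```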
Variable Role
Roles categorize variables per CDISC:
use xportrs::{Column, ColumnData, VariableRole};
fn main() {
let data = ColumnData::String(vec![Some("001".into())]);
let col = Column::with_role(
"USUBJID",
VariableRole::Identifier,
data,
);
// Available roles
let roles = [
VariableRole::Identifier,
VariableRole::Topic,
VariableRole::Timing,
VariableRole::Qualifier,
VariableRole::Rule,
VariableRole::Synonym,
VariableRole::Record,
];
// Access role
if let Some(role) = col.role() {
println!("Role: {:?}", role);
}
}
Metadata Types
DomainCode
use xportrs::DomainCode;
fn main() {
let code = DomainCode::new("AE");
// Access
let s: &str = code.as_str();
let code2 = DomainCode::new("AE");
let owned: String = code2.into_inner();
// Traits
assert_eq!(code, DomainCode::new("AE"));
println!("{}", code); // "AE"
}
Label
use xportrs::Label;
fn main() {
let label = Label::new("Adverse Events");
// Access
let s: &str = label.as_str();
let label2 = Label::new("AE");
let owned: String = label2.into_inner();
// From string
let label: Label = "Test".into();
}
VariableName
use xportrs::VariableName;
fn main() {
let name = VariableName::new("USUBJID");
// Access
let s: &str = name.as_str();
let name2 = VariableName::new("TEST");
let owned: String = name2.into_inner();
// Validation (at construction or later)
// Names are uppercased automatically
let name = VariableName::new("usubjid");
assert_eq!(name.as_str(), "USUBJID");
}
Metadata in XPT Files
NAMESTR Record Storage
| Field | Offset | Size | Description |
|---|---|---|---|
| nname | 8-15 | 8 | Variable name |
| nlabel | 16-55 | 40 | Variable label |
| nform | 56-63 | 8 | Format name |
| nfl | 64-65 | 2 | Format length |
| nfd | 66-67 | 2 | Format decimals |
| nfj | 68-69 | 2 | Format justification |
| niform | 72-79 | 8 | Informat name |
| nifl | 80-81 | 2 | Informat length |
| nifd | 82-83 | 2 | Informat decimals |
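The byte offsets above can be exercised with a small std-only sketch that pulls the name and label fields out of a raw NAMESTR record. This is not xportrs's parser; `namestr_name_label` and `sample_record` are illustrative helpers:

```rust
// Pull the name and label fields out of a 140-byte NAMESTR record,
// using the offsets from the table above (std-only illustration).
fn namestr_name_label(rec: &[u8; 140]) -> (String, String) {
    let field =
        |lo: usize, hi: usize| String::from_utf8_lossy(&rec[lo..hi]).trim_end().to_string();
    (field(8, 16), field(16, 56)) // nname: bytes 8-15, nlabel: bytes 16-55
}

// Build a fake record for demonstration; XPT text fields are space-padded.
fn sample_record() -> [u8; 140] {
    let mut rec = [b' '; 140];
    rec[8..15].copy_from_slice(b"USUBJID");
    rec[16..41].copy_from_slice(b"Unique Subject Identifier");
    rec
}

fn main() {
    let (name, label) = namestr_name_label(&sample_record());
    println!("{} / {}", name, label);
}
```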
Reading Metadata
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
let dataset = Xpt::read("ae.xpt")?;
// Dataset metadata
println!("Domain: {}", dataset.domain_code());
if let Some(label) = dataset.dataset_label() {
println!("Label: {}", label);
}
// Variable metadata
for col in dataset.columns() {
println!("\n{}", col.name());
if let Some(label) = col.label() {
println!(" Label: {}", label);
}
if let Some(format) = col.format() {
println!(" Format: {}", format);
}
if let Some(informat) = col.informat() {
println!(" Informat: {}", informat);
}
if let Some(len) = col.explicit_length() {
println!(" Length: {}", len);
}
if let Some(role) = col.role() {
println!(" Role: {:?}", role);
}
}
Ok(())
}
Preserving Metadata on Roundtrip
use xportrs::Xpt;
fn main() -> xportrs::Result<()> {
// Read
let original = Xpt::read("ae.xpt")?;
// Modify (metadata preserved)
// ...
// Write
Xpt::writer(original.clone())
.finalize()?
.write_path("ae_modified.xpt")?;
// Verify
let reloaded = Xpt::read("ae_modified.xpt")?;
assert_eq!(reloaded.dataset_label(), original.dataset_label());
Ok(())
}
Metadata and Define-XML
[!IMPORTANT] Variable labels in XPT files should match those in define.xml. Pinnacle 21 validates this consistency.
use xportrs::{Dataset, Column, ColumnData};
fn main() -> xportrs::Result<()> {
let data = ColumnData::String(vec![Some("test".into())]);
// Create dataset with labels matching define.xml
let dataset = Dataset::with_label("AE", "Adverse Events", vec![
Column::new("STUDYID", data.clone())
.with_label("Study Identifier"), // Must match define.xml
Column::new("USUBJID", data)
.with_label("Unique Subject Identifier"), // Must match define.xml
// ...
])?;
Ok(())
}
Best Practices
- Always include labels: Labels help reviewers understand data
- Use standard formats: DATE9., DATETIME20., $CHARn.
- Set explicit lengths: Control character variable lengths
- Assign roles: Categorize variables per CDISC
- Verify roundtrip: Ensure metadata survives read/write cycles
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format, VariableRole};
// Complete metadata example
let col = Column::with_role(
"AESTDTC",
VariableRole::Timing,
ColumnData::String(vec![Some("2024-01-15".into())]),
)
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19);
}
Regulatory Compliance Overview
xportrs is designed to produce XPT files that meet the requirements of major regulatory agencies for clinical trial data submissions.
Supported Agencies
| Agency | Region | Standards | xportrs Support |
|---|---|---|---|
| FDA | United States | CDISC SDTM/ADaM | Full validation |
| PMDA | Japan | CDISC + J-SDTM extensions | Full validation |
| NMPA | China | CDISC + local requirements | Full validation |
| EMA | Europe | CDISC SDTM/ADaM | Full validation |
Key Requirements
All agencies require XPT V5 format files that conform to the SAS Transport specification (TS-140). The key requirements are:
Variable Requirements
graph LR
subgraph "Variable Constraints"
A[Name ≤8 bytes] --> B[Uppercase A-Z, 0-9, _]
B --> C[Must start with letter]
D[Label ≤40 bytes] --> E[ASCII only for FDA]
F[Char length ≤200] --> G[Numeric = 8 bytes]
end
Dataset Requirements
- Dataset name: 1-8 bytes, uppercase alphanumeric
- Dataset label: 0-40 bytes (recommended for reviewer clarity)
- File size: ≤5GB per file (auto-split supported)
Format Requirements
XPT files must use:
- IBM floating-point encoding (not IEEE 754)
- Big-endian byte order
- SAS epoch (January 1, 1960) for dates
- 80-byte records for headers
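The epoch arithmetic can be sketched with the standard library alone. `days_from_civil` below is Howard Hinnant's civil-calendar algorithm, not an xportrs API; the SAS epoch (1960-01-01) sits 3653 days before the Unix epoch:

```rust
// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days-from-civil algorithm).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;                          // [0, 399]
    let mp = ((m + 9) % 12) as i64;                   // March-based month [0, 11]
    let doy = (153 * mp + 2) / 5 + d as i64 - 1;      // [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // [0, 146096]
    era * 146097 + doe - 719468
}

// SAS date value: days since 1960-01-01 (3653 days before the Unix epoch).
fn sas_date(y: i64, m: u32, d: u32) -> i64 {
    days_from_civil(y, m, d) + 3653
}

fn main() {
    println!("{}", sas_date(2024, 1, 15)); // 23390
}
```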
Validation Levels
xportrs provides three severity levels for validation issues:
| Severity | Meaning | Example |
|---|---|---|
| Error | File will not be accepted | Variable name >8 bytes |
| Warning | Review recommended | Missing variable label |
| Info | Best practice suggestion | Non-standard format |
[!IMPORTANT] Only Error severity issues block file writing. Warnings and info messages are advisory.
Agency-Specific Rules
FDA (United States)
The FDA requires strict ASCII compliance for all text:
#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Check for FDA-specific issues
for issue in validated.issues() {
println!("[{}] {}", issue.severity(), issue);
}
}
PMDA (Japan)
PMDA allows Shift-JIS encoding for Japanese text in certain fields:
#![allow(unused)]
fn main() {
use xportrs::{Agency, TextMode, Xpt};
let validated = Xpt::writer(dataset)
.agency(Agency::PMDA)
.text_mode(TextMode::Latin1) // Extended character support
.finalize()?;
}
NMPA (China)
NMPA follows CDISC standards with additional local requirements:
#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};
let validated = Xpt::writer(dataset)
.agency(Agency::NMPA)
.finalize()?;
}
Compliance Verification
Using xportrs Validation
#![allow(unused)]
fn main() {
use xportrs::{Agency, Severity, Xpt};
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Count issues by severity
let errors = validated.issues().iter()
.filter(|i| i.severity() == Severity::Error)
.count();
if errors > 0 {
eprintln!("{} blocking errors found", errors);
}
}
External Validation (Pinnacle 21)
After generating XPT files, we recommend validation with Pinnacle 21 Community:
- Download from Pinnacle 21
- Run validation against your XPT files and define.xml
- Review any SD (Study Data) rule violations
[!NOTE] xportrs handles XPT-level compliance. Dataset content validation (controlled terminology, required variables) requires external tools like Pinnacle 21.
Official Sources
- FDA Study Data Technical Conformance Guide
- CDISC SDTM Implementation Guide
- SAS TS-140 XPT Specification
CDISC Standards
The Clinical Data Interchange Standards Consortium (CDISC) defines the data models and metadata standards used in clinical trial submissions.
CDISC Data Models
graph TB
subgraph "CDISC Standards Hierarchy"
CDASH[CDASH<br/>Data Collection] --> SDTM[SDTM<br/>Tabulation]
SDTM --> ADaM[ADaM<br/>Analysis]
ADaM --> TFL[Tables, Figures,<br/>Listings]
end
subgraph "Submission Package"
SDTM --> XPT1[SDTM XPT Files]
ADaM --> XPT2[ADaM XPT Files]
XPT1 --> DEFINE[define.xml]
XPT2 --> DEFINE
end
SDTM (Study Data Tabulation Model)
SDTM is the standard for organizing clinical trial tabulation data. Each domain (dataset) represents a specific type of data:
| Domain | Description | Common Variables |
|---|---|---|
| DM | Demographics | STUDYID, USUBJID, AGE, SEX |
| AE | Adverse Events | AETERM, AESTDTC, AESEV |
| CM | Concomitant Medications | CMTRT, CMDOSE |
| LB | Laboratory Results | LBTESTCD, LBORRES |
| VS | Vital Signs | VSTESTCD, VSORRES |
| EX | Exposure | EXTRT, EXDOSE |
Creating SDTM Datasets
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Dataset, Format, Xpt};
// Demographics (DM) domain
let dm = Dataset::with_label("DM", "Demographics", vec![
Column::new("STUDYID", ColumnData::String(vec![Some("ABC-123".into())]))
.with_label("Study Identifier"),
Column::new("USUBJID", ColumnData::String(vec![Some("ABC-123-001".into())]))
.with_label("Unique Subject Identifier"),
Column::new("AGE", ColumnData::F64(vec![Some(45.0)]))
.with_label("Age"),
Column::new("SEX", ColumnData::String(vec![Some("M".into())]))
.with_label("Sex"),
Column::new("RACE", ColumnData::String(vec![Some("WHITE".into())]))
.with_label("Race"),
])?;
}
ADaM (Analysis Data Model)
ADaM is the standard for analysis datasets derived from SDTM:
| Dataset | Description | Purpose |
|---|---|---|
| ADSL | Subject-Level Analysis | One row per subject |
| ADAE | Adverse Events Analysis | One row per event |
| ADLB | Laboratory Analysis | Derived lab values |
| ADTTE | Time-to-Event | Survival analysis |
Creating ADaM Datasets
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Dataset};
// Subject-Level Analysis Dataset (ADSL)
let adsl = Dataset::with_label("ADSL", "Subject Level Analysis", vec![
Column::new("STUDYID", ColumnData::String(vec![Some("ABC-123".into())]))
.with_label("Study Identifier"),
Column::new("USUBJID", ColumnData::String(vec![Some("ABC-123-001".into())]))
.with_label("Unique Subject Identifier"),
Column::new("TRT01P", ColumnData::String(vec![Some("DRUG A".into())]))
.with_label("Planned Treatment for Period 01"),
Column::new("TRT01A", ColumnData::String(vec![Some("DRUG A".into())]))
.with_label("Actual Treatment for Period 01"),
Column::new("SAFFL", ColumnData::String(vec![Some("Y".into())]))
.with_label("Safety Population Flag"),
])?;
}
Variable Metadata
CDISC requires specific metadata for each variable:
Required Metadata
| Metadata | XPT Field | xportrs Method |
|---|---|---|
| Variable Name | nname | Column::new(name, ...) |
| Variable Label | nlabel | .with_label(...) |
| Variable Type | ntype | Inferred from ColumnData |
| Display Format | nform | .with_format(...) |
| Variable Length | nlng | .with_length(...) |
Example with Full Metadata
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19);
}
Controlled Terminology
CDISC defines controlled terminology for many variables:
[!WARNING] xportrs does not validate controlled terminology values. Use Pinnacle 21 or similar tools to verify that coded values match CDISC controlled terminology.
Common controlled terminology:
- AESEV: MILD, MODERATE, SEVERE
- SEX: M, F, U, UNDIFFERENTIATED
- RACE: WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, etc.
- NY (Yes/No): Y, N
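As the warning above notes, checks like these belong in your own pipeline. A minimal user-side sketch against an illustrative code list (the `invalid_aesev` helper and its value set are assumptions, not an xportrs feature):

```rust
use std::collections::HashSet;

// Flag AESEV values not in an illustrative CDISC code list.
// Controlled terminology is case-sensitive, so "Severe" would be flagged.
fn invalid_aesev<'a>(values: &[&'a str]) -> Vec<&'a str> {
    let allowed: HashSet<&str> = ["MILD", "MODERATE", "SEVERE"].into_iter().collect();
    values.iter().copied().filter(|v| !allowed.contains(v)).collect()
}

fn main() {
    let bad = invalid_aesev(&["MILD", "Severe", "MODERATE"]);
    println!("{:?}", bad); // ["Severe"]
}
```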
SDTM-IG Versions
xportrs supports the latest SDTM-IG metadata requirements:
| Version | Release Date | Key Changes |
|---|---|---|
| SDTM-IG 3.4 | 2023 | Current recommended version |
| SDTM-IG 3.3 | 2021 | Labels no longer conformance criteria |
| SDTM-IG 3.2 | 2013 | Legacy support |
[!NOTE] As of SDTM-IG 3.3, variable labels are **recommended** but not required for conformance. However, xportrs still generates warnings for missing labels since they are important for data reviewers.
Define-XML Integration
The define.xml file provides metadata that complements XPT files:
graph LR
subgraph "Submission Package"
XPT[XPT Files] -->|" Data "| FDA[FDA Review]
DEFINE[define.xml] -->|" Metadata "| FDA
XPT -.->|" Must match "| DEFINE
end
[!IMPORTANT] Variable labels in XPT files should match those in define.xml. Pinnacle 21 rule SD0063 checks for mismatches.
Resources
FDA Technical Conformance Guide
The FDA Study Data Technical Conformance Guide (TCG) defines requirements for electronic study data submissions. This page covers XPT-specific requirements.
Submission Types
XPT files are required for these FDA submission types:
| Submission Type | Description | XPT Required |
|---|---|---|
| NDA | New Drug Application | Yes |
| ANDA | Abbreviated NDA (Generics) | Yes |
| BLA | Biologics License Application | Yes |
| IND | Investigational New Drug | Conditional |
File Size Requirements
graph LR
subgraph "File Size Limits"
A[Single XPT] --> B{">5 GB?"}
B -->|Yes| C[Split into parts]
B -->|No| D[Single file OK]
C --> E[ae_001.xpt<br/>ae_002.xpt<br/>...]
end
Automatic File Splitting
xportrs automatically handles file splitting:
#![allow(unused)]
fn main() {
use xportrs::Xpt;
// Automatically splits if dataset would exceed 5GB
Xpt::writer(large_dataset)
.max_file_size_gb(5.0) // Optional, 5.0 is default
.finalize()?
.write_path("ae.xpt")?; // May create ae_001.xpt, ae_002.xpt, ...
}
Character Encoding
[!IMPORTANT] FDA requires **ASCII-only** characters in variable names and labels. Extended characters may cause validation failures.
ASCII Validation
#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};
let validated = Xpt::writer(dataset)
.agency(Agency::FDA) // Enforces ASCII validation
.finalize()?;
// Non-ASCII characters will generate errors
for issue in validated.issues() {
if format!("{}", issue).contains("ASCII") {
eprintln!("ASCII violation: {}", issue);
}
}
}
Variable Requirements
Naming Conventions
| Requirement | FDA TCG Section | xportrs Validation |
|---|---|---|
| 1-8 characters | Section 4.1.5 | Error if violated |
| Uppercase only | Section 4.1.5 | Auto-converted |
| Start with letter | Section 4.1.5 | Error if violated |
| A-Z, 0-9, underscore only | Section 4.1.5 | Error if violated |
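The naming rules above are easy to express as a std-only check. This sketch treats a lowercase first character as an error, whereas xportrs auto-converts to uppercase; `name_error` is an illustrative helper, not the library's API:

```rust
// Check a variable name against the FDA TCG rules tabulated above.
// Std-only sketch; xportrs performs its own validation (and auto-uppercases).
fn name_error(name: &str) -> Option<&'static str> {
    if name.is_empty() || name.len() > 8 {
        return Some("must be 1-8 bytes");
    }
    let first = name.chars().next().unwrap();
    if !first.is_ascii_uppercase() {
        return Some("must start with an uppercase letter");
    }
    if !name
        .chars()
        .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
    {
        return Some("only A-Z, 0-9, and underscore allowed");
    }
    None
}

fn main() {
    for n in ["USUBJID", "2TEST", "TOOLONGNAME", "AE_SEQ"] {
        println!("{}: {:?}", n, name_error(n));
    }
}
```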
Label Requirements
| Requirement | FDA TCG Section | xportrs Validation |
|---|---|---|
| 0-40 characters | Section 4.1.5 | Error if >40 |
| ASCII only | Section 4.1.5 | Error if non-ASCII |
| Recommended for all variables | Section 4.1.5 | Warning if missing |
Numeric Precision
XPT files use IBM floating-point format with specific precision limits:
| Data Type | IEEE 754 Precision | IBM Float Precision | Notes |
|---|---|---|---|
| Integer | Exact to 2^53 | Exact to ~10^14 | Safe for IDs |
| Decimal | ~15-17 digits | ~14-16 digits | Slight loss |
| Date | Varies | SAS epoch-based | Use date formats |
[!NOTE] For maximum precision, consider using the `DATE9.` or `DATETIME20.` formats for date/time values rather than storing as plain numerics.
Date Handling
FDA expects dates in specific formats:
ISO 8601 Character Dates
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
// Preferred: Store as ISO 8601 character string
let col = Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19));
}
SAS Numeric Dates
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData};
// Alternative: Store as SAS date number
// Days since January 1, 1960
let sas_date = 23_390.0; // 2024-01-15
let col = Column::new("AESTDT", ColumnData::F64(vec![Some(sas_date)]))
.with_label("Start Date")
.with_format_str("DATE9.")?;
}
Study Data Reviewer’s Guide
FDA recommends including a Reviewer’s Guide with submissions. The guide should reference:
- Dataset locations and naming conventions
- Variable naming patterns
- Any deviations from CDISC standards
- Data transformation documentation
eCTD Placement
XPT files are placed in specific eCTD module locations:
m5/
├── datasets/
│ ├── tabulations/
│ │ ├── sdtm/
│ │ │ ├── ae.xpt
│ │ │ ├── dm.xpt
│ │ │ └── define.xml
│ │ └── send/ (nonclinical)
│ └── analysis/
│ └── adam/
│ ├── adsl.xpt
│ ├── adae.xpt
│ └── define.xml
Validation Checklist
Before submission, verify:
- All XPT files ≤5GB (or properly split)
- Variable names ≤8 characters, uppercase
- Variable labels ≤40 characters, ASCII only
- Dataset names ≤8 characters
- Character variable lengths ≤200 bytes
- define.xml present and valid
- Pinnacle 21 validation passed (or issues documented)
Resources
- FDA Study Data Technical Conformance Guide
- FDA Study Data Standards Resources
- FDA Data Standards Catalog
XPT V5 Specification
The XPT V5 format is defined by the SAS Technical Note TS-140. This page provides a comprehensive overview of the format.
Format Overview
XPT V5 (also known as SAS Transport Version 5) is a binary file format with:
- 80-byte records for headers
- Big-endian byte order
- IBM floating-point number encoding
- Fixed-width text fields (space-padded)
graph TB
subgraph "XPT V5 File Structure"
LH[Library Header<br/>80 bytes] --> FD[First Dataset]
FD -->|" More datasets "| ND[Next Dataset...]
end
subgraph "Dataset Structure"
MH[Member Header<br/>80 bytes] --> DH[DSCRPTR Header<br/>80 bytes]
DH --> DD[Dataset Descriptor<br/>160 bytes]
DD --> NH[NAMESTR Header<br/>80 bytes]
NH --> NR[NAMESTR Records<br/>140 bytes × n]
NR --> OH[OBS Header<br/>80 bytes]
OH --> OD[Observation Data]
end
Library Header
The file begins with a library header identifying the format:
| Offset | Size | Content |
|---|---|---|
| 0-79 | 80 | HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000 |
#![allow(unused)]
fn main() {
const LIBRARY_HEADER: &[u8; 80] =
b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000 ";
}
Member Header
Each dataset (member) begins with a member header:
| Offset | Size | Content |
|---|---|---|
| 0-79 | 80 | HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140 |
The numbers at the end indicate:
- `0160`: 160 bytes for the dataset descriptor (two 80-byte records)
- `0140`: 140 bytes per NAMESTR record
Dataset Descriptor
The dataset descriptor contains:
| Offset | Size | Field | Description |
|---|---|---|---|
| 0-7 | 8 | SAS | SAS |
| 8-15 | 8 | SAS | SAS |
| 16-23 | 8 | SASLIB | SASLIB |
| 24-31 | 8 | Version | 9.4 |
| 32-39 | 8 | OS | Operating system |
| 40-47 | 8 | Blanks | Padding |
| 48-63 | 16 | Created | ddMMMyy:hh:mm:ss |
| 64-79 | 16 | Modified | ddMMMyy:hh:mm:ss |
Second Descriptor Record
| Offset | Size | Field | Description |
|---|---|---|---|
| 0-7 | 8 | DSNAME | Dataset name |
| 8-15 | 8 | SASDATA | SASDATA |
| 16-23 | 8 | Version | 9.4 |
| 24-31 | 8 | OS | Operating system |
| 32-39 | 8 | Blanks | Padding |
| 40-79 | 40 | Label | Dataset label |
NAMESTR Records
The NAMESTR header introduces the variable metadata:
| Offset | Size | Content |
|---|---|---|
| 0-53 | 54 | HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!! |
| 54-57 | 4 | Number of variables (zero-padded) |
| 58-79 | 22 | Padding |
Each variable is described by a 140-byte NAMESTR record. See NAMESTR Records for detailed byte layout.
Observation Data
The observation header introduces the data:
| Offset | Size | Content |
|---|---|---|
| 0-79 | 80 | HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000 |
After this, raw observation data follows in row-major order:
[Row 1: Var1][Row 1: Var2]...[Row 1: VarN]
[Row 2: Var1][Row 2: Var2]...[Row 2: VarN]
...
Numeric Variables
All numeric variables are stored as 8-byte IBM floating-point:
- 8 bytes per value
- Big-endian byte order
- IBM base-16 exponent (not IEEE 754)
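A simplified encoder/decoder pair illustrates the IBM base-16 scheme: a sign bit, a 7-bit excess-64 exponent of 16, and a 56-bit fraction. This is a std-only sketch that ignores rounding, NaN, subnormals, and exponent-range overflow, all of which a production encoder such as xportrs's must handle:

```rust
// Encode an f64 as an 8-byte big-endian IBM hexadecimal float.
// Sketch only: no rounding, NaN, subnormal, or out-of-range handling.
fn ieee_to_ibm(v: f64) -> [u8; 8] {
    if v == 0.0 {
        return [0u8; 8];
    }
    let bits = v.to_bits();
    let sign = ((bits >> 63) as u8) << 7;
    let exp2 = ((bits >> 52) & 0x7ff) as i32 - 1023;       // base-2 exponent
    let mant = (bits & 0x000f_ffff_ffff_ffff) | (1 << 52); // 53-bit significand
    let shift = exp2.rem_euclid(4);                        // align to a base-16 boundary
    let exp16 = (exp2 + 260 - shift) / 4;                  // excess-64 exponent of 16
    let frac = (mant << shift as u32).to_be_bytes();       // 56-bit fraction
    let mut out = [0u8; 8];
    out[0] = sign | (exp16 as u8 & 0x7f);
    out[1..].copy_from_slice(&frac[1..]);
    out
}

// Decode back to f64 (exact for values produced by the encoder above).
fn ibm_to_ieee(b: [u8; 8]) -> f64 {
    let mut frac_bytes = [0u8; 8];
    frac_bytes[1..].copy_from_slice(&b[1..]);
    let frac = u64::from_be_bytes(frac_bytes) as f64;
    let exp16 = (b[0] & 0x7f) as i32 - 64;
    let v = frac * (16f64).powi(exp16) / 2f64.powi(56);
    if b[0] & 0x80 != 0 { -v } else { v }
}

fn main() {
    let ibm = ieee_to_ibm(1.0);
    // 0x41 = exponent 65 (excess-64); fraction 0x10... = 1/16, so 1/16 * 16^1 = 1.0
    println!("{:02X?}", ibm);
    println!("{}", ibm_to_ieee(ibm));
}
```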
Character Variables
Character variables are stored as fixed-width text:
- 1-200 bytes per value (as defined in NAMESTR)
- Space-padded on the right
- No null terminators
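Encoding a character value into its fixed-width slot is just truncate-and-pad. A std-only sketch (`encode_char` is an illustrative helper, assuming ASCII input per the FDA requirement):

```rust
// Encode a character value into its fixed-width XPT slot:
// truncate to `len` bytes, pad with spaces on the right.
// Assumes ASCII input; byte truncation could split a multi-byte UTF-8 char.
fn encode_char(value: &str, len: usize) -> Vec<u8> {
    let mut out = vec![b' '; len];
    let bytes = value.as_bytes();
    let n = bytes.len().min(len);
    out[..n].copy_from_slice(&bytes[..n]);
    out
}

fn main() {
    println!("{:?}", String::from_utf8(encode_char("AE", 8)).unwrap());
}
```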
Missing Values
| Type | Encoding |
|---|---|
| Numeric missing (.) | 0x2E in first byte, zeros elsewhere |
| Numeric missing (.A-.Z) | 0x41-0x5A in first byte |
| Character missing | All spaces |
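The numeric encodings in the table translate directly to bytes. A std-only sketch (`numeric_missing` is an illustrative helper, not xportrs's API):

```rust
// Encode XPT numeric missing values per the table above.
// `None` is the standard missing (.); Some('A')..=Some('Z') are special missings.
fn numeric_missing(tag: Option<char>) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[0] = match tag {
        None => 0x2E,                   // '.'
        Some(c @ 'A'..='Z') => c as u8, // 0x41-0x5A
        Some(_) => 0x2E,                // fall back to standard missing
    };
    out
}

fn main() {
    println!("{:02X?}", numeric_missing(None));
    println!("{:02X?}", numeric_missing(Some('A')));
}
```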
Record Padding
XPT uses 80-byte record alignment:
- NAMESTR records: 140 bytes (not aligned)
- Multiple NAMESTRs fill to 80-byte boundary
- Observation rows: variable length (row_length × n)
- File ends with space padding to 80 bytes
Version Differences
| Feature | V5 (TS-140) | V8+ |
|---|---|---|
| Variable name length | 8 bytes | 32 bytes |
| Label length | 40 bytes | 256 bytes |
| Number encoding | IBM float | IEEE 754 |
| Max observations | ~2 billion | Unlimited |
| Regulatory support | FDA/PMDA/NMPA | Limited |
[!IMPORTANT] For regulatory submissions, only V5 format is accepted. xportrs focuses on V5 compliance.
Official Specification
The authoritative source for XPT V5 format is:
SAS Technical Note TS-140: Record Layout of a SAS Version 5 or 6 Data Set in SAS Transport (XPORT) Format
Download PDF | View on SAS Support
Format Family
The Library of Congress maintains format documentation:
XPT V8/V9 Specification
The XPT V8/V9 format extends the original V5 format with support for longer variable names and labels.
[!WARNING] XPT V8/V9 format is not accepted for FDA regulatory submissions. For regulatory submissions, use XPT V5 format only.
Key Differences from V5
| Feature | V5 (TS-140) | V8/V9 |
|---|---|---|
| Variable name length | 8 bytes | 32 bytes |
| Label length | 40 bytes | 256 bytes |
| Number encoding | IBM float | IEEE 754 |
| Max observations | ~2 billion | Unlimited |
| Regulatory support | FDA/PMDA/NMPA | Not accepted |
Format Overview
XPT V8/V9 maintains the same basic structure as V5:
- 80-byte records for headers
- Big-endian byte order
- Fixed-width text fields (space-padded)
However, it differs in:
- Variable names: Extended from 8 to 32 characters
- Labels: Extended from 40 to 256 characters
- Numeric encoding: Uses IEEE 754 instead of IBM floating-point
Use Cases
V8/V9 format may be appropriate for:
- Internal data storage where longer names improve readability
- Non-regulatory data exchange between systems
- Archival purposes where V5 limitations are problematic
- Academic or research datasets not intended for regulatory submission
Regulatory Considerations
FDA Submissions
The FDA Data Standards Catalog explicitly requires XPT V5 format. Files in V8/V9 format will be rejected during technical validation.
CDISC Standards
CDISC standards (SDTM, ADaM) are designed around V5 limitations:
- Variable names: 8 characters maximum
- Labels: 40 characters maximum
Using V8/V9 format with CDISC data defeats the purpose of standardization.
Best Practice
If your data requires longer names or labels:
- Use V5-compliant short names in the XPT file
- Document full names in define.xml metadata
- Use controlled terminology for consistency
Official Specification
SAS Technical Note: Record Layout of a SAS Version 8 or 9 Data Set in SAS Transport Format
Download PDF | View on SAS Support
xportrs Support
xportrs currently focuses on V5 format for regulatory compliance. V8/V9 support is not a priority as it cannot be used for regulatory submissions.
If you need V8/V9 support for non-regulatory purposes, please open an issue to discuss your use case.
Validation Rules
xportrs provides built-in validation to catch compliance issues before file writing. This page documents the validation rules and their severity levels.
Validation Overview
graph LR
subgraph "Validation Pipeline"
A[Dataset] --> B[Agency Rules]
B --> C[V5 Format Rules]
C --> D[Issues Collection]
D --> E{Has Errors?}
E -->|Yes| F[Block Write]
E -->|No| G[Allow Write]
end
Severity Levels
| Severity | Meaning | Blocks Write? |
|---|---|---|
| Error | File would be rejected | Yes |
| Warning | Review recommended | No |
| Info | Best practice suggestion | No |
Built-in Validation Rules
Variable Name Rules
| Rule | Severity | Message |
|---|---|---|
| Name empty | Error | “Variable name cannot be empty” |
| Name >8 bytes | Error | “Variable name exceeds 8 bytes” |
| Invalid characters | Error | “Variable name contains invalid characters” |
| Starts with number | Error | “Variable name must start with a letter” |
Variable Label Rules
| Rule | Severity | Message |
|---|---|---|
| Label missing | Warning | “Variable ‘X’ is missing a label” |
| Label >40 bytes | Error | “Variable label exceeds 40 bytes” |
| Non-ASCII (FDA) | Error | “Variable label contains non-ASCII characters” |
Dataset Rules
| Rule | Severity | Message |
|---|---|---|
| Name empty | Error | “Dataset name cannot be empty” |
| Name >8 bytes | Error | “Dataset name exceeds 8 bytes” |
| Label missing | Warning | “Dataset is missing a label” |
| Label >40 bytes | Error | “Dataset label exceeds 40 bytes” |
Data Rules
| Rule | Severity | Message |
|---|---|---|
| Column length mismatch | Error | “Columns have different lengths” |
| Character >200 bytes | Error | “Character value exceeds 200 bytes” |
Using Validation
Basic Validation
#![allow(unused)]
fn main() {
use xportrs::Xpt;
let validated = Xpt::writer(dataset).finalize()?;
// Check for any issues
if validated.has_errors() {
eprintln!("Cannot write file due to errors:");
for issue in validated.issues() {
if issue.severity() == xportrs::Severity::Error {
eprintln!(" ERROR: {}", issue);
}
}
return Err("Validation failed".into());
}
// Proceed with write
validated.write_path("output.xpt")?;
}
Agency-Specific Validation
#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};
// FDA validation (strict ASCII)
let fda_validated = Xpt::writer(dataset.clone())
.agency(Agency::FDA)
.finalize()?;
// PMDA validation (allows extended characters)
let pmda_validated = Xpt::writer(dataset)
.agency(Agency::PMDA)
.finalize()?;
}
Filtering Issues
#![allow(unused)]
fn main() {
use xportrs::{Severity, Xpt};
let validated = Xpt::writer(dataset).finalize()?;
// Get only errors
let errors: Vec<_> = validated.issues()
.iter()
.filter(|i| i.severity() == Severity::Error)
.collect();
// Get only warnings
let warnings: Vec<_> = validated.issues()
.iter()
.filter(|i| i.severity() == Severity::Warning)
.collect();
}
Checking Specific Variables
#![allow(unused)]
fn main() {
use xportrs::Xpt;
let validated = Xpt::writer(dataset).finalize()?;
for issue in validated.issues() {
// Check what the issue targets
match issue.target() {
"USUBJID" => println!("Issue with USUBJID: {}", issue),
"AESEQ" => println!("Issue with AESEQ: {}", issue),
_ => {}
}
}
}
Pinnacle 21 Rules
xportrs validation covers XPT-level rules. For full CDISC compliance, use Pinnacle 21:
Rules Covered by xportrs
| Pinnacle 21 Rule | Description | xportrs |
|---|---|---|
| SD1001 | Variable name >8 characters | ✅ Error |
| SD1002 | Variable label >40 characters | ✅ Error |
| SD0063 | Missing/mismatched variable label | ✅ Warning |
| SD0063A | Missing/mismatched dataset label | ✅ Warning |
Rules Requiring External Validation
| Pinnacle 21 Rule | Description | Why External |
|---|---|---|
| SD0001 | Missing required variable | Domain-specific |
| SD0002 | Null value in required field | Data content |
| SD0060 | Variable not in define.xml | Requires define.xml |
| CT2002 | Invalid controlled terminology | Requires CDISC CT |
| SE0063 | Label doesn’t match SDTM standard | Requires SDTM metadata |
Custom Validation
You can add custom validation before writing:
use xportrs::{Dataset, Xpt};
fn validate_custom(dataset: &Dataset) -> Vec<String> {
let mut issues = vec![];
// Check for required variables
let required = ["STUDYID", "USUBJID"];
for var in required {
if dataset.column(var).is_none() {
issues.push(format!("Missing required variable: {}", var));
}
}
// Check STUDYID consistency
if let Some(col) = dataset.column("STUDYID") {
if let xportrs::ColumnData::String(values) = col.data() {
let first = values.first().and_then(|v| v.as_ref());
for (i, value) in values.iter().enumerate() {
if value.as_ref() != first {
issues.push(format!("STUDYID inconsistent at row {}", i));
}
}
}
}
issues
}
fn main() -> xportrs::Result<()> {
let dataset = /* ... */;
// Custom validation
let custom_issues = validate_custom(&dataset);
if !custom_issues.is_empty() {
for issue in custom_issues {
eprintln!("Custom validation: {}", issue);
}
return Err(xportrs::Error::invalid_data("Custom validation failed"));
}
// xportrs validation
let validated = Xpt::writer(dataset).finalize()?;
validated.write_path("output.xpt")?;
Ok(())
}
Validation Best Practices
[!TIP] Run validation early in your pipeline to catch issues before processing large datasets.
- Validate incrementally: Check validation after each transformation step
- Log all issues: Even warnings may indicate data quality problems
- Use agency-specific validation: Different agencies have different requirements
- Combine with Pinnacle 21: xportrs + Pinnacle 21 provides comprehensive coverage
- Document exceptions: If you must ship with warnings, document why
XPT File Structure
This page provides a detailed overview of the XPT V5 file structure.
Overall Structure
An XPT file consists of a library (file) level and one or more member (dataset) levels:
graph TB
subgraph "XPT V5 File"
LH["Library Header<br/>80 bytes"]
subgraph "Member 1 (Dataset)"
MH1["Member Header<br/>80 bytes"]
DC1["DSCRPTR Header<br/>80 bytes"]
DD1["Dataset Descriptor<br/>160 bytes"]
NSH1["NAMESTR Header<br/>80 bytes"]
NS1["NAMESTR Records<br/>140 bytes × n"]
OH1["OBS Header<br/>80 bytes"]
OBS1["Observation Data"]
end
subgraph "Member 2 (Optional)"
MH2["Member Header"]
MORE2["..."]
end
LH --> MH1
MH1 --> DC1
DC1 --> DD1
DD1 --> NSH1
NSH1 --> NS1
NS1 --> OH1
OH1 --> OBS1
OBS1 --> MH2
MH2 --> MORE2
end
Header Records
All headers are exactly 80 bytes with a distinctive pattern:
HEADER RECORD*******<type> HEADER RECORD!!!!!!!<numbers>
Library Header
HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
This header identifies the file as an XPT transport file.
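A quick std-only sniff test for this signature (prefix check only; a real reader validates the full 80-byte record; `looks_like_xpt` and `sample_header` are illustrative helpers):

```rust
// Detect whether a byte buffer starts with the XPT library header record.
fn looks_like_xpt(buf: &[u8]) -> bool {
    const PREFIX: &[u8] = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!";
    buf.len() >= 80 && buf.starts_with(PREFIX)
}

// Build a minimal 80-byte header for demonstration (prefix only).
fn sample_header() -> [u8; 80] {
    let mut h = [b'0'; 80];
    let prefix = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!";
    h[..prefix.len()].copy_from_slice(prefix);
    h
}

fn main() {
    println!("{}", looks_like_xpt(&sample_header())); // true
}
```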
Member Header
HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140
The numbers indicate:
- `0160`: 160 bytes for the dataset descriptor (two 80-byte records)
- `0140`: 140 bytes per NAMESTR record
DSCRPTR Header
HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
Introduces the dataset descriptor records.
NAMESTR Header
HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000000000000000000000000
The variable count is embedded in positions 54-57.
OBS Header
HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000
Introduces the observation data section.
Dataset Descriptor
The dataset descriptor spans two 80-byte records (160 bytes total):
First Record (80 bytes)
| Offset | Size | Field | Example |
|---|---|---|---|
| 0-7 | 8 | sas1 | SAS |
| 8-15 | 8 | sas2 | SAS |
| 16-23 | 8 | saslib | SASLIB |
| 24-31 | 8 | version | 9.4 |
| 32-39 | 8 | os | X64_10HO |
| 40-47 | 8 | blanks | |
| 48-63 | 16 | created | 01JAN24:00:00:00 |
| 64-79 | 16 | modified | 01JAN24:00:00:00 |
Second Record (80 bytes)
| Offset | Size | Field | Example |
|---|---|---|---|
| 0-7 | 8 | dsname | AE |
| 8-15 | 8 | sasdata | SASDATA |
| 16-23 | 8 | version | 9.4 |
| 24-31 | 8 | os | X64_10HO |
| 32-39 | 8 | blanks | |
| 40-79 | 40 | label | Adverse Events |
NAMESTR Section
After the NAMESTR header, each variable is described by a 140-byte NAMESTR record:
graph LR
subgraph "NAMESTR Layout (140 bytes)"
A["Type Info<br/>0-7"] --> B["Name<br/>8-15"]
B --> C["Label<br/>16-55"]
C --> D["Format<br/>56-69"]
D --> E["Informat<br/>72-83"]
E --> F["Position<br/>84-87"]
F --> G["Reserved<br/>88-139"]
end
See NAMESTR Records for the complete byte-by-byte layout.
NAMESTR Padding
NAMESTR records are packed into 80-byte physical records. Since 140 bytes doesn’t divide evenly into 80:
- 5 NAMESTRs = 700 bytes = 8.75 records → pad to 720 bytes (9 records)
- Formula:
ceil(n_vars * 140 / 80) * 80
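The formula translates directly to code; this helper is an illustrative sketch, not part of the xportrs API:

```rust
// Total bytes of the NAMESTR section, rounded up to a full 80-byte record
fn namestr_section_len(n_vars: usize) -> usize {
    let raw = n_vars * 140;
    (raw + 79) / 80 * 80 // ceil(raw / 80) * 80
}
```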
Observation Data
After the OBS header, data is stored in row-major order:
[Row 1]──[Var 1][Var 2][Var 3]...[Var N]
[Row 2]──[Var 1][Var 2][Var 3]...[Var N]
...
[Row M]──[Var 1][Var 2][Var 3]...[Var N]
[Padding to 80-byte boundary]
Row Length Calculation
#![allow(unused)]
fn main() {
fn row_length(variables: &[Variable]) -> usize {
variables.iter().map(|v| {
if v.is_numeric() {
8 // Always 8 bytes for numerics
} else {
v.length // 1-200 bytes for characters
}
}).sum()
}
}
End-of-File Padding
The file ends with space padding (0x20) to reach an 80-byte boundary.
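A minimal sketch of this padding rule (hypothetical helper, not xportrs API):

```rust
// Space padding (0x20) needed to reach the next 80-byte record boundary
fn eof_padding(total_len: usize) -> Vec<u8> {
    vec![b' '; (80 - total_len % 80) % 80]
}
```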
Byte Order
All multi-byte integers are big-endian:
#![allow(unused)]
fn main() {
// Reading a 16-bit integer from XPT
let value = i16::from_be_bytes([bytes[0], bytes[1]]);
// Writing a 16-bit integer to XPT
let bytes = value.to_be_bytes();
}
Character Encoding
[!IMPORTANT] For FDA submissions, use ASCII only. xportrs validates this when `Agency::FDA` is specified.
Example File (Hex Dump)
00000000: 4845 4144 4552 2052 4543 4f52 442a 2a2a HEADER RECORD***
00000010: 2a2a 2a2a 4c49 4252 4152 5920 4845 4144 ****LIBRARY HEAD
00000020: 4552 2052 4543 4f52 4421 2121 2121 2121 ER RECORD!!!!!!!
00000030: 3030 3030 3030 3030 3030 3030 3030 3030 0000000000000000
00000040: 3030 3030 3030 3030 3030 3030 3030 2020 00000000000000
Multi-Member Files
XPT files can contain multiple datasets (members). Each member has its own:
- Member header
- Dataset descriptor
- NAMESTR section
- Observation data
#![allow(unused)]
fn main() {
use xportrs::Xpt;
// Reading all members
let datasets = Xpt::read_all("multi.xpt")?;
for ds in datasets {
println!("Dataset: {}", ds.domain_code());
}
}
[!NOTE] For FDA submissions, it’s common practice to use one dataset per file, but the format supports multiple.
NAMESTR Records
The NAMESTR (Name String) record describes each variable in the dataset. Each record is exactly 140 bytes.
NAMESTR Layout
%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '11px'}}}%%
graph LR
subgraph "NAMESTR Record (140 bytes)"
A["0-1<br/>ntype"] --> B["2-3<br/>nhfun"]
B --> C["4-5<br/>nlng"]
C --> D["6-7<br/>nvar0"]
D --> E["8-15<br/>nname"]
E --> F["16-55<br/>nlabel"]
F --> G["56-63<br/>nform"]
G --> H["64-65<br/>nfl"]
H --> I["66-67<br/>nfd"]
I --> J["68-69<br/>nfj"]
J --> K["70-71<br/>nfill"]
K --> L["72-79<br/>niform"]
L --> M["80-81<br/>nifl"]
M --> N["82-83<br/>nifd"]
N --> O["84-87<br/>npos"]
O --> P["88-139<br/>rest"]
end
Complete Field Reference
| Offset | Size | Field | Type | Description |
|---|---|---|---|---|
| 0-1 | 2 | ntype | i16 | Variable type: 1=numeric, 2=character |
| 2-3 | 2 | nhfun | i16 | Hash function (always 0) |
| 4-5 | 2 | nlng | i16 | Variable length in bytes |
| 6-7 | 2 | nvar0 | i16 | Variable number (1-based) |
| 8-15 | 8 | nname | char[8] | Variable name (space-padded) |
| 16-55 | 40 | nlabel | char[40] | Variable label (space-padded) |
| 56-63 | 8 | nform | char[8] | Display format name |
| 64-65 | 2 | nfl | i16 | Format length |
| 66-67 | 2 | nfd | i16 | Format decimal places |
| 68-69 | 2 | nfj | i16 | Format justification (0=left, 1=right) |
| 70-71 | 2 | nfill | i16 | Unused padding |
| 72-79 | 8 | niform | char[8] | Input format name |
| 80-81 | 2 | nifl | i16 | Informat length |
| 82-83 | 2 | nifd | i16 | Informat decimal places |
| 84-87 | 4 | npos | i32 | Position in observation |
| 88-139 | 52 | rest | char[52] | Reserved (zeros/spaces) |
Field Details
ntype (Variable Type)
| Value | Meaning | Storage |
|---|---|---|
| 1 | Numeric | 8 bytes, IBM float |
| 2 | Character | 1-200 bytes, space-padded |
nlng (Variable Length)
| Type | Valid Range | Notes |
|---|---|---|
| Numeric | Always 8 | IBM float requires 8 bytes |
| Character | 1-200 | FDA maximum is 200 bytes |
nname (Variable Name)
- 8 bytes, right-padded with spaces
- Uppercase letters A-Z, digits 0-9, underscore
- Must start with a letter
- Example: `USUBJID ` (note the trailing space)
nlabel (Variable Label)
- 40 bytes, right-padded with spaces
- Should be descriptive for data reviewers
- Example: Unique Subject Identifier
Format Fields (nform, nfl, nfd, nfj)
The display format is stored across four fields:
#![allow(unused)]
fn main() {
// Example: DATE9. format
let nform = "DATE    "; // Format name (8 bytes, space-padded)
let nfl = 9;            // Total width
let nfd = 0;            // Decimal places
let nfj = 0;            // Justification (0=left, 1=right)
// Example: 8.2 format (numeric with 2 decimals)
let nform = "        "; // No named format
let nfl = 8;            // Total width
let nfd = 2;            // Decimal places
let nfj = 1;            // Right-justified (typical for numbers)
// Example: $CHAR200. format
let nform = "$CHAR   "; // Format name with $ prefix
let nfl = 200;          // Total width
let nfd = 0;            // Not applicable for character
let nfj = 0;            // Left-justified (typical for text)
}
Informat Fields (niform, nifl, nifd)
Input format mirrors the display format structure but without justification:
| Field | Size | Description |
|---|---|---|
| niform | 8 | Input format name |
| nifl | 2 | Input format length |
| nifd | 2 | Input format decimals |
npos (Position in Observation)
The byte offset of this variable within each observation row:
Observation Row:
[STUDYID ][USUBJID ][AGE ][SEX]
^ ^ ^ ^
npos=0 npos=20 npos=60 npos=68
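The offsets above are simply a running sum of the preceding variable lengths. A sketch (hypothetical helper, not xportrs API):

```rust
// Assign npos values: each variable starts where the previous one ended
fn assign_positions(lengths: &[usize]) -> Vec<i32> {
    let mut pos = 0i32;
    lengths
        .iter()
        .map(|&len| {
            let p = pos;
            pos += len as i32;
            p
        })
        .collect()
}
```

For the row above (lengths 20, 40, 8, 1) this yields offsets 0, 20, 60, and 68.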
Parsing NAMESTR in Rust
#![allow(unused)]
fn main() {
use std::io::{Read, Cursor};
use byteorder::{BigEndian, ReadBytesExt};
struct Namestr {
ntype: i16,
nlng: i16,
nvar0: i16,
nname: String,
nlabel: String,
nform: String,
nfl: i16,
nfd: i16,
nfj: i16,
niform: String,
nifl: i16,
nifd: i16,
npos: i32,
}
fn parse_namestr(bytes: &[u8; 140]) -> Namestr {
    fn read_str<R: Read>(r: &mut R, len: usize) -> String {
        let mut buf = vec![0u8; len];
        r.read_exact(&mut buf).unwrap();
        String::from_utf8_lossy(&buf).trim_end().to_string()
    }
    let mut cursor = Cursor::new(&bytes[..]);
    let ntype = cursor.read_i16::<BigEndian>().unwrap();
    let _nhfun = cursor.read_i16::<BigEndian>().unwrap();
    let nlng = cursor.read_i16::<BigEndian>().unwrap();
    let nvar0 = cursor.read_i16::<BigEndian>().unwrap();
    let nname = read_str(&mut cursor, 8);
    let nlabel = read_str(&mut cursor, 40);
    let nform = read_str(&mut cursor, 8);
    let nfl = cursor.read_i16::<BigEndian>().unwrap();
    let nfd = cursor.read_i16::<BigEndian>().unwrap();
    let nfj = cursor.read_i16::<BigEndian>().unwrap();
    let _nfill = cursor.read_i16::<BigEndian>().unwrap();
    let niform = read_str(&mut cursor, 8);
    let nifl = cursor.read_i16::<BigEndian>().unwrap();
    let nifd = cursor.read_i16::<BigEndian>().unwrap();
    let npos = cursor.read_i32::<BigEndian>().unwrap();
    Namestr {
        ntype, nlng, nvar0, nname, nlabel,
        nform, nfl, nfd, nfj, niform, nifl, nifd, npos,
    }
}
}
Writing NAMESTR in Rust
#![allow(unused)]
fn main() {
use std::io::Write;
use byteorder::{BigEndian, WriteBytesExt};
fn write_namestr<W: Write>(w: &mut W, var: &Variable, pos: i32) -> std::io::Result<()> {
// ntype
w.write_i16::<BigEndian>(if var.is_numeric { 1 } else { 2 })?;
// nhfun (always 0)
w.write_i16::<BigEndian>(0)?;
// nlng
w.write_i16::<BigEndian>(var.length as i16)?;
// nvar0 (1-based variable number)
w.write_i16::<BigEndian>(var.index as i16 + 1)?;
// nname (8 bytes, space-padded)
let mut name = [b' '; 8];
name[..var.name.len().min(8)].copy_from_slice(var.name.as_bytes());
w.write_all(&name)?;
// nlabel (40 bytes, space-padded)
let mut label = [b' '; 40];
label[..var.label.len().min(40)].copy_from_slice(var.label.as_bytes());
w.write_all(&label)?;
// Format fields (nform/nfl/nfd/nfj/nfill): none set in this sketch
w.write_all(&[b' '; 8])?; // nform
w.write_i16::<BigEndian>(0)?; // nfl
w.write_i16::<BigEndian>(0)?; // nfd
w.write_i16::<BigEndian>(0)?; // nfj
w.write_i16::<BigEndian>(0)?; // nfill
// Informat fields (niform/nifl/nifd): none set in this sketch
w.write_all(&[b' '; 8])?; // niform
w.write_i16::<BigEndian>(0)?; // nifl
w.write_i16::<BigEndian>(0)?; // nifd
// npos
w.write_i32::<BigEndian>(pos)?;
// rest (52 bytes of zeros)
w.write_all(&[0u8; 52])?;
Ok(())
}
}
xportrs Format API
xportrs provides a high-level API for format handling:
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
// Create column with format metadata
let col = Column::new("AESTDT", ColumnData::F64(vec![Some(23390.0)]))
.with_label("Start Date")
.with_format(Format::parse("DATE9.").unwrap());
// The Format struct extracts:
// - name: "DATE"
// - length: 9
// - decimals: 0
// - justification: Right (default for formats)
}
Common Formats
| Format | nform | nfl | nfd | Description |
|---|---|---|---|---|
| DATE9. | DATE | 9 | 0 | Date (01JAN2024) |
| DATETIME20. | DATETIME | 20 | 0 | Date and time |
| 8.2 | | 8 | 2 | Numeric with 2 decimals |
| BEST12. | BEST | 12 | 0 | Best representation |
| $CHAR200. | $CHAR | 200 | 0 | Character (200 bytes) |
| $200. | $ | 200 | 0 | Character shorthand |
[!TIP] For FDA submissions, avoid custom formats. Use standard SAS formats like DATE9., DATETIME20., and simple numeric formats.
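For illustration, a format spec like those in the table can be split into its name, length, and decimals with a few lines of Rust. This is a sketch, not xportrs's actual `Format::parse`, and it assumes well-formed input:

```rust
// Split a SAS format spec such as "DATE9.", "8.2", or "$CHAR200."
// into (name, length, decimals)
fn split_format(spec: &str) -> (String, i16, i16) {
    // Name = everything before the first digit ("" for formats like "8.2")
    let name_end = spec.find(|c: char| c.is_ascii_digit()).unwrap_or(spec.len());
    let name = spec[..name_end].to_string();
    // Strip the trailing dot, then look for an embedded "width.decimals"
    let rest = spec[name_end..].trim_end_matches('.');
    match rest.split_once('.') {
        Some((w, d)) => (name, w.parse().unwrap_or(0), d.parse().unwrap_or(0)),
        None => (name, rest.parse().unwrap_or(0), 0),
    }
}
```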
IBM Floating Point
XPT files use IBM System/360 floating-point format, not IEEE 754. This page explains the format and conversion process.
Format Overview
IBM floating-point uses base-16 (hexadecimal) exponent instead of base-2:
graph LR
subgraph "IBM Float (8 bytes = 64 bits)"
A["Bit 0<br/>Sign"] --> B["Bits 1-7<br/>Exponent<br/>(excess-64)"]
B --> C["Bits 8-63<br/>Mantissa<br/>(56 bits)"]
end
| Field | Bits | Range | Description |
|---|---|---|---|
| Sign | 1 | 0-1 | 0=positive, 1=negative |
| Exponent | 7 | 0-127 | Power of 16, biased by 64 |
| Mantissa | 56 | — | Fractional part in hex |
Key Differences from IEEE 754
| Aspect | IEEE 754 (double) | IBM Float |
|---|---|---|
| Exponent base | 2 | 16 |
| Exponent bias | 1023 | 64 |
| Mantissa bits | 52 | 56 |
| Implied bit | Yes (1.xxx) | No |
| Precision | ~15-17 digits | ~14-16 digits |
| Special values | NaN, ±Inf | Missing values |
Value Calculation
The value of an IBM float is:
value = sign × (0.mantissa) × 16^(exponent - 64)
Where:
- sign = +1 if bit 0 is 0, -1 if bit 0 is 1
- mantissa = fractional value in hexadecimal (0.xxxxxx…)
- exponent = 7-bit integer from bits 1-7
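The formula can be checked directly in code against the worked examples that follow (a sanity check, not xportrs API):

```rust
// Evaluate 0.mantissa × 16^(exponent − 64) for a non-negative IBM float,
// given the first byte and the 56-bit mantissa
fn ibm_value(first_byte: u8, mantissa56: u64) -> f64 {
    let exponent = (first_byte & 0x7F) as i32 - 64;
    (mantissa56 as f64) / (1u64 << 56) as f64 * 16.0_f64.powi(exponent)
}
```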
Conversion Examples
Example 1: Encoding 1.0
1.0 in hex: 0.1 × 16^1
Exponent = 1 + 64 = 65 = 0x41
Mantissa = 0x10000000000000 (1 in the top nibble of 56 bits)
Bytes: 41 10 00 00 00 00 00 00
Example 2: Encoding 100.0
100.0 = 0x64 = 0.64 × 16^2
Exponent = 2 + 64 = 66 = 0x42
Mantissa = 0x64000000000000
Bytes: 42 64 00 00 00 00 00 00
Example 3: Encoding -3.14159
3.14159 ≈ 0.3243F6A8885A3 × 16^1
Sign = 1 (negative)
Exponent = 1 + 64 = 65 = 0x41
With sign: 0xC1
Bytes: C1 32 43 F6 A8 88 5A 30
Rust Implementation
Encoding (IEEE → IBM)
#![allow(unused)]
fn main() {
fn ieee_to_ibm(value: f64) -> [u8; 8] {
    if value == 0.0 {
        return [0u8; 8];
    }
    let sign = if value < 0.0 { 0x80u8 } else { 0x00u8 };
    // Get IEEE 754 components (subnormals and exponent overflow not handled here)
    let bits = value.abs().to_bits();
    let ieee_exp = ((bits >> 52) & 0x7FF) as i32 - 1023;
    let full_mant = (bits & 0xF_FFFF_FFFF_FFFF) | (1u64 << 52); // restore implied bit
    // value = full_mant * 2^(ieee_exp - 52); rewrite as 0.M * 16^e with a
    // normalized 56-bit mantissa M (nonzero top nibble)
    let t = ieee_exp + 1;
    let e = (t + 3) >> 2; // ceil(t / 4), also correct for negative t
    let shift = (4 * e - t) as u32; // 0..=3 bits of denormalization
    let ibm_mant = (full_mant << 3) >> shift; // 56-bit mantissa
    let mut result = [0u8; 8];
    result[0] = sign | ((e + 64) as u8 & 0x7F); // excess-64 exponent
    result[1..8].copy_from_slice(&ibm_mant.to_be_bytes()[1..8]);
    result
}
}
Decoding (IBM → IEEE)
#![allow(unused)]
fn main() {
fn ibm_to_ieee(bytes: [u8; 8]) -> f64 {
// Check for zero
if bytes == [0u8; 8] {
return 0.0;
}
// Check for missing value: '.', 'A'-'Z', or '_' in the first byte
// with all remaining bytes zero (0x41-0x5A alone is a valid exponent!)
if (bytes[0] == 0x2E || bytes[0] == 0x5F || (0x41..=0x5A).contains(&bytes[0]))
    && bytes[1..] == [0u8; 7]
{
    return f64::NAN; // Represent as NaN
}
let sign = if bytes[0] & 0x80 != 0 { -1.0 } else { 1.0 };
let exp = (bytes[0] & 0x7F) as i32 - 64;
// Extract 56-bit mantissa
let mut mant: u64 = 0;
for i in 1..8 {
mant = (mant << 8) | bytes[i] as u64;
}
// Convert to IEEE
let value = (mant as f64) / (1u64 << 56) as f64;
sign * value * 16.0_f64.powi(exp)
}
}
Missing Values
XPT uses special byte patterns for missing values:
| Missing Type | First Byte | Description |
|---|---|---|
| . | 0x2E | Standard missing |
| .A | 0x41 | Missing A |
| .B | 0x42 | Missing B |
| … | … | … |
| .Z | 0x5A | Missing Z |
| ._ | 0x5F | Missing underscore |
Detecting Missing Values
#![allow(unused)]
fn main() {
fn is_missing(bytes: [u8; 8]) -> Option<char> {
    // Missing values carry a sentinel first byte and zeros in all other bytes
    if bytes[1..] != [0u8; 7] {
        return None;
    }
    match bytes[0] {
        0x2E => Some('.'),                  // Standard missing
        b @ 0x41..=0x5A => Some(b as char), // .A-.Z (the byte is the ASCII letter)
        0x5F => Some('_'),                  // ._
        _ => None,
    }
}
}
Precision Considerations
Due to the base-16 exponent, IBM float has variable precision:
| Value Range | Approximate Precision |
|---|---|
| 0.0001 - 0.001 | ~14 digits |
| 0.001 - 1.0 | ~15 digits |
| 1.0 - 1000.0 | ~15-16 digits |
| Large values | ~14 digits |
[!WARNING] When converting from IEEE 754 to IBM float, some precision loss may occur. For critical values, consider storing as character strings.
xportrs Handling
xportrs handles IBM float conversion automatically:
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Dataset, Xpt};
// Numeric values are automatically converted to IBM float on write
let dataset = Dataset::new("LB", vec![
Column::new("LBSTRESN", ColumnData::F64(vec![
Some(3.14159265358979),
Some(100.0),
None, // Becomes SAS missing value
])),
])?;
Xpt::writer(dataset)
.finalize()?
.write_path("lb.xpt")?;
// On read, IBM floats are automatically converted back to f64
let loaded = Xpt::read("lb.xpt")?;
}
Testing Conversion
#![allow(unused)]
fn main() {
#[test]
fn test_roundtrip() {
let values = [1.0, -1.0, 100.0, 0.001, 3.14159, 1e10, 1e-10];
for &v in &values {
let ibm = ieee_to_ibm(v);
let back = ibm_to_ieee(ibm);
// Allow for small precision loss
let rel_error = ((v - back) / v).abs();
assert!(rel_error < 1e-14, "Value {} roundtrip error: {}", v, rel_error);
}
}
}
Timestamps and Dates
XPT files use the SAS date system for timestamps and dates. This page explains date handling in xportrs.
SAS Epoch
SAS uses January 1, 1960 as its epoch (day zero), different from Unix (1970):
graph LR
subgraph "Date Epochs"
SAS["SAS Epoch<br/>1960-01-01<br/>Day 0"]
UNIX["Unix Epoch<br/>1970-01-01<br/>Day 3653"]
TODAY["2024-01-15<br/>Day 23390"]
end
SAS --> |"3653 days"| UNIX
UNIX --> |"19737 days"| TODAY
Date Types
| Type | Storage | Unit | Example Format |
|---|---|---|---|
| Date | f64 | Days since 1960-01-01 | DATE9. |
| Time | f64 | Seconds since midnight | TIME8. |
| DateTime | f64 | Seconds since 1960-01-01 00:00:00 | DATETIME20. |
Conversion Formulas
Date Conversions
#![allow(unused)]
fn main() {
use chrono::{NaiveDate, Datelike};
// SAS epoch
const SAS_EPOCH: NaiveDate = NaiveDate::from_ymd_opt(1960, 1, 1).unwrap();
/// Convert NaiveDate to SAS date number
fn to_sas_date(date: NaiveDate) -> f64 {
(date - SAS_EPOCH).num_days() as f64
}
/// Convert SAS date number to NaiveDate
fn from_sas_date(sas_date: f64) -> NaiveDate {
SAS_EPOCH + chrono::Duration::days(sas_date as i64)
}
// Examples:
// 1960-01-01 → 0
// 1970-01-01 → 3653
// 2024-01-15 → 23390
}
DateTime Conversions
#![allow(unused)]
fn main() {
use chrono::{NaiveDateTime, NaiveDate, NaiveTime};
/// Convert NaiveDateTime to SAS datetime number
fn to_sas_datetime(dt: NaiveDateTime) -> f64 {
let epoch = NaiveDateTime::new(
NaiveDate::from_ymd_opt(1960, 1, 1).unwrap(),
NaiveTime::from_hms_opt(0, 0, 0).unwrap(),
);
(dt - epoch).num_seconds() as f64
}
/// Convert SAS datetime number to NaiveDateTime
fn from_sas_datetime(sas_dt: f64) -> NaiveDateTime {
let epoch = NaiveDateTime::new(
NaiveDate::from_ymd_opt(1960, 1, 1).unwrap(),
NaiveTime::from_hms_opt(0, 0, 0).unwrap(),
);
epoch + chrono::Duration::seconds(sas_dt as i64)
}
}
Time Conversions
#![allow(unused)]
fn main() {
use chrono::{NaiveTime, Timelike};
/// Convert NaiveTime to SAS time number
fn to_sas_time(time: NaiveTime) -> f64 {
time.num_seconds_from_midnight() as f64
}
/// Convert SAS time number to NaiveTime
fn from_sas_time(sas_time: f64) -> NaiveTime {
let seconds = sas_time as u32;
NaiveTime::from_num_seconds_from_midnight_opt(seconds, 0).unwrap()
}
}
Date Formats
Common Date Formats
| Format | Example Output | Description |
|---|---|---|
| DATE9. | 15JAN2024 | Standard SAS date |
| DATE7. | 15JAN24 | Short year |
| MMDDYY10. | 01/15/2024 | US format |
| DDMMYY10. | 15/01/2024 | European format |
| YYMMDD10. | 2024-01-15 | ISO format |
| E8601DA. | 2024-01-15 | ISO 8601 |
DateTime Formats
| Format | Example Output |
|---|---|
| DATETIME20. | 15JAN2024:14:30:00 |
| E8601DT. | 2024-01-15T14:30:00 |
Time Formats
| Format | Example Output |
|---|---|
| TIME8. | 14:30:00 |
| TIME5. | 14:30 |
| HHMM. | 14:30 |
Using Dates in xportrs
Storing as Numeric with Format
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
// Calculate SAS date for 2024-01-15
let sas_date = 23390.0; // Days since 1960-01-01
Column::new("AESTDT", ColumnData::F64(vec![Some(sas_date)]))
.with_label("Start Date")
.with_format_str("DATE9.")?
}
Storing as ISO 8601 String (Recommended)
For SDTM submissions, dates are typically stored as ISO 8601 character strings:
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
// ISO 8601 date string
Column::new("AESTDTC", ColumnData::String(vec![Some("2024-01-15".into())]))
.with_label("Start Date/Time of Adverse Event")
.with_format(Format::character(19))
.with_length(19)
}
[!TIP] SDTM uses `--DTC` variables (character) for dates/times, while ADaM often uses `--DT`/`--TM` (numeric) variables with date formats.
Partial Dates
SDTM allows partial dates in character variables:
| Precision | Example | Description |
|---|---|---|
| Complete | 2024-01-15 | Full date |
| Month | 2024-01 | Unknown day |
| Year | 2024 | Unknown month/day |
#![allow(unused)]
fn main() {
// Partial date examples
let dates = vec![
Some("2024-01-15".to_string()), // Complete
Some("2024-01".to_string()), // Month only
Some("2024".to_string()), // Year only
None, // Missing
];
Column::new("AESTDTC", ColumnData::String(dates))
.with_label("Start Date/Time")
.with_format(Format::character(19))
}
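One simple way to classify the precision of a partial date is by string length. This is a hypothetical helper for illustration, not part of xportrs:

```rust
// Classify an ISO 8601 --DTC value's date precision by its length
fn date_precision(dtc: &str) -> &'static str {
    match dtc.len() {
        4 => "year",   // "2024"
        7 => "month",  // "2024-01"
        10 => "day",   // "2024-01-15"
        _ => "other",  // time components or malformed values
    }
}
```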
File Timestamps
XPT files contain creation and modification timestamps in the dataset descriptor:
Position 48-63: Creation timestamp (ddMMMyy:hh:mm:ss)
Position 64-79: Modified timestamp (ddMMMyy:hh:mm:ss)
Example: "01JAN24:14:30:00"
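Producing that ddMMMyy:hh:mm:ss field can be sketched without a date library; the function name and its arguments here are hypothetical:

```rust
// Render a 16-byte descriptor timestamp like "01JAN24:14:30:00"
fn descriptor_timestamp(day: u32, month: u32, year: u32, h: u32, m: u32, s: u32) -> String {
    const MONTHS: [&str; 12] = [
        "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
        "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
    ];
    format!(
        "{:02}{}{:02}:{:02}:{:02}:{:02}",
        day, MONTHS[(month - 1) as usize], year % 100, h, m, s
    )
}
```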
Reading File Timestamps
#![allow(unused)]
fn main() {
use xportrs::Xpt;
let info = Xpt::inspect("ae.xpt")?;
if let Some(created) = &info.created {
println!("Created: {}", created);
}
if let Some(modified) = &info.modified {
println!("Modified: {}", modified);
}
}
Time Zone Considerations
[!WARNING] XPT files do not store time zone information. All times are assumed to be in the local time zone where the data was collected.
For SDTM submissions:
- Store times in ISO 8601 format with explicit time zone when known
- Document time zone assumptions in the Reviewer’s Guide
Best Practices
- Use ISO 8601 for SDTM: store dates as character strings (AESTDTC) rather than numeric
- Use numeric for ADaM: ADaM analysis dates (ASTDT) are typically numeric with date formats
- Document partial dates: use imputation flags (AESTDTF) to indicate partial date handling
- Consider precision: numeric dates have ~15 digits of precision; sub-second precision may be lost
Text Encoding
XPT files store text as fixed-width byte strings. This page covers character encoding considerations.
Encoding Overview
graph LR
subgraph "Text Encoding Flow"
A[Rust String<br/>UTF-8] --> B{Agency?}
B -->|FDA| C[ASCII Only]
B -->|PMDA| D[Shift-JIS/Latin-1]
B -->|Other| E[Latin-1]
C --> F[XPT File]
D --> F
E --> F
end
Supported Encodings
| Encoding | xportrs Support | Use Case |
|---|---|---|
| ASCII | Full | FDA submissions |
| Latin-1 (ISO-8859-1) | Full | Extended European |
| UTF-8 | Input only | Converted to target |
FDA ASCII Requirements
For FDA submissions, all text must be ASCII (bytes 0x00-0x7F):
#![allow(unused)]
fn main() {
use xportrs::{Agency, Xpt};
// ASCII validation is automatic with FDA agency
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
// Non-ASCII characters will generate errors
for issue in validated.issues() {
println!("{}", issue);
}
}
Valid ASCII Characters
| Category | Characters |
|---|---|
| Letters | A-Z, a-z |
| Digits | 0-9 |
| Punctuation | ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ \` { \| } ~ |
| Space | (0x20) |
Common Non-ASCII Issues
| Character | Unicode | Issue |
|---|---|---|
| é (e-acute) | U+00E9 | Not ASCII |
| ° (degree) | U+00B0 | Not ASCII |
| µ (micro) | U+00B5 | Not ASCII |
| ® (registered) | U+00AE | Not ASCII |
| — (em dash) | U+2014 | Not ASCII |
| “ ” (smart quotes) | U+201C/D | Not ASCII |
Handling Non-ASCII in FDA Submissions
#![allow(unused)]
fn main() {
/// Replace common non-ASCII characters with ASCII equivalents
fn ascii_safe(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            'é' | 'è' | 'ê' | 'ë' => out.push('e'),
            'á' | 'à' | 'â' | 'ä' => out.push('a'),
            'ó' | 'ò' | 'ô' | 'ö' => out.push('o'),
            'ú' | 'ù' | 'û' | 'ü' => out.push('u'),
            'í' | 'ì' | 'î' | 'ï' => out.push('i'),
            'ñ' => out.push('n'),
            'ç' => out.push('c'),
            '°' => out.push(' '),   // or "deg"
            'µ' => out.push('u'),   // or "micro"
            '®' => out.push_str("(R)"),
            '™' => out.push_str("(TM)"),
            '\u{201C}' | '\u{201D}' => out.push('"'),  // smart double quotes
            '\u{2018}' | '\u{2019}' => out.push('\''), // smart single quotes
            '—' | '–' => out.push('-'),
            c if c.is_ascii() => out.push(c),
            _ => out.push('?'),     // Unknown non-ASCII
        }
    }
    out
}
}
Latin-1 Encoding
For non-FDA submissions, Latin-1 (ISO-8859-1) provides extended character support:
#![allow(unused)]
fn main() {
use xportrs::{TextMode, Xpt};
let validated = Xpt::writer(dataset)
.text_mode(TextMode::Latin1)
.finalize()?;
}
Latin-1 Character Range
| Range | Description |
|---|---|
| 0x00-0x7F | ASCII (same as UTF-8) |
| 0x80-0x9F | Control characters (avoid) |
| 0xA0-0xFF | Extended Latin (accents, symbols) |
Character Variable Length
XPT character variables have a fixed length (1-200 bytes):
graph LR
subgraph "Character Field (20 bytes)"
A["H"] --> B["e"] --> C["l"] --> D["l"] --> E["o"]
E --> F[" "] --> G[" "] --> H["..."] --> I[" "]
end
- Values shorter than the field length are right-padded with spaces
- Values longer than the field length are truncated
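Both rules together, as a small sketch (hypothetical helper, not xportrs API):

```rust
// Fit a value into a fixed-width field: truncate if too long, space-pad if short
fn fit_field(s: &str, len: usize) -> Vec<u8> {
    let mut bytes: Vec<u8> = s.bytes().take(len).collect();
    bytes.resize(len, b' ');
    bytes
}
```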
Explicit Length Control
#![allow(unused)]
fn main() {
use xportrs::{Column, ColumnData, Format};
// Set explicit length to 200 bytes for long text
Column::new("AETERM", ColumnData::String(vec![Some("Headache".into())]))
.with_label("Reported Term")
.with_format(Format::character(200))
.with_length(200)
}
Auto-Derived Length
When no explicit length is set, xportrs derives the length from the data:
#![allow(unused)]
fn main() {
// Length will be max(len("Hello"), len("World")) = 5
let data = vec![Some("Hello".into()), Some("World".into())];
Column::new("VAR", ColumnData::String(data))
}
UTF-8 to Encoding Conversion
xportrs accepts UTF-8 strings and converts to the target encoding:
#![allow(unused)]
fn main() {
// UTF-8 input (Rust default)
let utf8_string = "Héllo Wörld"; // Contains non-ASCII
// With ASCII mode (FDA)
// Error: contains non-ASCII characters
// With Latin-1 mode
// Converted: "Héllo Wörld" → Latin-1 bytes
}
Conversion Errors
Non-representable characters cause errors:
#![allow(unused)]
fn main() {
// Japanese text cannot be represented in Latin-1
let japanese = "日本語";
// This will fail with Latin-1 encoding
// Use ASCII transliteration or Shift-JIS for PMDA
}
Space Padding
XPT uses space (0x20) for padding, not null (0x00):
#![allow(unused)]
fn main() {
fn pad_to_length(s: &str, len: usize) -> Vec<u8> {
let mut bytes = s.as_bytes().to_vec();
bytes.resize(len, b' '); // Space padding
bytes
}
// "Hi" with length 8 → [72, 105, 32, 32, 32, 32, 32, 32]
// 'H' 'i' ' ' ' ' ' ' ' ' ' ' ' '
}
Reading Encoded Text
When reading XPT files, xportrs trims trailing spaces and converts to UTF-8:
#![allow(unused)]
fn main() {
use xportrs::{ColumnData, Xpt};
let dataset = Xpt::read("data.xpt")?;
for col in dataset.columns() {
if let ColumnData::String(values) = col.data() {
for value in values {
if let Some(s) = value {
// s is a Rust String (UTF-8)
println!("{}", s);
}
}
}
}
}
Best Practices
- Use ASCII for FDA submissions: Avoid accented characters and symbols
- Validate early: Check for encoding issues before building datasets
- Document character sets: Note any extended character usage in metadata
- Prefer explicit lengths: Set character lengths explicitly for predictable behavior
- Test roundtrip: Verify that read → write → read preserves text correctly
[!IMPORTANT] The FDA Technical Conformance Guide requires ASCII text. Non-ASCII characters may cause validation failures or data integrity issues during regulatory review.
Architecture Overview
This page provides a high-level view of xportrs internal architecture.
Module Structure
graph TB
subgraph "Public API"
XPT[Xpt] --> READER[XptReaderBuilder]
XPT --> WRITER[XptWriterBuilder]
DATASET[Dataset] --> COLUMN[Column]
COLUMN --> COLDATA[ColumnData]
COLUMN --> FORMAT[Format]
end
subgraph "Core Modules"
SCHEMA[schema] --> DERIVE[derive.rs]
SCHEMA --> PLAN[plan.rs]
VALIDATE[validate] --> CHECKS[checks_v5.rs]
VALIDATE --> ISSUES[issues.rs]
end
subgraph "XPT V5 Implementation"
V5[xpt/v5] --> READ[read/]
V5 --> WRITE[write/]
READ --> PARSER[parse.rs]
READ --> OBS[obs.rs]
WRITE --> NAMESTR[namestr.rs]
WRITE --> SPLIT[split.rs]
end
subgraph "Low-Level"
IBM[ibm_float.rs]
RECORD[record.rs]
TIMESTAMP[timestamp.rs]
end
Key Components
Public API Layer
| Component | Purpose |
|---|---|
| Xpt | Entry point for reading/writing |
| Dataset | Collection of columns with metadata |
| Column | Variable data and metadata |
| ColumnData | Typed data storage |
| Format | SAS format parsing and representation |
Schema Layer
| Component | Purpose |
|---|---|
| DatasetSchema | Computed schema for writing |
| VariableSpec | Per-variable write plan |
| derive_schema_plan() | Computes schema from Dataset |
Validation Layer
| Component | Purpose |
|---|---|
| ValidatedWrite | Validated dataset ready to write |
| Issue | Validation problem description |
| Severity | Error/Warning/Info classification |
XPT V5 Layer
| Component | Purpose |
|---|---|
| XptReader | Reads XPT files |
| XptWriter | Writes XPT files |
| SplitWriter | Handles file splitting |
| pack_namestr() | Creates NAMESTR records |
Low-Level Layer
| Component | Purpose |
|---|---|
| ibm_float | IBM float encoding/decoding |
| record | 80-byte record handling |
| timestamp | SAS epoch date handling |
Design Principles
1. Type Safety
Rust’s type system prevents common errors:
#![allow(unused)]
fn main() {
// DomainCode, Label, VariableName are distinct types
let domain = DomainCode::new("AE");
let label = Label::new("Adverse Events");
// Can't accidentally swap them
// ColumnData enforces type consistency
let data = ColumnData::F64(vec![Some(1.0)]);
// Can't mix types within a column
}
2. Builder Pattern
Complex objects use builders for ergonomic construction:
#![allow(unused)]
fn main() {
// Reader builder
let dataset = Xpt::reader("file.xpt")
.row_limit(100)
.read()?;
// Writer builder
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
}
3. Validation Pipeline
Validation happens before writing:
graph LR
A[Dataset] --> B[XptWriterBuilder]
B --> C[finalize]
C --> D[validate_v5_schema]
D --> E[ValidatedWrite]
E --> F{has_errors?}
F --> |No| G[write_path]
F --> |Yes| H[Return Issues]
4. Metadata Preservation
Metadata flows through all operations:
graph LR
subgraph "Read Path"
XPT1[XPT File] --> NS1[NAMESTR]
NS1 --> COL1[Column]
end
subgraph "Storage"
COL1 --> DS[Dataset]
end
subgraph "Write Path"
DS --> VS[VariableSpec]
VS --> NS2[NAMESTR]
NS2 --> XPT2[XPT File]
end
5. Zero-Copy Where Possible
String data uses references where safe:
#![allow(unused)]
fn main() {
// Reading: borrows from buffer where possible
// Writing: uses slices directly when aligned
}
Error Handling
xportrs uses a unified Error type:
#![allow(unused)]
fn main() {
pub enum Error {
Io(std::io::Error),
InvalidHeader { message: String },
InvalidData { message: String },
InvalidSchema { message: String },
MemberNotFound { domain_code: String },
// ...
}
}
Errors implement std::error::Error and are Send + Sync + 'static.
Thread Safety
All public types are Send + Sync:
#![allow(unused)]
fn main() {
// Can be shared across threads
let dataset = Arc::new(Xpt::read("data.xpt")?);
// Can be sent to other threads
std::thread::spawn(move || {
for col in dataset.columns() {
println!("{}", col.name());
}
});
}
Memory Layout
Dataset
Dataset {
domain_code: DomainCode(String),
dataset_label: Option<Label>,
columns: Vec<Column>,
}
Column
Column {
name: VariableName(String),
role: Option<VariableRole>,
data: ColumnData,
label: Option<Label>,
format: Option<Format>,
informat: Option<Format>,
length: Option<usize>,
}
ColumnData
enum ColumnData {
F64(Vec<Option<f64>>),
I64(Vec<Option<i64>>),
Bool(Vec<Option<bool>>),
String(Vec<Option<String>>),
Bytes(Vec<Option<Vec<u8>>>),
Date(Vec<Option<NaiveDate>>),
DateTime(Vec<Option<NaiveDateTime>>),
Time(Vec<Option<NaiveTime>>),
}
Extension Points
Adding New Validation Rules
- Add a variant to the `Issue` enum
- Implement `severity()` and `Display`
- Add a check in `validate_v5_schema()`
Supporting New Agencies
- Add a variant to the `Agency` enum
- Add agency-specific validation in `checks_v5.rs`
Adding Column Types
- Add a variant to `ColumnData`
- Handle it in the reader and writer
- Add a `From` implementation
Data Flow
This page details how data flows through xportrs during reading and writing.
Reading Flow
flowchart TB
subgraph "1. File Parsing"
A[XPT File] --> B[parse_header]
B --> C[XptMemberInfo]
C --> D[NamestrV5 records]
end
subgraph "2. Data Reading"
D --> E[ObservationReader]
E --> F[decode_ibm_float]
E --> G[decode_text]
F --> H[ObsValue::Numeric]
G --> I[ObsValue::Character]
end
subgraph "3. Type Conversion"
H --> J[ColumnData::F64]
I --> K[ColumnData::String]
end
subgraph "4. Assembly"
J --> L[Column]
K --> L
D --> |metadata| L
L --> M[Dataset]
end
Step-by-Step Reading
1. Parse File Header
#![allow(unused)]
fn main() {
// In parse.rs
pub fn parse_header<R: Read + Seek>(reader: &mut R) -> Result<XptInfo> {
// Read library header (80 bytes)
let lib_header = read_record(reader)?;
verify_library_header(&lib_header)?;
// Read each member
let mut members = Vec::new();
while let Some(member) = parse_member_header(reader)? {
members.push(member);
}
Ok(XptInfo { members, ... })
}
}
2. Parse NAMESTR Records
#![allow(unused)]
fn main() {
// In namestr.rs
pub fn unpack_namestr(bytes: &[u8; 140]) -> Result<NamestrV5> {
let ntype = i16::from_be_bytes([bytes[0], bytes[1]]);
let nlng = i16::from_be_bytes([bytes[4], bytes[5]]);
let nname = parse_string(&bytes[8..16]);
let nlabel = parse_string(&bytes[16..56]);
let nform = parse_string(&bytes[56..64]);
let nfl = i16::from_be_bytes([bytes[64], bytes[65]]);
// ... more fields
Ok(NamestrV5 { ntype, nlng, nname, nlabel, ... })
}
}
3. Read Observations
#![allow(unused)]
fn main() {
// In obs.rs
pub fn read_observation(&mut self) -> Result<Option<Vec<ObsValue>>> {
let mut row = Vec::with_capacity(self.variables.len());
for var in &self.variables {
if var.is_numeric() {
let bytes = self.read_bytes(8)?;
let value = decode_ibm_float(bytes);
row.push(ObsValue::Numeric(value));
} else {
let bytes = self.read_bytes(var.length)?;
let value = decode_text(bytes);
row.push(ObsValue::Character(value));
}
}
Ok(Some(row))
}
}
4. Build Column with Metadata
#![allow(unused)]
fn main() {
// In reader.rs
let cols: Vec<Column> = member.variables.iter()
.zip(columns)
.map(|(var, data)| {
let mut col = Column::new(&var.nname, data);
// Transfer metadata from NAMESTR
if !var.nlabel.is_empty() {
col = col.with_label(var.nlabel.as_str());
}
if !var.nform.is_empty() {
col = col.with_format(Format::from_namestr(
&var.nform, var.nfl, var.nfd, var.nfj
));
}
if var.is_character() {
col = col.with_length(var.length());
}
col
})
.collect();
}
Writing Flow
flowchart TB
subgraph "1. Schema Planning"
A[Dataset] --> B[derive_schema_plan]
B --> C[DatasetSchema]
C --> D[VariableSpec per column]
end
subgraph "2. Validation"
D --> E[validate_v5_schema]
E --> F[Issue collection]
F --> G{has_errors?}
G --> |Yes| H[Block write]
G --> |No| I[ValidatedWrite]
end
subgraph "3. Writing"
I --> J[XptWriter]
J --> K[write_headers]
J --> L[pack_namestr]
J --> M[write_observations]
end
subgraph "4. Encoding"
M --> N[encode_ibm_float]
M --> O[encode_text]
N --> P[XPT File]
O --> P
end
Step-by-Step Writing
1. Derive Schema
#![allow(unused)]
fn main() {
// In derive.rs
pub fn derive_schema_plan(
dataset: &Dataset,
metadata: Option<&VariableMetadata>,
) -> DatasetSchema {
let variables: Vec<VariableSpec> = dataset.columns()
.iter()
.enumerate()
.map(|(i, col)| {
let mut spec = VariableSpec {
name: col.name().to_uppercase(),
is_numeric: col.data().is_numeric(),
length: compute_length(col),
position: 0, // Computed later
...
};
// Apply Column metadata
if let Some(label) = col.label() {
spec.label = label.to_string();
}
if let Some(format) = col.format() {
spec.format = Some(format.clone());
}
spec
})
.collect();
DatasetSchema { variables, ... }
}
}
2. Validate
#![allow(unused)]
fn main() {
// In checks_v5.rs
pub fn validate_v5_schema(
schema: &DatasetSchema,
options: &WriteOptions,
) -> Vec<Issue> {
let mut issues = Vec::new();
// Dataset-level checks
if schema.label.is_empty() {
issues.push(Issue::MissingDatasetLabel {
dataset: schema.name.clone()
});
}
// Variable-level checks
for var in &schema.variables {
if var.name.len() > 8 {
issues.push(Issue::VariableNameTooLong { ... });
}
if var.label.is_empty() {
issues.push(Issue::MissingVariableLabel {
variable: var.name.clone()
});
}
// ... more checks
}
issues
}
}
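The name-length check above counts bytes, which is correct for the ASCII names V5 requires. One plausible shape for a fuller variable-name check is sketched below; this helper is illustrative only, and the crate's actual rules in checks_v5.rs may differ:

```rust
/// Illustrative (hypothetical) V5 variable-name check: non-empty, at most
/// 8 bytes, ASCII uppercase letters/digits/underscore, starting with a letter.
fn is_valid_v5_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 8
        && name.chars().next().map_or(false, |c| c.is_ascii_uppercase())
        && name
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

fn main() {
    assert!(is_valid_v5_name("USUBJID"));
    assert!(!is_valid_v5_name("VERYLONGNAME")); // > 8 bytes
    assert!(!is_valid_v5_name("1ABC")); // must start with a letter
    assert!(!is_valid_v5_name("aeseq")); // lowercase rejected
}
```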
3. Pack NAMESTR
#![allow(unused)]
fn main() {
// In namestr.rs
pub fn pack_namestr<W: Write>(
writer: &mut W,
var: &VariableSpec,
position: i32,
) -> Result<()> {
// ntype
writer.write_i16::<BigEndian>(
if var.is_numeric { 1 } else { 2 }
)?;
// nhfun (always 0)
writer.write_i16::<BigEndian>(0)?;
// nlng
writer.write_i16::<BigEndian>(var.length as i16)?;
// nvar0
writer.write_i16::<BigEndian>(var.index as i16 + 1)?;
// nname (8 bytes, space-padded; slice both sides so the lengths match,
// otherwise copy_from_slice panics for over-long names)
let mut name = [b' '; 8];
let n = var.name.len().min(8);
name[..n].copy_from_slice(&var.name.as_bytes()[..n]);
writer.write_all(&name)?;
// nlabel (40 bytes, space-padded)
let mut label = [b' '; 40];
let n = var.label.len().min(40);
label[..n].copy_from_slice(&var.label.as_bytes()[..n]);
writer.write_all(&label)?;
// Format fields
if let Some(ref format) = var.format {
write_format_fields(writer, format)?;
} else {
write_empty_format_fields(writer)?;
}
// ... remaining fields
Ok(())
}
}
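The space-padding pattern above (write at most N bytes, pad the rest with ASCII blanks) can be factored into one small helper. A sketch, not part of the crate's API; it assumes ASCII input, since byte truncation would split multi-byte UTF-8:

```rust
/// Space-pad `s` into a fixed-width ASCII field of N bytes, truncating
/// if it is too long. Sketch of the nname/nlabel packing pattern;
/// assumes ASCII input (XPT V5 text fields are ASCII anyway).
fn pad_field<const N: usize>(s: &str) -> [u8; N] {
    let mut buf = [b' '; N];
    let n = s.len().min(N);
    buf[..n].copy_from_slice(&s.as_bytes()[..n]);
    buf
}

fn main() {
    assert_eq!(&pad_field::<8>("AESEQ"), b"AESEQ   ");
    assert_eq!(&pad_field::<4>("STUDYID"), b"STUD"); // truncated
}
```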
4. Write Observations
#![allow(unused)]
fn main() {
// In writer.rs
fn write_observations<W: Write>(
writer: &mut W,
dataset: &Dataset,
schema: &DatasetSchema,
) -> Result<()> {
for row_idx in 0..dataset.nrows() {
for (col, spec) in dataset.columns().iter()
.zip(&schema.variables)
{
if spec.is_numeric {
let value = get_numeric_value(col, row_idx);
let ibm = encode_ibm_float(value);
writer.write_all(&ibm)?;
} else {
let value = get_string_value(col, row_idx);
let padded = pad_to_length(&value, spec.length);
writer.write_all(&padded)?;
}
}
}
// Pad to 80-byte boundary
pad_to_record_boundary(writer)?;
Ok(())
}
}
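The final `pad_to_record_boundary` step tops the observation data up to XPT's fixed 80-byte record size. The amount of padding needed is a small modular calculation; a sketch, assuming the writer tracks its running byte count:

```rust
/// Number of ASCII-space padding bytes needed to reach the next
/// 80-byte XPT record boundary. Sketch; the real writer tracks the
/// running byte count internally.
fn boundary_padding(bytes_written: usize) -> usize {
    (80 - bytes_written % 80) % 80
}

fn main() {
    assert_eq!(boundary_padding(160), 0); // already on a boundary
    assert_eq!(boundary_padding(83), 77); // 83 + 77 = 160
    assert_eq!(boundary_padding(1), 79);
}
```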
Metadata Flow
sequenceDiagram
participant User
participant Column
participant VariableSpec
participant NAMESTR
participant XPT
User->>Column: with_label("Label")
User->>Column: with_format(Format)
Column->>VariableSpec: derive_schema_plan()
Note right of VariableSpec: label, format copied
VariableSpec->>NAMESTR: pack_namestr()
Note right of NAMESTR: nlabel, nform, nfl, nfd
NAMESTR->>XPT: Written to file
Note over XPT,Column: Reading reverses the flow
XPT->>NAMESTR: unpack_namestr()
NAMESTR->>Column: Transfer metadata
Note left of Column: Label, format restored
Error Flow
flowchart TB
A[Operation] --> B{Success?}
B --> |Yes| C[Return Ok]
B --> |No| D[Create Error]
D --> E[Add context]
E --> F[Return Err]
F --> G{Caller handles?}
G --> |Yes| H[Recovery/Fallback]
G --> |No| I[Propagate up]
All errors:
- Are enriched with context
- Are Send + Sync + 'static
- Implement std::error::Error
Schema Derivation
This page explains how xportrs derives the write schema from a Dataset.
Schema Overview
The schema contains all information needed to write an XPT file:
#![allow(unused)]
fn main() {
pub struct DatasetSchema {
pub name: String, // Dataset name (uppercase)
pub label: String, // Dataset label
pub variables: Vec<VariableSpec>,
pub row_length: usize, // Bytes per observation row
}
pub struct VariableSpec {
pub name: String, // Variable name
pub label: String, // Variable label
pub is_numeric: bool, // Type flag
pub length: usize, // Bytes per value
pub position: usize, // Offset in row
pub format: Option<Format>, // Display format
pub informat: Option<Format>, // Input format
}
}
Derivation Process
flowchart TB
subgraph "Input"
A[Dataset] --> B[Columns]
C[VariableMetadata] --> D[Overrides]
end
subgraph "Derivation"
B --> E[For each Column]
E --> F[Compute base spec]
D --> G[Apply overrides]
F --> G
G --> H[VariableSpec]
end
subgraph "Post-Processing"
H --> I[Compute positions]
I --> J[Compute row length]
J --> K[DatasetSchema]
end
Step 1: Compute Base Spec
For each column, derive the base specification:
#![allow(unused)]
fn main() {
fn compute_base_spec(col: &Column, index: usize) -> VariableSpec {
// Determine type
let is_numeric = matches!(col.data(),
ColumnData::F64(_) | ColumnData::I64(_) | ColumnData::Bool(_) |
ColumnData::Date(_) | ColumnData::DateTime(_) | ColumnData::Time(_)
);
// Compute length
let length = if is_numeric {
8 // Always 8 bytes for numerics
} else {
compute_character_length(col)
};
VariableSpec {
name: col.name().to_uppercase(),
label: String::new(),
is_numeric,
length,
position: 0,
format: None,
informat: None,
}
}
}
Step 2: Character Length Computation
Character length is computed from data unless explicitly set:
#![allow(unused)]
fn main() {
fn compute_character_length(col: &Column) -> usize {
// Priority 1: Explicit length override
if let Some(len) = col.explicit_length() {
return len.min(200); // Cap at 200
}
// Priority 2: Derive from data
if let ColumnData::String(values) = col.data() {
let max_len = values.iter()
.filter_map(|v| v.as_ref())
.map(|s| s.len())
.max()
.unwrap_or(1);
// Clamp to the valid 1-200 byte range
max_len.max(1).min(200)
} else if let ColumnData::Bytes(values) = col.data() {
let max_len = values.iter()
.filter_map(|v| v.as_ref())
.map(|b| b.len())
.max()
.unwrap_or(1);
max_len.max(1).min(200)
} else {
8 // Default for numeric types
}
}
}
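The data-derived branch above can be demonstrated stand-alone: the longest non-missing value wins, clamped to the 1..=200 byte range. A minimal sketch using plain `Option<&str>` slices in place of ColumnData:

```rust
/// Mirror of the data-derived character-length rule: longest non-missing
/// value, clamped to 1..=200 bytes.
fn derived_len(values: &[Option<&str>]) -> usize {
    values
        .iter()
        .filter_map(|v| *v) // skip missing values
        .map(str::len)
        .max()
        .unwrap_or(1) // all-missing column still needs length 1
        .max(1)
        .min(200)
}

fn main() {
    assert_eq!(derived_len(&[Some("ABC-001"), None, Some("ABC-0002")]), 8);
    assert_eq!(derived_len(&[None, None]), 1); // all missing -> minimum length
}
```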
Step 3: Apply Column Metadata
Column metadata is applied to the spec:
#![allow(unused)]
fn main() {
fn apply_column_metadata(spec: &mut VariableSpec, col: &Column) {
// Label from Column
if let Some(label) = col.label() {
spec.label = truncate_to_bytes(label.as_ref(), 40);
}
// Format from Column
if let Some(format) = col.format() {
spec.format = Some(format.clone());
}
// Informat from Column
if let Some(informat) = col.informat() {
spec.informat = Some(informat.clone());
}
// Length override from Column (for character)
if !spec.is_numeric {
if let Some(len) = col.explicit_length() {
spec.length = len.min(200);
}
}
}
}
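`truncate_to_bytes` must cut at a UTF-8 character boundary, never mid-character. A sketch of such a helper; the crate's actual implementation may differ:

```rust
/// Truncate to at most `max` bytes without splitting a UTF-8 character.
/// Sketch of the truncate_to_bytes helper referenced above.
fn truncate_to_bytes(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Back off until the cut lands on a character boundary.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    assert_eq!(truncate_to_bytes("Adverse Events", 7), "Adverse");
    // "é" is 2 bytes; cutting at byte 2 would split it, so back off to 1
    assert_eq!(truncate_to_bytes("héllo", 2), "h");
}
```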
Step 4: Apply External Metadata
Optional external metadata can override Column values:
#![allow(unused)]
fn main() {
fn apply_external_metadata(
spec: &mut VariableSpec,
meta: Option<&VariableMetadata>,
) {
if let Some(meta) = meta {
// External metadata takes priority
if let Some(label) = &meta.label {
spec.label = truncate_to_bytes(label, 40);
}
if let Some(format) = &meta.format {
spec.format = Some(format.clone());
}
if let Some(length) = meta.length {
if !spec.is_numeric {
spec.length = length.min(200);
}
}
}
}
}
Step 5: Compute Positions
After all specs are created, compute byte positions:
#![allow(unused)]
fn main() {
fn compute_positions(specs: &mut [VariableSpec]) {
let mut position = 0;
for spec in specs {
spec.position = position;
position += spec.length;
}
}
fn compute_row_length(specs: &[VariableSpec]) -> usize {
specs.iter().map(|s| s.length).sum()
}
}
Complete Flow
#![allow(unused)]
fn main() {
pub fn derive_schema_plan(
dataset: &Dataset,
metadata: Option<&VariableMetadata>,
) -> DatasetSchema {
// 1. Derive base specs
let mut variables: Vec<VariableSpec> = dataset.columns()
.iter()
.enumerate()
.map(|(i, col)| compute_base_spec(col, i))
.collect();
// 2. Apply Column metadata
for (spec, col) in variables.iter_mut()
.zip(dataset.columns())
{
apply_column_metadata(spec, col);
}
// 3. Apply external metadata
for spec in &mut variables {
apply_external_metadata(spec, metadata);
}
// 4. Compute positions
compute_positions(&mut variables);
// 5. Build schema
DatasetSchema {
name: dataset.domain_code().to_uppercase(),
label: dataset.dataset_label()
.map(|l| truncate_to_bytes(l, 40))
.unwrap_or_default(),
row_length: compute_row_length(&variables),
variables,
}
}
}
Priority Order
Metadata is applied with this priority (highest to lowest):
1. External VariableMetadata - Programmatic overrides
2. Column metadata - .with_label(), .with_format(), etc.
3. Computed defaults - Derived from data
graph TB
A[Computed Default] --> B[Column Metadata]
B --> C[External Metadata]
C --> D[Final VariableSpec]
style D fill:#90EE90
Validation
The schema is validated after derivation:
#![allow(unused)]
fn main() {
fn validate_schema(schema: &DatasetSchema) -> Vec<Issue> {
let mut issues = Vec::new();
for var in &schema.variables {
// Name validation
if var.name.is_empty() {
issues.push(Issue::InvalidVariableName { ... });
}
if var.name.len() > 8 {
issues.push(Issue::VariableNameTooLong { ... });
}
// Label validation
if var.label.is_empty() {
issues.push(Issue::MissingVariableLabel { ... });
}
// Length validation
if !var.is_numeric && var.length > 200 {
issues.push(Issue::CharacterTooLong { ... });
}
}
issues
}
}
Example
#![allow(unused)]
fn main() {
// Input
let col = Column::new("USUBJID", ColumnData::String(vec![
Some("ABC-001".into()),
Some("ABC-002".into()),
]))
.with_label("Unique Subject Identifier")
.with_format(Format::character(40))
.with_length(40);
// Derived VariableSpec
VariableSpec {
name: "USUBJID",
label: "Unique Subject Identifier",
is_numeric: false,
length: 40,
position: 0,
format: Some(Format::character(40)),
informat: None,
}
}
Official Sources
This page lists authoritative sources for XPT format and regulatory requirements.
SAS Documentation
TS-140: XPT V5 Specification
The authoritative specification for XPT V5 format.
- Title: Record Layout of a SAS Version 5 or 6 Data Set in SAS Transport (XPORT) Format
- Publisher: SAS Institute Inc.
- Link: TS-140 PDF
Key contents:
- File structure (headers, NAMESTR, observations)
- 140-byte NAMESTR record layout
- IBM floating-point encoding
- Character encoding rules
FDA Documentation
Study Data Technical Conformance Guide
Requirements for electronic study data submissions.
- Publisher: U.S. Food and Drug Administration
- Link: TCG PDF
FDA Data Standards Catalog
Supported CDISC standards and versions.
- Link: Data Standards Resources
CDISC Standards
SDTM Implementation Guide
Standard for tabulation data structure.
- Publisher: CDISC
- Link: SDTM-IG
Key contents:
- Domain structures (DM, AE, LB, etc.)
- Variable naming conventions
- Controlled terminology requirements
- Metadata requirements
ADaM Implementation Guide
Standard for analysis datasets.
- Publisher: CDISC
- Link: ADaM-IG
Key contents:
- Analysis dataset structures
- Derived variable conventions
- Traceability requirements
CDISC Controlled Terminology
Standard coded values for CDISC variables.
- Link: CDISC CT
Format Registries
Library of Congress
Format documentation and preservation information.
- XPT Format Family: FDD 000464
- XPT V5 Specific: FDD 000466
Validation Tools
Pinnacle 21
Industry-standard CDISC validation tool.
- Publisher: Certara
- Link: Pinnacle 21
Validates:
- XPT file structure
- CDISC standard compliance
- define.xml consistency
- Controlled terminology
OpenCDISC (Legacy)
Open-source validation (now Pinnacle 21 Community).
- Link: Community Downloads
International Regulators
PMDA (Japan)
NMPA (China)
- Link: NMPA Drug Center
EMA (Europe)
- Link: EMA eSubmission
Technical References
IBM Floating-Point
- Wikipedia: IBM Hexadecimal Floating-Point
- IBM Documentation: System/360 Principles
ISO 8601 Date/Time
- Standard: ISO 8601
Used for SDTM timing variables (--DTC).
Character Encodings
- ASCII: ANSI X3.4
- Latin-1: ISO/IEC 8859-1
Related Tools
xportr (R Package)
R package for XPT file handling:
- Link: xportr on GitHub
- Docs: xportr Documentation
pyreadstat (Python)
Python library for reading statistical file formats:
- Link: pyreadstat
haven (R Package)
R package for reading SAS files:
- Link: haven
Citation
When referencing xportrs in academic or regulatory contexts:
@software{xportrs,
title = {xportrs: SAS Transport (XPT) file format library for Rust},
author = {xportrs contributors},
year = {2024},
url = {https://github.com/rubentalstra/xportrs},
license = {MIT OR Apache-2.0},
}
Glossary
This glossary defines key terms used in xportrs and clinical trial data management.
A
- ADaM (Analysis Data Model)
- CDISC standard for analysis-ready datasets derived from SDTM data. Common datasets include ADSL (subject-level), ADAE (adverse events analysis), and ADLB (laboratory analysis).
- Agency
- Regulatory authority that reviews drug submissions. Major agencies include FDA (US), PMDA (Japan), NMPA (China), and EMA (Europe).
- ANDA (Abbreviated New Drug Application)
- FDA submission type for generic drugs.
- ASCII
- American Standard Code for Information Interchange. Character encoding required by FDA for XPT file text content. Uses bytes 0x00-0x7F.
B
- BLA (Biologics License Application)
- FDA submission type for biological products.
- Big-endian
- Byte order where the most significant byte is stored first. Used in XPT files.
C
- CDASH (Clinical Data Acquisition Standards Harmonization)
- CDISC standard for data collection forms. Upstream of SDTM.
- CDISC (Clinical Data Interchange Standards Consortium)
- Organization that develops data standards for clinical research, including SDTM, ADaM, and controlled terminology.
- Column
- In xportrs, represents a variable with its data and metadata. Corresponds to a variable in XPT terminology.
- ColumnData
- Enum in xportrs representing typed data storage (F64, String, Date, etc.).
- Controlled Terminology
- CDISC-defined standard values for coded variables. Example: SEX must be M, F, U, or UNDIFFERENTIATED.
D
- Dataset
- In xportrs, a collection of columns representing an XPT member. Also called a domain in SDTM context.
- Define-XML
- XML file describing the metadata for CDISC datasets. Required alongside XPT files in submissions.
- Domain
- SDTM term for a dataset representing a specific type of data (DM=Demographics, AE=Adverse Events, etc.).
- DomainCode
- In xportrs, the 1-8 character dataset identifier (e.g., “AE”, “DM”).
E
- eCTD (Electronic Common Technical Document)
- Standard format for regulatory submissions. XPT files are placed in specific eCTD modules.
- EMA (European Medicines Agency)
- Regulatory authority for the European Union.
- Epoch
- Reference date for date calculations. SAS uses January 1, 1960. Unix uses January 1, 1970.
F
- FDA (Food and Drug Administration)
- U.S. regulatory authority for drugs and medical devices.
- Format
- In xportrs, represents a SAS display format (e.g., DATE9., 8.2, $CHAR200.).
I
- IBM Floating-Point
- Hexadecimal (base-16) floating-point format used in XPT files. Different from IEEE 754.
- IND (Investigational New Drug)
- FDA application to begin clinical trials.
- Informat
- SAS input format specifying how data is read. Stored in XPT NAMESTR records.
- Issue
- In xportrs, represents a validation problem (Error, Warning, or Info severity).
L
- Label
- Descriptive text for a dataset or variable. Limited to 40 bytes in XPT V5.
- Latin-1 (ISO-8859-1)
- Character encoding supporting Western European characters. Allowed for non-FDA submissions.
M
- Member
- XPT term for a dataset within a transport file. An XPT file can contain multiple members.
- Missing Value
- XPT uses special byte patterns for missing data. Standard missing is 0x2E (period). Special missing values .A-.Z and ._ are also supported.
N
- NAMESTR
- 140-byte record in XPT files describing a variable’s metadata (name, label, format, type, length).
- NDA (New Drug Application)
- FDA submission type for new drugs.
- NMPA (National Medical Products Administration)
- Regulatory authority for China.
P
- Pinnacle 21
- Industry-standard validation tool for CDISC compliance. Checks XPT files and define.xml.
- PMDA (Pharmaceuticals and Medical Devices Agency)
- Regulatory authority for Japan.
S
- SAS
- Statistical Analysis System. Software that created the XPT format.
- SAS Epoch
- January 1, 1960. Reference date for SAS date values.
- SDTM (Study Data Tabulation Model)
- CDISC standard for tabulation data structure. Defines domains like DM, AE, LB, VS.
- SEND (Standard for Exchange of Nonclinical Data)
- CDISC standard for nonclinical (animal) study data.
- Severity
- Validation issue classification in xportrs: Error (blocks write), Warning (review recommended), Info (suggestion).
T
- TCG (Technical Conformance Guide)
- FDA document specifying electronic submission requirements.
- TS-140
- SAS Technical Note defining the XPT V5 format specification.
U
- USUBJID (Unique Subject Identifier)
- Standard SDTM variable uniquely identifying a subject across all datasets.
V
- ValidatedWrite
- In xportrs, a validated dataset ready to be written to a file.
- VariableName
- In xportrs, the 1-8 character variable identifier (e.g., “USUBJID”).
- VariableRole
- CDISC classification of variables: Identifier, Topic, Timing, Qualifier, Rule, Synonym, Record.
- VariableSpec
- Internal xportrs structure containing computed write specification for a variable.
X
- XPT
- SAS Transport file format. XPT V5 is required for regulatory submissions.
- XPT V5
- Version 5 of SAS Transport format (also called Version 5/6). Uses 8-byte variable names, IBM floating-point, and 80-byte records.
- XPT V8
- Newer SAS Transport format with longer names and IEEE floating-point. Not accepted for FDA submissions.
Changelog
All notable changes to xportrs are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Comprehensive mdbook documentation with mermaid diagrams
- Full regulatory compliance documentation
- API reference documentation
- Usage guides and troubleshooting
[0.0.6] - 2026
Added
- Format type with parsing: New Format struct for SAS format handling
  - Format::parse("DATE9.") - Parse format strings
  - Format::numeric(8, 2) - Create numeric formats
  - Format::character(200) - Create character formats
  - Format::from_namestr() - Reconstruct from XPT fields
- Column metadata support: Extended Column struct
  - .with_label("...") - Set variable label
  - .with_format(Format) - Set display format
  - .with_format_str("DATE9.") - Parse and set format
  - .with_informat(Format) - Set input format
  - .with_length(n) - Set explicit character length
- Metadata preservation on read: XPT files now preserve:
  - Variable labels from NAMESTR nlabel
  - Display formats from NAMESTR nform, nfl, nfd, nfj
  - Input formats from NAMESTR niform, nifl, nifd
  - Explicit character lengths
- Validation warnings: New validation issues
  - MissingVariableLabel - Warning when label is empty
  - MissingDatasetLabel - Warning when dataset label is empty
  - InvalidFormatSyntax - Error for malformed format strings
Changed
- Dataset::with_label() now takes impl Into<Label> instead of Option<impl Into<Label>>
- Added Dataset::set_label() for conditional label setting
- Format metadata now correctly written to NAMESTR records (previously hardcoded to 0)
Fixed
- Format fields nfl, nfd, nfj, nifl, nifd now contain actual values instead of zeros
- Metadata roundtrip: labels and formats preserved through read → write cycle
[0.0.5] - 2026
Added
- CITATION.cff for academic citation
- codemeta.json for metadata
- JSON schema support
[0.0.4] - 2026
Added
- Initial public release
- XPT V5 reading and writing
- FDA/PMDA/NMPA agency validation
- Automatic file splitting at 5GB
- IBM floating-point encoding/decoding
- SAS epoch date handling
Features
- Xpt::read() - Read XPT files
- Xpt::write() - Write XPT files
- Xpt::inspect() - Get file metadata
- Dataset, Column, ColumnData types
- Agency enum for regulatory validation
- Issue and Severity for validation results
Migration Guide
From 0.0.5 to 0.0.6
Dataset::with_label signature change
#![allow(unused)]
fn main() {
// Before (0.0.5)
Dataset::with_label("AE", Some("Adverse Events"), columns)
// After (0.0.6)
Dataset::with_label("AE", "Adverse Events", columns)
// For conditional labels
let mut ds = Dataset::new("AE", columns)?;
if let Some(label) = maybe_label {
ds.set_label(label);
}
}
Adding metadata to columns
#![allow(unused)]
fn main() {
// New in 0.0.6
Column::new("VAR", data)
.with_label("Variable Label")
.with_format(Format::character(200))
.with_length(200)
}
Checking for warnings
#![allow(unused)]
fn main() {
// New warnings in 0.0.6
let validated = Xpt::writer(dataset).finalize()?;
for issue in validated.issues() {
match issue {
Issue::MissingVariableLabel { variable } => {
println!("Warning: {} missing label", variable);
}
Issue::MissingDatasetLabel { dataset } => {
println!("Warning: {} missing label", dataset);
}
_ => {}
}
}
}
Compatibility
| xportrs Version | Rust Version | MSRV |
|---|---|---|
| 0.0.6 | 1.70+ | 1.70 |
| 0.0.5 | 1.70+ | 1.70 |
| 0.0.4 | 1.70+ | 1.70 |
License
xportrs is dual-licensed under MIT and Apache 2.0.
For Developers
This section contains information for contributors and developers working on xportrs.
Getting Started
Prerequisites
- Rust 1.70 or later
- Git
Clone and Build
git clone https://github.com/rubentalstra/xportrs.git
cd xportrs
cargo build
Run Tests
cargo test --all-features
Run Clippy
cargo clippy -- -D warnings
Project Structure
xportrs/
├── src/
│ ├── lib.rs # Public API exports
│ ├── dataset/ # Dataset, Column, ColumnData
│ ├── schema/ # Schema derivation
│ ├── validate/ # Validation rules
│ ├── xpt/
│ │ └── v5/ # XPT V5 implementation
│ │ ├── read/ # Reading logic
│ │ └── write/ # Writing logic
│ ├── config/ # Configuration types
│ ├── error/ # Error types
│ └── metadata/ # Metadata types
├── tests/ # Integration tests
├── docs/ # mdbook documentation
└── benches/ # Benchmarks (if any)
Adding New Features
Adding a New Validation Rule
- Add variant to Issue enum in src/validate/issues.rs
- Implement severity() method for the new variant
- Implement Display for the new variant
- Add check in src/validate/checks_v5.rs
- Add tests
Adding a New Column Type
- Add variant to ColumnData enum in src/dataset/domain_dataset.rs
- Handle in reader (src/xpt/v5/read/reader.rs)
- Handle in writer (src/xpt/v5/write/writer.rs)
- Add From implementation
- Add tests
Supporting a New Agency
- Add variant to Agency enum
- Add agency-specific validation in src/validate/checks_v5.rs
- Document in regulatory section
Code Style
- Follow Rust API Guidelines
- Use cargo fmt before committing
- Ensure cargo clippy -- -D warnings passes
- Include examples in documentation
Testing
Unit Tests
Located alongside the code in mod tests blocks.
Integration Tests
Located in tests/ directory:
- tests/v5/read.rs - Reading tests
- tests/v5/write.rs - Writing tests
- tests/api_guidelines.rs - API compliance tests
Test Data
Test XPT files are in tests/data/.
Documentation
Building Docs
cd docs
mdbook build
Serving Locally
cd docs
mdbook serve
Then open http://localhost:3000
Adding Pages
- Create a .md file in the appropriate directory
- Add an entry to SUMMARY.md
- Use mermaid for diagrams
Pull Request Guidelines
- Fork the repository
- Create a feature branch
- Make changes with tests
- Ensure CI passes
- Submit PR with clear description
Release Process
- Update version in Cargo.toml
- Update CHANGELOG.md
- Create git tag
- CI publishes to crates.io
Contributors
Here is a list of the contributors who have helped to improve xportrs. Big shout-out to them!
- @rubentalstra
If you feel you're missing from this list, please open a pull request or issue to get added!