Architecture Overview
This page provides a high-level view of xportrs internal architecture.
Module Structure
graph TB
subgraph "Public API"
XPT[Xpt] --> READER[XptReaderBuilder]
XPT --> WRITER[XptWriterBuilder]
DATASET[Dataset] --> COLUMN[Column]
COLUMN --> COLDATA[ColumnData]
COLUMN --> FORMAT[Format]
end
subgraph "Core Modules"
SCHEMA[schema] --> DERIVE[derive.rs]
SCHEMA --> PLAN[plan.rs]
VALIDATE[validate] --> CHECKS[checks_v5.rs]
VALIDATE --> ISSUES[issues.rs]
end
subgraph "XPT V5 Implementation"
V5[xpt/v5] --> READ[read/]
V5 --> WRITE[write/]
READ --> PARSER[parse.rs]
READ --> OBS[obs.rs]
WRITE --> NAMESTR[namestr.rs]
WRITE --> SPLIT[split.rs]
end
subgraph "Low-Level"
IBM[ibm_float.rs]
RECORD[record.rs]
TIMESTAMP[timestamp.rs]
end
Key Components
Public API Layer
| Component | Purpose |
|---|---|
Xpt | Entry point for reading/writing |
Dataset | Collection of columns with metadata |
Column | Variable data and metadata |
ColumnData | Typed data storage |
Format | SAS format parsing and representation |
Schema Layer
| Component | Purpose |
|---|---|
DatasetSchema | Computed schema for writing |
VariableSpec | Per-variable write plan |
derive_schema_plan() | Computes schema from Dataset |
Validation Layer
| Component | Purpose |
|---|---|
ValidatedWrite | Validated dataset ready to write |
Issue | Validation problem description |
Severity | Error/Warning/Info classification |
XPT V5 Layer
| Component | Purpose |
|---|---|
XptReader | Reads XPT files |
XptWriter | Writes XPT files |
SplitWriter | Handles file splitting |
pack_namestr() | Creates NAMESTR records |
Low-Level Layer
| Component | Purpose |
|---|---|
ibm_float | IBM float encoding/decoding |
record | 80-byte record handling |
timestamp | SAS epoch date handling |
Design Principles
1. Type Safety
Rust’s type system prevents common errors:
#![allow(unused)]
fn main() {
// DomainCode, Label, VariableName are distinct types
let domain = DomainCode::new("AE");
let label = Label::new("Adverse Events");
// Can't accidentally swap them
// ColumnData enforces type consistency
let data = ColumnData::F64(vec![Some(1.0)]);
// Can't mix types within a column
}
2. Builder Pattern
Complex objects use builders for ergonomic construction:
#![allow(unused)]
fn main() {
// Reader builder
let dataset = Xpt::reader("file.xpt")
.row_limit(100)
.read()?;
// Writer builder
let validated = Xpt::writer(dataset)
.agency(Agency::FDA)
.finalize()?;
}
3. Validation Pipeline
Validation happens before writing:
graph LR
A[Dataset] --> B[XptWriterBuilder]
B --> C[finalize]
C --> D[validate_v5_schema]
D --> E[ValidatedWrite]
E --> F{has_errors?}
F --> |No| G[write_path]
F --> |Yes| H[Return Issues]
4. Metadata Preservation
Metadata flows through all operations:
graph LR
subgraph "Read Path"
XPT1[XPT File] --> NS1[NAMESTR]
NS1 --> COL1[Column]
end
subgraph "Storage"
COL1 --> DS[Dataset]
end
subgraph "Write Path"
DS --> VS[VariableSpec]
VS --> NS2[NAMESTR]
NS2 --> XPT2[XPT File]
end
5. Zero-Copy Where Possible
String data uses references where safe:
#![allow(unused)]
fn main() {
// Reading: borrows from buffer where possible
// Writing: uses slices directly when aligned
}
Error Handling
xportrs uses a unified Error type:
#![allow(unused)]
fn main() {
pub enum Error {
Io(std::io::Error),
InvalidHeader { message: String },
InvalidData { message: String },
InvalidSchema { message: String },
MemberNotFound { domain_code: String },
// ...
}
}
Errors implement std::error::Error and are Send + Sync + 'static.
Thread Safety
All public types are Send + Sync:
#![allow(unused)]
fn main() {
// Can be shared across threads
let dataset = Arc::new(Xpt::read("data.xpt")?);
// Can be sent to other threads
std::thread::spawn(move || {
for col in dataset.columns() {
println!("{}", col.name());
}
});
}
Memory Layout
Dataset
Dataset {
domain_code: DomainCode(String),
dataset_label: Option<Label>,
columns: Vec<Column>,
}
Column
Column {
name: VariableName(String),
role: Option<VariableRole>,
data: ColumnData,
label: Option<Label>,
format: Option<Format>,
informat: Option<Format>,
length: Option<usize>,
}
ColumnData
enum ColumnData {
F64(Vec<Option<f64>>),
I64(Vec<Option<i64>>),
Bool(Vec<Option<bool>>),
String(Vec<Option<String>>),
Bytes(Vec<Option<Vec<u8>>>),
Date(Vec<Option<NaiveDate>>),
DateTime(Vec<Option<NaiveDateTime>>),
Time(Vec<Option<NaiveTime>>),
}
Extension Points
Adding New Validation Rules
- Add variant to
Issueenum - Implement
severity()andDisplay - Add check in
validate_v5_schema()
Supporting New Agencies
- Add variant to
Agencyenum - Add agency-specific validation in
checks_v5.rs
Adding Column Types
- Add variant to
ColumnData - Handle in reader/writer
- Add
Fromimplementation