Design Decisions

Key architectural decisions and their rationale.

Why Rust?

Chosen: Rust

Rationale:

  • Memory safety without garbage collection
  • Performance comparable to C/C++
  • Type system catches errors at compile time
  • Cross-platform compilation to native binaries
  • Growing ecosystem for data processing

Alternatives Considered

| Language | Pros | Cons |
|----------|------|------|
| Python | Familiar, many libraries | Performance, distribution |
| Java | Cross-platform, mature | JVM dependency, startup time |
| C++ | Performance | Memory safety, complexity |
| Go | Simple, fast compilation | Less expressive types |

Why Iced for GUI?

Chosen: Iced 0.14.0

Rationale:

  • Elm architecture - Predictable state management with unidirectional data flow
  • Pure Rust - No FFI complexity, native performance
  • Cross-platform - macOS, Windows, Linux
  • Type-safe messages - Compile-time guarantees for all user interactions
  • Async-first - Built-in Task system for background operations
  • Multi-window - Native support for dialog windows

Architecture Benefits

```mermaid
flowchart LR
    subgraph "Elm Architecture"
        View["View<br/>(render UI)"]
        Message["Message<br/>(user action)"]
        Update["Update<br/>(handle message)"]
        State["State<br/>(app data)"]
    end

    View --> Message
    Message --> Update
    Update --> State
    State --> View

    style View fill:#4a90d9,color:#fff
    style State fill:#50c878,color:#fff
```

The Elm architecture ensures:

  • State is the single source of truth
  • All state changes flow through update()
  • Views are pure functions of state
  • Easy debugging and testing
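The loop above can be sketched without any GUI framework; the `State` and `Message` types below are illustrative stand-ins, not the application's actual types:

```rust
// Minimal Elm-style loop: all state changes happen inside `update`.
#[derive(Debug, Default)]
struct State {
    row_count: usize,
    status: String,
}

// Every user action is a typed message; the compiler forces `update`
// to handle each variant.
enum Message {
    FileImported { rows: usize },
    ValidationFinished { errors: usize },
}

fn update(state: &mut State, message: Message) {
    match message {
        Message::FileImported { rows } => {
            state.row_count = rows;
            state.status = format!("Imported {rows} rows");
        }
        Message::ValidationFinished { errors } => {
            state.status = format!("Validation finished with {errors} error(s)");
        }
    }
}

// The view is a pure function of state (here it just renders a string).
fn view(state: &State) -> String {
    format!("{} | {}", state.row_count, state.status)
}

fn main() {
    let mut state = State::default();
    update(&mut state, Message::FileImported { rows: 100 });
    update(&mut state, Message::ValidationFinished { errors: 0 });
    println!("{}", view(&state));
}
```

Because `view` never mutates anything, a test can drive the app by feeding messages to `update` and asserting on the resulting state.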

Alternatives Considered

| Framework | Pros | Cons |
|-----------|------|------|
| egui | Simple immediate mode, rapid prototyping | Harder state management at scale, no multi-window |
| Tauri | Web tech, flexible | Bundle size, two languages (Rust + JS) |
| GTK-rs | Native look | Platform differences, complex bindings |
| Qt | Mature, rich | License complexity, C++ bindings |

Why Polars for Data?

Chosen: Polars

Rationale:

  • Performance - Lazy evaluation, parallelism
  • Rust native - No Python dependency
  • DataFrame API - Familiar for data work
  • Memory efficient - Arrow-based

Alternatives Considered

| Library | Pros | Cons |
|---------|------|------|
| ndarray | Low-level control | More manual work |
| Arrow | Standard format | Fewer DataFrame features |
| Custom | Full control | Development time |

Why Embed Standards?

Chosen: Embedded CDISC data

Rationale:

  • Offline operation - No network dependency
  • Deterministic - Consistent across runs
  • Fast - No API latency
  • Regulatory - Audit trail
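Conceptually, embedded terminology reduces to a static lookup table compiled into the binary. The sketch below uses the CDISC Sex codelist values for illustration, but the function name is hypothetical; a real build would embed the standards files themselves (e.g. via `include_str!`) and parse them at startup:

```rust
// Controlled-terminology pairs compiled directly into the binary:
// no network, no I/O, and identical results on every run.
static CT_SEX: &[(&str, &str)] = &[
    ("M", "Male"),
    ("F", "Female"),
    ("U", "Unknown"),
];

// Lookup is a plain in-memory scan over the embedded table.
fn decode_sex(code: &str) -> Option<&'static str> {
    CT_SEX.iter().find(|&&(c, _)| c == code).map(|&(_, label)| label)
}

fn main() {
    assert_eq!(decode_sex("F"), Some("Female"));
    assert_eq!(decode_sex("X"), None);
    println!("ok");
}
```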

Alternatives Considered

| Approach | Pros | Cons |
|----------|------|------|
| API-based | Always current | Network required, latency |
| Download on demand | Smaller binary | Caching complexity |
| Plugin system | Flexible | Distribution complexity |

Workspace Architecture

Chosen: Multi-crate workspace

Rationale:

  • Separation of concerns - Clear boundaries
  • Parallel compilation - Faster builds
  • Selective testing - Test only changed crates
  • Reusability - Crates can be used independently

Crate Boundaries

| Crate | Principle |
|-------|-----------|
| `tss-gui` | UI only, delegates all processing to other crates |
| `tss-submit` | Core pipeline (map, normalize, validate, export) |
| `tss-ingest` | CSV parsing only, no transformation logic |
| `tss-standards` | Pure data loading, no transformation logic |
| `tss-updater` | Update mechanism, no UI dependencies |
| `tss-updater-helper` | macOS-only binary, minimal dependencies |
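A workspace of this shape is declared in a single top-level manifest; the member list below mirrors the crates in the table (a sketch, not the project's actual `Cargo.toml`):

```toml
[workspace]
resolver = "2"
members = [
    "tss-gui",
    "tss-submit",
    "tss-ingest",
    "tss-standards",
    "tss-updater",
    "tss-updater-helper",
]

# Shared dependency versions in [workspace.dependencies] keep the
# member crates in lockstep while each crate opts in explicitly.
```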

Data Processing Pipeline

Chosen: Lazy evaluation with checkpoints

Rationale:

  • Memory efficiency - Don’t load all data at once
  • Performance - Optimize query plans
  • Transparency - User sees intermediate results
  • Recoverability - Can resume from checkpoints
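The idea can be sketched in plain Rust: transformations are queued as a plan and only run when the caller collects, while a checkpoint materializes the intermediate result so later stages can resume from it. All names here are illustrative, not the pipeline's actual API (the real implementation builds on Polars' lazy frames):

```rust
// A toy lazy frame: `map` queues work, `collect` runs the whole plan.
struct LazyFrame {
    rows: Vec<i64>,
    plan: Vec<Box<dyn Fn(Vec<i64>) -> Vec<i64>>>,
}

impl LazyFrame {
    fn new(rows: Vec<i64>) -> Self {
        Self { rows, plan: Vec::new() }
    }

    // Queue a transformation without executing it.
    fn map(mut self, f: impl Fn(Vec<i64>) -> Vec<i64> + 'static) -> Self {
        self.plan.push(Box::new(f));
        self
    }

    // Run the accumulated plan; this is where a real engine optimizes.
    fn collect(self) -> Vec<i64> {
        self.plan.into_iter().fold(self.rows, |rows, step| step(rows))
    }

    // Checkpoint: materialize the current plan so later stages can
    // resume from here instead of recomputing earlier steps.
    fn checkpoint(self) -> Self {
        let materialized = self.collect();
        LazyFrame::new(materialized)
    }
}

fn main() {
    let result = LazyFrame::new(vec![1, 2, 3, 4])
        .map(|rows| rows.into_iter().filter(|r| r % 2 == 0).collect())
        .checkpoint() // intermediate result is visible and resumable here
        .map(|rows| rows.into_iter().map(|r| r * 10).collect())
        .collect();
    println!("{result:?}"); // [20, 40]
}
```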

Pipeline Stages

```mermaid
flowchart LR
    subgraph Stage1[Import]
        I1[CSV File]
        I2[Schema Detection]
    end

    subgraph Stage2[Map]
        M1[Column Matching]
        M2[Type Conversion]
    end

    subgraph Stage3[Validate]
        V1[Structure Rules]
        V2[CT Validation]
        V3[Cross-Domain]
    end

    subgraph Stage4[Export]
        E1[XPT Generation]
        E2[XML Output]
    end

    I1 --> I2 --> M1 --> M2 --> V1 --> V2 --> V3 --> E1
    V3 --> E2
    V1 -.->|Errors| M1
    V2 -.->|Warnings| M1
    style I1 fill:#e8f4f8,stroke:#333
    style E1 fill:#d4edda,stroke:#333
    style E2 fill:#d4edda,stroke:#333
```

Validation Strategy

Chosen: Multi-level validation

Rationale:

  • Early feedback - Catch issues during mapping
  • Complete checking - Full validation before export
  • Severity levels - Error vs. warning vs. info
  • Actionable - Clear fix suggestions
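One way to model the severity split is an ordered enum plus a finding type that carries the fix suggestion; the type and rule names below are hypothetical, not the validator's actual API:

```rust
// Derived ordering lets findings be sorted or filtered by seriousness
// (Info < Warning < Error).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Warning,
    Error,
}

// One validation finding, with an actionable suggestion attached.
#[derive(Debug)]
struct Finding {
    severity: Severity,
    rule: &'static str,
    message: String,
    suggestion: Option<String>,
}

// Export is blocked only by errors; warnings and info pass through.
fn blocks_export(findings: &[Finding]) -> bool {
    findings.iter().any(|f| f.severity == Severity::Error)
}

fn main() {
    let findings = vec![Finding {
        severity: Severity::Warning,
        rule: "CT-001", // hypothetical rule id
        message: "Non-standard unit 'mgs'".into(),
        suggestion: Some("Did you mean 'mg'?".into()),
    }];
    println!("export allowed: {}", !blocks_export(&findings));
}
```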

Validation Levels

```mermaid
flowchart TB
    subgraph "Validation Layers"
        direction TB
        L1[Schema Validation<br/>File structure, encoding]
        L2[Mapping Validation<br/>Variable compatibility, types]
        L3[Content Validation<br/>CDISC compliance, CT checks]
        L4[Output Validation<br/>Format conformance, checksums]
    end

    IMPORT[Import] --> L1
    L1 --> MAP[Map]
    MAP --> L2
    L2 --> TRANSFORM[Transform]
    TRANSFORM --> L3
    L3 --> EXPORT[Export]
    EXPORT --> L4
    L4 --> OUTPUT[Output Files]
    L1 -.->|Schema Error| IMPORT
    L2 -.->|Type Mismatch| MAP
    L3 -.->|CT Error| TRANSFORM
    style L1 fill:#ffeeba,stroke:#333
    style L2 fill:#ffeeba,stroke:#333
    style L3 fill:#ffeeba,stroke:#333
    style L4 fill:#ffeeba,stroke:#333
    style OUTPUT fill:#d4edda,stroke:#333
```

| Level | When | Purpose |
|-------|------|---------|
| Schema | Import | File structure |
| Mapping | Map step | Variable compatibility |
| Content | Pre-export | CDISC compliance |
| Output | Export | Format conformance |

Error Handling

Chosen: Result types with context

Rationale:

  • No panics - Graceful error handling
  • Context - Where and why errors occurred
  • Recovery - Allow user to fix and continue
  • Logging - Full trace for debugging
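A minimal sketch of the pattern, using only the standard library; the error type and function names are hypothetical, and a real implementation might use a crate such as `thiserror` instead:

```rust
use std::fmt;

// Each variant carries enough context to say where and why it failed.
#[derive(Debug)]
enum PipelineError {
    Parse { file: String, line: usize, reason: String },
    Validation { domain: String, rule: String },
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::Parse { file, line, reason } => {
                write!(f, "parse error in {file}:{line}: {reason}")
            }
            PipelineError::Validation { domain, rule } => {
                write!(f, "validation failed in domain {domain} (rule {rule})")
            }
        }
    }
}

impl std::error::Error for PipelineError {}

// Callers get a Result instead of a panic, so they can recover,
// report, or let the user fix the input and retry.
fn parse_row(file: &str, line: usize, raw: &str) -> Result<i64, PipelineError> {
    raw.trim()
        .parse()
        .map_err(|e: std::num::ParseIntError| PipelineError::Parse {
            file: file.to_string(),
            line,
            reason: e.to_string(),
        })
}

fn main() {
    match parse_row("dm.csv", 42, "abc") {
        Ok(v) => println!("ok: {v}"),
        Err(e) => println!("{e}"),
    }
}
```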

Error Categories

| Category | Handling |
|----------|----------|
| User error | Display message, allow retry |
| Data error | Show affected rows, suggest fix |
| System error | Log, display generic message |
| Bug | Log with context, fail gracefully |

File Format Choices

XPT V5 as Default

Rationale:

  • FDA requirement for submissions
  • Maximum compatibility
  • Well-documented format

XPT V8 as Option

Rationale:

  • Longer variable names
  • Larger labels
  • Future-proofing

Security Considerations

Data Privacy

  • No cloud - All processing local
  • No telemetry - No usage data collection
  • No network - Works fully offline

Code Security

  • Dependency audit - Regular cargo audit
  • Minimal dependencies - Reduce attack surface
  • Memory safety - Rust’s guarantees

Performance Goals

Target Metrics

| Operation | Target |
|-----------|--------|
| Import 100K rows | < 2 seconds |
| Validation | < 5 seconds |
| Export to XPT | < 3 seconds |
| Application startup | < 1 second |

Optimization Strategies

  • Lazy evaluation
  • Parallel processing
  • Memory mapping for large files
  • Incremental validation
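The parallel-processing strategy can be sketched with std threads alone: split the rows into chunks and check each chunk concurrently. The function name is illustrative, and the real pipeline could just as well lean on Polars' built-in parallelism or a thread pool:

```rust
use std::thread;

// Count invalid rows (here: negative values) across chunks in parallel.
fn parallel_count_invalid(rows: &[i64]) -> usize {
    let n_threads = 4;
    let chunk = rows.len().div_ceil(n_threads).max(1);
    // Scoped threads may borrow `rows` without any Arc or cloning.
    thread::scope(|s| {
        rows.chunks(chunk)
            .map(|c| s.spawn(move || c.iter().filter(|&&v| v < 0).count()))
            .collect::<Vec<_>>()
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum()
    })
}

fn main() {
    let rows: Vec<i64> = vec![1, -2, 3, -4, 5, 6, -7, 8];
    println!("{}", parallel_count_invalid(&rows)); // 3
}
```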

Future Considerations

Extensibility

The architecture supports future additions:

  • New CDISC standards (ADaM, SEND)
  • Additional output formats
  • Plugin system (potential)
  • CLI interface (potential)

Backward Compatibility

  • Configuration format versioning
  • Data migration paths
  • Deprecation warnings

Next Steps