
Welcome to Trial Submission Studio

Trial Submission Studio

Transform clinical trial data into FDA-compliant CDISC formats with confidence.

Trial Submission Studio is a free, open-source desktop application for transforming clinical trial source data (CSV) into CDISC-compliant submission formats.

Caution

ALPHA SOFTWARE - ACTIVE DEVELOPMENT

Trial Submission Studio is currently in early development. Features are incomplete, APIs may change, and bugs are expected. Do not use for production regulatory submissions.

Always validate all outputs with qualified regulatory professionals before submission to regulatory authorities.


See It in Action

Select your CDISC standard and open your study data:

Welcome Screen

Automatic domain discovery with intelligent column mapping:

Column Mapping

Built-in validation against CDISC standards:

Validation



Key Features

| Feature | Description |
|---|---|
| Multi-format Output | XPT V5/V8, Dataset-XML, Define-XML 2.1 |
| Intelligent Mapping | Fuzzy matching for automatic column-to-variable mapping |
| CDISC Validation | Built-in controlled terminology validation |
| Cross-platform | Native GUI for macOS, Windows, and Linux |
| Offline Operation | All CDISC standards embedded locally |

Supported Standards

Currently Supported:

  • SDTM-IG v3.4
  • Controlled Terminology (2024-2025 versions)

Planned:

  • ADaM-IG v1.3
  • SEND-IG v3.1.1

Getting Help


License

Trial Submission Studio is open source software licensed under the MIT License.


Built with Rust and egui

Installation

Download the latest release for your platform from our GitHub Releases page.

Download Options

| Platform | Architecture | Format | Download |
|---|---|---|---|
| macOS | Apple Silicon (M1/M2/M3+) | .dmg or .zip | Download |
| macOS | Intel (x86_64) | .dmg or .zip | Download |
| Windows | x86_64 (64-bit) | .zip | Download |
| Windows | ARM64 | .zip | Download |
| Linux | x86_64 (64-bit) | .tar.gz | Download |

Verifying Your Download

Each release includes SHA256 checksum files (.sha256) for security verification.

macOS/Linux

# Download the checksum file and binary, then verify
shasum -a 256 -c trial-submission-studio-*.sha256

Windows (PowerShell)

# Compute the hash, then compare it to the value in the .sha256 file
Get-FileHash trial-submission-studio-*.zip -Algorithm SHA256
Get-Content trial-submission-studio-*.zip.sha256

Platform-Specific Instructions

macOS

  1. Download the .dmg file for your architecture
  2. Open the .dmg file
  3. Drag Trial Submission Studio to your Applications folder
  4. On first launch, you may need to right-click and select “Open” to bypass Gatekeeper

Tip

Which version do I need? Open the Apple menu > About This Mac:

  • Chip: Apple M1/M2/M3 → Download the Apple Silicon version
  • Processor: Intel → Download the Intel version

Windows

  1. Download the .zip file for your architecture
  2. Extract the archive to your preferred location
  3. Run trial-submission-studio.exe

Linux

  1. Download the .tar.gz file
  2. Extract: tar -xzf trial-submission-studio-*.tar.gz
  3. Run: ./trial-submission-studio

Uninstalling

Trial Submission Studio is a portable application that does not modify system settings or registry entries.

Windows

  1. Delete the extracted folder containing trial-submission-studio.exe
  2. Optionally delete settings from %APPDATA%\trial-submission-studio\

macOS

  1. Drag Trial Submission Studio from Applications to Trash
  2. Optionally delete settings from ~/Library/Application Support/trial-submission-studio/

Linux

  1. Delete the AppImage file or extracted folder
  2. Optionally delete settings from ~/.config/trial-submission-studio/

Next Steps

Quick Start Guide

Get up and running with Trial Submission Studio in 5 minutes.

Overview

This guide walks you through the basic workflow:

  1. Import your source CSV data
  2. Map columns to SDTM variables
  3. Validate against CDISC standards
  4. Export to XPT format

Step 1: Launch the Application

After installing Trial Submission Studio, launch the application:

  • macOS: Open from Applications folder
  • Windows: Run trial-submission-studio.exe
  • Linux: Run ./trial-submission-studio

You’ll see the welcome screen where you can select your CDISC standard:

Welcome Screen


Step 2: Import Your Data

  1. Click Open Study Folder and select your data folder
  2. Trial Submission Studio will automatically:
    • Detect column types
    • Identify potential SDTM domains
    • Parse date formats

Tip

Your data should have column headers in the first row.


Step 3: Review Discovered Domains

Trial Submission Studio automatically discovers domains from your source data:

Study Overview

  1. Review the list of discovered domains (DM, AE, VS, etc.)
  2. Click on a domain to configure its mappings

Step 4: Map Columns

  1. Review the suggested column mappings
  2. For each source column, select the corresponding SDTM variable
  3. Use the fuzzy matching suggestions to speed up mapping

Column Mapping

The mapping interface shows:

  • Source Column: Your CSV column name
  • Target Variable: The SDTM variable
  • Match Score: Confidence of the suggested mapping (e.g., 93% match)

Step 5: Validate

  1. Switch to the Validation tab to check your data against CDISC rules
  2. Review any validation messages:
    • Errors: Must be fixed before export
    • Warnings: Should be reviewed
    • Info: Informational messages

Validation Results

Each validation issue includes the rule ID, a description, and suggestions on how to fix it.


Step 6: Export

  1. Click Go to Export or navigate to the Export screen
  2. Select which domains to export
  3. Choose your output format:
    • XPT (SAS Transport) (FDA standard)
    • Dataset-XML (CDISC data exchange)
  4. Click Export

Export Settings


Next Steps

Now that you’ve completed the basic workflow:

System Requirements

Trial Submission Studio is designed to run on modern desktop systems with minimal resource requirements.

Supported Platforms

| Platform | Architecture | Minimum Version | Status |
|---|---|---|---|
| macOS | Apple Silicon (M1/M2/M3+) | macOS 11.0 (Big Sur) | Supported |
| macOS | Intel (x86_64) | macOS 10.15 (Catalina) | Supported |
| Windows | x86_64 (64-bit) | Windows 10 | Supported |
| Windows | ARM64 | Windows 11 | Supported |
| Linux | x86_64 (64-bit) | Ubuntu 20.04 or equivalent | Supported |

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| Disk Space | 200 MB | 500 MB |
| Display | 1280x720 | 1920x1080+ |

Software Dependencies

Trial Submission Studio is a standalone application with no external dependencies:

  • No SAS installation required
  • No Java runtime required
  • No internet connection required (works fully offline)
  • All CDISC standards are embedded in the application

Performance Considerations

Large Datasets

Trial Submission Studio can handle datasets with:

  • Hundreds of thousands of rows
  • Hundreds of columns

For very large datasets (1M+ rows), consider:

  • Ensuring adequate RAM (8GB+)
  • Using SSD storage for faster I/O
  • Processing data in batches if needed

Memory Usage

Memory usage scales with dataset size. Approximate guidelines:

  • Small datasets (<10,000 rows): ~100 MB RAM
  • Medium datasets (10,000-100,000 rows): ~500 MB RAM
  • Large datasets (100,000+ rows): 1+ GB RAM

Troubleshooting

macOS Gatekeeper

On first launch, macOS may block the application. To resolve:

  1. Right-click the application
  2. Select “Open”
  3. Click “Open” in the dialog

Linux Permissions

Ensure the executable has run permissions:

chmod +x trial-submission-studio

Windows SmartScreen

If Windows SmartScreen blocks the application:

  1. Click “More info”
  2. Click “Run anyway”

Next Steps

Building from Source

For developers who want to compile Trial Submission Studio from source code.

Prerequisites

Required

  • Rust 1.92+ - Install via rustup
  • Git - For cloning the repository

Platform-Specific Dependencies

macOS

No additional dependencies required.

Linux (Ubuntu/Debian)

sudo apt-get install libgtk-3-dev libxdo-dev

Windows

No additional dependencies required.

Clone the Repository

git clone https://github.com/rubentalstra/trial-submission-studio.git
cd trial-submission-studio

Verify Rust Version

rustup show

Ensure you have Rust 1.92 or higher. To update:

rustup update stable

Build

Debug Build (faster compilation)

cargo build

Release Build (optimized, slower compilation)

cargo build --release

Run

Debug

cargo run --package tss-gui

Release

cargo run --release --package tss-gui

Or run the compiled binary directly:

./target/release/tss-gui        # macOS/Linux
.\target\release\tss-gui.exe    # Windows

Run Tests

# All tests
cargo test

# Specific crate
cargo test --package xport

# With output
cargo test -- --nocapture

Run Lints

# Format check
cargo fmt --check

# Clippy lints
cargo clippy -- -D warnings

Project Structure

Trial Submission Studio is organized as a Rust workspace with multiple crates:

trial-submission-studio/
├── crates/
│   ├── tss-gui/          # Desktop application
│   ├── xport/            # XPT file I/O
│   ├── tss-validate/     # CDISC validation
│   ├── tss-map/          # Column mapping
│   ├── tss-normalization/    # Data transformations
│   ├── tss-ingest/       # CSV loading
│   ├── tss-output/       # Multi-format export
│   ├── tss-standards/    # CDISC standards loader
│   ├── tss-model/        # Core types + Polars utilities
│   └── tss-updater/      # Update mechanism
├── standards/            # Embedded CDISC standards
├── mockdata/             # Test datasets
└── docs/                 # Documentation (this site)

Third-Party Licenses

When adding or updating dependencies, regenerate the licenses file:

# Install cargo-about (one-time)
cargo install cargo-about

# Generate licenses
cargo about generate about.hbs -o THIRD_PARTY_LICENSES.md

IDE Setup

RustRover / IntelliJ IDEA

  1. Open the project folder
  2. The Rust plugin will detect the workspace automatically

VS Code

  1. Install the rust-analyzer extension
  2. Open the project folder

Next Steps

Interface Overview

Trial Submission Studio features a clean, intuitive interface designed for clinical data programmers.

Welcome Screen

When you first launch the application, you’ll see the welcome screen where you can select your target CDISC standard and open a study folder:

Welcome Screen

Study Overview

After opening a study folder, Trial Submission Studio automatically discovers domains from your source data:

Study Overview

Main Window Layout

The application is organized into several key areas:

┌─────────────────────────────────────────────────────────────┐
│  Menu Bar                                                    │
├─────────────────────────────────────────────────────────────┤
│  Toolbar                                                     │
├──────────────────┬──────────────────────────────────────────┤
│                  │                                           │
│  Navigation      │  Main Content Area                        │
│  Panel           │                                           │
│                  │  - Data Preview                           │
│  - Import        │  - Mapping Interface                      │
│  - Mapping       │  - Validation Results                     │
│  - Validation    │  - Export Options                         │
│  - Export        │                                           │
│                  │                                           │
├──────────────────┴──────────────────────────────────────────┤
│  Status Bar                                                  │
└─────────────────────────────────────────────────────────────┘

File Menu

  • Import CSV - Load source data
  • Export - Save to XPT/XML formats
  • Recent Files - Quick access to recent projects
  • Exit - Close the application

Edit Menu

  • Undo/Redo - Reverse or repeat actions
  • Preferences - Application settings

Help Menu

  • Documentation - Open this documentation
  • About - Version and license information
  • Third-Party Licenses - Dependency attributions

About Dialog

Toolbar

Quick access to common actions:

  • Import - Load CSV file
  • Validate - Run validation checks
  • Export - Save output files

Navigation Panel

The left sidebar provides step-by-step workflow navigation:

  1. Import - Load and preview source data
  2. Domain - Select target SDTM domain
  3. Mapping - Map columns to variables
  4. Validation - Review validation results
  5. Export - Generate output files

Main Content Area

The central area displays context-sensitive content based on the current workflow step:

Import View

  • File selection
  • Data preview table
  • Column type detection
  • Schema information

Mapping View

  • Source columns list
  • Target variables list
  • Mapping connections
  • Match confidence scores

Validation View

  • Validation rule results
  • Error/warning/info messages
  • Affected rows and columns
  • Suggested fixes

Validation View

Preview View

Preview your SDTM-compliant data before export:

SDTM Preview

Export View

  • Format selection
  • Output options
  • File destination
  • Progress indicator

Status Bar

The bottom bar displays:

  • Current file name
  • Row/column counts
  • Validation status
  • Progress for long operations

Keyboard Shortcuts

| Action | macOS | Windows/Linux |
|---|---|---|
| Import | ⌘O | Ctrl+O |
| Export | ⌘E | Ctrl+E |
| Validate | ⌘R | Ctrl+R |
| Undo | ⌘Z | Ctrl+Z |
| Redo | ⌘⇧Z | Ctrl+Shift+Z |
| Preferences | ⌘, | Ctrl+, |
| Quit | ⌘Q | Alt+F4 |

Themes

Trial Submission Studio supports light and dark themes. Change via: Edit → Preferences → Appearance

Next Steps

Importing Data

Trial Submission Studio accepts CSV files as input and automatically detects schema information.

Supported Input Format

Currently, Trial Submission Studio supports:

  • CSV files (.csv)
  • UTF-8 or ASCII encoding
  • Comma-separated values
  • Headers in first row

Import Methods

Drag and Drop

Simply drag a CSV file from your file manager and drop it onto the application window.

File Menu

  1. Click File → Import CSV
  2. Navigate to your file
  3. Click Open

Toolbar Button

Click the Import button in the toolbar.

Automatic Detection

When you import a file, Trial Submission Studio automatically:

Column Type Detection

Analyzes sample values to determine:

  • Numeric - Integer or floating-point numbers
  • Date/Time - Various date formats
  • Text - Character strings

Domain Suggestion

Based on column names, suggests likely SDTM domains:

  • USUBJID, AGE, SEX → Demographics (DM)
  • AETERM, AESTDTC → Adverse Events (AE)
  • VSTESTCD, VSSTRESN → Vital Signs (VS)

Date Format Detection

Automatically recognizes common date formats:

  • ISO 8601: 2024-01-15
  • US format: 01/15/2024
  • EU format: 15-01-2024
  • With time: 2024-01-15T09:30:00

Data Preview

After import, you’ll see:

Data Grid

  • First 100 rows displayed
  • Scroll to view more data
  • Column headers with detected types

Summary Panel

  • Total row count
  • Total column count
  • File size
  • Encoding detected

Column Information

  • Column name
  • Detected type
  • Sample values
  • Null count

Handling Issues

Encoding Problems

If you see garbled characters:

  1. Ensure your file is UTF-8 encoded
  2. Re-save from your source application with UTF-8 encoding

Missing Headers

If your CSV lacks headers:

  1. Add a header row to your file
  2. Re-import

Large Files

For files with millions of rows:

  • Import may take longer
  • A progress indicator will show status
  • Consider splitting into smaller files if needed

Best Practices

  1. Clean your data before import

    • Remove trailing whitespace
    • Standardize date formats
    • Check for encoding issues
  2. Use descriptive column names

    • Helps with automatic mapping suggestions
    • Use SDTM-like naming when possible
  3. Include all required data

    • USUBJID for subject identification
    • Domain-specific required variables

Next Steps

Column Mapping

The mapping interface helps you connect your source CSV columns to SDTM variables.

Mapping Interface

Overview

Column mapping is a critical step that defines how your source data transforms into SDTM-compliant output.

flowchart LR
    subgraph Source[Source CSV]
        S1[SUBJ_ID]
        S2[PATIENT_AGE]
        S3[GENDER]
        S4[VISIT_DATE]
    end

    subgraph Mapping[Fuzzy Matching]
        M[Match<br/>Algorithm]
    end

    subgraph Target[SDTM Variables]
        T1[USUBJID]
        T2[AGE]
        T3[SEX]
        T4[RFSTDTC]
    end

    S1 --> M --> T1
    S2 --> M --> T2
    S3 --> M --> T3
    S4 --> M --> T4
    style M fill: #4a90d9, color: #fff

The Mapping Interface

┌─────────────────────────────────────────────────────────────┐
│ Source Columns          │  Target Variables                 │
├─────────────────────────┼───────────────────────────────────┤
│ SUBJ_ID         ────────│──▶  USUBJID                       │
│ PATIENT_AGE     ────────│──▶  AGE                           │
│ GENDER          ────────│──▶  SEX                           │
│ VISIT_DATE      ────────│──▶  RFSTDTC                       │
│ RACE_DESC       ────────│──▶  RACE                          │
│ [Unmapped]              │     ETHNIC (Required)             │
└─────────────────────────┴───────────────────────────────────┘

Automatic Mapping

Trial Submission Studio uses fuzzy matching to suggest mappings:

How It Works

  1. Analyzes source column names
  2. Compares against SDTM variable names
  3. Calculates similarity scores
  4. Suggests best matches

Match Confidence

  • High (>80%) - Strong name similarity, auto-accepted
  • Medium (50-80%) - Review recommended
  • Low (<50%) - Manual mapping needed
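As a rough illustration of how such a score might be computed, here is a string-similarity sketch using Python's difflib. This is not the application's actual matcher; pure name similarity alone would not recognize pairs like GENDER → SEX, which presumably also rely on synonym lists:

```python
from difflib import SequenceMatcher

def match_score(source: str, target: str) -> float:
    """Name-similarity score between a source column and an SDTM variable (0-100)."""
    return 100.0 * SequenceMatcher(None, source.upper(), target.upper()).ratio()
```

Here SUBJECT_ID scores much higher against USUBJID than against unrelated variables, which is the behavior the confidence tiers above describe.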

Example Matches

| Source Column | Suggested Variable | Confidence |
|---|---|---|
| SUBJECT_ID | USUBJID | 85% |
| AGE | AGE | 100% |
| GENDER | SEX | 75% |
| VSTESTVAL | VSSTRESN | 70% |

Manual Mapping

To Map a Column

  1. Click on the source column
  2. Click on the target variable
  3. A connection line appears

To Unmap a Column

  1. Click on the connection line
  2. Or right-click and select “Remove Mapping”

To Change a Mapping

  1. Remove the existing mapping
  2. Create a new mapping

Required vs Optional Variables

Required Variables

Shown with a red indicator. Must be mapped for valid output:

  • STUDYID - Study identifier
  • DOMAIN - Domain abbreviation
  • USUBJID - Unique subject identifier

Optional Variables

Shown without indicator. Map if data is available.

Expected Variables

Shown with yellow indicator. Expected for the domain but not strictly required.

Data Type Considerations

The mapping interface warns about type mismatches:

| Warning | Description |
|---|---|
| Type Mismatch | Source is text, target is numeric |
| Length Exceeded | Source values exceed SDTM length limits |
| Format Warning | Date format needs conversion |

Controlled Terminology

For variables with controlled terminology:

  • The interface shows valid values
  • Warns if source values don’t match
  • Suggests value mappings

CT Normalization

The Transform tab allows you to normalize values to CDISC Controlled Terminology:

CT Normalization

Values are automatically transformed to their standardized form (e.g., “Years” → “YEARS”).

Supplemental Qualifiers (SUPP)

For non-standard variables that need to be captured as supplemental qualifiers, use the SUPP tab:

SUPP Configuration

Configure QNAM, QLABEL, QORIG, and QEVAL for each supplemental qualifier variable.

Mapping Templates

Save a Template

  1. Complete your mappings
  2. File → Save Mapping Template
  3. Name your template

Load a Template

  1. Import your data
  2. File → Load Mapping Template
  3. Select the template
  4. Review and adjust as needed

Best Practices

  1. Review all automatic mappings - Don’t blindly accept
  2. Map required variables first - Ensure compliance
  3. Check controlled terminology - Validate allowed values
  4. Save templates - Reuse for similar datasets

Next Steps

Validation

Trial Submission Studio validates your data against CDISC standards before export.

Validation Results

Validation Overview

flowchart LR
    subgraph Input
        DATA[Mapped Data]
    end

    subgraph Checks
        STRUCT[Structure<br/>Required variables]
        CT[Terminology<br/>Codelist values]
        CROSS[Cross-Domain<br/>Consistency]
    end

    subgraph Output
        ERR[Errors]
        WARN[Warnings]
        INFO[Info]
    end

    DATA --> STRUCT --> CT --> CROSS
    STRUCT --> ERR
    CT --> WARN
    CROSS --> INFO
    style ERR fill: #f8d7da, stroke: #721c24
    style WARN fill: #fff3cd, stroke: #856404
    style INFO fill: #d1ecf1, stroke: #0c5460

Validation checks ensure your data:

  • Conforms to SDTM structure
  • Uses correct controlled terminology
  • Meets FDA submission requirements

Running Validation

Automatic Validation

Validation runs automatically when you:

  • Complete column mapping
  • Make changes to mappings
  • Prepare for export

Manual Validation

Click Validate in the toolbar or press Ctrl+R (⌘R on macOS).

Validation Results

Result Categories

| Category | Icon | Description |
|---|---|---|
| Error | Red | Must be fixed before export |
| Warning | Yellow | Should be reviewed |
| Info | Blue | Informational, no action required |

Results Panel

┌─────────────────────────────────────────────────────────────┐
│ Validation Results                           [✓] [⚠] [ℹ]   │
├─────────────────────────────────────────────────────────────┤
│ ❌ SD0001: USUBJID is required but not mapped               │
│    Rows affected: All                                        │
│    Fix: Map a column to USUBJID                             │
├─────────────────────────────────────────────────────────────┤
│ ⚠️ CT0015: Value "M" not in SEX codelist                    │
│    Rows affected: 45, 67, 89                                │
│    Expected: MALE, FEMALE, UNKNOWN                          │
├─────────────────────────────────────────────────────────────┤
│ ℹ️ INFO: 1250 rows will be exported                         │
└─────────────────────────────────────────────────────────────┘

Validation Rules

Structural Rules

| Rule ID | Description |
|---|---|
| SD0001 | Required variable missing |
| SD0002 | Invalid variable name |
| SD0003 | Variable length exceeded |
| SD0004 | Invalid data type |

Controlled Terminology Rules

| Rule ID | Description |
|---|---|
| CT0001 | Value not in codelist |
| CT0002 | Codelist not found |
| CT0003 | Invalid date format |

Cross-Domain Rules

| Rule ID | Description |
|---|---|
| XD0001 | USUBJID not consistent |
| XD0002 | Missing parent record |
| XD0003 | Duplicate keys |

Fixing Validation Errors

Mapping Errors

  1. Click on the error message
  2. The relevant mapping is highlighted
  3. Adjust the mapping or source data

Data Errors

  1. Note the affected rows
  2. Correct the source data
  3. Re-import and re-validate

Terminology Errors

  1. Review the expected values
  2. Map source values to controlled terms
  3. Or update source data to use standard terms

Controlled Terminology Validation

Supported Codelists

Trial Submission Studio includes embedded controlled terminology:

  • CDISC CT 2025-09-26 (latest)
  • CDISC CT 2025-03-28
  • CDISC CT 2024-03-29

Codelist Validation

For variables like SEX, RACE, COUNTRY:

  • Source values are checked against valid terms
  • Invalid values are flagged
  • Suggestions for correct values are provided
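The row-level check behind messages like CT0015 can be sketched as a membership test that records the offending row numbers. This is a simplified illustration, not the validator's actual implementation:

```python
def invalid_rows(values: list[str], codelist: set[str]) -> list[int]:
    """Return 1-based row numbers whose value is not in the codelist."""
    return [row for row, value in enumerate(values, start=1) if value not in codelist]
```

For a SEX codelist of MALE/FEMALE/UNKNOWN, a row containing "M" would be flagged with its row number, matching the style of the results panel shown earlier.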

Validation Reports

Export Validation Report

  1. Complete validation
  2. File → Export Validation Report
  3. Choose format (PDF, HTML, CSV)
  4. Save the report

Report Contents

  • Summary statistics
  • All validation messages
  • Affected data rows
  • Recommendations

Best Practices

  1. Validate early and often - Fix issues as you go
  2. Address errors first - Then warnings
  3. Document exceptions - If warnings are intentional
  4. Keep validation reports - For audit trails

Next Steps

Exporting Data

After mapping and validation, export your data to CDISC-compliant formats.

Export Dialog

Export Formats

Trial Submission Studio supports multiple output formats:

| Format | Version | Description | Use Case |
|---|---|---|---|
| XPT | V5 | SAS Transport (FDA standard) | FDA submissions |
| XPT | V8 | Extended SAS Transport | Longer names/labels |
| Dataset-XML | 1.0 | CDISC XML format | Data exchange |
| Define-XML | 2.1 | Metadata documentation | Submission package |

XPT Export

XPT Version 5 (Default)

The FDA standard format with these constraints:

  • Variable names: 8 characters max
  • Labels: 40 characters max
  • Compatible with SAS V5 Transport
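A pre-export check for these limits can be sketched as follows (illustrative only; the exporter performs its own checks):

```python
def xpt_v5_issues(name: str, label: str) -> list[str]:
    """Flag XPT V5 limit violations: 8-character names, 40-character labels."""
    issues = []
    if len(name) > 8:
        issues.append(f"variable name '{name}' exceeds 8 characters")
    if len(label) > 40:
        issues.append(f"label for '{name}' exceeds 40 characters")
    return issues
```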

XPT Version 8

Extended format supporting:

  • Variable names: 32 characters
  • Labels: 256 characters
  • Note: Not all systems support V8

Export Steps

  1. Click Export in the toolbar
  2. Select XPT V5 or XPT V8
  3. Choose output location
  4. Click Save

XPT Options

| Option | Description |
|---|---|
| Include all variables | Export mapped and derived variables |
| Sort by keys | Order rows by key variables |
| Compress | Reduce file size |

Dataset-XML Export

CDISC ODM-based XML format for data exchange.

Features

  • Human-readable format
  • Full Unicode support
  • Metadata included
  • Schema validation

Export Steps

  1. Click Export
  2. Select Dataset-XML
  3. Configure options
  4. Click Save

Define-XML Export

Generate submission metadata documentation.

Define-XML 2.1

  • Dataset definitions
  • Variable metadata
  • Controlled terminology
  • Computational methods
  • Value-level metadata

Export Steps

  1. Click Export
  2. Select Define-XML
  3. Review metadata
  4. Click Save

Batch Export

Export multiple domains at once:

  1. File → Batch Export
  2. Select domains to export
  3. Choose format(s)
  4. Set output directory
  5. Click Export All

Export Validation

Before export completes, the system verifies:

  • All required variables are present
  • Data types are correct
  • Lengths don’t exceed limits
  • Controlled terms are valid

Output Files

File Naming

Default naming convention:

  • {domain}.xpt - e.g., dm.xpt, ae.xpt
  • {domain}.xml - for Dataset-XML
  • define.xml - for Define-XML

Checksums

Each export generates:

  • SHA256 checksum file (.sha256)
  • Useful for submission verification
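The checksum can be reproduced independently; for example, a small Python sketch of the same SHA256 computation:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the hex SHA256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this digest against the contents of the generated .sha256 file confirms the export was not corrupted in transit.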

Quality Checks

Post-Export Verification

  1. Open the exported file in a viewer
  2. Verify row counts match
  3. Check variable order
  4. Review sample values

External Validation

Consider validating with:

  • Pinnacle 21 Community
  • SAS (if available)
  • Other CDISC validators

Best Practices

  1. Validate before export - Fix all errors first
  2. Use XPT V5 for FDA - Standard format
  3. Generate checksums - For integrity verification
  4. Test with validators - Confirm compliance
  5. Keep source files - Maintain audit trail

Troubleshooting

Export Fails

| Issue | Solution |
|---|---|
| Validation errors | Fix errors before export |
| Disk full | Free up space |
| Permission denied | Check write permissions |
| File in use | Close file in other apps |

Output Issues

| Issue | Solution |
|---|---|
| Truncated values | Check length limits |
| Missing data | Verify mappings |
| Wrong encoding | Ensure UTF-8 source |

Next Steps

Common Workflows

Step-by-step guides for typical Trial Submission Studio use cases.

Workflow Overview

flowchart LR
    subgraph "1. Import"
        A[Load CSV]
    end

    subgraph "2. Configure"
        B[Select Domain]
        C[Map Columns]
    end

    subgraph "3. Quality"
        D[Handle CT]
        E[Validate]
    end

    subgraph "4. Output"
        F[Export XPT]
    end

    A --> B --> C --> D --> E --> F
    E -.->|Fix Issues| C
    style A fill: #e8f4f8, stroke: #333
    style F fill: #d4edda, stroke: #333

Workflow 1: Demographics (DM) Domain

Transform demographics source data to SDTM DM domain.

Source Data Example

SUBJECT_ID,AGE,SEX,RACE,ETHNIC,COUNTRY,SITE_ID
SUBJ001,45,Male,WHITE,NOT HISPANIC,USA,101
SUBJ002,38,Female,ASIAN,NOT HISPANIC,USA,102
SUBJ003,52,Male,BLACK,HISPANIC,USA,101

Steps

  1. Import the CSV

    • File → Import CSV
    • Select your demographics file
  2. Select DM Domain

    • Click on “Domain Selection”
    • Choose “DM - Demographics”
  3. Map Columns

    | Source | Target | Notes |
    |---|---|---|
    | SUBJECT_ID | USUBJID | Subject identifier |
    | AGE | AGE | Age in years |
    | SEX | SEX | Maps to controlled terminology |
    | RACE | RACE | Controlled terminology |
    | ETHNIC | ETHNIC | Controlled terminology |
    | COUNTRY | COUNTRY | ISO 3166 codes |
    | SITE_ID | SITEID | Site identifier |
  4. Handle Controlled Terminology

    • “Male” → “M” (or keep if using extensible CT)
    • “Female” → “F”
    • Review RACE and ETHNIC values
  5. Validate

    • Click Validate
    • Address any errors
  6. Export

    • Export → XPT V5
    • Save as dm.xpt

Workflow 2: Adverse Events (AE) Domain

Transform adverse event data to SDTM AE domain.

Source Data Example

SUBJECT_ID,AE_TERM,START_DATE,END_DATE,SEVERITY,SERIOUS
SUBJ001,Headache,2024-01-15,2024-01-17,MILD,N
SUBJ001,Nausea,2024-02-01,,MODERATE,N
SUBJ002,Rash,2024-01-20,2024-01-25,SEVERE,Y

Steps

  1. Import CSV

  2. Select AE Domain

  3. Map Columns

    | Source | Target | Notes |
    |---|---|---|
    | SUBJECT_ID | USUBJID | |
    | AE_TERM | AETERM | Verbatim term |
    | START_DATE | AESTDTC | Start date |
    | END_DATE | AEENDTC | End date (can be blank) |
    | SEVERITY | AESEV | Controlled terminology |
    | SERIOUS | AESER | Y/N |
  4. Derive Required Variables

    • AESEQ (sequence number) - auto-generated
    • AEDECOD (dictionary term) - if available
  5. Validate and Export
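AESEQ generation can be pictured as a per-subject counter over the records in order (a sketch of the concept; the application derives this automatically):

```python
from collections import defaultdict

def derive_aeseq(usubjids: list[str]) -> list[int]:
    """Assign a 1-based sequence number within each subject, in record order."""
    counters = defaultdict(int)
    sequence = []
    for subject in usubjids:
        counters[subject] += 1
        sequence.append(counters[subject])
    return sequence
```

For the sample data above, SUBJ001's two events get AESEQ 1 and 2, and SUBJ002's event gets AESEQ 1.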


Workflow 3: Vital Signs (VS) Domain

Transform vital signs measurements to SDTM VS domain.

Source Data Example

SUBJECT_ID,VISIT,TEST,RESULT,UNIT,DATE
SUBJ001,BASELINE,SYSBP,120,mmHg,2024-01-10
SUBJ001,BASELINE,DIABP,80,mmHg,2024-01-10
SUBJ001,WEEK 4,SYSBP,118,mmHg,2024-02-07

Steps

  1. Import CSV

  2. Select VS Domain

  3. Map Columns

    | Source | Target | Notes |
    |---|---|---|
    | SUBJECT_ID | USUBJID | |
    | VISIT | VISIT | Visit name |
    | TEST | VSTESTCD | Test code |
    | RESULT | VSSTRESN | Numeric result |
    | UNIT | VSSTRESU | Result unit |
    | DATE | VSDTC | Collection date |
  4. Map Test Codes

    • SYSBP → Systolic Blood Pressure
    • DIABP → Diastolic Blood Pressure
  5. Validate and Export


Workflow 4: Batch Processing

Process multiple domains from one source file.

Source Data

A comprehensive dataset with columns for multiple domains.

Steps

  1. Import the source file
  2. Process each domain
    • Filter relevant columns
    • Map to domain variables
    • Validate
  3. Batch Export
    • File → Batch Export
    • Select all processed domains
    • Export to output folder

Workflow 5: Re-processing with Template

Use a saved mapping template for similar data.

Steps

  1. First Time Setup

    • Import sample data
    • Create mappings
    • Save template: File → Save Mapping Template
  2. Subsequent Processing

    • Import new data (same structure)
    • Load template: File → Load Mapping Template
    • Review and adjust if needed
    • Validate and export

Tips for All Workflows

Before You Start

  • Review source data quality
  • Identify required variables
  • Prepare controlled terminology mappings

During Processing

  • Validate after each major step
  • Document any decisions
  • Keep notes on exceptions

After Export

  • Verify output files
  • Run external validation
  • Archive source and output files

Next Steps

Troubleshooting

Common issues and their solutions when using Trial Submission Studio.

Import Issues

File Won’t Import

| Symptom | Cause | Solution |
|---|---|---|
| “Invalid file format” | Not a CSV file | Ensure file is CSV format |
| “Encoding error” | Non-UTF8 encoding | Re-save as UTF-8 |
| “No data found” | Empty file or wrong delimiter | Check file contents |
| “Parse error” | Malformed CSV | Fix CSV structure |

Data Appears Garbled

Cause: Encoding mismatch

Solution:

  1. Open the file in a text editor
  2. Save with UTF-8 encoding
  3. Re-import

Missing Columns

Cause: Header row issues

Solution:

  1. Verify first row contains headers
  2. Check for BOM (byte order mark) issues
  3. Remove hidden characters

Mapping Issues

No Suggested Mappings

Cause: Column names don’t match SDTM variables

Solution:

  1. Manually map columns
  2. Consider renaming source columns
  3. Create a mapping template for reuse

Wrong Automatic Mappings

Cause: Fuzzy matching misidentified variables

Solution:

  1. Review all automatic mappings
  2. Manually correct incorrect mappings
  3. Adjust match confidence threshold in settings

Can’t Map Required Variable

Cause: Source data missing required information

Solution:

  1. Add the missing data to source file
  2. Derive from other columns if possible
  3. Consult with data manager

Validation Issues

Too Many Errors

Cause: Data quality issues or incorrect mappings

Solution:

  1. Address errors in priority order
  2. Fix mapping issues first
  3. Clean source data if needed
  4. Re-validate after each fix

Controlled Terminology Errors

Cause: Values don’t match CDISC CT

Solution:

  1. Review expected values in the error message
  2. Map source values to standard terms
  3. Update source data if appropriate

Date Format Errors

Cause: Non-ISO date formats

Solution:

  1. Convert dates to ISO 8601 format (YYYY-MM-DD)
  2. Or use partial dates where appropriate (YYYY-MM, YYYY)
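Converting dates in bulk is easy to script. The sketch below tries a few common source formats (the format list is illustrative; extend it to match your own data) and emits ISO 8601:

```python
from datetime import datetime

# Illustrative source formats; extend this tuple to match your data.
SOURCE_FORMATS = ("%d-%b-%Y", "%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d")

def to_iso8601(value: str) -> str:
    """Convert a source date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")
```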

Export Issues

Export Fails

| Error | Cause | Solution |
|---|---|---|
| “Validation errors exist” | Unresolved errors | Fix all errors first |
| “Permission denied” | No write access | Check folder permissions |
| “Disk full” | Insufficient space | Free up disk space |
| “File in use” | File open elsewhere | Close the file in other apps |

Truncated Data in XPT

Cause: Values exceed XPT limits

Solution:

  1. Note the limit: XPT V5 allows at most 200 characters per value
  2. Check variable lengths before export
  3. Use XPT V8 if longer values are required
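Before exporting, you can scan the source CSV for values that would be truncated. A standard-library-only sketch (the 200-byte limit matches XPT V5):

```python
import csv

XPT_V5_MAX = 200  # XPT V5 limit, in bytes, per character value

def over_limit_columns(csv_path: str, limit: int = XPT_V5_MAX) -> dict:
    """Map each offending column to the length of its longest value."""
    longest: dict = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col, val in row.items():
                size = len((val or "").encode("utf-8"))
                if size > longest.get(col, 0):
                    longest[col] = size
    return {col: size for col, size in longest.items() if size > limit}
```

Any column this reports needs shortening, splitting, or an XPT V8 export.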

Missing Variables in Output

Cause: Variables not mapped or derived

Solution:

  1. Verify all required mappings
  2. Check if derived variables were created
  3. Review export settings

Performance Issues

Slow Import

Cause: Large file size

Solution:

  1. Allow time for large files
  2. Consider splitting into smaller files
  3. Close other applications
  4. Increase available RAM

Application Freezes

Cause: Processing large datasets

Solution:

  1. Wait for operation to complete
  2. Check progress indicator
  3. If unresponsive after 5+ minutes, restart
  4. Process smaller datasets

High Memory Usage

Cause: Large dataset in memory

Solution:

  1. Close unused files
  2. Process one domain at a time
  3. Restart application to free memory

Application Issues

Application Won’t Start

macOS:

# If blocked by Gatekeeper
xattr -d com.apple.quarantine /Applications/Trial\ Submission\ Studio.app

Linux:

# Ensure executable permission
chmod +x trial-submission-studio

Windows:

  • Run as administrator
  • Check antivirus isn’t blocking

Crashes on Startup

Solution:

  1. Delete configuration files:
    • macOS: ~/Library/Application Support/trial-submission-studio/
    • Windows: %APPDATA%\trial-submission-studio\
    • Linux: ~/.config/trial-submission-studio/
  2. Reinstall the application

Settings Not Saved

Cause: Permission issues

Solution:

  1. Ensure write access to config directory
  2. Run application with appropriate permissions

Getting Help

Collect Information

Before reporting an issue, gather:

  1. Application version (Help → About)
  2. Operating system and version
  3. Steps to reproduce
  4. Error messages (screenshots)
  5. Sample data (anonymized)

Report an Issue

  1. Check existing issues
  2. Create a new issue
  3. Include collected information

Community Support


Quick Reference

Keyboard Shortcuts for Recovery

| Action | Windows/Linux | macOS |
|---|---|---|
| Force quit | Alt+F4 | ⌘Q |
| Cancel operation | Esc | Esc |
| Undo | Ctrl+Z | ⌘Z |

Log Files

Application logs are located at:

  • macOS: ~/Library/Logs/trial-submission-studio/
  • Windows: %LOCALAPPDATA%\trial-submission-studio\logs\
  • Linux: ~/.local/share/trial-submission-studio/logs/

Include relevant log excerpts when reporting issues.

CDISC Standards Overview

Trial Submission Studio supports CDISC (Clinical Data Interchange Standards Consortium) standards for regulatory submissions.

What is CDISC?

CDISC develops global data standards that streamline clinical research and enable connections to healthcare. These standards are required by regulatory agencies including the FDA and PMDA.

Supported Standards

Currently Implemented

| Standard | Version | Status |
|---|---|---|
| SDTM-IG | 3.4 | Supported |
| Controlled Terminology | 2024-2025 | Supported |

Planned Support

| Standard | Version | Status |
|---|---|---|
| ADaM-IG | 1.3 | Planned |
| SEND-IG | 3.1.1 | Planned |

SDTM (Study Data Tabulation Model)

SDTM is the standard structure for submitting study data to regulatory authorities.

Key Concepts

  • Domains: Logical groupings of data (e.g., Demographics, Adverse Events)
  • Variables: Individual data elements within domains
  • Controlled Terminology: Standardized values for specific variables

Learn More

Controlled Terminology

CDISC Controlled Terminology (CT) provides standardized values for SDTM variables.

Embedded Versions

Trial Submission Studio includes the following CT packages:

  • CDISC CT 2025-09-26 (latest)
  • CDISC CT 2025-03-28
  • CDISC CT 2024-03-29

Learn More

ADaM (Analysis Data Model)

ADaM is the standard for analysis-ready datasets derived from SDTM.

Note

ADaM support is planned for a future release.

SEND (Standard for Exchange of Nonclinical Data)

SEND is SDTM for nonclinical (animal) studies.

Note

SEND support is planned for a future release.

FDA Requirements

Electronic Submissions

The FDA requires CDISC standards for:

  • New Drug Applications (NDA)
  • Biologics License Applications (BLA)
  • Abbreviated New Drug Applications (ANDA)

Study Data Technical Conformance Guide

Trial Submission Studio aligns with FDA’s Study Data Technical Conformance Guide requirements:

  • XPT V5 format
  • Define-XML 2.1
  • Controlled Terminology validation

Resources

Official CDISC Resources

FDA Resources

Next Steps

SDTM Introduction

The Study Data Tabulation Model (SDTM) is the standard for organizing and formatting human clinical trial data for submission to regulatory authorities.

Purpose

SDTM provides:

  • Consistent structure for clinical trial data
  • Standardized naming conventions
  • Regulatory compliance with FDA requirements
  • Interoperability between systems and organizations

Key Concepts

Domains

SDTM organizes data into domains - logical groupings of related observations:

| Category | Examples |
|---|---|
| Special Purpose | DM (Demographics), CO (Comments), SE (Subject Elements), SV (Subject Visits) |
| Interventions | CM (Concomitant Meds), EX (Exposure), SU (Substance Use) |
| Events | AE (Adverse Events), DS (Disposition), MH (Medical History) |
| Findings | LB (Labs), VS (Vital Signs), EG (ECG), PE (Physical Exam) |

Variables

Each domain contains variables - individual data elements:

| Type | Description | Examples |
|---|---|---|
| Identifier | Subject/study identification | STUDYID, USUBJID, DOMAIN |
| Topic | Focus of the observation | AETERM, VSTEST, LBTEST |
| Timing | When the observation occurred | AESTDTC, VSDTC, VISITNUM |
| Qualifier | Additional context | AESEV, VSPOS, LBORRES |

Controlled Terminology

Many variables require values from controlled terminology (CT):

  • Standardized value lists
  • Ensures consistency across studies
  • Required for regulatory submissions

SDTM Structure

flowchart TB
    subgraph "SDTM Domain Classes"
        direction TB
        SP[Special Purpose<br/>DM, CO, SE, SV]
        INT[Interventions<br/>CM, EX, SU]
        EVT[Events<br/>AE, DS, MH]
        FIND[Findings<br/>LB, VS, EG, PE]
    end

    subgraph "Variable Types"
        ID[Identifiers<br/>STUDYID, USUBJID]
        TOPIC[Topic Variables<br/>--TERM, --TEST]
        TIMING[Timing Variables<br/>--STDTC, --ENDTC]
        QUAL[Qualifiers<br/>--SEV, --RES]
    end

    SP --> ID
    INT --> ID
    EVT --> ID
    FIND --> ID
    ID --> TOPIC
    TOPIC --> TIMING
    TIMING --> QUAL
    style SP fill: #4a90d9, color: #fff
    style INT fill: #50c878, color: #fff
    style EVT fill: #f5a623, color: #fff
    style FIND fill: #9b59b6, color: #fff

General Observation Classes

  1. Interventions: Treatments applied to subjects
  2. Events: Occurrences during study participation
  3. Findings: Observations and test results

Variable Roles

| Role | Purpose | Example |
|---|---|---|
| Identifier | Link records across domains | USUBJID |
| Topic | Describe the observation | AETERM |
| Timing | Capture when | AESTDTC |
| Qualifier | Provide context | AESEV |
| Rule | Link to analysis rules | (via Define-XML) |

Working with SDTM in Trial Submission Studio

Import Flow

  1. Load source CSV data
  2. Select target SDTM domain
  3. Map source columns to SDTM variables
  4. Handle controlled terminology
  5. Validate against SDTM rules
  6. Export to XPT format

Variable Requirements

  • Required: Must be present and populated
  • Expected: Should be present if applicable
  • Permissible: Allowed but not required

Best Practices

  1. Map identifiers first: STUDYID, DOMAIN, USUBJID
  2. Use controlled terminology: For variables requiring CT
  3. Follow naming conventions: Variable names, labels
  4. Validate early: Catch issues before export

SDTM Versions

Trial Submission Studio currently supports:

  • SDTM-IG 3.4 (current FDA standard)

Version History

| Version | Release | Notes |
|---|---|---|
| 3.4 | 2021 | Current FDA standard |
| 3.3 | 2018 | |
| 3.2 | 2013 | |
| 3.1.2 | 2008 | |

Next Steps

SDTM Domains

SDTM organizes clinical trial data into domains based on the type of observation.

Domain Categories

Special Purpose Domains

Core structural domains required for all submissions.

| Domain | Name | Description |
|---|---|---|
| DM | Demographics | Subject demographic information |
| CO | Comments | Free-text comments |
| SE | Subject Elements | Subject milestones |
| SV | Subject Visits | Visits for each subject |
| TA | Trial Arms | Planned study arms |
| TD | Trial Disease | Disease descriptions |
| TE | Trial Elements | Planned protocol elements |
| TI | Trial Inclusion/Exclusion | Eligibility criteria |
| TS | Trial Summary | Study-level parameters |
| TV | Trial Visits | Planned visits |

Interventions Domains

Treatments and substances given to or used by subjects.

| Domain | Name | Description |
|---|---|---|
| CM | Concomitant Medications | Non-study medications |
| EC | Exposure as Collected | Exposure data as collected |
| EX | Exposure | Study treatment exposure |
| PR | Procedures | Non-study procedures |
| SU | Substance Use | Tobacco, alcohol, etc. |

Events Domains

Discrete occurrences during study participation.

| Domain | Name | Description |
|---|---|---|
| AE | Adverse Events | All adverse events |
| CE | Clinical Events | Non-adverse clinical events |
| DS | Disposition | Subject status at milestones |
| DV | Protocol Deviations | Protocol violations |
| HO | Healthcare Encounters | Hospitalizations, ER visits |
| MH | Medical History | Prior conditions |

Findings Domains

Observations and measurements.

| Domain | Name | Description |
|---|---|---|
| DA | Drug Accountability | Drug dispensing/return |
| DD | Death Details | Cause of death details |
| EG | ECG Results | Electrocardiogram data |
| FT | Functional Tests | Functional assessments |
| IE | Inclusion/Exclusion | Subject eligibility |
| IS | Immunogenicity Specimen | Sample assessments |
| LB | Lab Results | Laboratory tests |
| MB | Microbiology Specimen | Microbiology samples |
| MI | Microscopic Findings | Histopathology |
| MK | Musculoskeletal | Musculoskeletal findings |
| MO | Morphology | Imaging morphology |
| MS | Microbiology Susceptibility | Antibiotic susceptibility |
| NV | Nervous System | Neurological findings |
| OE | Ophthalmology | Eye exam results |
| PC | Pharmacokinetics Concentrations | Drug concentrations |
| PE | Physical Exam | Physical examination |
| PP | PK Parameters | Pharmacokinetic parameters |
| QS | Questionnaires | PRO/questionnaire data |
| RE | Respiratory | Pulmonary function |
| RP | Reproductive | Reproductive findings |
| RS | Disease Response | Tumor response |
| SC | Subject Characteristics | Additional demographics |
| SS | Subject Status | Subject enrollment status |
| TR | Tumor/Lesion Results | Tumor measurements |
| TU | Tumor/Lesion Identification | Tumor identification |
| UR | Urinary System | Urological findings |
| VS | Vital Signs | Vital sign measurements |

Common Domain Details

DM - Demographics

Required for all studies. Contains one record per subject.

Key Variables:

  • USUBJID (Unique Subject ID)
  • AGE, AGEU (Age and units)
  • SEX, RACE, ETHNIC
  • ARM, ARMCD (Study arm)
  • RFSTDTC, RFENDTC (Reference dates)
  • COUNTRY, SITEID

AE - Adverse Events

Captures all adverse events during the study.

Key Variables:

  • AETERM (Verbatim term)
  • AEDECOD (Dictionary-coded term)
  • AESTDTC, AEENDTC (Start/end dates)
  • AESEV (Severity)
  • AESER (Serious)
  • AEREL (Relationship to treatment)
  • AEOUT (Outcome)

VS - Vital Signs

Captures vital sign measurements.

Key Variables:

  • VSTESTCD, VSTEST (Test code/name)
  • VSORRES, VSSTRESC, VSSTRESN (Results)
  • VSORRESU, VSSTRESU (Units)
  • VSPOS (Position)
  • VSDTC (Date/time)
  • VISITNUM, VISIT

LB - Laboratory Results

Captures laboratory test results.

Key Variables:

  • LBTESTCD, LBTEST (Test code/name)
  • LBORRES, LBSTRESC, LBSTRESN (Results)
  • LBORRESU, LBSTRESU (Units)
  • LBSPEC (Specimen type)
  • LBDTC (Date/time)
  • LBNRIND (Reference range indicator)

Custom Domains

For data not fitting standard domains, create custom domains:

  • Two-letter code starting with X, Y, or Z
  • Follow general observation class rules
  • Document in Define-XML

Next Steps

SDTM Variables

Variables are the individual data elements within SDTM domains.

Variable Categories

Identifier Variables

Identify the study, subject, and domain.

| Variable | Label | Description |
|---|---|---|
| STUDYID | Study Identifier | Unique study ID |
| DOMAIN | Domain Abbreviation | Two-letter domain code |
| USUBJID | Unique Subject ID | Unique across all studies |
| SUBJID | Subject ID | Subject ID within study |
| SITEID | Study Site Identifier | Site number |

Topic Variables

Describe what was observed.

| Domain | Variable | Description |
|---|---|---|
| AE | AETERM | Adverse event term |
| CM | CMTRT | Medication name |
| LB | LBTEST | Lab test name |
| VS | VSTEST | Vital sign test |

Timing Variables

Capture when observations occurred.

| Variable | Label | Description |
|---|---|---|
| --DTC | Date/Time | ISO 8601 date/time |
| --STDTC | Start Date/Time | Start of observation |
| --ENDTC | End Date/Time | End of observation |
| --DY | Study Day | Study day number |
| VISITNUM | Visit Number | Numeric visit identifier |
| VISIT | Visit Name | Visit label |

Qualifier Variables

Provide additional context.

| Type | Examples | Description |
|---|---|---|
| Grouping | --CAT, --SCAT | Category, subcategory |
| Result | --ORRES, --STRESC | Original/standard result |
| Record | --SEQ, --GRPID | Sequence, grouping |
| Synonym | --DECOD, --MODIFY | Coded/modified terms |

Variable Naming Conventions

Prefix Pattern

Most variables use a domain-specific prefix:

  • AE + TERM = AETERM
  • VS + TESTCD = VSTESTCD
  • LB + ORRES = LBORRES

Common Suffixes

| Suffix | Meaning | Example |
|---|---|---|
| --TESTCD | Test Code | VSTESTCD, LBTESTCD |
| --TEST | Test Name | VSTEST, LBTEST |
| --ORRES | Original Result | VSORRES, LBORRES |
| --ORRESU | Original Units | VSORRESU, LBORRESU |
| --STRESC | Standardized Result (Char) | VSSTRESC |
| --STRESN | Standardized Result (Num) | VSSTRESN |
| --STRESU | Standardized Units | VSSTRESU |
| --STAT | Status | VSSTAT (NOT DONE) |
| --REASND | Reason Not Done | VSREASND |
| --LOC | Location | VSLOC |
| --DTC | Date/Time | VSDTC, AESTDTC |

Data Types

Character Variables

  • Text values
  • Max length: 200 characters (XPT V5)
  • Example: AETERM, VSTEST

Numeric Variables

  • Integer or floating-point
  • Example: AGE, VSSTRESN, LBSTRESN

Date/Time Variables

ISO 8601 format:

  • Full: 2024-01-15T09:30:00
  • Date only: 2024-01-15
  • Partial: 2024-01, 2024
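A regular expression can check that a value has one of the ISO 8601 shapes listed above. This sketch checks shape only (it does not reject impossible dates such as 2024-13-40, and it ignores time zones and fractional seconds):

```python
import re

# Matches the forms above: YYYY, YYYY-MM, YYYY-MM-DD,
# and YYYY-MM-DDTHH:MM[:SS].
ISO8601 = re.compile(
    r"^\d{4}"                    # YYYY
    r"(-\d{2}"                   # -MM
    r"(-\d{2}"                   # -DD
    r"(T\d{2}:\d{2}(:\d{2})?)?"  # THH:MM[:SS]
    r")?)?$"
)

def is_iso8601(value: str) -> bool:
    """True if the value matches an ISO 8601 full or partial date."""
    return bool(ISO8601.match(value))
```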

Variable Requirements

Required Variables

Must be present and populated for every record.

| Domain | Required Variables |
|---|---|
| All | STUDYID, DOMAIN, USUBJID |
| DM | RFSTDTC, RFENDTC, SITEID, ARM, ARMCD |
| AE | AETERM, AEDECOD, AESTDTC |
| VS | VSTESTCD, VSTEST, VSORRES, VSDTC |

Expected Variables

Should be present when applicable.

| Domain | Expected Variables |
|---|---|
| AE | AEENDTC, AESEV, AESER, AEREL |
| VS | VSSTRESN, VSSTRESU, VISITNUM |

Permissible Variables

Can be included if relevant data exists.

Controlled Terminology

Variables requiring controlled terminology:

| Variable | Codelist |
|---|---|
| SEX | Sex |
| RACE | Race |
| ETHNIC | Ethnicity |
| COUNTRY | Country |
| AESEV | Severity |
| AESER | No Yes Response |
| VSTESTCD | Vital Signs Test Code |
| LBTESTCD | Lab Test Code |

Variable Metadata

Label

40 characters max (XPT V5):

  • Descriptive text
  • Example: “Adverse Event Reported Term”

Length

Define appropriate length for each variable:

  • Consider actual data values
  • XPT V5 max: 200 characters

Order

Maintain consistent variable ordering:

  1. Identifier variables
  2. Topic variables
  3. Qualifier variables
  4. Timing variables

Next Steps

SDTM Validation Rules

Trial Submission Studio validates data against SDTM implementation guide rules.

Validation Categories

Structural Validation

Checks data structure and format.

| Rule ID | Description | Severity |
|---|---|---|
| SD0001 | Required variable missing | Error |
| SD0002 | Invalid variable name | Error |
| SD0003 | Variable length exceeded | Error |
| SD0004 | Invalid data type | Error |
| SD0005 | Duplicate records | Warning |
| SD0006 | Invalid domain code | Error |

Content Validation

Checks data values and relationships.

| Rule ID | Description | Severity |
|---|---|---|
| CT0001 | Value not in controlled terminology | Error |
| CT0002 | Invalid date format | Error |
| CT0003 | Date out of valid range | Warning |
| CT0004 | Numeric value out of range | Warning |
| CT0005 | Missing required value | Error |

Cross-Record Validation

Checks relationships between records.

| Rule ID | Description | Severity |
|---|---|---|
| XR0001 | USUBJID not in DM | Error |
| XR0002 | Duplicate key values | Error |
| XR0003 | Missing parent record | Warning |
| XR0004 | Inconsistent dates across domains | Warning |

Common Validation Rules

Identifier Rules

STUDYID

  • Must be present in all records
  • Must be consistent across domains
  • Cannot be null or empty

USUBJID

  • Must be present in all records
  • Must exist in DM domain
  • Must be unique per subject

DOMAIN

  • Must match the domain abbreviation
  • Must be uppercase
  • Must be 2 characters
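The identifier rules above amount to a simple per-record check. This is an illustrative sketch, not the application's actual rule engine:

```python
def check_identifiers(record: dict, expected_domain: str) -> list:
    """Return findings for the STUDYID/USUBJID/DOMAIN rules above."""
    findings = []
    if not record.get("STUDYID"):
        findings.append("STUDYID is null or empty")
    if not record.get("USUBJID"):
        findings.append("USUBJID is null or empty")
    domain = record.get("DOMAIN", "")
    if len(domain) != 2 or not domain.isalpha() or not domain.isupper():
        findings.append("DOMAIN must be exactly 2 uppercase letters")
    elif domain != expected_domain:
        findings.append(f"DOMAIN {domain!r} does not match {expected_domain!r}")
    return findings
```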

Date/Time Rules

--DTC Variables

  • Must follow ISO 8601 format
  • Supported formats:
    • YYYY-MM-DDTHH:MM:SS
    • YYYY-MM-DD
    • YYYY-MM
    • YYYY

Date Ranges

  • End date cannot precede start date
  • Study dates should be within study period
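Because ISO 8601 dates sort lexicographically, the start/end rule can be checked with a string comparison. With mixed precision (e.g. "2024-01" vs "2024-01-15"), this conservative sketch only flags unambiguous violations:

```python
def end_before_start(start: str, end: str) -> bool:
    """True when an ISO 8601 end date clearly precedes the start date."""
    if not start or not end:
        return False  # incomplete pairs cannot be judged
    n = min(len(start), len(end))  # compare at the coarser precision
    return end[:n] < start[:n]
```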

Controlled Terminology Rules

SEX

Valid values:

  • M (Male)
  • F (Female)
  • U (Unknown)
  • UNDIFFERENTIATED

AESEV

Valid values:

  • MILD
  • MODERATE
  • SEVERE

AESER

Valid values:

  • Y (Yes)
  • N (No)

Validation Report

Error Summary

┌─────────────────────────────────────────────────────────────┐
│ Validation Summary                                          │
├─────────────────────────────────────────────────────────────┤
│ Errors:   5                                                 │
│ Warnings: 12                                                │
│ Info:     3                                                 │
├─────────────────────────────────────────────────────────────┤
│ Domain: DM                                                  │
│   - 2 Errors                                                │
│   - 3 Warnings                                              │
│                                                             │
│ Domain: AE                                                  │
│   - 3 Errors                                                │
│   - 9 Warnings                                              │
└─────────────────────────────────────────────────────────────┘

Error Details

Each error includes:

  • Rule ID: Unique identifier
  • Severity: Error/Warning/Info
  • Description: What’s wrong
  • Location: Affected rows/columns
  • Suggestion: How to fix

Fixing Validation Issues

Mapping Issues

  1. Verify correct source column is mapped
  2. Check data type compatibility
  3. Ensure all required variables are mapped

Data Issues

  1. Review affected rows
  2. Correct values in source data
  3. Re-import and re-validate

Terminology Issues

  1. Check expected values in codelist
  2. Map source values to standard terms
  3. Use value-level mapping if needed

Custom Validation

Severity Overrides

Some warnings can be suppressed if intentional:

  1. Review the warning
  2. Document the reason
  3. Mark as reviewed (if applicable)

Adding Context

For validation reports:

  • Add comments explaining exceptions
  • Document data collection differences
  • Note protocol-specific variations

Best Practices

  1. Validate incrementally

    • After initial mapping
    • After each significant change
    • Before final export
  2. Address errors first

    • Errors block export
    • Warnings should be reviewed
    • Info messages are FYI
  3. Document exceptions

    • Why a warning is acceptable
    • Protocol-specific reasons
    • Historical data limitations
  4. Review validation reports

    • Keep for audit trail
    • Share with data management
    • Include in submission package

Next Steps

Controlled Terminology

CDISC Controlled Terminology (CT) provides standardized values for SDTM variables.

Overview

Controlled Terminology ensures:

  • Consistency across studies and organizations
  • Interoperability between systems
  • Regulatory compliance with FDA requirements

Embedded CT Packages

Trial Submission Studio includes the following CT versions:

| Version | Release Date | Status |
|---|---|---|
| 2024-12-20 | December 2024 | Current |
| 2024-09-27 | September 2024 | Supported |
| 2024-06-28 | June 2024 | Supported |

Common Codelists

SEX (C66731)

| Code | Decoded Value |
|---|---|
| M | MALE |
| F | FEMALE |
| U | UNKNOWN |
| UNDIFFERENTIATED | UNDIFFERENTIATED |

RACE (C74457)

| Decoded Value |
|---|
| AMERICAN INDIAN OR ALASKA NATIVE |
| ASIAN |
| BLACK OR AFRICAN AMERICAN |
| NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER |
| WHITE |
| MULTIPLE |
| NOT REPORTED |
| UNKNOWN |

ETHNIC (C66790)

| Decoded Value |
|---|
| HISPANIC OR LATINO |
| NOT HISPANIC OR LATINO |
| NOT REPORTED |
| UNKNOWN |

COUNTRY (C66729)

ISO 3166-1 alpha-3 country codes:

  • USA, CAN, GBR, DEU, FRA, JPN, etc.

AESEV (C66769) - Severity

| Decoded Value |
|---|
| MILD |
| MODERATE |
| SEVERE |

AESER (C66742) - Serious

| Code | Decoded Value |
|---|---|
| Y | Y |
| N | N |

NY (C66742) - No Yes Response

| Code | Decoded Value |
|---|---|
| Y | Y |
| N | N |

VSTESTCD (C66741) - Vital Signs Test Codes

| Code | Decoded Value |
|---|---|
| BMI | Body Mass Index |
| DIABP | Diastolic Blood Pressure |
| HEIGHT | Height |
| HR | Heart Rate |
| PULSE | Pulse Rate |
| RESP | Respiratory Rate |
| SYSBP | Systolic Blood Pressure |
| TEMP | Temperature |
| WEIGHT | Weight |

LBTESTCD - Lab Test Codes

Common examples:

| Code | Description |
|---|---|
| ALB | Albumin |
| ALT | Alanine Aminotransferase |
| AST | Aspartate Aminotransferase |
| BILI | Bilirubin |
| BUN | Blood Urea Nitrogen |
| CREAT | Creatinine |
| GLUC | Glucose |
| HGB | Hemoglobin |
| PLAT | Platelet Count |
| WBC | White Blood Cell Count |

Extensible vs Non-Extensible

Non-Extensible Codelists

Values must exactly match the codelist:

  • SEX
  • COUNTRY
  • Unit codelists

Extensible Codelists

Additional values allowed with sponsor definition:

  • RACE (can add study-specific values)
  • Some test codes

Using CT in Trial Submission Studio

Automatic Validation

When you map variables with controlled terminology:

  1. Values are checked against the codelist
  2. Non-matching values are flagged
  3. Suggestions are provided

Value Mapping

For source values not in CT format:

  1. Create value-level mappings
  2. Map “Male” → “M”, “Female” → “F”
  3. Apply consistently
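A value-level mapping like this is essentially a lookup table. A minimal sketch (the mapping dict below is illustrative; build one per codelist from your own data):

```python
# Illustrative source-to-CT mapping for SEX; build one per codelist.
SEX_MAP = {"male": "M", "female": "F", "m": "M", "f": "F", "unknown": "U"}

def map_to_ct(value: str, mapping: dict) -> str:
    """Map a source value to its CT term, case-insensitively.

    Unmapped values are returned unchanged so they surface as
    validation errors instead of being silently altered.
    """
    return mapping.get(value.strip().lower(), value.strip())
```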

CT Version Selection

  1. Go to Settings → Controlled Terminology
  2. Select the appropriate CT version
  3. Validation uses selected version

Handling CT Errors

Value Not in Codelist

Error: “Value ‘XYZ’ not found in codelist”

Solutions:

  1. Check spelling/case
  2. Find the correct CT value
  3. Map source value to CT value
  4. For extensible codelists, document new value

Common Mappings

| Source Value | CT Value |
|---|---|
| Male | M |
| Female | F |
| Yes | Y |
| No | N |
| Caucasian | WHITE |
| African American | BLACK OR AFRICAN AMERICAN |

Updating CT

New CT versions are released quarterly by CDISC. To use newer versions:

  1. Check for Trial Submission Studio updates
  2. New CT is included in app updates
  3. Select version in settings

Resources

Official References

Next Steps

ADaM (Preview)

The Analysis Data Model (ADaM) defines standards for analysis-ready datasets.

Note

ADaM support is planned for a future release of Trial Submission Studio.

What is ADaM?

ADaM (Analysis Data Model) provides:

  • Standards for analysis datasets
  • Derived from SDTM data
  • Ready for statistical analysis
  • Required for FDA submissions

ADaM vs SDTM

| Aspect | SDTM | ADaM |
|---|---|---|
| Purpose | Data tabulation | Data analysis |
| Timing | Raw data collection | Derived for analysis |
| Structure | Observation-based | Analysis-ready |
| Audience | Data managers | Statisticians |

ADaM Dataset Types

ADSL - Subject-Level Analysis Dataset

One record per subject containing:

  • Demographics
  • Treatment information
  • Key baseline characteristics
  • Analysis flags

BDS - Basic Data Structure

Vertical structure for:

  • Laboratory data (ADLB)
  • Vital signs (ADVS)
  • Efficacy parameters

OCCDS - Occurrence Data Structure

For event data:

  • Adverse events (ADAE)
  • Concomitant medications (ADCM)

Other Structures

  • Time-to-Event (ADTTE)
  • Medical History (ADMH)

Planned Features

When ADaM support is added, Trial Submission Studio will provide:

ADaM Generation

  • Derive ADSL from DM and other SDTM domains
  • Create BDS datasets from SDTM findings
  • Generate OCCDS from events domains

ADaM Validation

  • Check ADaM IG compliance
  • Validate traceability to SDTM
  • Verify required variables

ADaM Export

  • Export to XPT format
  • Generate Define-XML for ADaM
  • Include in submission package

Current Workarounds

Until ADaM support is available:

  1. Export SDTM first

    • Use Trial Submission Studio for SDTM
    • Generate XPT files
  2. Derive ADaM externally

    • Use SAS or R
    • Apply ADaM derivation rules
    • Generate analysis datasets
  3. Validate separately

    • Use external validation tools
    • Check ADaM compliance

Timeline

ADaM support is on our roadmap. Priority features:

  • ADSL generation
  • BDS for VS and LB
  • OCCDS for AE

Resources

CDISC ADaM Resources

Stay Updated

SEND (Preview)

The Standard for Exchange of Nonclinical Data (SEND) extends SDTM for animal studies.

Note

SEND support is planned for a future release of Trial Submission Studio.

What is SEND?

SEND (Standard for Exchange of Nonclinical Data) provides:

  • Standardized format for nonclinical (animal) study data
  • Based on SDTM structure
  • Required for FDA nonclinical submissions
  • Supports toxicology and pharmacology studies

SEND vs SDTM

| Aspect | SDTM | SEND |
|---|---|---|
| Subjects | Human | Animal |
| Studies | Clinical trials | Nonclinical studies |
| Domains | Clinical domains | Nonclinical domains |
| Requirements | NDA, BLA | IND, NDA (nonclinical) |

SEND Domains

Special Purpose

| Domain | Name |
|---|---|
| DM | Demographics |
| DS | Disposition |
| TA | Trial Arms |
| TE | Trial Elements |
| TS | Trial Summary |
| TX | Trial Sets |

Findings

| Domain | Name |
|---|---|
| BW | Body Weight |
| BG | Body Weight Gain |
| CL | Clinical Observations |
| DD | Death Diagnosis |
| FW | Food/Water Consumption |
| LB | Laboratory Results |
| MA | Macroscopic Findings |
| MI | Microscopic Findings |
| OM | Organ Measurements |
| PC | Pharmacokinetic Concentrations |
| PP | Pharmacokinetic Parameters |
| TF | Tumor Findings |
| VS | Vital Signs |

Interventions

| Domain | Name |
|---|---|
| EX | Exposure |

Key Differences from SDTM

Subject Identification

  • USUBJID format differs for animals
  • Species and strain information required
  • Group/cage identification

Domain-Specific Variables

SEND includes nonclinical-specific variables:

  • Species, strain, sex
  • Dose group information
  • Study day calculations
  • Sacrifice/necropsy data

Controlled Terminology

SEND uses specific CT:

  • Animal species
  • Strain/substrain
  • Route of administration (nonclinical)
  • Specimen types

Planned Features

When SEND support is added, Trial Submission Studio will provide:

SEND Import/Mapping

  • Support nonclinical data formats
  • Map to SEND domains
  • Handle group-level data

SEND Validation

  • SEND-IG compliance checking
  • Nonclinical-specific rules
  • Controlled terminology for SEND

SEND Export

  • XPT V5 format
  • Define-XML for SEND
  • Submission-ready packages

Current Workarounds

Until SEND support is available:

  1. Manual Mapping

    • Use current SDTM workflow
    • Manually adjust for SEND differences
    • Export to XPT
  2. External Tools

    • Use specialized nonclinical tools
    • Validate with SEND validators

SEND Versions

| Version | Description |
|---|---|
| SEND 3.1.1 | Current FDA standard |
| SEND 3.1 | Previous version |
| SEND 3.0 | Initial release |

Resources

CDISC SEND Resources

FDA Resources

Stay Updated

  • Check the Roadmap for SEND progress
  • Watch for announcements on GitHub

XPT (SAS Transport) Format

XPT is the FDA-standard format for regulatory data submissions.

Overview

The SAS Transport Format (XPT) is:

  • Required by FDA for electronic submissions
  • A platform-independent binary format
  • Compatible with SAS and other tools
  • The de facto standard for clinical data exchange

XPT Versions

Trial Submission Studio supports two XPT versions:

XPT Version 5 (FDA Standard)

| Characteristic | Limit |
|---|---|
| Variable name length | 8 characters |
| Variable label length | 40 characters |
| Record length | 8,192 bytes |
| Numeric precision | 8 bytes (IEEE) |

Use for: FDA submissions, regulatory requirements

XPT Version 8 (Extended)

| Characteristic | Limit |
|---|---|
| Variable name length | 32 characters |
| Variable label length | 256 characters |
| Record length | 131,072 bytes |
| Numeric precision | 8 bytes (IEEE) |

Use for: Internal use, longer names needed

File Structure

Header Records

XPT files contain metadata headers:

  • Library header (first record)
  • Member header (dataset info)
  • Namestr records (variable definitions)

Data Records

  • Fixed-width records
  • Packed binary format
  • IEEE floating-point numbers

Creating XPT Files

Export Steps

  1. Complete data mapping
  2. Run validation
  3. Click Export → XPT
  4. Select version (V5 or V8)
  5. Choose output location
  6. Click Save

Export Options

| Option | Description |
|---|---|
| Version | V5 (default) or V8 |
| Sort by keys | Order records by key variables |
| Include metadata | Dataset label, variable labels |

XPT Constraints

Variable Names

V5 Requirements:

  • Maximum 8 characters
  • Start with letter or underscore
  • Alphanumeric and underscore only
  • Uppercase recommended

V8 Requirements:

  • Maximum 32 characters
  • Same character restrictions
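The name rules for both versions reduce to a length check plus a character-class check. A sketch:

```python
import re

# Letter or underscore first, then letters, digits, or underscores.
NAME_CHARS = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def valid_xpt_name(name: str, version: int = 5) -> bool:
    """Check a variable name against the V5/V8 rules above."""
    max_len = 8 if version == 5 else 32
    return len(name) <= max_len and bool(NAME_CHARS.match(name))
```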

Variable Labels

  • V5: 40 characters max
  • V8: 256 characters max

Data Values

Character variables:

  • V5: Max 200 bytes per value
  • Trailing spaces trimmed
  • Missing = blank

Numeric variables:

  • 8-byte IEEE format
  • 28 SAS missing value codes supported (.A through .Z, ._)
  • Precision: ~15 significant digits

Numeric Precision

IEEE to SAS Conversion

Trial Submission Studio handles:

  • IEEE 754 double precision
  • SAS missing value encoding
  • Proper byte ordering

Missing Values

SAS/XPT supports 28 missing value codes:

| Code | Meaning |
|---|---|
| . | Standard missing |
| .A through .Z | Special missing A-Z |
| ._ | Underscore missing |

Validation Before Export

Automatic Checks

  • Variable name lengths
  • Label lengths
  • Data type compatibility
  • Value length limits

Common Issues

| Issue | Solution |
|---|---|
| Name too long | Use V8 or rename |
| Label truncated | Shorten the label |
| Value too long | Truncate or split |

Post-Export Verification

  1. Check file size - Matches expected data volume
  2. Open in viewer - Verify structure
  3. Validate with external tools - Pinnacle 21, SAS
  4. Compare row counts - Match source data
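One cheap structural check can be scripted: a transport file begins with a fixed ASCII library header record. This sketch is a sanity check only, not a substitute for Pinnacle 21 or SAS validation:

```python
def looks_like_xpt(path: str) -> bool:
    """True if the file starts with the XPT library header record."""
    with open(path, "rb") as f:
        first = f.read(48)
    return first.startswith(b"HEADER RECORD*******LIBRARY HEADER RECORD")
```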

External Validation

Consider validating with:

  • Pinnacle 21 Community (free)
  • SAS Universal Viewer
  • Other XPT readers

FDA Submission Requirements

Required Format

  • XPT Version 5 for FDA submissions
  • Define-XML 2.1 for metadata
  • Appropriate file naming (lowercase domain codes)

File Naming Convention

  • dm.xpt - Demographics
  • ae.xpt - Adverse Events
  • vs.xpt - Vital Signs
  • (lowercase domain abbreviation)

Dataset Limits

| Constraint | Limit |
|---|---|
| File size | 5 GB (practical limit) |
| Variables per dataset | No formal limit |
| Records per dataset | No formal limit |

Technical Details

Byte Order

  • XPT uses big-endian byte order
  • Trial Submission Studio handles conversion automatically

Character Encoding

  • ASCII-compatible
  • Extended ASCII for special characters
  • UTF-8 source data converted appropriately

Record Blocking

  • 80-byte logical records
  • Blocked for efficiency
  • Headers use fixed-format records

Next Steps

Dataset-XML Format

Dataset-XML is a CDISC standard XML format for clinical data exchange.

Overview

Dataset-XML provides:

  • Human-readable data format
  • Full Unicode support
  • Embedded metadata
  • Alternative to XPT binary format

When to Use Dataset-XML

| Use Case | Recommendation |
|---|---|
| FDA submission | Use XPT (required) |
| Internal data exchange | Dataset-XML works well |
| Archive/audit trail | Good for documentation |
| Non-SAS environments | Easier integration |
| Full character support | Unicode capable |

Format Structure

ODM Container

Dataset-XML is based on CDISC ODM (Operational Data Model):

<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:data="http://www.cdisc.org/ns/Dataset-XML/v1.0"
     FileType="Snapshot">
    <ClinicalData StudyOID="..." MetaDataVersionOID="...">
        <SubjectData SubjectKey="...">
            <StudyEventData StudyEventOID="...">
                <ItemGroupData ItemGroupOID="DM">
                    <ItemData ItemOID="STUDYID">ABC123</ItemData>
                    <ItemData ItemOID="USUBJID">ABC123-001</ItemData>
                    <!-- More items -->
                </ItemGroupData>
            </StudyEventData>
        </SubjectData>
    </ClinicalData>
</ODM>

Key Elements

| Element | Description |
|---|---|
| ODM | Root container |
| ClinicalData | Study data container |
| SubjectData | Per-subject data |
| ItemGroupData | Domain records |
| ItemData | Individual values |

Creating Dataset-XML

Export Steps

  1. Complete data mapping
  2. Run validation
  3. Click Export → Dataset-XML
  4. Configure options
  5. Choose output location
  6. Click Save

Export Options

| Option | Description |
|---|---|
| Include metadata | Embed variable definitions |
| Pretty print | Format XML for readability |
| Compress | Reduce file size |
| Single file | One file vs. file per domain |

Dataset-XML vs XPT

| Aspect | Dataset-XML | XPT |
|---|---|---|
| Format | Text (XML) | Binary |
| Readability | Human-readable | Requires tools |
| Size | Larger | Smaller |
| Unicode | Full support | Limited |
| FDA submission | Accepted | Required |
| Integration | Easier | SAS-focused |

Advantages

Human Readable

  • Open in any text editor
  • Easily inspectable
  • Good for debugging

Full Unicode

  • International characters
  • Special symbols
  • No character limitations

Self-Describing

  • Metadata embedded
  • Schema validation
  • No external dependencies

Platform Independent

  • Standard XML format
  • Any programming language
  • No proprietary tools needed

Limitations

File Size

  • Larger than binary XPT
  • Compression recommended for large datasets

FDA Preference

  • FDA prefers XPT for submissions
  • Dataset-XML accepted but less common

Processing Overhead

  • XML parsing slower than binary
  • More memory for large files

Validation

Schema Validation

Dataset-XML can be validated against:

  • CDISC Dataset-XML schema
  • ODM schema
  • Custom validation rules

Common Checks

  • Well-formed XML
  • Valid element structure
  • Data type conformance
  • Required elements present

Working with Dataset-XML

Reading Files

Dataset-XML can be read by:

  • Any XML parser
  • CDISC-compatible tools
  • Statistical software with XML support

Converting to Other Formats

From Dataset-XML, you can convert to:

  • XPT (for FDA submission)
  • CSV (for analysis)
  • Database tables

Technical Details

Encoding

  • UTF-8 (default and recommended)
  • UTF-16 supported
  • Encoding declared in XML header

Namespaces

xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:data="http://www.cdisc.org/ns/Dataset-XML/v1.0"

File Extension

  • .xml for Dataset-XML files
  • Optionally: domain.xml (e.g., dm.xml)

Next Steps

Define-XML 2.1

Define-XML provides metadata documentation for CDISC datasets.

Overview

Define-XML is:

  • Required for FDA electronic submissions
  • Describes dataset structure and content
  • Documents variable definitions
  • Provides value-level metadata

What Define-XML Contains

Dataset Metadata

  • Dataset names and descriptions
  • Domain structure
  • Keys and sort order
  • Dataset locations

Variable Metadata

  • Variable names and labels
  • Data types and lengths
  • Origin information
  • Controlled terminology references

Value-Level Metadata

  • Specific value definitions
  • Conditional logic
  • Derivation methods

Computational Methods

  • Derivation algorithms
  • Imputation rules
  • Analysis methods

Define-XML 2.1 Structure

Root Element

<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.1"
     ODMVersion="1.3.2"
     FileType="Snapshot"
     FileOID="DEFINE-XML-EXAMPLE">

Key Components

| Component | Description |
|---|---|
| Study | Study-level information |
| MetaDataVersion | Metadata container |
| ItemGroupDef | Dataset definitions |
| ItemDef | Variable definitions |
| CodeList | Controlled terminology |
| MethodDef | Computational methods |
| CommentDef | Comments and notes |

Creating Define-XML

Automatic Generation

Trial Submission Studio generates Define-XML from:

  1. Mapped datasets
  2. Variable definitions
  3. Controlled terminology
  4. Validation rules

Export Steps

  1. Complete all domain mappings
  2. Run validation
  3. Click Export → Define-XML
  4. Review generated metadata
  5. Add comments/methods if needed
  6. Click Save

Generated Content

The exported Define-XML includes:

| Element | Source |
|---|---|
| Dataset definitions | From mapped domains |
| Variable definitions | From SDTM standards |
| Origins | From mapping configuration |
| Codelists | From controlled terminology |

Define-XML Elements

ItemGroupDef (Datasets)


<ItemGroupDef OID="IG.DM"
              Name="DM"
              Repeating="No"
              Domain="DM"
              def:Structure="One record per subject"
              def:Class="SPECIAL PURPOSE">
    <Description>
        <TranslatedText xml:lang="en">Demographics</TranslatedText>
    </Description>
    <ItemRef ItemOID="IT.DM.STUDYID" OrderNumber="1" Mandatory="Yes"/>
    <!-- More ItemRefs -->
</ItemGroupDef>

ItemDef (Variables)


<ItemDef OID="IT.DM.USUBJID"
         Name="USUBJID"
         DataType="text"
         Length="50"
         def:Origin="CRF">
    <Description>
        <TranslatedText xml:lang="en">Unique Subject Identifier</TranslatedText>
    </Description>
</ItemDef>

CodeList (Controlled Terminology)


<CodeList OID="CL.SEX"
          Name="Sex"
          DataType="text">
    <CodeListItem CodedValue="M">
        <Decode>
            <TranslatedText xml:lang="en">Male</TranslatedText>
        </Decode>
    </CodeListItem>
    <CodeListItem CodedValue="F">
        <Decode>
            <TranslatedText xml:lang="en">Female</TranslatedText>
        </Decode>
    </CodeListItem>
</CodeList>

Variable Origins

Define-XML documents where data comes from:

| Origin | Description |
|---|---|
| CRF | Case Report Form |
| Derived | Calculated from other data |
| Assigned | Assigned by sponsor |
| Protocol | From study protocol |
| eDT | Electronic data transfer |

Customizing Define-XML

Adding Comments

Add explanatory comments for:

  • Complex derivations
  • Data collection notes
  • Exception documentation

Computational Methods

Document derivation algorithms:

  • Formulas
  • Conditions
  • Source variables

Value-Level Metadata

For variables with parameter-dependent definitions:

  • Different units by test
  • Conditional codelists
  • Test-specific origins

Validation

Schema Validation

Define-XML is validated against:

  • CDISC Define-XML 2.1 schema
  • Stylesheet rendering rules

Common Issues

| Issue | Solution |
|---|---|
| Missing required elements | Add required metadata |
| Invalid references | Check OID references |
| Codelist mismatches | Verify CT alignment |

FDA Requirements

Submission Package

  • define.xml - Metadata file
  • define.pdf - Rendered stylesheet (optional)
  • Referenced XPT datasets

Naming Convention

  • File: define.xml (lowercase)
  • Location: Study root folder

Stylesheet

Include the CDISC stylesheet for rendering:

<?xml-stylesheet type="text/xsl" href="define2-1.xsl"?>

Best Practices

  1. Generate early - Create Define-XML as you build datasets
  2. Review carefully - Verify all metadata is accurate
  3. Document derivations - Explain complex logic
  4. Test rendering - View with stylesheet before submission
  5. Validate - Use Define-XML validators

Next Steps

Architecture Overview

Trial Submission Studio is built as a modular Rust workspace with 10 specialized crates.

Design Philosophy

Core Principles

  1. Separation of Concerns - Each crate has a single responsibility
  2. Deterministic Output - Reproducible results for regulatory compliance
  3. Offline Operation - All standards embedded, no network dependencies
  4. Type Safety - Rust’s type system prevents data errors

Key Design Decisions

  • Pure Functions - Mapping and validation logic is side-effect free
  • Embedded Standards - CDISC data bundled in binary
  • No External APIs - Works without internet connection
  • Auditable - Clear data lineage and transformations

Workspace Structure

trial-submission-studio/
├── Cargo.toml              # Workspace configuration
├── crates/
│   ├── tss-gui/            # Desktop application
│   ├── xport/              # XPT file I/O
│   ├── tss-validate/       # CDISC validation
│   ├── tss-map/            # Column mapping
│   ├── tss-normalization/      # Data transformations
│   ├── tss-ingest/         # CSV loading
│   ├── tss-output/         # Multi-format export
│   ├── tss-standards/      # CDISC standards loader
│   ├── tss-model/          # Core types + Polars utilities
│   └── tss-updater/        # App update mechanism
├── standards/              # Embedded CDISC data
├── mockdata/               # Test datasets
└── docs/                   # This documentation

Crate Dependency Graph

flowchart TD
    subgraph Application
        GUI[tss-gui]
    end

    subgraph Processing
        MAP[tss-map]
        OUTPUT[tss-output]
        INGEST[tss-ingest]
        TRANSFORM[tss-normalization]
    end

    subgraph Validation
        VALIDATE[tss-validate]
    end

    subgraph I/O
        XPT[xport]
    end

    subgraph Core
        STANDARDS[tss-standards]
        MODEL[tss-model]
    end

    subgraph Utility
        UPDATER[tss-updater]
    end

    GUI --> MAP
    GUI --> OUTPUT
    GUI --> INGEST
    GUI --> UPDATER
    MAP --> VALIDATE
    MAP --> STANDARDS
    OUTPUT --> XPT
    OUTPUT --> STANDARDS
    INGEST --> STANDARDS
    VALIDATE --> STANDARDS
    STANDARDS --> MODEL
    style GUI fill: #4a90d9, color: #fff
    style STANDARDS fill: #50c878, color: #fff
    style MODEL fill: #f5a623, color: #fff

Crate Responsibilities

| Crate | Purpose | Key Dependencies |
|---|---|---|
| tss-gui | Desktop application | egui, eframe |
| xport | XPT file I/O | byteorder, encoding_rs |
| tss-validate | CDISC validation | tss-standards |
| tss-map | Fuzzy column mapping | rapidfuzz |
| tss-normalization | Data transformations | polars |
| tss-ingest | CSV loading | csv, polars |
| tss-output | Multi-format export | quick-xml |
| tss-standards | CDISC standards loader | serde, serde_json |
| tss-model | Core types + Polars utilities | chrono, polars |
| tss-updater | App updates | reqwest |

Data Flow

Import → Transform → Export

flowchart LR
    subgraph Input
        CSV[CSV File]
    end

    subgraph Processing
        INGEST[Ingest]
        MAP[Map & Transform]
        VALIDATE[Validate]
    end

    subgraph Output
        XPT[XPT File]
        XML[Dataset-XML]
        DEFINE[Define-XML]
    end

    CSV --> INGEST
    INGEST --> MAP
    MAP --> VALIDATE
    VALIDATE --> XPT
    VALIDATE --> XML
    VALIDATE --> DEFINE
    VALIDATE -.->|errors| MAP
    style CSV fill: #e8f4f8, stroke: #333
    style XPT fill: #d4edda, stroke: #333
    style XML fill: #d4edda, stroke: #333
    style DEFINE fill: #d4edda, stroke: #333

Standards Integration

flowchart TB
    subgraph "Embedded CDISC Data"
        SDTM[SDTM-IG 3.4]
        CT[Controlled Terminology]
        DOMAINS[Domain Definitions]
    end

    STANDARDS[tss-standards]
    SDTM --> STANDARDS
    CT --> STANDARDS
    DOMAINS --> STANDARDS
    STANDARDS --> MAP[tss-map]
    STANDARDS --> VALIDATE[tss-validate]
    STANDARDS --> OUTPUT[tss-output]
    style STANDARDS fill: #50c878, color: #fff

Key Technologies

Core Stack

| Component | Technology |
|---|---|
| Language | Rust 1.92+ |
| GUI Framework | egui/eframe |
| Data Processing | Polars |
| Serialization | Serde |
| Testing | Insta, Proptest |

External Crates

| Purpose | Crate |
|---|---|
| Fuzzy matching | rapidfuzz |
| XML processing | quick-xml |
| XPT handling | Custom (xport) |
| Logging | tracing |
| HTTP client | reqwest |

Embedded Data

Standards Directory

standards/
├── sdtm/
│   └── ig/v3.4/
│       ├── Datasets.csv         # Domain definitions
│       ├── Variables.csv        # Variable metadata
│       ├── metadata.toml        # Version info
│       └── chapters/            # IG chapter documentation
├── adam/
│   └── ig/v1.3/
│       ├── DataStructures.csv   # ADaM structures
│       ├── Variables.csv        # Variable metadata
│       └── metadata.toml
├── send/
│   └── ig/v3.1.1/
│       ├── Datasets.csv         # SEND domains
│       ├── Variables.csv        # Variable metadata
│       └── metadata.toml
├── terminology/
│   ├── 2024-03-29/              # CT release date
│   │   ├── SDTM_CT_*.csv
│   │   ├── SEND_CT_*.csv
│   │   └── ADaM_CT_*.csv
│   ├── 2025-03-28/
│   └── 2025-09-26/              # Latest CT
├── validation/
│   ├── sdtm/Rules.csv           # SDTM validation rules
│   ├── adam/Rules.csv           # ADaM validation rules
│   └── send/Rules.csv           # SEND validation rules
└── xsl/
    ├── define2-0-0.xsl          # Define-XML stylesheets
    └── define2-1.xsl

Testing Strategy

Test Types

| Type | Purpose | Crates |
|---|---|---|
| Unit | Function-level | All |
| Integration | Cross-crate | tss-gui |
| Snapshot | Output stability | xport, tss-output |
| Property | Edge cases | tss-map, tss-validate |

Test Data

Mock datasets in mockdata/ for:

  • Various domain types
  • Edge cases
  • Validation testing

Next Steps

tss-gui

The desktop application crate providing the graphical user interface.

Overview

tss-gui is the main entry point for Trial Submission Studio, built with egui/eframe.

Responsibilities

  • Application window and layout
  • User interaction handling
  • Navigation between workflow steps
  • Data visualization
  • File dialogs and system integration

Dependencies

[dependencies]
eframe = "0.29"
egui = "0.29"
tss-ingest = { path = "../tss-ingest" }
tss-map = { path = "../tss-map" }
tss-validate = { path = "../tss-validate" }
tss-output = { path = "../tss-output" }
tss-updater = { path = "../tss-updater" }

Architecture

Application Structure

tss-gui/
├── src/
│   ├── main.rs           # Entry point
│   ├── app.rs            # Application state
│   ├── views/
│   │   ├── mod.rs
│   │   ├── import.rs     # Import view
│   │   ├── mapping.rs    # Mapping view
│   │   ├── validation.rs # Validation view
│   │   └── export.rs     # Export view
│   ├── widgets/
│   │   ├── mod.rs
│   │   ├── data_grid.rs  # Data table widget
│   │   └── mapping.rs    # Mapping connection widget
│   └── state/
│       ├── mod.rs
│       └── workflow.rs   # Workflow state machine
└── assets/
    ├── icon.svg
    └── icon.png

State Management

The application uses a centralized state pattern:

#![allow(unused)]
fn main() {
pub struct App {
    workflow: WorkflowState,
    data: Option<DataFrame>,
    mappings: Vec<Mapping>,
    validation_results: Vec<ValidationResult>,
}
}

View Pattern

Each view implements a common trait:

#![allow(unused)]
fn main() {
pub trait View {
    fn ui(&mut self, ctx: &egui::Context, state: &mut AppState);
    fn title(&self) -> &str;
}
}

Key Components

Main Window

  • Menu bar with file operations
  • Sidebar navigation
  • Main content area
  • Status bar

Data Grid

Custom widget for displaying large datasets:

  • Virtual scrolling for performance
  • Column sorting
  • Row selection
  • Type-aware formatting

Mapping Interface

Visual mapping between source and target:

  • Drag-and-drop connections
  • Match confidence display
  • Automatic suggestions

Validation Panel

Results display with:

  • Severity filtering
  • Row highlighting
  • Quick navigation to issues

Configuration

Settings Storage

User preferences stored in:

  • macOS: ~/Library/Application Support/trial-submission-studio/
  • Windows: %APPDATA%\trial-submission-studio\
  • Linux: ~/.config/trial-submission-studio/

Configurable Options

  • Theme (light/dark)
  • Recent files
  • Export preferences
  • Validation strictness

Running

# Development
cargo run --package tss-gui

# Release
cargo run --release --package tss-gui

Testing

cargo test --package tss-gui

GUI testing is limited; focus on:

  • State transitions
  • Data transformations
  • Integration with other crates

See Also

xport

XPT (SAS Transport) file I/O crate. Designed for standalone use and publishing to crates.io.

Overview

xport provides reading and writing of XPT V5 and V8 format files. It’s designed to be used independently of the Trial Submission Studio application for general SAS Transport file handling.

Features

  • Read XPT V5 and V8 format files
  • Write XPT V5 and V8 format files
  • Handle IBM mainframe to IEEE floating-point conversion
  • Support all 28 SAS missing value codes
  • Optional Polars DataFrame integration (polars feature)
  • Optional serde serialization (serde feature)

Dependencies

[dependencies]
xport = { version = "0.1", features = ["polars"] }  # With DataFrame support
# or
xport = "0.1"  # Core functionality only

Architecture

Module Structure

xport/
├── src/
│   ├── lib.rs
│   ├── reader/       # XPT file reading (streaming)
│   ├── writer/       # XPT file writing (streaming)
│   ├── header/       # Header parsing
│   ├── types/        # Core types (column, value, missing)
│   ├── error/        # Error handling
│   └── version.rs    # V5/V8 version handling

XPT Format Details

File Structure

┌─────────────────────────────────────┐
│ Library Header (80 bytes × 2)       │
├─────────────────────────────────────┤
│ Member Header (80 bytes × 3)        │
├─────────────────────────────────────┤
│ Namestr Records (140 bytes each)    │
│ (one per variable)                  │
├─────────────────────────────────────┤
│ Observation Header (80 bytes)       │
├─────────────────────────────────────┤
│ Data Records                        │
│ (fixed-width, packed)               │
└─────────────────────────────────────┘

Numeric Handling

IBM mainframe to IEEE conversion:

#![allow(unused)]
fn main() {
pub fn ibm_to_ieee(ibm_bytes: [u8; 8]) -> f64 {
    // Convert IBM 370 floating point to IEEE 754
}

pub fn ieee_to_ibm(value: f64) -> [u8; 8] {
    // Convert IEEE 754 to IBM 370 floating point
}
}
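For readers curious about the conversion itself, here is a minimal sketch of the IBM-to-IEEE direction, assuming the classic System/360 hex-float layout (1 sign bit, 7-bit base-16 exponent with bias 64, 56-bit fraction); the crate's actual implementation also handles missing-value sentinels and rounding on the reverse path:

```rust
/// Decode an IBM System/360 double-precision hex float to IEEE 754 (sketch).
fn ibm_to_ieee(ibm: [u8; 8]) -> f64 {
    let sign = if ibm[0] & 0x80 != 0 { -1.0 } else { 1.0 };
    // 7-bit base-16 exponent, bias 64
    let exponent = (ibm[0] & 0x7f) as i32 - 64;
    // 56-bit fraction, interpreted as 0.f in base 16
    let mut fraction: u64 = 0;
    for &b in &ibm[1..] {
        fraction = (fraction << 8) | u64::from(b);
    }
    if fraction == 0 {
        return 0.0;
    }
    sign * fraction as f64 * 16f64.powi(exponent) / 2f64.powi(56)
}

fn main() {
    // 1.0 encodes as exponent byte 0x41 with fraction 0x10000000000000
    assert_eq!(ibm_to_ieee([0x41, 0x10, 0, 0, 0, 0, 0, 0]), 1.0);
    assert_eq!(ibm_to_ieee([0u8; 8]), 0.0);
    println!("ok");
}
```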

Missing Values

Support for all 28 SAS missing codes:

#![allow(unused)]
fn main() {
pub enum MissingValue {
    Standard,           // .
    Special(char),      // .A through .Z
    Underscore,         // ._
}
}
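The count of 28 follows directly from the encoding: the standard dot, the 26 letter codes, and the underscore. A sketch that enumerates them:

```rust
/// Enumerate the 28 SAS missing-value codes: "." plus ".A"-".Z" plus "._".
fn all_missing_codes() -> Vec<String> {
    let mut codes = vec![".".to_string()];
    codes.extend(('A'..='Z').map(|c| format!(".{c}")));
    codes.push("._".to_string());
    codes
}

fn main() {
    let codes = all_missing_codes();
    assert_eq!(codes.len(), 28);
    assert_eq!(codes.first().map(String::as_str), Some("."));
    assert_eq!(codes.last().map(String::as_str), Some("._"));
    println!("ok");
}
```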

API

Reading

#![allow(unused)]
fn main() {
use xport::{read_xpt, XptDataset};

let dataset: XptDataset = read_xpt("dm.xpt")?;
println!("Variables: {}", dataset.columns.len());
println!("Observations: {}", dataset.rows.len());
}

Writing

#![allow(unused)]
fn main() {
use xport::{write_xpt, XptDataset, XptColumn, XptVersion};

let dataset = XptDataset {
    name: "DM".to_string(),
    label: Some("Demographics".to_string()),
    columns: vec![
        XptColumn::character("USUBJID", 20).with_label("Unique Subject ID"),
        XptColumn::numeric("AGE").with_label("Age"),
    ],
    rows: vec![/* data rows */],
    ..Default::default()
};

write_xpt("dm.xpt", &dataset)?;
}

With Polars (optional feature)

#![allow(unused)]
fn main() {
use xport::polars::{read_xpt_to_dataframe, write_dataframe_to_xpt};
use polars::prelude::*;

// Read to DataFrame
let df = read_xpt_to_dataframe("dm.xpt")?;

// Write from DataFrame
write_dataframe_to_xpt(&df, "output.xpt", XptVersion::V5)?;
}

Testing

cargo test --package xport
cargo test --package xport --features polars

Test Categories

  • Header parsing
  • Numeric conversion accuracy
  • Missing value roundtrip
  • Large file handling
  • V5/V8 compatibility

See Also

tss-validate

CDISC conformance validation crate.

Overview

tss-validate checks data against SDTM implementation guide rules and controlled terminology.

Responsibilities

  • Structural validation (required variables, types)
  • Content validation (controlled terminology, formats)
  • Cross-record validation (relationships, duplicates)
  • Generate validation reports

Dependencies

[dependencies]
tss-standards = { path = "../tss-standards" }
tss-model = { path = "../tss-model" }
regex = "1"
chrono = "0.4"

Architecture

Module Structure

tss-validate/
├── src/
│   ├── lib.rs
│   ├── engine.rs        # Validation orchestration
│   ├── rules/
│   │   ├── mod.rs
│   │   ├── structural.rs   # Structure rules
│   │   ├── content.rs      # Value rules
│   │   ├── terminology.rs  # CT validation
│   │   └── cross_record.rs # Relationship rules
│   ├── result.rs        # Validation results
│   └── report.rs        # Report generation

Validation Engine

Rule Interface

#![allow(unused)]
fn main() {
pub trait ValidationRule {
    fn id(&self) -> &str;
    fn severity(&self) -> Severity;
    fn validate(&self, context: &ValidationContext) -> Vec<ValidationResult>;
}
}

Severity Levels

#![allow(unused)]
fn main() {
pub enum Severity {
    Error,    // Blocks export
    Warning,  // Should review
    Info,     // Informational
}
}

Validation Context

#![allow(unused)]
fn main() {
pub struct ValidationContext<'a> {
    pub domain: &'a str,
    pub data: &'a DataFrame,
    pub mappings: &'a [Mapping],
    pub standards: &'a Standards,
}
}

Built-in Rules

Structural Rules (SD*)

| Rule | Description |
|---|---|
| SD0001 | Required variable missing |
| SD0002 | Invalid variable name |
| SD0003 | Variable length exceeded |
| SD0004 | Invalid data type |

Terminology Rules (CT*)

| Rule | Description |
|---|---|
| CT0001 | Value not in codelist |
| CT0002 | Invalid date format |
| CT0003 | Date out of range |

Cross-Record Rules (XR*)

| Rule | Description |
|---|---|
| XR0001 | USUBJID not in DM |
| XR0002 | Duplicate key values |

API

Running Validation

#![allow(unused)]
fn main() {
use tss_validate::{Validator, ValidationContext};

let validator = Validator::new(&standards);
let results = validator.validate(&context)?;

for result in results.errors() {
    println!("{}: {}", result.rule_id, result.message);
}
}

Custom Rules

#![allow(unused)]
fn main() {
struct MyCustomRule;

impl ValidationRule for MyCustomRule {
    fn id(&self) -> &str { "CUSTOM001" }
    fn severity(&self) -> Severity { Severity::Warning }
    fn validate(&self, ctx: &ValidationContext) -> Vec<ValidationResult> {
        // Custom logic; return an empty Vec when no issues are found
        Vec::new()
    }
}
}

Testing

cargo test --package tss-validate

Test Strategy

  • Unit tests for each rule
  • Integration tests with sample data
  • Property tests for edge cases

See Also

tss-map

Fuzzy column mapping engine crate.

Overview

tss-map provides intelligent matching between source columns and SDTM variables.

Responsibilities

  • Fuzzy string matching for column names
  • Match confidence scoring
  • Mapping suggestions
  • Type compatibility checking

Dependencies

[dependencies]
rapidfuzz = "0.5"
tss-standards = { path = "../tss-standards" }
tss-model = { path = "../tss-model" }

Architecture

Module Structure

tss-map/
├── src/
│   ├── lib.rs
│   ├── matcher.rs      # Fuzzy matching logic
│   ├── scorer.rs       # Confidence scoring
│   ├── mapping.rs      # Mapping structures
│   └── suggestions.rs  # Auto-suggestion engine

Matching Algorithm

Process

  1. Normalize names - Case folding, remove special chars
  2. Calculate similarity - Multiple algorithms
  3. Apply domain hints - Boost relevant matches
  4. Score confidence - Combine factors
  5. Rank suggestions - Order by score
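Step 1 can be sketched as a simple normalization pass (hypothetical helper; the crate's actual normalization may differ):

```rust
/// Normalize a column name before fuzzy comparison: case-fold and
/// drop everything that is not an ASCII letter or digit (sketch).
fn normalize_name(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

fn main() {
    assert_eq!(normalize_name("Subject_ID"), "subjectid");
    assert_eq!(normalize_name("AE Term"), "aeterm");
    println!("ok");
}
```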

Similarity Metrics

#![allow(unused)]
fn main() {
pub fn calculate_similarity(source: &str, target: &str) -> f64 {
    let ratio = rapidfuzz::fuzz::ratio(source, target);
    let partial = rapidfuzz::fuzz::partial_ratio(source, target);
    let token_sort = rapidfuzz::fuzz::token_sort_ratio(source, target);

    // Weighted combination
    (ratio * 0.4 + partial * 0.3 + token_sort * 0.3) / 100.0
}
}

Confidence Levels

| Score | Level | Action |
|---|---|---|
| > 0.80 | High | Auto-accept |
| 0.50–0.80 | Medium | Review |
| < 0.50 | Low | Manual |
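The thresholds above map to review actions roughly as follows (illustrative sketch; the names are not the crate's API):

```rust
/// Map a match score to a review action using the documented thresholds.
#[derive(Debug, PartialEq)]
enum Action {
    AutoAccept,
    Review,
    Manual,
}

fn action_for(score: f64) -> Action {
    if score > 0.80 {
        Action::AutoAccept
    } else if score >= 0.50 {
        Action::Review
    } else {
        Action::Manual
    }
}

fn main() {
    assert_eq!(action_for(0.92), Action::AutoAccept);
    assert_eq!(action_for(0.65), Action::Review);
    assert_eq!(action_for(0.30), Action::Manual);
    println!("ok");
}
```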

API

Finding Matches

#![allow(unused)]
fn main() {
use tss_map::{Matcher, MatchOptions};

let matcher = Matcher::new(&standards);
let suggestions = matcher.suggest_mappings(
    &source_columns,
    domain,
    MatchOptions::default(),
)?;

for suggestion in suggestions {
    println!("{} -> {} ({:.0}%)",
        suggestion.source,
        suggestion.target,
        suggestion.confidence * 100.0,
    );
}
}

Mapping Structure

#![allow(unused)]
fn main() {
pub struct Mapping {
    pub source_column: String,
    pub target_variable: String,
    pub confidence: f64,
    pub user_confirmed: bool,
}
}

Match Options

#![allow(unused)]
fn main() {
pub struct MatchOptions {
    pub min_confidence: f64,
    pub max_suggestions: usize,
    pub consider_types: bool,
}
}

Heuristics

Domain-Specific Boosting

| Pattern | Domain | Boost |
|---|---|---|
| *SUBJ* | All | +0.1 |
| *AGE* | DM | +0.15 |
| *TERM* | AE, MH | +0.15 |
| *TEST* | LB, VS | +0.15 |

Common Transformations

| Source Pattern | Target |
|---|---|
| SUBJECT_ID | USUBJID |
| PATIENT_AGE | AGE |
| GENDER | SEX |
| VISIT_DATE | --DTC |

Testing

cargo test --package tss-map

Test Categories

  • Exact match detection
  • Fuzzy match accuracy
  • Confidence scoring
  • Domain-specific matching

See Also

tss-normalization

Data normalization crate for CDISC conversions.

Overview

tss-normalization applies normalization rules that convert source data into SDTM-compliant form.

Responsibilities

  • Apply column mappings
  • Normalize data values to SDTM standards
  • Derive computed variables
  • Handle date conversions to ISO 8601
  • Apply controlled terminology mappings

Dependencies

[dependencies]
polars = { version = "0.44", features = ["lazy"] }
chrono = "0.4"
tss-model = { path = "../tss-model" }
tss-standards = { path = "../tss-standards" }

Architecture

Module Structure

tss-normalization/
├── src/
│   ├── lib.rs
│   ├── executor.rs       # Normalization execution
│   ├── inference.rs      # Type inference from domain metadata
│   ├── preview.rs        # Preview DataFrame builder
│   ├── types.rs          # Core types (NormalizationType, NormalizationRule, etc.)
│   ├── error.rs          # NormalizationError
│   └── normalization/
│       ├── mod.rs
│       ├── ct.rs         # Controlled terminology normalization
│       ├── datetime.rs   # ISO 8601 datetime formatting
│       ├── duration.rs   # ISO 8601 duration formatting
│       ├── numeric.rs    # Numeric conversions
│       └── studyday.rs   # Study day calculations

Normalization Types

NormalizationType Enum

#![allow(unused)]
fn main() {
pub enum NormalizationType {
    /// Copy value directly without modification
    CopyDirect,
    /// Auto-generate constant (STUDYID, DOMAIN)
    Constant,
    /// Derive USUBJID as STUDYID-SUBJID
    UsubjidPrefix,
    /// Generate sequence number per USUBJID
    SequenceNumber,
    /// Format as ISO 8601 datetime
    Iso8601DateTime,
    /// Format as ISO 8601 date
    Iso8601Date,
    /// Format as ISO 8601 duration
    Iso8601Duration,
    /// Calculate study day relative to RFSTDTC
    StudyDay { reference_dtc: String },
    /// Normalize using controlled terminology codelist
    CtNormalization { codelist_code: String },
    /// Convert to numeric (Float64)
    NumericConversion,
}
}
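The StudyDay variant follows the SDTM convention that there is no day 0: the day of RFSTDTC is day 1, and the day before it is day -1. A sketch of the rule, stated on day offsets:

```rust
/// SDTM study-day rule: offset 0 (same day as RFSTDTC) is day 1;
/// days before the reference stay negative (sketch).
fn study_day(days_from_reference: i64) -> i64 {
    if days_from_reference >= 0 {
        days_from_reference + 1
    } else {
        days_from_reference
    }
}

fn main() {
    assert_eq!(study_day(0), 1);   // day of RFSTDTC
    assert_eq!(study_day(3), 4);   // three days after
    assert_eq!(study_day(-1), -1); // day before reference
    println!("ok");
}
```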

API

Building a Pipeline

#![allow(unused)]
fn main() {
use tss_normalization::{infer_normalization_rules, execute_normalization, NormalizationContext};

// Infer rules from domain metadata
let pipeline = infer_normalization_rules(&domain);

// Create execution context
let context = NormalizationContext::new(study_id, &domain.name)
    .with_ct_registry(ct_registry)
    .with_mappings(mappings);

// Execute normalization
let result = execute_normalization(&source_df, &pipeline, &context)?;
}

Preview Functions

#![allow(unused)]
fn main() {
use tss_normalization::build_preview_dataframe_with_dm_and_omitted;

let result = build_preview_dataframe_with_dm_and_omitted(
    &source_df,
    &mappings,
    &omitted,
    &domain,
    &study_id,
    dm_df.as_ref(),
    ct_registry.as_ref(),
)?;
}

Date Handling

Supported Input Formats

| Format | Example |
|---|---|
| ISO 8601 | 2024-01-15 |
| US | 01/15/2024 |
| EU | 15-01-2024 |
| With time | 2024-01-15T09:30:00 |

Output Format

Always ISO 8601:

  • Full: YYYY-MM-DDTHH:MM:SS
  • Date only: YYYY-MM-DD
  • Partial: YYYY-MM or YYYY
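As an illustration of the US-format case, a minimal string-level conversion (sketch; the crate uses chrono and handles many more formats, including partial dates):

```rust
/// Convert a US-style MM/DD/YYYY date to ISO 8601 YYYY-MM-DD (sketch;
/// no calendar validation, ASCII input assumed).
fn us_to_iso(date: &str) -> Option<String> {
    let parts: Vec<&str> = date.split('/').collect();
    match parts.as_slice() {
        [m, d, y] if y.len() == 4 => Some(format!("{y}-{m:0>2}-{d:0>2}")),
        _ => None,
    }
}

fn main() {
    assert_eq!(us_to_iso("01/15/2024").as_deref(), Some("2024-01-15"));
    assert_eq!(us_to_iso("1/5/2024").as_deref(), Some("2024-01-05"));
    assert_eq!(us_to_iso("not a date"), None);
    println!("ok");
}
```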

Testing

cargo test --package tss-normalization

Test Strategy

  • Unit tests for each normalization type
  • Integration tests with sample data
  • Snapshot tests for output consistency

See Also

tss-ingest

CSV ingestion and schema detection crate.

Overview

tss-ingest handles loading source data files and detecting their schema.

Responsibilities

  • CSV file parsing
  • Schema detection (types, formats)
  • Domain suggestion
  • Data preview generation

Dependencies

[dependencies]
csv = "1.3"
polars = { version = "0.44", features = ["lazy", "csv"] }
encoding_rs = "0.8"
tss-model = { path = "../tss-model" }

Architecture

Module Structure

tss-ingest/
├── src/
│   ├── lib.rs
│   ├── reader.rs        # CSV reading
│   ├── schema.rs        # Schema detection
│   ├── types.rs         # Type inference
│   ├── domain.rs        # Domain suggestion
│   └── preview.rs       # Data preview

Schema Detection

Type Inference

#![allow(unused)]
fn main() {
pub enum InferredType {
    Integer,
    Float,
    Date(String),      // With format pattern
    DateTime(String),
    Boolean,
    Text,
}
}

Detection Algorithm

  1. Sample first N rows
  2. For each column:
    • Try parsing as integer
    • Try parsing as float
    • Try common date formats
    • Default to text
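The fall-through order above can be sketched as follows (hypothetical helper; the real crate infers types via Polars and a fuller set of format patterns):

```rust
/// Infer a column type from sample values by trying progressively
/// looser parses, falling back to text (sketch; ASCII input assumed).
fn infer_type(samples: &[&str]) -> &'static str {
    if samples.iter().all(|s| s.parse::<i64>().is_ok()) {
        return "integer";
    }
    if samples.iter().all(|s| s.parse::<f64>().is_ok()) {
        return "float";
    }
    if samples.iter().all(|s| looks_like_iso_date(s)) {
        return "date";
    }
    "text"
}

fn looks_like_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && s[..4].parse::<u32>().is_ok()
        && s[5..7].parse::<u32>().is_ok()
        && s[8..10].parse::<u32>().is_ok()
}

fn main() {
    assert_eq!(infer_type(&["1", "2"]), "integer");
    assert_eq!(infer_type(&["1.5", "2"]), "float");
    assert_eq!(infer_type(&["2024-01-15"]), "date");
    assert_eq!(infer_type(&["abc"]), "text");
    println!("ok");
}
```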

Date Format Detection

| Pattern | Example |
|---|---|
| %Y-%m-%d | 2024-01-15 |
| %m/%d/%Y | 01/15/2024 |
| %d-%m-%Y | 15-01-2024 |
| %Y-%m-%dT%H:%M:%S | 2024-01-15T09:30:00 |

API

Loading a File

#![allow(unused)]
fn main() {
use tss_ingest::{CsvReader, IngestOptions};

let options = IngestOptions {
    encoding: Some("utf-8"),
    sample_rows: 1000,
    ..Default::default()
};

let result = CsvReader::read("data.csv", options)?;
println!("Rows: {}", result.row_count);
println!("Columns: {:?}", result.schema.columns);
}

Schema Result

#![allow(unused)]
fn main() {
pub struct IngestResult {
    pub data: DataFrame,
    pub schema: DetectedSchema,
    pub suggested_domain: Option<String>,
    pub warnings: Vec<IngestWarning>,
}

pub struct DetectedSchema {
    pub columns: Vec<ColumnInfo>,
}

pub struct ColumnInfo {
    pub name: String,
    pub inferred_type: InferredType,
    pub null_count: usize,
    pub sample_values: Vec<String>,
}
}

Domain Suggestion

Based on column names, suggest likely SDTM domain:

| Column Patterns | Suggested Domain |
|---|---|
| USUBJID, AGE, SEX | DM |
| AETERM, AESTDTC | AE |
| VSTESTCD, VSORRES | VS |
| LBTESTCD, LBORRES | LB |

#![allow(unused)]
fn main() {
pub fn suggest_domain(columns: &[String]) -> Option<String> {
    // Pattern matching logic
}
}
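A runnable version of the same idea, using the patterns from the table above (illustrative only; the crate's heuristics are richer):

```rust
/// Suggest an SDTM domain from column names (sketch of the pattern logic).
fn suggest_domain(columns: &[&str]) -> Option<&'static str> {
    let has = |name: &str| columns.iter().any(|c| c.eq_ignore_ascii_case(name));
    if has("AETERM") {
        return Some("AE");
    }
    if has("VSTESTCD") {
        return Some("VS");
    }
    if has("LBTESTCD") {
        return Some("LB");
    }
    if has("AGE") && has("SEX") {
        return Some("DM");
    }
    None
}

fn main() {
    assert_eq!(suggest_domain(&["USUBJID", "AGE", "SEX"]), Some("DM"));
    assert_eq!(suggest_domain(&["USUBJID", "AETERM", "AESTDTC"]), Some("AE"));
    assert_eq!(suggest_domain(&["FOO", "BAR"]), None);
    println!("ok");
}
```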

Error Handling

Common Issues

| Issue | Handling |
|---|---|
| Encoding error | Try alternative encodings |
| Parse error | Mark as text, warn user |
| Empty file | Return error |
| No header | Require user action |

Testing

cargo test --package tss-ingest

Test Files

Located in mockdata/:

  • Various CSV formats
  • Different encodings
  • Edge cases

See Also

tss-output

Multi-format export crate.

Overview

tss-output generates output files in XPT, Dataset-XML, and Define-XML formats.

Responsibilities

  • Coordinate export to multiple formats
  • Generate XPT files (via xport)
  • Generate Dataset-XML
  • Generate Define-XML 2.1
  • Create checksums

Dependencies

[dependencies]
quick-xml = "0.36"
xport = { path = "../xport" }
tss-model = { path = "../tss-model" }
tss-standards = { path = "../tss-standards" }
sha2 = "0.10"

Architecture

Module Structure

tss-output/
├── src/
│   ├── lib.rs
│   ├── exporter.rs      # Export orchestration
│   ├── xpt.rs           # XPT export wrapper
│   ├── dataset_xml.rs   # Dataset-XML generation
│   ├── define_xml.rs    # Define-XML generation
│   └── checksum.rs      # SHA256 generation

Export Formats

XPT Export

Delegates to xport:

#![allow(unused)]
fn main() {
pub fn export_xpt(
    data: &DataFrame,
    metadata: &DatasetMetadata,
    path: &Path,
    version: XptVersion,
) -> Result<()> {
    let writer = XptWriter::new(path, version)?;
    writer.write_metadata(metadata)?;
    writer.write_data(data)?;
    writer.finish()
}
}

Dataset-XML Export

#![allow(unused)]
fn main() {
pub fn export_dataset_xml(
    data: &DataFrame,
    metadata: &DatasetMetadata,
    path: &Path,
) -> Result<()> {
    let mut writer = XmlWriter::new(path)?;
    writer.write_odm_header()?;
    writer.write_clinical_data(data, metadata)?;
    writer.finish()
}
}

Define-XML Export

#![allow(unused)]
fn main() {
pub fn export_define_xml(
    datasets: &[DatasetMetadata],
    standards: &Standards,
    path: &Path,
) -> Result<()> {
    let mut writer = DefineXmlWriter::new(path)?;
    writer.write_study_metadata()?;
    writer.write_item_group_defs(datasets)?;
    writer.write_item_defs(datasets)?;
    writer.write_codelists()?;
    writer.finish()
}
}

API

Single Dataset Export

#![allow(unused)]
fn main() {
use tss_output::{Exporter, ExportOptions, ExportFormat};

let exporter = Exporter::new();
let options = ExportOptions {
    format: ExportFormat::XptV5,
    generate_checksum: true,
};

exporter.export(&data, &metadata, "dm.xpt", options)?;
}

Batch Export

#![allow(unused)]
fn main() {
let batch_options = BatchExportOptions {
    output_dir: PathBuf::from("./output"),
    formats: vec![ExportFormat::XptV5, ExportFormat::DefineXml],
    generate_checksums: true,
};

exporter.export_batch(&datasets, batch_options)?;
}

Checksum Generation

#![allow(unused)]
fn main() {
pub fn generate_checksum(path: &Path) -> Result<String> {
    use sha2::{Sha256, Digest};

    let mut hasher = Sha256::new();
    let mut file = File::open(path)?;
    std::io::copy(&mut file, &mut hasher)?;

    Ok(format!("{:x}", hasher.finalize()))
}
}

Output: dm.xpt.sha256 containing:

abc123...def456  dm.xpt
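
The sidecar file follows the two-space `sha256sum` line convention. As a sketch, a hypothetical helper (not part of tss-output's actual API) that formats that line could look like:

```rust
// Hypothetical helper (not tss-output's actual API): formats a checksum
// line in the two-space `sha256sum` convention shown above.
fn checksum_line(hex_digest: &str, file_name: &str) -> String {
    format!("{hex_digest}  {file_name}\n")
}

fn main() {
    // This string would be written to dm.xpt.sha256 next to the export.
    print!("{}", checksum_line("abc123", "dm.xpt"));
}
```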

Testing

cargo test --package tss-output

Test Strategy

  • Output format validation
  • Roundtrip testing (export then read)
  • Checksum verification
  • Define-XML schema validation

See Also

tss-standards

CDISC standards data loader crate.

Overview

tss-standards loads and provides access to embedded CDISC standard definitions.

Responsibilities

  • Load SDTM-IG definitions
  • Load controlled terminology
  • Provide domain/variable metadata
  • Version management

Dependencies

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
include_dir = "0.7"
tss-model = { path = "../tss-model" }

Architecture

Module Structure

tss-standards/
├── src/
│   ├── lib.rs
│   ├── loader.rs         # Data loading
│   ├── sdtm.rs           # SDTM definitions
│   ├── terminology.rs    # Controlled terminology
│   └── cache.rs          # In-memory caching

Embedded Data

Standards are embedded at compile time:

#![allow(unused)]
fn main() {
use include_dir::{include_dir, Dir};

static STANDARDS_DIR: Dir = include_dir!("$CARGO_MANIFEST_DIR/../standards");
}

Data Structures

SDTM Definitions

#![allow(unused)]
fn main() {
pub struct SdtmIg {
    pub version: String,
    pub domains: Vec<DomainDefinition>,
}

pub struct DomainDefinition {
    pub code: String,           // e.g., "DM"
    pub name: String,           // e.g., "Demographics"
    pub class: DomainClass,
    pub structure: String,
    pub variables: Vec<VariableDefinition>,
}

pub struct VariableDefinition {
    pub name: String,
    pub label: String,
    pub data_type: DataType,
    pub core: Core,             // Required/Expected/Permissible
    pub codelist: Option<String>,
    pub description: String,
}
}

Controlled Terminology

#![allow(unused)]
fn main() {
pub struct ControlledTerminology {
    pub version: String,
    pub codelists: Vec<Codelist>,
}

pub struct Codelist {
    pub code: String,           // e.g., "C66731"
    pub name: String,           // e.g., "Sex"
    pub extensible: bool,
    pub terms: Vec<Term>,
}

pub struct Term {
    pub code: String,
    pub value: String,
    pub synonyms: Vec<String>,
}
}

API

Loading Standards

#![allow(unused)]
fn main() {
use tss_standards::Standards;

// Load with specific versions
let standards = Standards::load(
    SdtmVersion::V3_4,
    CtVersion::V2024_12_20,
)?;

// Get domain definition
let dm = standards.get_domain("DM")?;

// Get codelist
let sex = standards.get_codelist("SEX")?;
}

Querying

#![allow(unused)]
fn main() {
// Get required variables for domain
let required = standards.required_variables("DM");

// Check if value is in codelist
let valid = standards.is_valid_term("SEX", "M");

// Get variable definition
let var = standards.get_variable("DM", "USUBJID")?;
}

Embedded Data Format

SDTM JSON

{
  "version": "3.4",
  "domains": [
    {
      "code": "DM",
      "name": "Demographics",
      "class": "SPECIAL_PURPOSE",
      "structure": "One record per subject",
      "variables": [
        {
          "name": "STUDYID",
          "label": "Study Identifier",
          "dataType": "Char",
          "core": "Required"
        }
      ]
    }
  ]
}

CT JSON

{
  "version": "2024-12-20",
  "codelists": [
    {
      "code": "C66731",
      "name": "Sex",
      "extensible": false,
      "terms": [
        {
          "code": "C16576",
          "value": "F"
        },
        {
          "code": "C20197",
          "value": "M"
        }
      ]
    }
  ]
}

Caching

Standards are cached in memory after first load:

#![allow(unused)]
fn main() {
lazy_static! {
    static ref STANDARDS_CACHE: RwLock<Option<Standards>> = RwLock::new(None);
}
}
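
Assuming only the standard library, the same one-time initialization can be sketched with `std::sync::OnceLock` (the `Standards` stand-in below is simplified for illustration):

```rust
use std::sync::OnceLock;

// Simplified stand-in for the real Standards type.
struct Standards {
    version: &'static str,
}

// OnceLock gives the same lazily-initialized global as the lazy_static
// pattern above, without an external dependency.
static STANDARDS_CACHE: OnceLock<Standards> = OnceLock::new();

fn standards() -> &'static Standards {
    STANDARDS_CACHE.get_or_init(|| Standards { version: "3.4" })
}

fn main() {
    println!("SDTM-IG v{}", standards().version);
    // Later calls return the same cached instance.
    assert!(std::ptr::eq(standards(), standards()));
}
```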

Testing

cargo test --package tss-standards

Test Categories

  • JSON parsing
  • Version loading
  • Query accuracy
  • Missing data handling

See Also

tss-model

Core domain types crate.

Overview

tss-model defines the fundamental data structures used across all crates.

Responsibilities

  • Define core data types
  • Provide serialization/deserialization
  • Ensure type consistency across crates

Dependencies

[dependencies]
serde = { version = "1", features = ["derive"] }
chrono = { version = "0.4", features = ["serde"] }

Architecture

Module Structure

tss-model/
├── src/
│   ├── lib.rs
│   ├── domain.rs        # Domain types
│   ├── variable.rs      # Variable types
│   ├── mapping.rs       # Mapping types
│   ├── validation.rs    # Validation types
│   └── metadata.rs      # Metadata types

Core Types

Domain Types

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum DomainClass {
    SpecialPurpose,
    Interventions,
    Events,
    Findings,
    Custom,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Domain {
    pub code: String,
    pub name: String,
    pub class: DomainClass,
    pub description: String,
}
}

Variable Types

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum DataType {
    Char,
    Num,
    Date,
    DateTime,
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum Core {
    Required,
    Expected,
    Permissible,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Variable {
    pub name: String,
    pub label: String,
    pub data_type: DataType,
    pub length: Option<usize>,
    pub core: Core,
    pub codelist: Option<String>,
}
}

Mapping Types

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Mapping {
    pub source_column: String,
    pub target_variable: String,
    pub confidence: f64,
    pub transform: Option<Transform>,
    pub confirmed: bool,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Transform {
    Rename,
    ValueMap(HashMap<String, String>),
    DateFormat(String),
    Uppercase,
    Trim,
}
}

Validation Types

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum Severity {
    Error,
    Warning,
    Info,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ValidationResult {
    pub rule_id: String,
    pub severity: Severity,
    pub message: String,
    pub location: Option<Location>,
    pub suggestion: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Location {
    pub row: Option<usize>,
    pub column: Option<String>,
}
}

Metadata Types

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DatasetMetadata {
    pub name: String,
    pub label: String,
    pub domain: String,
    pub structure: String,
    pub variables: Vec<VariableMetadata>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VariableMetadata {
    pub name: String,
    pub label: String,
    pub data_type: DataType,
    pub length: usize,
    pub origin: Origin,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Origin {
    Crf,
    Derived,
    Assigned,
    Protocol,
}
}

Design Principles

Immutability

Types are designed to be cloned rather than mutated:

#![allow(unused)]
fn main() {
let updated = Mapping {
    confirmed: true,
    ..original
};
}

Serialization

All types derive Serialize and Deserialize for:

  • Configuration storage
  • State persistence
  • Debug output

Equality

Types implement PartialEq for:

  • Testing
  • Deduplication
  • Change detection

Testing

cargo test --package tss-model

Test Focus

  • Serialization roundtrip
  • Type conversions
  • Default values

See Also

tss-updater

Application update mechanism crate.

Overview

tss-updater checks for and applies application updates from GitHub releases.

Responsibilities

  • Check for new versions
  • Download updates
  • Verify checksums
  • Apply updates (platform-specific)

Dependencies

[dependencies]
reqwest = { version = "0.12", features = ["json"] }
semver = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sha2 = "0.10"
tss-common = { path = "../tss-common" }

Architecture

Module Structure

tss-updater/
├── src/
│   ├── lib.rs
│   ├── checker.rs       # Version checking
│   ├── downloader.rs    # Download handling
│   ├── verifier.rs      # Checksum verification
│   └── installer.rs     # Update installation

Update Flow

┌─────────────────┐
│ Check Version   │
│ (GitHub API)    │
└────────┬────────┘
         │ New version?
         ▼
┌─────────────────┐
│ Download Asset  │
│ (Release file)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Verify Checksum │
│ (SHA256)        │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Install Update  │
│ (Platform)      │
└─────────────────┘

API

Checking for Updates

#![allow(unused)]
fn main() {
use tss_updater::{UpdateChecker, UpdateInfo};

let checker = UpdateChecker::new("rubentalstra", "Trial-Submission-Studio");

match checker.check_for_updates(current_version)? {
    Some(update) => {
        println!("New version available: {}", update.version);
        println!("Release notes: {}", update.notes);
    }
    None => {
        println!("You're up to date!");
    }
}
}
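
The comparison behind the check follows semantic versioning (the crate depends on semver for this). A stdlib-only sketch of that ordering, with a hypothetical parse_version helper:

```rust
// Hedged sketch of semantic-version ordering; the real crate uses the
// semver crate. parse_version is a hypothetical stdlib-only helper that
// ignores prerelease/build metadata.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim_start_matches('v').splitn(3, '.');
    Some((
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
    ))
}

fn is_newer(latest: &str, current: &str) -> bool {
    match (parse_version(latest), parse_version(current)) {
        // Tuple comparison orders by major, then minor, then patch.
        (Some(l), Some(c)) => l > c,
        _ => false,
    }
}

fn main() {
    println!("{}", is_newer("v1.2.0", "1.1.9")); // true
    println!("{}", is_newer("1.1.9", "1.2.0")); // false
}
```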

Update Info

#![allow(unused)]
fn main() {
pub struct UpdateInfo {
    pub version: Version,
    pub notes: String,
    pub download_url: String,
    pub checksum_url: String,
    pub published_at: DateTime<Utc>,
}
}

Downloading

#![allow(unused)]
fn main() {
use tss_updater::Downloader;

let downloader = Downloader::new();
let progress_callback = |percent| {
    println!("Download: {}%", percent);
};

downloader.download(&update.download_url, &temp_path, progress_callback)?;
}

Verification

#![allow(unused)]
fn main() {
use tss_updater::Verifier;

let verifier = Verifier::new();
let expected_hash = verifier.fetch_checksum(&update.checksum_url)?;

if verifier.verify_file(&temp_path, &expected_hash)? {
    println!("Checksum verified!");
} else {
    return Err(UpdateError::ChecksumMismatch);
}
}

Platform-Specific Installation

macOS

  1. Mount/extract new app bundle
  2. Replace existing application
  3. Restart application

Windows

  1. Extract to temp location
  2. Schedule replacement on restart
  3. Restart application

Linux

  1. Extract new binary
  2. Replace existing binary
  3. Restart application

Security

HTTPS Only

All connections use HTTPS:

  • GitHub API
  • Release downloads
  • Checksum files

Checksum Verification

SHA256 checksums verified before installation.

Signed Releases

(Future) Code signing verification for releases.

Configuration

Update Settings

#![allow(unused)]
fn main() {
pub struct UpdateConfig {
    pub check_on_startup: bool,
    pub auto_download: bool,
    pub prerelease: bool,  // Include prereleases
}
}

Default Behavior

  • Check on startup (with delay)
  • Notify user, don’t auto-install
  • Stable releases only

Error Handling

#![allow(unused)]
fn main() {
#[derive(Error, Debug)]
pub enum UpdateError {
    #[error("Network error: {0}")]
    Network(#[from] reqwest::Error),

    #[error("Checksum mismatch")]
    ChecksumMismatch,

    #[error("Installation failed: {0}")]
    InstallFailed(String),
}
}

Testing

cargo test --package tss-updater

Test Strategy

  • Mock HTTP responses
  • Checksum calculation tests
  • Version comparison tests

See Also

Design Decisions

Key architectural decisions and their rationale.

Why Rust?

Chosen: Rust

Rationale:

  • Memory safety without garbage collection
  • Performance comparable to C/C++
  • Type system catches errors at compile time
  • Cross-platform compilation to native binaries
  • Growing ecosystem for data processing

Alternatives Considered

| Language | Pros | Cons |
|----------|------|------|
| Python | Familiar, many libraries | Performance, distribution |
| Java | Cross-platform, mature | JVM dependency, startup time |
| C++ | Performance | Memory safety, complexity |
| Go | Simple, fast compilation | Less expressive types |

Why egui for GUI?

Chosen: egui/eframe

Rationale:

  • Immediate mode - Simple mental model
  • Pure Rust - No FFI complexity
  • Cross-platform - macOS, Windows, Linux
  • Lightweight - Small binary size
  • Fast iteration - Easy to prototype

Alternatives Considered

| Framework | Pros | Cons |
|-----------|------|------|
| Tauri | Web tech, flexible | Bundle size, two languages |
| GTK-rs | Native look | Platform differences |
| Qt | Mature, rich | License complexity, bindings |
| Iced | Elm-like | Less mature |

Why Polars for Data?

Chosen: Polars

Rationale:

  • Performance - Lazy evaluation, parallelism
  • Rust native - No Python dependency
  • DataFrame API - Familiar for data work
  • Memory efficient - Arrow-based

Alternatives Considered

| Library | Pros | Cons |
|---------|------|------|
| ndarray | Low-level control | More manual work |
| Arrow | Standard format | Less DataFrame features |
| Custom | Full control | Development time |

Why Embed Standards?

Chosen: Embedded CDISC data

Rationale:

  • Offline operation - No network dependency
  • Deterministic - Consistent across runs
  • Fast - No API latency
  • Regulatory - Audit trail

Alternatives Considered

| Approach | Pros | Cons |
|----------|------|------|
| API-based | Always current | Network required, latency |
| Download on demand | Smaller binary | Caching complexity |
| Plugin system | Flexible | Distribution complexity |

Workspace Architecture

Chosen: Multi-crate workspace

Rationale:

  • Separation of concerns - Clear boundaries
  • Parallel compilation - Faster builds
  • Selective testing - Test only changed crates
  • Reusability - Crates can be used independently

Crate Boundaries

| Boundary | Principle |
|----------|-----------|
| tss-model | Core types, no dependencies on other crates |
| tss-standards | Pure data loading, no transformation logic |
| tss-validate | Rules only, no I/O |
| xport | XPT format only, no CDISC logic |

Data Processing Pipeline

Chosen: Lazy evaluation with checkpoints

Rationale:

  • Memory efficiency - Don’t load all data at once
  • Performance - Optimize query plans
  • Transparency - User sees intermediate results
  • Recoverability - Can resume from checkpoints

Pipeline Stages

flowchart LR
    subgraph Stage1[Import]
        I1[CSV File]
        I2[Schema Detection]
    end

    subgraph Stage2[Map]
        M1[Column Matching]
        M2[Type Conversion]
    end

    subgraph Stage3[Validate]
        V1[Structure Rules]
        V2[CT Validation]
        V3[Cross-Domain]
    end

    subgraph Stage4[Export]
        E1[XPT Generation]
        E2[XML Output]
    end

    I1 --> I2 --> M1 --> M2 --> V1 --> V2 --> V3 --> E1
    V3 --> E2
    V1 -.->|Errors| M1
    V2 -.->|Warnings| M1
    style I1 fill: #e8f4f8, stroke: #333
    style E1 fill: #d4edda, stroke: #333
    style E2 fill: #d4edda, stroke: #333

Validation Strategy

Chosen: Multi-level validation

Rationale:

  • Early feedback - Catch issues during mapping
  • Complete checking - Full validation before export
  • Severity levels - Error vs. warning vs. info
  • Actionable - Clear fix suggestions

Validation Levels

flowchart TB
    subgraph "Validation Layers"
        direction TB
        L1[Schema Validation<br/>File structure, encoding]
        L2[Mapping Validation<br/>Variable compatibility, types]
        L3[Content Validation<br/>CDISC compliance, CT checks]
        L4[Output Validation<br/>Format conformance, checksums]
    end

    IMPORT[Import] --> L1
    L1 --> MAP[Map]
    MAP --> L2
    L2 --> TRANSFORM[Transform]
    TRANSFORM --> L3
    L3 --> EXPORT[Export]
    EXPORT --> L4
    L4 --> OUTPUT[Output Files]
    L1 -.->|Schema Error| IMPORT
    L2 -.->|Type Mismatch| MAP
    L3 -.->|CT Error| TRANSFORM
    style L1 fill: #ffeeba, stroke: #333
    style L2 fill: #ffeeba, stroke: #333
    style L3 fill: #ffeeba, stroke: #333
    style L4 fill: #ffeeba, stroke: #333
    style OUTPUT fill: #d4edda, stroke: #333

| Level | When | Purpose |
|-------|------|---------|
| Schema | Import | File structure |
| Mapping | Map step | Variable compatibility |
| Content | Pre-export | CDISC compliance |
| Output | Export | Format conformance |

Error Handling

Chosen: Result types with context

Rationale:

  • No panics - Graceful error handling
  • Context - Where and why errors occurred
  • Recovery - Allow user to fix and continue
  • Logging - Full trace for debugging

Error Categories

| Category | Handling |
|----------|----------|
| User error | Display message, allow retry |
| Data error | Show affected rows, suggest fix |
| System error | Log, display generic message |
| Bug | Log with context, fail gracefully |

File Format Choices

XPT V5 as Default

Rationale:

  • FDA requirement for submissions
  • Maximum compatibility
  • Well-documented format

XPT V8 as Option

Rationale:

  • Longer variable names
  • Larger labels
  • Future-proofing

Security Considerations

Data Privacy

  • No cloud - All processing local
  • No telemetry - No usage data collection
  • No network - Works fully offline

Code Security

  • Dependency audit - Regular cargo audit
  • Minimal dependencies - Reduce attack surface
  • Memory safety - Rust’s guarantees

Performance Goals

Target Metrics

| Operation | Target |
|-----------|--------|
| Import 100K rows | < 2 seconds |
| Validation | < 5 seconds |
| Export to XPT | < 3 seconds |
| Application startup | < 1 second |
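
These targets can be checked with simple wall-clock measurements. A sketch using std::time::Instant (import_rows is a hypothetical stand-in for the real import step, not the crate's API):

```rust
use std::time::Instant;

// Hypothetical stand-in for the real CSV import; it just touches 100K
// values so the timing harness has something to measure.
fn import_rows(n: usize) -> usize {
    (0..n).filter(|i| i % 7 == 0).count()
}

fn main() {
    let start = Instant::now();
    let kept = import_rows(100_000);
    let elapsed = start.elapsed();
    println!("kept {kept} of 100000 rows in {elapsed:?}");
    // The "< 2 seconds" target from the table above.
    assert!(elapsed.as_secs() < 2, "import exceeded the 2s target");
}
```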

Optimization Strategies

  • Lazy evaluation
  • Parallel processing
  • Memory mapping for large files
  • Incremental validation

Future Considerations

Extensibility

The architecture supports future additions:

  • New CDISC standards (ADaM, SEND)
  • Additional output formats
  • Plugin system (potential)
  • CLI interface (potential)

Backward Compatibility

  • Configuration format versioning
  • Data migration paths
  • Deprecation warnings

Next Steps

Contributing: Getting Started

Thank you for your interest in contributing to Trial Submission Studio!

Ways to Contribute

Code Contributions

  • Bug fixes
  • New features
  • Performance improvements
  • Documentation updates

Non-Code Contributions

  • Bug reports
  • Feature requests
  • Documentation improvements
  • Testing and feedback
  • Helping other users

Before You Start

Prerequisites

  • Rust 1.92+ - Install via rustup
  • Git - For version control
  • Basic familiarity with Rust programming
  • (Optional) Understanding of CDISC SDTM standards

Read the Documentation

Familiarize yourself with:

Finding Issues to Work On

GitHub Issues

  1. Check GitHub Issues
  2. Look for labels:
    • good-first-issue - Great for newcomers
    • help-wanted - We’d love assistance
    • bug - Known issues to fix
    • enhancement - New features

Claiming an Issue

  1. Find an issue you want to work on
  2. Comment on the issue expressing interest
  3. Wait for maintainer feedback before starting
  4. Fork the repository
  5. Create a branch and start working

Contribution Workflow

Overview

1. Find Issue → 2. Comment → 3. Fork → 4. Branch → 5. Code → 6. Test → 7. PR

Detailed Steps

  1. Find an issue (or create one)
  2. Comment to claim it
  3. Fork the repository
  4. Clone your fork
  5. Create a branch (feature/my-feature or fix/my-fix)
  6. Make changes
  7. Test your changes
  8. Commit with conventional commit messages
  9. Push to your fork
  10. Create a Pull Request

Communication

Where to Discuss

  • GitHub Issues - Bug reports, feature requests
  • GitHub Discussions - Questions, ideas, general discussion
  • Pull Requests - Code review discussion

Guidelines

  • Be respectful and constructive
  • Assume good intentions
  • Welcome newcomers
  • Focus on the code, not the person

Code of Conduct

Please read and follow our Code of Conduct.

Key points:

  • Be respectful and inclusive
  • Welcome newcomers
  • Focus on constructive feedback
  • Assume good intentions

Getting Help

Stuck on Something?

  1. Check existing documentation
  2. Search GitHub Issues/Discussions
  3. Ask in GitHub Discussions
  4. Open an issue with your question

Review Process

After submitting a PR:

  1. Automated checks run (CI)
  2. Maintainer reviews code
  3. Address any feedback
  4. Maintainer merges when ready

Recognition

Contributors are recognized in:

  • GitHub contributor list
  • Release notes (for significant contributions)
  • THIRD_PARTY_LICENSES.md (if adding dependencies)

Next Steps

Development Setup

Set up your development environment for contributing to Trial Submission Studio.

Prerequisites

Required

| Tool | Version | Purpose |
|------|---------|---------|
| Rust | 1.92+ | Programming language |
| Git | Any recent | Version control |

Optional

| Tool | Purpose |
|------|---------|
| cargo-about | License generation |
| cargo-watch | Auto-rebuild on changes |

Step 1: Fork and Clone

Fork on GitHub

  1. Go to Trial Submission Studio
  2. Click “Fork” in the top right
  3. Select your account

Clone Your Fork

git clone https://github.com/YOUR_USERNAME/trial-submission-studio.git
cd trial-submission-studio

Add Upstream Remote

git remote add upstream https://github.com/rubentalstra/Trial-Submission-Studio.git

Step 2: Install Rust

Using rustup

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Verify Installation

rustup show

Expected output should show Rust 1.92 or higher.

Install Required Toolchain

rustup toolchain install stable
rustup component add rustfmt clippy

Step 3: Platform Dependencies

macOS

No additional dependencies required.

Linux (Ubuntu/Debian)

sudo apt-get update
sudo apt-get install -y libgtk-3-dev libxdo-dev

Windows

No additional dependencies required.

Step 4: Build the Project

Debug Build

cargo build

Release Build

cargo build --release

Check Build

cargo check

Step 5: Run the Application

cargo run --package tss-gui

Step 6: Run Tests

# All tests
cargo test

# Specific crate
cargo test --package xport

# With output
cargo test -- --nocapture

Step 7: Run Lints

# Format check
cargo fmt --check

# Apply formatting
cargo fmt

# Clippy lints
cargo clippy -- -D warnings

IDE Setup

RustRover / IntelliJ IDEA

  1. Open the project folder
  2. Rust plugin auto-detects workspace
  3. Configure run configuration for tss-gui

VS Code

  1. Install rust-analyzer extension
  2. Open the project folder
  3. Extension auto-configures

Recommended extensions:

  • rust-analyzer
  • Even Better TOML
  • Error Lens
  • GitLens

Project Structure

trial-submission-studio/
├── Cargo.toml              # Workspace config
├── crates/                 # All crates
│   ├── tss-gui/           # Main application
│   ├── xport/             # XPT I/O
│   └── ...                # Other crates
├── standards/             # Embedded CDISC data
├── mockdata/              # Test data
└── docs/                  # Documentation

Development Workflow

Create Feature Branch

git checkout main
git pull upstream main
git checkout -b feature/my-feature

Make Changes

  1. Edit code
  2. Run tests: cargo test
  3. Run lints: cargo clippy
  4. Format: cargo fmt

Commit Changes

git add .
git commit -m "feat: add my feature"

Push and Create PR

git push origin feature/my-feature

Then create PR on GitHub.

Useful Commands

| Command | Purpose |
|---------|---------|
| cargo build | Build debug |
| cargo build --release | Build release |
| cargo test | Run all tests |
| cargo test --package X | Test specific crate |
| cargo clippy | Run linter |
| cargo fmt | Format code |
| cargo doc --open | Generate docs |
| cargo run -p tss-gui | Run application |

Troubleshooting

Build Fails

  1. Ensure Rust 1.92+: rustup update stable
  2. Clean build: cargo clean && cargo build
  3. Check dependencies: cargo fetch

Tests Fail

  1. Run with output: cargo test -- --nocapture
  2. Run specific test: cargo test test_name
  3. Check test data in mockdata/

GUI Won’t Start

  1. Check platform dependencies installed
  2. Try release build: cargo run --release -p tss-gui
  3. Check logs for errors

Next Steps

Coding Standards

Code style and quality guidelines for Trial Submission Studio.

Rust Style

Formatting

Use rustfmt for all code formatting:

# Check formatting
cargo fmt --check

# Apply formatting
cargo fmt

Linting

All code must pass Clippy with no warnings:

cargo clippy -- -D warnings

Naming Conventions

Crates

  • Lowercase with hyphens: xport, tss-validate
  • Prefix with tss- for project crates

Modules

  • Lowercase with underscores: column_mapping.rs
  • Keep names short but descriptive

Functions

#![allow(unused)]
fn main() {
// Good - descriptive, snake_case
fn calculate_similarity(source: &str, target: &str) -> f64

// Good - verb-noun pattern
fn validate_domain(data: &DataFrame) -> Vec<ValidationResult>

// Avoid - too abbreviated
fn calc_sim(s: &str, t: &str) -> f64
}

Types

#![allow(unused)]
fn main() {
// Good - PascalCase, descriptive
struct ValidationResult { ... }
enum DomainClass { ... }

// Good - clear trait naming
trait ValidationRule { ... }
}

Constants

#![allow(unused)]
fn main() {
// Good - SCREAMING_SNAKE_CASE
const MAX_VARIABLE_LENGTH: usize = 8;
const DEFAULT_CONFIDENCE_THRESHOLD: f64 = 0.8;
}

Code Organization

File Structure

#![allow(unused)]
fn main() {
// 1. Module documentation
//! Module description

// 2. Imports (grouped)
use std::collections::HashMap;

use serde::{Deserialize, Serialize};

use crate::model::Variable;

// 3. Constants
const DEFAULT_VALUE: i32 = 0;

// 4. Type definitions
pub struct MyStruct {
    ...
}

// 5. Implementations
impl MyStruct { ... }

// 6. Functions
pub fn my_function() { ... }

// 7. Tests (at bottom or in separate file)
#[cfg(test)]
mod tests {
    ...
}
}

Import Organization

Group imports in this order:

  1. Standard library
  2. External crates
  3. Internal crates
  4. Current crate modules
#![allow(unused)]
fn main() {
use std::path::Path;

use polars::prelude::*;
use serde::Serialize;

use tss_model::Variable;

use crate::mapping::Mapping;
}

Error Handling

Use Result Types

#![allow(unused)]
fn main() {
// Good - explicit error handling
pub fn parse_file(path: &Path) -> Result<Data, ParseError> {
    let content = std::fs::read_to_string(path)?;
    parse_content(&content)
}

// Avoid - panicking on errors
pub fn parse_file(path: &Path) -> Data {
    let content = std::fs::read_to_string(path).unwrap(); // Don't do this
    parse_content(&content).expect("parse failed") // Or this
}
}

Custom Error Types

#![allow(unused)]
fn main() {
use thiserror::Error;

#[derive(Error, Debug)]
pub enum ValidationError {
    #[error("Missing required variable: {0}")]
    MissingVariable(String),

    #[error("Invalid value '{value}' for {variable}")]
    InvalidValue { variable: String, value: String },
}
}

Error Context

#![allow(unused)]
fn main() {
// Good - add context to errors
fs::read_to_string(path)
    .map_err(|e| ParseError::FileRead {
        path: path.to_path_buf(),
        source: e,
    })?;
}

Documentation

Public Items

All public items must be documented:

#![allow(unused)]
fn main() {
/// Validates data against SDTM rules.
///
/// # Arguments
///
/// * `data` - The DataFrame to validate
/// * `domain` - Target SDTM domain code
///
/// # Returns
///
/// Vector of validation results
///
/// # Example
///
/// ```
/// let results = validate(&data, "DM")?;
/// ```
pub fn validate(data: &DataFrame, domain: &str) -> Result<Vec<ValidationResult>> {
    // ...
}
}

Module Documentation

#![allow(unused)]
fn main() {
//! CSV ingestion and schema detection.
//!
//! This module provides functionality for loading CSV files
//! and automatically detecting their schema.
}

Testing

Test Organization

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_case() {
        // Arrange
        let input = "test";

        // Act
        let result = process(input);

        // Assert
        assert_eq!(result, expected);
    }

    #[test]
    fn test_edge_case() {
        // ...
    }
}
}

Test Naming

#![allow(unused)]
fn main() {
// Good - descriptive test names
#[test]
fn parse_iso8601_date_returns_correct_value() { ... }

#[test]
fn validate_returns_error_for_missing_usubjid() { ... }

// Avoid - vague names
#[test]
fn test1() { ... }
}

Architecture Principles

Separation of Concerns

  • Keep business logic out of GUI code
  • I/O operations separate from data processing
  • Validation rules independent of data loading

Pure Functions

Prefer pure functions where possible:

#![allow(unused)]
fn main() {
// Good - pure function, easy to test
pub fn calculate_confidence(source: &str, target: &str) -> f64 {
    // No side effects, deterministic
}

// Use sparingly - side effects
pub fn log_and_calculate(source: &str, target: &str) -> f64 {
    tracing::info!("Calculating..."); // Side effect
    calculate_confidence(source, target)
}
}

Determinism

Output must be reproducible:

#![allow(unused)]
fn main() {
// Good - deterministic output
pub fn derive_sequence(data: &DataFrame, group_by: &[&str]) -> Vec<i32> {
    // Same input always produces same output
}

// Avoid - non-deterministic
pub fn derive_sequence_random(data: &DataFrame) -> Vec<i32> {
    // Uses random ordering - bad for regulatory compliance
}
}

Performance

Avoid Premature Optimization

Write clear code first, optimize if needed based on profiling.

Use Appropriate Data Structures

#![allow(unused)]
fn main() {
// Good - HashMap for lookups
let lookup: HashMap<String, Variable> = ...;

// Good - Vec for ordered data
let results: Vec<ValidationResult> = ...;
}

Lazy Evaluation

Use Polars lazy evaluation for large datasets:

#![allow(unused)]
fn main() {
let result = df.lazy()
    .filter(col("value").gt(lit(0)))
    .collect()?;
}

Next Steps

Testing

Testing guidelines for Trial Submission Studio contributions.

Test Types

Unit Tests

Test individual functions and methods:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn normalize_column_name_removes_spaces() {
        let result = normalize_column_name("Patient Age");
        assert_eq!(result, "PATIENT_AGE");
    }
}
}

Integration Tests

Test interactions between modules:

#![allow(unused)]
fn main() {
// tests/integration_test.rs
use tss_ingest::CsvReader;
use tss_validate::Validator;

#[test]
fn validate_imported_data() {
    let data = CsvReader::read("tests/data/sample.csv").unwrap();
    let results = Validator::validate(&data, "DM").unwrap();
    assert!(results.errors().is_empty());
}
}

Snapshot Tests

Use insta for output stability:

#![allow(unused)]
fn main() {
use insta::assert_snapshot;

#[test]
fn export_produces_expected_output() {
    let output = export_to_string(&data);
    assert_snapshot!(output);
}
}

Property Tests

Use proptest for edge cases:

#![allow(unused)]
fn main() {
use proptest::prelude::*;

proptest! {
    #[test]
    fn similarity_is_symmetric(a in ".*", b in ".*") {
        let ab = calculate_similarity(&a, &b);
        let ba = calculate_similarity(&b, &a);
        assert!((ab - ba).abs() < 0.001);
    }
}
}

Running Tests

All Tests

cargo test

Specific Crate

cargo test --package xport

Specific Test

cargo test test_name

With Output

cargo test -- --nocapture

Release Mode

cargo test --release

Test Organization

File Structure

crates/tss-validate/
├── src/
│   ├── lib.rs
│   └── rules/
│       └── structural.rs
└── tests/
    ├── structural_rules_test.rs
    └── data/
        └── sample_dm.csv

Inline Tests

For simple unit tests:

#![allow(unused)]
fn main() {
// src/normalize.rs

pub fn normalize(s: &str) -> String {
    s.trim().to_uppercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_normalize() {
        assert_eq!(normalize("  hello  "), "HELLO");
    }
}
}

External Tests

For integration tests:

#![allow(unused)]
fn main() {
// tests/validation_integration.rs

use tss_validate::*;

#[test]
fn full_validation_workflow() {
    // Integration test code
}
}

Test Data

Location

Test data files are in:

  • mockdata/ - Shared test datasets
  • crates/*/tests/data/ - Crate-specific test data

Sample Data

STUDYID,DOMAIN,USUBJID,SUBJID,AGE,SEX
ABC123,DM,ABC123-001,001,45,M
ABC123,DM,ABC123-002,002,38,F

Sensitive Data

Never commit real clinical trial data. Use:

  • Synthetic/mock data only
  • Anonymized examples
  • Generated test cases
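One way to keep real data out of the repository entirely is to generate mock rows in code rather than storing fixtures. A minimal sketch (the helper name and value scheme are illustrative, not part of the codebase):

```rust
// Hypothetical helper: build synthetic DM rows so tests never touch
// real clinical data. Values are deterministic and obviously fake.
fn synthetic_dm_csv(n_subjects: usize) -> String {
    let mut csv = String::from("STUDYID,DOMAIN,USUBJID,SUBJID,AGE,SEX\n");
    for i in 1..=n_subjects {
        let age = 20 + (i * 7) % 60; // cycles through plausible ages
        let sex = if i % 2 == 0 { "F" } else { "M" };
        csv.push_str(&format!("ABC123,DM,ABC123-{i:03},{i:03},{age},{sex}\n"));
    }
    csv
}

fn main() {
    print!("{}", synthetic_dm_csv(2));
}
```

A generator like this can back both unit tests and property tests without any fixture files to audit.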

Writing Good Tests

Structure (AAA Pattern)

#[test]
fn test_validation_rule() {
    // Arrange - set up test data
    let data = create_test_dataframe();
    let validator = Validator::new();

    // Act - perform the operation
    let results = validator.validate(&data);

    // Assert - verify results
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].severity, Severity::Error);
}

Descriptive Names

// Good
#[test]
fn returns_error_when_usubjid_is_missing() { ... }

#[test]
fn accepts_valid_iso8601_date_format() { ... }

// Avoid
#[test]
fn test1() { ... }

#[test]
fn it_works() { ... }

Test Edge Cases

#[test]
fn handles_empty_dataframe() { ... }

#[test]
fn handles_null_values() { ... }

#[test]
fn handles_unicode_characters() { ... }

#[test]
fn handles_maximum_length_values() { ... }

Test Error Conditions

#[test]
fn returns_error_for_invalid_input() {
    let result = process_file("nonexistent.csv");
    assert!(result.is_err());
}

#[test]
fn error_contains_helpful_message() {
    let err = process_file("bad.csv").unwrap_err();
    assert!(err.to_string().contains("parse error"));
}

Snapshot Testing

With Insta

use insta::assert_snapshot;

#[test]
fn xpt_header_format() {
    let header = generate_header(&metadata);
    assert_snapshot!(header);
}

Updating Snapshots

# Review and update snapshots
cargo insta review

CI Testing

Automated Checks

Every PR runs:

  1. cargo test - All tests
  2. cargo clippy - Linting
  3. cargo fmt --check - Formatting

Test Matrix

Tests run on:

  • Ubuntu (primary)
  • macOS (future)
  • Windows (future)

Test Coverage

Goal

Aim for high coverage on critical paths:

  • Validation rules
  • Data transformations
  • File I/O

Not Required

100% coverage isn’t required. Focus on:

  • Business logic
  • Error handling
  • Edge cases

Next Steps

Pull Requests

Guidelines for submitting pull requests to Trial Submission Studio.

Before Creating a PR

Complete Your Changes

  • Code compiles: cargo build
  • Tests pass: cargo test
  • Lints pass: cargo clippy -- -D warnings
  • Formatted: cargo fmt

Commit Guidelines

Conventional Commits

Use conventional commit format:

type(scope): description

[optional body]

[optional footer]

Types

  • feat - New feature
  • fix - Bug fix
  • docs - Documentation only
  • test - Adding/updating tests
  • refactor - Code refactoring
  • perf - Performance improvement
  • chore - Maintenance tasks

Examples

git commit -m "feat(validate): add CT validation for SEX variable"
git commit -m "fix(xpt): handle missing values correctly"
git commit -m "docs: update installation instructions"
git commit -m "test(map): add property tests for similarity"
git commit -m "refactor(ingest): simplify schema detection"

Keep PRs Focused

  • One feature or fix per PR
  • Small, reviewable changes
  • Don’t mix refactoring with features

Creating a PR

Push Your Branch

git push origin feature/my-feature

Open PR on GitHub

  1. Go to your fork on GitHub
  2. Click “Pull Request”
  3. Select your branch
  4. Fill in the template

PR Title

Use the same format as commit messages:

feat(validate): add USUBJID cross-domain validation
fix(xpt): correct numeric precision for large values
docs: add API documentation for tss-map

PR Description Template

## Summary

Brief description of what this PR does.

## Changes

- Added X
- Fixed Y
- Updated Z

## Testing

How was this tested?

- [ ] Unit tests added
- [ ] Manual testing performed
- [ ] Tested on: macOS / Windows / Linux

## Related Issues

Fixes #123
Related to #456

## Checklist

- [ ] Code compiles without warnings
- [ ] Tests pass
- [ ] Clippy passes
- [ ] Code is formatted
- [ ] Documentation updated (if needed)

Review Process

What Reviewers Look For

  1. Correctness - Does it work?
  2. Tests - Are changes tested?
  3. Style - Follows coding standards?
  4. Performance - Any concerns?
  5. Documentation - Updated if needed?

Responding to Feedback

  1. Address all comments
  2. Push additional commits
  3. Mark conversations resolved
  4. Request re-review when ready

Acceptable Responses

  • Fix the issue
  • Explain why it’s correct
  • Discuss alternative approaches
  • Agree to follow up in separate PR

After Merge

Clean Up

# Switch to main
git checkout main

# Update from upstream
git pull upstream main

# Delete local branch
git branch -d feature/my-feature

# Delete remote branch (optional, GitHub can auto-delete)
git push origin --delete feature/my-feature

Update Fork

git push origin main

PR Types

Feature PRs

  • Reference the issue or discussion
  • Include tests
  • Update documentation if user-facing

Bug Fix PRs

  • Reference the bug issue
  • Include regression test
  • Explain root cause if complex

Documentation PRs

  • No code changes required
  • Preview locally: mdbook serve
  • Check links work

Refactoring PRs

  • No behavior changes
  • All existing tests must pass
  • Add tests if coverage was low

Tips for Good PRs

Make Review Easy

  • Write clear descriptions
  • Add comments on complex code
  • Break large changes into steps

Be Patient

  • Reviews take time
  • Don’t ping repeatedly
  • Provide more context if asked

Learn from Feedback

  • Feedback improves code quality
  • Ask questions if unclear
  • Apply learnings to future PRs

Automated Checks

CI Pipeline

Every PR runs:

  1. Build - Compilation check
  2. Test - All tests
  3. Lint - Clippy
  4. Format - rustfmt

Required Checks

All checks must pass before merge.

Fixing Failed Checks

# If tests fail
cargo test

# If clippy fails
cargo clippy -- -D warnings

# If format fails
cargo fmt

Emergency Fixes

For critical bugs:

  1. Create PR with hotfix/ prefix
  2. Note urgency in description
  3. Request expedited review

Questions?

  • Ask in PR comments
  • Open a Discussion
  • Reference documentation

Next Steps

macOS Code Signing Setup

This guide explains how to set up Apple Developer certificates for signing and notarizing Trial Submission Studio releases.

Prerequisites

  • Active Apple Developer Program membership ($99/year)
  • macOS with Xcode Command Line Tools installed
  • Access to the GitHub repository settings (for adding secrets)

Step 1: Create Developer ID Application Certificate

1.1 Request Certificate from Apple

  1. Open Keychain Access (Applications → Utilities → Keychain Access)
  2. Go to Keychain Access → Certificate Assistant → Request a Certificate From a Certificate Authority
  3. Fill in:
    • Email Address: Your Apple ID email
    • Common Name: Your name
    • Request is: Saved to disk
  4. Save the .certSigningRequest file

1.2 Create Certificate in Apple Developer Portal

  1. Go to Apple Developer Certificates
  2. Click + to create a new certificate
  3. Select Developer ID Application (NOT “Developer ID Installer”)
  4. Upload your .certSigningRequest file
  5. Download the generated .cer file
  6. Double-click the .cer file to install it in Keychain Access

1.3 Verify Certificate Installation

Run this command to verify the certificate is installed:

security find-identity -v -p codesigning

You should see output like:

1) ABCDEF1234567890... "Developer ID Application: Your Name (TEAM_ID)"

Step 2: Export Certificate for GitHub Actions

2.1 Export as .p12

  1. Open Keychain Access
  2. Find your certificate: “Developer ID Application: Your Name”
  3. Right-click → Export
  4. Choose .p12 format
  5. Set a strong password (you’ll need this later)
  6. Save the file

2.2 Convert to Base64

base64 -i YourCertificate.p12 | pbcopy

This copies the base64-encoded certificate to your clipboard.

Step 3: Create App-Specific Password

Apple requires an app-specific password for notarization (not your regular Apple ID password).

  1. Go to Apple ID Account
  2. Sign in with your Apple ID
  3. Navigate to App-Specific Passwords
  4. Click Generate an app-specific password
  5. Label: “GitHub Actions Notarization”
  6. Copy the generated password (format: xxxx-xxxx-xxxx-xxxx)

Step 4: Find Your Team ID

  1. Go to Apple Developer Account
  2. Click Membership in the left sidebar
  3. Copy your Team ID (10-character alphanumeric string)

Step 5: Configure GitHub Secrets

Go to your repository’s Settings → Secrets and variables → Actions and add these 7 secrets:

  • APPLE_DEVELOPER_CERTIFICATE_P12_BASE64 - Base64-encoded .p12 certificate (Step 2.2 output)
  • APPLE_DEVELOPER_CERTIFICATE_PASSWORD - Password you set when exporting the .p12 (Step 2.1)
  • APPLE_CODESIGN_IDENTITY - Full certificate name (from security find-identity -v -p codesigning)
  • APPLE_NOTARIZATION_APPLE_ID - Your Apple ID email (your Apple Developer email)
  • APPLE_NOTARIZATION_APP_PASSWORD - App-specific password (Step 3 output)
  • APPLE_DEVELOPER_TEAM_ID - 10-character Team ID (Step 4)
  • CI_KEYCHAIN_PASSWORD - Random secure password (generate any secure string)

Example Values

APPLE_CODESIGN_IDENTITY: Developer ID Application: Ruben Talstra (ABCD1234EF)
APPLE_DEVELOPER_TEAM_ID: ABCD1234EF
APPLE_NOTARIZATION_APPLE_ID: your.email@example.com

Local Development

Create App Bundle

cargo build --release
./scripts/macos/create-bundle.sh

Sign Locally (for testing)

./scripts/macos/sign-local.sh

Verify Bundle

./scripts/macos/verify-bundle.sh

Test Gatekeeper

./scripts/macos/test-gatekeeper.sh
open "Trial Submission Studio.app"

Troubleshooting

“No Developer ID Application certificate found”

Ensure the certificate is in your login keychain and not expired:

security find-identity -v -p codesigning

“The signature is invalid”

Re-sign with the --force flag:

codesign --force --options runtime --sign "Developer ID Application: ..." "Trial Submission Studio.app"

“Notarization failed”

Check the notarization log:

xcrun notarytool log <submission-id> --apple-id "..." --password "..." --team-id "..."

Common issues:

  • Missing hardened runtime (--options runtime)
  • Problematic entitlements (JIT, unsigned memory)
  • Unsigned nested code

Security Notes

  • Never commit certificates or passwords to the repository
  • Use GitHub’s encrypted secrets for all sensitive values
  • The app-specific password is NOT your Apple ID password
  • Rotate credentials if you suspect they’ve been compromised

Windows Code Signing Setup

This guide explains how to set up Windows code signing using SignPath Foundation for Trial Submission Studio releases.

Overview

Windows code signing uses Authenticode certificates to sign executables. This eliminates SmartScreen warnings (“Windows protected your PC”) and builds user trust.

We use SignPath Foundation which provides free code signing certificates for open source projects. The certificate is issued to SignPath Foundation, and they vouch for your project by verifying binaries are built from your open source repository.

Prerequisites

  • Open source project with an OSI-approved license
  • GitHub repository with automated builds
  • MFA enabled on both GitHub and SignPath accounts
  • At least one prior release of your application

Step 1: Apply to SignPath Foundation

1.1 Check Eligibility

Your project must meet these criteria:

  1. OSI-approved license - Must use an approved open source license (no dual-licensing)
  2. No malware - No malware or potentially unwanted programs
  3. Actively maintained - Project must be actively maintained
  4. Already released - Must have prior releases in the form that will be signed
  5. Documented - Functionality described on download page
  6. All team members use MFA - For both SignPath and GitHub
  7. Automated builds - Build process must be fully automated

1.2 Submit Application

  1. Go to signpath.org/apply
  2. Fill out the application form with your project details
  3. Link your GitHub repository
  4. Wait for approval (typically a few days)

1.3 After Approval

Once approved, you’ll receive:

  • Organization ID
  • Project slug
  • Access to the SignPath dashboard

Step 2: Install SignPath GitHub App

  1. Go to github.com/apps/signpath
  2. Click Install
  3. Select your repository
  4. Grant necessary permissions

Step 3: Configure SignPath Dashboard

3.1 Add GitHub as Trusted Build System

  1. Log in to app.signpath.io
  2. Navigate to your project
  3. Go to Trusted Build Systems
  4. Add GitHub.com as a trusted build system
  5. Link to your repository

3.2 Configure Artifact Format

  1. Go to Artifact Configurations
  2. Create a new configuration or use the default
  3. Set the root element to <zip-file> (GitHub packages artifacts as ZIP)
  4. Configure the PE file signing within the ZIP

Example artifact configuration:


<artifact-configuration xmlns="http://signpath.io/artifact-configuration/v1">
    <zip-file>
        <pe-file path="*.exe">
            <authenticode-sign/>
        </pe-file>
    </zip-file>
</artifact-configuration>

3.3 Create API Token

  1. Go to My Profile → API Tokens
  2. Click Create API Token
  3. Name: “GitHub Actions”
  4. Permissions: Submitter role for your project
  5. Copy the token (you won’t see it again!)

Step 4: Configure GitHub Secrets

Go to your repository’s Settings → Secrets and variables → Actions and add these 4 secrets:

  • SIGNPATH_API_TOKEN - API token with submitter permissions (Step 3.3)
  • SIGNPATH_ORGANIZATION_ID - Your organization ID (SignPath dashboard URL or settings)
  • SIGNPATH_PROJECT_SLUG - Project identifier (SignPath project settings)
  • SIGNPATH_SIGNING_POLICY_SLUG - Signing policy name (typically “release-signing”)

Finding Your IDs

Organization ID: Look at your SignPath dashboard URL:

https://app.signpath.io/Web/YOUR_ORG_ID/...

Project Slug: Found in your project’s URL or settings page.

Signing Policy Slug: Usually release-signing for open source projects.

How It Works

When you push a tag to release:

  1. Build: GitHub Actions builds the unsigned .exe
  2. Upload: The unsigned artifact is uploaded to GitHub
  3. Submit: The SignPath action submits the artifact for signing
  4. Sign: SignPath signs the executable with their certificate
  5. Download: The signed artifact is downloaded back to the workflow
  6. Verify: The workflow verifies the signature is valid
  7. Release: The signed executable is included in the GitHub release

Verification

After signing, users can verify the signature:

Windows

Right-click the .exePropertiesDigital Signatures tab

PowerShell

Get-AuthenticodeSignature "trial-submission-studio.exe"

The publisher will show as SignPath Foundation.

Troubleshooting

“Signing request rejected”

Check the SignPath dashboard for the rejection reason. Common issues:

  • Artifact format doesn’t match configuration
  • Missing permissions on API token
  • Project not linked to GitHub as trusted build system

“API token invalid”

  • Ensure the token has Submitter permissions
  • Check token hasn’t expired
  • Verify the token is for the correct organization

“Artifact not found”

  • Ensure the artifact is uploaded before the signing step
  • Check the artifact ID is correctly passed between steps
  • Verify artifact name matches what SignPath expects

SmartScreen still warns

After signing, SmartScreen warnings should disappear. If they persist:

  • The signature may need time to build reputation
  • Check the certificate is valid in Properties → Digital Signatures
  • Ensure users download from official GitHub releases

Security Notes

  • Never commit API tokens to the repository
  • Use GitHub’s encrypted secrets for all sensitive values
  • SignPath stores keys in HSM (Hardware Security Module)
  • The signing certificate is managed by SignPath Foundation
  • All signing requests are logged and auditable

Cost

SignPath Foundation is free for open source projects that meet the eligibility criteria. There are no hidden fees or limits for qualifying projects.

Resources

Code Signing Policy

Trial Submission Studio uses code signing to ensure authenticity and integrity of distributed binaries.

Attribution

Windows: Free code signing provided by SignPath.io, certificate by SignPath Foundation.

macOS: Signed and notarized with Apple Developer ID.

Linux: Unsigned (standard for AppImage distribution).

Team Roles

Per SignPath Foundation requirements, this project has a single maintainer:

  • Author: @rubentalstra - Source code ownership, trusted commits
  • Reviewer: @rubentalstra - Review all external contributions
  • Approver: @rubentalstra - Authorize signing requests

All external contributions (pull requests) are reviewed before merging. Only merged code is included in signed releases.

Privacy & Network Communication

See Privacy Policy for full details.

Summary: This application only connects to GitHub when you explicitly request an update check. No clinical data or personal information is ever transmitted.

Build Verification

All signed binaries are:

  • Built from source code in this repository
  • Compiled via GitHub Actions (auditable CI/CD)
  • Tagged releases with full git history
  • Verified with SLSA build provenance attestations

Security Requirements

  • MFA required for SignPath access
  • MFA recommended for GitHub access (best practice)
  • Private signing keys are HSM-protected (SignPath infrastructure)
  • All signing requests are logged and auditable

Verifying Signatures

Windows

Right-click the .exe file → Properties → Digital Signatures tab.

Or use PowerShell:

Get-AuthenticodeSignature "trial-submission-studio.exe"

The publisher should show SignPath Foundation.

macOS

codesign -dv --verbose=4 /Applications/Trial\ Submission\ Studio.app
spctl --assess -vvv /Applications/Trial\ Submission\ Studio.app

Reporting Issues

macOS Gatekeeper Issues

This guide helps resolve common issues when opening Trial Submission Studio on macOS.

“Trial Submission Studio is damaged and can’t be opened”

This error typically means the app is not properly signed or notarized by Apple.

For Users: Quick Fix

If you downloaded from our official GitHub releases and see this error:

  1. Open System Settings → Privacy & Security
  2. Scroll down to the Security section
  3. Look for a message about “Trial Submission Studio” being blocked
  4. Click Open Anyway
  5. Confirm in the dialog that appears

For Developers: Root Causes

This error can occur when:

  1. App is not code signed - No Developer ID certificate was used
  2. App is not notarized - Apple’s notary service didn’t approve it
  3. Entitlements are too permissive - Certain entitlements can cause rejection
  4. GitHub secrets not configured - CI skipped signing due to missing secrets

“Apple cannot check it for malicious software”

This warning appears for apps that are signed but not notarized.

Workaround

  1. Right-click (or Control+click) the app
  2. Select Open from the context menu
  3. Click Open in the dialog

Note: On macOS Sequoia (15.0+), Control+click bypass no longer works. You must use System Settings → Privacy & Security → Open Anyway.

Verifying App Signature

To check if an app is properly signed:

# Check code signature
codesign --verify --deep --strict --verbose=2 "Trial Submission Studio.app"

# Check notarization
xcrun stapler validate "Trial Submission Studio.app"

# Check Gatekeeper assessment
spctl --assess --type execute --verbose=2 "Trial Submission Studio.app"

Expected output for a properly signed and notarized app:

  • valid on disk from codesign
  • The validate action worked! from stapler
  • accepted from spctl

Removing Quarantine Attribute

If you’re a developer testing the app, you can remove the quarantine attribute:

xattr -d com.apple.quarantine "Trial Submission Studio.app"

Warning: Only do this for apps you trust. This bypasses macOS security.

macOS Sequoia (15.0+) Changes

Apple significantly tightened Gatekeeper in macOS Sequoia:

  • Control+click bypass removed - The old workaround no longer works
  • New bypass path: System Settings → Privacy & Security → Open Anyway
  • Admin password required - You’ll need to authenticate twice
  • spctl --master-disable removed - Can’t globally disable Gatekeeper via terminal

This makes proper code signing and notarization more important than ever.

Reporting Issues

If you downloaded from our official releases and still have issues:

  1. Check the GitHub Releases page
  2. Ensure you downloaded the .dmg file (not the .zip)
  3. Report issues at GitHub Issues

Include:

  • macOS version (sw_vers)
  • Where you downloaded the app from
  • The exact error message
  • Output of codesign --verify --verbose=2 (if possible)

Frequently Asked Questions

Common questions about Trial Submission Studio.

General

What is Trial Submission Studio?

Trial Submission Studio is a free, open-source desktop application for transforming clinical trial source data (CSV) into CDISC-compliant formats like XPT for FDA submissions.

Is my data sent anywhere?

No. Your clinical trial data stays on your computer. Trial Submission Studio works completely offline - all CDISC standards are embedded in the application, and no data is transmitted over the network.

Is Trial Submission Studio free?

Yes! Trial Submission Studio is free and open source, licensed under the MIT License. You can use it commercially without any fees.

Which platforms are supported?

  • macOS (Apple Silicon and Intel)
  • Windows (x86_64 and ARM64)
  • Linux (x86_64)

CDISC Standards

Which CDISC standards are supported?

Currently Supported:

  • SDTM-IG v3.4
  • Controlled Terminology (2024-2025 versions)

Planned:

  • ADaM-IG v1.3
  • SEND-IG v3.1.1

Can I use this for FDA submissions?

Not yet. Trial Submission Studio is currently in alpha development. Our goal is to generate FDA-compliant outputs, but until the software reaches stable release, all outputs should be validated by qualified regulatory professionals before submission.

How often is controlled terminology updated?

Controlled terminology updates are included in application releases. We aim to incorporate new CDISC CT versions within a reasonable time after their official release.

Technical

Do I need SAS installed?

No. Trial Submission Studio is completely standalone and does not require SAS or any other software. It generates XPT files natively.

What input formats are supported?

Currently, Trial Submission Studio supports CSV files as input. The CSV should have:

  • Headers in the first row
  • UTF-8 encoding (recommended)
  • Comma-separated values

What output formats are available?

  • XPT V5 - FDA standard SAS Transport format
  • XPT V8 - Extended SAS Transport (longer names)
  • Dataset-XML - CDISC XML format
  • Define-XML 2.1 - Metadata documentation

How large a dataset can it handle?

Trial Submission Studio can handle datasets with hundreds of thousands of rows. For very large datasets (1M+ rows), ensure adequate RAM (8GB+) and consider processing in batches.
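Batch processing can be sketched as streaming rows in fixed-size chunks instead of loading everything at once. The function name and batch size below are illustrative, not the application's actual implementation:

```rust
use std::io::{BufRead, Cursor};

// Sketch: iterate over CSV data rows in fixed-size batches so memory
// use stays bounded regardless of file size.
fn for_each_batch<R: BufRead>(
    reader: R,
    batch_size: usize,
    mut handle: impl FnMut(&[String]),
) -> std::io::Result<()> {
    let mut lines = reader.lines();
    let _header = lines.next(); // skip the CSV header row
    let mut batch = Vec::with_capacity(batch_size);
    for line in lines {
        batch.push(line?);
        if batch.len() == batch_size {
            handle(&batch);
            batch.clear();
        }
    }
    if !batch.is_empty() {
        handle(&batch); // final partial batch
    }
    Ok(())
}

fn main() {
    let csv = "A,B\n1,2\n3,4\n5,6\n";
    for_each_batch(Cursor::new(csv), 2, |batch| {
        println!("processing batch of {} rows", batch.len());
    })
    .unwrap();
}
```

The same loop works over a BufReader wrapping a file, which is how a 1M-row CSV would be consumed without holding it all in RAM.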

Usage

How does column mapping work?

Trial Submission Studio uses fuzzy matching to suggest mappings between your source column names and SDTM variables. It analyzes name similarity and provides confidence scores. You can accept suggestions or map manually.
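Fuzzy name matching of this kind is commonly built on a normalized edit distance. A self-contained sketch (not the application's actual algorithm, which may weight things differently):

```rust
// Classic Levenshtein edit distance, single-row dynamic programming.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

// Case-insensitive similarity in [0, 1]: 1.0 means identical names.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    let dist = levenshtein(&a.to_uppercase(), &b.to_uppercase());
    1.0 - dist as f64 / longest as f64
}

fn main() {
    println!("{:.2}", similarity("subjid", "SUBJID")); // prints "1.00"
}
```

A score like this, compared against a threshold, yields the confidence values shown next to each suggested mapping.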

What happens if validation fails?

Validation errors must be resolved before export. The validation panel shows:

  • Errors (red) - Must fix
  • Warnings (yellow) - Should review
  • Info (blue) - Informational

Each message lists the affected rows and suggests how to fix the issue.
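The three levels can be modeled as a simple enum; the types below are hypothetical, but they capture the rule that only errors block export:

```rust
// Hypothetical model of the three severities described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Info,    // informational only
    Warning, // should review
    Error,   // must fix before export
}

fn export_blocked(findings: &[Severity]) -> bool {
    // Only errors block export; warnings and info do not.
    findings.iter().any(|s| *s == Severity::Error)
}

fn main() {
    let findings = [Severity::Info, Severity::Warning];
    println!("blocked: {}", export_blocked(&findings)); // prints "blocked: false"
}
```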

Can I save my mapping configuration?

Yes, you can save mapping templates and reuse them for similar datasets. This is useful when processing multiple studies with consistent source data structures.

Troubleshooting

The application won’t start on macOS

On first launch, macOS may block the application. Right-click and select “Open”, then click “Open” in the dialog to bypass Gatekeeper.

Import shows garbled characters

Your file may not be UTF-8 encoded. Open it in a text editor and save with UTF-8 encoding, then re-import.

Validation shows many errors

Common causes:

  1. Incorrect domain selection
  2. Wrong column mappings
  3. Data quality issues in source
  4. Controlled terminology mismatches

Review errors one by one, starting with mapping issues.

Export creates empty file

Ensure:

  1. Data is imported successfully
  2. Mappings are configured
  3. No blocking validation errors exist

Development

How can I contribute?

See our Contributing Guide for details. We welcome:

  • Bug reports
  • Feature requests
  • Code contributions
  • Documentation improvements

Where do I report bugs?

Open an issue on GitHub Issues.

Is there a roadmap?

Yes! See our Roadmap for planned features and development priorities.

More Questions?

Glossary

Terms and definitions used in Trial Submission Studio and CDISC standards.

A

ADaM

Analysis Data Model - CDISC standard for analysis-ready datasets derived from SDTM data.

ADSL

ADaM Subject-Level - ADaM dataset containing one record per subject with demographics and key variables.

B

BDS

Basic Data Structure - An ADaM structure used for parameter-based data like vital signs and lab results.

C

CDISC

Clinical Data Interchange Standards Consortium - Organization that develops global data standards for clinical research.

Codelist

A defined set of valid values for a variable. Also known as controlled terminology.

Controlled Terminology (CT)

Standardized sets of terms and codes published by CDISC for use in SDTM and ADaM datasets.

D

Dataset-XML

A CDISC standard XML format for representing tabular clinical data.

Define-XML

An XML standard for describing the structure and content of clinical trial datasets. Required for FDA submissions.

Domain

A logical grouping of SDTM data organized by observation type (e.g., DM for Demographics, AE for Adverse Events).

DM

Demographics - SDTM domain containing one record per subject with demographic information.

E

eCTD

Electronic Common Technical Document - Standard format for regulatory submissions.

F

FDA

Food and Drug Administration - US regulatory agency that requires CDISC standards for drug submissions.

Findings Class

SDTM observation class for collected measurements and test results (e.g., Labs, Vital Signs).

I

ISO 8601

International standard for date and time formats. SDTM uses ISO 8601 format: YYYY-MM-DD.

Interventions Class

SDTM observation class for treatments given to subjects (e.g., Exposure, Concomitant Medications).

M

MedDRA

Medical Dictionary for Regulatory Activities - Standard medical terminology for adverse events.

Metadata

Data that describes other data. In Define-XML, metadata describes dataset structure and variable definitions.

O

ODM

Operational Data Model - CDISC standard for representing clinical data and metadata in XML.

P

PMDA

Pharmaceuticals and Medical Devices Agency - Japanese regulatory agency that requires CDISC standards.

S

SAS Transport (XPT)

File format for SAS datasets used for FDA submissions. See XPT.

SDTM

Study Data Tabulation Model - CDISC standard structure for organizing clinical trial data.

SDTM-IG

SDTM Implementation Guide - Detailed guidance for implementing SDTM, including variable definitions and business rules.

SEND

Standard for Exchange of Nonclinical Data - CDISC standard for nonclinical (animal) study data.

Special Purpose Domain

SDTM domains that don’t fit standard observation classes (e.g., DM, Trial Design domains).

STUDYID

Standard SDTM variable containing the unique study identifier.

U

USUBJID

Unique Subject Identifier - Standard SDTM variable that uniquely identifies each subject across all studies.

V

Variable

An individual data element within a dataset. In SDTM, variables have standard names, labels, and data types.

X

XPT

SAS Transport Format - Binary file format used to transport SAS datasets. Required by FDA for data submissions.

XPT V5

Original SAS Transport format with 8-character variable names.

XPT V8

Extended SAS Transport format supporting 32-character variable names.

Numbers

--DTC Variables

SDTM timing variables containing dates/times in ISO 8601 format (e.g., AESTDTC, VSDTC).

--SEQ Variables

SDTM sequence variables providing unique record identifiers within a domain (e.g., AESEQ, VSSEQ).

--TESTCD Variables

SDTM test code variables in Findings domains (e.g., VSTESTCD, LBTESTCD).

Changelog

All notable changes to Trial Submission Studio.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

  • Initial mdBook documentation site
  • Comprehensive user guide
  • CDISC standards reference
  • Architecture documentation
  • Contributing guidelines

Changed

  • Updated documentation structure

Fixed

  • Various documentation improvements

0.0.1-alpha.1 - 2024-XX-XX

Added

Core Features

  • CSV file import with automatic schema detection
  • Column-to-SDTM variable mapping with fuzzy matching
  • XPT V5 and V8 export support
  • Basic SDTM validation
  • Controlled terminology validation

Standards Support

  • SDTM-IG v3.4 embedded
  • Controlled Terminology 2024 versions
  • Domain definitions for common SDTM domains

User Interface

  • Native desktop GUI (egui/eframe)
  • Data preview grid
  • Mapping interface with suggestions
  • Validation results panel
  • Export options dialog

Platform Support

  • macOS (Apple Silicon)
  • macOS (Intel)
  • Windows (x86_64)
  • Windows (ARM64)
  • Linux (x86_64)

Known Issues

  • Alpha software - not for production use
  • ADaM support not yet implemented
  • SEND support not yet implemented
  • Dataset-XML export in progress
  • Define-XML export in progress

Version History

  • 0.0.1-alpha.1 - TBD - Current

Release Notes Format

Each release includes:

  • Added - New features
  • Changed - Changes to existing features
  • Deprecated - Features to be removed
  • Removed - Removed features
  • Fixed - Bug fixes
  • Security - Security fixes

Getting Updates

Check for Updates

Trial Submission Studio checks for updates automatically. You can also:

  1. Visit GitHub Releases
  2. Download the latest version for your platform
  3. Replace your existing installation

Update Notifications

When a new version is available, you’ll see a notification in the application.

Reporting Issues

Found a bug or have a feature request?


Roadmap

Development plans for Trial Submission Studio.

Note

This roadmap reflects current plans and priorities. Items may change based on community feedback and project needs.

Current Focus

Features actively being developed:

  • Complete SDTM transformation pipeline
  • Dataset-XML export
  • Define-XML 2.1 generation
  • Comprehensive SDTM validation rules
  • Full export workflow

Short-term

Features planned for near-term development:

  • Batch processing (multiple domains)
  • Export templates and presets
  • Improved error messages and validation feedback
  • Session save/restore
  • Mapping templates (save and reuse mappings)

Medium-term

Features planned after core functionality is stable:

  • ADaM (Analysis Data Model) support
  • SUPP domain handling improvements
  • Custom validation rules
  • Report generation
  • Undo/redo functionality improvements

Long-term

Features for future consideration:

  • SEND (Standard for Exchange of Nonclinical Data) support
  • Batch CLI mode for automation
  • Define-XML import (reverse engineering)
  • Plugin system for custom transformations
  • Multi-study support

Completed

Features that have been implemented:

  • Core XPT read/write (V5 + V8)
  • CSV ingestion with schema detection
  • Fuzzy column mapping engine
  • Controlled Terminology validation
  • Desktop GUI (egui/eframe)
  • SDTM-IG v3.4 standards embedded
  • Controlled Terminology (2024-2025)
  • Cross-platform support (macOS, Windows, Linux)

How to Contribute

We welcome contributions! See the Contributing Guide for details.

Working on Roadmap Items

If you’d like to work on a roadmap item:

  1. Check if there’s an existing GitHub Issue
  2. Comment to express interest
  3. Wait for maintainer feedback before starting work
  4. Follow the PR guidelines

Suggesting New Features

Have ideas for the roadmap?

  1. Check existing issues and discussions
  2. Open a new issue or discussion
  3. Describe the feature and use case
  4. Engage with community feedback

Prioritization

Features are prioritized based on:

  1. Regulatory compliance - FDA submission requirements
  2. User impact - Benefit to most users
  3. Complexity - Development effort required
  4. Dependencies - Prerequisites from other features
  5. Community feedback - Requested features

Versioning Plan

Version  Focus
0.1.0    Core SDTM workflow stable
0.2.0    Define-XML and Dataset-XML
0.3.0    ADaM support
1.0.0    Production ready

Stay Updated

Watch the GitHub repository and check the Releases page for new versions and announcements.

Disclaimer

Important notices about Trial Submission Studio.

Alpha Software Notice

Warning

Trial Submission Studio is currently in alpha development.

This software is provided for evaluation and development purposes only. It is not yet suitable for production use in regulatory submissions.

What This Means

  • Features may be incomplete or change without notice
  • Bugs and unexpected behavior may occur
  • Data outputs should be independently validated
  • No guarantee of regulatory compliance

Not for Production Submissions

Do not use Trial Submission Studio outputs for actual FDA, PMDA, or other regulatory submissions until the software reaches stable release (version 1.0.0 or later).

Before Submission

All outputs from Trial Submission Studio should be:

  1. Validated by qualified regulatory professionals
  2. Verified against CDISC standards independently
  3. Reviewed for completeness and accuracy
  4. Tested with regulatory authority validation tools

Limitation of Liability

Trial Submission Studio is provided “as is” without warranty of any kind, express or implied. The authors and contributors:

  • Make no guarantees about output accuracy
  • Are not responsible for submission rejections
  • Cannot be held liable for regulatory issues
  • Do not provide regulatory consulting

See the full MIT License for complete terms.

CDISC Standards

Trial Submission Studio implements CDISC standards based on publicly available documentation:

  • SDTM-IG v3.4 - Study Data Tabulation Model Implementation Guide
  • Controlled Terminology - 2024-2025 versions

CDISC standards are developed by the Clinical Data Interchange Standards Consortium. Trial Submission Studio is not affiliated with or endorsed by CDISC.

Regulatory Guidance

This software does not constitute regulatory advice. For guidance on submission requirements, consult the relevant regulatory authority (such as FDA or PMDA) and qualified regulatory professionals.

Data Privacy

Trial Submission Studio:

  • Processes all clinical data locally on your computer
  • Does not collect usage analytics or telemetry
  • Does not transmit clinical data over the network

Network communication is limited to user-initiated update checks via GitHub API. No clinical data or personal information is ever transmitted.

See our full Privacy Policy for details.

You are responsible for protecting any sensitive or confidential data processed with this software.

Reporting Issues

If you encounter problems:

  1. Do not rely on potentially incorrect outputs
  2. Report issues on GitHub
  3. Validate outputs through independent means

Future Stability

We are actively working toward a stable release. Progress can be tracked on our Roadmap.

Version  Status
0.x.x    Alpha - Not for production
1.0.0+   Stable - Production ready

Questions?

Ask on GitHub Discussions.

Code of Conduct

Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Examples of behavior that contributes to a positive environment:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints and experiences
  • Gracefully accepting constructive criticism
  • Focusing on what is best for the community
  • Showing empathy towards other community members

Examples of unacceptable behavior:

  • The use of sexualized language or imagery and unwelcome sexual attention or advances
  • Trolling, insulting or derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others’ private information without explicit permission
  • Other conduct which could reasonably be considered inappropriate in a professional setting

Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Scope

This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue on the GitHub repository or contacting the project maintainers directly.

All complaints will be reviewed and investigated promptly and fairly.

Attribution

This Code of Conduct is adapted from the Contributor Covenant, version 2.1.

Privacy Policy

Trial Submission Studio is designed with privacy as a core principle.

Data Collection

We do not collect any data. Trial Submission Studio:

  • Does not collect usage analytics or telemetry
  • Does not track user behavior
  • Does not collect personal information
  • Does not access or transmit clinical trial data

Local Processing

All clinical data processing occurs entirely on your local computer:

  • Source files (CSV, XPT) are read locally
  • Transformations execute in local memory
  • Output files are written to local storage
  • No data is uploaded to any server

Network Communication

Trial Submission Studio connects to the internet only when you explicitly request it:

Action             Destination     Purpose
Check for Updates  api.github.com  Fetch latest release info
Download Update    github.com      Download new version

Important:

  • Update checks are user-initiated only (not automatic)
  • No clinical data is ever transmitted
  • No personal information is sent
  • All connections use TLS encryption

This complies with SignPath Foundation’s requirement:

“This program will not transfer any information to other networked systems unless specifically requested by the user.”
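A user-initiated check is a single GET to the GitHub Releases API followed by a local version comparison. The comparison step can be sketched as below; the `is_newer` helper is a hypothetical illustration, not the application's actual code, and no network request is made here:

```rust
// Decide whether a release tag fetched from the GitHub Releases API
// (e.g. "v0.2.0") is newer than the running version (e.g. "0.1.0").
fn is_newer(latest_tag: &str, current: &str) -> bool {
    // Parse "v1.2.3" or "1.2.3" into numeric MAJOR.MINOR.PATCH parts.
    let parse = |v: &str| -> Vec<u64> {
        v.trim_start_matches('v')
            .split('.')
            .filter_map(|part| part.parse().ok())
            .collect()
    };
    // Vec<u64> compares lexicographically, which matches semver
    // precedence for plain MAJOR.MINOR.PATCH tags.
    parse(latest_tag) > parse(current)
}

fn main() {
    let current = "0.1.0";  // hypothetical running version
    let latest = "v0.2.0";  // hypothetical tag from the Releases API
    println!("update available: {}", is_newer(latest, current)); // prints "update available: true"
}
```

Because the tag only arrives when the user triggers a check, nothing leaves the machine otherwise.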

Third-Party Services

The only third-party service used is GitHub for:

  • Hosting releases and source code
  • Providing update information via GitHub Releases API

For GitHub’s data practices, see: GitHub Privacy Statement

Data Storage

Trial Submission Studio may store the following locally:

Data               Location             Purpose
User preferences   OS config directory  Remember settings
Recent files list  OS config directory  Quick access
Window state       OS config directory  Restore layout

Storage locations by platform:

  • Windows: %APPDATA%\trial-submission-studio\
  • macOS: ~/Library/Application Support/trial-submission-studio/
  • Linux: ~/.config/trial-submission-studio/

No clinical data is ever stored by the application itself.
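The per-platform locations above amount to a small lookup. A sketch follows; the `config_dir` helper is hypothetical and shown for illustration only, since the application resolves these paths through the OS APIs:

```rust
// Map a target OS name to the config directory pattern shown in the
// table above. Paths keep their home-relative prefixes for display;
// a real implementation would expand them via platform APIs.
fn config_dir(os: &str) -> Option<String> {
    let app = "trial-submission-studio";
    match os {
        "windows" => Some(format!("%APPDATA%\\{}\\", app)),
        "macos" => Some(format!("~/Library/Application Support/{}/", app)),
        "linux" => Some(format!("~/.config/{}/", app)),
        _ => None, // unsupported platform
    }
}

fn main() {
    // std::env::consts::OS reports the compile-time target,
    // e.g. "linux", "macos", or "windows".
    match config_dir(std::env::consts::OS) {
        Some(dir) => println!("settings stored under: {}", dir),
        None => println!("unsupported platform"),
    }
}
```

Only lightweight preferences live in these directories; clinical data never does.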

Your Responsibilities

You are responsible for:

  • Protecting clinical data on your system
  • Compliance with HIPAA, GxP, 21 CFR Part 11 as applicable
  • Secure storage of source and output files
  • Access control on your computer

Changes to This Policy

Changes will be documented in release notes and this file.

Contact

Questions about privacy: GitHub Discussions