Healthcare Data Processing Platform
HIPAA-compliant HL7 ingestion, anonymization, and multi-tenant portal

End-to-end (Frontend, Backend Pipeline, Anonymization, Infrastructure)
Health Information Exchanges (HIEs) need to process and anonymize large volumes of PHI (Protected Health Information) before sharing with researchers and partners, but manual workflows are slow, error-prone, and don't scale.
This platform lets HIEs ingest raw HL7 v2 data, clean and validate it, anonymize all PHI using the HIPAA Safe Harbor method, and deliver the anonymized datasets to customer-facing portals and APIs. The system uses a two-zone architecture: an air-gapped node for PHI processing and a public-facing orchestrator for the web app, dashboards, and exports. It supports multi-tenant isolation, batch job tracking, quality reports, and multiple export formats (clean HL7, FHIR R4, OMOP CSV).
The Challenge
HIEs aggregate health data from dozens of hospitals and clinics, but each facility sends data in different formats and with inconsistent quality. Raw HL7 v2 messages often contain duplicates, malformed fields, missing required data, and inconsistent coding systems. Before the data can be shared with researchers or payers, PHI must be removed or anonymized, a process that is slow and error-prone when done manually. The goal was to build an automated pipeline that could ingest, parse, clean, anonymize, and export health data at scale while maintaining strict HIPAA compliance and multi-tenant isolation.
Technical Architecture
Frontend
- Next.js 16 (App Router)
- React 19
- TypeScript
- Tailwind CSS v4
- shadcn/ui
- Recharts
Backend & Pipeline
- Python 3.11
- FastAPI
- python-hl7 (HL7 v2 parsing)
- fhir.resources (FHIR R4)
- pandas, numpy
Data & Storage
- PostgreSQL (Supabase)
- Row-Level Security (RLS)
- Dual-zone DBs (PHI + Anonymized)
Infrastructure
- Docker (Backend + DB)
- Air-gapped Module1 (PHI)
- Orchestrator (Public Portal)
- SCP file transfer
Integrations
- Clerk (Auth)
- Knock (Notifications)
- Stripe (Billing)
- Mapbox (Geo viz)
Key Features Built
- HL7 v2 file/ZIP upload with deduplication by SHA-256 hash and batch job tracking
- Fast bulk loader: Batched INSERT for 100K+ messages with sub-second writes
- HL7 parsing: Extract demographics, encounters, observations, diagnoses, medications, allergies, procedures
- HIPAA Safe Harbor anonymization: Date shifting (deterministic per patient), ID hashing (HMAC-SHA256 with an org-specific salt), ZIP code truncation to 3 digits, synthetic names, PHI detection/validation, audit logs
- FHIR R4 bundle generation: Convert HL7 to FHIR resources (Patient, Encounter, Observation, etc.)
- OMOP CSV export: Map cleaned data to the OMOP Common Data Model for research
- Multi-tenant portal: Organizations, hospitals, users, roles, permissions with RLS enforcement
- Quality dashboards: Data completeness, validation errors, processing metrics
- API: REST endpoints for job status, record retrieval, and export downloads
- File watcher service: Monitors delivery folder, transfers via SCP to air-gapped node, creates batch jobs
Technical Challenges & Solutions
HL7 parsing failures on malformed segments.
Built fault-tolerant parser with fallback extraction; logged errors for manual review.
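One plausible reading of "fallback extraction": when a strict parser such as python-hl7 rejects a message, split it on the HL7 delimiters directly, keep whatever fields are present, and record the gaps instead of dropping the message. The sketch below covers only that fallback path for the PID segment and assumes the default `|`, `^`, and `~` delimiters (real messages declare them in MSH-1 and MSH-2); `parse_pid_tolerant`, `ParseResult`, and the chosen field subset are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical subset of PID fields to extract: HL7 v2 field number -> output key.
PID_FIELDS = {3: "patient_id", 5: "name", 7: "birth_date", 8: "sex"}


@dataclass
class ParseResult:
    fields: dict[str, str] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


def parse_pid_tolerant(message: str) -> ParseResult:
    """Keep whatever PID fields exist and record what is missing."""
    result = ParseResult()
    # HL7 v2 ends segments with \r, but real feeds often use \n or \r\n instead.
    segments = [s for s in re.split(r"\r\n|\r|\n", message) if s.strip()]
    pid = next((s for s in segments if s.startswith("PID|")), None)
    if pid is None:
        result.errors.append("missing PID segment")
        return result
    parts = pid.split("|")  # parts[i] is PID-i because parts[0] is "PID"
    for index, key in PID_FIELDS.items():
        value = parts[index].strip() if index < len(parts) else ""
        if not value:
            result.errors.append(f"PID-{index} ({key}) missing or empty")
            continue
        if index == 3:
            # PID-3 may repeat (~); component 1 of the first repetition is the ID.
            value = value.split("~")[0].split("^")[0]
        result.fields[key] = value
    return result
```

The collected `errors` list is what would feed the manual-review log.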
Slow batch inserts for 100K+ messages.
Implemented batched INSERTs with psycopg2's execute_batch; reduced ingestion time from minutes to seconds.
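psycopg2's `psycopg2.extras.execute_batch(cur, sql, argslist, page_size=100)` groups many parameter sets into a small number of server round trips. So that the example runs without a Postgres server, the sketch below shows the same paging pattern with the standard-library `sqlite3` module as a stand-in; `bulk_insert`, `chunked`, and the `page_size` default are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Any, Iterable, Iterator, Sequence


def chunked(rows: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most `size` items."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_insert(conn: sqlite3.Connection, sql: str,
                rows: Iterable[Sequence[Any]], page_size: int = 1000) -> int:
    """Insert rows page by page inside one transaction; return the row count."""
    total = 0
    with conn:  # commits once at the end, rolls back everything on error
        cur = conn.cursor()
        for page in chunked(rows, page_size):
            cur.executemany(sql, page)
            total += len(page)
    return total
```

With sqlite3 the gain comes mostly from the single transaction; with psycopg2 you would typically pass the whole row list to `execute_batch(cur, sql, rows, page_size=...)`, which does the paging itself, and use `%s` placeholders instead of `?`.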
Ensuring deterministic anonymization (same patient → same anonymized ID across batches).
Used HMAC-SHA256 with an org-specific salt; stored the ID mapping only in the PHI zone (never synced to the orchestrator).
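A minimal sketch of deterministic pseudonymization, assuming the org-specific salt is a secret byte string used as the HMAC key. Using it as a key, rather than concatenating it with the ID, is the standard keyed-hash construction: while the key stays secret, someone who knows a patient's MRN cannot recompute the pseudonym, and two orgs get unlinkable pseudonyms for the same MRN. `pseudonymize_id`, `build_mapping`, and the whitespace normalization are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def pseudonymize_id(patient_id: str, org_key: bytes) -> str:
    """Same (org_key, patient_id) always yields the same 64-char hex pseudonym."""
    # Assumption: surrounding whitespace in source IDs is noise, not meaning.
    normalized = patient_id.strip().encode("utf-8")
    return hmac.new(org_key, normalized, hashlib.sha256).hexdigest()


def build_mapping(patient_ids: Iterable[str], org_key: bytes) -> dict[str, str]:
    """Original ID -> pseudonym lookup, kept on the PHI side and never synced."""
    return {pid: pseudonymize_id(pid, org_key) for pid in patient_ids}
```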
Date shifting while preserving temporal relationships.
Applied consistent shift per patient; validated that event order remained intact.
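A minimal sketch of per-patient date shifting, assuming the offset is derived from a keyed hash of the patient ID so it is stable across batches. The `MAX_SHIFT_DAYS` window, the `b"date-shift:"` prefix (which keeps this digest separate from the ID pseudonym), and the function names are hypothetical. Because every event for a patient moves by the same `timedelta`, intervals and ordering are preserved by construction; `check_order_preserved` mirrors the validation step described above.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

MAX_SHIFT_DAYS = 365  # assumed window: shifts land in [-365, -1] or [1, 365] days


def patient_shift(patient_id: str, org_key: bytes) -> timedelta:
    """Derive a stable, non-zero day offset for one patient from a keyed hash."""
    msg = b"date-shift:" + patient_id.encode("utf-8")
    digest = hmac.new(org_key, msg, hashlib.sha256).digest()
    magnitude = int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1  # 1..365
    sign = -1 if digest[4] & 1 else 1
    return timedelta(days=sign * magnitude)


def shift_events(patient_id: str, org_key: bytes,
                 events: list[datetime]) -> list[datetime]:
    """Apply the same offset to every event of one patient."""
    delta = patient_shift(patient_id, org_key)
    return [event + delta for event in events]


def check_order_preserved(original: list[datetime],
                          shifted: list[datetime]) -> bool:
    """Validation: events must sort into the same order before and after shifting."""
    if len(original) != len(shifted):
        return False
    indices = range(len(original))
    return (sorted(indices, key=original.__getitem__)
            == sorted(indices, key=shifted.__getitem__))
```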