Healthcare Data Processing Platform

HIPAA-compliant HL7 ingestion, anonymization, and multi-tenant portal

Role

End-to-end (Frontend, Backend Pipeline, Anonymization, Infrastructure)

Problem

Health Information Exchanges (HIEs) need to process and anonymize large volumes of PHI (Protected Health Information) before sharing it with researchers and partners, but manual workflows are slow, error-prone, and don't scale.

This platform enables Health Information Exchanges (HIEs) to ingest raw HL7 v2 data, clean and validate it, anonymize all PHI per HIPAA Safe Harbor standards, and deliver anonymized datasets to customer-facing portals and APIs. The system uses a two-zone architecture: an air-gapped node for PHI processing and a public-facing orchestrator for the web app, dashboards, and exports. It supports multi-tenant isolation, batch job tracking, quality reports, and multiple export formats (clean HL7, FHIR R4, OMOP CSV).

The Challenge

HIEs aggregate health data from dozens of hospitals and clinics, but each facility sends data in different formats with inconsistent quality. Raw HL7 v2 messages often contain duplicates, malformed fields, missing required data, and inconsistent coding systems. Before sharing data with researchers or payers, PHI must be removed or anonymized—a process that, if done manually, is slow and error-prone. The goal was to build an automated pipeline that could ingest, parse, clean, anonymize, and export health data at scale, while maintaining strict HIPAA compliance and multi-tenant isolation.

Technical Architecture

Frontend

  • Next.js 16 (App Router)
  • React 19
  • TypeScript
  • TailwindCSS v4
  • shadcn/ui
  • Recharts

Backend & Pipeline

  • Python 3.11
  • FastAPI
  • python-hl7 (HL7 v2 parsing)
  • fhir.resources (FHIR R4)
  • pandas, numpy

Data & Storage

  • PostgreSQL (Supabase)
  • Row-Level Security (RLS)
  • Dual-zone DBs (PHI + Anonymized)

Infrastructure

  • Docker (Backend + DB)
  • Air-gapped Module1 (PHI)
  • Orchestrator (Public Portal)
  • SCP file transfer

Integrations

  • Clerk (Auth)
  • Knock (Notifications)
  • Stripe (Billing)
  • Mapbox (Geo viz)

Key Features Built

  • HL7 v2 file/ZIP upload with deduplication by SHA-256 hash and batch job tracking
  • Fast bulk loader: Batched INSERT for 100K+ messages with sub-second writes
  • HL7 parsing: Extract demographics, encounters, observations, diagnoses, medications, allergies, procedures
  • HIPAA Safe Harbor anonymization: Date shifting (deterministic per patient), ID hashing (SHA-256 + salt), ZIP 3-digit masking, synthetic names, PHI detection/validation, audit logs
  • FHIR R4 bundle generation: Convert HL7 to FHIR resources (Patient, Encounter, Observation, etc.)
  • OMOP CSV export: Map cleaned data to OMOP Common Data Model for research
  • Multi-tenant portal: Organizations, hospitals, users, roles, permissions with RLS enforcement
  • Quality dashboards: Data completeness, validation errors, processing metrics
  • API: REST endpoints for job status, record retrieval, and export downloads
  • File watcher service: Monitors delivery folder, transfers via SCP to air-gapped node, creates batch jobs
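The upload deduplication above can be sketched as follows. This is a minimal, stdlib-only illustration: the function names and the in-memory hash set are hypothetical, and the production system presumably checks digests against the database rather than a set.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large ZIP uploads never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_duplicate(path: str, seen_hashes: set[str]) -> bool:
    """Return True if this file's content was already ingested in a prior batch."""
    digest = sha256_of_file(path)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

Hashing content rather than filenames means the same HL7 export re-delivered under a new name is still skipped.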

Technical Challenges & Solutions

Challenge

HL7 parsing failures on malformed segments.

Solution

Built a fault-tolerant parser with fallback extraction; unparseable segments are logged for manual review.

Challenge

Slow batch inserts for 100K+ messages.

Solution

Implemented batched INSERTs with psycopg2's execute_batch, reducing ingestion time from minutes to seconds.

Challenge

Ensuring deterministic anonymization (same patient → same anonymized ID across batches).

Solution

Used HMAC-SHA256 with an org-specific salt; the patient-to-pseudonym mapping is stored only in the air-gapped PHI zone and is never synced to the public side.
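The core of deterministic pseudonymization is a keyed hash, sketched below (function and parameter names are illustrative):

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, org_salt: bytes) -> str:
    """Same patient ID + same org salt -> same pseudonym, across every batch."""
    return hmac.new(org_salt, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Using HMAC with a per-organization secret salt, rather than a plain SHA-256 of the ID, prevents dictionary attacks on known MRN formats while still letting records for the same patient link up across batches.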

Challenge

Date shifting while preserving temporal relationships.

Solution

Applied a consistent shift per patient and validated that event ordering remained intact.
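A per-patient shift can be derived from the pseudonym itself, so no extra state needs to be stored. This sketch assumes a ±365-day window and the function names shown; neither is specified in the write-up:

```python
import hashlib
from datetime import datetime, timedelta

MAX_SHIFT_DAYS = 365  # hypothetical window


def shift_days_for(patient_pseudonym: str) -> int:
    """Stable per-patient offset in [-365, 365], derived from the pseudonym."""
    digest = hashlib.sha256(patient_pseudonym.encode("utf-8")).digest()
    raw = int.from_bytes(digest[:4], "big")
    return raw % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS


def shift_date(ts: datetime, patient_pseudonym: str) -> datetime:
    """Apply the same offset to every timestamp for one patient."""
    return ts + timedelta(days=shift_days_for(patient_pseudonym))
```

Because every event for a patient moves by the same offset, intervals between events (admission to discharge, dose to dose) are preserved exactly, which is what makes the anonymized data still usable for temporal research.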