Healthcare Data Processing Platform

HIPAA-compliant HL7 ingestion, anonymization, and multi-tenant portal

Role

End-to-end (Frontend, Backend Pipeline, Anonymization, Infrastructure)

Problem

Health Information Exchanges (HIEs) need to process and anonymize large volumes of PHI (Protected Health Information) before sharing it with researchers and partners, but manual workflows are slow, error-prone, and don't scale.

This platform enables Health Information Exchanges (HIEs) to ingest raw HL7 v2 data, clean and validate it, anonymize all PHI per HIPAA Safe Harbor standards, and deliver anonymized datasets to customer-facing portals and APIs. The system uses a two-zone architecture: an air-gapped node for PHI processing and a public-facing orchestrator for the web app, dashboards, and exports. It supports multi-tenant isolation, batch job tracking, quality reports, and multiple export formats (clean HL7, FHIR R4, OMOP CSV).

The Challenge

HIEs aggregate health data from dozens of hospitals and clinics, but each facility sends data in different formats with inconsistent quality. Raw HL7 v2 messages often contain duplicates, malformed fields, missing required data, and inconsistent coding systems. Before sharing data with researchers or payers, PHI must be removed or anonymized—a process that, if done manually, is slow and error-prone. The goal was to build an automated pipeline that could ingest, parse, clean, anonymize, and export health data at scale, while maintaining strict HIPAA compliance and multi-tenant isolation.

Technical Architecture

Frontend

  • Next.js 16 (App Router)
  • React 19
  • TypeScript
  • TailwindCSS v4
  • shadcn/ui
  • Recharts

Backend & Pipeline

  • Python 3.11
  • FastAPI
  • python-hl7 (HL7 v2 parsing)
  • fhir.resources (FHIR R4)
  • pandas, numpy

Data & Storage

  • PostgreSQL (Supabase)
  • Row-Level Security (RLS)
  • Dual-zone DBs (PHI + Anonymized)

Infrastructure

  • Docker (Backend + DB)
  • Air-gapped Module1 (PHI)
  • Orchestrator (Public Portal)
  • SCP file transfer

Integrations

  • Clerk (Auth)
  • Knock (Notifications)
  • Stripe (Billing)
  • Mapbox (Geo viz)

Key Features Built

  • HL7 v2 file/ZIP upload with deduplication by SHA-256 hash and batch job tracking
  • Fast bulk loader: Batched INSERT for 100K+ messages with sub-second writes
  • HL7 parsing: Extract demographics, encounters, observations, diagnoses, medications, allergies, procedures
  • HIPAA Safe Harbor anonymization: Date shifting (deterministic per patient), ID hashing (SHA-256 + salt), ZIP 3-digit masking, synthetic names, PHI detection/validation, audit logs
  • FHIR R4 bundle generation: Convert HL7 to FHIR resources (Patient, Encounter, Observation, etc.)
  • OMOP CSV export: Map cleaned data to OMOP Common Data Model for research
  • Multi-tenant portal: Organizations, hospitals, users, roles, permissions with RLS enforcement
  • Quality dashboards: Data completeness, validation errors, processing metrics
  • API: REST endpoints for job status, record retrieval, and export downloads
  • File watcher service: Monitors delivery folder, transfers via SCP to air-gapped node, creates batch jobs
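The upload deduplication above can be sketched as follows. This is a minimal, stdlib-only illustration: the function names and the in-memory hash set are hypothetical, and the production system presumably checks digests against the database rather than a set.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large ZIP uploads never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_duplicate(path: str, seen_hashes: set[str]) -> bool:
    """Return True if this file's content was already ingested in a prior batch."""
    digest = sha256_of_file(path)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```

Hashing content rather than filenames means the same HL7 export re-delivered under a new name is still skipped.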

Technical Challenges & Solutions

Challenge

HL7 parsing failures on malformed segments.

Solution

Built a fault-tolerant parser with fallback extraction; unparseable segments are logged for manual review.

Challenge

Slow batch inserts for 100K+ messages.

Solution

Implemented batched INSERTs with psycopg2's execute_batch, reducing ingestion time from minutes to seconds.

Challenge

Ensuring deterministic anonymization (same patient → same anonymized ID across batches).

Solution

Used HMAC-SHA256 with an org-specific salt; the patient-to-pseudonym mapping is stored only in the air-gapped PHI zone and is never synced to the public side.
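The core of deterministic pseudonymization is a keyed hash, sketched below (function and parameter names are illustrative):

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, org_salt: bytes) -> str:
    """Same patient ID + same org salt -> same pseudonym, across every batch."""
    return hmac.new(org_salt, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Using HMAC with a per-organization secret salt, rather than a plain SHA-256 of the ID, prevents dictionary attacks on known MRN formats while still letting records for the same patient link up across batches.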

Challenge

Date shifting while preserving temporal relationships.

Solution

Applied a consistent shift per patient and validated that event ordering remained intact.
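A per-patient shift can be derived from the pseudonym itself, so no extra state needs to be stored. This sketch assumes a ±365-day window and the function names shown; neither is specified in the write-up:

```python
import hashlib
from datetime import datetime, timedelta

MAX_SHIFT_DAYS = 365  # hypothetical window


def shift_days_for(patient_pseudonym: str) -> int:
    """Stable per-patient offset in [-365, 365], derived from the pseudonym."""
    digest = hashlib.sha256(patient_pseudonym.encode("utf-8")).digest()
    raw = int.from_bytes(digest[:4], "big")
    return raw % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS


def shift_date(ts: datetime, patient_pseudonym: str) -> datetime:
    """Apply the same offset to every timestamp for one patient."""
    return ts + timedelta(days=shift_days_for(patient_pseudonym))
```

Because every event for a patient moves by the same offset, intervals between events (admission to discharge, dose to dose) are preserved exactly, which is what makes the anonymized data still usable for temporal research.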