How We Linked 4.8 Billion Records Without Exposing a Single Patient

Our client needed to enable cross-organization analytics on clinical data but could not share any personally identifiable information (PII). We built a privacy-first tokenization pipeline that made it possible.

The Challenge

Meaningful pharma analytics requires linking data across organizations. Under HIPAA, however, sharing patient identifiers between organizations was a non-starter. The client needed a way to link records across organizational boundaries without ever exposing PII.

What We Built

  • Registered and configured the tokenization application, mapping PII columns and defining output schemas for secure token creation.
  • Worked directly with the tokenization vendor (Datavant) throughout platform configuration and troubleshooting.
  • Built secure SQL pipelines in Snowflake for partitioned (daily and monthly) tokenization runs with full auditability.
  • Ran comprehensive validation to confirm that each token mapped to the correct record and that all PII was excluded from the output.
  • Set up centralized token distribution tables for clients and partner platforms.
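
The partitioned runs can be illustrated with a small planner that splits a date range into daily or monthly windows, so each window can be tokenized, logged, and re-run on its own. This is a hypothetical Python sketch: `plan_partitions` and its `grain` parameter are illustrative names, and the actual pipelines were written in SQL on Snowflake, whose internal structure this case study does not describe.

```python
from datetime import date, timedelta


def plan_partitions(start: date, end: date, grain: str) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into daily or monthly windows.

    Hypothetical helper for illustration; each returned window would map to one
    independently auditable tokenization run.
    """
    if grain not in ("daily", "monthly"):
        raise ValueError(f"unsupported grain: {grain}")
    windows = []
    cursor = start
    while cursor <= end:
        if grain == "daily":
            window_end = cursor
        else:
            # Jump from the 1st of this month past its end, snap to the 1st of
            # the next month, then step back one day to get this month's last day.
            next_month = (cursor.replace(day=1) + timedelta(days=32)).replace(day=1)
            window_end = next_month - timedelta(days=1)
        window_end = min(window_end, end)  # never run past the requested range
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    for window in plan_partitions(date(2024, 1, 15), date(2024, 3, 10), "monthly"):
        print(window[0].isoformat(), window[1].isoformat())
    # 2024-01-15 2024-01-31
    # 2024-02-01 2024-02-29
    # 2024-03-01 2024-03-10
```

Keeping windows small and explicit means a failed or suspicious partition can be re-run without touching the rest, which is one common way to keep large batch jobs auditable.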

Data Integrity & Privacy

  • Maintained strict compliance: unique identifiers needed for research were preserved, and all PII required to be excluded was removed.
  • Performed systematic QA on all outputs, protecting patient privacy while enabling high-value pharma analytics.
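
A minimal sketch of the kind of per-partition QA this work involved: check that every output row carries a token, that output row counts match the source, and that no PII column survived into the output schema. The column names (`token`, `dob`, and the rest of `PII_COLUMNS`) and the `validate_partition` helper are hypothetical, and rows are plain Python dicts rather than Snowflake tables; in practice the same checks could be expressed as SQL queries against the output tables.

```python
from dataclasses import dataclass

# Hypothetical PII column names; the real column mapping was configured in the
# tokenization application and is not shown in this case study.
PII_COLUMNS = {"first_name", "last_name", "dob", "ssn", "street_address", "zip"}


@dataclass
class PartitionReport:
    partition: str
    input_rows: int
    output_rows: int
    missing_tokens: int
    leaked_pii_columns: list[str]

    @property
    def passed(self) -> bool:
        # Passes only if no rows were dropped, every row got a token,
        # and no PII column appears anywhere in the output.
        return (
            self.input_rows == self.output_rows
            and self.missing_tokens == 0
            and not self.leaked_pii_columns
        )


def validate_partition(partition: str, input_rows: int, output_rows: list[dict],
                       token_column: str = "token") -> PartitionReport:
    """Run basic QA checks on one tokenized partition (e.g. one day of records)."""
    seen_columns = set()
    missing = 0
    for row in output_rows:
        seen_columns.update(row.keys())
        if not row.get(token_column):  # None or empty string counts as missing
            missing += 1
    leaked = sorted(seen_columns & PII_COLUMNS)
    return PartitionReport(partition, input_rows, len(output_rows), missing, leaked)


if __name__ == "__main__":
    clean = [{"record_id": 1, "token": "tkn_a"}, {"record_id": 2, "token": "tkn_b"}]
    print(validate_partition("2024-01-01", 2, clean).passed)  # True

    dirty = [{"record_id": 1, "token": "tkn_a", "dob": "1980-01-01"},
             {"record_id": 2, "token": ""}]
    report = validate_partition("2024-01-02", 3, dirty)
    print(report.passed, report.leaked_pii_columns)  # False ['dob']
```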

Results

  • 4.8 billion records tokenized with full privacy compliance.
  • Reduced implementation friction through collaborative vendor engagement and rapid troubleshooting.
  • Enabled partner overlap analytics across organizational boundaries, unlocking new revenue streams for the client.
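
Once both parties' records carry comparable tokens, overlap analytics reduces to comparing token sets, with no PII involved. The sketch below assumes that identical token values on both sides refer to the same patient; how tokens are made comparable across organizations is handled by the tokenization platform and is not shown. `overlap_summary` is a hypothetical helper; in Snowflake the equivalent would typically be a join on the token column.

```python
from collections.abc import Iterable


def overlap_summary(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> dict[str, int]:
    """Count distinct patients seen by both parties, only A, or only B, using tokens alone."""
    a, b = set(tokens_a), set(tokens_b)  # de-duplicate repeated records per patient
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


if __name__ == "__main__":
    client_tokens = ["t1", "t2", "t3", "t3"]
    partner_tokens = ["t2", "t3", "t4"]
    print(overlap_summary(client_tokens, partner_tokens))
    # {'shared': 2, 'only_a': 1, 'only_b': 1}
```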

Tech: Datavant, SQL, Snowflake, Data Portal Integration