Pharma Pipeline Orchestration with Airflow & MWAA

Developed and maintained scalable, automated pipelines for a pharmaceutical analytics platform, using Apache Airflow on Amazon MWAA to orchestrate secure daily and historical data deliveries from de-identified master tables. The result was reliable, privacy-compliant analytics for client overlap studies.

Project Overview

  • Developed modular Python DAGs (Directed Acyclic Graphs) to standardize ETL tasks for all clients.
  • Used YAML-based pipeline configuration so that pipelines could be modified, scaled, and extended to new clients and datasets with minimal effort.
  • Built scheduled jobs (daily/historical) that joined de-identified claims and tokenized tables to produce analytic datasets for downstream customer research.
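
As a rough illustration of the configuration-driven approach described above, the sketch below expands a per-client configuration into one delivery spec per client and dataset. It is a minimal sketch under stated assumptions, not the project's actual code: the configuration keys (`defaults`, `clients`, `datasets`, `schedule`, `mode`), the `DeliverySpec` class, and the `dag_id` naming scheme are all hypothetical, and the dict stands in for whatever a YAML parser would return from the real configuration file.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical configuration, shaped like a parsed YAML file.
# None of these keys or client names come from the original project.
EXAMPLE_CONFIG = {
    "defaults": {"schedule": "@daily", "mode": "daily"},
    "clients": {
        "client_a": {"datasets": ["claims_overlap"]},
        "client_b": {
            "schedule": "0 6 * * 1",
            "mode": "historical",
            "datasets": ["claims_overlap", "rx_history"],
        },
    },
}

VALID_MODES = ("daily", "historical")


@dataclass(frozen=True)
class DeliverySpec:
    """One scheduled delivery; a DAG factory could build one DAG per spec."""

    dag_id: str
    client: str
    dataset: str
    schedule: str
    mode: str


def expand_config(config: dict) -> list[DeliverySpec]:
    """Merge defaults with per-client overrides and emit one spec per dataset."""
    defaults = config.get("defaults", {})
    specs = []
    # Sort clients so the generated specs have a stable order.
    for client, overrides in sorted(config.get("clients", {}).items()):
        settings = {**defaults, **overrides}
        mode = settings.get("mode", "daily")
        if mode not in VALID_MODES:
            raise ValueError(f"{client}: unknown mode {mode!r}")
        for dataset in settings.get("datasets", []):
            specs.append(
                DeliverySpec(
                    dag_id=f"{client}__{dataset}__{mode}",
                    client=client,
                    dataset=dataset,
                    schedule=settings.get("schedule", "@daily"),
                    mode=mode,
                )
            )
    return specs


if __name__ == "__main__":
    for spec in expand_config(EXAMPLE_CONFIG):
        print(spec.dag_id, spec.schedule)
```

With a layout like this, onboarding a client amounts to adding an entry under `clients`; the Python that builds the DAGs does not need to change.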

Infrastructure & Security

  • Secure Cross-Account Delivery: Implemented flexible authentication patterns to meet diverse client security requirements:
    • AssumeRole Pattern: Leveraged AWS STS for clients providing specific ARNs, ensuring temporary, least-privilege access.
    • Secrets Manager Integration: Securely stored and rotated client-provided access keys/secrets, accessed dynamically via Airflow connections.
    • Scoped IAM Policies: Created dedicated IAM users for specific clients, restricted strictly to sub-folder paths within isolated S3 buckets to prevent data leakage.
  • Automated Event Scheduling: Used Amazon EventBridge to trigger AWS Lambda functions for reliable, unattended data shipments, removing manual intervention.
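
Two of the access patterns above can be sketched in a few lines. The first function builds an IAM policy document that confines a principal to one sub-folder of a bucket; the second prepares the arguments for an STS `assume_role` call and, if `boto3` is available, exchanges them for a temporary S3 client. This is a sketch under assumptions rather than the project's implementation: the function names, the `delivery-` session-name prefix, the choice of allowed S3 actions, and the one-hour session duration are all hypothetical.

```python
import json
import re


def build_prefix_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy limited to objects under bucket/prefix/."""
    prefix = prefix.strip("/")
    if not prefix:
        raise ValueError("prefix must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, but only for this prefix.
                "Sid": "ListOnlyClientPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Assumed actions; a read-only client would drop PutObject.
                "Sid": "ReadWriteClientObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


def assume_role_kwargs(role_arn: str, client: str, duration_seconds: int = 3600) -> dict:
    """Build keyword arguments for the STS assume_role call."""
    # Session names allow only word characters and "+=,.@-", up to 64 chars.
    session_name = re.sub(r"[^\w+=,.@-]", "-", f"delivery-{client}")[:64]
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }


def s3_client_for_role(role_arn: str, client: str):
    """Assume the client's role and return an S3 client using temporary credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    response = boto3.client("sts").assume_role(**assume_role_kwargs(role_arn, client))
    creds = response["Credentials"]
    return boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


if __name__ == "__main__":
    # Placeholder bucket and prefix names, for illustration only.
    print(json.dumps(build_prefix_policy("example-delivery-bucket", "client_a"), indent=2))
```

The temporary credentials returned by STS expire on their own, which is what makes the AssumeRole pattern attractive compared with long-lived access keys.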

Results & Value Delivered

  • Enabled daily, automated, and secure data transfer for multiple pharma clients.
  • Simplified onboarding for new engagements: no code changes needed for new schedules or data splits.
  • Reduced manual labor and improved compliance and auditability for both teams.

Tech Stack: Apache Airflow (MWAA), Python, YAML, AWS Lambda, AWS S3, EventBridge, IAM Policies