Pharma Pipeline Orchestration with Airflow & MWAA
Developed and maintained scalable, automated pipelines for a pharmaceutical analytics platform, using Apache Airflow on Amazon MWAA to orchestrate secure daily and historical data deliveries from de-identified master tables. These deliveries support reliable, privacy-compliant analytics for client overlap studies.
Project Overview
- Developed modular Python DAGs (Directed Acyclic Graphs) to standardize ETL tasks for all clients.
- Used YAML-based pipeline configuration so that pipelines are easy to modify and scale, and so that new clients and datasets can be onboarded quickly.
- Built scheduled jobs (daily/historical) that joined de-identified claims and tokenized tables to produce analytic datasets for downstream customer research.
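The configuration-driven approach described above might look roughly like the following sketch. The YAML layout, the field names (`client`, `mode`, `schedule`, `tables`), the client and table names, and the `PipelineSpec` class are all hypothetical; the project's actual schema is not shown in this case study. The dictionary stands in for what a YAML parser such as `yaml.safe_load` would return, so the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical config, written as the dict a YAML parser would return for:
#
#   pipelines:
#     - client: acme_pharma
#       mode: daily
#       schedule: "0 6 * * *"
#       tables: [claims_deid, patient_tokens]
#     - client: acme_pharma
#       mode: historical
#       tables: [claims_deid, patient_tokens]
RAW_CONFIG = {
    "pipelines": [
        {"client": "acme_pharma", "mode": "daily", "schedule": "0 6 * * *",
         "tables": ["claims_deid", "patient_tokens"]},
        {"client": "acme_pharma", "mode": "historical",
         "tables": ["claims_deid", "patient_tokens"]},
    ]
}

ALLOWED_MODES = {"daily", "historical"}


@dataclass(frozen=True)
class PipelineSpec:
    """One validated pipeline entry (illustrative, not the real schema)."""
    client: str
    mode: str
    schedule: Optional[str]   # cron expression; None means manually triggered
    tables: Tuple[str, ...]

    @property
    def dag_id(self) -> str:
        # A stable, unique identifier derived from the config entry.
        return f"{self.client}__{self.mode}"


def load_specs(raw: dict) -> List[PipelineSpec]:
    """Validate raw config entries and convert them to PipelineSpec objects."""
    specs = []
    for entry in raw.get("pipelines", []):
        mode = entry["mode"]
        if mode not in ALLOWED_MODES:
            raise ValueError(f"unknown mode {mode!r} for client {entry['client']!r}")
        if mode == "daily" and not entry.get("schedule"):
            raise ValueError(f"daily pipeline for {entry['client']!r} needs a schedule")
        specs.append(PipelineSpec(
            client=entry["client"],
            mode=mode,
            schedule=entry.get("schedule"),
            tables=tuple(entry["tables"]),
        ))
    return specs


if __name__ == "__main__":
    for spec in load_specs(RAW_CONFIG):
        print(spec.dag_id, spec.schedule, spec.tables)
```

In an Airflow deployment, a factory function would typically turn each validated spec into one DAG, so onboarding a client comes down to adding a config entry rather than writing a new DAG file.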
Infrastructure & Security
- Secure Cross-Account Delivery: Implemented flexible authentication patterns to meet diverse client security requirements:
  - AssumeRole Pattern: Leveraged AWS STS for clients providing specific ARNs, ensuring temporary, least-privilege access.
  - Secrets Manager Integration: Securely stored and rotated client-provided access keys/secrets, accessed dynamically via Airflow connections.
  - Scoped IAM Policies: Created dedicated IAM users for specific clients, restricted strictly to sub-folder paths within isolated S3 buckets to prevent data leakage.
- Automated Event Scheduling: Used Amazon EventBridge to trigger Lambda functions for reliable, unattended data shipments, removing the need for manual intervention.
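Two of the access patterns listed above, the AssumeRole pattern and prefix-scoped IAM policies, can be illustrated with a minimal sketch. It is not the project's actual code: the function names, bucket name, prefix, and the choice of `s3:GetObject` and `s3:PutObject` are assumptions made for illustration. The STS client is passed in as a parameter (in practice it would be `boto3.client("sts")`), so the functions themselves depend only on the standard library.

```python
import json


def prefix_scoped_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document limited to one prefix of one bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only for keys under the client's prefix.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads and writes are allowed only under that prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


def temporary_credentials(sts_client, role_arn: str, session_name: str,
                          duration_seconds: int = 900) -> dict:
    """Exchange a client-provided role ARN for short-lived credentials.

    Returns keyword arguments that can be passed directly to boto3.client().
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,  # 900 seconds is the STS minimum
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = prefix_scoped_policy("example-delivery-bucket", "clients/acme/")
    print(json.dumps(policy, indent=2))

    # With boto3 installed and AWS credentials configured, usage might be:
    #   import boto3
    #   creds = temporary_credentials(boto3.client("sts"),
    #                                 "<client-provided role ARN>",
    #                                 "daily-delivery")
    #   s3 = boto3.client("s3", **creds)
```

Keeping the session short and the policy tied to a single prefix means a leaked credential exposes at most one client's folder for a limited time.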
Results & Value Delivered
- Enabled daily, automated, and secure data transfer for multiple pharma clients.
- Simplified onboarding for new engagements: no code changes are needed for new schedules or data splits.
- Reduced manual labor and improved compliance and auditability for both teams.
Tech Stack: Apache Airflow (MWAA), Python, YAML, AWS Lambda, AWS S3, EventBridge, IAM Policies