Running Apache Airflow Across Environments
Apache Airflow has become a de facto standard for orchestrating data workflows. However, depending on the environment, the way Airflow runs can change significantly. Many teams get confused when moving between managed cloud services, local setups, and containerized deployments. This post provides a clear comparison of how Airflow operates in different contexts:
1. Airflow on MWAA (Managed Workflows for Apache Airflow – AWS)
- Infrastructure: AWS manages the underlying resources (ECS/Fargate, S3, CloudWatch, Secrets Manager).
- User responsibility: Upload DAGs, configure connections, manage IAM roles.
- Advantages: Fully managed, secure integration with AWS ecosystem.
- Limitations: Less flexibility, dependency on AWS services, higher cost for scaling.
2. Airflow on Google Cloud Composer (GCP)
- Infrastructure: Based on GKE (Google Kubernetes Engine) and integrated with Google Cloud services (GCS, BigQuery, Pub/Sub).
- User responsibility: DAG development, connections, quotas.
- Advantages: Native GCP integration, managed scaling.
- Limitations: Cost and vendor lock-in, limited customization compared to self-managed setups.
3. Airflow on Docker Compose
- Infrastructure: Single-node deployment defined in a docker-compose.yml file.
- User responsibility: Run locally or on a VM, manage scaling manually.
- Advantages: Quick setup, great for testing and development.
- Limitations: Not production-ready for large workloads, single-node bottlenecks.
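To make the single-node idea concrete, here is a trimmed-down sketch of what such a docker-compose.yml can look like. This is not the official Airflow compose file (which is considerably more complete); the image tag, credentials, and volume paths are illustrative placeholders.

```yaml
# Minimal single-node sketch, NOT the official Airflow compose file.
# Image tag, credentials, and paths are placeholders.
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  airflow:
    image: apache/airflow:2.9.2        # placeholder version
    depends_on:
      - postgres
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags       # DAGs mounted from the host
    ports:
      - "8080:8080"
    command: airflow standalone        # webserver + scheduler in one process
```

Everything (scheduler, webserver, metadata DB) lives on one host, which is exactly why this setup is convenient for development and a bottleneck in production.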
4. Airflow on Kubernetes
- Infrastructure: Deployed on Kubernetes clusters, often with the official Helm chart.
- User responsibility: Manage cluster, scaling, observability, and costs.
- Advantages: High scalability, flexible architecture, cloud-agnostic.
- Limitations: Requires Kubernetes expertise, operational overhead.
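With the official Helm chart, most of the architecture is expressed through a values.yaml overlay. The sketch below follows the chart's documented value layout, but the Git repository URL is a placeholder and the full schema should be checked against the chart documentation.

```yaml
# Sketch of a values.yaml overlay for the official apache-airflow Helm chart.
# The DAG repo URL is a placeholder.
executor: "KubernetesExecutor"   # one ephemeral worker pod per task
dags:
  gitSync:
    enabled: true                # pull DAGs from Git instead of baking images
    repo: https://github.com/example-org/airflow-dags.git   # placeholder
    branch: main
    subPath: dags
webserver:
  replicas: 1
```

A typical rollout applies this with something like helm upgrade --install airflow apache-airflow/airflow -f values.yaml, which is also where the operational overhead shows up: you now own upgrades, resource limits, and monitoring of every component.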
5. Airflow with Astro CLI
- Infrastructure: Runs locally with Docker Compose under the hood.
- User responsibility: DAG development, packaging, and deployment with Astronomer.
- Advantages: Developer-friendly, abstracts away Compose details, integrates with Astronomer Cloud.
- Limitations: Tied to the Astronomer ecosystem once you move beyond local development.
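For local development, the Astro CLI can seed connections and variables into the Compose environment it manages via an airflow_settings.yaml file in the project. A sketch is shown below; all connection details are placeholders, and the exact schema should be confirmed against the Astro CLI docs.

```yaml
# Sketch of airflow_settings.yaml for astro dev; all values are placeholders.
airflow:
  connections:
    - conn_id: my_postgres          # hypothetical connection
      conn_type: postgres
      conn_host: host.docker.internal
      conn_port: 5432
  variables:
    - variable_name: env
      variable_value: dev
```

This is one of the "abstracts away Compose details" conveniences: the same project folder runs locally with astro dev start and can later be deployed to Astronomer Cloud.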
Summary Comparison
| Environment | Infrastructure | Who Manages Infra? | Best For |
|---|---|---|---|
| MWAA (AWS) | ECS/Fargate, S3, AWS services | AWS | Production on AWS |
| Cloud Composer (GCP) | GKE + GCP services | Google | Production on GCP |
| Docker Compose | Local Docker | You | Development / Testing |
| Kubernetes | K8s clusters | You / DevOps | Large-scale production |
| Astro CLI | Docker Compose (wrapped) | You (locally) | Local dev with Astronomer |
Key Takeaways
- Managed services (MWAA, Cloud Composer) are best when you want less operational overhead and deep cloud integration.
- Docker Compose and Astro CLI are excellent for local development and testing.
- Kubernetes provides the most flexibility and scalability, but requires advanced operational maturity.