
Running Apache Airflow Across Environments

Apache Airflow has become a de facto standard for orchestrating data workflows. However, the way Airflow runs can change significantly depending on the environment, and many teams get confused when moving between managed cloud services, local setups, and containerized deployments. This post compares how Airflow operates in five common contexts:
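
Whatever the environment, the DAG code itself stays plain Python; what changes is the infrastructure around it. As a minimal sketch (assuming Airflow 2.x; on versions before 2.4 use schedule_interval instead of schedule), the following DAG would run unchanged in every setup below:

```python
# Minimal DAG used as a running example: the Python definition does not change
# across MWAA, Cloud Composer, Docker Compose, Kubernetes, or Astro CLI.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_environments",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow'",
    )
```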


1. Airflow on MWAA (Managed Workflows for Apache Airflow – AWS)

  • Infrastructure: AWS manages the underlying resources (ECS/Fargate, S3, CloudWatch, Secrets Manager).
  • User responsibility: Upload DAGs, configure connections, manage IAM roles.
  • Advantages: Fully managed, secure integration with AWS ecosystem.
  • Limitations: Less flexibility, dependency on AWS services, higher cost for scaling.
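
In practice, "upload DAGs" on MWAA means copying them into the S3 bucket attached to the environment, which syncs files from its dags/ prefix. A minimal sketch with boto3 (the bucket name is a placeholder, and the caller is assumed to have the required S3 permissions):

```python
# Sketch: publish a DAG to an MWAA environment by uploading it to the
# environment's S3 bucket; MWAA picks up files under the dags/ prefix.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="dags/hello_environments.py",  # local DAG file
    Bucket="my-mwaa-dags-bucket",           # placeholder: your MWAA bucket
    Key="dags/hello_environments.py",       # key under the dags/ prefix
)
```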

2. Airflow on Google Cloud Composer (GCP)

  • Infrastructure: Based on GKE (Google Kubernetes Engine) and integrated with Google Cloud services (GCS, BigQuery, Pub/Sub).
  • User responsibility: DAG development, connections, quotas.
  • Advantages: Native GCP integration, managed scaling.
  • Limitations: Cost and vendor lock-in, limited customization compared to self-managed setups.
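
The native GCP integration comes largely from the Google provider package, which Composer environments include by default. A sketch of a DAG running a BigQuery query (project, credentials, and connection defaults are assumed to be configured on the environment; the query is a placeholder):

```python
# Sketch: a DAG using the Google provider's BigQuery operator, the kind of
# native GCP integration Cloud Composer ships with out of the box.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="composer_bigquery_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    BigQueryInsertJobOperator(
        task_id="daily_rollup",
        configuration={
            "query": {
                "query": "SELECT CURRENT_DATE() AS run_date",  # placeholder query
                "useLegacySql": False,
            }
        },
    )
```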

3. Airflow on Docker Compose

  • Infrastructure: Single-node deployment using docker-compose.yml.
  • User responsibility: Run locally or on a VM, manage scaling manually.
  • Advantages: Quick setup, great for testing and development.
  • Limitations: Not production-ready for large workloads, single-node bottlenecks.
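
Since a Compose setup is mostly a development sandbox, it pairs well with a quick DAG-integrity check against the local dags/ folder. A minimal sketch using Airflow's DagBag (the folder path is an assumption about your project layout):

```python
# Sketch: verify that every DAG in the local dags/ folder imports cleanly.
# Handy as a local or CI check alongside a Docker Compose dev setup.
from airflow.models import DagBag

dag_bag = DagBag(dag_folder="dags/", include_examples=False)

if dag_bag.import_errors:
    for path, error in dag_bag.import_errors.items():
        print(f"Import error in {path}: {error}")
    raise SystemExit(1)

print(f"Loaded {len(dag_bag.dags)} DAGs with no import errors.")
```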

4. Airflow on Kubernetes

  • Infrastructure: Deployed on Kubernetes clusters, often with the official Helm chart.
  • User responsibility: Manage cluster, scaling, observability, and costs.
  • Advantages: High scalability, flexible architecture, cloud-agnostic.
  • Limitations: Requires Kubernetes expertise, operational overhead.
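
Much of the flexibility comes from running each task in its own pod. A sketch using the CNCF Kubernetes provider (the namespace and image are placeholders, and the import path differs slightly between provider versions, as noted in the comment):

```python
# Sketch: run a task in its own Kubernetes pod via the CNCF provider.
# Recent provider versions expose the operator under operators.pod; older
# releases use airflow.providers.cncf.kubernetes.operators.kubernetes_pod.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="k8s_pod_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    KubernetesPodOperator(
        task_id="transform",
        name="transform-pod",
        namespace="airflow",              # placeholder namespace
        image="python:3.11-slim",         # placeholder image
        cmds=["python", "-c"],
        arguments=["print('hello from a pod')"],
        get_logs=True,
    )
```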

5. Airflow with Astro CLI

  • Infrastructure: Runs locally with Docker Compose under the hood.
  • User responsibility: DAG development, packaging, and deployment with Astronomer.
  • Advantages: Developer-friendly, abstracts away Compose details, integrates with Astronomer Cloud.
  • Limitations: Tied to the Astronomer ecosystem once you move beyond local development.

Summary Comparison

Environment          | Infrastructure                | Who Manages Infra? | Best For
MWAA (AWS)           | ECS/Fargate, S3, AWS services | AWS                | Production on AWS
Cloud Composer (GCP) | GKE + GCP services            | Google             | Production on GCP
Docker Compose       | Local Docker                  | You                | Development / Testing
Kubernetes           | K8s clusters                  | You / DevOps       | Large-scale production
Astro CLI            | Docker Compose (wrapped)      | You (locally)      | Local dev with Astronomer

Key Takeaways

  • Managed services (MWAA, Cloud Composer) are best when you want less operational overhead and deep cloud integration.
  • Docker Compose and Astro CLI are excellent for local development and testing.
  • Kubernetes provides the most flexibility and scalability, but requires advanced operational maturity.