Multi-cloud migration: 10 years on AWS, Azure, GCP

A practitioner's playbook for migrating on-prem workloads to the cloud — when to choose AWS vs Azure vs GCP, the migration sequence, security, and cost lessons.

Over the last decade I've moved hundreds of workloads from on-premises data centers to the cloud: sometimes AWS, sometimes Azure, sometimes GCP, often all three for the same company. Multi-cloud isn't a strategy you choose; it's a state you arrive at, usually because clients have preferences and procurement doesn't care about your architecture diagram.

This is what I've actually learned about doing it well.

When to choose which cloud

Forget the marketing. Here's the practitioner's filter:

AWS — default for greenfield. Deepest service catalog, most mature IAM, best-in-class networking primitives (VPC, Transit Gateway). The downside: it's the most expensive at list price and the easiest to over-architect. Choose AWS when you want maximum service breadth and you have the engineering talent to make non-trivial choices.

Azure — default if the company already runs Microsoft (Active Directory, Exchange, Office 365, Dynamics, Windows Server). The identity story alone is worth it: Azure AD (now Entra) connects every Microsoft tool seamlessly. The downside: portal UX is uneven and CLI tooling lags AWS. Choose Azure when Microsoft is already in your stack — the integration leverage is real.

GCP — best for data + ML workloads (BigQuery is genuinely better than Redshift or Synapse), best Kubernetes (GKE invented many of the patterns), and often cheapest on networking egress. The downside: thinner enterprise sales and support, smaller third-party tooling ecosystem. Choose GCP when data is the workload, not when compute is.

At Virtual Employee we ended up running all three because clients demanded each. The lesson: don't pick a winning cloud; pick a winning operating model that works across all three.

The migration sequence that actually works

I've seen this exact sequence work across dozens of migrations. Don't deviate without reason.

Phase 1 — Discovery (weeks 1–4)

- Inventory every workload. What runs where, with what dependencies, on what hardware, with what SLA. Use a tool (CloudEndure, Carbonite Migrate, or Azure Migrate's discovery agent) or do it manually for under 50 servers.
- Tag workloads by migration archetype:
  - Retire: workloads nobody uses
  - Retain: workloads that must stay on-prem (compliance, latency, hardware)
  - Rehost (lift-and-shift): straight VM-to-cloud-VM
  - Replatform: small changes (e.g., MySQL on EC2 → RDS)
  - Refactor: meaningful rewrite (monolith → microservices, or → managed PaaS)
  - Repurchase: replace with SaaS (Exchange → Microsoft 365)
- Discover the surprise dependencies. Every migration has them: a database your CRM secretly hits, a file share you didn't know existed, a scheduled task on a desktop somewhere.
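
The archetype tagging above can be roughed out in code once the inventory exists. This is a minimal Python sketch, not a standard: the `Workload` fields and the order of the rules are my assumptions for illustration, and real tagging always needs a human pass afterwards.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Archetype(Enum):
    RETIRE = "retire"
    RETAIN = "retain"
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REFACTOR = "refactor"
    REPURCHASE = "repurchase"


@dataclass
class Workload:
    # Hypothetical inventory fields; adapt to whatever your discovery export contains.
    name: str
    monthly_active_users: int
    must_stay_on_prem: bool = False        # compliance, latency, or hardware
    saas_equivalent: Optional[str] = None  # e.g. "Microsoft 365" for Exchange
    needs_rewrite: bool = False
    managed_service_available: bool = False


def classify(w: Workload) -> Archetype:
    """Return the first archetype whose rule matches (rule order is an assumption)."""
    if w.monthly_active_users == 0:
        return Archetype.RETIRE
    if w.must_stay_on_prem:
        return Archetype.RETAIN
    if w.saas_equivalent:
        return Archetype.REPURCHASE
    if w.needs_rewrite:
        return Archetype.REFACTOR
    if w.managed_service_available:
        return Archetype.REPLATFORM
    return Archetype.REHOST  # default: straight lift-and-shift
```

Rehost is the fallback on purpose: anything the rules can't place gets the least invasive treatment until someone looks at it.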

Phase 2 — Foundation (weeks 4–8)

Don't migrate a single workload until the foundation is right.

Account / subscription structure. Multi-account on AWS (one per environment, or per business unit); Management Groups + Subscriptions on Azure; folders + projects on GCP. Don't start in a single account; the cost of restructuring later is brutal.
Networking. VPC / VNet design with public, private, and isolated subnets per environment. Transit Gateway / vNet peering / VPC peering for inter-account traffic. Don't allow direct internet egress from private subnets — route through NAT and inspect.
Identity. SSO from your existing AD into the cloud (AWS IAM Identity Center / Azure AD-to-AWS / Azure AD as primary on Azure). Never create long-lived IAM users; use roles + SSO + temporary credentials.
Logging & monitoring. CloudTrail / Activity Log on from day 0. Send everything to a central log archive account. Set up alerting on root-account usage immediately.
Cost guardrails. Budgets, anomaly detection (AWS Cost Anomaly Detection, Azure Cost Management). Tagging policy — every resource tagged with owner, environment, cost center.
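
The tagging policy is the easiest guardrail to check automatically. Here is a minimal Python sketch, assuming resources have already been exported as dictionaries with an `id` and a `tags` map; that record shape and the exact tag keys are hypothetical, not any cloud's API format.

```python
# Required keys from the tagging policy: owner, environment, cost center.
# The exact spelling ("cost-center") is an assumption; use your own convention.
REQUIRED_TAGS = ("owner", "environment", "cost-center")


def missing_tags(tags: dict) -> list:
    """Return required tag keys that are absent or empty (keys compared case-insensitively)."""
    present = {
        key.strip().lower()
        for key, value in tags.items()
        if value and value.strip()
    }
    return [tag for tag in REQUIRED_TAGS if tag not in present]


def untagged_resources(resources: list) -> dict:
    """Map resource id -> missing tag keys, for non-compliant resources only."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource.get("tags", {}))
        if gaps:
            report[resource["id"]] = gaps
    return report
```

Run against a daily export, the report doubles as a to-do list for whoever owns the cost-center mapping.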

Phase 3 — Pilot (weeks 8–12)

Pick the second-most-important workload for the pilot. Not the most important (too risky), not the least important (no lessons). The second-most teaches you the patterns without breaking the business.

For a typical SMB the pilot is the corporate website + CMS. For a B2B platform, it's the dev/staging environment of the main app.

Run the pilot end-to-end including:

- Production-equivalent traffic test
- Failover test
- Rollback test (yes, test the rollback; you'll likely need it)
- Cost review at one week of operation

Phase 4 — Production migration (months 3–12)

Now batch the rest. Migration waves of 10–30 workloads per wave. Each wave:

- Pre-migration: full snapshot/backup, communication to stakeholders, change window scheduled
- Migration: data sync (CloudEndure, AWS DMS, Azure Migrate), application cutover, DNS flip
- Post-migration: smoke test, monitoring, sign-off
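
Batching into waves can be sketched with the standard library. This Python sketch assumes you want each workload's dependencies to land in the same wave or an earlier one; that ordering rule is my assumption, as is the shape of the dependency map. It uses `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on circular dependencies.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict, max_per_wave: int = 30) -> list:
    """Split workloads into ordered waves of at most max_per_wave items.

    dependencies maps each workload name to the set of workloads it depends on.
    Dependencies always appear in the same wave or an earlier one.
    """
    if max_per_wave < 1:
        raise ValueError("max_per_wave must be at least 1")
    # static_order() yields every node after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    return [order[i:i + max_per_wave] for i in range(0, len(order), max_per_wave)]
```

A circular dependency surfacing here is useful information: those workloads probably need to move together in one change window.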

Phase 5 — Optimization (ongoing)

Most teams stop at "it works in cloud." That's where the real waste begins. Optimization is continuous:

- Reserved Instances / Savings Plans / Reserved VMs: 30–70% cost reduction on steady workloads
- Spot / Preemptible instances: for batch and non-critical workloads
- Right-sizing: most lift-and-shift workloads are 2–3× over-provisioned
- Storage tiering: S3 lifecycle policies, Azure Cool/Archive storage
- Egress optimization: Cloudflare or CloudFront in front of S3/Blob to reduce egress costs
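
Why "steady" matters for commitments can be shown with a little arithmetic. The Python sketch below uses a deliberately simplified model that I am assuming for illustration: a commitment bills every hour of the month at the discounted rate, on-demand bills only the hours actually used, and a month is 730 hours. Upfront payments and partial commitments are ignored.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12


def breakeven_utilization(discount: float) -> float:
    """Fraction of the month a workload must run before a commitment beats on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def on_demand_cost(hourly_rate: float, utilization: float) -> float:
    """On-demand bills only for the hours actually used."""
    return hourly_rate * utilization * HOURS_PER_MONTH


def committed_cost(hourly_rate: float, discount: float) -> float:
    """A commitment bills every hour of the month at the discounted rate."""
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH
```

Under this model a 40% discount only pays off for workloads running more than 60% of the time, which is why commitments belong on steady workloads and schedules or Spot belong on the rest.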

Hosting and CDN architecture across clouds

For B2B web properties, my default pattern across all three clouds:

Users → Cloudflare (CDN + WAF + DDoS) → Cloud Load Balancer
   → Application (EC2 / Azure VM / GKE) → RDS / Azure SQL / Cloud SQL
                                       → Cache layer (ElastiCache / Azure Cache / Memorystore)
                                       → Object storage (S3 / Blob / GCS)

Why Cloudflare in front of all three? Cloudflare is the unifying layer: the same CDN, the same WAF, and the same DDoS protection regardless of origin cloud. Your security and CDN teams operate one tool, not three. The cloud-native CDNs (CloudFront, Azure Front Door, Cloud CDN) are fine, but they're per-cloud, which fragments your operations.

The only exception: serverless / edge use cases where CloudFront + Lambda@Edge or Azure Front Door + Functions gives you tight cloud-native integration that's worth the lock-in.

Identity across multi-cloud — the actual hard part

The biggest operational pain in multi-cloud is identity. Here's the pattern that works:

One identity provider for humans. Azure AD / Entra is what I usually pick because most enterprises already have it. Okta is also fine.
Federate that IdP into every cloud: AWS IAM Identity Center accepts Entra as an external identity provider; GCP Workforce Identity Federation accepts Entra for human users; Azure is native.
No long-lived cloud IAM users. Ever. All human access through SSO; all programmatic access through workload identity federation or short-lived tokens.
Per-cloud role mapping. Define roles in your IDP (e.g., aws-prod-admin, azure-prod-readonly) and map them to cloud-native roles.
MFA everywhere. Conditional access policies in Entra; MFA enforcement in AWS IAM Identity Center; GCP context-aware access.
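
The per-cloud role mapping is mostly a naming convention plus a lookup table. Here is a minimal Python sketch, assuming IdP group names follow `<cloud>-<environment>-<role>` as in the examples above. The table is illustrative only: it uses broad built-in roles that exist on each platform, and real mappings would usually point at narrower custom roles.

```python
# (cloud, role) -> cloud-native role. Illustrative mapping, not a recommendation.
ROLE_MAP = {
    ("aws", "admin"): "AdministratorAccess",   # AWS managed policy
    ("aws", "readonly"): "ReadOnlyAccess",     # AWS managed policy
    ("azure", "admin"): "Owner",               # Azure built-in role
    ("azure", "readonly"): "Reader",           # Azure built-in role
    ("gcp", "admin"): "roles/owner",           # GCP basic role
    ("gcp", "readonly"): "roles/viewer",       # GCP basic role
}


def resolve(group: str) -> tuple:
    """Turn an IdP group such as 'aws-prod-admin' into (cloud, environment, native_role)."""
    parts = group.lower().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected <cloud>-<environment>-<role>, got {group!r}")
    cloud, environment, role = parts
    try:
        native_role = ROLE_MAP[(cloud, role)]
    except KeyError:
        raise ValueError(f"no mapping for {cloud!r} role {role!r}") from None
    return cloud, environment, native_role
```

Keeping the table in version control means an access review is a diff, not an archaeology project across three consoles.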

This identity architecture is the single highest-ROI investment in multi-cloud security. It removes one of the most common causes of cloud breaches: leaked long-lived credentials.

Security across multi-cloud

Three things to standardize across all three clouds:

Encryption at rest — KMS / Key Vault / Cloud KMS, with key rotation policies. Customer-managed keys for production.
Network segmentation — public/private/isolated subnet pattern repeated identically.
Logging into a central SIEM — Splunk, Sentinel, Sumo Logic, or self-hosted. Cloud-native (CloudWatch / Log Analytics / Cloud Logging) is fine for cloud-internal use, but for cross-cloud correlation you need a central SIEM.
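
Repeating the public/private/isolated pattern identically is easier when the subnet plan is generated rather than typed. This is a minimal Python sketch using the standard `ipaddress` module. It assumes one subnet per tier carved from a per-environment CIDR; real designs repeat this per availability zone, and the /24 size is just an example.

```python
import ipaddress
from itertools import islice

TIERS = ("public", "private", "isolated")


def carve(env_cidr: str, new_prefix: int = 24) -> dict:
    """Return one subnet CIDR per tier, taken in order from the environment's range."""
    network = ipaddress.ip_network(env_cidr)
    # Take only as many subnets as there are tiers; the generator is lazy.
    subnets = list(islice(network.subnets(new_prefix=new_prefix), len(TIERS)))
    if len(subnets) < len(TIERS):
        raise ValueError(f"{env_cidr} is too small for {len(TIERS)} /{new_prefix} subnets")
    return dict(zip(TIERS, map(str, subnets)))
```

Feeding the same function a different CIDR per cloud and environment gives every VPC and VNet the same layout, which is the point of standardizing.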

Cost lessons learned the hard way

After a decade of cloud spend, the patterns that matter:

List price is fictional. You should be paying 30–50% less than list for steady workloads via RIs/SPs/Reserved VMs.
Egress is the silent killer. S3-to-internet is $0.05–$0.09/GB. At 10 TB/month that's $500–$900. Multiply by ten and it's the difference between profitable and not.
Unattached resources. Every cloud account has detached EBS volumes, unused Elastic IPs, idle load balancers. Run a weekly garbage collector.
Dev/staging on schedules. Dev environments should auto-shut down nights and weekends. Easy 50% saving.
Watch the bill weekly, not monthly. An anomaly caught on day 7 has cost you roughly a quarter of what it would have by day 30.
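
The egress and dev-schedule numbers above are simple enough to check in a few lines. This Python sketch assumes a flat per-GB price (real egress pricing is tiered), 1 TB = 1,000 GB, and a dev environment that runs 12 hours on weekdays only; all three are my simplifications.

```python
HOURS_PER_WEEK = 168


def egress_cost(tb_per_month: float, price_per_gb: float, gb_per_tb: int = 1000) -> float:
    """Monthly egress bill at a flat per-GB price."""
    return tb_per_month * gb_per_tb * price_per_gb


def schedule_saving(hours_per_weekday: float = 12, weekdays: int = 5) -> float:
    """Fraction of always-on compute hours avoided by running only on a schedule."""
    hours_on = hours_per_weekday * weekdays
    return 1 - hours_on / HOURS_PER_WEEK
```

With those defaults, 10 TB/month at $0.05 and $0.09 per GB reproduces the $500 to $900 range, and the schedule avoids about 64% of compute hours, so "easy 50%" is a conservative figure once storage and other always-on charges are counted.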

What I'd avoid

Lift-and-shift the entire data center. It rarely captures cloud's real value. Always combine lift-and-shift with at least 20% replatform/refactor work.
Greenfield Kubernetes for everything. K8s is the right answer for ~30% of workloads. Don't force the other 70%.
Multi-cloud for "redundancy." Most workloads don't need it; the operational overhead of true active-active multi-cloud is brutal. Single-cloud with multi-region usually solves the same problem cheaper.
Choosing the cloud before the workload. Match cloud to workload, not workload to cloud.

Use cases — what multi-cloud actually solved

Use case 1 — A client's mandatory Azure migration

A long-term UK financial-services client moved their entire stack from on-prem to Azure in 2018 and required all our agents working on their account to access their environment from Azure-native client VMs. We:

- Built a parallel Azure subscription specifically for this client's delivery
- Deployed jump-host VMs in the client's region (UK South) with conditional-access policies
- Federated identity from our Active Directory to their Azure AD tenant via ADFS
- Configured Azure Bastion for agent access (no exposed RDP)
- Wrote runbooks for our IT helpdesk on the new flow

The migration took 14 weeks. After it completed, the client's compliance team specifically noted that our Azure-native delivery posture was the cleanest among their offshore partners. We won an additional engagement from the same client three months later partly on the basis of this architecture.

Use case 2 — VPN scaling for COVID, hosted on AWS

When we needed to scale VPN capacity from 400 to 1,500+ concurrent sessions in 24 hours during the COVID lockdown, AWS was the answer. We spun up a second VPN edge on AWS (Fortinet AMI from the marketplace), configured BGP between it and the primary on-prem appliance, and had it taking overflow traffic within 12 hours. AWS's pay-as-you-go elasticity turned what could have been a multi-week capacity expansion into an overnight scaling event.

Use case 3 — GCP for client data warehousing

A US e-commerce client wanted us to handle their data-analytics offshore work but required all data to stay in GCP (because their pipelines were already there). We built:

- A dedicated GCP project for their delivery
- Workforce Identity Federation from our Azure AD into GCP for SSO
- BigQuery datasets with row-level security
- Workflow templates in Dataform for repeatable transformations
- A small Cloud Run service for ad-hoc data exports

The setup took 6 weeks. GCP's BigQuery was genuinely better than the alternative we'd have built on AWS or Azure for this workload. Sometimes the right cloud is the client's cloud.

Use case 4 — A six-figure cost saving via right-sizing on AWS

After the initial COVID-driven cloud expansion, we audited AWS spend in late 2020. We found:

- 30+ EBS volumes attached to nothing
- 12 EC2 instances at 5% average CPU
- Old AMIs and snapshots from years prior
- A staging environment running 24/7 that only needed business hours

Cleanup + right-sizing + reserved instances reduced AWS spend by ~$15k/month, or roughly $180k a year. Every cloud account leaks money. A weekly cleanup discipline pays for itself.
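
The weekly cleanup can start as a filter over an inventory export. This is a minimal Python sketch, assuming volumes and instances have already been pulled into plain dictionaries; the field names and the 10% CPU threshold are hypothetical, not any provider's API shape.

```python
def audit(volumes: list, instances: list, cpu_threshold: float = 10.0) -> dict:
    """Flag volumes attached to nothing and instances averaging below cpu_threshold percent CPU."""
    unattached = [v["id"] for v in volumes if not v.get("attached_to")]
    underused = [i["id"] for i in instances if i["avg_cpu_percent"] < cpu_threshold]
    return {"unattached_volumes": unattached, "underused_instances": underused}
```

The output is a list of candidates, not a delete list: someone still confirms each item before anything is removed or resized.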

Use case 5 — Cross-cloud identity as the unsung hero

The single highest-ROI investment we made across multi-cloud was identity federation. One Azure AD identity for every employee, mapped to AWS IAM Identity Center roles, GCP Workforce Identity Federation, and the client portals via SAML. When an employee left, deactivation in Azure AD propagated to every cloud and every connected SaaS within minutes. No more "did we revoke their AWS access?" anxiety. This is the kind of unglamorous architecture work that you only appreciate after an offboarding goes smoothly.
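
A periodic reconciliation is a cheap way to confirm that offboarding really did propagate. Here is a minimal Python sketch, assuming you can export the set of active IdP users and the set of users with access in each system; both inputs and their names are hypothetical.

```python
def orphaned_access(active_users: set, assignments: dict) -> dict:
    """Return, per system, users who still hold access but are no longer active in the IdP."""
    report = {}
    for system, users in assignments.items():
        stale = set(users) - active_users
        if stale:
            report[system] = stale
    return report
```

An empty report is the expected result; anything else usually points at a local account created outside the federation.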

The meta-point

After a decade, my honest assessment: the cloud you pick matters less than the operating model you build around it. A team that has good identity, good cost discipline, good observability, and good IaC will succeed on any cloud. A team without those will struggle equally on all three.

Build the operating model first. The cloud follows.