The Challenge
Our client — a B2B SaaS platform serving 500+ enterprise customers — had watched their AWS bill climb from $120K/month to $450K/month over 18 months. Revenue was growing, but infrastructure costs were growing faster. Finance was raising alarms. Engineering lacked the bandwidth and cloud-cost expertise to diagnose the problem.
What we found:
- AWS monthly spend: $450,000 with no clear cost attribution by service or customer
- Over-provisioned EC2 instances running 24/7 for workloads that only peaked during business hours
- $38,000/month in unused EBS volumes and snapshots from decommissioned services
- No reserved instance strategy — 100% on-demand pricing across all services
- Data transfer costs of $22,000/month due to cross-region architecture decisions made years ago
Our Approach
Phase 1: Comprehensive Cloud Audit (Week 1–2)
We deployed our infrastructure analysis toolkit across their entire AWS organization — 4 accounts, 3 regions, 200+ EC2 instances, 40+ RDS databases, and dozens of S3 buckets.
Every dollar of spend was categorized into three buckets:
- Necessary spend — correctly sized resources serving current load
- Optimizable spend — resources that could be right-sized or reserved
- Waste — resources that should be eliminated
Finding: 45% of their total spend was either optimizable or pure waste.
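The three-bucket categorization can be approximated with simple utilization rules. The sketch below is illustrative only. The `Resource` fields, the 2% "idle" and 40% "oversized" CPU thresholds, and the rule that anything billed on-demand is a reservation candidate are hypothetical assumptions, not the toolkit's actual logic.

```python
from dataclasses import dataclass
from enum import Enum


class SpendCategory(Enum):
    NECESSARY = "necessary"
    OPTIMIZABLE = "optimizable"
    WASTE = "waste"


@dataclass
class Resource:
    # Hypothetical audit record; field names are illustrative.
    resource_id: str
    monthly_cost: float
    p95_cpu_percent: float  # 95th-percentile CPU over the audit window
    attached: bool          # False for orphans such as unattached volumes
    on_demand: bool         # True if billed at on-demand rates


# Hypothetical thresholds; a real audit would tune these per workload.
IDLE_CPU_PERCENT = 2.0
OVERSIZED_CPU_PERCENT = 40.0


def categorize(r: Resource) -> SpendCategory:
    # Orphaned or essentially idle resources are waste.
    if not r.attached or r.p95_cpu_percent < IDLE_CPU_PERCENT:
        return SpendCategory.WASTE
    # Underused (right-size candidate) or on-demand (reservation candidate).
    if r.p95_cpu_percent < OVERSIZED_CPU_PERCENT or r.on_demand:
        return SpendCategory.OPTIMIZABLE
    return SpendCategory.NECESSARY


def spend_share(resources: list[Resource]) -> dict[SpendCategory, float]:
    """Fraction of total monthly cost falling into each category."""
    shares = {c: 0.0 for c in SpendCategory}
    total = sum(r.monthly_cost for r in resources)
    if total == 0:
        return shares
    for r in resources:
        shares[categorize(r)] += r.monthly_cost / total
    return shares


# Usage with a tiny made-up fleet.
fleet = [
    Resource("vol-a", 100.0, 0.0, attached=False, on_demand=True),
    Resource("i-b", 300.0, 25.0, attached=True, on_demand=True),
    Resource("i-c", 600.0, 70.0, attached=True, on_demand=False),
]
shares = spend_share(fleet)
```

In this toy fleet, 40% of spend lands in the optimizable or waste buckets, which is the same kind of headline figure the audit reported.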
Phase 2: Quick Wins (Week 2–3)
We prioritized immediate savings that required zero application changes:
- Removed orphaned resources: Deleted 47 unused EBS volumes, terminated 12 idle EC2 instances, and released 8 unused Elastic IPs, saving $42,000/month immediately
- Right-sized databases: Downgraded 6 over-provisioned RDS instances based on actual utilization data — saving $18,000/month
- S3 lifecycle policies: Moved 14TB of infrequently accessed data to S3 Glacier — saving $8,000/month
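Orphan candidates like these can be surfaced from standard EC2 API output. This is a minimal sketch. It assumes response dictionaries shaped like those returned by boto3's `ec2.describe_volumes()` and `ec2.describe_addresses()`, and it leaves out the API calls themselves, pagination, and iteration across accounts and regions. The function names are hypothetical.

```python
def unattached_volumes(describe_volumes_response: dict) -> list[str]:
    """Volume IDs whose State is 'available', i.e. not attached to any instance."""
    return [
        v["VolumeId"]
        for v in describe_volumes_response.get("Volumes", [])
        if v.get("State") == "available"
    ]


def unassociated_eips(describe_addresses_response: dict) -> list[str]:
    """Allocation IDs of Elastic IPs with no association (billed while idle)."""
    return [
        a["AllocationId"]
        for a in describe_addresses_response.get("Addresses", [])
        if "AssociationId" not in a
    ]


# Usage with hand-written sample responses (IDs are made up).
volumes = {"Volumes": [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
]}
addresses = {"Addresses": [
    {"AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"AllocationId": "eipalloc-2"},
]}
orphan_volumes = unattached_volumes(volumes)
orphan_eips = unassociated_eips(addresses)
```

An `available` volume is only a candidate for deletion. Snapshotting it and confirming ownership first is a sensible guard. The volume check can also be pushed server-side by passing `Filters=[{"Name": "status", "Values": ["available"]}]` to `describe_volumes`.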
Phase 3: Structural Optimizations (Week 3–8)
The larger savings required architectural changes:
- Reserved Instance Strategy: Purchased 1-year convertible reserved instances for stable workloads, saving 35% vs. on-demand
- Auto-scaling Implementation: Moved 60+ statically provisioned EC2 instances into Auto Scaling groups, with scheduled scaling for business-hours workloads
- Spot Instance Strategy: Moved batch processing and CI/CD workloads to Spot Instances with automated fallback to on-demand capacity
- Data Transfer Optimization: Consolidated cross-region communication to reduce data transfer costs by 70%
- Graviton Migration: Migrated eligible workloads to ARM-based Graviton instances for 20% better price-performance
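Scheduled scaling for business-hours workloads typically comes down to two recurring actions per Auto Scaling group: one that scales up on weekday mornings and one that scales down in the evening. The sketch below builds those actions as keyword-argument dictionaries whose keys match boto3's `autoscaling.put_scheduled_update_group_action()`. The group name, sizes, 08:00 to 19:00 window, and time zone are hypothetical placeholders. It also assumes the group's existing `MaxSize` is at least `peak_size`.

```python
def business_hours_actions(
    group_name: str,
    peak_size: int,
    off_hours_size: int,
    start_hour: int = 8,
    end_hour: int = 19,
    time_zone: str = "America/New_York",
) -> list[dict]:
    """Two scheduled actions: scale up on weekday mornings, down in the evening."""
    weekdays = "1-5"  # cron day-of-week field: Monday through Friday
    return [
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": f"{group_name}-scale-up",
            "Recurrence": f"0 {start_hour} * * {weekdays}",
            "MinSize": peak_size,
            "DesiredCapacity": peak_size,
            "TimeZone": time_zone,
        },
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": f"{group_name}-scale-down",
            "Recurrence": f"0 {end_hour} * * {weekdays}",
            "MinSize": off_hours_size,
            "DesiredCapacity": off_hours_size,
            "TimeZone": time_zone,
        },
    ]


def peak_hours_fraction(start_hour: int = 8, end_hour: int = 19) -> float:
    """Share of the 168-hour week that runs at peak capacity."""
    return (end_hour - start_hour) * 5 / 168


actions = business_hours_actions("web-asg", peak_size=10, off_hours_size=2)
# Applying them would look like:
#   client = boto3.client("autoscaling")
#   for action in actions:
#       client.put_scheduled_update_group_action(**action)
```

Because the Friday evening action scales down and nothing scales up again until Monday morning, the extra peak capacity runs for 55 of the week's 168 hours (about 33%) instead of 24/7.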
Phase 4: Ongoing Governance (Month 3+)
We implemented automated cost governance to prevent drift:
- Real-time cost anomaly detection with Slack alerts
- Automated tagging enforcement for cost attribution
- Monthly cost optimization reviews with specific action items
- Quarterly architectural reviews to capture new savings opportunities
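Cost anomaly detection can be as simple as comparing each day's spend with a trailing baseline. The sketch below uses a plain z-score over a 14-day window. This is a swapped-in illustrative technique, not the model behind AWS Cost Anomaly Detection or the system described above. The window, the 3-sigma threshold, and the function name are assumptions, and posting alerts to Slack is omitted.

```python
from statistics import mean, stdev


def flag_anomalies(
    daily_spend: list[float], window: int = 14, threshold: float = 3.0
) -> list[int]:
    """Indices of days whose spend exceeds the trailing mean by more than
    `threshold` standard deviations of the preceding `window` days."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase counts as anomalous.
            if daily_spend[i] > mu:
                flagged.append(i)
        elif (daily_spend[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Usage: 20 days of ordinary spend followed by a sudden spike (made-up values).
spend = [15_000.0 + (i % 3) * 100 for i in range(20)] + [25_000.0]
anomalies = flag_anomalies(spend)
```

Only upward deviations are flagged here, since a cost-governance alert mostly cares about unexpected increases.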
Results
Within 30 days of engagement start:
- 45% total cost reduction — from $450K/month to $248K/month
- $2.4M annualized savings — verified against 6 months of historical billing data
- 30-day ROI — the client’s savings in month one exceeded any fees
- $0 upfront cost — our gain-share model meant zero risk for the client
- No performance regression: p99 latency improved by 12% due to right-sizing
- Full cost visibility — every dollar attributed to a specific service, team, and customer cohort
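The headline percentage and annualized figure follow directly from the before-and-after monthly spend stated above. The quick check below uses no other inputs.

```python
BASELINE_MONTHLY = 450_000   # USD/month before the engagement
OPTIMIZED_MONTHLY = 248_000  # USD/month after optimization

monthly_savings = BASELINE_MONTHLY - OPTIMIZED_MONTHLY    # 202,000
reduction_pct = 100 * monthly_savings / BASELINE_MONTHLY  # about 44.9
annualized_savings = 12 * monthly_savings                 # 2,424,000

print(f"Monthly savings:    ${monthly_savings:,}")
print(f"Reduction:          {reduction_pct:.1f}%")
print(f"Annualized savings: ${annualized_savings:,}")
```

The 44.9% reduction rounds to the reported 45%, and $2,424,000 rounds to the reported $2.4M.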
Ongoing Impact
Six months post-engagement, the automated governance system has:
- Prevented $180K in cost drift from new deployments
- Identified and captured an additional $45K/month in savings from new optimization opportunities
- Reduced the time engineers spend on infrastructure cost questions from 8 hours/week to 30 minutes/week