AWS VPC Design Mistakes and How to Fix Them
AWS VPC serves as the core networking layer for AWS resources. Early design choices have a lasting impact on security, availability, scalability, cost efficiency, and operational maintenance, while changes made later often require downtime or increase risk. Below are 11 common VPC mistakes in production environments, along with their issues and fixes drawn from AWS documentation as of December 2025.
Mistake 1: Using the Default VPC for Production
The default VPC, created per region in each account, is for quick testing. It uses public subnets with automatic public IP assignment, which increases exposure and limits control over routing and security. It also encourages mixing workloads, complicating access and auditing.
Fixes:
- Create custom VPCs with specific subnets, route tables, and security settings.
- Disable automatic public IP assignment unless needed for public-facing services.
- Use separate VPCs or accounts for development, staging, and production.
- Consider deleting the default VPC after migrating dependencies, like EC2 instances. If needed, recreate a default VPC manually and reconfigure affected services.
Mistake 2: Poor IP Address and CIDR Planning
Small CIDR blocks, such as a /24 for a VPC, may work well at first but can limit growth over time. Adding secondary CIDRs increases operational complexity and makes address overlaps more likely, which can prevent VPC peering and hybrid connectivity.
Fixes:
- Plan for growth: Use /18–/20 for medium-scale environments and /16–/17 for large environments; since the primary CIDR can’t exceed /16, add secondary CIDRs if more IP addresses are needed.
- Standardize subnets at /24 per AZ.
- Use AWS IPAM for centralized planning, overlap detection, and automated allocation across accounts and regions. IPAM includes monitoring for CIDR exhaustion.
- Enable IPv6 dual-stack (a /56 prefix per VPC) at creation for additional addressing; use IPAM to keep VPC ranges aligned with on-premises address space.
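
The sizing and overlap checks above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not an IPAM replacement: the CIDR ranges and AZ names are made-up examples, and the helper names are hypothetical. The constant reflects the five addresses AWS reserves in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Assign one subnet of size /new_prefix to each AZ, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return dict(zip(azs, vpc.subnets(new_prefix=new_prefix)))


def usable_hosts(subnet):
    """Addresses left for workloads after the AWS-reserved addresses."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


def find_overlaps(cidrs):
    """Return every pair of CIDRs that overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]


# Example values only: a /18 VPC split into /24 subnets across three AZs.
plan = plan_subnets("10.20.0.0/18", ["us-east-1a", "us-east-1b", "us-east-1c"])
for az, net in plan.items():
    print(az, net, usable_hosts(net))

# Overlapping ranges would block peering between the first two networks.
overlaps = find_overlaps(["10.20.0.0/18", "10.20.32.0/20", "10.30.0.0/16"])
print(overlaps)
```

A /24 leaves 251 usable addresses per subnet, and a /18 holds 64 such subnets, which is the kind of headroom worth checking before the VPC is created.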
Mistake 3: Single Availability Zone Architectures
A single-AZ setup is a single point of failure: an outage in that zone takes the whole workload down. High availability requires spanning multiple Availability Zones.
Fixes:
- Create subnets in at least two AZs, preferably three.
- Distribute application tiers across AZs.
- Use Application or Network Load Balancers for traffic routing.
- Enable Multi-AZ for services like RDS.
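
One way to check AZ distribution is to group subnets by tier and flag any tier that lives in too few zones. The sketch below assumes subnet records shaped like the entries returned by the EC2 `describe_subnets` call (`SubnetId`, `AvailabilityZone`, `Tags`) and assumes a hypothetical `Tier` tag; the sample data is invented.

```python
from collections import defaultdict


def tiers_below_az_minimum(subnets, min_azs=2):
    """Return {tier: sorted AZ list} for tiers spanning fewer than min_azs AZs."""
    azs_by_tier = defaultdict(set)
    for subnet in subnets:
        tags = {t["Key"]: t["Value"] for t in subnet.get("Tags", [])}
        tier = tags.get("Tier", "untagged")  # "Tier" is a hypothetical tag key
        azs_by_tier[tier].add(subnet["AvailabilityZone"])
    return {
        tier: sorted(azs)
        for tier, azs in azs_by_tier.items()
        if len(azs) < min_azs
    }


# Invented sample data: the web tier spans two AZs, the db tier only one.
sample_subnets = [
    {"SubnetId": "subnet-1", "AvailabilityZone": "us-east-1a",
     "Tags": [{"Key": "Tier", "Value": "web"}]},
    {"SubnetId": "subnet-2", "AvailabilityZone": "us-east-1b",
     "Tags": [{"Key": "Tier", "Value": "web"}]},
    {"SubnetId": "subnet-3", "AvailabilityZone": "us-east-1a",
     "Tags": [{"Key": "Tier", "Value": "db"}]},
]

print(tiers_below_az_minimum(sample_subnets))
```

Here only the `db` tier is reported, because all of its subnets sit in one zone.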
Mistake 4: Flat Network Design Without Subnet Segmentation
A flat network without subnet segmentation amplifies the impact of errors while complicating routing and access controls.
Fixes:
- Separate public and private subnets per AZ.
- Assign dedicated subnets to tiers (e.g., web, app, database).
- Use route tables for traffic direction.
- Define boundaries based on security and access requirements.
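
A tier-by-AZ layout can be planned before any resource exists. The following sketch makes one design assumption that is not from the list above: each tier gets a contiguous /20 block, so a whole tier can be referenced with a single CIDR in routes or rules. The tier names, AZ names, and prefix sizes are illustrative.

```python
import ipaddress


def tiered_layout(vpc_cidr, tiers, azs, tier_prefix=20, subnet_prefix=24):
    """Give each tier one contiguous block, then one subnet per AZ inside it."""
    vpc = ipaddress.ip_network(vpc_cidr)
    layout = {}
    for tier, tier_block in zip(tiers, vpc.subnets(new_prefix=tier_prefix)):
        per_az = zip(azs, tier_block.subnets(new_prefix=subnet_prefix))
        layout[tier] = {"block": tier_block, "subnets": dict(per_az)}
    return layout


# Example values only.
layout = tiered_layout(
    "10.20.0.0/16",
    tiers=["web", "app", "db"],
    azs=["us-east-1a", "us-east-1b", "us-east-1c"],
)
for tier, entry in layout.items():
    print(tier, entry["block"])
    for az, subnet in entry["subnets"].items():
        print("  ", az, subnet)
```

Each /20 has room for 16 /24 subnets, so a tier can grow into more AZs or extra subnets without leaving its block.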
Mistake 5: Overusing Public Subnets
Public subnets route directly to an Internet Gateway and are suitable only for components that must be reachable from the internet, such as load balancers or gateways. Placing other resources there expands the attack surface.
Fixes:
- Default to private subnets for compute and data.
- Restrict public subnets to inbound services.
- Use NAT Gateways or VPC endpoints for outbound access.
- Audit public IPs regularly using AWS Config or custom scripts.
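
As a starting point for a custom audit script, the sketch below identifies public subnets from route table data. It assumes records shaped like the `RouteTables` entries returned by the EC2 `describe_route_tables` call and treats a subnet as public when its explicitly associated route table has a default route to an `igw-` target. Subnets that fall back to the main route table have no explicit association and are not covered here. All IDs are invented.

```python
def public_subnet_ids(route_tables):
    """Return subnet IDs explicitly associated with an internet-routed table."""
    public = set()
    for table in route_tables:
        has_igw_default = any(
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            for route in table.get("Routes", [])
        )
        if not has_igw_default:
            continue
        for assoc in table.get("Associations", []):
            if "SubnetId" in assoc:
                public.add(assoc["SubnetId"])
    return public


# Invented sample data: one public table, one private table behind a NAT gateway.
sample_route_tables = [
    {
        "RouteTableId": "rtb-public",
        "Routes": [
            {"DestinationCidrBlock": "10.20.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
        ],
        "Associations": [{"SubnetId": "subnet-web-a"}, {"SubnetId": "subnet-web-b"}],
    },
    {
        "RouteTableId": "rtb-private",
        "Routes": [
            {"DestinationCidrBlock": "10.20.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def"},
        ],
        "Associations": [{"SubnetId": "subnet-app-a"}],
    },
]

print(sorted(public_subnet_ids(sample_route_tables)))
```

Comparing this set against the subnets that actually host compute or data resources shows where workloads are more exposed than intended.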
Mistake 6: Inefficient Internet and AWS Service Access
Sending all outbound traffic through NAT Gateways, including traffic bound for AWS services, raises both cost and latency. Many AWS services offer private access options that reduce this dependency.
Fixes:
- Deploy one NAT Gateway per AZ with AZ-specific route tables for availability.
- Use Gateway VPC Endpoints (free) for S3 and DynamoDB.
- Use Interface Endpoints (PrivateLink) for other services; costs include an hourly fee per endpoint plus per-GB data processing charges.
- For hybrid or zero-trust, consider AWS Private CA for certificate management with endpoints.
- Monitor traffic with CloudWatch and remove unused routes.
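
The cost effect of moving S3 and DynamoDB traffic off NAT Gateways is simple arithmetic. In the sketch below, the hourly and per-GB rates and the traffic volumes are placeholder assumptions, not quoted AWS prices; substitute the current rates for your region before drawing conclusions.

```python
# Placeholder rates for illustration only; check current regional pricing.
ASSUMED_NAT_HOURLY_RATE = 0.045   # USD per NAT Gateway hour (assumption)
ASSUMED_NAT_PER_GB_RATE = 0.045   # USD per GB processed by NAT (assumption)
HOURS_PER_MONTH = 730


def monthly_nat_cost(gateways, gb_processed,
                     hourly_rate=ASSUMED_NAT_HOURLY_RATE,
                     per_gb_rate=ASSUMED_NAT_PER_GB_RATE,
                     hours=HOURS_PER_MONTH):
    """Monthly NAT cost: fixed hourly charges plus per-GB processing charges."""
    return gateways * hourly_rate * hours + gb_processed * per_gb_rate


# Hypothetical workload: 10,000 GB/month through NAT, 8,000 GB of it to S3.
total_gb = 10_000
s3_gb = 8_000

before = monthly_nat_cost(gateways=3, gb_processed=total_gb)
# Gateway endpoints carry no charge, so S3 traffic drops out of the NAT bill.
after = monthly_nat_cost(gateways=3, gb_processed=total_gb - s3_gb)
savings = before - after

print(f"before: {before:.2f}  after: {after:.2f}  savings: {savings:.2f}")
```

The fixed hourly charge for one gateway per AZ stays the same in both cases; only the per-GB processing component shrinks, which is why high-volume S3 or DynamoDB traffic is the first thing to move.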
Mistake 7: Weak or Inconsistent Security Group Design
Permissive or shared security groups reduce isolation and accumulate unused rules, increasing exposure.
Fixes:
- Apply least-privilege rules per workload.
- Reference other security groups instead of CIDRs when possible.
- Use unique security groups for distinct services.
- Audit and remove unused rules regularly; enable VPC Flow Logs for visibility and integrate with GuardDuty for threat detection.
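
A rule audit can begin with a check for ingress that is open to the whole internet. The sketch below assumes security group records shaped like the entries returned by the EC2 `describe_security_groups` call (`IpPermissions`, `IpRanges`, `Ipv6Ranges`, `UserIdGroupPairs`). The allowed-port policy and the sample groups are assumptions for illustration.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_findings(security_groups, allowed_ports=(80, 443)):
    """List (group id, protocol, from port, to port) for risky open ingress."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            if not OPEN_CIDRS.intersection(cidrs):
                continue  # not open to the internet, e.g. references another group
            from_port = perm.get("FromPort")
            to_port = perm.get("ToPort")
            # Accept only rules that expose exactly one explicitly allowed port.
            if from_port == to_port and from_port in allowed_ports:
                continue
            findings.append(
                (group["GroupId"], perm.get("IpProtocol"), from_port, to_port)
            )
    return findings


# Invented sample data.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-app", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
         "IpRanges": [], "UserIdGroupPairs": [{"GroupId": "sg-web"}]},
    ]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
]

print(open_ingress_findings(sample_groups))
```

Only `sg-db` is flagged: `sg-web` exposes an allowed port, and `sg-app` admits traffic by referencing another security group rather than a CIDR.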
Mistake 8: Misusing Network ACLs
Because network ACLs are stateless and operate at the subnet level, overusing them can cause unexpected blocks and make troubleshooting more difficult.
Fixes:
- Rely on security groups for most controls.
- Use custom NACLs only for specific compliance needs.
- Keep rules minimal, numbered, and documented.
- Avoid overlaps; use Flow Logs for diagnostics.
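
The most common surprise with stateless NACLs is that return traffic needs its own rule. The sketch below models NACL evaluation (rules checked in ascending rule-number order, first match wins, implicit deny at the end) using a simplified, hypothetical rule format rather than the AWS API shape. The addresses and ports are examples.

```python
import ipaddress


def nacl_allows(rules, protocol, port, peer_cidr):
    """Evaluate rules by ascending number; first match wins, default deny."""
    peer = ipaddress.ip_network(peer_cidr)
    for rule in sorted(rules, key=lambda r: r["number"]):
        port_ok = rule["from_port"] <= port <= rule["to_port"]
        proto_ok = rule["protocol"] in (protocol, "all")
        cidr_ok = peer.subnet_of(ipaddress.ip_network(rule["cidr"]))
        if port_ok and proto_ok and cidr_ok:
            return rule["action"] == "allow"
    return False  # the implicit final rule denies everything


def rule(number, from_port, to_port, action="allow",
         protocol="tcp", cidr="0.0.0.0/0"):
    """Build a rule record in this sketch's simplified format."""
    return {"number": number, "protocol": protocol, "from_port": from_port,
            "to_port": to_port, "cidr": cidr, "action": action}


client = "203.0.113.7/32"          # example client address
inbound = [rule(100, 443, 443)]    # HTTPS requests may enter the subnet

# Outbound list that forgets the client's ephemeral reply port.
outbound_broken = [rule(100, 443, 443)]
# Adding the ephemeral range lets responses leave the subnet.
outbound_fixed = outbound_broken + [rule(110, 1024, 65535)]

print(nacl_allows(inbound, "tcp", 443, client))            # request in
print(nacl_allows(outbound_broken, "tcp", 50000, client))  # reply blocked
print(nacl_allows(outbound_fixed, "tcp", 50000, client))   # reply allowed
```

A security group would have allowed the reply automatically because it tracks connection state; a NACL does not, which is why the broken variant silently drops responses.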
Mistake 9: Mixing Multiple Environments in One VPC
Combining environments increases error impact, complicates security, and hinders cost tracking.
Fixes:
- Assign separate VPCs per environment.
- Use separate accounts for better isolation.
- Connect via peering or Transit Gateway as needed; monitor limits like 5,000 attachments per Transit Gateway.
- For complex traffic, use VPC Lattice.
- In multi-account setups, apply Service Control Policies (SCPs) via AWS Organizations to enforce standards like VPC tagging.
Mistake 10: Ignoring DNS and Name Resolution Design
Ad-hoc DNS practices often lead to hardcoded IP addresses, which can fail during infrastructure changes or failover events.
Fixes:
- Use Route 53 private hosted zones.
- Reference services by DNS names.
- Plan resolution for multi-account or hybrid via Route 53 Resolver.
- Validate with network scripts.
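
A validation script can confirm that internal names resolve, and that they resolve only to private addresses. The sketch below uses the standard `socket` and `ipaddress` modules; the hostnames are hypothetical, and the demo swaps in a fake resolver so it runs without network access. Pass no `resolver` argument to use the system resolver instead.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Resolve a hostname to the set of address strings the system returns."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def check_private_names(hostnames, resolver=resolve_all):
    """Map each hostname to 'ok', 'public', or 'unresolved'."""
    results = {}
    for name in hostnames:
        try:
            addresses = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            addresses = set()
        if not addresses:
            results[name] = "unresolved"
            continue
        # Strip any IPv6 zone suffix such as "%eth0" before parsing.
        parsed = [ipaddress.ip_address(a.split("%")[0]) for a in addresses]
        results[name] = "ok" if all(p.is_private for p in parsed) else "public"
    return results


# Hypothetical records standing in for a private hosted zone.
fake_records = {
    "db.internal.example": {"10.20.32.15"},
    "api.internal.example": {"8.8.8.8"},
}


def fake_resolver(name):
    if name not in fake_records:
        raise socket.gaierror("name not found")
    return fake_records[name]


results = check_private_names(
    ["db.internal.example", "api.internal.example", "cache.internal.example"],
    resolver=fake_resolver,
)
print(results)
```

Running a check like this after failover drills or infrastructure changes catches names that have drifted to public addresses or stopped resolving.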
Mistake 11: Lack of Standardization and Documentation
Inconsistent designs across regions or accounts increase operational errors and costs.
Fixes:
- Standardize via Infrastructure as Code (e.g., CloudFormation, Terraform).
- Document in repositories or wikis.
- Automate with AWS Control Tower.
- Reference the AWS Well-Architected Framework’s Networking Lens for guidance.
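
Standardization is easier when every VPC comes from the same generator. The sketch below builds a small CloudFormation template as a Python dictionary, using the `AWS::EC2::VPC` and `AWS::EC2::Subnet` resource types. The logical names, tag keys, CIDR, and AZs are illustrative assumptions, and a real template would also need route tables, gateways, and endpoints.

```python
import ipaddress
import json


def vpc_template(vpc_cidr, azs, environment, subnet_prefix=24):
    """Build a minimal CloudFormation template: one VPC, one private subnet per AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    tags = [{"Key": "Environment", "Value": environment}]  # assumed tag key
    resources = {
        "Vpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": str(vpc),
                "EnableDnsSupport": True,
                "EnableDnsHostnames": True,
                "Tags": tags,
            },
        }
    }
    blocks = vpc.subnets(new_prefix=subnet_prefix)
    for index, (az, block) in enumerate(zip(azs, blocks), start=1):
        resources[f"PrivateSubnet{index}"] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "Vpc"},
                "AvailabilityZone": az,
                "CidrBlock": str(block),
                "MapPublicIpOnLaunch": False,  # no automatic public IPs
                "Tags": tags,
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


# Example values only.
template = vpc_template(
    "10.20.0.0/18", ["us-east-1a", "us-east-1b", "us-east-1c"], "staging"
)
print(json.dumps(template, indent=2))
```

Because the environment name and CIDR are the only inputs, development, staging, and production VPCs come out structurally identical and differ only where they are meant to.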
VPC Design Checklist
| Category | Items to Verify |
|---|---|
| IP Planning | CIDR sized for growth? IPv6 enabled? IPAM for overlaps? |
| HA/Segmentation | Multi-AZ subnets? Tier isolation? Route tables explicit? |
| Security | Private default? Least-privilege SGs? Flow Logs/GuardDuty on? NACLs minimal? |
| Access | Endpoints over NAT where possible? Public IPs audited? |
| Connectivity | Separate env VPCs/accounts? Lattice/Transit for links? |
| Ops | DNS via Route 53? IaC standardized? Docs current? |
| Monitoring | CloudWatch for traffic/costs? Config rules for compliance? |
Key Takeaways
As scale increases, weaknesses in VPC IP planning, AZ distribution, segmentation, and controls intensify. Focus on multi-AZ, private subnets, least-privilege access, and standardization to reduce risks. Check AWS VPC documentation for details.

Pouya Nourizadeh is the founder of Cloudformix, with extensive experience optimizing enterprise cloud environments across AWS, Azure, and Google Cloud. For years, he has addressed real-world challenges in cloud cost management, performance, and architecture, offering practical insights for engineering teams navigating modern cloud complexities.







