March 6, 2023
(Originally authored by Ariel Filotti (Cloud Architect, Zencore) on Medium)
Unlocking the Hidden Method of Outbound Internet Connectivity in Google Cloud VPCs
If you thought you knew everything there is to know about outbound internet connectivity in Google Cloud, think again.
We all know about the usual methods to enable internet connectivity from a Google Cloud VPC:
Attaching a public IP address to a VM instance.
Using Cloud NAT or deploying a custom NAT instance that has a public IP address attached.
In this post I will show you that there’s a third, undocumented way of accessing the internet from a Google Cloud VPC, which is to use a Load Balancer’s external IP address as the source IP address for outbound traffic.
Terraform Proof of Concept
I have included a Terraform configuration that deploys three things: two dual-homed instances that simulate a Virtual Network Appliance; External and Internal Load Balancers that forward traffic to these instances on their untrusted and trusted interfaces, respectively; and a VM instance representing a workload. You can find the Terraform samples here, and instructions for deployment are included in the Readme file.
The pair of instances are configured to perform outbound NAT for packets received from the Internal Load Balancer using the External Load Balancer’s external IP address as the source of the outbound traffic.
The only external IP address in the project is the one assigned to the External Load Balancer, and there are no Cloud NAT gateways, but the workload VM can still access the internet.
Here’s how our routing table looks:
It’s very simple: our default route to the internet has the Internal Load Balancer as the next hop.
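To make the routing decision concrete, here is a minimal Python sketch of a longest-prefix-match lookup against a table shaped like the one described above. The subnet range 10.0.0.0/24 and the next-hop labels are assumptions made for illustration; they are not taken from the Terraform samples.

```python
import ipaddress

# Hypothetical VPC routing table: (destination prefix, next hop).
# The 10.0.0.0/24 subnet range is an assumption for this sketch.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "local subnet"),
    (ipaddress.ip_network("0.0.0.0/0"), "internal load balancer"),
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route matching destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    # Longest prefix wins, so the default route acts only as a fallback.
    _, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop("10.0.0.10"))       # traffic that stays inside the VPC
print(next_hop("184.108.40.206"))  # internet-bound traffic
```

Any destination outside the subnet falls through to the default route, which is why every internet-bound packet from the workload VM ends up at a NAT instance behind the Internal Load Balancer.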
Understanding the Traffic Flow and Routing Mechanisms
This graphic illustrates the traffic flow of a request from an external host to the workload VM. The first packet arrives at the edge router, which forwards it to a Maglev instance. (Maglev is Google’s internal codename for the traffic forwarders used to provide load balancing capabilities; you can read more about it in this white paper.)
The Maglev instance forwards the packet to the NAT-1 instance, which performs NAT and forwards the packet to the workload VM. The response packet is routed back to the NAT-1 instance, but instead of going back through the Maglev instance, it uses Direct Server Return (DSR) to send the packet directly to the edge router. The edge router then forwards the packet to the external host.
This is how Network Load Balancing works in GCP. But why does it also work in this case, where the traffic originates from the workload VM and goes through one of the NAT instances to access the internet? How does the request packet reach the edge router, and how is the return packet routed back to the origin?
The answer is that, because only one NAT instance is active at a time, we can be sure that the response packet will be delivered to the same NAT instance that sent the request packet. In this case, the request packet is sent directly to the edge router (using the same mechanism that DSR uses for the reverse traffic flow), and the response packet is received through the Maglev forwarder.
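The reasoning above can be illustrated with a small Python sketch. It is a simplified stand-in, not Google's actual Maglev consistent-hashing algorithm: a forwarder hashes each flow and picks one of the healthy backends. The backend names nat-1 and nat-2 and the port numbers are assumptions for illustration.

```python
import hashlib


def pick_backend(flow: tuple, healthy_backends: list) -> str:
    """Pick a backend by hashing the flow tuple (simplified stand-in, not real Maglev)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(healthy_backends)
    return healthy_backends[index]


# Return flows as the forwarder sees them:
# (source IP, source port, destination IP, destination port, protocol)
return_flows = [
    ("184.108.40.206", 443, "220.127.116.11", 40000 + i, "tcp")
    for i in range(100)
]

# Only one instance passes health checks, as in the proof of concept.
active_passive = {pick_backend(flow, ["nat-1"]) for flow in return_flows}

# Hypothetical active/active setup, for contrast.
active_active = {pick_backend(flow, ["nat-1", "nat-2"]) for flow in return_flows}

print(active_passive)
print(sorted(active_active))
```

With a single healthy backend, every flow maps to it regardless of the hash, so the forwarder does not need to have seen the outbound packet. With two healthy backends, return packets may be spread across both instances, and one of them could receive a packet for which it holds no NAT state.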
Step-by-Step Guide to Understanding Outbound Traffic Flow
The active NAT instance receives, on its trusted interface, a packet from the workload VM that is destined for 184.108.40.206, and replaces the source IP address with the Virtual IP of the External Load Balancer (220.127.116.11).
The packet now has public source and destination addresses, and is routed to the internet by the edge router using the same mechanism as Direct Server Return.
The packet reaches its destination, and the destination generates a return packet with 184.108.40.206 as the source and 220.127.116.11 as the destination.
The return packet hits a Maglev forwarder, which doesn’t have any context about it since it did not see the first packet go out. However, it will send the return packet to the same NAT instance that sent the original packet because it’s the only active instance behind the External Load Balancer.
The active NAT instance receives the return packet and performs NAT to replace the destination IP address with the workload VM’s private IP address.
The workload VM receives the return packet and the traffic flow is complete (although technically there is a third packet that the workload VM sends to the destination to complete the TCP handshake).
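The steps above can be sketched in a few lines of Python. This is a toy model of source NAT with connection tracking, not the configuration used on the NAT instances. The workload VM's private IP (10.0.0.10) and the port numbers are assumptions, and for simplicity the sketch keeps the original source port instead of translating it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


VIP = "220.127.116.11"  # Virtual IP of the External Load Balancer
# (remote IP, remote port, local port) -> (private IP, private port)
conntrack = {}


def snat_outbound(pkt: Packet) -> Packet:
    """Rewrite the source address to the VIP and remember the mapping."""
    conntrack[(pkt.dst, pkt.dport, pkt.sport)] = (pkt.src, pkt.sport)
    return replace(pkt, src=VIP)


def dnat_return(pkt: Packet) -> Packet:
    """Restore the private destination of a return packet from the mapping."""
    private_ip, private_port = conntrack[(pkt.src, pkt.sport, pkt.dport)]
    return replace(pkt, dst=private_ip, dport=private_port)


# The workload VM (hypothetical 10.0.0.10) sends a packet to the destination.
request = Packet("10.0.0.10", 40000, "184.108.40.206", 443)
outbound = snat_outbound(request)

# The destination replies to the VIP; the forwarder hands it to the active NAT instance.
reply = Packet(outbound.dst, outbound.dport, outbound.src, outbound.sport)
delivered = dnat_return(reply)

print(outbound)
print(delivered)
```

The lookup in dnat_return only succeeds on the instance that performed the outbound translation, which is why the return packet has to land on the same NAT instance.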
Use Cases and Best Practices
There are cases where you might want two VMs that respond on a single external IP address for High Availability, but you also need outbound traffic to use that same IP address as its source. For example, if you want to terminate IPsec tunnels on a single IP address and also have High Availability, you need the inbound and outbound IP addresses to be consistent.
If you add an external IP address to each instance, you’ll have to configure IPsec tunnels to terminate on each IP address, and hope that the network at the other end of the tunnels supports dynamic routing.
Benefits and Implementation
Overall, I was surprised when I first discovered this third method of accessing the internet from a GCP VPC. It offers an alternative to the commonly used methods of attaching a public IP address to a VM instance or using a NAT gateway. By using a Load Balancer’s external IP address as the source IP address for outbound traffic, you can deploy Virtual Network Appliances in HA configurations, while also maintaining consistency for the inbound and outbound IP addresses.
I hope this post has helped you understand how this method works and how it can be useful for your projects. Feel free to experiment with it and let me know your thoughts!