I am having trouble accessing a Cloud SQL instance running Postgres from a GKE cluster using the database's private IP. All the documentation I've found suggests using a VPC-native cluster to accomplish this, but I still can't reach the database.
Specifically, I can reach the database from the nodes in my cluster, but I cannot reach it from within a container on a node unless I run the Docker container using the host's network. This leads me to believe that I misunderstand how the networking components of a GCP VPC and Kubernetes interact with each other.
VPC
My VPC has one subnet with two secondary ranges:
IP Range: 10.0.0.0/16
Secondary Range - pods: 10.1.0.0/16
Secondary Range - services: 10.2.0.0/16
This is created using the following Terraform configuration:
```hcl
resource "google_compute_subnetwork" "cluster" {
  ip_cidr_range = "10.0.0.0/16"
  name          = "cluster"
  network       = google_compute_network.vpc.self_link

  secondary_ip_range {
    ip_cidr_range = "10.1.0.0/16"
    range_name    = "pods"
  }

  secondary_ip_range {
    ip_cidr_range = "10.2.0.0/16"
    range_name    = "services"
  }
}
```
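As a quick sanity check on the layout above, the primary range and both secondary ranges must not overlap one another. A minimal sketch using Python's standard `ipaddress` module (the helper name `overlapping_pairs` is hypothetical; the CIDRs are the ones listed above):

```python
import ipaddress
from itertools import combinations

# CIDRs copied from the "cluster" subnet described in the question.
SUBNET_RANGES = {
    "primary": ipaddress.ip_network("10.0.0.0/16"),
    "pods": ipaddress.ip_network("10.1.0.0/16"),
    "services": ipaddress.ip_network("10.2.0.0/16"),
}


def overlapping_pairs(ranges):
    """Return every (name, name) pair whose CIDR blocks overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # An empty list means the three ranges are disjoint.
    print(overlapping_pairs(SUBNET_RANGES))
```

For these values the list is empty, so range overlap inside the subnet does not appear to be the cause.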
Database
My Cloud SQL database is running Postgres 11 and is configured to allow connections only via private IP. I have set up a VPC peering connection (private services access) using a reserved global internal address range so that the Cloud SQL instance is reachable from my VPC. In this case I ended up with the following values:
Private Service Connection IP Range: 172.26.0.0/16
Database Private IP: 172.26.0.3
These resources are provisioned with the following Terraform configuration:
```hcl
resource "google_compute_global_address" "db_private_ip" {
  provider      = google-beta
  name          = "db-private-ip"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.vpc.self_link
}

resource "google_service_networking_connection" "db_vpc_connection" {
  network                 = google_compute_network.vpc.self_link
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.db_private_ip.name]
}

resource "google_sql_database_instance" "db" {
  depends_on       = [google_service_networking_connection.db_vpc_connection]
  database_version = "POSTGRES_11"

  settings {
    availability_type = "ZONAL"
    tier              = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.self_link
    }
  }
}
```
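The database values can be checked the same way: the private IP should fall inside the reserved peering range (a `prefix_length = 16` reservation gives a /16, i.e. 65,536 addresses), and that range should not collide with any range in the cluster subnet. A sketch with a hypothetical helper `check_private_service_range`, using only the values quoted above:

```python
import ipaddress

# Values reported in the question.
PSC_RANGE = ipaddress.ip_network("172.26.0.0/16")   # reserved for VPC peering
DB_PRIVATE_IP = ipaddress.ip_address("172.26.0.3")
VPC_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16")
]


def check_private_service_range(psc_range, db_ip, vpc_ranges):
    """Return a list of problems; an empty list means the layout looks consistent."""
    problems = []
    if db_ip not in psc_range:
        problems.append(f"{db_ip} is outside the peering range {psc_range}")
    for net in vpc_ranges:
        if psc_range.overlaps(net):
            problems.append(f"{psc_range} overlaps VPC range {net}")
    return problems


if __name__ == "__main__":
    print(check_private_service_range(PSC_RANGE, DB_PRIVATE_IP, VPC_RANGES))
    print(PSC_RANGE.num_addresses)  # size of the /16 reservation
```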
Cluster
My GKE cluster is configured to be VPC-native and to use the secondary ranges from the cluster subnet of the VPC. Some of the relevant cluster information:
Master Version: 1.14.8-gke.17
Network: my-vpc
Subnet: cluster
VPC-native: Enabled
Pod address range: 10.1.0.0/16
Service address range: 10.2.0.0/16
The cluster is created using the following Terraform configuration:
```hcl
resource "google_container_cluster" "primary" {
  location           = var.gcp_region
  min_master_version = data.google_container_engine_versions.latest_patch.latest_master_version
  name               = "my-cluster"
  network            = google_compute_network.vpc.self_link
  subnetwork         = google_compute_subnetwork.cluster.self_link

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  ip_allocation_policy {
    use_ip_aliases                = true
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}
```
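One difference between the working and failing cases is the source address. In a VPC-native cluster, nodes get addresses from the primary range (10.0.0.0/16) and pods get addresses from the `pods` secondary range (10.1.0.0/16), so a packet that is not SNATed on the node leaves a pod with a 10.1.x.x source, while `--net host` uses the node's 10.0.x.x address. The sketch below (hypothetical helper `classify`, example addresses made up) maps an observed source IP, for example from a packet capture, onto these ranges. It does not model masquerading, which depends on the node's configuration.

```python
import ipaddress

# Ranges from the question's subnet and cluster configuration.
RANGES = [
    ("node (primary range)", ipaddress.ip_network("10.0.0.0/16")),
    ("pod (secondary range 'pods')", ipaddress.ip_network("10.1.0.0/16")),
    ("service (secondary range 'services')", ipaddress.ip_network("10.2.0.0/16")),
]


def classify(ip):
    """Label which cluster range an IPv4 address belongs to."""
    addr = ipaddress.ip_address(ip)
    for label, net in RANGES:
        if addr in net:
            return label
    return "outside the cluster subnet"


if __name__ == "__main__":
    # Hypothetical example addresses, not taken from the question.
    for ip in ("10.0.0.2", "10.1.4.7", "172.26.0.3"):
        print(ip, "->", classify(ip))
```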
Connection Attempts
I've made attempts to connect to the database from many different contexts to try to figure out the problem.
Standalone Instance
I spun up a new Ubuntu Compute Engine VM in my VPC and was able to connect to the database using both `nping` and `psql`.
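Where installing `nmap` is inconvenient, a plain TCP handshake test gives roughly the same signal as `nping --tcp -p 5432`. A minimal sketch using only the standard library (the function name `tcp_port_open` is hypothetical):

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP handshake to host:port completes within `timeout` seconds.

    Timeouts, refusals and unreachable-network errors are all OSError
    subclasses, so any of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_port_open("172.26.0.3", 5432)` from the VM, from a pod, and from a `--net host` container should reproduce the pattern described in this question if it is a plain reachability problem.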
From a Container on a Node
Whether I use `kubectl attach` on a pod in my cluster or SSH into a node and run my own Docker command, none of the packets to the database get through.
```bash
# After SSH-ing into a node, run the check from a Docker container.
docker run -it ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'
```
From a Container on a Node with Host Networking
If I repeat the command above but use the host's network (`--net host`), I can connect to the database.
```bash
docker run -it --net host ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'
```
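To see which source address each variant actually uses toward the database, a common trick is to `connect()` a UDP socket: no packet is sent, but the kernel picks a route and a local address, which `getsockname()` then reports. A sketch (hypothetical helper `source_ip_for`; it assumes `python3` has been installed in the container, e.g. via `apt install -y python3`):

```python
import socket


def source_ip_for(dest_ip, port=5432):
    """Return the local IPv4 address the kernel would use to reach dest_ip.

    Connecting a UDP socket sends nothing on the wire; it only binds the
    socket to the route and local address chosen for dest_ip.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]
```

Running `source_ip_for("172.26.0.3")` once inside the default container network and once with `--net host` shows which address each case presents to the peered network; this reports the address before any NAT the node may apply.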
Suggestions?
Since most questions about connecting to a Cloud SQL instance from GKE via private IP are solved by configuring the cluster to be VPC-native, I assume my problem lies somewhere in my networking configuration. I would appreciate any suggestions and I'm happy to provide any additional information. Thanks.
Related Questions
Issue Connecting to Cloud SQL Postgres using Private IP from GKE
Update 2019-12-05
After converting the commands from the related question linked above into Terraform (call this the MVP config), I am able to connect to the Postgres instance using its private IP, so I now believe the issue lies deeper in my configuration. I still haven't determined which piece of my infrastructure differs from the MVP config.
My next attempt will probably be to extend the MVP config to use a separately configured node pool rather than the default node pool, to see if that accounts for the behavior I am seeing.