
I am having trouble accessing a Cloud SQL instance running Postgres from a GKE cluster using the database's private IP. All the documentation I've found suggests using a VPC-native cluster to accomplish this, but I am still having trouble reaching the database.

Specifically, I can reach the database from the nodes in my cluster, but I cannot reach the database from within a container on a node unless I run the Docker container using the host's network. This leads me to believe that I have a misunderstanding of how the networking components of a GCP VPC and Kubernetes interact with each other.

VPC

My VPC has one subnet with two secondary ranges:

IP Range: 10.0.0.0/16
Secondary Range - pods: 10.1.0.0/16
Secondary Range - services: 10.2.0.0/16

This is created using the following Terraform configuration:

resource "google_compute_subnetwork" "cluster" {
  ip_cidr_range            = "10.0.0.0/16"
  name                     = "cluster"
  network                  = google_compute_network.vpc.self_link

  secondary_ip_range {
    ip_cidr_range = "10.1.0.0/16"
    range_name    = "pods"
  }

  secondary_ip_range {
    ip_cidr_range = "10.2.0.0/16"
    range_name    = "services"
  }
}

Database

My Cloud SQL instance is running Postgres 11 and is configured to only allow connections via private IP. I have set up a peering connection with a set of global compute addresses to allow access to the Cloud SQL instance from my VPC. In this case I ended up with the following values:

Private Service Connection IP Range: 172.26.0.0/16
Database Private IP: 172.26.0.3

These resources are provisioned with the following Terraform configuration:

resource "google_compute_global_address" "db_private_ip" {
  provider = "google-beta"

  name          = "db-private-ip"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.vpc.self_link
}

resource "google_service_networking_connection" "db_vpc_connection" {
  network                 = google_compute_network.vpc.self_link
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.db_private_ip.name]
}


resource "google_sql_database_instance" "db" {
  depends_on = [google_service_networking_connection.db_vpc_connection]

  database_version = "POSTGRES_11"

  settings {
    availability_type = "ZONAL"
    tier              = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.self_link
    }
  }
}

Cluster

My GKE cluster is configured to be VPC-native and to use the secondary ranges from the cluster subnet of the VPC. Some of the relevant cluster information:

Master Version: 1.14.8-gke.17
Network: my-vpc
Subnet: cluster
VPC-native: Enabled
Pod address range: 10.1.0.0/16
Service address range: 10.2.0.0/16

The cluster is created using the following Terraform configuration:

resource "google_container_cluster" "primary" {
  location           = var.gcp_region
  min_master_version = data.google_container_engine_versions.latest_patch.latest_master_version
  name               = "my-cluster"
  network            = google_compute_network.vpc.self_link
  subnetwork         = google_compute_subnetwork.cluster.self_link

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  ip_allocation_policy {
    use_ip_aliases                = true
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

Connection Attempts

I've made attempts to connect to the database from many different contexts to try to figure out the problem.

Standalone Instance

I spun up a new Ubuntu compute VM in my VPC and was able to connect to the database using both nping and psql.

From a Container on a Node

By either using kubectl attach on a pod in my cluster or SSH-ing into a node and running my own docker command, I see that none of the packets to the database make it through.

# SSH-ing and running a docker container.
docker run -it ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'

From a Container on a Node with Host Networking

If I repeat the command from above but use the host's network, I can connect to the database.

docker run -it --net host ubuntu /bin/bash -c 'apt update && apt install -y nmap && nping --tcp -p 5432 172.26.0.3'

Suggestions?

Since most questions about connecting to a Cloud SQL instance from GKE via private IP are resolved by configuring the cluster to be VPC-native, I assume my problem lies somewhere in my networking configuration. I would appreciate any suggestions, and I'm happy to provide any additional information. Thanks.

Related Questions

Issue Connecting to Cloud SQL Postgres using Private IP from GKE

Update 2019-12-05

Converting the commands from the related question linked above into Terraform (call this the MVP config), I am able to connect to the Postgres instance using a private IP, so I now believe the issue lies deeper in my configuration. I still haven't determined which exact piece of my infrastructure differs from the MVP config.

My next attempt will probably be to enhance the MVP config to use a separately configured node pool rather than the default node pool to see if that accounts for the behavior I am seeing.
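For reference, the separately managed node pool I plan to bolt onto the MVP config would look roughly like the sketch below. This is only a sketch: the cluster reference (google_container_cluster.primary), the machine type, and the scopes are assumptions carried over from my existing configuration, not something I've verified against the MVP config yet.

# Hypothetical separately managed node pool for the MVP config;
# names, machine type, and scopes are examples.
resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = var.gcp_region
  cluster    = google_container_cluster.primary.name
  node_count = 1

  node_config {
    machine_type = "n1-standard-1"

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
    ]
  }
}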

4 Answers


This is a fully working version for me that:

  • uses a GCP GKE VPC-native cluster
  • works with a private GCP Cloud SQL PostgreSQL instance

resource "google_compute_subnetwork" "gke-subnet" {
  name          = "gke-subnet"
  region        = var.region
  network       = google_compute_network.vpc.name
  ip_cidr_range = "10.10.0.0/16"
}

resource "google_container_cluster" "the_cluster" {
  provider            = google-beta
  name                = "gke"
  project             = var.project_id
  # single-zone cluster
  location            = var.zone
  # we need 1.17.6+ to use NEGs
  # https://cloud.google.com/kubernetes-engine/docs/concepts/ingress
  # min_master_version  = data.google_container_engine_versions.default.latest_master_version
  min_master_version = "1.17.12-gke.2502"

  remove_default_node_pool = true
  initial_node_count       = 1

  # Create a VPC-native GKE cluster instead of route-based cluster
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.gke-subnet.name
  networking_mode = "VPC_NATIVE"

  ip_allocation_policy {
    cluster_ipv4_cidr_block = "/20"
    services_ipv4_cidr_block = "/20"
  }

  master_auth {
    username = var.gke_username
    password = var.gke_password

    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

# Separately Managed Node Pool
resource "google_container_node_pool" "the_cluster_nodes" {
  name       = "node-pool"
  project    = var.project_id
  # single-zone cluster
  location   = var.zone
  cluster    = google_container_cluster.the_cluster.name
  node_count = var.gke_num_nodes

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      # needed for Container Image pulling
      "https://www.googleapis.com/auth/devstorage.read_only"
    ]

    machine_type = "g1-small"

    tags         = [ "${data.google_project.project.name}-gke" ]
    metadata = {
      disable-legacy-endpoints = "true"
    }
  }
}
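Note that the snippet above covers only the subnet, cluster, and node pool; the private Cloud SQL instance itself still relies on a reserved range and a servicenetworking peering on the same VPC. A minimal sketch of those pieces, assuming the same VPC resource as above (the resource and range names here are illustrative and not part of the original answer):

# Illustrative only: Private Services Access pieces the private
# Cloud SQL instance relies on (names are examples).
resource "google_compute_global_address" "private_service_range" {
  name          = "private-service-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.vpc.self_link
}

resource "google_service_networking_connection" "private_vpc_connection" {
  network                 = google_compute_network.vpc.self_link
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_service_range.name]
}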

rantoniuk

There are specific network requirements Cloud SQL instances must adhere to when communicating via a private connection, one of which is that your Cloud SQL and GKE instances must be located in the same region and VPC network. [1]

Regarding "I cannot reach the database from within a container on the node", does this mean you have your database and container located in different networks? If so, you cannot access a Cloud SQL instance on its private IP address from another network using a Cloud VPN tunnel, instance based VPN, or Cloud interconnect.

[1] https://cloud.google.com/sql/docs/mysql/private-ip#network_requirements.
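As a minimal sketch of that requirement, reusing the variable and resource names from the question (an assumption on my part), both the instance and the cluster would be pinned to the same region and VPC network:

# Sketch only: the Cloud SQL instance and the GKE cluster share the
# same region and VPC network (VPC-native settings omitted for brevity).
resource "google_sql_database_instance" "db" {
  region           = var.gcp_region   # same region as the cluster
  database_version = "POSTGRES_11"

  settings {
    tier = "db-f1-micro"

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.self_link   # same VPC
    }
  }
}

resource "google_container_cluster" "primary" {
  name               = "my-cluster"
  location           = var.gcp_region                          # same region as the database
  network            = google_compute_network.vpc.self_link    # same VPC
  subnetwork         = google_compute_subnetwork.cluster.self_link
  initial_node_count = 1
}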

Philippe
  • It is my understanding that Cloud SQL instances are never on your network; they exist on a network managed by Google that you connect to via a peering connection. Either way, I can connect to the database from my nodes, but not from pods on those nodes, so I don't think this is my problem. – Chathan Driehuys Nov 19 '19 at 21:39
  • Can you provide the error message you’re receiving when you connect from the nodes and also have you tried [this](https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine) without terraform? – Milad Tabrizi Nov 22 '19 at 01:14

I have this working with the following Terraform VPC setup. The main difference I can see from the OP's setup is that I am defining a top-level network (see "google_compute_network" "gke-sql-vpc-impl" below) as opposed to the OP's use of 'google_compute_network.vpc.self_link'.

variable "public-subnet-cidr" {
default = "10.1.0.0/24"
}
resource "google_compute_network" "gke-sql-vpc-impl" {
name                    = "${var.network}"
auto_create_subnetworks = "false"
}

resource "google_compute_global_address" "mysql-private-ip-address-impl" {
name          = "mysql-private-ip-address"
purpose       = "VPC_PEERING"
address_type  = "INTERNAL"
prefix_length = 16
network       = "${google_compute_network.gke-sql-vpc-impl.name}"
}

resource "google_service_networking_connection" "private-mysql-vpc-connection-impl" {
network                 = "${google_compute_network.gke-sql-vpc-impl.self_link}"
service                 = "servicenetworking.googleapis.com"
reserved_peering_ranges = ["${google_compute_global_address.mysql-private-ip-address-impl.name}"]
}

resource "google_compute_subnetwork" "public-subnet-impl" {
name          = "${var.network}-public-subnet"
ip_cidr_range = "${var.public-subnet-cidr}"
network       = "${var.network}"
depends_on    = ["google_compute_network.gke-sql-vpc-impl"]
region        = "${var.region}"

secondary_ip_range {
ip_cidr_range = "10.2.0.0/16"
range_name    = "pods"
}

secondary_ip_range {
ip_cidr_range = "10.3.0.0/16"
range_name    = "services"
 }
}

With this VPC I can connect from a pod to the Cloud SQL instance using the private IP reserved above ("mysql-private-ip-address-impl"). I also have firewall rules set up allowing TCP to the Cloud SQL database port, applied to the cluster nodes by tag (a sketch of such a rule follows the cluster config below).

resource "google_container_cluster" "primary" {
name                     = "${var.cluster_name}"
location                 = "${var.zone}"
remove_default_node_pool = false
initial_node_count       = "${var.node_count_simple}"
network            = "${google_compute_network.gke-sql-vpc-impl.name}"
subnetwork         = "${google_compute_subnetwork.public-subnet-impl.name}"  

ip_allocation_policy {
cluster_secondary_range_name  = "pods"
services_secondary_range_name = "services"
}

node_config {
machine_type = "${var.pool_machine_type}"
preemptible  = true
oauth_scopes = [
  "https://www.googleapis.com/auth/compute",
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring"
 ]

 tags = ["default-nodeport-http", "default-nodeport-https", "default-firewall-mysql"]
  }

 master_auth {
 username = ""
 password = ""
 client_certificate_config {
  issue_client_certificate = false
  }
 }
}
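The firewall rule mentioned above isn't shown in the answer itself; a sketch of what such an egress rule could look like follows. The rule name matches the "default-firewall-mysql" node tag used in node_config, and the destination range is only an example, since the actual Private Services Access range is allocated by Google.

# Sketch only: allow node egress on the database port towards the
# Cloud SQL peering range; applies to nodes via the node tag.
resource "google_compute_firewall" "default-firewall-mysql" {
  name      = "default-firewall-mysql"
  network   = "${google_compute_network.gke-sql-vpc-impl.name}"
  direction = "EGRESS"

  allow {
    protocol = "tcp"
    ports    = ["3306"]
  }

  # Example range; use the range actually reserved for the peering.
  destination_ranges = ["10.100.0.0/16"]
  target_tags        = ["default-firewall-mysql"]
}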
Nigel Savage

When it comes to networking, containers work in much the same way as VMs (host and guest). This VirtualBox guide shows the different network types https://www.nakivo.com/blog/virtualbox-network-setting-guide/ and they are very helpful for understanding other networking scenarios, for example containerisation. What you have with GKE is, I believe, an internal network, so you should use iptables on the node. In GCP, for example, this is what is used for NAT gateways: a NAT instance is created that provides internet access for all of the other VMs.

Also, the node should be in the same region as the Cloud SQL instance, because otherwise connecting over the private IP will not work. P.S. If you are thinking of enforcing SSL on your Cloud SQL instance in the future, don't do it unless you are willing to lose private connectivity permanently. I have just raised a ticket with GCP Support, as I consider this a bug.

eset