I have the following configuration of an external Google Cloud load balancer:

Diagram of the load balancer

  • GlobalNetworkEndpointGroupToClusterByIp is an Internet NEG of type INTERNET_IP_PORT pointing to the Kubernetes cluster's IP.
  • GlobalNetworkEndpointGroupToManagedS3 is an Internet NEG of type INTERNET_FQDN_PORT pointing to an S3 service managed by Yandex.

For some reason, some backend services fail to work, and when I try to connect to them they respond with an HTML page showing a 502 Server Error:

Error: Server Error

The server encountered a temporary error and could not complete your request.

Please try again in 30 seconds.

The logs of the failed backend services always contain the following error:

jsonPayload: {
  cacheId: "GRU-c0ee45d8"
  @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
  statusDetails: "failed_to_pick_backend"

Requests to the backend services fail in 1 ms (as noted in the logs), so it seems the load balancer doesn't even try to connect to my Kubernetes cluster's IP or the managed S3 and fails instantly.

At the time of posting this question, the S3 and Imgproxy backend services are healthy, but the others are not working:

Uptime status

If I redeploy everything, a different set of services may fail, for example:

  • API and Docs will work, others will fail
  • API, Docs, FPS and Imgproxy will work, S3 will fail
  • S3 will work, others will fail

So it's completely random and I can't understand why it happens. If I'm very lucky, all backend services work after a redeployment. It's also possible that none of them will work.

The Kubernetes cluster works and accepts connections, and the managed S3 works well too. It looks like a bug, but I couldn't find anything about it on Google.

Here's how my Terraform configuration looks:

resource "google_compute_global_network_endpoint_group" "kubernetes-cluster" {
  name                  = "kubernetes-cluster-${var.ENVIRONMENT_NAME}"
  network_endpoint_type = "INTERNET_IP_PORT"

  depends_on = [

resource "google_compute_global_network_endpoint" "kubernetes-cluster" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.kubernetes-cluster.name
  port                          = 80
  ip_address                    = yandex_vpc_address.kubernetes.external_ipv4_address.0.address

resource "google_compute_global_network_endpoint_group" "s3" {
  name                  = "s3-${var.ENVIRONMENT_NAME}"
  network_endpoint_type = "INTERNET_FQDN_PORT"

resource "google_compute_global_network_endpoint" "s3" {
  global_network_endpoint_group = google_compute_global_network_endpoint_group.s3.name
  port                          = 443
  fqdn                          = trimprefix(local.s3.endpoint, "https://")

resource "google_compute_backend_service" "s3" {
  name = "s3-${var.ENVIRONMENT_NAME}"

  backend {
    group = google_compute_global_network_endpoint_group.s3.self_link

  custom_request_headers = [

  cdn_policy {
    cache_key_policy {
      include_host         = true
      include_protocol     = false
      include_query_string = false

  enable_cdn            = true
  load_balancing_scheme = "EXTERNAL"

  log_config {
    enable      = true
    sample_rate = 1.0

  port_name   = "https"
  protocol    = "HTTPS"
  timeout_sec = 60

resource "google_compute_backend_service" "imgproxy" {
  name = "imgproxy-${var.ENVIRONMENT_NAME}"

  backend {
    group = google_compute_global_network_endpoint_group.kubernetes-cluster.self_link

  cdn_policy {
    cache_key_policy {
      include_host         = true
      include_protocol     = false
      include_query_string = false

  enable_cdn            = true
  load_balancing_scheme = "EXTERNAL"

  log_config {
    enable      = true
    sample_rate = 1.0

  port_name   = "http"
  protocol    = "HTTP"
  timeout_sec = 60

resource "google_compute_backend_service" "api" {
  name = "api-${var.ENVIRONMENT_NAME}"

  custom_request_headers = [

  backend {
    group = google_compute_global_network_endpoint_group.kubernetes-cluster.self_link

  load_balancing_scheme = "EXTERNAL"

  log_config {
    enable      = true
    sample_rate = 1.0

  port_name   = "http"
  protocol    = "HTTP"
  timeout_sec = 60

resource "google_compute_backend_service" "front" {
  name = "front-${var.ENVIRONMENT_NAME}"

  backend {
    group = google_compute_global_network_endpoint_group.kubernetes-cluster.self_link

  cdn_policy {
    cache_key_policy {
      include_host         = true
      include_protocol     = false
      include_query_string = true

  enable_cdn            = true
  load_balancing_scheme = "EXTERNAL"

  log_config {
    enable      = true
    sample_rate = 1.0

  port_name   = "http"
  protocol    = "HTTP"
  timeout_sec = 60

resource "google_compute_url_map" "default" {
  name            = "default-${var.ENVIRONMENT_NAME}"
  default_service = google_compute_backend_service.front.self_link

  host_rule {
    hosts = [
    path_matcher = "api"


  host_rule {
    hosts = [
    path_matcher = "s3"

  host_rule {
    hosts = [
    path_matcher = "imgproxy"

  path_matcher {
    default_service = google_compute_backend_service.api.self_link
    name            = "api"

  path_matcher {
    default_service = google_compute_backend_service.s3.self_link
    name            = "s3"

  path_matcher {
    default_service = google_compute_backend_service.imgproxy.self_link
    name            = "imgproxy"

  test {
    host    = local.hosts.docs
    path    = "/"
    service = google_compute_backend_service.front.self_link

  test {
    host    = local.hosts.api
    path    = "/"
    service = google_compute_backend_service.api.self_link

  test {
    host    = local.hosts.fps
    path    = "/"
    service = google_compute_backend_service.api.self_link

  test {
    host    = local.hosts.s3
    path    = "/"
    service = google_compute_backend_service.s3.self_link

  test {
    host    = local.hosts.imgproxy
    path    = "/"
    service = google_compute_backend_service.imgproxy.self_link

# See: https://github.com/hashicorp/terraform-provider-google/issues/5356
resource "random_id" "managed-certificate-name" {
  byte_length = 4
  prefix      = "default-${var.ENVIRONMENT_NAME}-"

  keepers = {
    domains = join(",", values(local.hosts))

resource "google_compute_managed_ssl_certificate" "default" {
  name = random_id.managed-certificate-name.hex

  lifecycle {
    create_before_destroy = true

  managed {
    domains = values(local.hosts)

resource "google_compute_ssl_policy" "default" {
  name    = "default-${var.ENVIRONMENT_NAME}"
  profile = "MODERN"

resource "google_compute_target_https_proxy" "default" {
  name       = "default-${var.ENVIRONMENT_NAME}"
  url_map    = google_compute_url_map.default.self_link
  ssl_policy = google_compute_ssl_policy.default.self_link
  ssl_certificates = [

resource "google_compute_global_forwarding_rule" "default" {
  name                  = "default-${var.ENVIRONMENT_NAME}"
  load_balancing_scheme = "EXTERNAL"
  port_range            = "443-443"
  target                = google_compute_target_https_proxy.default.self_link

UPD. I figured out that recreating the NEGs resolves the issue:

  1. Wait until Terraform finishes the deployment.
  2. Create NEGs with the same configuration via the Google Cloud Platform Console.
  3. Edit the backend services to use the newly created NEGs.
  4. It works!

But it's definitely a hack, and it seems there is no way to automate it with Terraform. I will continue investigating the issue.

Petr Flaks
Glad to hear that your issue has been fixed. I understand that you achieved this by manually creating the NEGs through the GCP Console and then editing the backend services, rather than by using Terraform. The most likely cause of this issue is a race condition. In Terraform, resources are usually defined as a chain in which each resource depends on another. Here, both backend service creation and network endpoint (NE) attachment depend on NEG creation, but not on each other, so the two operations tend to run in parallel. The state of the Internet NEG is read at the moment the backend service is created or updated, so if the NE has not been attached by then, the backend service does not reference it correctly. The NE attachment therefore has to happen before the backend service is created.
So, in Terraform, the backend service has to list the NE attachment in its depends_on meta-argument [1], so that the backend service is created only after the NE attachment.

[1] https://www.terraform.io/docs/language/meta-arguments/depends_on.html
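
As a rough sketch of the suggestion above, assuming the resource names from the question's configuration, the api backend service could declare the dependency as shown here. The block is trimmed to the relevant arguments (custom_request_headers and log_config are left out), and the depends_on entry is the only addition; it has not been verified against the asker's environment.

```hcl
resource "google_compute_backend_service" "api" {
  name = "api-${var.ENVIRONMENT_NAME}"

  backend {
    group = google_compute_global_network_endpoint_group.kubernetes-cluster.self_link
  }

  load_balancing_scheme = "EXTERNAL"
  port_name             = "http"
  protocol              = "HTTP"
  timeout_sec           = 60

  # Assumption: forcing this ordering avoids the race described above.
  # Terraform creates the backend service only after the endpoint
  # has been attached to the NEG.
  depends_on = [
    google_compute_global_network_endpoint.kubernetes-cluster,
  ]
}
```

The other backend services that use the same NEG would presumably need the same depends_on entry, and the s3 backend service would reference google_compute_global_network_endpoint.s3 instead.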

Hope this clarifies your doubt.

Dave M
