
I have a not-so-small cluster hosted on Google Kubernetes Engine (2 e2-standard nodes) with a couple of web services talking to each other. The architecture is composed of:

  • 2 Cloud SQL instances, hosted on GCP
  • 1 deployment and 1 statefulset, 1 replica each
  • an Nginx Ingress Controller installation that exposes both apps (the deploy and the sts) over HTTPS
  • a few internal tools such as CertManager and others

The two apps talk to each other over the internal Kubernetes DNS system, connect to their respective Cloud SQL instances over the external IP, and send emails and push notifications through APNS and Firebase. They expose RESTful APIs for my mobile apps to consume.

In the billing breakdown I can see 1.5+ TB/month of egress traffic generated by the Nginx ingress controller, which I cannot explain. I expected the egress traffic to consist only of SQL connections and the email messages and push notifications sent by my pods, since this is the only traffic actually generated in the cluster. I can also see the egress traffic consumed by the 2 apps, but the sum of those two values is much smaller than the one attributed to Nginx.

Now I wonder if Google is also charging me for the HTTP responses my pods generate towards the mobile apps consuming the APIs, because I cannot see any other reason why my cluster is so expensive.

afe
    Have you tried turning up the logging verbosity of your nginx to ensure someone isn't using it as a proxy server? 1.5TB is a monster amount of traffic – mdaniel Sep 16 '20 at 05:36
    Hello. You could use this tool: [VPC Flow logs](https://cloud.google.com/vpc/docs/using-flow-logs) to analyze your traffic and determine the reason for such high egress traffic. – Dawid Kruk Sep 16 '20 at 10:34
  • @mdaniel thanks for the tip, I will try! – afe Sep 16 '20 at 10:37
  • Hi @DawidKruk, I already stumbled upon VPC Flow Logs. I enabled everything, but in the Log Viewer I cannot choose "compute.googleapis.com/vpc_flows" as the Log name, since only Cloud Audit logs are available to me. Even if I write the query manually, nothing shows up. – afe Sep 16 '20 at 11:25
    The fact that nothing shows could be related to: https://cloud.google.com/vpc/docs/using-flow-logs#no_vpc_flows_appear_in_under_the_gce_subnetwork_resource . Also please take a look here: https://cloud.google.com/logging/docs/access-control#overview – Dawid Kruk Sep 16 '20 at 13:46
  • Ok, I was missing some permissions even though I was already a Project Admin... I hate these permission-based quirks. Thanks @DawidKruk for pointing me in the right direction; I can investigate deeper now. – afe Sep 16 '20 at 15:24

1 Answer


Posting this answer as a community wiki for better visibility, as well as to add some context to the comments made under the question.

One of the possible solutions was posted by user @mdaniel:

Have you tried turning up the logging verbosity of your nginx to ensure someone isn't using it as a proxy server? 1.5TB is a monster amount of traffic


Other solutions could use native GCP tools to analyze the flow of the traffic, like VPC Flow Logs (https://cloud.google.com/vpc/docs/using-flow-logs).

Assuming that:

  • there is a cluster within a VPC that has Flow Logs enabled
  • there is a pod that is generating traffic

Go to:

  • GCP UI -> VPC Network -> Subnet that GKE cluster resides in -> View flow logs

The filter for these logs should be the following:

logName:("projects/PROJECT_NAME/logs/compute.googleapis.com%2Fvpc_flows") AND resource.labels.subnetwork_id:(SUBNETWORK_ID)

PROJECT_NAME and SUBNETWORK_ID are placeholders; replace them with your own values.
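If you prefer the command line over the Log Viewer, a minimal sketch of fetching the same entries with the gcloud CLI (the project name and subnetwork ID below are hypothetical values, not real ones):

```python
import shutil
import subprocess

# Hypothetical values -- replace with your own project name and subnetwork ID.
PROJECT_NAME = "my-project"
SUBNETWORK_ID = "1234567890"

# The same filter as above, built as a single string.
flt = (
    f'logName:("projects/{PROJECT_NAME}/logs/compute.googleapis.com%2Fvpc_flows")'
    f" AND resource.labels.subnetwork_id:({SUBNETWORK_ID})"
)

# If the gcloud CLI is installed (and authenticated), fetch the latest entries;
# otherwise just print the filter, ready to paste into the Log Viewer.
if shutil.which("gcloud"):
    subprocess.run(["gcloud", "logging", "read", flt, "--limit=10", "--format=json"])
else:
    print(flt)
```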

You can use the parameters below to narrow down the traffic:

  • jsonPayload.bytes_sent (<,>) VALUE
  • jsonPayload.connection.dest_ip="IP_ADDRESS"
  • jsonPayload.connection.src_ip="IP_ADDRESS"
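For example, the parameters above can be combined into one filter that isolates large flows leaving the ingress pod. A small sketch (the pod IP and the 100 MB threshold are assumed values for illustration):

```python
# Assumed values for illustration: the ingress pod's IP and a 100 MB threshold.
NGINX_POD_IP = "10.8.0.15"
MIN_BYTES = 100_000_000

# Combine the narrowing parameters into a single Cloud Logging filter.
flt = " AND ".join([
    'logName:("projects/PROJECT_NAME/logs/compute.googleapis.com%2Fvpc_flows")',
    f'jsonPayload.connection.src_ip="{NGINX_POD_IP}"',
    f"jsonPayload.bytes_sent > {MIN_BYTES}",
])
print(flt)
```

Any flow matched by such a filter is a candidate explanation for the unexplained egress.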

You can also create a sink and export these logs for further inspection with:

  • BigQuery
  • Cloud Storage (bucket) - download it and use scripts to extract the data
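As a sketch of the "download it and use scripts" route: once the entries are exported as newline-delimited JSON, a short script can total bytes_sent per destination IP. The sample entries below are made-up stand-ins for real exported records:

```python
import json
from collections import Counter

# Made-up sample entries standing in for flow logs exported from the sink
# (in practice you would read the downloaded file line by line instead).
sample = """\
{"jsonPayload": {"bytes_sent": "1000", "connection": {"dest_ip": "203.0.113.7"}}}
{"jsonPayload": {"bytes_sent": "2500", "connection": {"dest_ip": "203.0.113.7"}}}
{"jsonPayload": {"bytes_sent": "400", "connection": {"dest_ip": "198.51.100.9"}}}
"""

# Sum bytes_sent per destination IP.
totals = Counter()
for line in sample.splitlines():
    entry = json.loads(line)
    dest_ip = entry["jsonPayload"]["connection"]["dest_ip"]
    totals[dest_ip] += int(entry["jsonPayload"]["bytes_sent"])

# Largest consumers first -- the top entry is the egress suspect.
for ip, sent in totals.most_common():
    print(f"{ip}\t{sent} bytes")
```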
Dawid Kruk