
using: varnish-3.0.4

Can anyone suggest potential causes of backend connection failures? This normally happens when the worker-thread count (n_wrk) goes above the default of 100 (though not necessarily every time).

In one of several cases, at a peak of 491 worker threads, Varnish was unable to connect to the backend, even though the backend servers were under no particular load. To narrow the issue down: it is not a problem with the backend servers themselves, as they are healthy and reachable:

backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
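
For reference, this is how I checked backend health from the Varnish side. In 3.0.x the CLI command is debug.health (it became backend.list in 4.x); the curl target below is only a placeholder for the real backend address:

# Probe state of every backend, as Varnish itself sees it
varnishadm debug.health
# prints something like: Backend default is Healthy

# Confirm the backend answers from the Varnish host directly
curl -sv -o /dev/null http://backend.example.com:8080/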

As I understand it, "Backend conn. failures" should not occur given this configuration: 1) thread_pool_max is 1000 × 2 pools, and 2) server load is below 1.

Theoretically it should be able to handle spikes of that size, and I do not see why backend connections would fail here.

[NOTE: due to the usage pattern, objects are cached for 1s to 5s at most.]
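
For context, a minimal VCL 3 sketch of how that cap is enforced (the real VCL carries more logic than this):

# Varnish 3 VCL: never keep an object longer than 5 seconds
sub vcl_fetch {
    if (beresp.ttl > 5s) {
        set beresp.ttl = 5s;
    }
}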

n_wrk = 100: all good.

n_wrk = 491: 8 backend connection failures.
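
To correlate the two numbers I sampled the relevant counters during the spike, roughly like this (counter names as they appear in varnishstat 3.x; in this version -f takes a comma-separated field list):

# Print thread count, queue length and backend failures once per second
while true; do
    varnishstat -1 -f n_wrk,n_wrk_queued,backend_fail
    sleep 1
done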

varnishadm

thread_pool_add_delay       2 [milliseconds]
thread_pool_add_threshold   2 [requests]
thread_pool_fail_delay      200 [milliseconds]
thread_pool_max             1000 [threads]
thread_pool_min             50 [threads]
thread_pool_purge_delay     1000 [milliseconds]
thread_pool_stack           unlimited [bytes]
thread_pool_timeout         120 [seconds]
thread_pool_workspace       65536 [bytes]
thread_pools                2 [pools]
thread_stats_rate           10 [requests]
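
(The list above comes from param.show. Individual parameters can be inspected and changed at runtime without a restart, e.g.:)

# Show one parameter together with its documentation
varnishadm param.show thread_pool_max

# Example only: raise the per-pool thread ceiling on the fly
varnishadm param.set thread_pool_max 2000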

varnishstat

32+03:45:05
Hitrate ratio:        2        2        2
Hitrate avg:     0.9404   0.9404   0.9404


backend_conn           4516262         1.63 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
backend_fail              9562         0.00 Backend conn. failures
backend_reuse         67350518        24.24 Backend conn. reuses
backend_toolate         361647         0.13 Backend conn. was closed
backend_recycle       67715544        24.38 Backend conn. recycles
backend_retry             5133         0.00 Backend conn. retry
n_backend                    5          .   N backends
backend_req           71855086        25.87 Backend requests made
LCK.backend.creat              5         0.00 Created locks
LCK.backend.destroy            0         0.00 Destroyed locks
LCK.backend.locks      149007648        53.64 Lock Operations
LCK.backend.colls              0         0.00 Collisions
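
Each backend_fail increment should leave a FetchError record in the shared memory log; in 3.x those can be isolated with varnishlog's tag matcher, e.g.:

# Backend-side transactions that logged a fetch error
varnishlog -b -m "FetchError:."
# connection failures typically show up as:
#   FetchError   c no backend connection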

1 Answer


Hi Shane, thanks for the response.

I just managed to figure out that the backend communication issue was not due to any configuration failure, but due to a hardware switch between the backend and Varnish.

This was difficult to analyse because the primary switch worked fine, whereas the secondary switch caused the issue during failover.

This makes it clear that backend connection failures without accompanying backend_busy, backend_unhealthy, or queued-request counters are unlikely to be a Varnish-side problem; the network path is the more likely suspect.
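
For anyone hitting a similar pattern: a crude but effective way to catch an intermittent network path is to probe the backend port from the Varnish host and log only the failures. The address below is a placeholder:

# Probe the backend TCP port every second; record failed connects
while true; do
    if ! nc -z -w 2 10.0.0.10 8080; then
        echo "$(date '+%F %T') connect to backend failed"
    fi
    sleep 1
done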

Hope this will be useful for someone in the future.
