We have a heavily loaded Sphinx instance. The index is realtime, but we bulk insert data only about once a week.

It runs on a dedicated 12-core / 24-thread server.
Sphinx is the only thing installed on the server.

Here is a snippet of the conf file:

index data_all
{
        type                    = distributed

        local                   = data_0
        local                   = data_1
        local                   = data_2
        local                   = data_3
}

searchd
{
        listen                  = 9305:mysql41
        listen                  = 9405
        log                     = /usr/local/sphinx/var/log/searchd.log
        query_log               = /usr/local/sphinx/var/log/query.log
        read_timeout            = 5
        max_children            = 2000
        pid_file                = /usr/local/sphinx/var/log/searchd.pid
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads
        dist_threads            = 4
        binlog_path             =
}

Each local index is about 17 GB.

Most of the time the server's load average is below 2-3, but occasionally it spikes to around 50.

Currently we have very good response times, even during those spikes.

I am wondering about dist_threads. Should I keep it at 4 (the number of local indexes), or raise it to 24 (the number of CPU threads)? Or should I set it to 1, because we have lots of queries running in parallel anyway?

Nick

1 Answer

Short answer: set dist_threads equal to the number of local indexes.

Long answer: it depends on the workload.

For a CPU-bound workload, setting dist_threads to 1x the number of cores is advised (creating more threads than cores will not improve query time). For a mixed CPU/disk-bound workload, it can sometimes make sense to use more, so that all cores can be utilized even while some threads are waiting for I/O to complete.
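
As a rough sketch, the two options would look like this in the searchd section from the question. Only the two relevant directives are shown, and the value 12 in the commented-out line is just an illustrative starting point taken from the core count, not a measured recommendation. Benchmark under your own query mix before changing what already works.

searchd
{
        # same worker mode as in the question's config
        workers                 = threads

        # short answer: one thread per local index of data_all (data_0 .. data_3)
        dist_threads            = 4

        # long answer, CPU-bound case: 1x the number of physical cores (assumed 12 here)
        # dist_threads          = 12
}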

Nick