
I'm working on an m1.large instance in EC2.

m1.large, 64-bit

vCPU: 2
ECU: 4
Memory: 7.5 GB
Disks: 2 x 420 GB
EBS-Optimized: Yes
Network performance: Moderate

The index files are on an EBS volume with 500 provisioned (promised) IOPS.

I have one index with 3 attributes: id (uint), second id (string), third id (string).

I'm indexing 3 large text fields.

Index file sizes:

spa - 68 MB
spd - 8.8 GB
sph - 567 bytes
spi - 88 MB
spp - 20 GB
sps - 36 MB

My goal is to reach about 0.1 sec per query; however, I'm currently having difficulty reaching it.

Below is the query log. I had to mask the queries, so each letter has been changed to 'x'.

[Mon Aug 12 06:34:17.569 2013] 0.306 sec [ext2/0/ext 33074 (0,40)] [Index_1] [ios=2891
kb=101461.1 ioms=32.8 cpums=306.5] xxx xxxxxxxxx xxxxx
[Mon Aug 12 06:34:43.105 2013] 0.155 sec [ext2/0/ext 55208 (0,40)] [Index_1] [ios=256
kb=10974.0 ioms=42.7 cpums=120.1] xxxxxx xxx
[Mon Aug 12 06:37:43.063 2013] 0.148 sec [ext2/0/ext 122 (0,40)] [Index_1] [ios=257
kb=17985.1 ioms=6.1 cpums=148.9] xxxxxxxxx xxx xxxxxxxxx xxxx xxxxx xxxx xx xxxxx
[Mon Aug 12 07:00:18.735 2013] 0.179 sec [ext2/0/ext 1409 (0,40)] [Index_1] [ios=246
kb=9872.1 ioms=139.3 cpums=48.3] xxxxxxx xxx xxxxxxx
[Mon Aug 12 07:00:40.811 2013] 2.395 sec [ext2/0/ext 54213 (0,40)] [Index_1] [ios=2360
kb=92897.0 ioms=2004.9 cpums=458.9] xxxx xxxx xxxxxx
[Mon Aug 12 07:04:49.447 2013] 0.358 sec [ext2/0/ext 17698 (0,40)] [Index_1] [ios=696
kb=26975.8 ioms=237.0 cpums=140.2] xxxxx xxxxxx xxxx xxxxx
[Mon Aug 12 07:05:29.542 2013] 0.041 sec [ext2/0/ext 0 (0,40)] [Index_1] [ios=8 kb=1606.5
ioms=26.3 cpums=16.8] xxxxxxxx xxxxxxx xxx xxxxxxxx
[Mon Aug 12 07:05:40.208 2013] 0.244 sec [ext2/0/ext 72176 (0,40)] [Index_1] [ios=376
kb=15216.4 ioms=41.1 cpums=214.0] xxxxxxxx xxxxxxxx xxxxxxxx
[Mon Aug 12 07:13:28.726 2013] 10.177 sec [ext2/0/ext 703 (0,40)] [Index_1] [ios=6235
kb=294854.2 ioms=8724.6 cpums=1723.4] x xxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxx, xxxxxxxxx
a xxxxx xxxxxxx xxxxxx, a xxxxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxxx xxxxxxx xx xxxx xxxxx
xxxxxx xxxxxx
[Mon Aug 12 07:14:16.458 2013] 1.522 sec [ext2/0/ext 703 (0,40)] [Index_1] [ios=6235
kb=294854.2 ioms=100.1 cpums=1523.6] a xxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxx, xxxxxxxxx a
xxxxx xxxxxxx xxxxxx, a xxxxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxxx xxxxxxx xx xxxx xxxxx
xxxxxxx xxxxxx
[Mon Aug 12 07:36:47.737 2013] 1.391 sec [ext2/0/ext 727 (0,40)] [Index_1] [ios=5899
kb=269990.2 ioms=161.8 cpums=1320.6] a xxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxx, xxxxxxxxx a
xxxxx xxxxxxx xxxxxx, a xxxxxxx xxxxxxx xxxxxxx xx xxxx xxxxxxx xxxxx xxxxxx xxxxxx
[Mon Aug 12 07:38:12.832 2013] 1.325 sec [ext2/0/ext 140830 (0,40)] [Index_1] [ios=3264
kb=120011.3 ioms=737.1 cpums=652.5] a xxxxx xxxxxxx xxxxxxx xx xxxx

Sphinx conf:

{
source = DB
path = /home/ubuntu/sphinx_drive/sphinxdata/index/IndexMain
docinfo = extern
charset_type = sbcs
stopwords = /home/ubuntu/sphinx_drive/sphinxdata/stopwords
morphology = stem_en
min_word_len = 3
html_strip = 1
}


searchd
{
mysql_version_string = 5.0.37
listen = 0.0.0.0:9999:mysql41
log = /home/ubuntu/sphinx_drive/sphinxdata/log/searchd.log
query_log = /home/ubuntu/sphinx_drive/sphinxdata/log/query.log
read_timeout = 5
max_children = 30
pid_file = /home/ubuntu/sphinx_drive/sphinxdata/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads
binlog_path = /home/ubuntu/sphinx_drive/sphinxdata/data
compat_sphinxql_magics = 0
}

Do you have any suggestions or recommendations for improving the query speed? If you need any other information, please ask and I'll attach it.

Thanks!

YSY

1 Answer


TL;DR

Here's a roundup of my advice (see the headings below for more explanation):

  • Produce stats on your DiskIO/Memory/CPU Usage
  • Try more IOPS: does this have a significant impact on query time?
  • How much Memory is Sphinx currently using?
  • Investigate problem queries (turn on verbose logging)
  • Take advantage of multiple CPU cores on the same computer

Useful Information to gather

Have you checked the performance of your EC2 instance to see where it might be struggling (if at all)? Disk I/O, memory and CPU would be good indicators to check.
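As one way to get the memory side of those stats, here is a minimal sketch that reads /proc/meminfo on the Linux host and reports how much RAM is currently used as page cache. The function names are my own for illustration, and it assumes the standard "Name: value kB" layout of that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <integer> [kB]".
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def page_cache_summary(info):
    """Return (total_mb, free_mb, cached_mb) from a parsed meminfo dict."""
    def to_mb(kb):
        return kb / 1024.0
    return (to_mb(info["MemTotal"]),
            to_mb(info["MemFree"]),
            to_mb(info["Cached"]))


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling page_cache_summary(read_meminfo()) before and after a batch of slow queries gives a rough idea of how much of the index data the OS is managing to keep cached. For disk I/O and CPU, tools such as iostat and top report the equivalent numbers.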

It would be interesting to see whether increasing IOPS significantly improves performance in this case. Have you tried a few different IOPS values to compare?

Memory - I expect you're using far less than 7GB

http://sphinxsearch.com/blog/2011/11/11/sphinx-memory-consumption/

This article calculates memory by excluding the .spd and .spp files. So your memory consumption should be around the 200MB mark.
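To show where the 200MB figure comes from, here is the arithmetic as a small sketch. The sizes are the ones listed in the question; treating every file except .spd and .spp as held in RAM is the simplification described above, not an exact model of what searchd allocates.

```python
# Index file sizes as listed in the question, converted to MB.
SIZES_MB = {
    "spa": 68.0,
    "spd": 8.8 * 1024,
    "sph": 567 / (1024.0 * 1024.0),
    "spi": 88.0,
    "spp": 20.0 * 1024,
    "sps": 36.0,
}

# Files excluded from the memory estimate, per the rule quoted above.
DISK_ONLY = {"spd", "spp"}


def resident_mb(sizes):
    """Sum of the files assumed to be held in RAM."""
    return sum(mb for ext, mb in sizes.items() if ext not in DISK_ONLY)


def disk_only_mb(sizes):
    """Sum of the files assumed to be read from disk at query time."""
    return sum(mb for ext, mb in sizes.items() if ext in DISK_ONLY)


print("resident: %.0f MB" % resident_mb(SIZES_MB))
print("disk only: %.1f GB" % (disk_only_mb(SIZES_MB) / 1024.0))
```

The resident part comes to roughly 192 MB. The remaining .spd and .spp data is close to 29 GB, which is well above the 7.5 GB of RAM on an m1.large, so the OS page cache can only ever hold part of it and uncached queries will go to the EBS volume.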

You may also need to account for rt_mem_limit & mem_limit. Having said that, it seems unlikely that you'll be consuming more than 7GB of Memory.

You can confirm your memory usage with the following command: SHOW INDEX myindex STATUS

Here's a thought: If you don't need that much memory but could do with more CPU, you might be better off using 2x c1.medium ($0.183) instead of 1x m1.large ($0.320)

Track down that query

http://sphinxsearch.com/blog/2011/10/27/sphinx-performance-know-your-queries-time/

query_log_format = sphinxql
query_log = query.log

Then restart the Sphinx daemon and you should get much more useful output.

The idea here is to use this data to look for clues as to what the problem is (one particular query could be causing an issue, and you may want to try to optimise it specifically).
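Even the plain-format log in the question already carries some clues, because each entry reports ios, kb, ioms and cpums. Here is a rough sketch that pulls those counters out and labels each query by whichever cost is larger. The regular expression is written against the entries pasted in the question (including their line wrapping), and the function names are made up for this example.

```python
import re

# One entry of the plain query log as pasted in the question. \s+ between
# the counters tolerates the line wrapping seen in the pasted log.
ENTRY = re.compile(
    r"\]\s+(?P<wall>\d+\.\d+) sec .*?"
    r"\[ios=(?P<ios>\d+)\s+kb=(?P<kb>[\d.]+)\s+"
    r"ioms=(?P<ioms>[\d.]+)\s+cpums=(?P<cpums>[\d.]+)\]"
)


def parse_log(text):
    """Return one dict per query: wall time in seconds, ios, kb, ioms, cpums."""
    return [{k: float(v) for k, v in m.groupdict().items()}
            for m in ENTRY.finditer(text)]


def dominant_cost(entry):
    """Label a query 'io' or 'cpu' by whichever counter is larger."""
    return "io" if entry["ioms"] > entry["cpums"] else "cpu"


# Two entries copied from the question (query text truncated here).
SAMPLE = """\
[Mon Aug 12 07:13:28.726 2013] 10.177 sec [ext2/0/ext 703 (0,40)] [Index_1] [ios=6235
kb=294854.2 ioms=8724.6 cpums=1723.4] x xxxxx
[Mon Aug 12 07:14:16.458 2013] 1.522 sec [ext2/0/ext 703 (0,40)] [Index_1] [ios=6235
kb=294854.2 ioms=100.1 cpums=1523.6] a xxxxx
"""

for entry in parse_log(SAMPLE):
    print(entry["wall"], dominant_cost(entry))
```

These two entries have the same match count, ios and kb, so they look like the same query run twice. The first run is dominated by I/O time and the second by CPU time, which would be consistent with the data being served from cache the second time. That is only a reading of two log lines, so it is worth running the same check over the whole log.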

Multi-threaded search - take advantage of multiple CPU cores

You may want to look into the Sphinx distributed search feature; it can help with some query types. You can configure it to take advantage of both of the CPU cores you have on the m1.large.

http://www.mysqlperformanceblog.com/2013/01/16/sphinx-search-performance-optimization-multi-threaded-search/

Also, you get a bonus: once you configure server for distributed search, you can do indexing in parallel too!

...

Word of caution: while this technique will improve most types of search queries, there are some that aren’t going to benefit greatly from parallel execution.

...

if data nodes return large amounts of data to post-process, aggregator may well become a bottle-neck due to its single-threaded nature

Drew Khoury