Grid Engine / multithreading / multi-core / multi-cpu: How to decide optimum number of threads?

0

I am using a program (*) under Unix/Linux (various flavours) on various servers and clusters. The program supports multithreading, and I can specify how many threads I want via a command-line option.

Generally speaking, how can I determine how many threads I should specify for the multithreading (to get maximum speed)?

Should the number of threads be lower / equal to the number of hardware threads the respective CPU supports? Is there any rule-of-thumb or starting point?

If yes, how can I find out how many hardware threads a CPU supports?

I should also mention that the computers I typically run this on have multiple CPUs, each with several cores. It is unclear to me whether one core equals one thread.

(*) The program I use is bwa, a program for aligning DNA sequences. But my question is general in nature.

user50105

Posted 2013-10-10T07:47:19.057

Answers

0

Well, there are a few parts to this question. In general, a good rule of thumb is to run no more threads than you have logical processors, though that budget covers the whole system, so the right number may depend on what else is running. To find out how many processors you have, you can use cat /proc/cpuinfo. It prints one block of lines per logical processor, so scroll down and look at the last one (I get 8 almost identical blocks on my quad-core system with Hyper-Threading):

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping        : 9
microcode       : 0x16
cpu MHz         : 3401.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 6819.66
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: 

I'll pick out the important lines here. physical id : 0 (this is the first socket. If any block shows a physical id greater than 0, you have multiple sockets; in that case check processor and cpu cores for each physical id.)

processor : 7 (numbering starts at 0 and runs to n-1 across the whole system, so this is the 8th logical processor. To count the logical processors in one socket, count the blocks that share a physical id, or read the siblings line.)

cpu cores : 4 (I have 4 physical cores in this socket. The value is the same in every block for that socket, and since multi-socket systems generally use identical processors, it should be the same for each socket of a dual-socket system.)
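If scrolling through the blocks by hand gets tedious, the same fields can be tallied with a few lines of Python. This is only a minimal sketch: the function name summarize_cpuinfo is made up for illustration, and it assumes the x86-style layout shown above (blank-line-separated blocks containing processor, physical id and cpu cores lines). Other architectures may omit some of these fields, in which case the fallbacks in the code will undercount.

```python
import re

def summarize_cpuinfo(text):
    """Count sockets, physical cores and logical processors in /proc/cpuinfo text.

    Assumes x86-style blocks with 'processor', 'physical id' and 'cpu cores'.
    """
    logical = 0
    cores_per_socket = {}  # physical id -> value of "cpu cores" for that socket
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a per-processor block
        logical += 1
        socket = fields.get("physical id", "0")  # fallback: treat as one socket
        cores_per_socket[socket] = int(fields.get("cpu cores", 1))
    return {
        "sockets": len(cores_per_socket),
        "physical_cores": sum(cores_per_socket.values()),
        "logical_processors": logical,
    }
```

To use it on a real machine, pass in the file contents, for example summarize_cpuinfo(open("/proc/cpuinfo").read()).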

My processor should allow me to run 8 threads simultaneously, assuming one thread per logical processor. That said, depending on run time and other factors, you may be able to get away with more.
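The logical processor count is also available without reading /proc/cpuinfo at all. The sketch below uses Python's standard os module. The suggested_threads helper is a hypothetical heuristic of my own, not an established rule: it simply subtracts the current 1-minute load average from the CPU count, to reflect that the budget is shared with whatever else is running on the machine.

```python
import os

def usable_logical_cpus():
    """Logical CPUs this process may run on, falling back to the system-wide count."""
    try:
        # Linux only: honours affinity masks such as those set with taskset.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1

def suggested_threads(cpus, load_1min):
    """Hypothetical starting point: CPUs not already busy, but at least 1."""
    return max(1, cpus - int(round(load_1min)))

# Example call on a Unix system (os.getloadavg is not available everywhere):
# suggested_threads(usable_logical_cpus(), os.getloadavg()[0])
```

Treat the result as a starting value to tune from, not as a final answer.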

Stack Overflow has quite a few questions on this. Picking two of them: the answers to this question suggest that one thread per logical core is a good idea, though this one suggests you may be able to go higher. So, unfortunately, the answer is to start with one thread per logical processor and tune upward from there. The best value may turn out to be an insanely high number of threads, provided they aren't long-running, memory-hungry threads.
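The "start low and tune it higher" approach can be automated with a small timing harness. The following is a rough sketch rather than a proper benchmark: the helper names are invented for illustration, it measures wall-clock time only, and it assumes the program takes its thread count on the command line. The bwa call in the comment uses bwa's -t option with placeholder file names.

```python
import subprocess
import time

def time_thread_counts(build_command, thread_counts, repeats=1):
    """Return {threads: best wall-clock seconds} for each candidate thread count.

    build_command is a function mapping a thread count to an argv list.
    """
    timings = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # Discard stdout so that writing the output does not dominate the timing.
            subprocess.run(build_command(n), check=True, stdout=subprocess.DEVNULL)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings

def fastest(timings):
    """Thread count with the smallest measured time."""
    return min(timings, key=timings.get)

# Example (ref.fa and reads.fq are placeholder file names):
# timings = time_thread_counts(
#     lambda n: ["bwa", "mem", "-t", str(n), "ref.fa", "reads.fq"],
#     [1, 2, 4, 8, 16],
# )
# print(timings, fastest(timings))
```

Running it on a small but representative input, with thread counts both below and above the logical processor count, should show where the speed-up levels off on a given machine.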

Journeyman Geek

Posted 2013-10-10T07:47:19.057

Reputation: 119 122

Thanks, this provides very good points for reasoning. Just to add: in my case I actually do have (a few hundred) long running, memory-hungry threads. – None – 2013-10-10T09:34:20.103

0

Grid Engine is a specific program that makes your question kind of moot if you're actually using it. Its whole point is to manage resources and jobs across systems so end users don't have to think at that level of detail.

Introduction

The Oracle Grid Engine software is a distributed resource management (DRM) system that enables higher utilization, better workload throughput, and higher end-user productivity from existing compute resources. By transparently selecting the resources that are best suited for each segment of work, the Oracle Grid Engine software is able to distribute the workload efficiently across the resource pool while shielding end users from the inner working of the compute cluster

Ref: Beginners Guide at the Oracle Grid Engine website.

Brian

Posted 2013-10-10T07:47:19.057

Reputation: 8 439

I have to disagree. Grid Engine cannot handle application-internal multithreading. Invoking a program once per thread introduces overhead, so application-level multithreading can be desirable. Grid Engine and application-level multithreading are therefore not contradictory. – None – 2013-10-10T08:14:53.180
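One way the two can work together, sketched under assumptions: when a job is submitted to a parallel environment (for example with qsub -pe followed by an environment name and a slot count, where the available environment names are defined by each site), Grid Engine sets the NSLOTS environment variable to the number of slots granted. A wrapper script can read that value and pass it on as the program's thread count. The function name threads_for_job is hypothetical.

```python
import os

def threads_for_job(environ=os.environ):
    """Thread count for this job: Grid Engine's NSLOTS if set, else the logical CPU count."""
    slots = environ.get("NSLOTS")
    if slots is not None:
        return max(1, int(slots))
    # Not running under Grid Engine (or no parallel environment requested).
    return os.cpu_count() or 1
```

The returned value would then go into the program's thread option, so the job uses the cores the scheduler reserved for it and no more.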