I'm managing a compute node in an HTC cluster. The node is a dual-Xeon machine with 56 cores / 112 threads, and the typical workload consists of many instances of single-threaded Monte Carlo simulation jobs. Benchmarks show that throughput scales nicely with the number of jobs up to about 56, with some non-linearity because Turbo Boost frequencies are not sustained when many jobs are active. All of this makes sense to me, and I'd say it's the expected behavior.
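For context, the benchmark is essentially this: launch N copies of the simulation at once, wait for them all to finish, and take completed jobs per hour as the throughput. A rough sketch (Python, with `./mc_sim` as a hypothetical stand-in for one single-threaded job):

```python
#!/usr/bin/env python3
# Minimal sketch of the throughput benchmark described above.
# "./mc_sim" is a placeholder for one single-threaded Monte Carlo job.
import subprocess, sys, time

n_jobs = int(sys.argv[1])          # level of parallelism to test, e.g. 28, 56, 112
start = time.time()
procs = [subprocess.Popen(["./mc_sim"]) for _ in range(n_jobs)]
for p in procs:
    p.wait()
elapsed = time.time() - start
# Throughput = completed jobs per hour at this level of parallelism
print(f"{n_jobs} jobs in {elapsed:.0f} s -> {n_jobs / elapsed * 3600:.1f} jobs/hour")
```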
The thing I don't fully understand is why scaling is almost completely lost at higher job counts. From 64 jobs up to 112, throughput remains constant: the benefit of running more jobs in parallel is completely offset by the longer duration of each individual job. I know that scaling is far from linear with hyperthreading, but zero scaling surprised me a bit.
Based on my extremely limited knowledge of how hyperthreading works, my guess is that it might be effective for running two threads of the same process but not for running two separate processes. I'd like some confirmation of this, to definitively rule out the hypothesis of a malfunction and, if warranted, disable hyperthreading.
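One thing I could try before touching the BIOS is to pin one job per physical core so the SMT siblings stay idle, and compare that against 56 unpinned jobs. A rough sketch of that test (assuming Linux with sysfs CPU topology, the `taskset` utility, and the same placeholder `./mc_sim` binary):

```python
#!/usr/bin/env python3
# Sketch: run one job per physical core, pinned so SMT sibling threads stay idle.
# Comparing this run against 56 unpinned jobs should isolate the effect of HT.
import os, subprocess

physical_cores = set()
for cpu in range(os.cpu_count()):
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        # Take the first CPU listed in each sibling group as the "physical" one;
        # the file may look like "0,56" or "0-1" depending on the kernel.
        first = int(f.read().split(",")[0].split("-")[0])
        physical_cores.add(first)

procs = [subprocess.Popen(["taskset", "-c", str(core), "./mc_sim"])
         for core in sorted(physical_cores)]
for p in procs:
    p.wait()
```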