How do I decide how to parallelize nested loops on a GPU?

Suppose I have an algorithm that I want to implement on a GPU. The algorithm consists of a main loop whose iterations can all run in parallel. Each iteration of the main loop also contains an inner loop whose iterations can run in parallel. Let's say the main loop has N iterations, each inner loop has M iterations, and my GPU has L cores.
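A common way to express this kind of nested parallel loop on a GPU is to flatten both loops into a single index space, so that each (main, inner) pair becomes one independent work item with index k = i*M + j. The Python sketch below simulates that mapping with a grid-stride assignment across L cores (core t handles k = t, t + L, t + 2L, ...). It assumes every inner iteration is independent of every other one, including those belonging to other main iterations. `flatten_work` and `assign_grid_stride` are hypothetical helper names used only for illustration, not a library API.

```python
def flatten_work(n, m):
    """Enumerate every (main, inner) iteration pair as one flat index k = i*m + j."""
    return [(k // m, k % m) for k in range(n * m)]


def assign_grid_stride(n, m, cores):
    """Hypothetical grid-stride schedule over the flattened iterations.

    Core t handles flat indices t, t + cores, t + 2*cores, ...
    Returns one list of (i, j) pairs per core.
    """
    work = flatten_work(n, m)
    return [work[t::cores] for t in range(cores)]


# Example sizes taken from the question: N = 10, M = 5, L = 20.
schedule = assign_grid_stride(10, 5, 20)
for core, items in enumerate(schedule[:3]):
    print("core", core, "->", items)
```

With N = 10, M = 5 and L = 20, the 50 work items are spread over 20 cores, so no core handles more than ceil(50/20) = 3 of them.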

If N + N*M <= L, I can run everything in parallel. If not, I need to decide what to run sequentially. How should I make this decision? For example, with N = 10, M = 5 and L = 20, when should I choose each of the following options (or some other option)?

  1. Run all main iterations in parallel and all inner loops sequentially.
  2. Run all main iterations sequentially and the iterations of each inner loop in parallel.
  3. Run all main iterations in parallel, run two of the inner loops in parallel, and run the rest sequentially.
  4. Run three of the main iterations in parallel, each with its inner loop in parallel, and run the remaining main iterations and their inner loops sequentially.
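One way to compare these options is a crude step-count model: run `a` main iterations at a time, and within each of them run `b` inner iterations at a time, using at most L cores. This is a hypothetical, idealized sketch. It assumes every inner iteration takes one time unit on one core, that main-loop bookkeeping costs nothing (unlike the N + N*M count above), and it ignores memory traffic, occupancy and launch overhead. Option 3 can be read several ways; the sketch uses one reading (two inner iterations of every main iteration at a time). `rounds` is a made-up helper name.

```python
import math


def rounds(n, m, cores, a, b):
    """Idealized number of time steps when `a` main iterations run concurrently
    and each of them runs `b` inner iterations concurrently.

    Hypothetical model: one inner iteration = one core for one time unit;
    main-loop bookkeeping, memory traffic and launch overhead are ignored.
    """
    if a * b > cores:
        raise ValueError("a * b concurrent iterations exceed the available cores")
    # ceil(n / a) batches of main iterations, each taking ceil(m / b) steps.
    return math.ceil(n / a) * math.ceil(m / b)


N, M, L = 10, 5, 20
options = {
    "1: all main parallel, inner sequential": (N, 1),
    "2: main sequential, inner parallel": (1, M),
    "3: all main parallel, 2 inner iterations each": (N, 2),  # one possible reading
    "4: 3 main parallel, inner parallel": (3, M),
}
for name, (a, b) in options.items():
    print(name, "->", rounds(N, M, L, a, b), "steps")
# No schedule can beat ceil(N*M / L) steps under this model.
print("lower bound:", math.ceil(N * M / L), "steps")
```

Under these assumptions the options take 5, 10, 3 and 4 steps respectively, against a lower bound of 3. Option 2 keeps only 5 of the 20 cores busy, while the reading of option 3 used here fills all 20. Real GPU timings also depend on memory access patterns and occupancy, which this model does not capture.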

Lior

Posted 2016-04-08T14:57:09.203