Generating an RSA key only requires finding two random primes of the appropriate size, so it can be done fairly quickly. The DH moduli file you generate will contain many moduli, and each one is a safe prime, which takes longer to find (a prime p is a safe prime if (p – 1) / 2 is also prime). Using safe primes as Diffie–Hellman moduli guarantees that the group has a large prime-order subgroup, which protects the key exchange from small-subgroup attacks. In addition, generating a single 4096-bit DH modulus requires finding a 4096-bit safe prime, whereas generating a single 4096-bit RSA modulus requires finding two regular 2048-bit primes, which is much faster. This, combined with the fact that the moduli file contains many moduli, is why generating DH moduli takes so much longer. For both the RSA key and the DH moduli, you want to select 4096 bits.
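To get a feel for how much rarer safe primes are than ordinary primes, you can count both in a small range. This is just an illustrative sketch with a basic sieve (the limit of one million is arbitrary), not how OpenSSH searches for moduli; real candidates are thousands of bits long and are screened with Miller–Rabin instead:

```python
# Count primes vs. safe primes below a small bound to show how much
# rarer safe primes are. Illustrative only; real DH moduli are
# thousands of bits long.
LIMIT = 1_000_000

# Simple Sieve of Eratosthenes.
is_prime = bytearray([1]) * LIMIT
is_prime[0] = is_prime[1] = 0
for i in range(2, int(LIMIT ** 0.5) + 1):
    if is_prime[i]:
        is_prime[i * i :: i] = bytearray(len(is_prime[i * i :: i]))

# p is a safe prime if (p - 1) / 2 is also prime.
primes = sum(is_prime)
safe_primes = sum(
    1 for p in range(2, LIMIT) if is_prime[p] and is_prime[(p - 1) // 2]
)
print(f"primes below {LIMIT:,}: {primes:,}")
print(f"safe primes below {LIMIT:,}: {safe_primes:,}")
```

On top of being rarer, each safe-prime candidate needs two expensive primality checks (one for p and one for (p – 1) / 2), which compounds the slowdown at 4096 bits.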
The -a option determines the number of KDF rounds used to decrypt your passphrase-protected private key on your client. Assuming you wish to password-protect your private key, you want to make this as large as possible without the delay becoming annoying, because the whole point is to slow down a potential brute-force attack against your private key's passphrase. When you are generating DH moduli instead, this flag specifies the number of primality tests each candidate must pass. Anything above around 64 will be plenty; letting it use the default (100) is just fine, and there's no real reason to increase it. The second half of my answer explains why in further detail.
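OpenSSH's modern key format derives the decryption key with a bcrypt-based KDF, but the effect of the rounds parameter is easy to demonstrate with any iterated KDF. Here is a minimal sketch using PBKDF2 from Python's standard library purely as a stand-in (the passphrase, salt, and iteration counts are all arbitrary): every guess an attacker makes has to pay the same per-guess cost you do, so the work scales linearly with the rounds.

```python
import hashlib
import time

# Stand-in for OpenSSH's bcrypt-based KDF: any iterated KDF makes the
# same point. Each passphrase guess must pay the full iteration cost,
# so raising the count slows a brute-force attack linearly.
passphrase = b"correct horse battery staple"
salt = b"0123456789abcdef"  # arbitrary demo salt

for iterations in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>9} iterations: {elapsed:.3f}s per guess")
```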
Remember, a large RSA key makes it more difficult for an attacker to forge authentication, and a large DH modulus makes it more difficult to recover your session key and read (or modify) encrypted traffic. Using a custom modulus instead of an existing, common one protects against Logjam-style precomputation attacks, although not as much as simply choosing a larger modulus (e.g. a custom 1024-bit modulus does not provide as much security against precomputation as a public 2048-bit modulus).
You also asked in your bounty:
I would like to see algorithmic issues e.g. in the prime tests running on a standard PC considered.
Most prime generation uses the Miller–Rabin primality test, which I explain in more detail in another answer. This is a probabilistic test which can determine that a number is composite with 100% certainty, but can only determine that a number is prime with, at worst, 75% certainty.* That is, the test returns either "certainly composite" or "probably prime". A prime number will always return "probably prime", no matter how many times the test is run. A composite number will return "probably prime" at most 1/4 of the time per round. If the test is run multiple times in a row, the chance that a composite number consistently fools the test into thinking it's prime becomes astronomically low. OpenSSH uses 100 rounds by default, so the chance that a generated prime is not actually prime is, at worst, 4^-100.
There are deterministic algorithms, such as the AKS primality test, that can determine with 100% certainty whether or not a number is prime, but they are significantly slower and provide no meaningful increase in security, so probabilistic tests are used instead. The chance of accidentally generating a composite instead of a prime after 100 rounds is, at worst, the chance of guessing a 200-bit key on your very first attempt (because 4^-100 = 2^-200)! Knowing that, you can see why a PC can generate prime numbers with more than reasonable certainty that they are indeed prime.
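To make the "certainly composite" / "probably prime" behaviour concrete, here is a compact, textbook Miller–Rabin sketch. It is not OpenSSH's actual implementation (OpenSSH uses its own big-number code), but the witness loop and the per-round 1/4 error bound are the same idea:

```python
import random

def miller_rabin(n: int, rounds: int = 100) -> str:
    """Return 'certainly composite' or 'probably prime'."""
    if n in (2, 3):
        return "probably prime"
    if n < 2 or n % 2 == 0:
        return "certainly composite"

    # Write n - 1 as 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this round is inconclusive; try another base
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            # 'a' witnesses that n is composite -- with 100% certainty.
            return "certainly composite"
    # A composite survives one round with probability < 1/4, so after
    # all rounds the error is below 4**-rounds.
    return "probably prime"

print(miller_rabin(2**127 - 1))   # Mersenne prime -> "probably prime"
print(miller_rabin(3**5 * 7**4))  # composite -> "certainly composite"
```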
Using -a to adjust the number of primality tests has much less of an impact on the amount of time it takes to generate primes than using -b to adjust the size of the primes. The reason is that -a specifies the maximum number of primality tests to perform before concluding that a candidate is indeed prime. The vast majority of numbers are composite, so the vast majority of candidates are discarded after only a few rounds; there's no reason to keep testing a number that has already been proven composite. It doesn't matter all that much whether you specify 100 or 1000, because only the numbers that actually are prime will go through (and pass) every test. The first time the Miller–Rabin algorithm returns "certainly composite" for a candidate, OpenSSH can stop testing it and move on to the next one.
* The accuracy of this probabilistic test is actually far higher if you're testing random odd numbers chosen from a uniform distribution, which is the case if you are generating primes. It is proven that a random k-bit composite survives a single round of testing with probability less than k^2 · 4^(2-√k) for k ≥ 2. Stronger bounds exist for larger numbers, such as 2^-75 for k = 600!
Another thing to be aware of, which dave_thompson_085 pointed out in a comment, is that modern OpenSSH implementations support better algorithms like Ed25519 for authentication and Curve25519 for key exchange. Curve25519 does not require generating prime moduli at all, and Ed25519 keys can be generated far more quickly than RSA keys because no random primes need to be found. Both of these algorithms are extremely secure and should be preferred when possible.
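As a quick illustration of how cheap Ed25519 key generation is when no prime search is involved, here is a sketch using the third-party pyca/cryptography package (the package choice is mine; in practice you would simply run ssh-keygen -t ed25519):

```python
import time

# Requires the third-party "cryptography" package (pyca/cryptography).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# An Ed25519 key is derived from 32 random bytes with a single
# fixed-base scalar multiplication -- no prime searching at all.
start = time.perf_counter()
keys = [Ed25519PrivateKey.generate() for _ in range(1000)]
elapsed = time.perf_counter() - start
print(f"generated 1,000 Ed25519 keys in {elapsed:.3f}s")

# Sign and verify to show the keys are immediately usable.
message = b"example message"
signature = keys[0].sign(message)
keys[0].public_key().verify(signature, message)  # raises if invalid
print("signature verified")
```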