I think I found the reasons why PKCS is used with RSA in SSL!
First of all, it's important to know that textbook RSA encryption is deterministic: under the same public key (N, e), a message m always encrypts to the same ciphertext c, because c = m^e mod N.
Short messages in particular can be recovered easily by an attacker: if m^e is smaller than the modulus N, the modular reduction never takes effect, so c = m^e over the integers and one just needs to compute the e-th root of c. So a safe message has to satisfy both conditions: m < N (m itself must be smaller than the modulus) and m^e > N.
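To make the e-th root problem concrete, here is a minimal sketch. The modulus, the exponent e = 3 and the message are assumptions chosen for illustration (a toy modulus, not a real key); only the public operation is needed, so no private key appears.

```python
# Toy public key (illustrative assumption, far too small for real use).
N = (2**127 - 1) * (2**89 - 1)
e = 3

def integer_root(n: int, k: int) -> int:
    """Largest integer r with r**k <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** k <= n:
        hi *= 2
    # Invariant: lo**k <= n < hi**k
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo

m = int.from_bytes(b"hi", "big")   # a short message, so m**e < N
c = pow(m, e, N)                   # textbook RSA: no padding

# The reduction mod N never kicked in, so c is an exact e-th power.
recovered = integer_root(c, e)
print(recovered.to_bytes(2, "big"))
```

The attacker never touches the private key here; the ciphertext alone is enough because m^e stayed below N.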
So why is the PKCS encoding needed? Just to bring in more randomness before encrypting the pms?
As stated above, textbook RSA is a deterministic encryption without randomization, so the attacker could just encrypt a lot of different candidate pms values with the public key until one of them results in the same ciphertext as the one sent by the victim.
So even if the pms is already a random value, the PKCS encoding adds more randomness (at least 8 random nonzero padding bytes), which makes it harder for an attacker to guess the encrypted value by testing many different candidates.
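The guessing attack on textbook RSA can be sketched as follows. The toy modulus, the 16-bit secret and the helper names are assumptions for illustration; a real pms has far too many possible values for this loop, but the sketch shows why determinism alone is a weakness for low-entropy messages.

```python
# Toy public key (illustrative assumption, far too small for real use).
N = (2**127 - 1) * (2**89 - 1)
e = 65537

def encrypt(m: int) -> int:
    """Textbook RSA: deterministic, no padding."""
    return pow(m, e, N)

def guess_plaintext(ciphertext: int, candidates):
    """Encrypt every candidate and compare with the observed ciphertext."""
    for g in candidates:
        if encrypt(g) == ciphertext:
            return g
    return None

secret = 0xBEEF                    # hypothetical low-entropy message
observed = encrypt(secret)         # what the attacker sees on the wire

found = guess_plaintext(observed, range(2**16))
print(hex(found))
```

With randomized padding, the same secret encrypts to a different ciphertext each time, so comparing against re-encrypted candidates no longer works unless the attacker also guesses the padding bytes.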
Why must the length of the encoded pms equal the length of the RSA modulus in bytes?
The longer the plaintext, the more values one must test for a successful brute-force attack. But as stated above, the message m (read as an integer) must be smaller than the modulus N, while m^e must be bigger than N.
So the PKCS encoding pads the pms to the maximum length (the same byte length as the modulus N). The result is still smaller than N because the encoded block starts with the byte 0x00 (followed by 0x02). And because the block is otherwise as long as possible, it is guaranteed that m^e is bigger than N.
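As a rough sketch of the encoding itself: a PKCS#1 v1.5 encryption block has the layout 0x00 || 0x02 || PS || 0x00 || M, where PS is random nonzero padding of at least 8 bytes. The function name pkcs1_v15_pad and the 2048-bit modulus length are assumptions for illustration; this is not meant to replace a crypto library.

```python
import os

def pkcs1_v15_pad(message: bytes, k: int) -> bytes:
    """Encode message into a k-byte block: 00 02 | PS | 00 | message."""
    if len(message) > k - 11:      # leaves room for at least 8 padding bytes
        raise ValueError("message too long")
    ps_len = k - len(message) - 3
    ps = bytearray()
    while len(ps) < ps_len:
        # Padding bytes must be nonzero so the 0x00 separator is unambiguous.
        ps.extend(b for b in os.urandom(ps_len - len(ps)) if b != 0)
    return b"\x00\x02" + bytes(ps) + b"\x00" + message

k = 256                            # byte length of an assumed 2048-bit modulus
pms = os.urandom(48)               # the premaster secret is 48 bytes
em = pkcs1_v15_pad(pms, k)
m = int.from_bytes(em, "big")

# Leading 0x00 keeps m below any k-byte modulus: m < 2**(8*(k-1)) <= N.
print(len(em), m < 2 ** (8 * (k - 1)))
```

Because the second byte is 0x02, m is at least 2^(8k-15), so even for e = 3 the value m^e is far larger than any k-byte modulus and the e-th root shortcut does not apply.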
Also, the PKCS encoding seems to be the reason why the Bleichenbacher attack was possible. So what advantage can be found in using PKCS?
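The Bleichenbacher attack was possible because the server's error messages revealed whether a decrypted ciphertext had a valid PKCS encoding. By sending many modified ciphertexts, the attackers could use these answers to narrow down the possible values for the encoded pms until only one value was left.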
Deactivating those error messages, so that the server responds the same way whether or not the encoding is valid, closes this leak.
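The leak can be illustrated with a small sketch. The toy key, the fixed padding bytes and the function name oracle are assumptions for illustration; the sketch only shows what a single "valid encoding" answer tells the attacker and how a modified ciphertext is formed, not the full attack.

```python
# Toy key: both factors are Mersenne primes; far too small for real use.
p, q = 2**127 - 1, 2**89 - 1
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
k = (N.bit_length() + 7) // 8      # modulus length in bytes (27 here)
B = 2 ** (8 * (k - 2))

def oracle(c: int) -> bool:
    """Server-side check whose outcome the error message gave away."""
    em = pow(c, d, N).to_bytes(k, "big")
    return em[0] == 0x00 and em[1] == 0x02

# A conforming block (fixed padding bytes here only for reproducibility).
em = b"\x00\x02" + b"\xaa" * 8 + b"\x00" + b"p" * (k - 11)
m = int.from_bytes(em, "big")
c = pow(m, e, N)

# A "valid" answer tells the attacker that 2B <= m < 3B.
print(oracle(c), 2 * B <= m < 3 * B)

# RSA is malleable: c * s^e mod N decrypts to m * s mod N, so each oracle
# answer for a chosen s yields another constraint on the unknown m.
s = 12345
c_modified = (c * pow(s, e, N)) % N
print(pow(c_modified, d, N) == (m * s) % N)
```

If the server gives the same response for valid and invalid encodings, the attacker cannot tell which modified ciphertexts fall into the 2B to 3B range, and the narrowing-down step has nothing to work with.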
PKCS#1 v2.0 addressed the Bleichenbacher attack by introducing a new encoding, OAEP, which is designed to resist it.