Frankly I don't expect this to be terribly useful. Be that as it may ....
As a preliminary matter almost everything you say is for TLS up to version 1.2 only. TLS version 1.3, which makes fairly major changes in the protocol, was released last year (after a long delay) and is now in the process of spreading; based on historical experience it is likely that TLS<=1.2 will be pretty much gone in something like 3 years. To be fair, most of the resources you can easily find online, notably including the Ursine Epics at #20803, pre-date 1.3.
In either case, the typical TLS Handshake looks like this:
Not either, what you describe covers only RSA; DH is different. More below.
- ... In addition, a random 28-byte value called ClientHello.random ....
- ... ServerHello message ... along with its own random 28-byte value called ServerHello.random and a digital certificate.
The fields named .random are actually 32 bytes, split into a 4-byte timestamp (which is not random if your computer's clock is even vaguely correct, as it should be) and 28 bytes of actual random data. The value used in key derivation etc. is the 32-byte value.
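To make the layout concrete, here is a minimal sketch of how such a 32-byte value could be built and split. The function names are made up for illustration and are not taken from any TLS library.

```python
import os
import struct
import time


def make_hello_random(now=None):
    """Build a 32-byte Random: 4-byte big-endian timestamp + 28 random bytes."""
    gmt_unix_time = int(time.time() if now is None else now) & 0xFFFFFFFF
    return struct.pack("!I", gmt_unix_time) + os.urandom(28)


def split_hello_random(value):
    """Split a 32-byte Random into (gmt_unix_time, random_bytes)."""
    if len(value) != 32:
        raise ValueError("Random must be exactly 32 bytes")
    (gmt_unix_time,) = struct.unpack("!I", value[:4])
    return gmt_unix_time, value[4:]
```

Whatever the split, it is the full 32 bytes that go into the key derivation.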
Strictly speaking the server certificate is not in the ServerHello message, it is in a separate message. However, both these messages, plus ServerKeyExchange when applicable and ServerHelloDone always, can be part of one record and are usually part of one TCP-level transmission. More substantively, if the server cert requires one or more intermediate or 'chain' cert(s) to be verified, which is almost always the case nowadays, that(those) chain cert(s) should be included as well; there are many Qs on several Stacks about "browsers consider my server connection secure but $other_sw gives $some_error" and this is often due to not correctly configuring a chain cert. (The A often varies depending on the server software involved.)
- The client verifies the server's digital certificate against its trusted CA store. Then the client creates a pre_master_secret, encrypts it with the server's public key extracted from the server's digital certificate, and sends that back to the server. This is known as the ClientKeyExchange
- The server decrypts [premaster] using its private key, and then generates a master secret
The client verifies the server cert, (usually) via its chain, against the client's truststore, AND verifies that the server cert matches the name (or possibly address) of the server the client wants to connect to. (If we want to connect to HonestBank.com, and trying to connect gets us a cert that was issued by a trusted CA to WeAreCrooks.com, we don't want to send our bank info on that connection.)
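As an illustration of a client doing both checks (chain to the truststore AND name match), here is a small sketch using Python's standard ssl module. The helper name make_verifying_context is made up; create_default_context() already enables both checks, and they are set explicitly here only to make them visible.

```python
import ssl


def make_verifying_context():
    """Client-side context that checks the cert chain AND the server name."""
    ctx = ssl.create_default_context()   # loads the system truststore
    ctx.verify_mode = ssl.CERT_REQUIRED  # chain must validate to a trusted CA
    ctx.check_hostname = True            # cert must match the requested name
    return ctx


# Usage (not executed here): the name passed as server_hostname is the one
# the certificate is checked against.
#   tls_sock = make_verifying_context().wrap_socket(sock, server_hostname="HonestBank.com")
```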
Both the server and the client derive master_secret from premaster and the two randoms.
If the server requests client authentication, also called client certificate or 'two-way' or 'mutual' authentication, the client actually sends Certificate before ClientKeyExchange and CertificateVerify after. This is all explained in 5246, but is rarely used.
- Afterward, the client also sends ChangeCipherSpec record (6 bytes) to the server, indicating it wants to use symmetric encryption ....
- From this point onward, all traffic will be communicated over TLS and are encrypted.
After CCS all traffic is encrypted and authenticated; both are important. The methods vary: older ciphersuites use a (pure) cipher to encrypt and a (separate) HMAC to authenticate (HMAC = Hash-based Message Authentication Code); 1.2 also had new (in 2008) authenticated ciphers, officially called AEAD = Authenticated Encryption with Additional Data, which do both encryption and authentication in one combined operation; compare section 6.2.3.3 to the immediately preceding sections.
Question 1: What is "master secret" in the derivation? In other words, what is the actual value?
It's different for every session, and nobody except the two endpoints (client and server) should know it, hence 'secret'. (Although sometimes debugging features let you extract it; there are several Qs on those.) Its value is computed using the formula you posted, from 8.1. In case the 'search' function on your browser is broken and some flaw in your display makes the table of contents invisible, PRF abbreviates Pseudo(R)andom Function and is explained in section 5.
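For concreteness, here is a sketch of that computation as I read sections 5 and 8.1 of 5246, assuming the PRF is built on HMAC-SHA256 (the 1.2 default; some ciphersuites specify a different hash, and earlier protocol versions use a different construction). The function names are illustrative only.

```python
import hashlib
import hmac


def p_hash(secret, seed, length, digest=hashlib.sha256):
    """P_hash data expansion: iterate HMAC until enough output is produced."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, digest).digest()            # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, digest).digest()  # HMAC(secret, A(i) + seed)
    return out[:length]


def prf(secret, label, seed, length):
    """PRF(secret, label, seed) = P_hash(secret, label + seed)."""
    return p_hash(secret, label + seed, length)


def derive_master_secret(pre_master_secret, client_random, server_random):
    """master_secret: first 48 bytes of PRF(premaster, "master secret", randoms)."""
    return prf(pre_master_secret, b"master secret",
               client_random + server_random, 48)
```

Both endpoints run the same computation on the same inputs, which is why they end up with the same 48-byte value without ever sending it.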
Question 2: How does client encrypt its message? What key/secret is used? I am not sure the role of master_secret?
The master_secret is used to derive multiple working keys, or more exactly secrets; see section 6.3. The client uses the 'client_write_key' to encrypt, and the server uses it to decrypt. For ciphersuites that use IVs, which in 1.2 are only some AEAD ones, they also use the client_write_IV. For ciphersuites that use HMAC, which are the non-AEAD ones, the client uses client_write_MAC_key to generate the HMAC, and the server uses it to verify. See "Are session keys just the symmetric keys?" or cross https://crypto.stackexchange.com/questions/1139/what-is-the-purpose-of-four-different-secrets-shared-by-client-and-server-in-ssl .
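A sketch of the key expansion in section 6.3, again assuming the HMAC-SHA256 PRF. The lengths passed in are parameters of the negotiated ciphersuite; the function names and the calling convention here are illustrative, not from any real library.

```python
import hashlib
import hmac


def prf(secret, label, seed, length):
    """TLS 1.2 PRF with SHA-256 (assumed default hash)."""
    seed = label + seed
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def expand_keys(master_secret, client_random, server_random,
                mac_len, key_len, iv_len):
    """Cut the key_block into the six working secrets, in spec order."""
    total = 2 * (mac_len + key_len + iv_len)
    # Note the seed order: server random first, then client random.
    block = prf(master_secret, b"key expansion",
                server_random + client_random, total)
    layout = [
        ("client_write_MAC_key", mac_len),
        ("server_write_MAC_key", mac_len),
        ("client_write_key", key_len),
        ("server_write_key", key_len),
        ("client_write_IV", iv_len),
        ("server_write_IV", iv_len),
    ]
    keys, offset = {}, 0
    for name, size in layout:
        keys[name] = block[offset:offset + size]
        offset += size
    return keys
```

Each direction gets its own key (and MAC key or IV as applicable), so what the client writes with client_write_key the server reads with the same client_write_key, and vice versa for the server_write values.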
Question 3: In 3 (RFC section 7.3), it says the following, what are those "random values" and what are their purpose?
- Generate a master secret from the premaster secret and exchanged random values.
This is exactly the formula you posted in your 3 from section 8.1. The ClientHello.random sent to the server, and ServerHello.random sent to the client, are exchanged random values, and are combined with the (shared) premaster_secret to generate the (also-shared) master secret.
Question 4: I often read the term "session key". What is it? Is it the master_secret?
It can be either the master_secret, or the derived working keys/secrets (plural), or both. In particular, session resumption (aka re-use) in TLS<=1.2 is done by saving the session-id (in ServerHello) and the corresponding security parameters including the master-secret, and then using them on a subsequent or even concurrent connection.
RSA
I know that RSA can be used in step 3 above for integrity control, meaning using public-private key asymmetric encryption so pre_master_secret is not readable in plaintext.
'Plain' RSA keyexchange does use RSA encryption, which is a type of asymmetric encryption aka public-key encryption, so that pre_master_secret is not readable. Although asymmetric or public-key cryptography does use public and private keys, we don't normally say 'public-private key'. I have no clue what you mean by 'integrity control'; RSA encryption does not much resist an adversary manipulating the ciphertext, which allowed an attack by Bleichenbacher that remains an issue. The (only) protection on a plain-RSA handshake is the PRF values in the Finished messages, which function as a kind of MAC (as long as at least one endpoint is honest and correct).
DH
The major weakness of using RSA is that using server private key. An attacker can record all traffic and decrypt traffic if the server's private key is compromised. So to provide forward secrecy 4, DH can be used.
DH works on the principle of discrete algorithm. The mathematical properties allow both sides (the client and the server) to generate its own secret (a, b respectively), and derive to the same shared secret given p, G and g^X mod p (where x is a and b respectively) over the public channel (the world can read them) 5.
That's discrete logarithm. More exactly, Diffie-Hellman ephemeral provides forward secrecy; it is the 'ephemeral' that is critical. 1.2 (and earlier) also defines static (non-ephemeral) DH keyexchanges, but these are practically never used and serve mainly to cause confusion. (They are deleted entirely in 1.3.) There are technically two variants: the original DH-ephemeral using integers, designated DHE in TLS; and the elliptic-curve version, designated ECDHE. Although the same principles apply to both, the actual code (and data) to implement them is quite different.
Question 4: I believe all subsequent traffic will be encrypted using the shared secret, correct
Not directly. The secret generated by [EC]DHE agreement is used as the premaster-secret, in the same fashion as above: first derived to the master secret, then to the working keys/secrets. Compare sections 8.1.1 and 8.1.2, immediately after the excerpt you posted.
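A toy sketch of the integer-DHE case, to show where the agreed value ends up. The tiny group (p=23, g=5) is only for illustration and is hopelessly insecure; real groups use primes of 2048 bits or more. The function names are made up; the leading-zero stripping follows my reading of section 8.1.2.

```python
import secrets


def dh_keypair(p, g, private=None):
    """Return (private X, public Y = g^X mod p); X is random unless given."""
    x = private if private is not None else secrets.randbelow(p - 3) + 2
    return x, pow(g, x, p)


def dh_shared(p, peer_public, private):
    """Agreed value Z = (peer's Y)^X mod p; both sides compute the same Z."""
    return pow(peer_public, private, p)


def premaster_from_shared(z):
    """Encode Z big-endian with leading zero bytes stripped."""
    return z.to_bytes(max(1, (z.bit_length() + 7) // 8), "big")


p, g = 23, 5                             # toy parameters, NOT secure
xs, dh_Ys = dh_keypair(p, g, private=6)  # server ephemeral key; dh_Ys == 8
xc, dh_Yc = dh_keypair(p, g, private=15) # client ephemeral key; dh_Yc == 19
z = dh_shared(p, dh_Yc, xs)              # server side; z == 2
pre_master_secret = premaster_from_shared(z)
```

The resulting pre_master_secret then goes through exactly the same master-secret and key-expansion derivations as in the RSA case; only the way the premaster is obtained differs. Discarding xs and xc after the handshake is what gives the forward secrecy.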
Perfect Forward Secrecy (PFS)
To ensure no one can read prior logged traffic, PFS was introduced. Basically, instead of using a long-lived shared key, client and server generate short-lived session keys which are discarded from memory.
Question 5: What are the short-lived keys? X (a, and b respectively) of client and server's?
Yes. Except that although generic DH is often described in terms of a/A and b/B (canonically, Alice and Bob), the TLS specs use different notation. For integer-DHE in 5246, the public keys for server and client respectively are dh_Ys and dh_Yc; (corrected!) the corresponding private keys presumably are Xs and Xc, but are not shown. For ECDHE in 4492, the server public key is simply named public while the client one is named ecdh_Yc (even though in ECC generally we use X,Y for coordinates of a point, and call the privatekey (integer) d and the publickey (point) Q) and again the private keys are not shown.
PSF and RSA
My understanding is RSA is used for authentication (what 'server' sends is coming from 'server', not MiTM). There's HMAC for integrity checks, which is generated during key exchange.
Question 6: Is that right?
(That's PFS. Or just FS.) I'm not at all sure what you're saying, so to mostly repeat what I said before:
- for RSA keyexchange, the premaster is encrypted by RSA with the server's publickey in its certificate, and the only integrity check on the handshake is Finished, which uses PRF, which is based on but different from HMAC
- for [EC]DHE keyexchange, the keyexchange parameters are signed with the server's privatekey, and the client verifies that signature using the publickey in the server's certificate. That key and thus the signature may be RSA (in either case), or it may be DSA (also called DSS for historical reasons) or ECDSA depending on the keyexchange
- regardless of the keyexchange, depending on the cipher either HMAC or AEAD (but not both) is used to authenticate the data traffic