What are the additional threats of "plain" AES in contrast to DTLS?
Before I explain the additional threats, I will first describe what DTLS does and what it protects you against.
The PSK "handshake" of DTLS consists of exchanging nonces (both sides: client random and server random) and the PSK identity (so that the server knows which client it is talking to and which pre-shared key to use).
The pre-shared key is used to build the so-called premaster secret. Together with the nonces, a unique set of session keys is derived from it for each session.
See RFC 4279 for details.
The handshake also authenticates both participants implicitly: only a peer that knows the PSK can compute valid Finished messages.
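A minimal Python sketch of that derivation, assuming DTLS 1.2 with a PSK cipher suite that uses the SHA-256 based TLS 1.2 PRF (as the AES-CCM suites of RFC 6655 do). The premaster format follows RFC 4279, section 2, and the PRF and its labels follow RFC 5246. The function names are made up for illustration; this is not how mbedTLS structures its code.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 data expansion function from RFC 5246, section 5."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF(secret, label, seed) = P_SHA256(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)


def psk_premaster_secret(psk: bytes) -> bytes:
    """RFC 4279: uint16 N || N zero octets || uint16 N || PSK, with N = len(psk)."""
    n = len(psk).to_bytes(2, "big")
    return n + bytes(len(psk)) + n + psk


def derive_keys(psk: bytes, client_random: bytes, server_random: bytes,
                key_block_len: int = 40) -> tuple[bytes, bytes]:
    """Return (master_secret, key_block) for one session."""
    pms = psk_premaster_secret(psk)
    master = prf(pms, b"master secret", client_random + server_random, 48)
    # Note the reversed order of the randoms in the key expansion step.
    key_block = prf(master, b"key expansion", server_random + client_random, key_block_len)
    return master, key_block
```

For an AES-128-CCM suite the 40-byte key block splits into client_write_key (16 bytes), server_write_key (16), client_write_IV (4) and server_write_IV (4). Because both randoms are fresh for every handshake, the same PSK yields different session keys each time.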
With this session key, every message is encrypted and - as you intend to use CCM - protected from manipulation.
Every message has an explicit authenticated sequence number and a proper nonce for CCM.
This protects your application from message replay.
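On the receiving side, DTLS 1.2 (RFC 6347, section 4.1.2.6) detects replays with a sliding window over recently seen sequence numbers, an idea borrowed from IPsec. A simplified sketch for a single epoch, assuming a 64-entry window; the class and method names are hypothetical:

```python
class ReplayWindow:
    """Sliding anti-replay window for one epoch (simplified sketch)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1  # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => sequence number (highest - i) was seen

    def check(self, seq: int) -> bool:
        """Return True if seq is new and not older than the window."""
        if seq > self.highest:
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # too old to tell, treat as a replay
        return not (self.bitmap >> offset) & 1

    def update(self, seq: int) -> None:
        """Record seq as received; call only after the record authenticated."""
        if seq > self.highest:
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
        elif self.highest - seq < self.size:
            self.bitmap |= 1 << (self.highest - seq)
```

A record is first checked with `check()`, then authenticated, and only then passed to `update()`, so forged packets cannot advance the window. Records older than the window are dropped even if they were never seen, which keeps the state down to one integer plus a bitmap.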
Improper nonce use can lead to disastrous failures, as described on Crypto SE.
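For comparison, a sketch of how a DTLS 1.2 AES-CCM record can get a unique nonce (RFC 6655): a 4-byte implicit salt taken from the key block plus an 8-byte explicit part. This sketch assumes the explicit part is the 16-bit epoch followed by the 48-bit sequence number, as RFC 6655 permits; the additional authenticated data follows RFC 5246, section 6.2.3.3, with DTLS's epoch and sequence number in place of the TLS sequence number. Function names are illustrative.

```python
import struct

DTLS_1_2_VERSION = b"\xfe\xfd"  # {254, 253} on the wire
APPLICATION_DATA = 23           # ContentType.application_data


def epoch_and_seq(epoch: int, seq: int) -> bytes:
    """16-bit epoch || 48-bit record sequence number (8 bytes)."""
    if not (0 <= epoch < 2**16 and 0 <= seq < 2**48):
        raise ValueError("epoch or sequence number out of range")
    return struct.pack(">H", epoch) + seq.to_bytes(6, "big")


def ccm_nonce(write_iv: bytes, epoch: int, seq: int) -> bytes:
    """12-byte CCM nonce: 4-byte salt (write_IV) || 8-byte explicit nonce."""
    if len(write_iv) != 4:
        raise ValueError("write_iv must be 4 bytes")
    return write_iv + epoch_and_seq(epoch, seq)


def additional_data(epoch: int, seq: int, content_type: int, length: int) -> bytes:
    """AAD = epoch||seq || type || version || plaintext length (13 bytes)."""
    return (epoch_and_seq(epoch, seq) + bytes([content_type])
            + DTLS_1_2_VERSION + struct.pack(">H", length))
```

Since the sequence number never repeats within an epoch and the epoch is part of the nonce, no nonce is reused under the same key, and authenticating the sequence number in the AAD is what makes the replay check trustworthy.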
In short, DTLS gives you:
- Unique session keys (but no forward secrecy)
- Authentication
- Confidentiality (= encryption)
- Integrity (= protection against manipulation of messages)
- Replay protection
- Proper use of the block cipher mode of operation.
Whether those properties are relevant to you is ultimately your decision.
In contrast, your alternative proposal of using AES directly may lead to several problems.
First, you do not mention integrity protection of messages at all.
Encryption by itself does not protect against manipulation, even if an attacker does not know the plaintext.
An example of how such manipulation works can be found on Crypto SE.
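A minimal illustration of that malleability for a stream-like mode such as AES-CTR (CBC is malleable in a similar way through the IV and the previous ciphertext block). To stay dependency-free, the AES keystream is replaced by random bytes the attacker never sees; the message layout is a made-up example.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for an AES-CTR keystream under a key the attacker does not know.
keystream = os.urandom(64)


def encrypt(plaintext: bytes) -> bytes:
    return xor(plaintext, keystream)


decrypt = encrypt  # XOR with the same keystream is its own inverse


def tamper(ciphertext: bytes, offset: int, known: bytes, wanted: bytes) -> bytes:
    """Turn `known` plaintext at `offset` into `wanted` without the key."""
    delta = xor(known, wanted)
    patched = bytearray(ciphertext)
    for i, d in enumerate(delta):
        patched[offset + i] ^= d
    return bytes(patched)


ciphertext = encrypt(b"pay 0010 EUR to alice")
forged = tamper(ciphertext, 4, b"0010", b"9910")
print(decrypt(forged))  # b'pay 9910 EUR to alice'
```

Without a MAC or an AEAD mode, the receiver has no way to notice that the amount was changed.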
When using AES-CBC manually, you also need a way to generate an unpredictable initialization vector (IV) for each message.
IV misuse in CBC is not as catastrophic as nonce reuse in some other modes of operation (e.g. CTR or GCM), but it can still lead to security vulnerabilities.
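Why unpredictability matters for CBC, sketched as the chosen-plaintext attack behind BEAST: if the attacker knows the next IV in advance (here it is the last ciphertext block of the previous message, as in TLS 1.0) and can get a chosen block encrypted, they can test guesses for an earlier secret block. The block cipher is replaced by a deterministic keyed stand-in (truncated HMAC-SHA-256), which suffices because the attack only relies on E_K being deterministic; all names are illustrative.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)


def E(block: bytes) -> bytes:
    # STAND-IN for AES_K(block); only determinism matters for this attack.
    return hmac.new(KEY, block, hashlib.sha256).digest()[:16]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt_block(iv: bytes, block: bytes) -> bytes:
    # One-block CBC: C1 = E_K(IV xor P1)
    return E(xor(iv, block))


# The victim earlier sent a secret block; the attacker saw old_iv and c_secret.
secret = b"PIN=1234........"  # exactly 16 bytes
old_iv = os.urandom(16)
c_secret = cbc_encrypt_block(old_iv, secret)
next_iv = c_secret  # predictable: chained from the last ciphertext block


def guess_is_correct(guess: bytes) -> bool:
    # The attacker gets P = next_iv xor old_iv xor guess encrypted under next_iv.
    chosen = xor(xor(next_iv, old_iv), guess)
    # next_iv xor P == old_iv xor guess, so the outputs match iff guess == secret.
    return cbc_encrypt_block(next_iv, chosen) == c_secret
```

With a fresh random IV per message (for example from `os.urandom(16)`), the attacker cannot compute the chosen block before the IV is known, and the test no longer works.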
You also do not have any replay protection in your "plain" use of AES: an attacker may replay valid but old messages, and you will not detect it.
Last but not least, there are other security-relevant mistakes you can make when writing the code for your custom protocol (timing leaks and padding oracles, to name only two), which lead to side-channel attacks.
mbedTLS is a well-established library and is likely written with those threats in mind, or has been audited against them.
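As one example of timing safety: when verifying a MAC by hand, the comparison itself can leak how many leading bytes of a forged tag were correct. A minimal sketch with Python's standard library, assuming HMAC-SHA-256 tags; `verify_tag` is an illustrative name.

```python
import hashlib
import hmac


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # `expected == tag` may stop at the first differing byte, so its run time
    # reveals how much of a forged tag matched. hmac.compare_digest is
    # designed to avoid that content-dependent short-circuiting.
    return hmac.compare_digest(expected, tag)
```

The same care applies to error handling: a receiver that reports bad padding differently from a bad MAC creates exactly the padding oracle mentioned above.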
Returning to your question, compared to DTLS your plain usage of AES is exposed to the following threats:
- Spoofing and manipulation of messages
- Replay attacks (no replay protection)
- Potentially incorrect usage of the block cipher mode (leading to loss of confidentiality)
- Side-channel attacks due to improperly written code
How to mitigate these threats if not using DTLS?
To protect against those attacks, you have to at least:
- Add integrity protection by using a MAC (e.g. HMAC) or AEAD cipher modes (like GCM, EAX or CCM instead of CBC).
- Add replay protection, e.g. by using an authenticated counter or an authenticated timestamp.
- Use a proper nonce or IV for each message (what counts as proper depends on the mode you use).
- Know the risks and typical mistakes of writing cryptographic and security-relevant software (or hire experts to do it).
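A minimal sketch of the first two points as an encrypt-then-MAC framing: the sender prepends a 64-bit message counter and appends an HMAC-SHA-256 tag over counter and ciphertext; the receiver checks the tag in constant time and rejects counters it has already accepted. It assumes the payload was already encrypted (e.g. AES-CBC with a fresh random IV prepended) under a separate key, and it uses a strictly increasing counter, so reordered UDP datagrams are dropped; a sliding window as in DTLS would tolerate reordering. All names are hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Optional

TAG_LEN = 32  # HMAC-SHA-256 output size


def seal(mac_key: bytes, counter: int, ciphertext: bytes) -> bytes:
    """Frame = counter (8 bytes) || ciphertext || HMAC(counter || ciphertext)."""
    header = struct.pack(">Q", counter)
    tag = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    return header + ciphertext + tag


class Receiver:
    def __init__(self, mac_key: bytes) -> None:
        self.mac_key = mac_key
        self.last_counter = -1  # highest counter accepted so far

    def open(self, frame: bytes) -> Optional[bytes]:
        """Return the ciphertext if authentic and fresh, else None."""
        if len(frame) < 8 + TAG_LEN:
            return None
        header, ciphertext, tag = frame[:8], frame[8:-TAG_LEN], frame[-TAG_LEN:]
        expected = hmac.new(self.mac_key, header + ciphertext, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return None  # manipulated or forged: do not even try to decrypt
        (counter,) = struct.unpack(">Q", header)
        if counter <= self.last_counter:
            return None  # replayed (or reordered) frame
        self.last_counter = counter
        return ciphertext  # only now hand it to the decryption step
```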
You should also derive session keys from the pre-shared key, as DTLS does, so that a leaked session key does not endanger the long-term key.
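One way to do this, sketched with HKDF (RFC 5869) instead of the TLS 1.2 PRF that DTLS uses: both sides exchange fresh random nonces, and each session gets its own keys, one per direction and purpose. The `info` label and the key layout are made-up examples.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    if length > 255 * 32:
        raise ValueError("HKDF-SHA-256 output is limited to 255 blocks")
    out, block, counter = b"", b"", 1
    while len(out) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def session_keys(psk: bytes, client_nonce: bytes, server_nonce: bytes) -> dict[str, bytes]:
    # The fresh nonces act as salt, so every session gets different keys,
    # and a leaked session key does not reveal the long-term PSK.
    prk = hkdf_extract(client_nonce + server_nonce, psk)
    okm = hkdf_expand(prk, b"example-protocol v1 session keys", 64)
    return {
        "client_enc": okm[0:16], "client_mac": okm[16:32],
        "server_enc": okm[32:48], "server_mac": okm[48:64],
    }
```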
Recommendation
Try to use DTLS. It is usually considered a bad idea to write (or at least to actually use) your own crypto.
Most, if not all, arguments against writing custom ciphers apply to writing custom cryptographic protocols as well.
The overhead of the DTLS handshake can be greatly reduced by using features like session resumption.
This only requires a very small amount of storage for saving the session state.
There is even a variant for constrained environments, session resumption with session tickets (RFC 5077), which reduces the storage requirements on the server and is implemented by mbedTLS.
There is also RFC 7925 about (D)TLS for IoT devices in general which may be interesting for you to further minimize the resource usage of DTLS.
Another option would be to look for other protocols (and implementations) with properties similar to DTLS, but with less overhead.