You are asking a couple of different questions at once: how ML is implemented by a particular vendor, and how it could be implemented effectively. Let's focus on the latter.
I have designed such algorithms for a major global financial institution, and I can give you the broad strokes.
It's not too good to be true. Any security analyst will look at the logs, get to know how people operate in the company, and "get a sense" of what "normal" looks like. This is called 'baselining'. After some time, you "just know" when something looks fishy and needs investigating, because you know things like:
- The network goes quiet every Friday afternoon (for reasons that are obvious to any human in the office at the time)
- Network bandwidth gets maxed when the World Cup is on (streaming, not DDoS)
- That server always throws that error (a bad config no one knows how to fix)
- The user can never remember their passwords and always gets locked out
- We just turn a blind eye to that exec because everyone is afraid to tell them to behave more securely
- etc.
Every other oddity, whether excessive errors or abnormally low errors (hey, why didn't that server throw that error?), deserves investigation. This used to be infosec's bread and butter, but it now goes under the fancy term 'threat hunting'.
But this baselining is easy to teach a computer. And you are correct: it all depends on the computer's ability to understand what 'normal' is.
It would be a crude approach to take the current state of the network and declare that 'normal': if the network is already compromised, you would be training your algorithm to accept bad behaviour. You still use this as one factor, but you cannot depend on it entirely; you need other ways of looking at the data. There are a couple of ways to add that extra perspective.
There are lots of fancy terms, but when I talk to non-algorithm people I use two: 'weirdness' and 'badness'. Baselining the current state helps determine 'weirdness'. If the network is already hacked and the hacker is floating freely around it, then that is not 'weird' for this network. It's 'bad', but not 'weird'. If we are good at determining 'weirdness', then we can spot a new hacker coming in, or the current hacker changing tactics.
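To make 'weirdness' concrete, here is a minimal sketch of one way to score it: keep a history of each entity's behaviour and flag large deviations in either direction, which also catches the abnormally-quiet cases mentioned above. The features, window, and rough threshold are my illustrative assumptions, not any vendor's method.

```python
# Minimal 'weirdness' sketch: z-score today's behaviour against an
# entity's own history. Features and numbers are illustrative only.
import numpy as np

def weirdness(history: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Absolute z-scores of `current` against the per-feature baseline.

    history: (n_samples, n_features) past observations for one entity
    current: (n_features,) the observation to score
    """
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-9   # guard against zero variance
    return np.abs(current - mu) / sigma  # abs: too quiet is weird too

# Example: 30 days of daily counts for one server
rng = np.random.default_rng(0)
history = rng.poisson(lam=[50, 5, 2], size=(30, 3))  # logins, errors, admin ops
today = np.array([55, 4, 40])                        # admin ops spiked

print(weirdness(history, today))  # third feature stands out; > ~3 is 'weird'
```

Note the limitation described above: if the hacker's activity is already in `history`, it scores as normal. 'Bad', but not 'weird'.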
We can augment the network-wide baseline by baselining subsets of the network to learn what is 'weird' for each user type. Devs act a certain way, execs act a certain way, cleaning staff act a certain way. If a cleaning-staff account routinely behaves like an executive account, we have something to investigate, even though that behaviour is 'normal' in the network's current state. So, this recursive baselining is one way to augment the ML perspective.
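Here is a minimal sketch of that cohort idea, assuming each account is tagged with a role and reduced to a small behaviour vector; the roles, features, and numbers are made up for illustration.

```python
# Cohort baselining sketch: score an account against its role's peers,
# not just against its own history. Roles and features are illustrative.
import numpy as np

def cohort_weirdness(cohorts: dict, role: str, observation: np.ndarray) -> float:
    """Max absolute z-score of one account's behaviour vs. its role's baseline."""
    peers = cohorts[role]                  # (n_accounts, n_features)
    mu = peers.mean(axis=0)
    sigma = peers.std(axis=0) + 1e-9
    return float(np.max(np.abs(observation - mu) / sigma))

rng = np.random.default_rng(1)
cohorts = {
    # daily badge swipes, emails sent, finance-db queries per account
    "cleaning_staff": rng.poisson([2, 1, 0.05], size=(40, 3)),
    "exec":           rng.poisson([5, 60, 10], size=(15, 3)),
}

# A cleaning-staff account behaving like an exec: unremarkable against the
# network-wide baseline, but very weird for its own cohort.
suspicious = np.array([4, 50, 8])
print(cohort_weirdness(cohorts, "cleaning_staff", suspicious))  # large
print(cohort_weirdness(cohorts, "exec", suspicious))            # small
```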
Another way to add perspective is to define 'badness'. An activity might be 'normal' for this network, yet still categorically 'bad'. It might be 'normal' for this printer to be trying to log into every server in the DC, but we know that is categorically 'bad'. So, if we can feed these parameters into the algorithm (a lookup into a signatures table), we can expose the badness in our otherwise normal network activity.
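As a sketch of how such a lookup could sit alongside the weirdness score (the signature table and event fields below are invented for illustration):

```python
# 'Badness' sketch: a lookup of categorically-bad behaviours that fires
# regardless of how 'normal' the activity is for this network.
from typing import Optional

# Hypothetical signature table keyed on (source type, action, destination type)
BAD_SIGNATURES = {
    ("printer", "auth_attempt", "server"): "printers never authenticate to servers",
    ("workstation", "port_scan", "subnet"): "internal port sweeps are always suspect",
}

def badness(event: dict) -> Optional[str]:
    """Return the matching signature's description, or None if no rule fires."""
    key = (event["src_type"], event["action"], event["dst_type"])
    return BAD_SIGNATURES.get(key)

event = {"src_type": "printer", "action": "auth_attempt", "dst_type": "server"}
hit = badness(event)
if hit:
    # Fires even if this printer has 'always' done this (weirdness ~ 0)
    print(f"ALERT: {hit}")
```

One natural design is to feed both signals forward: high weirdness prompts an investigation, while a badness hit is an alert on its own.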
I first got into UBA after running a demo of a new UBA product about 3 years ago. I knew my network backwards and forwards, and instinctively knew what was and wasn't normal. I looked into the UBA product as a backup for me and my small team, to cover that skillset when people were on holiday or left the team. I wasn't expecting much. But the product included this 'badness' perspective, which instantly (within 2 hours) exposed badness in my network that I never knew was there, even though I had baselined it and was threat hunting daily.
ML is not too good to be true. It's not perfect, but it is better than humans at breadth, speed, and consistency.