
I recently read the paper *The Security of Machine Learning*. Are there any instances of attacks against machine learning systems besides spam filters?

schroeder
  • Again, the paper offers examples of the threats. Just combing through the references at the end turns up a number of citations from sources that explain the threats. Attacks against spam filtering algorithms have been going on since the filters were first introduced, and those attacks are commonplace even today. – schroeder Oct 22 '17 at 12:26
  • *"threats to machine learning __algorithms__?...attacks against machine learning __systems__?"* - your title and body don't match. There is a huge difference between an ML algorithm (e.g. SVM, NN, DCT...) and an ML system. The system ultimately uses algorithms, but the weakness is usually not in the algorithm itself but in the kind of extracted features, the amount of data to learn from, the choice of the wrong algorithm for the problem, or bad tuning (e.g. overfitting, underfitting). So what exactly are you asking about? – Steffen Ullrich Oct 22 '17 at 12:32

2 Answers


Yes, there are recent examples of attackers targeting ML systems in order to later evade detection. The study of this effect is known as "Adversarial Machine Learning", and a number of papers cover it.

One of the techniques discussed is the "boiling frog" technique, named after the claim that a live frog will not jump away to safety if the heat is turned up slowly enough. Attacks using this method mistrain the ML system by introducing anomalous stimuli at a very low anomaly factor but with high frequency. Over time, the anomaly factor increases until the ML system learns to ignore truly anomalous stimuli, at which point the actual attack on the target system begins. Because the ML system has been trained to ignore it, the attack goes undetected. (The same technique has been used for years against people and security guards.)
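The drift can be illustrated with a toy online detector. Everything here is invented for illustration (the 3-sigma rule, the learning rate, the traffic levels); real systems use far richer models, but the failure mode is the same:

```python
class OnlineDetector:
    """Toy detector: flags a value beyond mean + 3*std, then folds
    every observation back into its running statistics."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha   # learning rate for the running stats
        self.mean = 0.0
        self.var = 1.0

    def observe(self, x):
        anomalous = abs(x - self.mean) > 3 * self.var ** 0.5
        # The detector (mis)learns from everything it sees.
        self.mean += self.alpha * (x - self.mean)
        self.var += self.alpha * ((x - self.mean) ** 2 - self.var)
        return anomalous

# Trained on normal traffic only, a sudden spike is detected...
victim = OnlineDetector()
for _ in range(200):
    victim.observe(10.0)       # normal activity level
print(victim.observe(50.0))    # True: the spike stands out

# ...but after a slow ramp-up the same spike goes unnoticed.
victim2 = OnlineDetector()
for _ in range(200):
    victim2.observe(10.0)
level = 10.0
while level < 50.0:            # "turn up the heat" gradually
    level += 0.2
    victim2.observe(level)
print(victim2.observe(50.0))   # False: the baseline has drifted
```

A real attacker would pace each step to stay below the detector's current alert threshold; the point is only that an adaptive baseline can be walked to wherever the attacker needs it.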

This can be done against spam filters, intrusion detection systems (IDS), user behavior analytics (UBA) systems, and NetFlow analyzers.

For instance: if you have obtained the credentials of a valuable user and want to log in as that user during a time when that user does not normally log in, you can start logging in with those credentials at times only slightly different from the user's normal pattern while not performing any unusual activity. Once you have seeded the logs with enough logins around the time when you want to perform your malicious activity, you can make your attempt knowing that you will not trigger alarms for an anomalous login time. Because the logs have been seeded with non-malicious activity, secondary and tertiary analysis of the activity will support the conclusion that it is not anomalous.
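As a sketch of the login-time seeding (the per-user model and the z-threshold are invented here; real UBA products use far richer features than just the hour of day):

```python
import statistics

# Invented toy model: a login hour is anomalous if it lies more than
# z standard deviations from the user's historical mean login hour.
def is_anomalous(history, hour, z=2.0):
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history) or 0.5   # floor for constant history
    return abs(hour - mu) > z * sigma

history = [9, 10, 9, 11, 10, 9, 10]   # user normally logs in mid-morning
print(is_anomalous(history, 3))       # True: a 3 a.m. login stands out

# The attacker logs in a little earlier each day, doing nothing malicious,
# until the baseline covers the hour they actually need.
for seeded_hour in [8, 7, 6, 5, 4, 4, 3, 3]:
    history.append(seeded_hour)

print(is_anomalous(history, 3))       # False: 3 a.m. is now "normal"
```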

schroeder
  • There has been research into anti-ML ML systems. Like 'firewalker' tools that try to map out how firewall rules are configured, these systems attempt to determine how the detection mechanisms work and design their own systems to evade them. ML studying ML systems. I'm trying to locate this research. – schroeder Oct 22 '17 at 15:35

Are there any real instances of attacks against machine learning systems?

Probably the best known ML-based systems in information security are spam filters. Since sending spam is a lucrative business, these filters get attacked a lot by making spam look more like non-spam to a machine. Many of the attacks target the feature extraction, for example by including invisible non-spam text in the HTML so that the spam filter thinks the majority of the text is non-spam and classifies the message as such. Other ways to evade an ML-based filter are to use alternative spellings for phrases the filter has learned as spammy, e.g. misspellings, homoglyph characters, special Unicode characters like zero-width spaces, etc.
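A minimal sketch of why homoglyphs and zero-width characters defeat token-based features (the word list and scoring are invented; real filters learn weights from data):

```python
# Invented token list and scoring; real filters learn weights from data.
SPAMMY = {"viagra", "lottery", "winner"}

def spam_score(text):
    return sum(1 for token in text.lower().split() if token in SPAMMY)

plain = "viagra lottery winner"
# dotless i (U+0131), zero-width space (U+200B), Cyrillic i (U+0456)
evasive = "v\u0131agra lotter\u200by w\u0456nner"

print(spam_score(plain))     # 3: every token matches a learned feature
print(spam_score(evasive))   # 0: visually identical words match nothing
```

A human reader sees the same words either way, but the extracted features never match what the filter learned, so the learned weights simply don't apply.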

There are other uses of machine learning and heuristics in information security. They are not as well known as spam filters, but one can find at least papers that deal with detecting drive-by-download attacks based on URL patterns, deriving the reputation of hosts from DNS history and whois records, detecting malware C&C communications based on the target URL, etc. Attackers bypass such methods by making their URLs look more like typical innocent URLs, compromising hosts with a high reputation to spread their malware, or using innocent services like Twitter or Blogspot for C&C communication.
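A toy illustration of why attackers prefer reputable hosts. The feature set is loosely inspired by features cited in the drive-by-download literature, but the weights, thresholds, and example URLs are all invented:

```python
from urllib.parse import urlparse

# Invented weights over features often cited in the drive-by-download
# literature: URL length, digit-heavy hostnames, cheap abused TLDs.
def url_risk(url):
    host = urlparse(url).hostname or ""
    digits = sum(c.isdigit() for c in host)
    score = 0
    if len(url) > 60:
        score += 1   # unusually long URL
    if digits / max(len(host), 1) > 0.3:
        score += 1   # digit-heavy hostname
    if host.endswith((".top", ".xyz")):
        score += 1   # frequently abused TLD
    return score

print(url_risk("http://x93k2.17x8svc.xyz/gate/load.php?id=9f3a8c71b2d4e6f5a4b3c2d1e0ff"))  # 3
print(url_risk("https://docs.google.com/document/d/abc"))  # 0
```

Hosting the same payload on a compromised high-reputation domain drives every one of these features, and hence the score, to zero.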

Steffen Ullrich
  • what's C+C communication? Pardon my ignorance ! – Towfik Alrazihi Oct 22 '17 at 14:16
  • @TowfikAlrazihi: command and control communication, i.e. how malware phones home and gets its instructions. – Steffen Ullrich Oct 22 '17 at 14:17
  • I disagree with your premise. First, the paper itself uses spam filters and attempts to mistrain filters as the primary examples. Second, evasion is very different from attacking the learning system. Your examples in your second paragraph have nothing to do with attacking the learning system, only with evading detection for a single instance. As such, this does not answer the question in the context of the paper. – schroeder Oct 22 '17 at 14:43
  • Examples of *actual* attacks on the new ML systems would be interesting. – schroeder Oct 22 '17 at 14:43
  • @schroeder: This would definitely be interesting. But apart from spam filters, not much is publicly known about where and especially how ML gets used in information security. Several companies claim to use ML but provide no real details (i.e. ML mostly as a buzzword). One needs to rely mostly on academic papers for details of how these systems might work internally and how they can be attacked. But the examples I've given are real-world examples of how attackers adapt to current heuristics, which might be ML-based or not (unknown). – Steffen Ullrich Oct 22 '17 at 15:29
  • I have been heavy in this space for a while. There are many companies using true ML in security applications who will talk about their techniques. It's not a buzzword. There is also a lot of snake oil, but some are really trying to do it right. – schroeder Oct 22 '17 at 15:33
  • @schroeder: I did not mean ML being __only__ a buzzword. What I meant was that many companies offer few details on how their systems actually work but like to use ML as a buzzword to advertise their systems as state of the art or better. If you have links to information containing really useful details, I'm interested. – Steffen Ullrich Oct 22 '17 at 15:39