2

I work in the IT Security function of my company as a team lead. We periodically send phishing emails to all users on the company network as a form of continuous education on how to spot malicious phishing emails. Our company operates in the regulated financial industry and has a diverse user base with various levels of technical ability, from IT to customer service roles. We frequently work with sensitive customer data and personally identifiable information (PII).

My team reports metrics on user performance against these simulated emails. Sometimes end users take multiple ill-advised actions on a single simulated phishing email we send, such as clicking a link or opening an attachment.

My thinking is that since each bad action potentially represents a different attack vector that a threat agent could exploit, each bad action should be counted as a separate failure. After all, clicking a malicious link in a real phishing email can result in compromise just as easily as opening an infected attachment. The fact that a single user can take multiple bad actions on a single, albeit fake, phishing email seems to highlight that such end users are not really conscious of their actions or skeptical enough, which only emphasizes the value of this reporting methodology in my opinion.

Question

To most accurately measure end user behavior and where weaknesses may lie, should multiple bad actions on a single email be counted as one failure, or should each action be counted as a failure on its own?

Anthony
  • 1,736
  • 1
  • 12
  • 22

2 Answers

1

I think focusing on the result of an individual phishing attempt is missing the forest for the trees. As a pentester, when presenting the results of a phishing campaign, I try to frame them for clients in the context of their training processes. Users who take improper action on phishing emails were not properly trained by the organization.

However, with that in mind, as an internal IT team, I think you should focus on the aggregated results of phishing campaigns over time. After clicking the first time, did the user receive additional phishing training? After receiving that training, did the user click a similar phishing email later (with approximately the same detection difficulty)? I think those aggregate metrics are what matter, because ultimately the results should influence a change in training processes within the organization. Furthermore, the policy within the organization should clearly define how tests are conducted and what the repercussions are for repeated failures. (And to go back to your original question, I think repeated failures in this case means failures repeated across multiple campaigns after repeated training, not multiple clicks on a single campaign.)
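To make the campaign-over-campaign view concrete, here is a minimal sketch of that aggregation. The data layout (one record per bad action, with a campaign ID and user name) is hypothetical, not something from your tooling:

```python
# Sketch: identify users who failed repeatedly across campaigns,
# regardless of how many bad actions they took on any single email.
# Record layout (campaign_id, user, action) is illustrative only.
from collections import defaultdict

records = [
    ("2023-Q1", "alice", "clicked_link"),
    ("2023-Q1", "alice", "opened_attachment"),  # same email, same campaign
    ("2023-Q1", "bob", "clicked_link"),
    ("2023-Q2", "alice", "clicked_link"),       # failed again after training
]

# Per-campaign view: a user either failed a campaign or did not.
failed_campaigns = defaultdict(set)
for campaign, user, _action in records:
    failed_campaigns[user].add(campaign)

# "Repeat failures" here means failures across multiple campaigns,
# not multiple clicks within one campaign.
repeat_failures = {u for u, c in failed_campaigns.items() if len(c) > 1}
print(repeat_failures)  # {'alice'}
```

Note that alice's two bad actions in 2023-Q1 collapse into a single campaign failure; only her repeat failure in 2023-Q2 puts her on the follow-up list.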

Hank
  • 11
  • 1
1

Please be very certain about what your goal is. Are you creating measures or metrics?

A measure is raw data that can be used in a future context. If you wanted to understand how users responded to the various data points within the emails, then you would gather data and report as a measure to be used for analysis later.

A metric is a measure with context and meaning. If you want to compare results to expected parameters, if you have defined decisions based on user behaviour, if there are thresholds that you want to watch, then you use the measures and devise metrics from them.

So, the first question to answer is what you want to measure and why. If the multiple data points within the emails are important enough that you want to track them over time, then you could create metrics. Otherwise, you are just gathering data without purpose, which could still be valid if you see a future use for that data.

The important question for any database design (which is basically what you are creating) is: what do I want to query in the data and how?
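One practical consequence of that question: if you store the raw measures (one row per bad action), you can derive either counting scheme later without changing what you collect. A small sketch, with field names that are purely illustrative:

```python
# Sketch: raw per-action measures support both counting schemes.
# Field names ("user", "email_id", "action") are hypothetical.
measures = [
    {"user": "alice", "email_id": 1, "action": "clicked_link"},
    {"user": "alice", "email_id": 1, "action": "opened_attachment"},
    {"user": "bob",   "email_id": 1, "action": "clicked_link"},
]

# Scheme A: every bad action counts as a separate failure.
failures_per_action = len(measures)

# Scheme B: at most one failure per user per email, however many
# bad actions that user took on it.
failures_per_email = len({(m["user"], m["email_id"]) for m in measures})

print(failures_per_action, failures_per_email)  # 3 2
```

Storing the measures and deferring the metric definition means you are not locked into one answer to the original question at collection time.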

As for failures, you need to look at what success looks like. Can you map the individual failures to specific things they were taught? If not, then gathering that data is not useful. If they were taught about those things, what will you do with the data? Is the data actionable?

From a basic data gathering perspective, you also need to be sure that you can determine the effect of a single data point in isolation from the rest. Can you be sure that the combination of elements wasn't the influencing factor and not the individual elements?

You are dealing with human psychology, not a mechanical system, so you need to employ psychological approaches to your question. The better way is not to decompose the emails for data, but to ask the people directly why they think they failed and what they think they might need from you to equip them to respond differently in the future. That data is relevant, takes all factors into consideration, and is immediately actionable.

schroeder
  • 123,438
  • 55
  • 284
  • 319