I'm writing a keystroke dynamics identification program as an assignment, and I'm having trouble with storing the data. The only instruction I was given was "don't keep raw data", since it should be transformed first. I wasn't told what form the data should take to no longer count as raw, or how to transform it. Unfortunately, I couldn't find any articles or publications that would tell me either.
I am recording raw data as follows:
key | Pressed/Released | Time since last event
Z P 64
Z R 96
A P 88
P P 72
A R 9
P R 64
O P 88
O R 81
Then I process it into separate metrics: UpToUp, dwell, flight, interval and latency. Let's look at the UpToUp data (all of the metrics output frequencies in the same format). The timings are calculated as follows:
Z -> A: 169 (88+72+9)
A -> P: 64
P -> O: 169 (88+81)
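For reference, here is a minimal Python sketch of this step. It assumes the events are held as (key, action, delta) tuples in recording order; the name `up_to_up` and the tuple layout are only illustrative, not part of the assignment.

```python
# Each event: (key, "P" for pressed or "R" for released, time since last event)
events = [
    ("Z", "P", 64), ("Z", "R", 96), ("A", "P", 88), ("P", "P", 72),
    ("A", "R", 9), ("P", "R", 64), ("O", "P", 88), ("O", "R", 81),
]

def up_to_up(events):
    """Return (previous_key, key, elapsed) for each pair of consecutive releases."""
    timings = []
    clock = 0            # absolute time, accumulated from the per-event deltas
    last_release = None  # (key, absolute time) of the most recent release
    for key, action, delta in events:
        clock += delta
        if action == "R":
            if last_release is not None:
                prev_key, prev_time = last_release
                timings.append((prev_key, key, clock - prev_time))
            last_release = (key, clock)
    return timings

print(up_to_up(events))
# [('Z', 'A', 169), ('A', 'P', 64), ('P', 'O', 169)]
```

Accumulating an absolute clock first avoids having to sum the intermediate deltas by hand for each pair.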
Then grouped and counted:
[(64,1), (169,2)] or, more generally, [("60-70",1), ("160-170",2)]
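The grouping can be sketched like this, assuming fixed-width bins of 10 time units where each bin includes its lower bound and excludes its upper bound. The name `bin_counts` and the `width` parameter are my own illustrative choices.

```python
from collections import Counter

def bin_counts(timings, width=10):
    """Count how many timings fall into each bin [lo, lo + width)."""
    # Integer division maps each timing to the lower bound of its bin.
    counts = Counter((t // width) * width for t in timings)
    return [(f"{lo}-{lo + width}", n) for lo, n in sorted(counts.items())]

print(bin_counts([169, 64, 169]))
# [('60-70', 1), ('160-170', 2)]
```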
A keystroke recording can be of any length. With a longer recording, we'd get more diverse frequencies that would be normally distributed.
I'm looking for confirmation or a recommendation: can these frequency counts be transformed further, so that they are further removed from the raw data? As I said, I couldn't find any publication that answers this.
In voice biometrics I came across the FFT, but I couldn't find it used in a keystroke context.