The domain problems in cybersecurity are too narrow to apply AI/ML or DL to willy-nilly.
Not saying to throw data science out the window. Saying you need far more domain expertise to make it work.
One excellent application of data science to cybersecurity is understanding indicators of compromise (IOCs) and how they are sighted (last seen, etc.) in memory, on disk, and in network traffic, including where in the traffic they sit relative to the sources, destinations, and passthroughs, just like any dataflow. Rather than leveraging ML or DL, I would suggest focusing first on graph algorithms: understand the relationships among these indicators and how to interpret them as time series data. Some common indicator types:
- A file hash (e.g., a SHA256 checksum of a file, a section of memory, or a process's network traffic). This is a good example of the domain problems in cybersecurity: the way our signatures (e.g., Yara rules) work on disk vs. in memory vs. in network traffic obviously involves different code and data, and different code and data paths. Parameters or arguments to processes also matter, especially for script code.
- An IPv4 or IPv6 address or path, and its associated network attributes, such as the BGP-4 ASN if registered with a Regional Internet Registry (RIR). These objects often have a history, and digging into them may require understanding complex RWhois and SWIP registration processes.
- An FQDN, or Fully-Qualified Domain Name, sometimes just a hostname, possibly with Windows Server Forest/Domain components or older mainstays such as NetBIOS or MS-RPC Named Pipes. Cloud stacks such as Azure AD are changing this nomenclature as well, moving to tenants, subscriptions, resources, etc. The Whois records identifying each unique Internet domain name come with their own set of relationships, including a rich history of timestamps, owners, name servers, and email addresses.
- A credential, often an email address, e.g., bertrand.russell@math.onmicrosoft.com, but also a user/password pair, e.g., bertrand:MathIsK00lB00ksRul3, if known (often because it was compromised).
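To make the graph-first suggestion a bit more concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the event records, the field names (`ts`, `where`, `iocs`), and the hash, domain, and IP values are made up, and "seen in the same event" is just one possible way to define a relationship between indicators. It links co-sighted indicators into a graph, pivots from one indicator to everything reachable from it, and pulls first/last seen times out of the sighting history.

```python
from collections import defaultdict, deque
from datetime import datetime

# Hypothetical events: each records when and where a set of indicators
# was observed together. All values are invented for illustration.
EVENTS = [
    {"ts": "2024-01-01T10:00:00", "where": "disk",
     "iocs": [("sha256", "hash-A"), ("fqdn", "bad.example.com")]},
    {"ts": "2024-01-02T09:30:00", "where": "network",
     "iocs": [("fqdn", "bad.example.com"), ("ipv4", "192.0.2.10")]},
    {"ts": "2024-01-03T14:15:00", "where": "memory",
     "iocs": [("sha256", "hash-A")]},
    {"ts": "2024-01-04T08:00:00", "where": "network",
     "iocs": [("ipv4", "198.51.100.7"),
              ("credential", "bertrand.russell@math.onmicrosoft.com")]},
]

def build_graph(events):
    """Return (adjacency sets, per-indicator sighting history)."""
    graph = defaultdict(set)
    sightings = defaultdict(list)
    for ev in events:
        ts = datetime.fromisoformat(ev["ts"])
        iocs = ev["iocs"]
        for ioc in iocs:
            graph[ioc]  # make sure indicators seen alone still get a node
            sightings[ioc].append((ts, ev["where"]))
        # Assumption: indicators seen in the same event are related.
        for i, a in enumerate(iocs):
            for b in iocs[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph, sightings

def related(graph, start):
    """Breadth-first search: every indicator reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def first_last_seen(sightings, ioc):
    """Earliest and latest timestamps at which `ioc` was sighted."""
    times = [ts for ts, _ in sightings[ioc]]
    return min(times), max(times)

graph, sightings = build_graph(EVENTS)
cluster = related(graph, ("sha256", "hash-A"))
first, last = first_last_seen(sightings, ("sha256", "hash-A"))
print(sorted(cluster))
print(first.isoformat(), last.isoformat())
```

Pivoting from the file hash pulls in the domain and the IP it resolves through, while the unrelated credential sighting stays in its own component. A real pipeline would likely want typed edges (resolves-to, dropped-by, logged-in-from) and a proper graph library, but even plain connected components plus first/last seen go a long way before any ML is needed.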
As a hint of what is possible, check out the work here -- https://threathunterplaybook.com/introduction.html -- which pivots nicely off the fields (and parsing languages) from Azure Sentinel and the M365/Azure data models.