
Assume there is an incident that requires immediate response, such as a virus outbreak over email, Cryptolocker actively encrypting files, or a DoS attack.

How should I approach this in a way that not only serves our customers (SLAs, etc.) but is also viewed positively by all levels of management and my peers?

I suppose there are the following phases:

  • Identification
  • Containment
  • Remediation

Sometimes an incident requires us to go back and re-identify the issue (e.g. it's not a web server issue, it's a DoS attack). Often a well-intentioned technician will work on tasks that overlap with someone else's and may not help the situation or, worse, may impede other efforts (e.g. a SAN restore on the same LUN as production, killing performance).

Question

Since there are often many moving parts to solving the issues, what process can I look at for guidance to give the containment and remediation process more structure?

Some things I can think of include:

  • Identify affected users, business stakeholders
  • Identify people, vendors that are working on the solution
  • Communicate tasks, and status of all tasks between people and vendors working on the solution
  • Share audience appropriate status (helpdesk, management, executive)
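
To make the list above concrete, here is a minimal sketch of a shared task board, assuming each task declares the systems it touches. The names `Task`, `IncidentBoard`, and the resource label `LUN-7` are hypothetical and only for illustration; the idea is that overlapping work (like the SAN restore example) is flagged before it starts, and that status can be rendered differently per audience.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    owner: str            # person or vendor doing the work
    resources: set[str]   # systems the task touches (hypothetical labels)
    status: str = "open"  # "open" or "done"


@dataclass
class IncidentBoard:
    tasks: list[Task] = field(default_factory=list)

    def conflicts(self, new_task: Task) -> list[Task]:
        """Return open tasks that touch any resource the new task needs."""
        return [t for t in self.tasks
                if t.status == "open" and t.resources & new_task.resources]

    def add(self, new_task: Task) -> list[Task]:
        """Record the task and return overlapping open tasks for review."""
        overlapping = self.conflicts(new_task)
        self.tasks.append(new_task)
        return overlapping

    def summary(self, audience: str) -> str:
        """Render status with a level of detail suited to the audience."""
        done = sum(1 for t in self.tasks if t.status == "done")
        if audience == "executive":
            return f"{done} of {len(self.tasks)} tasks complete"
        # Helpdesk and technical staff get the per-task view.
        return "\n".join(f"{t.owner}: {t.description} [{t.status}]"
                         for t in self.tasks)


board = IncidentBoard()
board.add(Task("Serve production traffic", "ops", {"LUN-7"}))
clash = board.add(Task("Restore SAN snapshot", "storage vendor", {"LUN-7"}))
for t in clash:
    print(f"Overlaps with: {t.description} ({t.owner})")
print(board.summary("executive"))
```

In practice a ticketing system or a shared incident channel would play this role; the point is only that someone records who is touching what before the work begins.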

There should be some kind of guidance that has already been written that addresses this, e.g. in a "runbook" of sorts, but I'm not sure what it would be called. Search terms would be appreciated.

AviD
makerofthings7
  • Incident Response Policy – schroeder Mar 05 '15 at 01:16
  • SANS has papers on this: http://www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901 – schroeder Mar 05 '15 at 01:24
  • First you need to have a list of stakeholders for the affected services. There's nothing worse than wondering who needs to be informed, and the finger-pointing blame game that happens because no one wants to decide who owns what and what can be removed. Given a choice, stakeholders without a real stake in the service/machinery would rather have things running perpetually than be aware of changing operational risks. Computers may be machines, but they're not perpetual motion machines without issues. – munchkin Mar 05 '15 at 11:20

1 Answer


Plan your business continuity. You should identify the proper people for mission-critical and non-mission-critical services, which systems contain the most important information, and who can decide when to take the systems offline and at what threshold.

Good overview on Wikipedia: http://en.wikipedia.org/wiki/Incident_management

ITIL has tons of information on this:

Activities of ICM defined by ITIL v3

  • Identification - the incident is detected or reported
  • Registration - the incident is registered in an ICM System
  • Categorization - the incident is categorized by priority, SLA etc. attributes defined above
  • Prioritization - the incident is prioritized for better utilization of the resources and the Support Staff time
  • Diagnosis - reveal the full symptom of the incident
  • Escalation - should the Support Staff need support from other organizational units
  • Investigation and diagnosis - if no existing solution from the past can be found, the incident is investigated and the root cause found
  • Resolution and recovery - once the solution is found the incident is resolved
  • Incident closure - the registry entry of the incident in the ICM System is closed by providing the end-status of the incident
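
As a rough illustration of the activities listed above, the lifecycle can be sketched as a small state machine. This is not ITIL tooling, just one possible reading: the `Stage` and `Incident` names are hypothetical, and two rules are assumptions of this sketch rather than part of the list: escalation is treated as optional, and an incident may step back to categorization when the first diagnosis turns out to be wrong.

```python
from enum import Enum


class Stage(Enum):
    IDENTIFICATION = 1
    REGISTRATION = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    DIAGNOSIS = 5
    ESCALATION = 6
    INVESTIGATION = 7
    RESOLUTION = 8
    CLOSURE = 9


class Incident:
    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.IDENTIFICATION
        self.history = [Stage.IDENTIFICATION]

    def move_to(self, target: Stage) -> None:
        """Advance one stage, skip escalation, or step back to re-categorize."""
        forward = target.value - self.stage.value == 1
        # Assumption: escalation is only needed when other units must help.
        skip_escalation = (self.stage is Stage.DIAGNOSIS
                           and target is Stage.INVESTIGATION)
        # Assumption: a misdiagnosed incident may be re-categorized
        # at any point after categorization and before closure.
        reclassify = (target is Stage.CATEGORIZATION
                      and Stage.CATEGORIZATION.value
                      < self.stage.value
                      < Stage.CLOSURE.value)
        if not (forward or skip_escalation or reclassify):
            raise ValueError(
                f"cannot move from {self.stage.name} to {target.name}")
        self.stage = target
        self.history.append(target)


incident = Incident("Web server unresponsive")
for stage in (Stage.REGISTRATION, Stage.CATEGORIZATION,
              Stage.PRIORITIZATION, Stage.DIAGNOSIS):
    incident.move_to(stage)

# Diagnosis shows it is a DoS attack, not a web server fault: re-categorize.
incident.move_to(Stage.CATEGORIZATION)
print([s.name for s in incident.history])
```

Rejecting out-of-order moves (for example, jumping straight to closure) is what gives the process its structure; the history list doubles as a timeline for the post-incident review.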

Incident Manager responsibilities

  • understand any incident/fault on a basic level (at least) in order to use the appropriate competences (resources)
  • drive the restoration team to gather sufficient information to start an analysis
  • maintain a general overview of the incident (keeping the focus on the restoration via a workaround)
  • understand the functionality of multiple areas (RAN, Core Network, VAS, BSS/OSS)
  • obtain guidance on priorities for the teams starting the immediate, urgent, unexpected recovery work
Jonathan