Incident management
An incident is an event that could lead to loss of, or disruption to, an organization's operations, services or functions. Incident management (IcM) describes the activities of an organization to identify, analyze, and correct hazards in order to prevent a future recurrence. Within a structured organization, incidents are normally dealt with by an incident response team (IRT), an incident management team (IMT), or an Incident Command System (ICS). Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions.[1]
Description
An incident is an event that could lead to loss of, or disruption to, an organization's operations, services or functions.[2] Incident management (IcM) describes the activities of an organization to identify, analyze, and correct hazards in order to prevent a future recurrence. If not managed, an incident can escalate into an emergency, a crisis or a disaster. Incident management is therefore the process of limiting the potential disruption caused by such an event, followed by a return to business as usual. Without effective incident management, an incident can disrupt business operations, information security, IT systems, employees, customers, or other vital business functions.[1]
Physical incident management
Incident management is considered to be much more than the analysis of perceived threats and hazards to an organization in order to work out the risk of an event occurring and, in turn, the organization's ability to conduct business-as-usual activities during the incident. An important point for the risk management process and business resilience planning is that incident management is a real-time physical activity.
The planning done to formulate the response to an incident (be that a disaster, emergency, crisis or accident) exists so that effective business resilience can take place, ensuring minimal loss of or damage to the organization's tangible or intangible assets. Efficient physical management of the incident ensures that the plan is implemented: it makes the best use of the time and resources available and, through clear and timely liaison, obtains more resources from outside the organization when needed.
The National Fire Protection Association describes an incident management system (IMS) as "the combination of facilities, equipment, personnel, procedures and communications operating within a common organizational structure, designed to aid in the management of resources during incidents".[3][4]
Physical incident management is the real-time response, which may last for hours, days, or longer. The United Kingdom Cabinet Office has produced the National Recovery Guidance (NRG), which is aimed at local responders as part of the implementation of the Civil Contingencies Act 2004 (CCA). It describes the response as follows: "Response encompasses the actions taken to deal with the immediate effects of an emergency. In many scenarios, it is likely to be relatively short and to last for a matter of hours or days – rapid implementation of arrangements for collaboration, co-ordination and communication are, therefore, vital. Response encompasses the effort to deal not only with the direct effects of the emergency itself (eg fighting fires, rescuing individuals) but also the indirect effects (eg disruption, media interest)".[5][6]
The International Organization for Standardization (ISO), the world's largest developer of international standards, also notes in the description of its risk management principles and guidelines document, ISO 31000:2009, that "Using ISO 31000 can help organizations increase the likelihood of achieving objectives, improve the identification of opportunities and threats and effectively allocate and use resources for risk treatment".[7] This again shows the importance not just of good planning but of effective allocation of resources to treat the risk.
Computer security incident management
With the rise of internet crime, computer security incidents have become a common type of incident faced by companies in developed nations across the world, and the Computer Security Incident Response Team (CSIRT) plays an important role in handling them. For example, if an organization discovers that an intruder has gained unauthorized access to a computer system, the CSIRT would analyze the situation, determine the breadth of the compromise, and take corrective action. Computer forensics is one task included in this process. According to data on 2009 hacking incidents, over half of the world's hacking attempts on transnational corporations (TNCs) took place in North America (57%), and 23% took place in Europe.[8] A well-rounded CSIRT is integral to providing a secure environment for any organization and is becoming a critical part of the overall design of many modern networking teams.
Roles
Incidents within a structured organization are normally dealt with by either an incident response team (IRT) or an incident management team (IMT). These teams are designated beforehand or during the event and are placed in control of the organization whilst the incident is dealt with, in order to restore normal functions.
The Incident Command System (ICS) is designed to deal with a larger incident involving a response from multiple agencies. Popular with public safety agencies and jurisdictions in the United States, Canada and other countries, it is increasingly used in the private sector as organizations begin to manage emergencies on their own or jointly with public safety agencies. ICS is a command and control mechanism that provides an expandable structure to manage emergency agencies. Although some of the details vary by jurisdiction, ICS normally consists of five primary elements: command, operations, planning, logistics and finance/administration. Several special staff positions, including public affairs, safety, and liaison, report directly to the incident commander (IC) when the emergency warrants establishment of those positions.
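The expandable structure described above can be illustrated with a minimal Python sketch. It is not an official ICS data model: the class name, method names and people's names are hypothetical, and the rule that the incident commander covers any section that has not been activated is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Sections named in the text; "command" is represented by the incident commander.
SECTIONS = ("operations", "planning", "logistics", "finance/administration")
# Special staff positions that report directly to the incident commander.
STAFF_POSITIONS = ("public affairs", "safety", "liaison")


@dataclass
class IncidentCommand:
    """Hypothetical model of an ICS organization that grows as needed."""
    incident_commander: str
    sections: Dict[str, str] = field(default_factory=dict)       # section -> chief
    command_staff: Dict[str, str] = field(default_factory=dict)  # position -> officer

    def activate_section(self, section: str, chief: str) -> None:
        """Stand up one of the primary sections and name its chief."""
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        self.sections[section] = chief

    def appoint_staff(self, position: str, officer: str) -> None:
        """Fill a special staff position when the emergency warrants it."""
        if position not in STAFF_POSITIONS:
            raise ValueError(f"unknown staff position: {position}")
        self.command_staff[position] = officer

    def responsible_for(self, section: str) -> str:
        """Assumption: the commander covers any section not yet activated."""
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        return self.sections.get(section, self.incident_commander)


# Usage: a small incident starts with only a commander and expands.
ic = IncidentCommand(incident_commander="Alex")
covered_before = ic.responsible_for("logistics")   # commander covers logistics
ic.activate_section("logistics", "Sam")
covered_after = ic.responsible_for("logistics")    # now delegated to the chief
ic.appoint_staff("safety", "Kim")
```

Keeping the sections and staff positions as optional mappings reflects the idea that positions are established only when the emergency warrants them.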
Another response system is the gold–silver–bronze command structure, which is used by emergency services in the United Kingdom and other countries. The system has three levels of command: a gold commander sets the overall strategy, a silver commander is in direct charge of those at the scene, and bronze commanders direct responders on the ground. An individual agency may use the system, or multiple agencies may use it as they liaise. A common feature of the ICS and gold–silver–bronze systems is that they create a command system separate from the agencies' usual hierarchy.
Usually as part of the wider management process in private organizations, incident management is followed by post-incident analysis, which determines why the incident happened despite precautions and controls. This analysis is normally overseen by the leaders of the organization, with a view to preventing repetition of the incident through precautionary measures and, often, changes in policy. This information is then used as feedback to further develop the security policy and its practical implementation. In the United States, the National Incident Management System, developed by the Department of Homeland Security, integrates effective practices in emergency management into a comprehensive national framework. This often results in a higher level of contingency planning, exercise and training, as well as an evaluation of the management of the incident.[9]
Incident management software systems
Incident management software systems are designed to collect consistent, time-sensitive, documented incident report data. Many of these products include features to automate the approval process of an incident report or case investigation, and may also collect real-time incident information such as time and date data. Incident report systems can additionally send notifications, assign tasks and escalate reports to the appropriate individuals automatically, depending on the incident type, priority, time, status and custom criteria. Modern products allow administrators to configure the incident report forms as needed, create analysis reports and set access controls on the data, so that reports can be customized to suit the organizations using the systems. Some of these products can collect images, video, audio and other data. Some incident management software systems are designed for specific industries.
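As an illustration of the routing and escalation behaviour described above, the following Python sketch assigns a report by incident type and flags it for escalation once it has been open too long for its priority. It does not reflect any particular product: the field names, the owner mapping and the escalation deadlines are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Dict, List, Optional


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical deadlines: how long a report may stay open before escalation.
ESCALATE_AFTER = {
    Priority.LOW: timedelta(hours=72),
    Priority.MEDIUM: timedelta(hours=24),
    Priority.HIGH: timedelta(hours=1),
}


@dataclass
class IncidentReport:
    incident_type: str
    priority: Priority
    reported_at: datetime
    status: str = "open"
    assignee: Optional[str] = None
    notifications: List[str] = field(default_factory=list)


def route(report: IncidentReport, owners: Dict[str, str],
          default_owner: str = "duty-manager") -> IncidentReport:
    """Assign the report by incident type and record a notification."""
    report.assignee = owners.get(report.incident_type, default_owner)
    report.notifications.append(f"assigned to {report.assignee}")
    return report


def needs_escalation(report: IncidentReport, now: datetime) -> bool:
    """True if an open report has exceeded the deadline for its priority."""
    if report.status != "open":
        return False
    return now - report.reported_at > ESCALATE_AFTER[report.priority]


# Usage: a high-priority security report opened at 09:00.
owners = {"security": "csirt", "safety": "site-safety-officer"}
report = route(
    IncidentReport("security", Priority.HIGH, datetime(2024, 1, 1, 9, 0)),
    owners,
)
overdue = needs_escalation(report, datetime(2024, 1, 1, 10, 30))
```

Keeping the deadlines in a single table mirrors the configurable, criteria-driven behaviour described in the text; a real system would typically load such rules from administrator settings.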
Root cause analysis
Human factors
During root cause analysis, human factors should be assessed. James Reason conducted a study into the adverse effects of human factors.[10] The study found that major incident investigations, such as those into Piper Alpha and the King's Cross Underground fire, made it clear that the causes of the accidents were distributed widely within and outside the organization. There are two types of failure: active failures, which are actions that have immediate effects and are likely to cause an accident, and latent or delayed failures, which can take years to have an effect and usually combine with triggering events that then cause the accident.
Active failures are unsafe acts (errors and violations) committed by, for instance, the operators of machinery and supervisors of tasks. It is the people at the human-system interface whose actions can have immediate adverse consequences.
Latent failures are created as the result of decisions taken at the higher echelons of an organization. Their damaging consequences may lie dormant for a long time, only becoming evident when they combine with local triggering factors (e.g., the spring tide, the loading difficulties at Zeebrugge harbour, etc.) to breach the system's defences. Such decisions can make an accident more likely: planning, scheduling, forecasting, designing, policy making and similar activities can have a slow-burning effect. The actual unsafe act that triggers an accident can be traced back through the organization, exposing the accumulation of latent failures within the system as a whole that made the accident more likely and ultimately led to it. Better-targeted improvement actions can then be applied to reduce the likelihood of the event happening again.[11]
References
- Small Business Service (UK). "What qualifies as an 'incident'? | Business Link". webarchive.nationalarchives.gov.uk. Archived from the original on 2011-06-15. Retrieved 2018-01-04.
- Glossary of Terms, The Business Continuity Institute Good Practice Guidelines 2010 Global Edition Archived 2015-04-30 at the Wayback Machine. thebci.org Retrieved on 2015-09-03.
- "List of NFPA Codes and Standards". www.nfpa.org. 2013. Retrieved 10 April 2013.
- "Incident Management | Ready.gov". www.ready.gov. 2012. Retrieved 10 April 2013.
- "National Recovery Guidance - GOV.UK". www.gov.uk. 2007. Retrieved 10 April 2013.
- "Civil Contingencies Act 2004". www.legislation.gov.uk. Expert Participation. 2012. Retrieved 10 April 2013.
- "ISO 31000 Risk management". www.iso.org. 2009. Retrieved 13 April 2013.
- Hacking Incidents 2009 – Interesting Data – Roger's Security Blog – Site Home – TechNet Blogs. Blogs.technet.com (2010-03-12). Retrieved on 2012-11-17.
- About the Contingency Planning and Incident Management Division | Homeland Security Archived April 2, 2012, at the Wayback Machine. Dhs.gov (1999-02-22). Retrieved on 2012-11-17.
- Reason J (June 1995). "Understanding adverse events: human factors". Quality in Health Care. 4 (2): 80–9. doi:10.1136/qshc.4.2.80. PMC 1055294. PMID 10151618.
- O’Callaghan, Katherine Mary, Incident Management: Human Factors and Minimising Mean Time to Restore Archived 2011-09-17 at the Wayback Machine, Ph.D. Thesis, Australian Catholic University, 2010.
Further reading
- Adam Krug (2014-09-16), "Incident Management Software System Case Studies", Case Studies 1 – 34
- Wearne S H & White-Hunt, K (2010), Managing the Urgent and Unexpected, Gower Publishing – Case studies