17

I recently moved to the manufacturing sector to handle the security of systems and products, specifically operational technology (OT) products. Based on a recent US CISA advisory, I have to apply a patch to multiple units of the same product series from a particular vendor. The products are used on the assembly line to automate certain jobs.

How do I make sure that the new update will not brick the products installed in the assembly line?

What are the best practices for patching these kinds of products without disrupting the line? The products can be patched only during shift changes (the line runs 3 shifts).

"Some industrial sectors require 99.999% or greater ICS uptime. This requirement relates to 5 minutes and 35 seconds or less allowable downtime per year for any reason, making unscheduled patching out of the question."

This is true to a large extent as unavailability for even a single day is not acceptable.
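For reference, the downtime budget implied by an availability target can be worked out directly. The sketch below is only an illustration: the function names are made up for this example, and it assumes a 365-day year with every second counted (no excluded maintenance windows). Under those assumptions, 99.999% comes out at roughly 5 minutes 15 seconds per year, the same order as the figure in the quote.

```python
def downtime_budget_seconds(availability_percent: float,
                            days_per_year: float = 365.0) -> float:
    """Allowable downtime per year, in seconds, for a given availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days_per_year * 24 * 60 * 60


def format_minutes_seconds(seconds: float) -> str:
    """Render a duration as whole minutes and seconds (rounded to the nearest second)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


# Print the yearly budget for a few common availability targets.
for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_seconds(target)
    print(f"{target}% -> {format_minutes_seconds(budget)} per year")
```

A single failed update that takes a unit out for an hour would already exceed the 99.99% budget for the whole year, which is why the patch has to be de-risked before it touches the line.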

Also, because the products are very costly, I can't afford a standalone unit on which to check that the new software update is fully compatible and does not brick the unit.

In this case, should I ask the supplier or the manufacturer of the product to provide evidence that the new update is safe to install?

hft
  • 2
    Is the product _designed_ to be used in this kind of environment? If so, I would hope that the software components that need patching would be on a separate, swappable daughterboard or similar component, so you could have a spare daughterboard but not duplicate other equipment. At some point this gets into fitness-to-purpose (or if the manufacturer's design intent is for each customer to have a spare on-site, then perhaps it comes down to the individual customer being too cheap to follow relevant best practices). Regardless, this is something the designer of the product should be considering. – Charles Duffy Mar 14 '22 at 19:09
  • 1
    (In a past life, I was lead engineer for a piece of expensive equipment; we had the operating software live as a read-only image on a dedicated two-drive RAID-1 array, and by the end of my tenure one absolutely could swap out that array for two drives with a newer version of the software). – Charles Duffy Mar 14 '22 at 19:18
  • 1
    @CharlesDuffy I do OT security assessments including patch assessments. It's not always that simple or that easy. – schroeder Mar 14 '22 at 20:56
  • 2
    Any reason not to contact the manufacturer of this very costly product? – Pablo H Mar 15 '22 at 13:42

3 Answers

35

Don't touch anything on the line without the vendor providing direct oversight, or even better, doing the work themselves. This should be performed under a support/maintenance contract.

If you cannot test the patch, then you need to transfer the liability to the vendor and be protected by support contracts.

schroeder
  • 1
    It makes sense to rely on the contractual agreement if the AMC (Annual Maintenance Contract) covers it, and to work directly with the vendor on the patching. What do we do if no such support contract is available? – Baranikumar Venkatesan Mar 14 '22 at 11:06
  • 16
    Well, with the addition you've made to the question, you now see the predicament companies put themselves in. They need nearly 100% uptime, but do not provide the resources to be as resilient as they need to be to do something like patching. They either test the patch on a vendor's demo equipment, get the vendor to do it, or patch and pray. Hoping for resilience doesn't work. – schroeder Mar 14 '22 at 13:48
  • Can you take one device/line offline and test the patched device there? As a scheduled maintenance period. – Rodrigo Murillo Mar 14 '22 at 15:08
  • @RodrigoMurillo from the description from the OP, no. 5-nines uptime means no "offline" time. – schroeder Mar 14 '22 at 15:30
  • 12
    Yes, I saw that. Keep in mind that depending on the SLA and what parties agreed to, business objectives, etc, uptime guarantees sometimes do not include scheduled outages or maintenance. The parties might agree to 8 hours of maintenance per month for example, and still be obligated to meet the 5-nines outside of that period. – Rodrigo Murillo Mar 14 '22 at 16:10
  • @RodrigoMurillo if there were maintenance windows, the question would not be asked ... – schroeder Mar 14 '22 at 20:56
  • As you pointed out, due to given constraints, they may want to consider one in the future. – Rodrigo Murillo Mar 14 '22 at 21:02
14

OT is a different world.

First, what schroeder said. You want to contact the vendor and discuss this with them.

You also want to check with plant management regarding any certifications, Health-and-Safety inspections and other stuff that might become invalid if anything in the system is changed.

You also don't apply patches just because they're there. We do that in IT, because it's usually better to be patched than not. That is not necessarily true in the OT world. In fact, you'll find plenty of outdated systems where the risk analysis came to the conclusion that that's better or that other mitigation measures cover the risk sufficiently.

So no, you never have to apply some patch. You always choose to do so, based on the vendor recommendations, security/risk analysis and operational requirements.

So, in short, here's what I'd recommend (I am currently acting CISO of a manufacturing plant):

  1. be clear about why you decided to apply that patch, whether there are alternatives (such as other mitigations) to doing so, and why you discarded those. I strongly recommend having that in writing.
  2. contact the vendor and inform them that you see the need to apply a patch and schedule a discussion with them. They are the ones who should check the patch works on their systems and guide you through the patching process itself.
  3. contact plant management and inform them that you want to apply a patch, and that according to the vendor this will cause a downtime of X. Get their approval. Also list the risk that the downtime could be longer than anticipated and have them sign off on that risk.
  4. once everyone who is in any way involved is on board, schedule the maintenance, do all the paperwork, let the vendor come in to handle or at least assist in the actual process, and have a rollback/restore plan ready, just in case.
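The four steps above can be treated as a gate that must be fully satisfied before a maintenance slot is booked. The following is a minimal sketch of that idea, not a prescribed process: the class and field names are hypothetical and simply mirror the steps in the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatchChangeRequest:
    """Hypothetical checklist mirroring the four steps; all names are illustrative."""
    written_justification: bool = False   # step 1: why patch, documented in writing
    alternatives_reviewed: bool = False   # step 1: other mitigations considered and discarded
    vendor_confirmed_patch: bool = False  # step 2: vendor checked the patch and the procedure
    management_approved: bool = False     # step 3: plant management approved the downtime
    overrun_risk_signed_off: bool = False # step 3: risk of a longer outage accepted
    rollback_plan_ready: bool = False     # step 4: rollback/restore plan prepared

    def missing_items(self) -> List[str]:
        """Names of checklist items that are still open."""
        return [name for name, done in vars(self).items() if not done]

    def ready_to_schedule(self) -> bool:
        """True only when every item has been completed."""
        return not self.missing_items()


request = PatchChangeRequest(written_justification=True, alternatives_reviewed=True)
print(request.ready_to_schedule())
print(request.missing_items())
```

Keeping the open items as an explicit list makes it obvious who still has to sign before anything on the line is touched.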
Tom
  • 3
    OP stated "Based on the recent US CISA advisory..." but didn't specify which one. It's entirely possible the advisory only applies in specific conditions which may not be met. It's always important to read the details carefully to figure out the actual risk. – barbecue Mar 16 '22 at 21:12
10

One system I worked on had a simple solution. To guarantee 5 nines, redundant hardware was used. For patching purposes, a fail-over could be initiated manually. One half of the system was patched, another fail-over was initiated, and then the second half was patched. Patches could be reverted in case of failure, and the impact of the temporarily non-redundant system failing was acceptable (<5 minutes).

Of course, this could be generalized to triple-redundant systems so you can fail-over even during patching. Reliability comes at a price.
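The two-half procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual system: `Node`, `RedundantPair`, `rolling_patch` and the `health_check` callback are hypothetical names, and a "patch" is modeled as nothing more than changing a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str


@dataclass
class RedundantPair:
    active: Node
    standby: Node

    def fail_over(self) -> None:
        """Swap roles: the standby takes over and the former active becomes idle."""
        self.active, self.standby = self.standby, self.active


def rolling_patch(pair: RedundantPair, new_version: str,
                  health_check: Callable[[Node], bool]) -> List[str]:
    """Fail over, patch the idle half, and repeat once for the other half.

    If the freshly patched half fails its health check, the patch is reverted
    and the procedure stops, leaving the other half in service untouched.
    """
    log: List[str] = []
    for _ in range(2):
        pair.fail_over()
        log.append(f"failover: {pair.active.name} now active")
        idle = pair.standby
        previous_version = idle.version
        idle.version = new_version
        if not health_check(idle):
            idle.version = previous_version
            log.append(f"reverted {idle.name} to {previous_version}")
            return log
        log.append(f"patched {idle.name} to {new_version}")
    return log


pair = RedundantPair(active=Node("A", "1.0"), standby=Node("B", "1.0"))
for entry in rolling_patch(pair, "1.1", health_check=lambda node: True):
    print(entry)
```

While one half is being patched the system runs without redundancy, so the acceptable exposure window (under 5 minutes in the case above) bounds how long each patch step may take.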

MSalters
  • 5
    If you are lucky enough to be able to have redundant equipment. Many OT environments are physically built around the machinery and there is no room for redundancies. – schroeder Mar 15 '22 at 16:41
  • @schroeder: IT hardware usually isn't the biggest thing on the shop floor. Having redundant machinery would be a pain, though, and if the IT hardware is physically part of the machinery then the redundancy is determined by the manufacturer. – MSalters Mar 15 '22 at 16:54