It's difficult to know exactly why the organisations that run critical infrastructure choose to build TCP/IP into such networks, but here are a few of my best guesses at their requirements for networking their control systems together:
- Need a reliable and maintainable way to interface with infrastructure equipment remotely.
- Need a way to address large amounts of equipment across a huge geographic area.
- Must be able to interface with existing computer control systems.
- Must be extensible, so that new devices can be added to the network easily.
- Must be cross-platform, so that embedded devices, different OSes and different computer systems can communicate properly.
- Must have low latency and high throughput for strong performance during critical tasks.
As far as choosing TCP/IP goes, it has a lot of good qualities:
- TCP/IP is mature, stable, fully documented, and one of the most analysed communications protocols in existence.
- TCP/IP provides packet ordering, retransmission, corruption detection (checksums), addressing and routing out of the box; each connection identifies its peer by IP address and port, and TCP's sequence numbers give at least some resistance to blind spoofing.
- TCP/IP is patent-free, and there are hundreds of open-source implementations, for a large range of processor architectures, under a variety of licenses.
- TCP/IP is available on almost every modern computer device and operating system.
- Existing network infrastructure is cheap to use, mature, reliable and self-healing.
- There are thousands of pre-existing application layer protocols that sit on top of the TCP/IP stack, and rolling a new one takes very little code (see the sketch below).
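To give a sense of how little work the last two points imply, here is a minimal sketch in Python using only the standard library. The `STATUS` command, the host address and the port are purely illustrative; a real control-system protocol would define its own framing and command set, but the transport underneath would look much the same.

```python
import socket

HOST, PORT = "192.0.2.10", 5020   # hypothetical RTU address (TEST-NET-1 range) and port


def poll_status() -> bytes:
    """Open a TCP connection, send a made-up STATUS request and read the reply.

    TCP already gives us in-order, checksummed, retransmitted delivery,
    so the only thing left to define is the payload format itself.
    """
    with socket.create_connection((HOST, PORT), timeout=5) as sock:
        sock.sendall(b"STATUS\n")   # hypothetical one-line request
        return sock.recv(1024)      # single small response frame


if __name__ == "__main__":
    print(poll_status().decode(errors="replace"))
```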
In terms of security, SSL/TLS and IPsec are two existing protocol suites designed to work with TCP/IP to provide end-to-end security, and both can be deployed with relative ease.
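As a rough illustration of that "relative ease" for the SSL/TLS side, the client sketched above can be wrapped in TLS with a few extra lines using Python's built-in `ssl` module. The endpoint name and the CA bundle path are placeholders; in a closed control network you would typically verify against an internal CA rather than the public ones.

```python
import socket
import ssl

HOST, PORT = "scada.example.net", 5021   # placeholder endpoint
CA_BUNDLE = "internal-ca.pem"            # placeholder internal CA certificate

# Build a TLS context that verifies the server against our own CA.
context = ssl.create_default_context(cafile=CA_BUNDLE)

with socket.create_connection((HOST, PORT), timeout=5) as raw_sock:
    # wrap_socket performs the TLS handshake and certificate/hostname checks.
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        tls_sock.sendall(b"STATUS\n")
        print(tls_sock.recv(1024))
```

IPsec sits lower in the stack, so it would be configured on the hosts or gateways rather than in application code.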
Now, to perform a proper risk assessment on such a huge block of infrastructure, I'd need some specifics. Since I don't have any, I'm going to make an educated guess. At this point I'm working on conjecture alone, so don't assume my conclusions are accurate.
Introducing networking into the infrastructure provides huge benefits in terms of:
- Monitoring the status and health of equipment without visiting the site.
- Updating and configuring equipment remotely in a centralised fashion.
- Immediate alerting and logging on equipment failure.
- Reduced downtime of equipment due to the ability to predict failures.
- Reduced maintenance and inspection costs due to centralisation of data.
- Better planning through more accurate supply/demand and load statistics.
- Integration of services for better consumer experience.
The security implications are as follows:
- Every device becomes a potential security vulnerability.
- Potential for snooping in telecommunications channels.
- A hostile entity (terrorist group, enemy nation, etc.) might exploit vulnerabilities to cripple the country's infrastructure.
- Possibility for citizens to abuse/hack the system, e.g. free phone calls, free power, etc.
Part of the perceived security of these systems comes from obscurity. These machines are likely to run their own proprietary control systems, and are likely to be interfaced with using custom application layer protocols. Furthermore, their addresses are not catalogued or distributed anywhere. It would take a team with strong reverse-engineering and pentesting skills, specialised equipment and insider knowledge to identify and break the systems. Of course, we know this has happened before: Stuxnet did exactly this to SCADA control systems.
Assuming the control systems run custom embedded systems or *nix, the potential for strong security controls is there. One would hope that a multi-billion pound (or dollar) industry would consider proper security and engage in active and frequent security reviews, but sadly this is often not the case. Without access to internal company information, there's no way to tell if such precautions have been taken.
All in all, I think the benefits do justify the potential security risks. Having working power, gas and water is critical, and 99.9% of the time we're not going to be in a war-time situation. During normal operation, we need the ability to monitor and control everything as quickly and reliably as possible, to ensure proper operation and minimal failure rates. On the security front, the obscurity factor is a useful time delay, and any real security that has been implemented is likely to be more than enough to deter all but the most advanced persistent threats.