
Various forms of software now underpin virtually every aspect of our daily lives. At work, we are reliant on IT, whether we work in an office, a hospital, a supermarket or an industrial site. At home, we rely on software for our entertainment, and increasingly the appliances in our homes are controlled by software or have it embedded within them. Journeys, work-related or otherwise, are governed by software: not only do transport systems depend on it for their governance and control, but the individual vehicles themselves have software embedded in them. Accounting systems are built on software, and software-based record-keeping is even mandated for businesses by the tax authorities.

The list goes on. And we take all that software for granted, and on trust. We assume it will work. We assume it will be accurate. And if it goes wrong, we assume it will be quickly rectified with a patch, or that there is a workaround that, more or less, works.

Yet all these systems are vulnerable, both to internal defects and to external threats. Failures are almost inevitable. We get used to them: “my video call won’t go through”, followed by the inevitable injunction to “try switching it off and on again”. Of course, even these interruptions to business as usual accumulate into a significant cost.

Momentary inconveniences are one thing, but some vulnerabilities have far more serious consequences – just ask the sub-postmasters caught up in the Post Office Horizon scandal. In other cases, confidential information and personal privacy may be compromised, or the information needed for a task may be unavailable or incorrect. And there is always the risk of outright catastrophe.

This was the context for a roundtable held late last year by the National Preparedness Commission (which I chair), in conjunction with the IT Leaders Forum (ITLF) of BCS, The Chartered Institute for IT. The roundtable recognised that:

  • Software failures already impose a significant cost on the UK economy: the ITLF estimates this at some £12 billion per year.
  • The potential impact of these failures is increasing, particularly as our systems become more complex and more interdependent. Systems are frequently built on top of legacy systems without full awareness of the weaknesses being created, and it is not surprising that many software outages follow upgrades or the introduction of new software modules.
  • Most software contains coding errors. Indeed, according to the US National Institute of Standards and Technology, there are on average 25 errors per 1,000 lines of code. Although these do not normally interfere with the performance of the programs concerned, they lurk there and may lead to serious and unexpected complications when new software is added; the short sketch after this list illustrates the scale this implies.
  • Finally, software creation is largely unregulated and takes place in an environment where speed to market often takes precedence over resilience and reliability.
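
To give a sense of the scale that defect density implies, here is a minimal back-of-the-envelope sketch in Python, using the 25-errors-per-1,000-lines figure cited above; the codebase sizes are illustrative assumptions, not measurements of any real system.

    # Rough scale of latent defects implied by the defect-density figure
    # cited above: 25 errors per 1,000 lines of code. The codebase sizes
    # below are illustrative assumptions, not measurements of any system.
    DEFECTS_PER_KLOC = 25

    codebases = [
        ("small utility", 10_000),
        ("typical business application", 1_000_000),
        ("large enterprise estate", 100_000_000),
    ]

    for name, loc in codebases:
        latent = loc // 1_000 * DEFECTS_PER_KLOC
        print(f"{name}: {loc:,} lines of code -> ~{latent:,} latent defects")

On these assumptions, even a single million-line application would carry roughly 25,000 latent defects, which is why adding new software to old can have such unexpected consequences.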

A report of the roundtable can be found here. The main conclusion was that awareness of the risk of software failure, and of its consequences, needs to be raised.

Organisations need to understand how their own vulnerabilities will be exposed if the systems on which they rely cease to function. Such failure may be caused by an inherent weakness in the software used; it might be caused by a cyber-attack; or it might be a consequence of a problem affecting a key part of the supply chain. No doubt the software will have been subject to some quality assurance; there will probably be cyber security systems and protocols in place; and supply chains will similarly have been subject to due diligence. However, prudent organisations should still plan for things to go wrong and develop procedures to ensure that core functions can continue.

The roundtable also identified other needs, including: 

  • Improved understanding of IT amongst senior managers and non-executives. 
  • An obligation on software suppliers to improve the robustness and security of the systems they supply. It was suggested that this could be driven by more rigorous government procurement standards. 
  • The development of guidelines for organisations to improve the resilience of their systems and their plans for recovery following a system failure. 

According to Normal Accident Theory (developed by the sociologist Charles Perrow), accidents are inevitable in systems that combine two characteristics: complexity and tight coupling. There is no doubt that the dependence of our economy and our society on digitalisation has been steadily increasing, and as it has, our systems have become more complex and more interconnected.

A further difficulty is that the people using software-based systems have differing understandings of how software operates, what can go wrong, and how to respond. Resilience therefore involves more than system recovery: it requires an assessment of the various causes of, and responses to, failure.

Finally, tight coupling means that the components of a process are critically interdependent: they are linked with little room for error and little time for recalibration or adjustment. It is clear, then, that these increasingly complex and tightly coupled systems need to be built with appropriate margins, even when they are not being used in ‘safety-critical’ applications; a simple illustration follows.
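
To make the idea concrete, here is a minimal Python sketch with hypothetical services and an assumed failure rate; none of this models any real system. In the tightly coupled version, every step calls the next with no slack, so a single upstream fault reaches the user; building in a margin lets the core function continue.

    import random

    # Hypothetical, tightly coupled chain: each step calls the next
    # synchronously, so one upstream fault propagates all the way through.
    def fetch_stock():
        if random.random() < 0.2:        # assumed failure rate, for illustration
            raise RuntimeError("inventory service down")
        return 42

    def price_order():
        return fetch_stock() * 9.99      # no fallback: tightly coupled

    # Building in a margin loosens the coupling: a cached value absorbs
    # the upstream failure so the core function can continue.
    def checkout_with_margin(cached_price=419.58):
        try:
            return price_order()
        except RuntimeError:
            return cached_price          # degrade gracefully instead of failing

    if __name__ == "__main__":
        print(checkout_with_margin())

The design point is the margin itself: the fallback is what gives the process the room for error that the tightly coupled chain lacks.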

The consequence of this increased reliance, and of the interdependence of our systems, is that software failure has become ‘the elephant in the room’ – and it is an elephant that we must not continue to ignore.

Lord Toby Harris is Chair of the National Preparedness Commission and a Vice Chair of PICTFOR.
