MTBF, or "failures" in the 6σ sense
While these are completely different things, they share some common properties which apply to your problem.
The Mean Time Between Failures is a commonly used measure of how reliable things are. If, for example, you buy something like a car or a hard disk, that particular unit may work without any problems until the day you die. But on average, units of that kind do eventually fail, after an average operating time X. That X is the MTBF.
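For concreteness, here is the arithmetic behind that number; the figures are made up purely for illustration:

```python
# Made-up numbers, purely to show the arithmetic: MTBF is the total observed
# operating time divided by the number of failures seen in that time.
total_operating_hours = 1_000_000    # e.g. 1000 drives, each running 1000 hours
failures_observed = 8

mtbf_hours = total_operating_hours / failures_observed
print(f"MTBF = {mtbf_hours:,.0f} hours")    # 125,000 hours
```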
Six Sigma (6σ) is basically the same kind of idea, except that for the most part you deal not with things but with processes, and you analyze (and optimize for) not operating time but the number of "opportunities", which may be... whatever, and the number of failures, which may, again, be... whatever. This can be about producing something, delivering on time, or just answering a phone correctly.
In a more concrete example: if your shoe factory produces one million sneakers per month, you are trying to ensure that no more than 3 of them (ideally zero) come out with the wrong color or without shoelaces and cannot be sold.
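As a back-of-the-envelope sketch, this is the usual "defects per million opportunities" (DPMO) calculation; the counts come from the sneaker example above, and 3.4 DPMO is the figure commonly quoted as the long-term Six Sigma target:

```python
# DPMO (defects per million opportunities) for the sneaker example above.
sneakers_produced = 1_000_000
defective = 3                      # wrong color, missing shoelaces, ...

dpmo = defective / sneakers_produced * 1_000_000
print(f"{dpmo:.1f} DPMO")          # 3.0, within the commonly quoted ~3.4 DPMO target
```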
How does that apply here?
The MTBF has a well-known implication: the more units are in use, the shorter the time until some unit fails (roughly, divide the MTBF by the number of units). Which means that although it is very unlikely that your cellphone will explode in your pocket during two to three years of typical usage, it is practically guaranteed to happen to someone if ten million people own one (that was the reason for Samsung's infamous Galaxy Note 7 recall campaign / PR nightmare a year or so ago; it's not like you personally were in much danger if you owned one).
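Here is a small sketch of that effect. It assumes independent, exponentially distributed failures (a common simplification), and the MTBF figure for the phone is invented purely for illustration:

```python
import math

# With exponentially distributed failures, the chance that one unit fails
# within time t is 1 - exp(-t / MTBF); for n independent units, the chance
# that at least one of them fails is 1 - (1 - p_one)^n.
def p_any_failure(n_units: int, t_hours: float, mtbf_hours: float) -> float:
    p_one = 1 - math.exp(-t_hours / mtbf_hours)
    return 1 - (1 - p_one) ** n_units

two_years = 2 * 365 * 24
print(p_any_failure(1,          two_years, mtbf_hours=10_000_000))  # ~0.002: one phone, very unlikely
print(p_any_failure(10_000_000, two_years, mtbf_hours=10_000_000))  # ~1.0: ten million phones, practically certain
```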
Similarly, looking at it from the 6σ angle: if your shoe factory produces not one million sneakers but one billion, you will have 3,000 defective pairs of shoes, not 3.
A few years ago, people started discouraging the use of RAID-5. How so? It provides extra data safety, doesn't it? It so happens that hard disks have a very, very small chance of a sector becoming unrecoverably corrupt. That never happens... well, almost.
But if your disks are large enough (as modern disks are), with many sectors, and you have several of them bunched together, you are basically guaranteed to have it happen during a re-sync operation, i.e. at the precise time when you really don't need it to happen because you are already down one disk. Plus, there is the chance of a second disk catastrophically failing halfway through the re-sync. Which also never happens... well, almost. The more disks, the more likely it is to happen.
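A rough sketch of that back-of-the-envelope argument, using the commonly quoted consumer-drive spec of about one unrecoverable read error per 10^14 bits read; the disk sizes and array widths are illustrative:

```python
import math

# Typical consumer-drive spec: about one unrecoverable read error (URE)
# per 1e14 bits read. A RAID-5 rebuild must read every surviving disk in full.
URE_PER_BIT = 1e-14

def p_ure_during_rebuild(disk_size_tb: float, surviving_disks: int) -> float:
    bits_to_read = disk_size_tb * 1e12 * 8 * surviving_disks
    # Poisson approximation: chance of hitting at least one URE over that many reads.
    return 1 - math.exp(-bits_to_read * URE_PER_BIT)

print(p_ure_during_rebuild(disk_size_tb=1,  surviving_disks=3))   # ~0.21
print(p_ure_during_rebuild(disk_size_tb=12, surviving_disks=7))   # ~0.999: the rebuild is almost certain to hit one
```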
The same applies to re-implementing the same functionality in software many times. Each implementation (every function, not just the ones that duplicate functionality) is one "opportunity", or the equivalent of one hard disk. More functions, via duplicated functionality, mean more opportunities for failure. Plus, more code to review.
While your programmers mostly work error-free (well, hopefully), there is always a tiny chance of a false assumption or an outright mistake. The more opportunities you give them, the more likely it is to happen.
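The same arithmetic as above, applied to duplicated implementations; the 1% per-implementation defect probability is invented, the point is only how quickly the risk grows with the number of opportunities:

```python
# Toy model: every independent implementation has a small chance of containing
# a bug; the chance that at least one of them does grows with duplication.
P_BUG_PER_IMPLEMENTATION = 0.01

def p_at_least_one_bug(implementations: int) -> float:
    return 1 - (1 - P_BUG_PER_IMPLEMENTATION) ** implementations

for n in (1, 5, 20, 100):
    print(f"{n:3d} implementations -> {p_at_least_one_bug(n):.0%} chance of at least one bug")
# 1 -> 1%, 5 -> 5%, 20 -> 18%, 100 -> 63%
```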