34 Replies to “How Complex Systems Fail”

  1. Could also be called “Operating Room porn”. It’s a good article, basically recapitulating what those of us who’ve worked for a long time in complex work environments have known all along. I always emphasize to my trainees the importance of configuring their work environment and procedures so as to minimize the risk of error, make errors easier to spot, and make recovery from error easier and faster. Dealing with toxic members of the work environment is the latest hot topic because of the compromising effect they have on patient safety. Slowly some headway is being made.
    I also encourage my trainees to watch the series “Mayday” on the Discovery Channel. While most of the scenarios involve transportation, especially aviation accidents, the lessons apply to numerous work environments.

  2. “Complex Systems”.
    In addition to “interesting systems (e.g. transportation, healthcare, power generation)” most of the points are applicable to social constructs (e.g. government, organized religion) and also (and most interestingly to me) to human beings individually. After all, what are these things if not complex systems?

  3. Good, but would be better in a Ted Talk format… performed by Loki … after a couple pots of coffee.

  4. As a retired manufacturing engineer all I can say is that this is the best analysis of systems I have seen. And I started in the business in the 60’s.

  5. Well, it takes an MD to tell us 60 years later the relationship of complex systems to failure. That long ago, MSA and Inland Steel got together to figure out a way to stop serious injury. They produced a manual to describe, in general, the precursors to accidents and serious injury. This was long before OSHA was a gleam in the progressive eye. Put simply, all accidents, even the most minor, had to be immediately reported and documented. The purpose of this was not to provide bureaucrats power, but to determine the statistics of serious accidents. Inland was extremely far ahead of the rest of that industry in statistical and probability analysis. A simple fact they used to bring home to their workers the urgency of the system was that every 8 times you bruised or pinched a finger, it would be cut off. This information was used in a constant battle against pinch points.
    In those years, it was not uncommon for dozens of deaths to occur in steel mills. By the way, this was all started by Andrew Carnegie, personally, in response to the 44 steel mill deaths in Pittsburgh in a few months of one year in the 20s.
    Of course, OSHA, when it was started, either took this all over or copied it under their names. There is no keeping credit from government employees. I am happy that this MD has finally come to terms with Mr. Murphy.

  6. Of course, over the years engineers would give scientific explanations to all this stuff that goes wrong:
    Shit always happens.
    It’s never easy.
    Things will always go wrong, at the worst time.
    The squeaky wheel always gets the most grease.
    One more quarter turn on the nut.
    If you hear a strange noise in the engine, run it faster until you can find the source (or the source finds you).
    (Similarly) No plan of battle ever survives contact with the enemy.
    Just hit it with the hammer right there >.

  7. I’ve been around long enough to realise that there are two major theories to watch:
    1. The Peter Principle: in an organizational hierarchy, every employee will rise or get promoted to his or her level of incompetence.
    and
    2. Murphy’s Law: Anything that can go wrong, will go wrong.
    Leading to Peter Murphy’s Law. Works every time!!

  8. Reminds me of my time working in oil refineries… The Fluid Catalytic Cracker, an enormous beast of refining technology, was always running very close to complete and utter disaster, and with less than one minute’s inattention by operators could end up over that line. It never happened, because robust and multileveled systems were in place. Operators who could not keep up with the program were either fired or shifted to less critical operations.

  9. Agree with the above comments.
    I guess we all think complex systems are so technologically advanced and so new we always feel safe.
    I am reminded of the fantastic new vessel, so wonderfully constructed, the most safe vessel ever made by man.
    Maiden Voyage-
    “Even God Would Not Be Able To Sink This Vessel.”

    Although Titanic had advanced safety features such as watertight compartments and remotely activated watertight doors, there were not enough lifeboats to accommodate all of those aboard due to outdated maritime safety regulations.
    Titanic only carried enough lifeboats for 1,178 people—slightly more than half of the number on board, and one-third her total capacity.
    The person who said God would/could not sink Titanic screamed ‘OH SHIT’ as it sank in iceberg waters created by God.

  10. I have read a lot of coroner reports on diving and ship fatalities. There is always a chain of events involved. As rescue divers it was part of our job to recognize a chain forming and interrupt it before things went bad.

  11. “7) Post-accident attribution to a ‘root cause’ is fundamentally wrong…
    No isolation of the ‘root cause’ of an accident is possible. The evaluations based on such reasoning as ‘root cause’ do not reflect a technical understanding of the nature of failure but rather the social, cultural need to blame specific, localized forces or events for outcomes.”
    Justin Trudeau: Take note!

  12. As a service technician I found that 100% of equipment failures had a human component, and about 75% were human-only failures that caused the equipment to fail. Human egos being what they are dictated that “reports” always needed to be clouded with BS so as not to lose the customer for future sales. So, the weak link is human, whether as operators or on the design end of things, and/or throw in a sales idiot, some engineers and the job just got more interesting:-))

  13. Thanks for this Kate – as a production process engineer/auditor it is priceless insight built from failure analysis.
    Great Xmas gift for any engineer.

  14. Intelligent engineers practice that all the time, and never have serious injuries.
    In modern bureaucracies people get promoted way above their level of incompetence where they sit on their fat asses avoiding anything that might jeopardize their golden handshake pension.

  15. Mike, you have summed it up, any more analysis is not necessary. I was wondering when someone would say what you did. I also say what can go wrong will go wrong no matter how many systems are in place.

  16. This reads like a doctor has just discovered ‘flight safety’. Despite great resistance, I do know of a doctor who has taken his flying experience into the operating room by using ‘CRM’ (Cockpit Resource Management). This is, simply put, where anyone observing a potential problem may raise the point; doctors in operating rooms tend to have large egos (like left-seat pilots) and don’t like underlings pointing out errors or oversights – but it is a necessary step in avoiding human error. Likewise, every aviator knows what a checklist is and most understand tool control, but these have, to my knowledge, only been recently introduced into the operating room. How many patients got stitched up with a sponge inside?

  17. Very nice article. The other day in the thread about the Sydney shooting I opined that after-action reports on dynamic entry shootings are always a waste of time. This paper covers that beautifully. If a dynamic entry is required, it’s already too late.
    The paper also covers -exactly- why centralized command and control systems over a fairly small size will and must always fail. Either decisions and actions are distributed, or the thing will fly apart. Because it has to.
    Pity we’ve had 200 years of ever-increasing propaganda on the joys of top-down governance in public life and in business. Think of all the freedom that would have been preserved and all the money we’d have made if this shibboleth had been put to the sword of reason as it should have been.

  18. I remember someone saying the most difficult thing in the world to design was a flat-end screwdriver, as you could not possibly anticipate all the things it would be used for.

  19. Great article, truly concise statement of the obvious.
    As noted by The Phantom, this is why big bureaucracy will always crash and burn.
    The chaos of complex systems is beyond the skill set of most all who desire to rule and regulate us all.
    But the religion of modern safety will never acknowledge this, hence there is always a guilty causing agent and more restrictions, impositions and “safeguards” must be implemented.
    Workers compensation should be forced to read that article as many times as necessary for their staff to understand, but never fear they will just up their rate of extortion while denying the obvious.
    Where government creates a pool of money, to be paid out under set circumstances, there will always be persons who will adjust their lifestyle to conform to those conditions.

  20. At our regional community hospital the top surgeon is also a pilot and widely respected. He introduced the ‘CRM’ approach to his surgical team many years ago and since then it’s become the standard policy here. We have some of the best surgical care in the country, and doctors want to come here to practice and raise families.

  21. I find myself nodding in agreement with the majority of comments. As Colin mentioned, there is a chain of events that leads to catastrophic failures and accidents. The cockpit (or crew as they now call it) resource management courses bring awareness to identifying and breaking the links in the chain of events that lead to accidents. The Dryden Fokker accident investigation brought about a major shift in thinking in Canadian aviation. Having flown aircraft that normally have redundant systems, with critical systems sometimes having triple redundancy, I know it is easy to become complacent while performing complex tasks.
    Modern day safety management systems, CRM courses and decision making courses all have some overlap with the above linked paper. A good read.
    Kate, Thanks for the link.

  22. I’m also a service tech and building systems inspector, and my experience has been the same. Design engineers never accept the blame for their faulty designs, to them it’s always the fault of the equipment manufacturer or the installers. The owners are often at fault for not following maintenance protocols. The trick is to write the reports so the data clearly show where the fault is, without getting personal and bruising anyone’s fragile ego.

  23. “The paper also covers -exactly- why centralized command and control systems over a fairly small size will and must always fail. Either decisions and actions are distributed, or the thing will fly apart. Because it has to.” And therein lies the central problem with a centralized control economy. And the essential strength of a pluralistic society.
    In Murphy I trust. The God of Abraham is sometimes erratic, but Murphy ALWAYS delivers!

  24. Thanks for the link Kate; I’ve passed it on to my professional friends.
    Great holiday ‘gift’.

  25. I’ve been lucky with engineers over time: worked with about 20 PhDs and only ever had to “slap” one down because of his arrogance, and he’s the only one who probably wishes he had never met me. He is also the one who is the most prominent in his chosen field. But that was many years ago when I worked in plastics:-))
    Generally I’ve had good luck with well educated people; it’s the lesser ones that were problematic, especially managers at all levels. They hated being told what was wrong, as they didn’t like to admit they had missed the obvious :-))

  26. This paper is essentially a guide book for failure mode analysis and a sales pitch for lean manufacturing and minimalist design – these findings validate the old KISS principle – the less complicated a design/system/technology statistically the fewer opportunities for error/failures.
    You can read a great deal of truth into applying their findings about bloated and complex systems never having catastrophic failure, just endless sub-system error and failure, none enough to kill the overall function of the larger system. But when this happens routinely (as it is certain to do in large complex systems), the end result is that the overall performance of the larger system is degraded to a point that makes its complexity redundant to its end function. IOW, a smaller, less complex, more efficient, easy-to-operate system would provide the same or better end performance as a large complex system with inherent internal failures: a real recommendation for small government.

  27. You are quite correct and I should point out that doctors as a group make the worst pilots. I presently own and fly one of the famous “Doctor Killers” the Beechcraft Bonanza. I used to fly the CF-104 known as the widow-maker. Canada lost over 100 104’s and 37 pilots, a number of whom were friends. The Germans lost over 200 aircraft and over 100 pilots. I know what you’re thinking; yeah, they’re big.

  28. As a former aviation engineer (“S” rating), I find the intelligence of some of the men and women from around the world who went into designing, building and maintaining these aircraft (most aircraft) amazing.
    It’s not so much about redundant systems, but more about (for my trade at least) accessibility to do proper maintenance. Designing the aircraft so that you have proper access to drill out certain fasteners, so you can remove parts to get access to the other parts for servicing or changing, is very impressive (although there just never seem to be enough inspection panels). E.g. putting a panel on the wing-to-fuse fairing, so you can inspect and if necessary drill out fasteners holding a part that is riveted to the inside of the plane without having to remove the entire fairing, is quite genius and shows how much people love their jobs. On the other hand you have the Fokker F-28!! OMG burn em all!!!
    I now work in the oil n gas industry, on the rig site, not on the rig. I gotta go to the rig every so often, and I find a lot of overlap there as well from the aviation industry. A lot of hobby pilots; directional drilling is pretty much the same as flying a plane.
    Here is something I heard in the oil industry that was commonplace in aviation: if you wouldn’t stick your D&ck in it, don’t put your hand in it. Be careful, look up and live, etc. Common sense and self-reliance are and always will be the best safety policy out there. If you want root causes of accidents and terrorism, ask Justin for his insight. And vote for him. Cuz you clearly need to be bridled and led around like a weak a$$ pu$$y!!

  29. “It’s not so much about redundant systems, but more about (for my trade at least) accessibility to do proper maintenance.”
    In my trade it’s not just accessibility, but ensuring that the maintenance actually gets done. The aircraft industry ‘paper trail’ is better than the building industry’s for good reason. Buildings won’t generally fall out of the sky if maintenance doesn’t get done, they just become unhealthy.
    The building industry has made significant advances in workplace safety in the past decade. Not just hard hats, but safety shoes, hi-viz vests, proper rigging when working at heights, etc. Accident rates have fallen, but any cost savings have been consumed by the ever-growing WCB bureaucracy.

  30. Agreed John,
    The aviation industry is just that anal retentive. It’s truly a bunch of OCD maintenance engineers that get horny when their symmetry checks are within .010 of an inch, but really, would you want anyone else working on your aircraft?
    Do me a favor: if anyone (especially Aircraft Maintenance Engineers, A.M.E.s) comes into your business and asks for a product, be it windows installed, siding installed, or a trailer to be made, don’t cut corners. They are extremely anal and can probably ply your trade far better than you. The problem is it will take 4 times as long and you will be bankrupt when they are done, but it will be perfect. And trust me, that’s all that matters to A.M.E.s. lol.

  31. Scott Montgomery said, “The more they overthink the plumbing, the easier it is to stop up the drain.”
    In other words, the more complex something is, the easier it is to gum up the works.
    Always use the KISS (Keep It Simple Stupid) Method.
    The more complex a project is, the more likely its implementation will be delayed and over budget.
    (These principles of system design and implementation have been kicking around in IT for years.)
