The Ubuntu "Circle of Friends" logo.
Depending on the kind of company you work at, it's either:
If you work at the first place, reach out to me on LinkedIn; I know some people who might want to work with you.
If you're at the third place, you should probably get out now. Whatever they're paying you, or however much the stock might be worth come the IPO, it's not worth the pain and suffering.
If you're at the second place, congratulations: you're at a regular, ordinary workplace that could do with a little better management.
A surprisingly great deal.
Whenever there's a security incident, there should be an investigation as to its cause.
Clearly the cause is always human error. Machines don't make mistakes; they act in predictable ways, and even when they are acting randomly, they can be stochastically modeled and their errors taken into consideration. Your computer behaves like a predictable machine, but at various levels it actually routinely behaves as if it's rolling dice, and there are mechanisms in place to bias those random results towards the predictable answers you expect from it.
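To make that "bias the dice" idea concrete, here's a toy Python sketch of one such mechanism, majority voting over redundant reads. The flip rate and function names are my own invention for illustration, not any real hardware's behaviour: each individual read of a stored bit can randomly flip, but a vote across three reads pushes the error rate down by orders of magnitude.

```python
import random

def noisy_read(stored_bit: int, flip_probability: float = 0.01) -> int:
    """Simulate a physical read that occasionally returns the wrong bit."""
    if random.random() < flip_probability:
        return 1 - stored_bit
    return stored_bit

def voted_read(stored_bit: int, copies: int = 3) -> int:
    """Read several redundant copies and take a majority vote,
    biasing the stochastic reads back towards the expected value."""
    ones = sum(noisy_read(stored_bit) for _ in range(copies))
    return 1 if ones * 2 > copies else 0

# A single read is wrong 1% of the time; the 3-way vote is wrong only
# when at least two of three reads flip, roughly 3 * 0.01**2 = 0.03%.
trials = 100_000
errors = sum(voted_read(1) != 1 for _ in range(trials))
print(f"voted error rate: {errors / trials:.5f}")
```

Real systems use subtler versions of the same trick (ECC memory, checksums, retries), but the principle is identical: wrap the randomness in a framework that yields predictable results.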
Humans, not so much.
Humans make all the mistakes. They choose to continue using parts that are likely to break because they are past their supported lifecycle; they choose to implement only part of a security mechanism; they forget to finish implementing functionality; they fail to understand the problem at hand; etc., etc.
It always comes back to human error.
Occasionally I will experience these great flashes of inspiration from observing behaviour, and these flashes dramatically affect my way of doing things.
One such was when I attended the weekly incident review board meetings at my employer of the time, a health insurance company.
Once each incident had been resolved and addressed, it was submitted to the incident review board for discussion, so that the company could learn from the cause of the problem and make sure similar problems were forestalled in future.
These weren't just security incidents; they could be system outages, problems with power supplies, really anything that wasn't quickly fixed as part of normal process.
But the principles I learned there apply just as well to security incidents.
The biggest principle I learned was "root cause analysis": that you look beyond the immediate cause of a problem to find what actually caused it in the long view.
At other companies, which can't bear to think that they didn't invent absolutely everything, this is termed differently, for instance "the five whys" (suggesting that if you ask "why did that happen?" five times, you'll get to the root cause). Other names are possible, but the majority of the English-speaking world knows it as "root cause analysis".
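As an illustration, here's what a five-whys chain might look like, expressed as a small Python sketch. The outage and every answer below are entirely invented, not from any real investigation:

```python
# A hypothetical "five whys" chain for an invented outage.
five_whys = [
    ("Why did the website go down?",
     "The TLS certificate expired."),
    ("Why did the certificate expire?",
     "Nobody renewed it."),
    ("Why did nobody renew it?",
     "The renewal reminders went to a former employee's mailbox."),
    ("Why were the reminders tied to one person's mailbox?",
     "There is no shared ownership of operational alerts."),
    ("Why is there no shared ownership?",
     "The organisation has no handover process when people leave."),
]

for question, answer in five_whys:
    print(f"{question}\n  -> {answer}")
```

Notice how the chain keeps going past the people involved until it reaches something organisational.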
This is where I learned that if you believe the answer is that a single human's error caused the problem, you don't have the root cause.
Whenever I discuss this with friends, they always say "But! What about this example, or that?"
You should always ask those questions.
Here are some possible individual causes, and some of their associated actual causes:
| Individual cause | Associated actual causes |
| --- | --- |
| Bob pulled the wrong lever | Who trained Bob about the levers to pull? Was there documentation? Were the levers labeled? Did anyone assess Bob's ability to identify the right lever to pull by testing him with scenarios? |
| Kate was evil and did a bad thing | Why was Kate allowed to have unsupervised access? Where was the monitoring? Did we hire Kate? Why didn't the background check identify the evil? |
| Jeremy told everyone the wrong information | Was Jeremy given the right information? Why was Jeremy able to interpret the information from right to wrong? Should this information have been automatically communicated without going through a Jeremy? Was Jeremy trained in how to transmute information? Why did nobody receiving the information verify it? |
| Grace left her laptop in a taxi | Why does Grace have data that we care about losing on her laptop? Can we disable the laptop remotely? Why does she even have a laptop? What is our general solution for people, who will be people, leaving laptops in taxis? |
| Jane wrote the algorithm with a bug in it | Who reviews Jane's code? Who tests the code? Is the test automated? Was Jane given adequate training and resources to write the algorithm in the first place? Is this her first time writing an algorithm, and did she need help? Who hired Jane for that position, and what process did they follow? |
I could go on and on, and I usually do, but if you ever find yourself blaming an individual and saying "human error caused this fault", remember that humans, just like machines, are random and only stochastically predictable; if you want predictable results, you have to have a framework that brings that randomness and unpredictability into some form of logical operation.
Many of the questions I asked above are also going to end up with the blame apparently being assigned to an individual; that's just a sign that the investigation needs to keep going until you find an organisational fix. Because if all you do is fix individuals, and you hire new individuals and lose old individuals, your organisation itself will never improve.
[Yes, for the pedants, your organisation is made up of individuals, and any organisational fix is embodied in those individuals, so blog about how the organisation can train individuals to make sure that organisational learning is passed on.]
Finally, if you'd rather I not use Ubuntu as my "circle of blame" logo, there are plenty of others out there, for instance, Microsoft Alumni: