Joe Williams

In his seminal The Misbehavior of Markets, Benoit Mandelbrot suggests that market behavior has five rules.

The book focuses on the behavior of financial markets, and using its rules can reduce both society’s and an individual’s “financial vulnerability”. Reading through them, I can’t help but see how they can be applied to web operations. Below I paraphrase the rules from the book and rework them with a focus on teams and the services they run, and lastly introduce the idea of operational vulnerability.

Rule 1 - ~~Markets are risky~~ Systems are unstable

“Extreme price swings are the norm in financial markets - not aberrations that can be ignored. Price movements do not follow the well-mannered bell curve assumed by modern finance; they follow a more violent curve that makes an investor’s ride much bumpier.”

Teams building software that performs important, valuable services are by their nature unstable: they are constantly learning, adapting and acting (i.e. they are solving problems of organized complexity). This change and adaptation can make for a bumpier ride for the individuals on the team and the services they run.

Teams and services that do nothing are stable, never requiring the stress of adaptation. This makes for a smooth ride, but unfortunately there is no payoff for building teams and running services that do nothing.

Rule 2 - ~~Trouble runs in streaks~~ Failures come in waves

“Market turbulence tends to cluster. This is no surprise to an experienced trader. … They know that when a market opens choppily, it may well continue that way. They know that a wild Tuesday may well be followed by a wilder Wednesday.”

Errors, failures and outages tend to cluster. This is no surprise to an experienced manager or service operator. They know that when a service or team begins to have problems it may well continue that way. An outage Tuesday can lead to a cascade of failures Wednesday. A mismanaged project one quarter can lead to missed objectives in subsequent quarters.

Rule 3 - ~~Markets have a personality~~ Systems have a personality

“Prices are not driven solely by real-world events, news, and people. When investors, speculators, industrialists, and bankers come together in a real marketplace, a special, new kind of dynamic emerges – greater than, and different from the sum of the parts. … In substantial part, prices are determined by endogenous effects peculiar to the inner workings of the markets themselves, rather than solely by the exogenous action of outside events. Moreover, this internal market mechanism is remarkably durable.”

Behavior of an internet service is not driven solely by real-world events and people. When management, sales, developers, security and operators come together to build a product, a special, new kind of dynamic emerges – greater than, and different from the sum of the parts. In substantial part, system behavior, in its broadest sense, from organization down to an individual team or service, is determined by endogenous effects peculiar to the inner workings of the organization, team or service itself. This internal behavior is remarkably durable regardless of the purpose, type or scale, persisting through organizational tumult and refactoring.

Rule 4 - ~~Markets mislead~~ Systems mislead

“Patterns are the fool’s gold of financial markets. The power of chance suffices to create spurious patterns and pseudo-cycles that, for all the world, appear predictable and bankable. But a financial market is especially prone to such statistical mirages.”

Patterns are the fool’s gold of observability. The power of chance suffices to create spurious patterns and pseudo-cycles that, for all the world, appear predictable and repeatable. Organizations, teams and individuals are especially prone to such statistical mirages. The size, shape and frequency of requests to one service aren’t identical to those of the next. The mitigation for a problem on one service may not work on the next. Building a product as a part of one team is nothing like building a similar product on another. It’s easy to trick ourselves into seeing a pattern when there is none. A given pattern may be helpful, but it isn’t always repeatable or applicable in every situation.
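To make the point about chance concrete, here is a minimal sketch of my own (it is not from the book). It simulates health checks that fail independently with a fixed probability, so by construction there is no pattern to find, and then measures the longest run of back-to-back failures. The function names, the 10% failure rate and the seed are all assumptions chosen for illustration.

```python
import random


def longest_streak(events):
    """Return the length of the longest run of consecutive True values."""
    longest = current = 0
    for failed in events:
        current = current + 1 if failed else 0
        longest = max(longest, current)
    return longest


def simulate_checks(n_checks, failure_rate, seed):
    """Simulate independent health checks (hypothetical model).

    Every check fails with the same fixed probability, so any streak
    that appears in the result is produced by chance alone.
    """
    rng = random.Random(seed)
    return [rng.random() < failure_rate for _ in range(n_checks)]


if __name__ == "__main__":
    checks = simulate_checks(n_checks=10_000, failure_rate=0.1, seed=1)
    print("longest failure streak:", longest_streak(checks))
```

Even though every check is independent, a long enough series will almost always contain a few consecutive failures that look like an incident with a cause. Seeing a streak on a dashboard is a reason to investigate, not proof of a repeatable pattern.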

Rule 5 - ~~Market time is relative~~ System time is relative

“There is what one may call the relativity of time in financial markets. … markets are operating on their own “trading time” – quite distinct from their linear “clock time” … This trading time speeds up the clock in periods of high volatility, and slows down in periods of stability.”

There is what one may call the relativity of time in teams and services. Teams and services operate in their own time – quite distinct from their linear “clock time”. “Team time” speeds up in times of organizational volatility, and slows down in periods of stability. “Service time” speeds up during outages and incidents and slows down in periods of stability.

Operational vulnerability

Teams and services are eternally linked. Services don’t get built or run without a team of individuals to organize and do the work. Teams don’t exist without a purpose, and that purpose is to build and run a service or product. Understanding the rules that teams and services play by can help us become more situationally aware. When we act on that awareness, we can adapt and improve our response to incidents, understand the personality (i.e. emergent behavior) of the teams we belong to and the services we run, be less susceptible to being misled by blindly following patterns, and use our intuition to estimate the severity of the current situation. In doing so, we reduce the risk of normal, everyday operations.

As I define it, operational vulnerability is the risk within a team and/or service that, when left unchecked, tends to create only more risk, leading to failures, outages and missed opportunities. As in a financial market, operational vulnerability provides a spectrum of risk and reward. For instance, creating a new product inherently introduces risk into the system but provides more value to the organization. As humans in these systems, our first, perhaps only, job is to balance this tension.

High operational vulnerability tends to manifest itself, similar to the “this is fine” comic, as stressed-out teams trying to keep the lights on in a burning house. When a team or service is operationally vulnerable, an otherwise small mishap can snowball into a cluster of failures. The burning house could be an existing service that is crumbling under its own weight or a poorly performing organization that isn’t giving the team the support it needs. Either way, the risk of failure and missed opportunities for the team or service is increasing, and mitigations are needed to bring back a healthy balance.

Mild operational vulnerability tends to mean that a team or service can provide value while adapting and being resilient to failure. For instance, a service maintains its availability during a DDoS attack while preventing impact to downstream services, or a team delivers high-quality code in the face of personnel or organizational changes. Like a circuit breaker in a house preventing an electrical fire, problems tend to be remediated before they cascade throughout the system and get out of control. There are risks, but not so many as to dwarf the value generated by building and maintaining the system in the first place.
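The household circuit breaker has a well-known software counterpart. As a rough sketch, assuming a single-threaded caller and thresholds I picked arbitrarily, a breaker that stops calling a failing downstream dependency might look like the following. The class and parameter names are hypothetical and do not come from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker skips a call to protect the downstream service."""


class CircuitBreaker:
    """Illustrative circuit breaker; names and defaults are assumptions."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching downstream.
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on too many failures, or re-open if the trial call failed.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a downstream request as `breaker.call(fetch_profile, user_id)` (where `fetch_profile` is a stand-in for any call that can fail) means a struggling dependency gets a pause instead of a pile-on, which is one small way to keep a mishap from cascading.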

No operational vulnerability means the team or service is quite literally doing nothing. All actions introduce risk and operational vulnerability. Without introducing risk we cannot build anything of value.

Operational vulnerability scales, from a single line of code to the entire organization. For the individual this could mean introducing technical debt to ship a product, deliberately increasing operational vulnerability, while adding more testing and validation of that code, decreasing operational vulnerability. For a manager this could mean finding ways to increase development velocity while shielding a team from organizational politics so they can focus on getting work done. For leadership it could mean taking on a large, demanding customer while creating a culture of diversity, inclusion and support.

As humans, at each layer in these systems, we can use the five rules to identify desirable system behaviors, balancing risk with reward. That means increasing operational vulnerability when the time is right, creating more opportunities for value, or decreasing it when the risk is too great to stomach, creating stability and resilience in the system.
