<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Joe's Blog</title>
 <link href="http://joeandmotorboat.com/atom.xml" rel="self"/>
 <link href="http://joeandmotorboat.com/"/>
 <updated>2026-01-05T18:01:14+00:00</updated>
 <id>http://joeandmotorboat.com/</id>
 <author>
   <name>Joe Williams</name>
   <email>williams.joe@gmail.com</email>
 </author>

 
 <entry>
   <title>Incomplete Thoughts</title>
   <link href="http://joeandmotorboat.com/2026/01/03/incomplete-thoughts"/>
   <updated>2026-01-03T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2026/01/03/incomplete-thoughts</id>
   <content type="html">
&lt;p&gt;Going into 2026 I’ve been trying to figure out how I feel about 2025. Here are some very rough and incomplete thoughts.&lt;/p&gt;

&lt;h3 id=&quot;earth-is-a-good-spot-to-live&quot;&gt;Earth is a good spot to live&lt;/h3&gt;

&lt;p&gt;Humans evolved to live on earth; it’s unlikely anywhere else in the universe will be more hospitable. If there is a place better than earth, it’s probably too far away. We should do everything we can to preserve earth, because all alternatives will be worse. Exploration and space travel should still be pursued, not to find Earth 2 because we’ve ruined our home but because we learn more about the universe, ourselves and how to preserve the earth we have and prevent its further ruination.&lt;/p&gt;

&lt;h4 id=&quot;further-reading&quot;&gt;Further Reading&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://news.harvard.edu/gazette/story/2013/04/not-as-evolved-as-we-think/&quot;&gt;Not as evolved as we think&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.sciencedaily.com/releases/2025/02/250214225042.htm&quot;&gt;Does planetary evolution favor human-like life? Study ups odds we’re not alone&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;human-intellegence-is-irreplaceable&quot;&gt;Human intelligence is irreplaceable&lt;/h3&gt;

&lt;p&gt;Humans exhibit broad intelligence in the mundane; just look at how much engineering, money and computing power are needed to write coherent sentences or drive a car. It’s not purely mechanical: knowing how to use a pencil does not make you an author. This “everyday intelligence” is undervalued by individuals and society.&lt;/p&gt;

&lt;h4 id=&quot;further-reading-1&quot;&gt;Further Reading&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.amazon.com/dp/1788730062&quot;&gt;The Eye of the Master: A Social History of Artificial Intelligence&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.technologyreview.com/2025/08/21/1122288/google-gemini-ai-energy/&quot;&gt;In a first, Google has released data on how much energy an AI prompt uses&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://archive.is/pEthw&quot;&gt;How much energy will AI really consume? The good, the bad and the unknown&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;dualism-for-non-dualists&quot;&gt;Dualism for non-dualists&lt;/h3&gt;

&lt;p&gt;No amount of data about me, be it writings, video recordings, brain scans or anything else fed into a computer will be me. It might be a high resolution 2D photograph of me but will never fully replicate nor replace me. To me the mind and body are the same highly integrated system and there is a material difference between what I do and the computer does, both in form and function, using data about me. Perhaps even this photograph of me is conscious but is still not me, it’s something new that may have many similarities. Similarly reproductions of art are not the same as the original. Like the camera and internet before it, AI is a copy machine. Invariably it will change the way we see art and ourselves.&lt;/p&gt;

&lt;h4 id=&quot;further-reading-2&quot;&gt;Further Reading&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.acsa-arch.org/proceedings/Annual%20Meeting%20Proceedings/ACSA.AM.113/ACSA.AM.113.19.pdf&quot;&gt;The Work of Art in the Age of Algorithmic (Re)production&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://crookedtimber.org/2026/01/02/a-note-on-the-threat-to-art-from-ai/&quot;&gt;A note on the threat to art from AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://web.mit.edu/allanmc/www/benjamin.pdf&quot;&gt;The Work of Art in the Age of Mechanical Reproduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;scaling&quot;&gt;Scaling&lt;/h3&gt;

&lt;p&gt;Scaling feels like a brute force approach that, as others have said, has diminishing returns. I’m not convinced the authors of Attention Is All You Need intended for us to bet the world economy on scaling alone, just that we can get pretty far in a wide variety of situations. That said, there’s lots of interesting software and hardware innovation that’s happened as a result, even if we ignore the wild infrastructure build out happening. Regardless, brute force techniques are a good place to start but inevitably give way to smarter and more advanced approaches. If an LLM comes up with a novel solution to a problem it’s not due to insight, intuition or understanding; it’s due to having a load of data and effectively trying everything until something works. Not to mention this is entirely based on the corpus of human works built over millennia of hard work, creativity and empathy.&lt;/p&gt;

&lt;h4 id=&quot;further-reading-3&quot;&gt;Further Reading&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://garymarcus.substack.com/p/scaling-is-over-the-bubble-may-be&quot;&gt;Scaling is over, the bubble may be deflating, LLMs still can’t reason, and you can’t trust Sam&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;Attention is all you need&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;ai-people-and-bitcoin-people-are-the-same-people&quot;&gt;AI people and bitcoin people are the same people&lt;/h3&gt;

&lt;p&gt;The AI data center builders’ goal is to “monetize energy”, no different than it was when their data centers ran bitcoin mines. Both in terms of energy and society’s data, AI is an extractive industry. AI in its current form is useless without all our data. This is similar to social media being useless unless we log in and post. Often this feels like work I perform to keep the computers and companies running rather than work the computers and companies are doing for me.&lt;/p&gt;

&lt;h4 id=&quot;further-reading-4&quot;&gt;Further Reading&lt;/h4&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.msn.com/en-us/money/markets/bitcoin-miners-thrive-off-a-new-side-hustle-retooling-their-data-centers-for-ai/ar-AA1SVZjT&quot;&gt;Bitcoin miners thrive off a new side hustle: Retooling their data centers for AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://archive.is/Ch3JL&quot;&gt;Bitcoin Miners Are Finding Promised AI Panacea Can Be Elusive&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
 </entry>
 
 <entry>
   <title>New Job Reading List</title>
   <link href="http://joeandmotorboat.com/2023/12/17/new-job-reading-list"/>
   <updated>2023-12-17T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2023/12/17/new-job-reading-list</id>
   <content type="html">&lt;p&gt;I left my &lt;a href=&quot;https://ngrok.com/&quot;&gt;job&lt;/a&gt; in September and I am about to start a &lt;a href=&quot;https://www.fastly.com/&quot;&gt;new one&lt;/a&gt; in January. The last few months I’ve spent most of my time wrangling a toddler, riding bicycles and taking care of household chores and projects. Important work but not geared towards thinking about how a packet gets from place to place in a correct, fault-tolerant and performant way. With a few weeks to go I have given myself an assignment of doing some reading to hopefully learn something new, learn more about the tech my soon-to-be employer has built and generally get the juices flowing. A self-directed “warm up” before the “race” if you will. What follows is that list; some of this is a refresher course and some is purely curiosity. Other parts are to learn something about a self-perceived weak spot in my knowledge (ahem … notice all those MPLS links at the bottom). Wish me luck! We’ll see how far I get during nap time and between diapers.&lt;/p&gt;

&lt;h3 id=&quot;fastly&quot;&gt;Fastly&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fastly.com/blog/turning-a-fast-network-into-a-smart-network-with-autopilot&quot;&gt;Turning a Fast Network into a Smart Network with Autopilot&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fastly.com/blog/how-fastly-protects-customers-from-ddos-including-rapid-reset-attack&quot;&gt;How Fastly Protects its customers from Massive DDoS threats including the novel Rapid Reset attack&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fastly.com/blog/traffic-delivery-reliability-improvements&quot;&gt;Fastly’s Fast Path Failover Technology Improves Deliverability&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.usenix.org/conference/nsdi21/presentation/landa&quot;&gt;Staying Alive: Connection Path Reselection at the Edge&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fastly.com/blog/building-and-scaling-fastly-network-part-1-fighting-fib&quot;&gt;Building and Scaling the Fastly Network, Part 1: Fighting the FIB&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fastly.com/blog/building-and-scaling-fastly-network-part-2-balancing-requests&quot;&gt;Building and Scaling the Fastly Network, Part 2: Balancing Requests&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;ipv6&quot;&gt;IPv6&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://majornetwork.net/2023/12/dhcpv6-relay/&quot;&gt;DHCPv6 Relay&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.apnic.net/2020/06/01/why-is-a-48-the-recommended-minimum-prefix-size-for-routing/&quot;&gt;Why is a /48 the recommended minimum prefix size for routing?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.apnic.net/2022/10/13/ipv6-extension-headers-revisited/&quot;&gt;IPv6 extension headers revisited&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.apnic.net/2023/08/28/behavioural-differences-of-ipv6-subnet-router-anycast-address-implementations/&quot;&gt;Behavioural differences of IPv6 subnet-router anycast address implementations&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.apnic.net/2023/11/17/ipv6-the-dns-and-happy-eyeballs/&quot;&gt;IPv6, the DNS and Happy Eyeballs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;dns&quot;&gt;DNS&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.apnic.net/2023/11/29/dns-at-ietf-118/&quot;&gt;DNS at IETF 118&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.netmeister.org/blog/https-rrs.html&quot;&gt;Use of HTTPS Resource Records&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/rfc9460/&quot;&gt;Service Binding and Parameter Specification via the DNS (SVCB and HTTPS Resource Records)
RFC 9460&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/zirngibl2023svcb.pdf&quot;&gt;A First Look at SVCB and HTTPS DNS Resource Records in the Wild&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2302.11393&quot;&gt;How Ready Is DNS for an IPv6-Only World?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;bgp-routing-anycast&quot;&gt;BGP, routing, anycast&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://ncs.uclouvain.be/phd/2023/10/04/wirtgen.html&quot;&gt;Improving the Agility of BGP Routing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://labs.ripe.net/author/hausheer/scion-a-novel-internet-architecture/&quot;&gt;SCION - A Novel Internet Architecture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/understanding-bgp-labeled-unicast/&quot;&gt;Understanding BGP Labeled Unicast&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.ipspace.net/2022/01/bgp-af-nerd-knobs.html&quot;&gt;Three Dimensions of BGP Address Family Nerd Knobs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/3452296.3472891&quot;&gt;Anycast In context: a tale of two systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;security&quot;&gt;Security&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://labs.ripe.net/author/kathleen_moriarty/security-control-changes-due-to-tls-encrypted-clienthello/&quot;&gt;Security Control Changes Due to TLS Encrypted ClientHello&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://zhauniarovich.com/publication/2023/anghel2023peering/anghel2023peering.pdf&quot;&gt;Peering into the Darkness: The Use of UTRS in Combating DDoS Attacks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2312.03305&quot;&gt;A path forward: Improving Internet routing security by enabling zones of trust&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2307.05936&quot;&gt;Introducing Packet-Level Analysis in Programmable Data Planes to Advance Network Intrusion Detection&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2310.17851&quot;&gt;Measuring CDNs susceptible to Domain Fronting&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;performance-congestion-etc&quot;&gt;Performance, congestion, etc&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.theverge.com/23655762/l4s-internet-apple-comcast-latency-speed-bandwidth&quot;&gt;The quiet plan to make the internet feel faster&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.microsoft.com/en-us/research/publication/achieving-high-utilization-with-software-driven-wan/&quot;&gt;Achieving High Utilization with Software-Driven WAN&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/3487552.3487860&quot;&gt;Examination of WAN traffic characteristics in a large-scale data center network&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/abs/10.1145/3230543.3230551&quot;&gt;AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/2999572.2999593&quot;&gt;ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/3544216.3544223&quot;&gt;Starvation in end-to-end congestion control&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/abs/10.1145/3452296.3472935&quot;&gt;AnyOpt: predicting and optimizing IP Anycast performance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;automation-configuration-verification-and-correctness&quot;&gt;Automation, configuration, verification and correctness&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2310.04641&quot;&gt;Towards Equitable Peering: A Proposal for a Fair Peering Fee Between ISPs and Content Providers&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2312.04159v1&quot;&gt;Zero-Touch Networks: Towards Next-Generation Network Automation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2204.09635&quot;&gt;LIGHTYEAR: Using Modularity to Scale BGP Control Plane Verification&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2311.00335&quot;&gt;BGP Typo: A Longitudinal Study and Remedies&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2201.03999&quot;&gt;CDN Slicing over a Multi-Domain Edge Cloud&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;monitoring-and-measurement&quot;&gt;Monitoring and measurement&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2310.17012&quot;&gt;Packed to the Brim: Investigating the Impact of Highly Responsive Prefixes on Internet-wide Measurement Campaigns&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ant.isi.edu/~johnh/PAPERS/Moura22a.pdf&quot;&gt;Old but Gold: Prospecting TCP to Engineer and Live Monitor DNS Anycast&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;load-balacing&quot;&gt;Load balancing&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/&quot;&gt;Unimog - Cloudflare’s edge load balancer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://engineering.fb.com/2018/05/22/open-source/open-sourcing-katran-a-scalable-network-load-balancer/&quot;&gt;Open-sourcing Katran, a scalable network load balancer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf&quot;&gt;Maglev: A Fast and Reliable Software Network Load Balancer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://thlujy.github.io/papers/Lujianyuan-sigcomm-sailfish.pdf&quot;&gt;Sailfish: accelerating cloud-scale multi-tenant multi-service gateways with programmable switches&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.usenix.org/conference/atc22/presentation/xu&quot;&gt;Hashing Design in Modern Networks: Challenges and Mitigation Techniques&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mpls&quot;&gt;MPLS&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/mpls-101-the-basics/&quot;&gt;MPLS 101 – The Basics&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/mpls-101-label-distribution-protocol-ldp/&quot;&gt;MPLS 101 – Label Distribution Protocol (LDP)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/mpls-101-dynamic-routing-with-bgp/&quot;&gt;MPLS 101 – Dynamic routing with BGP&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/bgp-lu-and-mpls-vpns/&quot;&gt;BGP-LU and MPLS VPNs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/mpls-101-mpls-vpns/&quot;&gt;MPLS 101 – MPLS VPNs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.dasblinkenlichten.com/fundamentals-of-mpls-lsps/&quot;&gt;Fundamentals of MPLS LSPs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://systemsapproach.substack.com/p/was-mpls-traffic-engineering-worthwhile&quot;&gt;Was MPLS Traffic Engineering Worthwhile?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
 </entry>
 
 <entry>
   <title>Failure Is Not An Option</title>
   <link href="http://joeandmotorboat.com/2020/12/28/failure-is-not-an-option"/>
   <updated>2020-12-28T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2020/12/28/failure-is-not-an-option</id>
   <content type="html">&lt;h2 id=&quot;failure-is-not-an-option&quot;&gt;Failure Is Not An Option&lt;/h2&gt;

&lt;p&gt;I ride my bike &lt;a href=&quot;https://www.strava.com/athletes/wjoe&quot;&gt;a lot&lt;/a&gt;. During most of 2020 much of that time has been solo riding. I got an Audible subscription and have been listening to books during my hours of solitude. Recently I listened to &lt;a href=&quot;https://www.amazon.com/Failure-Not-Option-Mission-Control/dp/1439148813/&quot;&gt;&lt;em&gt;Failure is not an option&lt;/em&gt;&lt;/a&gt; by Gene Kranz. Overall it’s an interesting insider’s perspective on flight control and the career of a flight director during the Gemini and Apollo missions. More than that, it has lessons for any team dealing with complex systems in high pressure environments. Throughout the book I heard interesting anecdotes that rang true for me, having been on numerous teams of folks running software and infrastructure.&lt;/p&gt;

&lt;h4 id=&quot;multiple-systems-connected-systems-are-a-single-system&quot;&gt;Multiple connected systems are a single system&lt;/h4&gt;

&lt;p&gt;Kranz talks about a mistake they made during &lt;a href=&quot;https://en.wikipedia.org/wiki/Gemini_8#Emergency&quot;&gt;Gemini 8&lt;/a&gt;. When considering the Gemini and Agena they initially thought of them as two different spacecraft and failed to see them as a single system when docked. This has the same vibe to me as the classic Leslie Lamport quote “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”. It really drives home that we need to take a holistic view of our systems and be considerate of emergent behavior that two or more interacting subsystems can create. Even relatively simple systems create behavior we couldn’t have imagined. I think one tool Kranz used to combat this complexity is ensuring his teams had &lt;em&gt;shared context&lt;/em&gt;.&lt;/p&gt;

&lt;h4 id=&quot;strong-teams-require-shared-context&quot;&gt;Strong teams require shared context&lt;/h4&gt;

&lt;p&gt;A number of times in the book Kranz talks about getting groups of subject matter experts working together. In every case, without saying it, he seems to be trying to create a shared context within the team. I think he realizes early on that it’s impossible for any single team member to understand the entire system. Beyond getting people talking and working together, one way he does this is to create his mission “binder”. The binder provides teams with a knowledge base about all parts of the system and mission they might need, right at their fingertips.&lt;/p&gt;

&lt;p&gt;A shared context not only ensures everyone is on the same page, it makes the team more resilient to unexpected situations. It allows team members that are experts on disparate parts of the system to understand how their subsystem interacts with the other subsystems. &lt;a href=&quot;http://joeandmotorboat.com/2020/03/03/thoughts-on-system-resilience-and-organized-complexity&quot;&gt;Teams of diverse expertise can solve problems of greater complexity&lt;/a&gt;, we see this today with interdisciplinary teams, devops, &lt;a href=&quot;https://cloud.google.com/blog/products/devops-sre/how-sre-teams-are-organized-and-how-to-get-started&quot;&gt;SRE embedding&lt;/a&gt; and &lt;a href=&quot;https://www.amazon.com/Range-Generalists-Triumph-Specialized-World-ebook/dp/B07H1ZYWTM/&quot;&gt;the rise of generalists&lt;/a&gt;. A shared context breaks down silos and is the glue that holds a team of specialists together.&lt;/p&gt;

&lt;h4 id=&quot;end-to-end-testing-includes-human-factors&quot;&gt;End to end testing includes human factors&lt;/h4&gt;

&lt;p&gt;It’s impossible to miss the amount of testing Kranz and his teams did prior to launch. There are &lt;a href=&quot;https://en.wikipedia.org/wiki/Design_review_(U.S._government)&quot;&gt;readiness reviews&lt;/a&gt; and seemingly endless simulations.  When each mission includes huge risks the only way to prepare is to simulate real world conditions, you can’t canary deploy in space. The &lt;a href=&quot;https://blogs.nasa.gov/waynehalesblog/2010/02/16/post_1266353065166/&quot;&gt;sim sup&lt;/a&gt; would devise numerous scenarios that mission control would need to work their way out of. In one serious but extreme simulation they even had a flight controller fake a heart attack. They saw the importance of viewing the human systems and mechanical systems as a single hybrid system that needed to be tested as one. What results is &lt;a href=&quot;https://dzone.com/articles/mechanical-sympathy&quot;&gt;mechanical sympathy&lt;/a&gt;, not just knowing how the system works but having an intuition about how it behaves, including the humans operating those systems.&lt;/p&gt;

&lt;h4 id=&quot;roll-backs-dont-exist&quot;&gt;Roll backs don’t exist&lt;/h4&gt;

&lt;p&gt;Another important lesson is that there’s no such thing as an &lt;em&gt;undo&lt;/em&gt; button. This becomes crystal clear launching spacecraft. Kranz commented that they had to continuously deal with each problem in a forward looking manner as it wasn’t possible to un-launch. Each flight was a series of go/no-go (or &lt;a href=&quot;https://www.archives.gov/exhibits/featured-documents/apollo-11-flight-plan&quot;&gt;stay/no-stay&lt;/a&gt;) decisions. Once the decision was made they had to figure out how to fix any problems that might arise, always moving forward towards the ultimate goal.&lt;/p&gt;

&lt;p&gt;I think this is true in software as well. We are trained to think the last change that was made was the one that introduced the problem. As a result we tend to think we can undo those changes by reverting our changes. Unfortunately, it’s a trap. Rolling back happens in linear time like all macroscopic events. Our mental model of code changes, in distributed systems speak, should be &lt;a href=&quot;http://www.bailis.org/blog/linearizability-versus-serializability/&quot;&gt;linearized&lt;/a&gt;. We might be able to revert the code we just deployed but we don’t always know the effect it’ll have on the system. In the Gemini 8 example above the &lt;em&gt;roll back&lt;/em&gt;, in this case undocking from the Agena, made matters worse for Gemini, increasing the rate the craft was spinning. When dealing with complex systems rolling back &lt;strong&gt;is&lt;/strong&gt; rolling forward, even if it’s the previous version.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Objects Of Power</title>
   <link href="http://joeandmotorboat.com/2020/09/30/objects-of-power"/>
   <updated>2020-09-30T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2020/09/30/objects-of-power</id>
   <content type="html">&lt;h2 id=&quot;objects-of-power&quot;&gt;Objects of Power&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://twitter.com/langdonw&quot;&gt;Langdon Winner&lt;/a&gt;’s &lt;a href=&quot;https://www.cc.gatech.edu/~beki/cs4001/Winner.pdf&quot;&gt;&lt;em&gt;Do Artifacts have Politics?&lt;/em&gt;&lt;/a&gt; (&lt;a href=&quot;https://pne.people.si.umich.edu/kellogg/069a.html&quot;&gt;synopsis of the essay&lt;/a&gt;) has been a bit of a touchstone for me in the last few months. The essay makes clear that the things humans make have both intended and unintended consequences. Some of these objects affect society by being built with expressly political design but others are innately political simply by existing and being used. The essay discusses a number of examples such as low overpasses &lt;a href=&quot;https://en.wikipedia.org/wiki/Robert_Moses#Racism&quot;&gt;Robert Moses&lt;/a&gt; built deliberately to limit public transit from reaching Long Island and hegemonic energy infrastructure such as nuclear vs more decentralized and democratic solar. However, I am interested in viewing this work through the lens of modern information technology and I think a lot of what Winner describes carries forward; the internet itself is a prime example.&lt;/p&gt;

&lt;p&gt;Notions of centralization and decentralization are common when talking about the structure of the internet. While the underlying infrastructure of &lt;a href=&quot;https://www.rand.org/about/history/baran.html&quot;&gt;the internet started as a distributed network&lt;/a&gt;, today it is far more centralized due to the economics of running data centers and networks. This pressure towards centralization shouldn’t be too surprising given the military origins of the internet. Going back to Winner’s essay, Jerry Mander is quoted regarding nuclear power:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“… if you accept nuclear power plants, you also accept a techno-scientific-industrial military elite. Without these people in charge, you could not have nuclear power.”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;During the creation of the &lt;a href=&quot;https://en.wikipedia.org/wiki/ARPANET&quot;&gt;ARPANET&lt;/a&gt; and the later internet it would not be a stretch to say &lt;em&gt;if you accept the internet, you also accept a techno-scientific-industrial military elite. Without these people in charge, you would not have the internet&lt;/em&gt;. The same is true for many foundational technologies we use today, such as &lt;a href=&quot;https://en.wikipedia.org/wiki/Global_Positioning_System&quot;&gt;GPS&lt;/a&gt;. Additionally, it’s arguable that the &lt;a href=&quot;https://www.washingtonpost.com/news/capital-business/wp/2014/03/05/why-ashburn-va-is-the-center-of-the-internet/&quot;&gt;“center” of the internet exists near the Dulles airport&lt;/a&gt;, and it’s not a mistake that it happens to be in the US. While the internet looks a lot different today, the vestiges of its military origins and centralized power remain.&lt;/p&gt;

&lt;p&gt;On top of this physical infrastructure we have built the web. Just having infrastructure isn’t very useful; we need applications and code running in those data centers to do things we want. At the application layer we are faced with the economic challenges and complexity of running distributed systems. As a result internet services tend to become centralized. A visceral example is that basically no one runs their own email servers these days, opting for a relatively small number of companies to do it for us. It simply doesn’t make sense for everyone to run an email server at their house. The result is a small number of companies dictating how we communicate, how we store all of our family photos, writing and anything else we want anyone, including our future selves, to experience or see.&lt;/p&gt;

&lt;p&gt;What we end up with is the internet built with political design and innately political services running on top of it. Society set the conditions for the internet to exist as it does today and the internet now affects society. I venture that this feedback loop carries a “memory” of past decisions and biases with it, something along the lines of &lt;a href=&quot;https://en.wikipedia.org/wiki/Long-range_dependence&quot;&gt;long-range dependence&lt;/a&gt;. I think &lt;a href=&quot;https://en.wikipedia.org/wiki/Butterfly_effect&quot;&gt;sensitive dependence on initial conditions&lt;/a&gt; likely also plays a role; without the cold war the internet would not exist as we know it today. &lt;a href=&quot;https://en.wikipedia.org/wiki/Conway%27s_law&quot;&gt;Conway’s Law&lt;/a&gt; suggests that organizations design things that mirror their organizational structure; as a result the military built the internet in its own image. It’s clear that the physical manifestation of the early internet has tangible and lasting effects on what would later get built on top of it and how society would use those services. The implications and &lt;a href=&quot;https://www.nytimes.com/live/2020/07/29/technology/tech-ceos-hearing-testimony&quot;&gt;politics&lt;/a&gt; of this now impact literally every facet of our lives.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Changing gears a little, I don’t think Winner could have anticipated how pervasive technology has become in the subsequent forty years since the essay was written but that doesn’t make the essay any less true. We all read articles reciting how &lt;a href=&quot;https://en.wikipedia.org/wiki/Ubiquitous_computing&quot;&gt;our phones are always-on, networked supercomputers&lt;/a&gt;. Many of us feel “naked” without them, unable to connect with anyone or anything that matters. In &lt;a href=&quot;https://www.amazon.com/Computational-Thinking-Press-Essential-Knowledge/dp/0262536560&quot;&gt;Computational Thinking&lt;/a&gt; the authors describe technology, and more specifically information technology and computing, as a &lt;em&gt;human multiplier&lt;/em&gt; allowing us to do more with less. In many cases this may be the same behaviors we have always done but now amplified in every way that matters, such as speed, effort and reach. I like this multiplier analogy because it drives home that technology is a tool that humans use for human purposes rather than it be technology only for technology’s sake. As a result technology is a powerful multiplier for our own biases, be it a bridge or a cloud service.&lt;/p&gt;

&lt;p&gt;Additionally, the internet enables &lt;a href=&quot;https://stratechery.com/aggregation-theory/&quot;&gt;zero distribution and transaction costs&lt;/a&gt;, and the power of that can’t be overstated. The fact that any digital product can be created and then downloaded, installed and used by just about anyone is remarkable and carries huge implications for society. I would go so far as to say that while the economics of running the cloud tend to create pressure towards centralization at the infrastructure level, zero distribution and transaction costs create decentralization from the perspective of who can be involved and what gets created. We see this tension in the &lt;a href=&quot;https://en.wikipedia.org/wiki/Epic_Games_v._Apple&quot;&gt;platform battles today&lt;/a&gt;. We have centralized platforms sandwiched between decentralized consumers and creators. Without the former the latter would be hard to find, consume and pay; without the latter the former would be a vacant strip mall. While there are caveats to the current system, it goes without saying that it’s never been easier to create something and get it into the hands of someone who might use it.&lt;/p&gt;

&lt;p&gt;So, what does all of this mean? The bottom line is that the biases we see in society, however egregious, can end up being reflected in the objects we build. Left unchecked these biases affect how those objects are used and by whom, prolonging whatever biases and assumptions they were built with well into the future. For instance, &lt;a href=&quot;https://www.bloomberg.com/news/articles/2017-07-09/robert-moses-and-his-racist-parkway-explained&quot;&gt;overpasses are rarely replaced&lt;/a&gt;. We must also be aware that technology can affect society not only by what it does but by how it’s used; banal technologies used in abhorrent ways are just as bad as abhorrent technologies used in banal ways. The combination of ubiquitous computing and zero distribution and transaction costs means we have more power than ever to multiply and distribute whatever we create. It’s important for everyone, as creators, designers and engineers, to be considerate of the impact of our work on society, to take an active role in checking our biases and to be aware of how the things we build get used. As the old saying goes, &lt;a href=&quot;https://en.wikipedia.org/wiki/With_great_power_comes_great_responsibility&quot;&gt;&lt;em&gt;with great power comes great responsibility&lt;/em&gt;&lt;/a&gt;. If we are not considerate in our creation we will build the internet equivalent of low overpasses for future generations.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>On A Plus Minus For Engineering Teams</title>
   <link href="http://joeandmotorboat.com/2020/09/28/on-a-plus-minus-for-engineering-teams"/>
   <updated>2020-09-28T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2020/09/28/on-a-plus-minus-for-engineering-teams</id>
   <content type="html">&lt;h2 id=&quot;on-a-plusminus-for-engineering-teams&quot;&gt;On a Plus/Minus for Engineering Teams&lt;/h2&gt;

&lt;p&gt;Management is rife with sports metaphors: we are all &lt;em&gt;team players&lt;/em&gt; trying to get our projects &lt;em&gt;over the line&lt;/em&gt;. Netflix’s CEO famously said they are a &lt;a href=&quot;https://hbr.org/2014/06/your-company-is-not-a-family&quot;&gt;sports team, not a family&lt;/a&gt;. We even go so far as to identify individuals and create roles on engineering teams that map onto sports team archetypes: someone might be a good &lt;em&gt;facilitator&lt;/em&gt; or &lt;em&gt;quarterback&lt;/em&gt;, or good in &lt;em&gt;clutch&lt;/em&gt; situations. What is less obvious, and less standardized, is how we evaluate individual and team performance. Many times quantifying performance is far more qualitative and mysterious than anyone in the process would prefer. How can we understand the impact someone has on their team, projects and organization as a whole the same way that a basketball team might? How can we ensure that their evaluation encompasses the myriad of ways they might contribute, rather than just counting things like commits or shipping code?&lt;/p&gt;

&lt;p&gt;First off, let me just say I don’t have the answer, but I think we as an industry can do better. A step in the right direction might be going beyond the sports analogy and evaluating individuals and teams using sports-like statistics. In basketball there is an all-encompassing statistic called &lt;a href=&quot;http://www.basketballinsiders.com/the-virtues-of-plus-minus-statistics/&quot;&gt;plus/minus&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“In its simplest form, plus-minus is exactly what it sounds like – when a given player is on the floor, be it for a single game, group of games or a season, does his team get outscored or does it outscore the opponent? This very simple metric is housed in most common single-game box scores, and is the rawest way of determining what sort of effect a player has on his team (and the opponent) while on the court.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“&lt;em&gt;… the general goal remains to contextualize the effect a player has on his team and opponents while accounting for as many situations and player combinations as possible. Rather than tracking what a player accomplishes individually, the idea is to determine what each individual player’s cumulative contribution has meant to what their team does while they’re on the floor.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most engineering teams don’t have direct opponents or games but we do have goals and projects, during which we are competing against time, costs and complexity. The rest of the analogy, and what plus/minus evaluates, still applies: rather than focusing on any individual statistic that an engineer produces, it shows how that individual impacts the team, regardless of the way they might contribute. It captures whether the individual is good at scoring (shipping), facilitating (helping others succeed), defense (avoiding pitfalls), clutch moments (during an outage) and any other possibility. This agnosticism is powerful because it abstracts away the details and focuses on impact.&lt;/p&gt;

&lt;p&gt;A vivid example of this is the story of Shane Battier, the &lt;a href=&quot;https://www.nytimes.com/2009/02/15/magazine/15Battier-t.html&quot;&gt;No Stats All Star&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Battier’s game is a weird combination of obvious weaknesses and nearly invisible strengths. When he is on the court, his teammates get better, often a lot better, and his opponents get worse — often a lot worse. He may not grab huge numbers of rebounds, but he has an uncanny ability to improve his teammates’ rebounding. He doesn’t shoot much, but when he does, he takes only the most efficient shots. He also has a knack for getting the ball to teammates who are in a position to do the same, and he commits few turnovers. On defense, although he routinely guards the N.B.A.’s most prolific scorers, he significantly reduces their shooting percentages. At the same time he somehow improves the defensive efficiency of his teammates — probably, Morey surmises, by helping them out in all sorts of subtle ways. “I call him Lego,” Morey says. “When he’s on the court, all the pieces start to fit together. And everything that leads to winning that you can get to through intellect instead of innate ability, Shane excels in. I’ll bet he’s in the hundredth percentile of every category.””&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“In his best season, the superstar point guard Steve Nash was a plus 14.5. At the time of the Lakers game, Battier was a plus 10, which put him in the company of Dwight Howard and Kevin Garnett, both perennial All-Stars. For his career he’s a plus 6. “Plus 6 is enormous,” Morey says. “It’s the difference between 41 wins and 60 wins.” He names a few other players who were a plus 6 last season: Vince Carter, Carmelo Anthony, Tracy McGrady.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Obviously a team of nothing but &lt;em&gt;Shane Battiers&lt;/em&gt; won’t win a championship or ship a new product, but the hard-to-quantify abilities of &lt;em&gt;Shane Battiers&lt;/em&gt; are critical to successful teams. &lt;a href=&quot;http://joeandmotorboat.com/2020/03/03/thoughts-on-system-resilience-and-organized-complexity&quot;&gt;Diverse teams can solve problems that specialized teams simply cannot.&lt;/a&gt; The bottom line is that we should all do better at identifying and rewarding the &lt;em&gt;Shane Battiers&lt;/em&gt; as we do the more obvious contributions of specialists and &lt;em&gt;high scorers&lt;/em&gt;, and that starts with improving how we evaluate each person’s impact.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Operational Vulnerability</title>
   <link href="http://joeandmotorboat.com/2020/04/20/operational-vulnerability"/>
   <updated>2020-04-20T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2020/04/20/operational-vulnerability</id>
   <content type="html">&lt;p&gt;In Benoit Mandelbrot’s seminal &lt;a href=&quot;https://www.amazon.com/Misbehavior-Markets-Fractal-Financial-Turbulence/dp/0465043577&quot;&gt;The Misbehavior of Markets&lt;/a&gt; suggest market behavior has five rules.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Rule 1 - Markets are risky&lt;/li&gt;
  &lt;li&gt;Rule 2 - Trouble runs in streaks&lt;/li&gt;
  &lt;li&gt;Rule 3 - Markets have a personality&lt;/li&gt;
  &lt;li&gt;Rule 4 - Markets mislead&lt;/li&gt;
  &lt;li&gt;Rule 5 - Market time is relative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The book focuses on the behavior of financial markets and on how using the rules can reduce society’s and an individual’s “financial vulnerability”. Reading through them I can’t help but see how they can be applied to web operations. Below I paraphrase the rules from the book and rework them with a focus on teams and the services they run, lastly introducing the idea of &lt;em&gt;operational vulnerability&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&quot;rule-1---markets-are-risky-systems-are-unstable&quot;&gt;Rule 1 - &lt;del&gt;Markets are risky&lt;/del&gt; Systems are unstable&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Extreme price swings are the norm in financial markets - not aberrations that can be ignored. Price movements do not follow the well-mannered bell curve assumed by modern finance; they follow a more violent curve that makes an investor’s ride much bumpier.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams building software that performs important, valuable services are by their nature unstable, that is, constantly learning, adapting and acting (i.e. they are solving &lt;em&gt;&lt;a href=&quot;http://joeandmotorboat.com/2020/03/03/thoughts-on-system-resilience-and-organized-complexity&quot;&gt;problems of organized complexity&lt;/a&gt;&lt;/em&gt;). This change and adaptation can make for a bumpier ride for the individuals on the team and the services they run.&lt;/p&gt;

&lt;p&gt;Teams and services that do nothing are stable, never requiring the stress of adaptation. This makes for a smooth ride but unfortunately there is no payoff for building teams and running services that do nothing.&lt;/p&gt;

&lt;h3 id=&quot;rule-2---trouble-runs-in-streaks-failures-come-in-waves&quot;&gt;Rule 2 - &lt;del&gt;Trouble runs in streaks&lt;/del&gt; Failures come in waves&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Market turbulence tends to cluster. This is no surprise to an experienced trader. … They know that when a market opens choppily, it may well continue that way. They know that a wild Tuesday may well be followed by a wilder Wednesday.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Errors, failures and outages tend to cluster. This is no surprise to an experienced manager or service operator. They know that when a service or team begins to have problems it may well continue that way. An outage Tuesday can lead to a cascade of failures Wednesday. A mismanaged project one quarter can lead to missed objectives in subsequent quarters.&lt;/p&gt;

&lt;h3 id=&quot;rule-3---markets-have-a-personality-systems-have-a-personality&quot;&gt;Rule 3 - &lt;del&gt;Markets have a personality&lt;/del&gt; Systems have a personality&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Prices are not driven solely by real-world events, news, and people. When investors, speculators, industrialists, and bankers come together in a real marketplace, a special, new kind of dynamic emerges – greater than, and different from the sum of the parts. … In substantial part, prices are determined by endogenous effects peculiar to the inner workings of the markets themselves, rather than solely by the exogenous action of outside events. Moreover, this internal market mechanism is remarkably durable.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Behavior of an internet service is not driven solely by real-world events and people. When management, sales, developers, security and operators come together to build a product, a special, new kind of dynamic emerges – greater than, and different from the sum of the parts. In substantial part, system behavior, in its broadest sense, from organization down to an individual team or service, is determined by &lt;em&gt;endogenous&lt;/em&gt; effects peculiar to the inner workings of the organization, team or service itself. This internal behavior is remarkably durable regardless of the purpose, type or scale, persisting through organizational tumult and refactoring.&lt;/p&gt;

&lt;h3 id=&quot;rule-4---markets-mislead-systems-mislead&quot;&gt;Rule 4 - &lt;del&gt;Markets mislead&lt;/del&gt; Systems mislead&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;“Patterns are the fool’s gold of financial markets. The power of chance suffices to create spurious patterns and pseudo-cycles that, for all the world, appear predictable and bankable. But a financial market is especially prone to such statistical mirages.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Patterns are the fool’s gold of observability. The power of chance suffices to create spurious patterns and pseudo-cycles that, for all the world, appear predictable and repeatable. Organizations, teams and individuals are especially prone to such statistical mirages. The size, shape and frequency of requests to one service isn’t identical to the next. The mitigation for a problem on one service does not work on the next. Building a product as a part of one team is nothing like building a similar product on another. It’s easy to trick ourselves into seeing a pattern when there is none. A given pattern may be helpful but it isn’t always repeatable or applicable in every situation.&lt;/p&gt;

&lt;h3 id=&quot;rule-5---market-time-is-relative-system-time-is-relative&quot;&gt;Rule 5 - &lt;del&gt;Market time is relative&lt;/del&gt; System time is relative&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;“There is what one may call the relativity of time in financial markets. … markets are operating on their own “trading time” – quite distinct from their linear “clock time” … This trading time speeds up the clock periods of high volatility, and slows down in periods of stability.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is what one may call the &lt;em&gt;relativity of time in teams and services&lt;/em&gt;. Teams and services operate in their own time – quite distinct from their linear “clock time”. “Team time” speeds up in times of organizational volatility, and slows down in periods of stability. “Service time” speeds up during outages and incidents and slows down in periods of stability.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;operational-vulnerability&quot;&gt;Operational vulnerability&lt;/h3&gt;

&lt;p&gt;Teams and services are eternally linked. Services don’t get built or run without a team of individuals to organize and do the work. Teams don’t exist without a purpose, and that purpose is to build and run a service or product. Understanding the rules that teams and services play by can help us to become more situationally aware. When we act on that awareness we can adapt and improve our response to incidents, understand the personality (i.e. emergent behavior) of the teams we belong to and the services we run, be less susceptible to being misled by blindly following patterns, and use our intuition to estimate the severity of the current situation. In doing so we reduce the risk of normal, everyday operations.&lt;/p&gt;

&lt;p&gt;As I define it, &lt;em&gt;operational vulnerability&lt;/em&gt; is the risk within the team and/or service that, when left unchecked, tends to create only more risk, leading to failure, outages and missed opportunities. Like in a financial market, operational vulnerability provides a spectrum of risk and reward. For instance creating a new product inherently introduces risk into the system but provides more value to the organization. As humans in these systems our first, perhaps only, job is to balance this tension.&lt;/p&gt;

&lt;p&gt;High operational vulnerability tends to manifest itself, similar to the &lt;a href=&quot;http://gunshowcomic.com/648&quot;&gt;“this is fine” comic&lt;/a&gt;, as stressed out teams trying to keep the lights on in a burning house. When a team or service is operationally vulnerable an otherwise small mishap can snowball into a cluster of failures. The burning house could be an existing service that is crumbling under its own weight or a poorly performing organization that isn’t giving the team the support it needs. Either way the risks of failure and missed opportunities for the team or service are increasing and mitigations are needed to bring back a healthy balance.&lt;/p&gt;

&lt;p&gt;Mild operational vulnerability tends to mean that a team or service can provide value while adapting and being resilient to failure. For instance, a service maintains its availability during a DDoS while preventing impact to downstream services, or a team delivers high-quality code in the face of personnel or organizational changes. Like a circuit breaker in a house preventing electrical fire, problems tend to be remediated before they cascade throughout the system and get out of control. There are risks but not so much as to dwarf the value generated by building and maintaining the system in the first place.&lt;/p&gt;

&lt;p&gt;No operational vulnerability means the team or service is quite literally doing nothing. All actions introduce risk and operational vulnerability. Without introducing risk we cannot build anything of value.&lt;/p&gt;

&lt;p&gt;Operational vulnerability &lt;em&gt;scales&lt;/em&gt;, from the single line of code to the entire organization. For the individual this could mean introducing technical debt to ship a product, deliberately increasing operational vulnerability, while adding more testing and validation of that code, decreasing operational vulnerability. For a manager this could mean finding ways to increase development velocity, while shielding a team from organizational politics so they can focus on getting work done. For leadership it could mean taking on a large, demanding customer while creating a culture of diversity, inclusion and support.&lt;/p&gt;

&lt;p&gt;As humans, at each layer in these systems, we can use the five rules to identify desirable system behaviors, balancing risk with reward. That means increasing operational vulnerability when the time is right, creating more opportunities for value, or decreasing it when the risk is too great to stomach, creating stability and resilience in the system.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Thoughts On System Resilience And Organized Complexity</title>
   <link href="http://joeandmotorboat.com/2020/03/03/thoughts-on-system-resilience-and-organized-complexity"/>
   <updated>2020-03-03T00:00:00+00:00</updated>
   <id>http://joeandmotorboat.com/2020/03/03/thoughts-on-system-resilience-and-organized-complexity</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;https://www.researchgate.net/profile/Seif_Haridi/publication/221047631/figure/fig1/AS:305582897680384@1449868048114/Randomness-versus-complexity-taken-from-Weinberg-38.png&quot; width=&quot;450&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Outages make me contemplative; every incident is an opportunity to learn and understand our environment better so we can become more resilient in the future. During a cross country flight yesterday &lt;a href=&quot;https://twitter.com/williamsjoe/status/1234610844463484928&quot;&gt;I re-read some of my guideposts&lt;/a&gt; on how to think about complexity and systems. These papers remind me both of how hard the problems we, society and more specifically people in technology, are trying to solve really are and that every individual on a team contributes to the expertise and diversity needed to combat complexity with complexity.&lt;/p&gt;

&lt;p&gt;I started with &lt;a href=&quot;https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf&quot;&gt;How Complex Systems Fail&lt;/a&gt; by &lt;a href=&quot;https://twitter.com/ri_cook&quot;&gt;Richard I. Cook&lt;/a&gt;. It’s a hit list for how the interactions in complex systems give rise to complexity and emergent and unexpected behaviors. Many of the items on the list will be familiar to us, such as &lt;em&gt;Change introduces new forms of failure&lt;/em&gt;. The paper ends on a positive note, to me at least, with &lt;em&gt;People continuously create safety&lt;/em&gt; and &lt;em&gt;Failure free operations require experience with failure&lt;/em&gt;. These remind me that we, as engineers, practitioners and leadership, are the only way that the system as a whole can improve and become more resilient to failure, through our ingenuity and experience.&lt;/p&gt;

&lt;p&gt;Next up, I was reminded of &lt;a href=&quot;https://en.wikipedia.org/wiki/Warren_Weaver&quot;&gt;Warren Weaver’s&lt;/a&gt; paper &lt;a href=&quot;https://people.physics.anu.edu.au/~tas110/Teaching/Lectures/L1/Material/WEAVER1947.pdf&quot;&gt;Science and Complexity&lt;/a&gt;. This paper, from 1948, digs into the role of science, its history and impact on society, and how complexity will lead to us needing a new way to solve hard problems. The paper categorizes the problems that science tries to solve into what Weaver calls &lt;em&gt;problems of simplicity&lt;/em&gt;, &lt;em&gt;problems of disorganized complexity&lt;/em&gt; and &lt;em&gt;problems of organized complexity&lt;/em&gt;. The first are straightforward problems of collection and classification that science addressed before the 1900s. The second are problems that have innumerable variables and interactions but can be addressed using statistical methods, such as averages; the example Weaver uses is that of a billiard table with millions of balls. The last category, &lt;em&gt;organized complexity&lt;/em&gt;, holds the problems that are most difficult to solve and that sit somewhere between the aforementioned simple and disorganized problems. What makes these problems hard to solve is that they can’t be solved by any specific technique; they also happen to be the problems whose solutions will be foundational to our progress as a species. On the bright side, organizations that prioritize collaborative openness and D&amp;amp;I efforts are on the right track:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“… in spite of the modern tendencies toward intense scientific specialization, that members of such diverse groups could work together and could form a unit which was much greater than the mere sum of its parts. It was shown that these groups could tackle certain problems of organized complexity, and get useful answers.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally I revisited &lt;a href=&quot;https://web.archive.org/web/20131008160618/http://www.sustainabilityinstitute.org/pubs/Leverage_Points.pdf&quot;&gt;Leverage Points: Places to intervene in a system&lt;/a&gt; by &lt;a href=&quot;https://en.wikipedia.org/wiki/Donella_Meadows&quot;&gt;Donella Meadows&lt;/a&gt;. This paper focuses on societal and economic examples but the lessons are applicable to any system. Meadows develops a list of common places to make changes to a system and how to get the response you want from your change. I personally like #6 &lt;em&gt;The structure of information flows&lt;/em&gt;: simply sharing information and visibility of a problem can make a big impact. Meadows uses an example of houses with their electric meters mounted in two different places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“There was this subdivision of identical houses, the story goes, except that for some reason the electric meter in some of the houses was installed in the basement and in others it was installed in the front hall, where the residents could see it constantly, going round faster or slower as they used more or less electricity. With no other change, with identical prices, electricity consumption was 30 percent lower in the houses where the meter was in the front hall.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Another powerful lever is #4 &lt;em&gt;The power to add, change, evolve, or self-organize system structure&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“The ability to self-organize is the strongest form of system resilience. A system that can evolve can survive almost any change, by changing itself.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An important lesson from this paper is that complexity and emergent behavior are hard to predict and can behave counter-intuitively, so we as the humans pulling the levers need to think critically about the consequences of our actions.&lt;/p&gt;

&lt;p&gt;Together these papers elucidate the difficult but tractable problems we have ahead of us. If we intend to change the future and the world with technology, and I’m hopeful that we can, then I think Weaver summed up our charge pretty well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;“In one sense the answer is very simple: our morals must catch up with our machinery. To state the necessity, however, is not to achieve it. The great gap, which lies so forbiddingly between our power and our capacity to use power wisely, can only be bridged by a vast combination of efforts. Knowledge of individual and group behavior must be improved. Communication must be improved between peoples of different languages and cultures, as well as between all the varied interests which use the same language, but often with such dangerously differing connotations. A revolutionary advance must be made in our understanding of economic and political factors. Willingness to sacrifice selfish short-term interests, either personal or national, in order to bring about long-term improvement for all must be developed.”&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
</content>
 </entry>
 
 
</feed>
