A working repository

Problem Timing

Differential Problem-Solving

A discipline for choosing which problems to attack now and which to defer, given that the cost of solving things is collapsing on multiple curves at once and AI is reshaping which problems matter.

"Problems are inevitable. Problems are soluble."

— David Deutsch
14 numbered files · 12 field guides · 9 audience one-pagers · 150 ranked projects

How to use this page

This is the entire Problem Timing repository on a single page — about ninety-five thousand words of framework, examples, lists, field guides, one-pagers, the academic paper, and the dual-use modification. Click any section in the sidebar to jump. Click any heading to expand or collapse. The interactive widgets are the diagrams, the scoring rubric, and the filterable lists.

Manifesto

Possibilities do not merely add up; they multiply.

— Paul Romer, on the combinatorial nature of new ideas

The cost of solving things is collapsing. Sequencing a genome, training a model, simulating a protein, launching a kilogram, reading a million pages — every one of these is on a curve, and the curves are steep. The world's hardest problems are not getting easier in the abstract; they are getting easier on a schedule. Most people working on them are not paying attention to the schedule.

This is the central observation, and it has consequences.

The first is that which problem you choose to attack is now a more important decision than how you attack it. For most of the history of work, the binding constraint was capability — could the thing be done at all, and could you do it. That constraint is releasing. The new binding constraint is selection. Of all the problems in the world, which one deserves your attention right now, given that next year a different one will?

The second is that good selection is unevenly distributed. Most institutions still allocate as they did when capability was scarce. PhD students inherit their advisors' priorities. Foundations chase causes that resemble their founder's biography. National science budgets reflect last decade's headlines. Companies fund the projects that made sense at their last strategy offsite. The result is a portfolio that is far from optimal — and getting further from optimal each year as the underlying tractability landscape moves faster than the funding does.

The third is that some of the most important problems in front of us are exactly the ones the consensus is mispricing. Hard problems whose difficulty is about to drop sharply. Boring problems whose cascade value is enormous and invisible. Old problems whose evidence base is closing. Moonshots whose 5-per-cent case is worth fifty times any safer bet. The people who can read the curves and act on them have an arbitrage opportunity unlike anything that existed when capability was scarce.

What we are calling Problem Timing is the discipline of reading the curves. The formal version is Differential Problem-Solving — the move, made by Bostrom for technologies, of asking not just whether something will be developed but in what order. The visual is the Tractability Frontier: the moving boundary of what can be cheaply solved at any given moment, with problems on one side worth attacking now and on the other side worth waiting on. The vocabulary, the dimensions and the worked examples are in the rest of this repository.

The argument is not that you should wait. The argument is the opposite: the cases where the framework says attack now are more aggressive, more confident and more counter-consensus than the consensus would tolerate. Attack now when the cascade is large. Attack now when demonstration is the unlock. Attack now when the window is closing. Attack now when the by-products of a brute-force attempt are valuable in their own right. Attack now when no one else is looking and the price of attention is mispriced. The framework is contrarian by construction; it tells you when to do the unfashionable thing.

But it also tells you when to stop. Stop hand-tuning what AI will do better next year. Stop transcribing what OCR will do for free in eighteen months. Stop building rule systems that will be replaced by foundation models. Stop running brute-force projects whose by-products are worthless and whose headline payoff is uncertain. Most institutions need this side of the framework more than the first. It is much easier to start a project than to stop one.

Three claims, then.

First, problem selection is the leverage point of the next decade. More than capability, more than capital, more than talent — though all three remain necessary. Among allocators of resource, the ones who think clearly about timing will out-perform the ones who don't, by margins that compound.

Second, the discipline of timing is teachable. Not as a calculator. As a habit of mind, with a vocabulary, a checklist of dimensions, a small library of canonical examples and a willingness to revise when the world surprises you. The framework here is meant to make the discipline legible, criticisable and improvable.

Third, the right time to build this discipline is now. The cost-collapse is happening, the misallocation is happening, the mispricing is real. A small group of allocators — funders, founders, researchers, AI agents acting on behalf of any of them — who think about timing seriously will move resources by orders of magnitude more efficiently than the consensus, and will pull a disproportionate share of the next decade's important results in their direction. We would prefer that group to be large, public, and arguing in the open. This repository is the start of that argument.

The world has more important problems than it has people who can recognise which ones to attack and when. If you are one of the people, this is for you.

— Siri Southwind

Background

If what you are doing is not important, and if you don't think it is going to lead to something important, why are you working on it?

— Richard Hamming, You and Your Research (1986)

The question

For most of human history, the dominant question facing anyone who wanted to be useful was can this even be done? Once a problem looked tractable, you attacked it. Resources were scarce, lives were short, the bar for "worth doing" was simply "achievable in our lifetime."

That question has not gone away, but it has been joined by another, and the second question is now often the more important one:

Given how fast the cost of solving things is falling, when is the right moment to attack this particular problem?

The right answer is sometimes "now, because it unlocks a cascade of other things." Sometimes it is "in three years, because the brute-force version we would build today will be obsolete before it ships." Sometimes it is "never, because the value is too small or the same effect will be a free by-product of something else." And sometimes — this is the case worth taking seriously — it is "now, even though the probability of success is small, because the payoff if we succeed is huge and the option will not exist later."

We do not, as a civilisation, ask this second question rigorously. Funders fund the same problems they have always funded. PhD students inherit their advisors' priorities. National science budgets reflect lobbying, prestige and last decade's headlines. Foundations chase the cause that most resembles their founder's biography. The result is a portfolio of work that is far from optimal — and getting further away each year as the underlying tractability landscape shifts faster than the funding does.

Why this matters now

Three things are converging.

Costs are collapsing on multiple curves at once. Sequencing a human genome cost roughly three billion dollars in 2003 and is now closer to two hundred dollars. Training a model with GPT-3-class capability cost tens of millions in 2020 and is now trivially affordable on consumer hardware for the same task. Cube-sat launches, custom protein synthesis, robotic manipulation, materials simulation — all on steep cost-decline curves with different slopes. The timing of when to attack a problem now depends on which curve it sits on, and we have no shared framework for talking about that.

AI is changing the cost of cognitive labour faster than any prior technology changed the cost of any prior input. Many problems that were "hard because someone has to read ten thousand pages and notice patterns" are about to become weekend projects. Problems that were "hard because the search space is too big for humans" are next. The phase change is uneven; some kinds of cognitive work are being automated essentially overnight while others remain stubbornly human for now. Knowing which is which — for your problem — is suddenly a high-value skill.

The opportunity cost of getting it wrong is rising. When everyone has a thousand-fold productivity multiplier on certain kinds of work, choosing badly costs a thousand-fold more. Every brilliant scientist hand-curating data that an LLM will sort in a second next year is a scientist not working on the next AlphaFold-shaped opening.

The asymmetry

The payoff structure is asymmetric in a way that matters.

The downside of attacking a problem too early is that you spend a lot to do something that will be cheap later. You waste resources but you usually do not destroy anything; often you generate a useful by-product (hardware, data, infrastructure, talent, evidence-of-possibility). The downside of waiting too long is that you forfeit a window — maybe forever, if someone else captures the position, or maybe for years, if the dependent problems behind your problem cannot move until yours does.

The upside of getting the timing right is enormous in both directions. Attack now and you sit at the head of a wave. Wait correctly and you pick up the same prize for a fraction of the cost.

This is fundamentally a real options problem. Each problem in the world is an option whose strike price (the cost to solve it) is changing over time. Most options are getting cheaper to exercise. A few are getting more expensive — knowledge being lost, ecosystems collapsing, witnesses dying, archaeological sites eroding. A small number are essentially constant. The job is to know which.

The ambition

What follows is an attempt to systematise the question. Not to replace judgement — the framework is not a calculator, and treating it as one would be worse than not having it — but to make the judgement legible, comparable, and arguable.

If it works, the framework should help:

  • A founder choose between two adjacent product ideas with different timing profiles.
  • A foundation decide whether to fund the deep, expensive version of a project or wait for the cheap version that may or may not arrive.
  • A government science agency justify why some fields deserve a brute-force sprint and others deserve patience.
  • An individual researcher pick the problem that, if solved, unlocks the most downstream work — including the Hamming question, what are the important problems in your field, and why aren't you working on them?
  • An AI agent decide what to allocate compute and tool calls to.

If it works really well, it should also support a market for problems — a way of pricing them — which is the most ambitious goal and probably the furthest from being usable. See Moonshots, arbitrage & markets.

What this is not

It is not an argument for waiting. Patience has its own cost, and most of the people drawn to this kind of thinking are already too patient. The framework is meant to find the cases where action now is correct precisely because the option will close, the cascade is large, or the demonstration of possibility is itself the unlock.

It is also not a moral framework. It does not tell you whether a problem is good to solve, only whether it is well-timed. Curing a disease and building a more addictive slot machine sit on the same axes here. Importance, in the moral sense, is exogenous to this framework — though the adjacent fields it borrows from take that question seriously.

Related framings worth knowing

The closest existing analogue is Nick Bostrom's Differential Technological Development: the idea that we should accelerate beneficial technologies relative to dangerous ones, rather than accepting whatever order they happen to arrive in. Differential Problem-Solving is the same move applied one level down: choose which problems get attacked when, given that the cost of attacking each one is moving on its own curve.

The Effective Altruism community's importance × tractability × neglectedness framework is also a close cousin. It does most of what is needed except for the time dimension, which is precisely where this framework adds something. A problem can be highly important, currently intractable, and crowded with people who will not solve it — and the right answer can still be either attack now or wait depending on how the tractability is moving.

Richard Hamming's You and Your Research is the spiritual ancestor. He was asking individual scientists why they were not working on the most important problems. The framework here is the same question scaled up to civilisations and across time.

See Intellectual lineage for a fuller map.

Dimensions

Important problems are problems that interest you and that you have a chance of doing something about.

— Richard Hamming

This section lists the dimensions along which a problem can be characterised. The point is not to score every problem on every axis — most are not worth that effort — but to have a checklist sharp enough that the interesting dimension for any given problem becomes obvious when you walk down it.

The dimensions are grouped into four families: cost and difficulty, value, time and curve dynamics, and strategic context. The boundaries are fuzzy and a few axes belong in more than one family.

Two dimensions deserve to be foregrounded. Cost trajectory (Family 3) and verification cost (Family 1) are the two most important variables in almost every interesting allocation decision today. Cost trajectory captures the framework's central temporal insight. Verification cost captures the generation–verification asymmetry — the fact that AI has made it dramatically cheaper to produce hypotheses, code, designs and analyses than to check them. The bottleneck has moved to the verification side and is staying there. A 2026 allocator who is not asking the verification question on every project is not yet using the framework.

Before walking down the families one at a time, the diagram below lays out the whole set on one page. Twenty-nine dimensions in four families. Two of them — verification cost and cost trajectory — are starred because they carry most of the weight in 2026 readings; the rest are modifiers. Most allocation decisions turn on two or three dimensions, not all of them, and the art is identifying which dominate for the problem in front of you.

29 dimensions across 4 families. ★ marks the two foregrounded for 2026 — verification cost in the cost-and-difficulty family, cost trajectory in the time-and-curves family.

Family 1 — Cost and difficulty

Verification cost. Once a candidate solution exists, how do you know it actually solves the problem? Some problems have cheap verifiers (chess, theorem-proving, protein folding once you have ground truth, mathematical proofs). Others have verifiers that are themselves hard problems (cancer-cure efficacy, education interventions, AI alignment, the truthfulness of an AI-generated report). High verification cost means slow learning loops, which compounds across time. In 2026 this dimension is doing more work than ever, because generation cost is collapsing while verification cost is essentially flat. AI systems can produce a thousand candidate hypotheses, designs, code patches or proofs in the time a human team can review one. The asymmetry has made verification the binding constraint on most ambitious projects, and a problem whose verifier is itself hard is now substantially worse-positioned than the same problem ten years ago. The framework's reading of a project should usually start here.

Difficulty today. What does it actually take to solve this with the best tools currently in existence? Not "could a sufficiently funded national lab do it" but "what is the realistic cost path from where we are now to a solved problem?" Express in some combination of money, person-hours, compute, exotic materials, regulatory consent and time-on-the-clock.

Required talent density. Some problems can be brute-forced by a thousand competent technicians. Others require a small handful of specific people who already exist and whose attention is the binding constraint. The two have very different timing profiles because talent density does not fall as fast as compute cost.

Capital intensity. Distinct from total cost: how much of the cost is up-front, sunk, and irreversible? A two-billion-dollar fab is a very different bet from two billion dollars of operating expense over a decade.

Coordination cost. How many independent actors must agree, contribute or stay out of the way for the problem to be solved? The Human Genome Project's coordination cost was a substantial part of its expense; for many software-shaped problems it is near zero.

Information completeness. Is the problem well-specified, or does the act of attacking it reveal that the problem itself was the wrong one? Problems that mutate as you attack them are far more expensive than nominal cost suggests.

Physical-resource dependency. What does the project require from the physical world that cannot be substituted away? Three sub-aspects matter. Energy intensity — what does the project cost in electricity, particularly at the scales the framework's larger bets imply (training runs, manufacturing, refining, refrigeration, propulsion). Atom-class dependency — does the project require specific elements (rare earths, lithium, gallium, helium, certain isotopes, particular biologics) whose supply is geographically or geologically concentrated. Permitted-action availability — does the project depend on physical actions (siting, construction, testing, animal use, environmental impact, dual-use export) that face binding regulatory or political constraints. When cognition is cheap, physical-resource dependency is increasingly the binding constraint, particularly in climate, materials, manufacturing, biotech and infrastructure.

Family 2 — Value

Direct value of a solution. What does the world look like immediately after the problem is solved, in concrete units (lives, dollars, watts, knowledge, time saved)? Be honest; most problems are smaller than their advocates claim.

Indirect / cascade value. What other problems become solvable, cheaper, or differently-shaped as a result? AlphaFold's direct value (predicting protein structures) is meaningful; its cascade value (accelerating drug discovery, enzyme design, basic biology) is multiples larger. Problems whose solutions sit upstream of many other problems deserve a multiplier.

Optionality value. Solving the problem may not produce direct or cascade value yet, but it creates the option to do something later — once a complementary technology arrives, once a market appears, once a regulator wakes up. Real options theory formalises this; see Models & scoring.

Demonstration value. Even an expensive, ugly, brute-force solution can be valuable purely because it proves the problem is soluble. The Manhattan Project, Apollo, the Human Genome Project and AlphaFold all derived a meaningful share of their value from removing the question of whether it can be done. Once that question is closed, the elegant version is much easier to fund and much faster to build.

Externalities. Positive (spillover learning, infrastructure, talent training) and negative (pollution, dual-use risk, attention diversion). Public funding decisions need this column more than private ones do.

Decay rate of value. Some solutions retain their value indefinitely (a vaccine for a stable pathogen). Others decay fast as the world moves around them (a faster transistor, a better recommendation algorithm). Decay rate interacts with timing — a fast-decaying value should usually be attacked later, not earlier.

Distributional shape. Is the value spread thinly over many or concentrated on few? This matters for who funds it, not whether it is worth solving.

Family 3 — Time and curve dynamics

Cost trajectory. How is the cost-to-solve changing year on year? Roughly flat? On a Wright's-law curve with a known doubling time? On an exponential cliff because of a specific underlying technology? This is the central new variable in the framework. A problem on a fast cost-decline curve has a high option value of waiting; a problem on a flat curve does not.

Tractability trajectory. Distinct from cost. The cost might be falling because compute is getting cheaper, or because the problem itself is becoming differently structured — for example because a complementary technology now exists that turns the original problem into a much smaller residual. AlphaFold did not make the protein-folding problem cheaper to brute-force; it changed what kind of problem it was.

Window. Is there a closing window in which the problem can be solved at all, or solved with current evidence intact? Examples: language documentation as the last speaker dies, archaeological sites before erosion, witness testimony, ice cores before glaciers retreat further, the genomes of soon-to-be-extinct species. A closing window is the single strongest argument for attacking now even if cost is falling.

Reversibility. If we delay and turn out to be wrong about the cost trajectory, can we still solve it later? Most problems yes; some no. Irreversible problems deserve more aggressive timing.

Time-to-impact. Even after a problem is solved, how long until the solution actually translates to value in the world? Some unlocks compound over decades (basic biology); some are immediate (a fix to a production bug). This matters for portfolio composition more than for individual problem selection.

Path-dependence on adjacent problems. Some problems can only be sensibly attacked after some other problem is solved. Crystallography enabled protein folding; machine-readable scientific corpora enabled modern AI. Problems sit in a dependency graph and the question is partly about ordering, not just selection.

Family 4 — Strategic context

Neglectedness. Borrowed from the EA framework: how many smart, motivated, well-resourced people are already working on this? A crowded problem has lower marginal return per additional resource even if the absolute value is high.

Comparative advantage. Who should solve it — the question is rarely "should this be solved" in the abstract but "should I, with my resources, solve it, given the alternatives?"

Information asymmetry. Are you among the few who understand the value, or is the value already priced in by the market for attention? The most productive bets often live in the cracks: problems that look small or unfashionable to the consensus but are obviously important to a small group with the right context.

Demonstrability of progress. Can you tell, from the inside, whether you are making progress, or only after you ship? Problems with poor mid-flight signals burn more resources than they should.

Reusability of by-products. If you fail at the headline problem, is the infrastructure / data / talent / methodology you build still useful? High reusability lowers the effective cost of trying.

Regulatory and political tailwinds. Some problems are technically solvable today and politically unsolvable until later, or vice versa. Pricing these wrong is a common failure mode, especially in biotech and energy.

Crowdability. Can the problem be sliced into pieces small enough that a large distributed effort (Galaxy Zoo, Folding@home, citizen-science transcription, prediction markets) can attack it? Crowdable problems have a much better cost trajectory than non-crowdable ones because the marginal contributor is essentially free.

Asymmetry of payoff. Is the distribution of outcomes roughly symmetric, or heavy-tailed? Heavy-tailed distributions justify portfolio strategies that a normal-distribution decision-maker would reject. See Moonshots, arbitrage & markets.

Strategic dual-use. Solving the problem may benefit you and equally benefit competitors or adversaries — or, in the worst cases, may benefit malicious users substantially more than legitimate ones. For most problems this is a minor consideration; for a small but consequential set it is the dimension that overrides the rest. The framework's specific treatment — including the four-category classification (benign-default, defender-favoured, attacker-favoured, symmetric), the cases where the standard cascade and demonstration readings invert for attacker-favoured problems, and practical advice for founders, investors, public funders and researchers in dual-use domains — is in Dual-use & catastrophic risk. The dimension belongs in the strategic family because the relevant question is not whether the technology is dual-use in the abstract but whether the ecosystem in which it would deploy contains the safety infrastructure to absorb it.

Visualising the dimensions: the Tractability Frontier

The dimensions describe individual problems one at a time. A useful way to see how the framework's verdicts compose is to plot many problems on a two-dimensional plane: cost-to-solve on one axis, direct value on the other. The boundary between cheaply-solvable and not-cheaply-solvable is what the framework calls the Tractability Frontier — a curve that shifts outward year by year as the underlying inputs get cheaper, complementary capabilities arrive, and yesterday's hard problems become today's weekend projects.

The Tractability Frontier — inside the curve, problems are cheap to attack now; outside, wait or brute-force; on the curve, the individual dimensions decide.

Inside the frontier, problems are cheap enough that the framework's verdict is usually attack now — most of the dimensions point the same way and the work is finding the angle. Outside the frontier — too expensive, too uncertain, too constrained by physical resources — the verdict is usually wait (if the curve is favourable) or open question (if it is not). On the frontier itself the verdicts are mixed and the individual dimensions matter most: a problem with a strong demonstration value or a closing window can be worth attacking even from the outside; a problem in a saturated category can be worth skipping even from the inside.

The frontier moves. Each year a band of previously-outside problems crosses to the inside, and the work that was unreasonable two years ago becomes the obvious bet today. The framework's central practical claim is that allocators who watch this movement carefully — and who act on it before the consensus does — will out-perform allocators who do not.

Using the framework

In practice, almost every interesting allocation decision turns on two or three of these dimensions, not all of them. The art is identifying which two or three dominate for the problem in front of you, and being honest about the others. A common pathology is to over-weight direct value and under-weight cost trajectory, which produces the classic "we sequenced the genome ten years too early" failure mode — except that, on closer inspection, the early sequencing produced enough demonstration value and infrastructure value that it may well have been the right call. (See Historical examples.)

Each dimension is, when read honestly, scenario-conditional: the score depends on an unstated forecast about how the world will unfold over the relevant time horizon. The Pierre Wack and Royal Dutch Shell tradition (covered in Intellectual lineage) is the explicit corrective. The recommendation in Models & scoring is to score across a small set of plausible scenarios rather than against a single implicit future, and to read the shape of robustness rather than a single number.

The next file proposes some mathematical models for putting these dimensions together, and a deliberately simple scoring scheme that resists over-quantification.

Models & scoring

This section collects the theoretical machinery worth borrowing, sketches a working scoring scheme, and proposes a retrospective score (the "stupidity index") for evaluating past projects with the benefit of hindsight. The emphasis throughout is on models that aid judgement rather than replace it. A spurious decimal place in a problem-allocation calculation can be more dangerous than an honest shrug.

The core decision

For any given problem the basic decision is:

  1. Attack now, with the current generation of tools and knowledge.
  2. Wait, on the bet that the cost will fall, the tools will improve, or someone else will move first.
  3. Attack a different problem, on the bet that another problem dominates this one on the relevant axes.
  4. Decompose: attack a sub-problem now, defer the rest.
  5. Brute-force the demonstration, then defer the production version (see Brute force vs elegance).

The right choice depends on how cost, value and probability of success move over time, and on the opportunity cost of the alternatives.

Model 1 — Real options

The simplest formal framing is to treat each problem as a real option whose strike price (cost to solve) is changing over time. The classical Black-Scholes intuition does not transfer directly — the volatility of "cost to solve a scientific problem" is not log-normal, and the underlying uncertainty is closer to Frank Knight's Knightian uncertainty (1921) than to the measurable risk that finance theory typically assumes — but the qualitative lessons do.

The value of waiting rises with:

  • the volatility of the cost trajectory (genuine uncertainty about when, and how far, it will fall),
  • the steepness of the expected cost decline,
  • the cost of failure if we attack early,
  • the existence of substitutes (someone else may solve it for free).

The value of waiting falls with:

  • the discount rate on the eventual benefit,
  • the closing of any natural window,
  • the cascade value foregone by delaying (downstream problems are blocked),
  • the demonstration value of being first.

A useful, deliberately rough heuristic: attack now if the cascade value plus the demonstration value exceeds the expected savings from waiting one tractability cycle.
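
In code, the heuristic is a few lines. The sketch below is illustrative only: the function name, the numbers and the simple exponential decline are all assumptions, and the units just have to be consistent.

    # A minimal sketch of the Model 1 heuristic: attack now if cascade plus
    # demonstration value exceeds the expected savings from waiting one cycle.
    # All inputs are hypothetical judgement calls, not computed quantities.

    def attack_now(cost_today: float,
                   annual_cost_decline: float,   # e.g. 0.40 = cost falls 40%/year
                   cycle_years: float,           # length of one tractability cycle
                   cascade_value: float,         # downstream value unlocked by moving early
                   demonstration_value: float    # value of proving possibility first
                   ) -> bool:
        cost_after_wait = cost_today * (1 - annual_cost_decline) ** cycle_years
        expected_savings = cost_today - cost_after_wait
        return cascade_value + demonstration_value > expected_savings

    # Hypothetical reading: a $10M problem on a 40%/year decline curve, with
    # $3M of cascade value and $2M of demonstration value, over a 3-year cycle.
    print(attack_now(10e6, 0.40, 3, 3e6, 2e6))   # False: waiting saves ~$7.8M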

Model 2 — Optimal stopping

The secretary problem and its relatives offer the right shape for one common case: a sequence of moments at which you could attack the problem, with the cost falling stochastically, and a hard deadline beyond which the problem is no longer worth solving. Optimal stopping says: observe for a while, learn the distribution, then commit at the first moment after which the expected cost is below your reservation price.

For research problems this becomes: do not be the first lab to commit (you pay too much) but do not be the third (the prize is gone). Be the second, with eyes open. This is not always achievable, but as a calibration it is useful.
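
A toy simulation makes the calibration concrete. Everything below is assumed for illustration: the 37% observation fraction is borrowed from the classical secretary result, and the cost process is an invented downward drift with multiplicative noise.

    # Observe-then-commit under the stated assumptions: cost falls
    # stochastically each period, there is a hard deadline, and you must
    # commit at some period's observed cost.
    import random

    def observe_then_commit(costs: list[float], observe_frac: float = 0.37) -> float:
        # Watch the first fraction of periods, set the reservation price at the
        # best cost seen so far, then commit at the first later period that
        # beats it (or take the final period if none does).
        k = max(1, int(len(costs) * observe_frac))
        reservation = min(costs[:k])
        for c in costs[k:]:
            if c < reservation:
                return c
        return costs[-1]

    random.seed(0)
    # 20 periods: cost trends down ~10% per period with multiplicative noise.
    costs = [100 * 0.9 ** t * random.uniform(0.7, 1.3) for t in range(20)]
    print(round(observe_then_commit(costs), 1), "vs best in hindsight:", round(min(costs), 1))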

Model 3 — The wait curve

A more practical visualisation. Plot the expected cost-to-solve over time on one axis, and the probability-weighted value on the other. The cost curve usually slopes down. The value curve sometimes slopes down too (decay of relevance), sometimes slopes up (the cascade matures), and sometimes has a step at the moment a complementary technology arrives.

The Wait Curve — when the rate of cost decline first falls below the rate of value decline (adjusted for the cascade of moving early), you are in the attack zone.

The right time to attack is roughly where the rate of cost decline first falls below the rate of value decline, adjusted for the cascade value of moving early. In most domains this is not a clean point; it is a region. Knowing you are in the region is often enough.
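
Locating the region numerically is straightforward once you commit to curves. In the sketch below both curves and the per-year cascade bonus are invented; the point is the comparison of rates, not the particular numbers.

    # A numerical reading of the wait curve with assumed illustrative curves.
    import numpy as np

    t = np.linspace(0, 10, 201)            # years from now
    cost = 100 * np.exp(-0.5 * t)          # expected cost-to-solve, falling fast
    value = 80 * np.exp(-0.08 * t)         # probability-weighted value, decaying slowly
    cascade_per_year = 2.0                 # assumed cascade value foregone per year of delay

    savings_per_year = -np.gradient(cost, t)                      # what waiting still buys
    losses_per_year = -np.gradient(value, t) + cascade_per_year   # what waiting costs

    # The attack zone begins where the savings rate first drops below the loss rate.
    in_zone = savings_per_year < losses_per_year
    print(f"attack zone begins around year {t[np.argmax(in_zone)]:.1f}")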

This is the curve I imagine when I read a research proposal. A solid proposal answers the question "why now" implicitly; a great proposal answers it explicitly.

Model 4 — Wright's law on the input side

For problems whose cost is dominated by a specific underlying input (compute, sequencing, photons, qubits, robotic hours), the cost trajectory often follows Wright's law: cost falls by a roughly constant percentage with each doubling of cumulative production. Once you know which input dominates the problem, you have a much better forecast of the cost trajectory than a naive linear projection would give you.

Sequencing followed Wright's law for almost two decades and beat Moore's law substantially. Compute follows scaling laws of its own. Solar panels, batteries, launch costs and gene synthesis all fit similar curves. Asking which input curve dominates is one of the more useful single questions to put to any problem.
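
The projection itself is two lines. In the sketch below the learning rate and the production forecast are assumptions the reader must supply; only the functional form is Wright's.

    import math

    def wrights_law_cost(cost_now: float, cumulative_now: float,
                         cumulative_future: float, learning_rate: float) -> float:
        # Cost falls by `learning_rate` (e.g. 0.20 = 20%) with each doubling of
        # cumulative production: C(x) = C0 * (x / x0) ** log2(1 - learning_rate).
        b = math.log2(1 - learning_rate)
        return cost_now * (cumulative_future / cumulative_now) ** b

    # Hypothetical: a $1,000 unit cost with a 20% learning rate, after cumulative
    # production grows 8x (three doublings): 1000 * 0.8**3 ≈ $512.
    print(wrights_law_cost(1000, 1e6, 8e6, 0.20))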

Model 5 — Differential acceleration

Bostrom's framing, applied at the problem level rather than the technology level. For each problem, ask not only "should this be solved" but "should this be solved before or after its neighbours in the dependency graph?" Solving Problem A before Problem B can produce vastly different worlds than solving them in the reverse order, especially when one of them is dual-use or carries systemic risk.

This dimension is hard to score in isolation but easy to surface in conversation, which is most of what a framework needs to do.

Model 6 — Importance × Tractability × Neglectedness × Timing

The EA framework with one term added.

  • Importance: how big is the prize?
  • Tractability: how much does an additional unit of resource move the needle?
  • Neglectedness: how few people are already on it?
  • Timing: how favourable is the moment, given the cost and tractability trajectories?

Multiplicative, not additive. A zero on any one term takes the product to zero, which is roughly correct: a problem that is huge but completely intractable, or huge and tractable but already crowded, deserves close to zero of your marginal resource regardless of how huge it is.
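
A two-row toy example shows what the multiplication buys. The scores are invented 0-3 readings.

    # Why the four terms multiply: additive scoring cannot zero out a
    # problem that fails completely on one axis.
    problems = {
        #                        (importance, tractability, neglectedness, timing)
        "huge but intractable":  (3, 0, 3, 3),
        "modest but well-timed": (2, 2, 2, 3),
    }
    for name, scores in problems.items():
        product, total = 1, 0
        for s in scores:
            product *= s
            total += s
        print(f"{name}: product={product}, sum={total}")
    # Additive scoring ranks the two equally (9 vs 9); multiplicative scoring
    # correctly sends the intractable one to zero (0 vs 24).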

A working scoring scheme

All models are wrong, but some are useful.

— George Box (1976)

I am suspicious of scoring schemes. They invite false precision, and the false precision invites the worst kind of decision-maker — someone who trusts the spreadsheet because it feels objective.

That said, a discipline of scoring forces a clarity of thought that pure prose does not. The compromise is a rough score on a small number of axes, on a 0-3 scale, with a written justification for each.

Suggested axes for a first pass, all on 0-3, ordered to reflect the framework's foregrounding of verification cost and cost trajectory as the two most important variables:

  1. Verification cost. 0 = unclear if you ever solved it. 3 = trivially verifiable. Leads the rubric because the generation–verification asymmetry is the central technical fact of the moment; a problem whose verifier is itself hard is now substantially worse-positioned than the same problem ten years ago.
  2. Cost decline rate. 0 = flat or worsening. 3 = collapsing fast on a known curve.
  3. Direct value (now). 0 = trivial. 3 = world-changing.
  4. Cascade value. 0 = none. 3 = unlocks an entire downstream field.
  5. Demonstration value. 0 = nothing learned by an early ugly attempt. 3 = proving possibility unlocks everything else.
  6. Window. 0 = will be there forever. 3 = closing now and irretrievably.
  7. Physical-resource dependency. 0 = essentially unconstrained by energy, atoms or permitted actions. 3 = severely constrained on at least one of the three. Note: this dimension is scored inverted relative to the others — high score is bad, low score is good.
  8. Crowding. 0 = saturated with capable teams. 3 = essentially no one.

The point is not to add the eight numbers. It is to look at the shape of the row. A problem that scores (3, 0, 3, 3, 1, 1, 0, 3) — easy to verify, flat cost curve, high direct value, high cascade, modest demonstration, no closing window, no resource constraint, no crowding — is a "do it now" problem. A problem that scores (1, 3, 2, 1, 0, 0, 2, 3) — hard to verify, fast falling cost, modest direct value, low cascade, no demonstration unlock, no closing window, real resource constraint, low crowding — is a "wait" problem regardless of how much you want to attack it.
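
The row-reading can be made mechanical. In the sketch below the axis order and the inverted seventh axis follow the list above, but the verdict rules are an illustrative simplification, not part of the framework.

    # Reading the shape of an eight-axis row (all axes 0-3; axis 7 inverted).
    AXES = ["verification", "cost_decline", "direct_value", "cascade",
            "demonstration", "window", "resource_dependency", "crowding"]

    def read_shape(row: tuple) -> str:
        d = dict(zip(AXES, row))
        if d["window"] >= 2:                 # a closing window dominates everything
            return "attack now (window)"
        if d["cost_decline"] >= 2 and d["cascade"] <= 1 and d["demonstration"] <= 1:
            return "wait (ride the curve)"
        if d["verification"] >= 2 and d["cascade"] >= 2 and d["resource_dependency"] <= 1:
            return "attack now"
        return "open question: read the row by hand"

    print(read_shape((3, 0, 3, 3, 1, 1, 0, 3)))   # attack now
    print(read_shape((1, 3, 2, 1, 0, 0, 2, 3)))   # wait (ride the curve)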

For attacker-favoured dual-use problems, the cascade and demonstration scores invert: a high score becomes a reason for caution rather than for confidence. The dual-use modification is in Dual-use & catastrophic risk.

Scoring across scenarios, not against a single forecast

Each of the eight dimensions, when the discipline is honest, is scenario-conditional. The cost trajectory of synthetic biology depends on whether biosecurity tightens dramatically over the next five years. The closing window for indigenous languages depends on whether ML-assisted documentation reaches the remaining communities before generational hand-off completes. The cascade value of fusion depends on whether grid-scale storage solves the intermittency problem on a different curve. A score that ignores its own conditioning is a number masquerading as analysis.

The discipline borrowed from the scenario tradition (introduced via Pierre Wack and Royal Dutch Shell in Intellectual lineage) is to score each dimension across a small set of plausible, internally-consistent scenarios — three is usually right, four when the field is genuinely contested — rather than against a single implicit forecast. The scenarios should differ on the variables that most plausibly drive the score, not on cosmetic surface features. The output of a scenario-conditional scoring run is not a single eight-number row but a small matrix: dimensions on one axis, scenarios on the other, with a robustness reading at the foot.

The most useful reading is then the shape of robustness. A bet that scores well in two scenarios and catastrophically in one is not the same as a bet that scores moderately well across all three. The first is a directional bet on a specific scenario; the second is a robust position in the framework's preferred sense. Allocators should be conscious of which they are choosing. The framework's recommendation across the portfolio is to over-weight robust positions in the patient-infrastructure share and to take directional bets only in the moonshot share, where the convex pay-off and the antifragile by-products together justify the variance.
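
A minimal version of the matrix, with invented scores and scenario names:

    # Scenario-conditional scoring: dimensions x scenarios, robustness at the foot.
    import numpy as np

    scenarios = ["regulation tightens", "status quo", "capability jump"]
    scores = np.array([[1, 2, 3],    # cost trajectory
                       [2, 2, 2],    # cascade value
                       [3, 1, 0]])   # window

    per_scenario = scores.mean(axis=0)   # how the bet reads in each world
    for name, s in zip(scenarios, per_scenario):
        print(f"{name}: {s:.2f}")
    spread = per_scenario.max() - per_scenario.min()
    print(f"worst case {per_scenario.min():.2f}, spread {spread:.2f}")
    # A wide spread marks a directional bet; a narrow spread, a robust position.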

The retrospective stupidity index

Apply the same scheme to historical projects with hindsight. The stupidity index is the gap between what we should have known at the time and what we did. Keep that judgement separate from the what-we-know-now score; many projects look stupid in hindsight only because the cost curve dropped faster than anyone could reasonably have predicted.

A few rough categories:

  • Vindicated brute force: high cost at the time, looked unreasonable, retrospectively unlocked enormous cascade value. Human Genome Project (probably). Apollo (debatable). AlphaFold's reliance on decades of crystallography data. ImageNet labelling.
  • Honest mistakes: looked reasonable at the time, turned out the cost curve was steeper than anyone predicted. Many early hand-curation efforts in NLP. Some early bioinformatics infrastructure that was rebuilt by GPUs five years later.
  • Wilful waste: should have been obvious at the time that the cost curve was about to drop. Hand-tuned chess engines after 1997. Manual transcription efforts begun after good OCR existed. Many enterprise machine-learning projects begun in 2017 that were rebuilt with off-the-shelf transformers in 2020.
  • Ongoing question marks: things we are doing now that may sit in any of the above categories in ten years. (See Historical examples.)

The honest answer for most projects is "we don't know yet." That is fine. The point of the index is to build the habit of asking, not to issue verdicts.

Prize systems as retrospective signals

Two external prize systems — the Nobels and the Ig Nobels — are useful sources of calibration data for the retrospective stupidity index. Both reward work that has been done. Both are partial. Both are worth reading carefully if you want to build a sense of how the framework's verdicts compare with those of practising scientific communities, and where the framework is at risk of producing readings the world will eventually find embarrassing.

The Nobels are a lagging indicator. They reward work that has already turned out to be high-cascade-value, often decades after the fact. The 2024 prize in chemistry for AlphaFold and protein design was unusually fast for the Nobels; thirty- or forty-year gaps between the work and the recognition are more typical. The 1962 prize for the structure of DNA recognised work done in 1953. The 2013 prize in physics for the Higgs mechanism recognised work done in 1964, and arrived only a year after the experimental confirmation. The pattern is consistent: the Nobel committee waits until the cascade has fired and the demonstration value is unambiguous to outside observers.

This makes Nobels a poor guide to current allocation. By the time a field is producing Nobel-level work the cascade has already fired and the obvious follow-on bets are crowded. As a guide to retrospective calibration, however, they are useful: the framework should recognise Nobel-winning work as having scored highly on the cascade and demonstration dimensions at the time it was done. If the framework's retrospective reading on a now-Nobel-winning project is poor, the framework probably has a bias the Nobel record can help expose. A worthwhile project for the retrospective stupidity index is to score every Nobel-winning piece of work from the twentieth century at the moment of the original work, and check whether the framework would have called it.

The Ig Nobels are the more interesting dataset. Most Ig Nobel-winning projects look ridiculous and are. Some look ridiculous and turn out to be unexpectedly important. The canonical case is Andre Geim, who shared the 2000 Ig Nobel for the magnetic levitation of frogs and then the 2010 Nobel in physics for the discovery of graphene — the same playful experimental sensibility, applied to two different problems, produced both outcomes. Several other Ig Nobel laureates have produced work the framework would have rated higher than the consensus did at the time: certain unusual materials investigations, several pieces of curiosity-driven biology that turned out to seed small but durable subfields, the magnetised cockroach studies and their downstream contribution to insect-locomotion research.

The Ig Nobels are a partial corrective to a real bias in the framework. The framework rewards problems with identifiable cascade or demonstration value — work whose importance you can articulate before doing it. Curiosity-driven work that does not have such a story in advance is systematically under-rated. The Ig Nobels are not exactly the missing dataset, but they are the closest publicly-available proxy for work that looked unimportant at the time and was not. Scoring the entire Ig Nobel back-catalogue retrospectively through the framework would produce a quantitative estimate of how often "this looks silly" predicted "this turned out to matter" — and a quantitative estimate of how often the framework, applied honestly, would have caught the importance the consensus missed.

For a working allocator, the practical implications are small but real. Reserve a portion of any portfolio — the framework's deliberately-unallocated curiosity-driven slice — for investigators whose track records include curiosity-driven results that have surprised the field. The Ig Nobel back-catalogue is one cheap source of such investigators. Treat the Nobel announcements each year as a calibration event for your own framework reading: did you flag the work in advance? Would the framework have? If not, where is the gap? A discipline of asking those two questions every October produces sharper allocation a year or two later. The pattern repeats across other prize systems too: the Turing Award, the Lasker, the Breakthrough Prizes, the Fields Medal — all are retrospective signals that the framework should recognise but should not chase.

The two failure modes

The framework guards against two opposite failure modes.

Premature optimism. "We can do it now" — true, but at fifty times the cost it will take in three years, with no demonstration value, no cascade value, and no closing window. This is the most common failure in well-funded research environments because the local incentives reward action.

Pathological patience. "We should wait until it's cheaper." True for the cost line, but the window is closing, the cascade is large, or the demonstration unlocks the next twenty problems. This is the most common failure in academia and in cautious foundations because the local incentives reward inaction.

A good framework, applied honestly, should produce roughly equal numbers of "go now" and "wait" verdicts. If it always says one or the other, it is being abused.

Brute force vs elegance

One of the more useful patterns in the historical record is the brute-force-then-elegance sequence. A problem looks intractable. An expensive, unglamorous, resource-intensive project demonstrates that it can be solved. Once possibility is established, a much cheaper, more elegant version follows, often quickly. The brute-force phase looks wasteful in retrospect — and is sometimes attacked at the time on exactly those grounds — but without it the elegant phase would not have arrived, or would have arrived later.

This section is about when that pattern applies and when it does not.

The pattern is worth seeing in the abstract before walking through the worked examples. It runs in two phases — an expensive, unglamorous demonstration that proves possibility, then a cheaper, more elegant version that consumes the substrate the demonstration produced. The diagram below sketches the shape; the cases that follow show what it has looked like in practice.

The two-phase pattern. Phase 1 is expensive and unglamorous; phase 2 is fast and elegant; the cascade between them transfers data, talent and infrastructure that would otherwise have to be built from scratch.

The pattern

The Human Genome Project sequenced one human genome at extraordinary cost. The cost per genome then fell roughly a million-fold over twenty years. The cheap phase followed the expensive phase, not the other way round.

ImageNet was an expensive labelling project. Modern foundation models are trained on vastly more data, often gathered cheaply by scraping. The cheap phase followed the expensive phase.

Tycho Brahe spent decades collecting precise astronomical observations by hand. Kepler used Brahe's data to derive his three laws in much less time. The expensive phase produced the data; the elegant phase produced the theory.

Apollo demonstrated that humans could go to the moon. Reusable rockets — a vastly cheaper approach to spaceflight — followed forty years later. The demonstration may not have been strictly necessary for the cheaper phase, but the institutional knowledge and the public willingness to fund spaceflight at all almost certainly were.

AlphaFold trained on a Protein Data Bank built over decades of crystallography. The cheap, elegant model could not have existed without the expensive, slow accumulation of ground-truth structures.

The pattern is not universal but it is common enough that ignoring it is a mistake.

Why brute force precedes elegance

Three reasons recur.

The data the elegant version needs has to come from somewhere. Modern machine learning is unusually hungry for data, and the data was almost always gathered painstakingly by humans for some other purpose first. The expensive phase produces the substrate that the cheap phase consumes.

Possibility has to be proven before it can be funded at scale. A problem that no one has ever solved is much harder to fund than a problem that someone has solved expensively. The brute-force solution closes the question of feasibility, and once that question is closed the second-generation funders show up. This is why Apollo mattered as a demonstration even though no one will ever use Saturn V technology again.

Talent flows toward visible problems. A field that has produced one celebrated success attracts ten times the talent of a field that has not. The brute-force success creates the recruiting story for the elegant phase.

When brute force is wrong

The pattern is not universal. Brute force is the wrong move when:

There is no information cascade. If solving the problem produces no data, no infrastructure, no talent and no reputation that the cheap version can use, then doing the expensive version first is just doing the expensive version. Hand-tuned chess engines after 1997 are an example: each iteration produced little that the eventual neural-network version (AlphaZero) could use.

The cost decline is fast and exogenous. If the cost is going to fall by an order of magnitude in three years for reasons that have nothing to do with whether you act, then attacking now buys you a three-year head start at ten times the cost. Often a bad trade. Many enterprise machine-learning projects begun in 2017 fall here.

The problem will be solved as a by-product of a different effort. Some problems become trivially solvable not because someone attacks them directly but because they fall out of an unrelated technology. Most of the "automate this transcription task" problems of the 2010s were solved as by-products of speech recognition and OCR research aimed at completely different markets.

Verification is so expensive that you cannot tell if the brute-force version is actually working. The point of a brute-force demonstration is to close the question of feasibility. If the demonstration is itself ambiguous, you have not closed the question, you have just spent a lot.

The vanity case

There is a category of brute-force project undertaken not because the framework justifies it but because the institution sponsoring it wants the prestige of having undertaken it. Big-science programmes can fall here, particularly ones with a famous PI and a vague enough goal that "success" can be declared regardless of outcome. The framework's purpose in these cases is partly to make the vanity legible, so that funders who care about it can choose it consciously rather than pretend it is something else.

A reasonable check: would a competing team of equal quality choose to attack this problem with this method, given the same budget? If the answer is no, the project is in part a vanity project. That is not necessarily fatal — vanity is a reliable funding mechanism — but it should not be confused with optimal allocation.

The decomposition move

Often the right answer is neither pure brute force nor pure patience but a decomposition. Identify the sub-problem whose solution unlocks the rest and brute-force that one; defer the others. The Human Genome Project effectively decomposed: it brute-forced the reference assembly and the sequencing infrastructure, then let the cost decline carry the field through to population-scale sequencing. ImageNet brute-forced the labels and let the algorithmic improvements come from the field.

The skill here is recognising which sub-problem is the linchpin. The framework's dependency-graph dimension (in Dimensions) is the relevant one: find the upstream node, attack it, watch the rest become tractable.

A heuristic

A working heuristic, useful for first-pass triage:

Attack with brute force when (a) the problem is upstream of many others, (b) demonstrating feasibility unlocks a discontinuous cascade, and (c) the by-products of the brute-force attempt — data, infrastructure, talent — are themselves valuable independent of whether the headline project succeeds. Otherwise, wait.

The third clause is the most important. Brute-force projects whose by-products are valuable are de facto cheaper than they look; brute-force projects whose by-products are not are essentially gambling on the headline outcome.
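
The heuristic compresses to a checklist function. The three booleans are judgement calls the framework asks you to make explicitly; nothing here is computed for you.

    def brute_force_verdict(upstream_of_many: bool,
                            demo_unlocks_cascade: bool,
                            byproducts_valuable: bool) -> str:
        # Clause (c) matters most: valuable by-products cap the downside.
        if upstream_of_many and demo_unlocks_cascade and byproducts_valuable:
            return "attack with brute force"
        if byproducts_valuable:
            return "maybe: by-products cap your downside; revisit the other clauses"
        return "wait: you would be gambling on the headline outcome alone"

    print(brute_force_verdict(True, True, True))    # attack with brute force
    print(brute_force_verdict(True, True, False))   # wait: gambling on the headline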

Implication for funders

For research funders, the practical implication is that willingness to fund expensive, ugly, demonstrational projects in fields whose cost curves are about to break is an underrated capability. Most funders cluster around the median of consensus, which means most funders are systematically underexposed to brute-force-then-elegance opportunities.

The exceptions — DARPA, the early HGP funders, certain private foundations, parts of the recent FRO movement — exist but are small relative to the opportunity. Building more of that funding capacity is one of the few interventions where the framework has a clear policy implication.

Moonshots, arbitrage & markets

This section covers three related topics: how to think about high-risk, high-payoff bets within the framework; who is in a position to arbitrage mispriced problems; and what a market for problems would have to look like to actually function.

The three sit together because all three involve the same underlying pattern — the gap between the consensus price of attention to a problem and the rational price of attention to a problem. Closing that gap is the practical upside of having the framework at all.

Moonshots inside the framework

A moonshot, in the sense Elon Musk and others use the term, is a project with low probability of success and very high payoff if it succeeds. The naive expected-value calculation often justifies them; the harder question is whether they belong in any particular allocation portfolio.

The framework treats moonshots as a special case of the asymmetric payoff dimension. A problem with a 5% probability of producing a million-times-baseline outcome is, in expected-value terms, equivalent to a problem with a 100% probability of producing a fifty-thousand-times-baseline outcome. The accounting is the same. The psychology is not, and most allocation systems are bad at moonshots for that reason.

A few things follow. The most useful intellectual reference here is Nassim Nicholas Taleb, whose Fooled by Randomness, The Black Swan and Antifragile together constitute the modern reference for thinking about heavy-tailed distributions and convex payoffs. The framework's moonshot logic is Talebian. So is its argument that the reusability of by-products dimension matters more than the consensus credits — an antifragile project gives you more on the failure path than a fragile one gives you on the success path.

Moonshots should be funded in portfolios, not individually. A single moonshot has a high probability of failing. A portfolio of twenty independent moonshots has a high probability of producing at least one large success. The right unit of analysis is the portfolio, and the right allocator is one who can stomach the variance. Taleb's barbell strategy — small bets on extreme upside paired with safe positions, no middle — is the same shape applied to portfolio construction.
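
The arithmetic behind the portfolio claim, with assumed numbers:

    # Twenty independent moonshots, each with a 5% chance of success.
    p, n = 0.05, 20
    print(f"P(at least one success) = {1 - (1 - p) ** n:.0%}")   # ~64%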

Moonshots benefit disproportionately from cheap by-products. The 5% case is the headline, but the 95% case still produces something — talent, methodology, partial results, infrastructure. A moonshot whose 95% case is a complete write-off is much worse than a moonshot whose 95% case produces a useful B-stream output. The framework's reusability of by-products dimension is what separates good moonshots from bad ones.

Moonshots should be matched to the right institution. Universities are bad at them (incentives reward incremental work). Most big companies are bad at them (incentives reward predictable returns). DARPA, Bell Labs in its heyday, certain foundations, and a small number of for-profit research organisations are good at them. Mismatching the project type to the institution is one of the most common allocation failures.

Peter Thiel's question, reformulated

What do you strongly believe to be true that very few other people believe?

In framework terms: which problems are the consensus mispricing right now, and in which direction? The answer is the moonshot or the contrarian wait. Both are bets against the crowd; the framework's job is to make the bet legible to the bettor before they place it, and ideally to give them a clearer stop-loss than vibes.

The Thiel question is not a framework on its own — it works only when paired with a willingness to be wrong. The framework's contribution is to translate the contrarian instinct into something testable: when will the consensus update, what would update it, and how would I notice if I were wrong?

The human colossus

The human colossus framing, Tim Urban's term for Musk's view that some problems are best solved by recruiting a large enough fraction of humanity's attention to bear on them, is correct but underspecified. The framework adds two things.

First, the colossus is not free. Pulling thousands of people onto a problem is itself a coordination cost, and the cost is paid by the next problem they would otherwise have worked on. Crowding effects are real; the marginal contributor on a saturated problem is worth less than the marginal contributor on a neglected one.

Second, the colossus model only works for crowdable problems. Some problems decompose well into independent units of work (Galaxy Zoo, Folding@home, Wikipedia). Others require a small group with concentrated context and cannot be sliced (theoretical physics, most basic mathematical research). Trying to apply the colossus model to non-crowdable problems wastes the colossus.

Who can arbitrage

The framework, applied honestly, identifies cases where the consensus is mispricing a problem. Several types of actor are in a position to act on that information.

Founders. Especially of deep-tech and research-driven companies. The framework is essentially a research-roadmapping tool, and the founders who do this kind of thinking explicitly tend to outperform those who do it implicitly.

Funders with patient capital. Foundations, family offices, certain sovereign funds and the new wave of long-horizon technology investors. The framework helps them justify counter-consensus bets to their boards.

National science agencies. In principle, governments are best placed to fund the brute-force-then-elegance projects whose payoffs accrue to society at large. In practice, most science funding is captured by lobbying, prestige and inertia, and the framework would imply substantial reallocation. This is politically hard but worth being honest about.

Individual researchers. The Hamming question — what are the most important problems in your field, and why aren't you working on them? — is exactly the framework applied at the individual scale.

AI agents allocating compute. A live and underdiscussed case. Increasingly, decisions about which problems to attack are being made by agentic systems with budgets of compute, money and tool calls. The framework is, in part, an attempt to articulate the decision rule those agents will need.

A market for problems

The most ambitious version of the framework is a market that prices problems explicitly. Several existing mechanisms partially do this.

Prizes. The Ansari and Lunar X Prizes, the Netflix Prize, Kaggle competitions. Useful when the problem can be specified precisely and the solution can be verified cheaply. Less useful for problems where defining "solved" is itself the hard part.

Advance market commitments (AMCs). Pioneered for vaccines, theoretically applicable to many other domains. A funder commits to buying any solution that meets specified criteria, removing the demand-side uncertainty. The framework's neglectedness and cascade value dimensions are exactly what AMC designers need to estimate when sizing the commitment.

Patent–prize hybrids. Michael Kremer's work on patent buyouts and alternative incentive structures for innovation. Moves some of the value from the patent-protected monopoly to the prize-funded payment, with consequences for who can use the result.

DARPA challenges. Goal-directed contests with substantial budgets and tight deadlines. The autonomous-vehicle Grand Challenges (2004, 2005, 2007) are the canonical example. Powerful, but only for problems already known to be close to the tractability frontier.

Focused Research Organisations (FROs). A newer model: time-limited, mission-specific research orgs, somewhere between a startup and a research institute. Designed exactly for the brute-force-then-elegance phase of the curve.

Prediction markets on problem-solvability. A genuinely new instrument that the framework would benefit from. If you could trade contracts on "Problem X will be solved at cost Y by date Z," you would have a real-time price for problem-difficulty trajectories. Existing prediction markets (Polymarket, Manifold, the older Augur and PredictIt) have not yet seriously attacked this domain. There are good reasons (problem definitions are slippery, settlement is hard), but the design is not impossible. A serious attempt would be valuable.

Why a real market is hard

A few obstacles deserve naming.

Defining the problem is half the problem. Markets need a settlement criterion. Most interesting problems do not have one until very late.

Time horizons are too long for most market structures. A market for problems whose solution arrives in fifteen years is hard to make liquid.

The relevant participants are not in the market. The best information about whether a problem will be solved cheaply in three years is usually held by a handful of researchers who have no instrument to express their view. Designing the disclosure incentives is not trivial.

Some problems should not be priced at all. Problems with severe dual-use risk, problems involving suffering of identifiable people now, problems whose solutions would be destabilising before complementary technologies arrive — these are cases where market efficiency is not the right metric. The framework should be honest that not every problem belongs in a market. The framework's specific treatment of dual-use and catastrophic-risk problems, including the four-category classification and the cases where the standard cascade and demonstration readings invert, is in Dual-use & catastrophic risk.

The arbitrage opportunity in practice

For now, the most productive uses of the framework are smaller-scale than a global market.

A single funder running a portfolio of twenty bets, scoring each one through the dimensions, will be better than the consensus on average — not because the framework is magic but because the consensus is sloppy.

A founder choosing between two product directions can use the framework to make the timing argument explicit, which usually surfaces the real bet hiding inside the choice.

A research institute can use the framework to defend an unfashionable bet to its board, which is more important than it sounds: most of the highest-value research bets are unfashionable until shortly after they pay off.

An individual choosing what to work on can use the framework to identify the small set of problems where the gap between what they know about the cost trajectory and what the consensus knows is largest. This is the closest individual analogue to the arbitrage trade.

In all four cases, the value of the framework is not that it gives the right answer. It is that it makes the question askable.

Historical examples

This section collects case studies and works through them with the framework. The point is partly to test the framework — does it make sense of the past? — and partly to build a reference library of examples that can be cited in shorter pieces.

The examples are clustered into broad types: vindicated brute force, demonstration unlocks, hard-to-call cases, and projects that turned out to be wasteful in retrospect or were obviously so at the time.

A note on certainty: I have tried to be honest about what is settled and what is contested. Several of the dollar figures and dates are widely cited but worth verifying before publication; the first verification pass and its findings are recorded in Methodology, and outstanding items are tracked in Open questions. Where the case is genuinely contested I have flagged it.

This section overlaps deliberately with the ranked lists in top_projects/. The lists rank cases through the framework with one-paragraph verdicts; this section works through a smaller set in more depth and shows the dimensional analysis at work.

Vindicated brute force

The Human Genome Project (1990–2003)

The HGP cost roughly three billion US dollars over thirteen years and produced the first reference human genome. By the time it finished, sequencing was already getting much cheaper; today a whole-genome sequence is a routine consumer product.

A naive analysis says we paid three billion for something that costs two hundred dollars now. A correct analysis is that we paid three billion to make it cost two hundred dollars now. The HGP forced the development of automated sequencers, generated the assembly software stack, trained a generation of bioinformaticians and produced the reference assembly that every subsequent sequencing run is aligned to. Without that demonstration, the cost curve would have started later and fallen more slowly.

Framework reading: high direct value, very high cascade value, deliberately accelerated the cost decline rather than ridden it, demonstration value enormous. A go now call that was correct.

The ImageNet labelling project (2007–2010)

Fei-Fei Li's team distributed annotation work to roughly forty-nine thousand Amazon Mechanical Turk workers across one hundred and sixty-seven countries between 2008 and 2010, producing labels for over fourteen million images across roughly twenty-two thousand categories. At the time this was an unreasonable amount of human effort to spend on a dataset, especially given that the field's then-dominant view was that better algorithms, not more data, would unlock progress.

The dataset turned out to be the substrate on which the deep-learning revolution ran. AlexNet (2012) demonstrated possibility; everything since has built on it. Without ImageNet, the demonstration would have happened years later or on a different architecture.

Framework reading: low headline difficulty, plausibly high cascade value (correctly priced by Li, mispriced by almost everyone else), no closing window — this was a pure cascade-value bet, won on the strength of one researcher's conviction.

Crystallography → AlphaFold

Decades of patient work on X-ray crystallography produced the Protein Data Bank, with structures of roughly two hundred thousand proteins by the time AlphaFold trained on it. Each structure had cost months of expert effort. A counterfactual world in which crystallographers had downed tools in 1990, reasoning that "one day a computer will solve this," would have left AlphaFold without training data and the field essentially stuck.

Framework reading: each individual structure was high-cost, low-direct-value at the time. The cumulative cascade value, realised forty years later, was enormous. Vindicated patience-with-action: the brute-force phase was not glamorous and not urgent, but it was the prerequisite.

The Apollo programme (1961–1972)

A contested case. Twenty-five billion dollars in 1960s money, roughly two hundred and fifty billion today. Direct value of going to the moon: contested. Cascade value: integrated circuits, materials science, software engineering practices, the modern systems-engineering discipline.

The honest framework reading is that Apollo was over-funded relative to its direct goal but produced cascade value that justified a substantial fraction of the spend. Whether it justified all of it remains a real argument. As a demonstration project it succeeded; as a programme of generic scientific yield it was probably suboptimal compared with what the same money might have done if differently directed.

Demonstration unlocks

AlexNet (2012)

The model itself was not technically novel — convolutional networks dated to the 1980s. What AlexNet did was prove that, with enough data and GPU compute, deep learning would beat hand-engineered computer vision by a wide margin. The demonstration was the unlock; everything since has been the cascade.

The right way to read AlexNet through the framework is as the moment a long-running cost curve crossed a threshold. Anyone applying the framework in 2011 would have rated computer vision as "wait — costs are falling fast." Anyone applying it in 2013 would have rated it as "go now, the demonstration is done and the cascade is starting."

AlphaFold (2018, 2020)

Same shape as AlexNet but for protein structure. The CASP competitions had been run for decades on the assumption that protein folding was a hard, slow, expert-intensive problem. AlphaFold 2 collapsed that view in a single year. The cascade — drug discovery, enzyme design, basic biology — is still unfolding and the framework would have flagged it correctly only if the dependency graph (sequence databases, crystallography ground truth, transformer architectures) had been read together.

LIGO and gravitational waves (2015 detection)

A project that took roughly forty years from concept to first detection, with extraordinary precision-engineering investment. Direct scientific value: confirmation of general relativity in a new regime. Cascade: a new branch of astronomy. Cost: substantial but justifiable on the strength of the unique evidence available no other way.

Framework reading: closing-window concerns were minimal (gravitational waves are not going anywhere), and the cumulative learning curve in precision interferometry will pay off across many adjacent fields. A patient brute-force project that was probably correctly timed.

Slow human work that paid off

The Oxford English Dictionary (1857–1928)

Seventy-one years and an army of unpaid volunteers reading and slipping quotations. Today, with full-text corpora and machine reading, the same job would take weeks. But the OED was the substrate on which lexicography became a science, and the methodology — dated citations from primary sources — is now the standard for any historical dictionary.

Framework reading: vindicated. The brute force was the only available method at the time, the cascade value was enormous, and the demonstration value (that a serious historical dictionary could be built at all) shaped the discipline.

Linnaean classification

Carl Linnaeus and his successors hand-classified hundreds of thousands of species over two centuries. Today, DNA barcoding can classify in hours what once took years. Yet the classification scheme, the Latin binomials and the museum collections built around them remain the spine of modern biology. Without them, the genomic era would have lacked a vocabulary.

Decipherment of Linear B (Ventris, 1952)

A few people, decades of pattern-matching, no machinery. The unlock — recognising Linear B as an early Greek script — opened a new region of Bronze Age history. A modern attempt would use computers and would probably have succeeded earlier; whether the later, cheaper version would have produced the same depth of scholarship is harder to say. Pure pattern-recognition problems on small corpora are now firmly on a fast cost-decline curve.

Pottery shard reassembly

Tens of thousands of hours have been invested in physically reassembling pottery from Greek, Roman, Egyptian and other sites. Computer vision and learned 3D shape-matching are now meaningfully better than humans at the geometric piece of the job. The remaining question is whether the interpretive work — deciding what a re-assembled vessel meant in its context — should also be deferred. Probably not: the interpretive layer benefits from the cheap reassembly, and the human time saved should be redirected, not eliminated.

Framework reading: the manual reassembly era was justifiable up to roughly 2015 and is largely no longer the right allocation of expert time. The shift is uneven across institutions; framework-aware funders could create real value by accelerating it.

Manuscript stylometry

Was it Shakespeare? Did Paul write Hebrews? Centuries of careful scholarly attribution work, increasingly augmented and in some cases resolved by computational stylometry. The framework reading is mixed — for the canonical questions the answer is "the cheap version has arrived, redirect human attention to the harder questions of what to make of the answer." The interpretive cascade is large; the brute-force philological phase is mostly over.

Mass digitisation

Google Books and Project Gutenberg

A bet that digitised text would be valuable enough to justify the legal and operational cost. At the time, the cascade value was unclear; today, the cascade includes essentially every modern language model, vast quantities of cultural and historical scholarship and improved access to old works.

If Google Books had not existed, modern AI would have been delayed by some years and modern humanities scholarship would be substantially poorer. The right framework reading is go now, with the cascade only fully visible fifteen years later. Worth flagging that the legal and ethical complexity around the project was real and is still being negotiated.

The Vatican Library digitisation; the Smithsonian's collection scans; herbarium digitisation

Ongoing programmes to digitise old books, specimens and artefacts. Each one is a brute-force project today; each one will produce cascade value as AI systems become better at extracting structure from those scans. The framework reading is go now for the core scanning work, wait for any expensive interpretive layer that can be applied retrospectively.

Citizen science and crowdsourcing

Galaxy Zoo (2007– )

Hundreds of thousands of volunteers classifying galaxy morphologies. Direct value: a labelled dataset. Cascade value: training data for the machine-learning systems that have since taken over the routine classification work, freeing humans for the genuinely ambiguous cases. A clean example of crowdable problem at the right moment.

Folding@home, BOINC

Distributed computing for protein folding and other simulation problems, predating modern AI methods. Some of the work has been superseded by AlphaFold; some — molecular dynamics and exotic systems — remains the right tool. Framework reading: the folding part of the portfolio was correctly executed but is now mostly cheap; the dynamics part is still on a slow curve and worth continuing.

Probably wasteful in hindsight

Astrology natal-chart computation

Centuries of human effort. Direct value: zero in any honest accounting. Cascade value: arguably non-trivial in that the calculation needs of astrology drove improvements in observational astronomy and trigonometry. As a problem-allocation matter, the cascade was a positive externality of an essentially worthless headline goal.

Hand-tuned chess engines after Deep Blue (1997)

The framework reading would have flagged this as "stop" by 1999 at the latest. Some hand-tuned engines persisted for another decade with diminishing returns. AlphaZero (2017) closed the question entirely.

Many enterprise NLP projects, 2015–2020

Hand-rolled rule-based systems for entity extraction, sentiment analysis and document classification, often built at cost of millions, frequently rebuilt three years later on top of off-the-shelf transformers. A consistent failure of timing thinking inside large organisations.

Some manual cell counting in microscopy

A staple PhD task for decades. Largely automated by modern image analysis. The framework reading would have predicted this earlier than the field admitted.

Hard to call

The Manhattan Project

Direct value: contested. Cascade value: enormous, and positive or negative depending on the accounting (nuclear weapons on one side; nuclear energy, naval propulsion and radiation medicine on the other). Framework reading: the project was rationally timed if you accept the strategic premise; whether the strategic premise was correct is outside the framework's scope.

SETI

Decades of patient listening with little to show. The framework reading depends entirely on priors. If you think the probability of detection is non-zero and the value is enormous, the cost has been justified. If you think the probability is zero, no cost is justified. SETI is a clean illustration of how the framework's outputs depend on inputs that are not themselves objective.

The Higgs boson at the LHC

Roughly five billion dollars to confirm a particle predicted decades earlier. Direct value: closing a hole in the Standard Model. Cascade value: precision-engineering and computing infrastructure, but more modest than for HGP or LIGO. Probably justifiable, possibly overspent.

Manual road mapping and navigation pre-2005

The work that built the underlying maps that Google and others later digitised was vast. In retrospect, it was on the right side of the cost decline — without it, modern navigation would have started from scratch.

Patterns that emerge

A few patterns recur across the cases above and are worth stating explicitly.

The first is that demonstration value is consistently underweighted by sceptics and consistently overweighted by enthusiasts. The HGP, AlphaFold, AlexNet, Apollo and LIGO all derive a substantial share of their value from removing the question of whether something is possible.

The second is that cascade dependencies are usually invisible until they fire. No one in 1990 was funding crystallography because of its eventual contribution to AlphaFold. The framework's most important practical use may be the discipline of asking what would this enable that we cannot currently do? — even when the answer is speculative.

The third is that the right time to attack a problem is often before the curve has clearly turned. By the time everyone agrees the curve has turned, the prize is gone. The pattern is: do the unreasonable brute-force version that proves the curve is about to turn, and the elegant version follows almost automatically.

The fourth is that retrospective verdicts are unreliable. Several projects that look obviously correct now (HGP) were heavily contested at the time, and several that look obviously wasteful now (hand-tuned chess engines after 1997) were defended as recently as 2010. The framework should produce verdicts that are more confident than the consensus when it has reason to be, and less confident when it does not.

To add

Many further examples are worth working up. A non-exhaustive list, for Open questions:

  • The Encyclopædia Britannica versus Wikipedia.
  • The Tycho Brahe → Kepler arc as the original brute-force-then-elegance archetype.
  • ENIAC and early electronic computing.
  • The Linnaean → Mendel → Watson-Crick → genome chain.
  • The decipherment of Egyptian hieroglyphs (Champollion, with the Rosetta Stone as the unlock).
  • Cataloguing the night sky (Hipparchus, Tycho, Hubble Space Telescope, Gaia mission).
  • ENCODE, BRAIN, the Human Cell Atlas as recent attempts at HGP-shape projects.
  • The Connectome Project.
  • Bibliometric digitisation efforts (Web of Science, Sci-Hub).
  • Materials Project / Open Quantum Materials databases.
  • Carbon capture R&D portfolios as a live current case.
  • Fusion programmes (ITER, the new private fusion bets) as a live timing question.

Intellectual lineage

The question this framework is trying to answer is not new. Versions of it have been asked, with varying levels of formality, for as long as people have had to choose what to work on. This section maps the lineage and the adjacent fields, both to give credit and to make clear what is genuinely new in this framing versus what is being recombined.

The contribution this framework attempts is not a new insight; it is the integration of several existing ones, with the time-dependence of tractability promoted to first-class status.

The problem-list tradition

David Hilbert (1900) delivered a now-famous lecture in Paris listing twenty-three open problems in mathematics. The list shaped a century of work. Several of the problems remain open; many were solved; one or two turned out to be ill-posed. Hilbert's gesture is the founding act of explicit problem-allocation thinking. He did not have a framework — his selection was based on personal taste and the consensus of his peers — but he made the point that picking which problems to work on is itself a discipline. Hilbert's later motto, inscribed on his tombstone, was Wir müssen wissen — wir werden wissen (we must know — we will know): a more confident statement of the framework's underlying optimism than anything in the present repository.

Stephen Smale (1998) updated Hilbert's list for the twenty-first century. The Smale problems are an interesting test of the framework: about a third have been solved or substantially advanced; the rest remain stubbornly open. A retrospective scoring through the framework would be a useful exercise.

The Clay Millennium Prize Problems (2000), with their seven-problem list and one-million-dollar bounties, are an explicit prize-based version of the same gesture. One has been solved (Poincaré); six remain. The prize structure is itself a primitive market for problems.

These traditions are direct ancestors. The framework here is essentially what Hilbert did for mathematics, generalised to all fields and made explicit about the time dimension.

Richard Hamming and the "important problems" tradition

Richard Hamming's "You and Your Research" (1986), a talk that has shaped more careers than most academic books, asks bluntly:

What are the most important problems in your field? Why aren't you working on them?

Hamming's argument is that important problems are not solved by accident; they are solved by people who have decided in advance that they are important and have organised their working life around them. The framework here is essentially the formalisation of Hamming's question, with explicit attention to why now or not now.

If only one piece of pre-existing literature were read alongside this framework, "You and Your Research" should be it.

Karl Popper and the centrality of problems

Karl Popper treated all knowledge-seeking as a process of problem-solving: a problem arises, tentative theories are proposed, they are tested, the best survive, and a new problem emerges from the test. In Popper's view, the choice of problem is the seat of scientific progress. Conjectures and Refutations (1963), Objective Knowledge (1972), and the posthumous All Life is Problem Solving (1999, the title of which is itself the thesis) are the relevant texts.

All life is problem solving.

— Karl Popper

Popper's contribution to this framework is conceptual rather than methodological: it locates problems, not theories or facts, as the unit of analysis. That move is what allows the framework's question — which problems, when? — to be coherent at all.

Imre Lakatos and progressive research programmes

Imre Lakatos introduced the distinction between progressive and degenerating research programmes. A progressive programme is one whose theoretical adjustments lead to novel predictions and new discoveries. A degenerating one keeps adjusting to fit observations without producing new ones.

The framework here borrows the underlying intuition: a problem is worth attacking if working on it produces progressive by-products — new techniques, new data, new sub-problems. A problem whose attempted solutions only produce defensive elaborations of existing theory is a degenerating target.

Thomas Kuhn and the structure of normal science

Thomas Kuhn observed that most scientific work is "normal science" — incremental puzzle-solving inside an accepted paradigm — punctuated by rare paradigm shifts. The framework here applies most cleanly to normal-science allocation; paradigm shifts are by their nature unpredictable and the framework cannot do much to help time them. Kuhn is useful as a reminder that the framework's domain has limits.

David Deutsch and the soluble universe

David Deutsch's The Beginning of Infinity (2011) argues that all problems are soluble given enough knowledge — a position that sits squarely behind the optimism of this framework. If problems were intrinsically insoluble, timing would not matter; everything would be either too early or too late forever. The framework presupposes a Deutschian view that the cost of solving things is, in general, going to keep falling.

It is worth noting that Deutsch is not naive about this — he distinguishes between problems and evils, and the framework here covers only the first. Some things are not "problems to be solved" in any framework's sense; they are conditions to be lived with or managed.

Vannevar Bush and "as we may think"

Vannevar Bush's 1945 essay "As We May Think", published in The Atlantic Monthly in the closing months of the Second World War, is the founding document of the idea that the human capacity to select among problems and findings is itself the bottleneck of scientific progress. Bush wrote: "There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers — conclusions which he cannot find time to grasp, much less to remember, as they appear." The Memex he imagined was a tool for navigating that mountain. Modern AI systems are the more ambitious heirs. The framework here is partly an answer to Bush's question of how to choose in a world where everything is queryable.

Nick Bostrom and differential technological development

Nick Bostrom's Superintelligence (2014) and his earlier "Existential Risk" papers articulate the principle of differential technological development: that we should accelerate beneficial technologies relative to dangerous ones, recognising that technologies do not arrive in a fixed order.

The framework here is direct kin: differential problem-solving applies the same move at the level of individual problems. Bostrom's tools are sharper for catastrophic-risk reasoning; the framework here is more general but less sharp on that specific question. Used together they cover more ground than either does alone.

The framework's specific modification for dual-use problems — the four-category classification (benign-default, defender-favoured, attacker-favoured, symmetric) and the cases where the standard cascade and demonstration readings invert — is in Dual-use & catastrophic risk. It is the part of the framework that most directly inherits Bostrom's lineage, and the part that depends most on his work for grounding.

The Effective Altruism cause-prioritisation tradition

The EA tradition (Toby Ord, Will MacAskill, GiveWell, Open Philanthropy, the Future of Humanity Institute and the various successor organisations) has the most developed existing framework for prioritising problems: importance × tractability × neglectedness, with various refinements for moral uncertainty, risk and reversibility.

The framework here borrows ITN essentially wholesale and adds time-dependence as a first-class fourth term. The EA framework is implicit about timing — tractability is a function of when you ask — but treats it as an input rather than a variable. Promoting it to a variable changes some of the conclusions, particularly in fast-moving technical domains.
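One way to write the move down (notation ours, not the EA literature's): promote tractability and neglectedness from constants to functions of time, and ask when the product peaks rather than what it is today.

```latex
\text{Priority}(p) = I(p)\,T(p)\,N(p)
\;\longrightarrow\;
\text{Priority}(p,t) = I(p)\,T(p,t)\,N(p,t),
\qquad
t^{*}(p) = \arg\max_{t}\,\delta^{t}\,\text{Priority}(p,t)
```

with \(\delta\) a discount factor. The attack-now verdict is the case \(t^{*}(p)=\text{now}\); importance can move too (closing windows), but the first-order time-dependence is in \(T\).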

Operations research, optimal stopping, real options

The mathematical machinery of the framework is borrowed from finance and operations research: real-options theory (Trigeorgis, Dixit and Pindyck), optimal stopping (the secretary problem, Wald-style sequential analysis), Bayesian decision theory and dynamic programming. None of this is original to the framework; the contribution is to import it consciously.

The most useful single piece of finance theory is the option value of waiting — the recognition that not acting is a positive choice that has value, especially under uncertainty about whether costs will fall. Under-recognising option value is one of the more common failure modes of decision-makers in research and policy.
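A minimal decision-rule sketch of that option value, in Python. Every number is a hypothetical placeholder, and the model (one period, one possible cost decline) is deliberately the simplest case, not anyone's full real-options formalism:

```python
# Act-now versus wait-one-period comparison for a problem whose
# attack cost may fall. All figures are illustrative placeholders.
def act_now(benefit: float, cost_today: float) -> float:
    """Net value of attacking the problem immediately."""
    return benefit - cost_today

def wait_one_period(benefit: float, cost_today: float,
                    p_decline: float, decline: float,
                    discount: float) -> float:
    """Expected net value of deferring one period.

    With probability p_decline the cost falls by `decline`; either way
    the payoff arrives a period later and is discounted. The max(..., 0)
    terms preserve the option of not acting at all, which is where the
    value of waiting lives.
    """
    ev_down = max(benefit - cost_today * (1.0 - decline), 0.0)
    ev_flat = max(benefit - cost_today, 0.0)
    return discount * (p_decline * ev_down + (1.0 - p_decline) * ev_flat)

if __name__ == "__main__":
    b, c = 100.0, 90.0  # thin margin if attacked today
    print(f"act now: {act_now(b, c):.1f}")                          # 10.0
    print(f"wait:    {wait_one_period(b, c, 0.6, 0.5, 0.95):.1f}")  # ~35.2
```

Waiting wins here not because action is bad but because the thin margin available today is worth less than the option to act after a likely cost collapse.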

Pierre Wack, Royal Dutch Shell and the scenario tradition

The corporate scenario tradition has a clear forerunner in Herman Kahn at the RAND Corporation in the 1950s and his subsequent work at the Hudson Institute. Kahn's On Thermonuclear War (1960) and Thinking About the Unthinkable (1962) introduced the idea that the most dangerous strategic problems could only be reasoned about by constructing detailed, internally-consistent stories about how they might unfold and then arguing about the stories rather than about the underlying probabilities. The discipline was political-strategic in origin — the alternative was to refuse to think about nuclear war at all — and Wack's contribution was to translate it from defence to corporate use.

Pierre Wack joined Royal Dutch Shell's London office in 1971. With Ted Newland and a small Group Planning team, he spent the next decade refusing to do what the rest of the corporate planning profession was doing — committing to a single forecast — and instead built sets of plausible, internally-consistent narratives about how the future could unfold. The team's 1972 scenarios included a sharp oil-price shock as a credible path. When the embargo arrived in October 1973, Shell was the only major capable of acting fast: the response had been pre-rehearsed, the senior managers' mental models had been pre-stretched, and the company emerged from the crisis in a structurally stronger position than its peers. Subsequent work on the 1979 second shock, the 1980s natural-gas glut and the late-1980s anticipation of the Soviet Union's collapse turned scenario planning from a Shell idiosyncrasy into a recognised corporate discipline.

What Wack actually did is worth being precise about, because the term scenarios has been diluted by management consultancies into something more anodyne. Wack's scenarios were not best-case, base-case, worst-case forecasts. They were structurally distinct stories about what could happen, each internally consistent, each rooted in identifiable driving forces and explicitly named critical uncertainties, each producing a different posture toward the same decision. The point was not to predict; the point was to expand the decision-maker's mental model so that the actual future, when it arrived, did not require entirely new thinking. As Wack put it in his two 1985 Harvard Business Review essays, the test of a good scenario is not whether it comes true — it is whether it changes the mind of the decision-maker who reads it.

Scenarios deal with two worlds: the world of facts and the world of perceptions. They explore for facts but they aim at perceptions inside the heads of decision-makers.

— Pierre Wack

Several intellectual descendants are worth naming. Peter Schwartz, who succeeded Wack at Shell and later founded Global Business Network, wrote The Art of the Long View (1991), the most accessible practitioner's book on the discipline. Kees van der Heijden's Scenarios: The Art of Strategic Conversation (1996) is the more rigorous methodological treatment. Adam Kahane's facilitation of the 1991–92 Mont Fleur scenarios for South Africa's transition out of apartheid — drawing directly on Shell's tradition — is the canonical demonstration that the technique works at the scale of national political settlements as well as corporate strategy. Ged Davis ran Shell's scenarios into the early 2000s and then took the discipline into the United Nations Development Programme and the Intergovernmental Panel on Climate Change, where successive emissions scenarios became the substrate that climate policy debates have been arguing over for two decades.

The bridge to Differential Problem-Solving runs through the framework's central problem: the future is not a single forecast. The cost-trajectory dimension is well-defined for technologies on a Wright's-law curve, but the curve is conditional on a continuing political economy that supports it. The closing-window dimension assumes one canonical future in which the window closes; in some plausible futures the window remains open for a generation and the framework's verdict reverses. The cascade-value dimension assumes a particular configuration of downstream technologies and policies; in alternative scenarios the cascade fires earlier, later or not at all. Each dimension is, when read honestly, scenario-conditional.

Honest scoring on the framework's dimensions therefore requires building the small set of scenarios over which the score is averaged — or at the very least the small set over which the score's robustness is checked. A bet that scores well in one scenario and catastrophically in another is not the same as a bet that scores moderately well across three. The framework as it stands is single-scenario by default; the scenario tradition is the explicit reminder to score conditionally and to look for the robust positions.

The framework's portfolio shapes can be re-read in scenario language. The patient infrastructure share — closing-window cataloguing, foundational datasets, public-good measurement — is the set of bets that pay off across the entire envelope of plausible futures. The moonshot share is the set of bets that pay massively in one scenario and produce useful by-products in the rest; in Talebian terms, antifragile across the future-distribution. The just-early share is timed against a specific scenario about the cost trajectory and is therefore the most fragile if the scenario is wrong. Allocators who use the framework without the scenario discipline tend to over-weight the just-early share, because that is where the framework's vocabulary is most legibly applied. Allocators who add the scenario discipline tend to over-weight the patient infrastructure and moonshot shares, because those are the positions that the future-distribution rewards.

There is one specific failure mode the scenario tradition warns against that the framework as written is vulnerable to. Single-scenario thinking turns each dimension into a number; the number gets argued over; the argument is about the number rather than about the unstated forecast on which the number depends. Scenarios force the unstated forecast to be made explicit, which is exactly what the framework's limits and falsifiability discipline (in Limits & falsifiability) requires.

The complement is true as well. Scenario planning by itself produces narrative without timing — the discipline tells you which futures to plan for but not when to act. The framework supplies the timing layer that scenarios on their own do not. Used together, scenarios define the state-space, the framework times the moves within it, real options price the right-to-wait, and Wright's law shapes the cost-trajectory inside any single scenario. The four are complementary tools at different layers of the same decision; the framework here is most useful when read as one of those layers rather than as the whole machine.

Wright's law and learning curves

Theodore Wright's 1936 paper observed that the cost of producing aircraft fell by roughly the same percentage with each doubling of cumulative production. The pattern has held for a remarkable range of technologies since: solar panels, batteries, sequencing, semiconductors, satellite launches.

The framework leans heavily on Wright's law as the empirical backbone of the cost trajectory dimension. If you can identify a dominant input on a Wright-law curve, you can forecast the cost of attacking the problem with substantially better accuracy than naive linear projection.
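A minimal sketch of a Wright's-law cost forecast, in Python; the 20 per cent learning rate and the production figures are illustrative placeholders, not estimates for any particular technology:

```python
import math

# Wright's law: unit cost falls by a fixed fraction (the learning rate)
# with each doubling of cumulative production.
def wright_cost(c0: float, x0: float, x: float, learning_rate: float) -> float:
    """Unit cost at cumulative production x, given cost c0 at production x0."""
    b = math.log2(1.0 - learning_rate)  # elasticity: log2(0.8) ~ -0.32 at 20%
    return c0 * (x / x0) ** b

if __name__ == "__main__":
    c0, x0 = 100.0, 1_000.0  # $100/unit at 1,000 cumulative units
    lr = 0.20                # 20% decline per doubling
    for x in (2_000, 4_000, 16_000, 1_024_000):
        print(f"cumulative {x:>9,} units -> ${wright_cost(c0, x0, x, lr):6.2f}/unit")
    # One doubling -> $80.00, two -> $64.00, four -> $40.96, ten -> $10.74.
```

The forecasting step is then to project cumulative production forward, which is often far easier than projecting cost directly, and read the cost off the curve.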

Claude Shannon and the measurement of information

Claude Shannon's A Mathematical Theory of Communication (1948) is the founding document of information theory and one of the most consequential single papers in the history of science. Shannon defined entropy, mutual information, channel capacity, and the noisy-channel coding theorem, and in doing so gave the world a way to measure information rigorously for the first time.

Information is the resolution of uncertainty.

— Claude Shannon

Shannon's contribution to this framework is methodological. The scoring scheme — the rough zero-to-three rubric, the dimensions checklist, the retrospective stupidity index — all assume that allocation decisions can be characterised quantitatively, even crudely. That assumption is Shannon's legacy. Before 1948, the question how much information is in this message? had no formal answer; after 1948, it had one. The framework's broader claim — that how worth attacking is this problem? can be characterised by a small number of measurable dimensions — is a downstream descendant of the Shannon move.

Shannon also matters substantively. The cost-trajectory dimension borrows from learning-curve theory, which in turn assumes the broader information-theoretic view that progress is the accumulation of resolved uncertainty. Each Wright's-law doubling of cumulative production reduces the cost of the next unit; each unit teaches something the next one no longer needs to learn. Shannon's framework gives that intuition its mathematical spine.

For modern AI and machine learning the connection is more direct. Cross-entropy, mutual information, the Kullback–Leibler divergence and the entire vocabulary of probabilistic modelling come from Shannon. The framework's arguments about the cost-trajectory of foundation-model training are essentially arguments about how cheaply we can compress, align and route information — arguments in the Shannon tradition. The 2024 Nobel in physics for Hopfield and Hinton, and the Turing Award generation that preceded it, all sit on Shannon's substrate.
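For reference, the quantities named above in Shannon's own terms (standard textbook definitions, nothing specific to this framework):

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x),
\qquad
I(X;Y) = H(X) - H(X \mid Y),
\qquad
D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} p(x)\,\log_2 \frac{p(x)}{q(x)}
```

Entropy is the uncertainty to be resolved, mutual information is how much one observation resolves about another, and the KL divergence is the price of modelling one distribution with another.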

Carlota Perez and technological revolutions

Carlota Perez's Technological Revolutions and Financial Capital (2002) describes a recurring pattern: an installation phase of new technology (high investment, low immediate returns, much speculation) followed by a deployment phase (broad-based productivity gains as the technology becomes infrastructure).

For the framework, Perez is a reminder that the cascade value of a problem-solution often depends on which phase the underlying technology is in. The same problem solved in the installation phase versus the deployment phase has very different downstream consequences.

Brian Arthur and the combinatorial nature of technology

Brian Arthur's The Nature of Technology (2009) treats technology as a combinatorial system — new technologies are recombinations of older ones — and shows that the rate of new combinations grows with the size of the existing inventory. The framework borrows this for its cascade and dependency-graph dimensions: a problem's value often depends on what it makes recombinable.

Tyler Cowen and the great stagnation question

Tyler Cowen's The Great Stagnation (2011) and the surrounding literature ask whether the easy problems are running out — whether we have already picked the low-hanging fruit. The framework's answer is implicit but worth making explicit: the low-hanging fruit moves. Each generation's low-hanging fruit is unreachable for the previous one. The question is not whether there is fruit but whether you can see it from where you are standing.

The science of science

A more recent literature (Dashun Wang, Albert-László Barabási, James Evans and others) brings a quantitative empirical approach to the questions this framework addresses: which projects produce more, which collaborations are productive, which kinds of papers anticipate breakthroughs. The framework here is theoretical where the science of science is empirical; the two should converge.

Forecasting and superforecasting

Philip Tetlock's work on forecasting and the Good Judgment Project introduced the discipline of calibrated probability estimation into a domain previously dominated by punditry. The framework's scoring scheme is improved by Tetlock-style discipline: do not just say a problem is "easy" or "important," put a probability and a number on it, and track whether you are calibrated over time.

Nassim Nicholas Taleb and asymmetric bets

Nassim Nicholas Taleb's Fooled by Randomness (2001), The Black Swan (2007), Antifragile (2012), and the broader Incerto corpus are the modern reference for thinking about heavy-tailed distributions, convex payoffs, and the kinds of bet where the framework's standard intuitions break down.

Taleb's contribution to this framework is concentrated in two places. The first is the asymmetry of payoff dimension and the moonshot logic in Moonshots, arbitrage & markets. The framework's central observation about moonshots — that a portfolio of twenty bets with twenty-per-cent hit rates and ten-times outcomes dominates a portfolio of five conservative bets with eighty-per-cent hit rates and two-times outcomes — is Taleb's barbell strategy applied to problem allocation. Heavy-tailed distributions justify portfolio strategies that a normal-distribution decision-maker rejects, and most institutional allocators decide as if the world were thin-tailed.
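The per-bet arithmetic behind that claim, written out:

```latex
\underbrace{0.20 \times 10}_{\text{moonshot portfolio}} = 2.0
\;>\;
\underbrace{0.80 \times 2}_{\text{conservative portfolio}} = 1.6
```

per unit staked, and the twenty-bet portfolio is near-certain to contain at least one hit, since \(1 - 0.8^{20} \approx 0.99\).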

The second is the antifragile concept, which is more subtle and more useful than its popular reading. An antifragile bet benefits from volatility; a fragile one is destroyed by it. The framework's reusability of by-products dimension is essentially the antifragile move: if the project fails on the headline goal but the data, talent and infrastructure are still valuable, the bet is antifragile in Taleb's sense — disorder gives you more, on average, than order. The brute-force-then-elegance pattern compounds the same property: even when the brute-force phase fails by its own metric, the substrate it leaves behind is what the elegant phase consumes.

Taleb's broader epistemology — that we know less than we think, that complex systems behave in ways our linear models do not capture, and that decision-making under genuine uncertainty differs structurally from decision-making under measurable risk — sits behind the framework's repeated insistence that the rubric is not a calculator. The framework prefers Knightian uncertainty (cited in Models & scoring) over false-precision risk modelling for exactly the reasons Taleb has been arguing for two decades.

Practical descendants

The framework also has practical, less-academic ancestors:

  • DARPA's programme-manager model: a small number of empowered individuals making concentrated bets on the brute-force-then-elegance frontier.
  • Y Combinator's "request for startups": an explicit problem-allocation tool aimed at founders.
  • The Open Philanthropy cause-prioritisation reports: rigorous applications of the EA framework to specific problems.
  • The various FRO experiments (Convergent Research, the new Astera and Arc institutes): institutional bets on the brute-force-then-elegance pattern.

What is genuinely new

To be clear about what this framework adds, given the lineage above:

The contribution is not the dimensions; almost every dimension exists somewhere in the literatures cited here. The contribution is in promoting time-dependence of tractability to a first-class variable, integrating the dimensions into a single working framework, providing a retrospective scoring scheme to test the framework against history, and being explicit about the institutional implications — who can arbitrage, what a problem-market would need, where the brute-force funding capacity should live.

If a working title is needed, Differential Problem-Solving captures the academic positioning. Problem Timing is the trade-book version of the same thing.

The ranked lists

Methodology

Four lists follow this section. Fifty smartest projects in history. Fifty dumbest. Fifty live projects today that the framework predicts will age badly. And fifty live possibilities that the framework predicts will age unusually well — the attack now counterpart to the third list.

The lists are opinionated, partial and fallible. Several of the calls will be wrong. Some of the wrong ones will be obvious in retrospect. The point is not to be right on every entry; the point is to be specific enough to be argued with. A framework that produces no falsifiable verdicts is not a framework, it is a mood.

Scoring criteria

Each project is scored, implicitly, on a small set of dimensions drawn from Dimensions:

Resources allocated. Money, person-hours, opportunity cost. Estimates only; many are rough orders of magnitude.

Direct value. What the project produced in concrete terms.

Cascade value. What the project made possible downstream that did not exist without it.

Demonstration value. Whether the project removed the question of feasibility for an entire category.

By-product value. What was generated even if the headline goal failed.

Counterfactual. Would this have happened anyway, on what timeline, at what cost?

Curve position. Was the work timed correctly with respect to the cost-trajectory of its dominant inputs?

The rough rule used to order the lists: value generated divided by resources spent, weighted heavily by cascade and demonstration where they apply. A modest project with enormous cascade value (Mendel's pea experiments, the CERN web) outranks a vast project with merely large direct value.
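A minimal sketch of that ordering rule, in Python. The weights and example numbers are hypothetical placeholders, shown only to make the rule's shape concrete; they are not the rubric actually used to build the lists:

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    resources: float      # money, person-hours, opportunity cost (rough)
    direct: float         # value produced directly
    cascade: float        # downstream value enabled
    demonstration: float  # value of removing the feasibility question
    byproduct: float      # value generated even if the headline goal failed

def score(p: Project, w_cascade: float = 3.0, w_demo: float = 2.0) -> float:
    """Value per unit of resources, weighting cascade and demonstration heavily."""
    value = (p.direct + w_cascade * p.cascade
             + w_demo * p.demonstration + p.byproduct)
    return value / p.resources

if __name__ == "__main__":
    entries = [
        Project("modest, enormous cascade", 1.0, 1.0, 50.0, 5.0, 2.0),
        Project("vast, merely large direct", 100.0, 500.0, 10.0, 5.0, 10.0),
    ]
    for p in sorted(entries, key=score, reverse=True):
        print(f"{p.name:28s} score = {score(p):7.1f}")
    # The modest project outranks the vast one: 163.0 vs 5.5.
```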

What "smartest" means

A smart project is one where the resources allocated produced disproportionate value, where the timing was correct, and where a competent team applying the framework at the time would have rated the bet highly.

Several projects on the smartest list looked unreasonable when funded. ImageNet was widely thought to be a strange use of mTurk dollars. The HGP was contested. AlphaFold was treated as a publicity exercise by some structural biologists in 2018. Smartness is partly resistance to consensus.

A few were vindicated brute-force projects (HGP, ImageNet, Apollo). A few were elegant first-mover bets (Mendel, the CERN web, Wikipedia). A few were patient cataloguing efforts whose value emerged decades later (Linnaeus, the Protein Data Bank, the OED). The framework rewards all three patterns when correctly executed.

What "dumbest" means

Two distinct categories appear on the dumbest list and they should not be conflated.

The first is projects that were misallocated by the standards of what should have been known at the time. Hand-tuned chess engines after 1997 fall here. Many enterprise rule-based NLP systems built between 2017 and 2022 fall here. The cost trajectory of the dominant input was visible, the alternatives were available, and the institution proceeded anyway. These are the ones the framework most clearly indicts.

The second is projects that looked reasonable at the time and turned out to be wasteful for reasons that were genuinely hard to predict. Some early bioinformatics infrastructure, made obsolete by GPUs and rebuilt five years later, sits here. So do many early electric-vehicle and battery bets that arrived a decade too soon. Hindsight makes them look obvious; they were not. These are listed but not condemned.

A third category — outright fraud and catastrophic misrepresentation (Theranos, FTX, the WeWork IPO attempt) — sits on the dumbest list because the framework indicts the institutional allocation that funded them. The fraud or misrepresentation is the moral failure; the framework failure belongs to the people who were supposed to verify and did not.

I have tried to be honest about which category each entry falls into. Several are still arguable.

What "likely to age badly" means

The third list is the most exposed. It calls live projects that the framework predicts will look misallocated in five to ten years.

The criteria for inclusion are deliberately strict. Each entry meets at least two of these:

  • The dominant input on which the project depends is on a steep cost-decline curve, and the project does not bend the curve.
  • A foundation-model or general-purpose technology is closing on the capability faster than the project's competitive position can absorb.
  • The crowding is high relative to the marginal team's contribution.
  • A simpler, cheaper alternative is already visible in the early-adoption phase.
  • The institutional momentum behind the project is path-dependence rather than thesis.

I have named categories where I am confident. I have named specific organisations or projects only where the framework reading is strong and the public record supports the call. Several entries are genuinely arguable; I have flagged the ones where I have lower confidence.

I expect to be wrong on roughly twenty per cent of this list. The point is not to score perfectly. The point is to make the bets explicit so that the framework can be evaluated five years from now.

On dollar figures and dates

Several figures in the lists are widely cited but worth verifying. The HGP cost is variously given as roughly three billion to thirteen billion depending on what is counted; I have used the most commonly-cited figure for the headline programme. ImageNet labelling costs, AlphaFold development costs, Apollo costs, FTX losses and similar all have published estimates that vary by a factor of two or more. I have used round numbers and stated when they are estimates.

Where dates are contested (when does a project start; what counts as completion), I have used the most defensible bracket and noted when a different convention would change the framework reading.

What "high-scoring possibilities" means

The fourth list is the symmetric counterpart of the third. It names live opportunities — fields, programmes, infrastructure bets — that the framework predicts will look obvious in retrospect, and that are currently funded at a small fraction of the rate the dimensions imply.

Each entry is paired with its framework reading (which dimensions fire) and a rough estimate of the resources required to attack it. The aim is to make the bet legible enough that an allocator can compare it against what they are currently funding, not to produce a budget.

The list is not a moral document. It tells you which problems are well-timed; it does not tell you which problems are good to solve. The morality has to come from elsewhere.

The list will be wrong on a fraction of entries; the framework's own expectation is roughly twenty per cent within five years. Some entries will be obviated by an unexpected upstream solution; some will turn out to face binding constraints the framework underweighted. Both will be visible in the next revision.

On the politics of these lists

The smartest list is mostly uncontroversial in its substance, though some of its entries are still debated.

The dumbest list will offend several constituencies: people who worked on the projects, organisations that funded them, alumni networks, journalists who promoted them at the time. The framework reading is offered as analysis, not as personal criticism. Several of the projects on the dumbest list employed extraordinary people doing competent work; the failure was at the level of allocation, not execution.

The live-and-likely-to-age-badly list will offend everyone the previous two lists missed. The defence is that every framework worth having makes calls that the consensus rejects, and that quietly believing this without saying it publicly is worse than saying it and being wrong.

If your project is on the third list and you disagree, the right response is a paragraph on which curve the project is bending, which cascade it is firing, or which window it is closing. If the paragraph is convincing, the entry comes off the list in the next revision.

Verification log

Figures and dates in the lists were initially written from memory and have been cross-checked against public sources where the claim was prominent or load-bearing. The first verification pass was completed in May 2026 and covered the following entries:

Verified against public sources and corrected where necessary:

The Human Genome Project headline cost (the often-cited three-billion-dollar figure is the projected total; actual outturn was roughly two and three-quarter billion, with broader related programme spending around three and three-quarter billion).
The Apollo programme inflation-adjusted total (roughly two hundred and fifty billion in 2020 dollars using NASA's New Start Index, with somewhat lower estimates using the CPI).
Theranos total raised (over seven hundred million dollars in venture funding; some reports include later debt rounds bringing the total higher).
FTX customer-funds gap (around eight to nine billion dollars at peak shortfall).
Quibi total funding (one and three-quarter billion dollars).
WeWork peak SoftBank-led valuation (forty-seven billion dollars in early 2019).
AOL–Time Warner merger (announced at one hundred and sixty-five billion dollars equity, accompanied by the largest single annual loss in corporate history at the 2002 write-down).
Boeing 737 MAX direct grounding cost (roughly twenty billion dollars in direct costs to Boeing, with substantial additional indirect costs).
Volkswagen Dieselgate (over thirty billion euros in fines, settlements and remediation).
AT&T's DirecTV acquisition (forty-nine billion equity, sixty-seven billion enterprise value including assumed debt; sold for roughly seven and a half billion in 2025).
AT&T's Time Warner acquisition (eighty-five billion equity, one hundred and eight billion including debt).
Microsoft's Nokia mobile acquisition (seven point two billion paid, seven point six billion written down in 2015).
NHS National Programme for IT (roughly twelve billion pounds projected; about ten billion actually spent before dismantling).
Berlin Brandenburg Airport (roughly seven billion euros final, against an original two billion budget).
Iridium (roughly five to five and three-quarter billion dollars total Motorola investment).
NASA SLS development (approximately thirty-two billion dollars through 2025, with broader Artemis investment substantially larger).
ITER (over twenty-five billion euros against an original six-billion budget, with further increases and a nine-year delay confirmed in 2024).
NEOM and "The Line" (announced at five hundred billion dollars in 2017; internal estimates reportedly grew to multiples of that figure before being scaled back in 2025–26).
F-35 lifecycle (over two trillion dollars projected over a 94-year programme life through 2088).

Soft figures retained where the public record is ambiguous: ImageNet's direct annotation budget, AlphaFold's specific development cost, several of the smaller historical entries where the original estimates were public but inflation-adjusted comparisons vary. Where a figure is qualitative ("modest", "substantial"), it has been deliberately kept qualitative.

Still to verify in subsequent passes: the Soviet-era programme costs (Buran, Plan for the Transformation of Nature, the Aral Sea irrigation diversion), several of the smaller corporate-failure figures (Sears, Ford Edsel, New Coke), several mega-event figures (Sochi 2014, Rio 2016), and the China-specific entries on the current list.

The verification log will be updated alongside each scheduled revision of the lists. Submissions of corrections, with sources, are welcome.

— Siri Southwind

50 smartest

The highest-leverage bets human beings have ever made.

Modest efforts that founded entire disciplines. Single individuals who picked the right problem and produced cascades that compound to this day. Patient cataloguing programmes that ran for decades and produced the substrate of the next century. Brute-force demonstrations that bent the cost curve and made everything downstream cheap.

Most of these were contested, ridiculed or under-funded when undertaken. That is part of how the list works.

If you are deciding what to spend your life on, this is the company you should aim to keep.

1. Mendel's pea experiments

(1856–1863, Brno) One Augustinian monk, a monastery garden, seven years of patient crossing. Resources: effectively zero. Cascade value: founded modern genetics. Mendel's work was ignored for thirty-five years before being rediscovered, which is the canonical reminder that the framework verdict and the contemporary consensus diverge regularly. The single highest cascade-to-resource ratio in the recorded history of science.

2. Maxwell's equations

(1865) Four equations on a few pages. Resources: one person's salary for a few years. Cascade value: classical electromagnetism, special relativity, modern electronics, every wireless technology that has ever existed. The cleanest case of theoretical compression in physics.

3. Tim Berners-Lee's hypertext system at CERN

(1989–1991) A side project in a physics lab. Resources: trivial; one researcher, modest hardware. Cascade value: the world wide web. Berners-Lee gave the protocols away rather than patenting them, which is itself part of the framework reading — value that compounds via standards is multiplied by the openness with which it is released.

4. Newton's Principia

(1687) A single book funded by Edmund Halley, who paid the printing costs himself when the Royal Society demurred. Resources: small. Cascade value: founded modern physics, calculus, celestial mechanics and the analytical method that defined the scientific enlightenment. The fact that Halley underwrote it is a small lesson about patient capital in the right hands.

5. Einstein's 1905 papers

(annus mirabilis) Four papers in a single year by a patent clerk. Resources: a clerk's salary. Cascade value: special relativity, the photoelectric effect (and quantum mechanics), Brownian motion (and atomic theory), mass-energy equivalence. A single individual producing roughly an entire decade of foundational physics in his spare time.

6. Alan Turing's "On Computable Numbers"

(1936) One paper. Resources: trivial. Cascade value: founded computer science, the theory of computation, the decidability framework that underpins everything since. The Turing machine model is still the canonical formalism for what computation is.

7. Claude Shannon's "A Mathematical Theory of Communication"

(1948) One Bell Labs paper. Resources: small. Cascade value: founded information theory, modern coding, every digital communication system, the entropy concept that is now pervasive in machine learning. Shannon's paper is what the rest of the twentieth century's information technology was built on.

8. The Bell Labs transistor

(1947) Bardeen, Brattain and Shockley demonstrated the first working solid-state amplifier. Resources: substantial but bounded by Bell Labs research budgets. Cascade value: every modern electronic device on the planet. The single largest demonstration unlock in twentieth-century technology.

9. The Watson, Crick, Franklin and Wilkins double helix

(1953) Two short papers in Nature, building on Franklin's X-ray crystallography. Resources: small relative to the impact. Cascade value: molecular biology, modern medicine, biotechnology, the entire genomic revolution. The single most consequential structural-biology insight ever published.

10. The Linnaean classification system

(1735–) Carl Linnaeus and successors hand-classified hundreds of thousands of species. Resources: substantial cumulative human time. Cascade value: the universal binomial vocabulary that every subsequent biology, ecology and evolutionary discipline depends on. Without Linnaeus, the genomic era would have lacked a vocabulary.

11. The Oxford English Dictionary

(1857–1928) Seventy-one years, an army of unpaid volunteer readers, the methodology of dated quotations from primary sources. Resources: large in cumulative human-hours, small in capital. Cascade value: founded scientific lexicography. Today's full-text corpora and language models could not exist without the methodology the OED established.

12. Project Gutenberg

(1971–) Michael Hart began transcribing texts on a borrowed mainframe in 1971. Resources: tiny; volunteer time. Cascade value: enormous — the substrate of digital humanities, language modelling and decades of free access to canonical literature. The original brute-force-then-cascade case in digital text.

13. Wikipedia

(2001–) A few engineers and a culture of volunteer editing. Resources: tiny relative to the encyclopaedic output. Cascade value: the largest single source of training data for modern language models, the de facto reference of the internet, a generation of self-directed learners. Has produced more downstream value per dollar than any other knowledge project of the last fifty years.

14. arXiv

(Paul Ginsparg, 1991–) A single physicist's pre-print server. Resources: trivial; one person, one server. Cascade value: completely reshaped how physics, mathematics and increasingly biology and computer science publish. Has accelerated discovery by years across multiple fields.

15. UNIX

(Thompson, Ritchie, 1969–) A small team at Bell Labs built a portable, multi-user operating system on a discarded PDP-7. Resources: a side project. Cascade value: the design ancestor of macOS, Linux, Android and the operating-system tradition behind essentially all modern computing. A small team writing the right thing at the right time has rarely been so consequential.

16. Linux

(Linus Torvalds, 1991–) A Finnish student announced a free hobby kernel. Resources: trivial. Cascade value: every Android phone, every cloud server, every data centre on Earth runs Linux. The most successful open-source project ever undertaken and one of the highest cascade-value bets in the history of software.

17. ImageNet

(Fei-Fei Li and team, 2007–2010) Over fourteen million labelled images across roughly twenty-two thousand categories, with annotation work distributed through Amazon Mechanical Turk to roughly forty-nine thousand workers across one hundred and sixty-seven countries between 2008 and 2010. Resources: modest direct funding plus Li's reputational risk in pursuing what most of the field considered a peculiar use of mTurk dollars. Cascade value: catalysed the deep-learning revolution. Treated as wasteful by parts of the field at the time. Among the highest-leverage data-creation projects ever undertaken.

18. AlexNet

(Krizhevsky, Sutskever, Hinton, 2012) Two students and an advisor used GPUs to train a convolutional network on ImageNet. Resources: a few thousand dollars of GPU time. Cascade value: the current AI era. The cleanest demonstration unlock in modern computing. Everything that has happened in machine learning since traces back to this single result.

19. AlphaFold and the Protein Data Bank cascade

(PDB 1971–, AlphaFold 2018–2021) Two intertwined projects: the patient cumulative crystallography of the PDB and DeepMind's transformer-based predictor. Resources: PDB is decades of expert effort; AlphaFold is hundreds of person-years and substantial compute. Cascade value: collapsed the structural-biology bottleneck, accelerating drug discovery, enzyme design and basic biology. The clearest brute-force-then-elegance case in living memory.

20. The Human Genome Project

(1990–2003) Roughly three billion dollars over thirteen years for the first reference human genome. Resources: large. Cascade value: bent the sequencing cost curve by orders of magnitude, founded modern genomics, enabled the two-hundred-dollar consumer genome. Vindicated brute force at scale.

21. The polio vaccine

(Salk, Sabin, 1950s) Public funding, the National Foundation for Infantile Paralysis, and the largest field trial of the era. Resources: substantial but proportionate. Cascade value: near-eradication of polio, demonstration that a globally distributed public-health programme could destroy a major pathogen. Salk declined to patent the vaccine, which compounded the cascade.

22. The mRNA vaccine platform

(Karikó, Weissman, 1990s–2020s) Decades of underfunded persistence by Katalin Karikó and Drew Weissman to make mRNA tolerated by the immune system. Resources: persistent small grants in a sceptical field. Cascade value: fast-design vaccine platforms, a four-billion-dose COVID response, a pipeline of next-generation vaccines and cancer therapies. Vindicated patience under consensus opposition.

23. The eradication of smallpox

(1959–1980) Coordinated global vaccination campaign led by the WHO. Resources: roughly three hundred million dollars over two decades. Cascade value: the first complete eradication of a human disease and the demonstration that such a thing is even possible, which directly motivated the polio and rinderpest campaigns.

24. The Green Revolution

(Borlaug, Swaminathan, 1940s–1960s) Norman Borlaug's wheat breeding work and the parallel Indian and Mexican programmes. Resources: modest agricultural research budgets. Cascade value: averted mass famine for an estimated billion people. One of the cleanest cases of cascade value being wildly underpriced at decision time.

25. CRISPR-Cas9 as a gene-editing tool

(CRISPR repeats first observed 1987, Mojica's characterisation through the 1990s, Doudna and Charpentier 2012) A series of incremental observations in microbiology became, in 2012, a general-purpose gene editor. Resources: small individual labs across two decades. Cascade value: reshaped genetic engineering, biotechnology and medicine. A clean example of patient curiosity-driven research producing a discontinuous unlock.

26. The Apollo programme

(1961–1972) Roughly two hundred and fifty billion dollars in current money to put humans on the moon. Resources: vast. Cascade value: integrated circuits, modern systems engineering, materials science, software engineering practices. Whether the programme was optimally allocated is contested; the framework reading is that it was a cascade-and-demonstration project that mostly justified its cost. A close call but defensible.

27. The Manhattan Project

(1942–1945) Two billion dollars in 1940s currency, a hundred and thirty thousand people. Resources: extraordinary. Cascade value: nuclear weapons, nuclear power, naval propulsion, medical isotopes, modern computing (Los Alamos was a forcing function), the systems-engineering tradition. Morally complex; framework-smart in its allocation given the strategic premise. Whether the strategic premise was correct is outside the framework's scope.

28. The Bell Labs research tradition

(1925–1980) A single industrial laboratory produced the transistor, the laser, information theory, UNIX, C, the photovoltaic cell and a substantial fraction of twentieth-century communication theory. Resources: a fraction of AT&T's monopoly profits. Cascade value: the entire information age. The strongest case in history for letting smart people work on hard problems with patient capital.

29. ARPANET and the early internet

(1969–) A few research labs connected by packet-switched links. Resources: small DARPA budgets. Cascade value: the internet. The classic case of a public research investment whose social return on capital is essentially incalculable.

30. The integrated circuit

(Kilby 1958, Noyce 1959) Two independent inventions within months of each other. Resources: small lab budgets. Cascade value: every modern electronic device, six decades of Moore's law, the entire computing industry that followed. The downstream multiplier is among the largest in technological history.

31. The Intel 4004 microprocessor

(1971) A single small team at Intel, designing the first commercial microprocessor for a Japanese calculator company. Resources: modest commercial R&D. Cascade value: launched the microcomputer, then personal-computing, then the smartphone era. A side project that reshaped the world.

32. Tycho Brahe's astronomical observations

(1572–1601) Decades of patient naked-eye observation at Uraniborg with unprecedented precision. Resources: substantial royal patronage. Cascade value: provided the data Kepler used to derive the laws of planetary motion. The original brute-force-then-elegance case in science.

33. Kepler's laws of planetary motion

(1609–1619) Johannes Kepler analysing Brahe's observations to extract three precise laws. Resources: trivial computational labour by one person. Cascade value: empirical foundation for Newton's mechanics; the single most important transition from descriptive to predictive astronomy.

34. Darwin's Beagle voyage and Origin of Species

(1831–1859) A five-year voyage and twenty more years of writing. Resources: modest patronage. Cascade value: founded evolutionary biology and reshaped every life-science field that followed. A patient single-investigator project at the highest end of intellectual leverage.

35. The Encyclopédie

(Diderot, d'Alembert, 1751–1772) A twenty-year, twenty-eight-volume effort to systematise knowledge. Resources: substantial volunteer-and-paid labour. Cascade value: shaped the European Enlightenment and provided the model for every encyclopaedia since. A precursor to Wikipedia by two and a half centuries.

36. Harrison's marine chronometer

(1730s–1770s) John Harrison's solo development of a clock accurate enough at sea to determine longitude. Resources: small grants and royal prizes. Cascade value: enabled safe transoceanic navigation, transformed global trade and exploration. A textbook case of an individually-driven hard-engineering project producing strategic-scale value.

37. The Hubble Space Telescope

(1990–) Roughly ten billion dollars over its lifetime. Resources: large. Cascade value: revolutionary observational astronomy, public engagement with science, training of a generation of astronomers, and the Deep Field images that recalibrated our cosmic perspective. The early mirror flaw nearly killed it; the in-orbit repair is itself a small framework lesson about reversibility.

38. LIGO

(1992–2015) Roughly a billion dollars over two decades to build the most precise interferometer ever constructed. Resources: large. Cascade value: opened gravitational-wave astronomy as a new branch of observational science. The kind of patient brute-force project that only a public funder could undertake.

39. The Sloan Digital Sky Survey

(2000–) A wide-area survey of hundreds of millions of celestial objects. Resources: modest by big-science standards. Cascade value: the dataset on which a generation of astronomical and cosmological work has been built. A clean case of patient cataloguing producing compounding scientific return.

40. Voyager 1 and 2

(1977–) Two probes, launched in a single rare planetary alignment. Resources: roughly a billion dollars in current money. Cascade value: the first close-up science from the outer planets and now the only spacecraft in interstellar space. A timing-window project that could not have been deferred without losing the alignment.

41. The standardised shipping container

(McLean, 1956–) An American trucker, Malcom McLean, demonstrated that putting cargo in a standard steel box would change everything. Resources: a single private investment. Cascade value: collapsed the cost of international trade by an order of magnitude and reshaped global manufacturing. The clearest case of a humble-looking project producing decades of compounding logistical value.

42. The bar code and GS1 standards

(1973–) A small industry consortium agreed on a universal product code. Resources: trivial. Cascade value: modern retail, supply chains, inventory management. A standards-coordination project whose value compounds with every additional adopter.

43. PubMed and NCBI

(1988–) The US National Library of Medicine's free index and database infrastructure for biomedical literature and data. Resources: modest public funding. Cascade value: the working substrate of every modern biomedical researcher. Without PubMed, the pace of medical research over the past three decades would have been measurably slower.

44. The Internet Archive

(Brewster Kahle, 1996–) A non-profit digitising and preserving the web and other media. Resources: small charitable funding. Cascade value: the only meaningful institutional memory of the early internet, plus the most significant book-digitisation effort outside Google. A closing-window project whose true value will be visible in a century.

45. The Protein Data Bank

(1971–) A patient, decades-long deposition of solved protein structures by the global crystallography community. Resources: cumulative researcher effort, modest infrastructure. Cascade value: the training data for AlphaFold. Without the PDB, AlphaFold would have had nothing to train on. The most important scientific dataset of the twentieth century.

46. SpaceX's reusable rocket programme

(2002–) A private bet that orbital launch could be made an order of magnitude cheaper through reusability. Resources: roughly five to ten billion dollars to crack the technology. Cascade value: collapsed launch costs, opened a renaissance in space-based industry, demonstrated that capital-intensive aerospace can be done outside national programmes. The textbook current-era brute-force-then-elegance case.

47. The eradication of rinderpest

(declared 2011) A coordinated veterinary effort culminating in the second-ever eradication of a disease, the first in livestock. Resources: modest, sustained. Cascade value: removed a centuries-long scourge of African and Asian agriculture. Less famous than smallpox but framework-equivalent in elegance of execution.

48. Galaxy Zoo and citizen-science platforms

(2007–) A million volunteers classifying galaxies. Resources: trivial. Cascade value: a labelled dataset that subsequent ML systems consumed; the demonstration that citizen-science could scale; the opening of the broader Zooniverse ecosystem across multiple disciplines. A clean case of a crowdable problem attacked at the right moment.

49. The Royal Society's experimental tradition

(1660–) Not a project but an institutional invention: regular meetings, publication of Philosophical Transactions, the demand that claims be testable. Resources: trivial. Cascade value: institutionalised the scientific method in Europe and produced the model for nearly every scientific society since. Counts as a project for framework purposes because it is one of the longest-running compounding institutional bets in history.

50. The decoding of Linear B

(Ventris and Chadwick, 1952) An amateur architect and a Cambridge philologist worked out that an ancient Aegean script encoded an early form of Greek. Resources: nothing beyond two people's spare time. Cascade value: opened a new region of Bronze Age history to scholarship and remained the methodological template for subsequent decipherment efforts. A small project at the highest end of intellectual leverage.


Open list. Submissions of strong candidates that should displace existing entries are welcome. The next obvious additions to consider: the Beagle voyage as distinct from Darwin's later work, the fossil-hunting traditions of the early nineteenth century, the development of steam-engine theory by Carnot, the standardisation of the metric system, the discovery of penicillin, the World Health Organisation's vaccine programmes, and the Hong Kong / Singapore / Taiwan economic-development bets of the post-war period.

— Siri Southwind

50 dumbest

The most expensive misjudgements in modern history.

Each one funded by people who should have known better. Each one defended by institutions whose only check on bad ideas was the absence of someone willing to say no. Several driven by political logic that overrode what the evidence already said.

The aggregate cost runs to several trillion dollars and many decades of redirected effort.

Read these and ask which one your current project most resembles.

1. The Maginot Line

(France, 1929–1940) Roughly three billion francs of static fortifications oriented to refight the trench warfare of 1914–1918. Resources: vast for the era. Outcome: outflanked through the Ardennes in six weeks. Framework reading: a textbook fight-the-last-war project. The signals that mobile warfare was coming were available; the institution was incapable of swapping. This kind of monument is the price of certainty on a moving cost-trajectory.

2. The Gallipoli campaign

(1915–1916) A naval and land assault on the Dardanelles intended to knock the Ottomans out of the First World War. Resources: roughly half a million casualties combined. Outcome: stalemate, evacuation. Framework reading: vast resources committed to an ambitious geographic moonshot whose preconditions for success — surprise, naval bombardment effectiveness, defender weakness — were not in place. The kind of bet a prediction market would have priced very differently than the war cabinet did.

3. The Aral Sea irrigation programme

(Soviet, 1960s–) Diverted the rivers feeding the Aral Sea to grow cotton in central Asian deserts. Resources: substantial sustained engineering investment. Outcome: the world's fourth-largest lake is now mostly dust. Framework reading: an ecological closing-window destroyed by a project optimised on a single output (cotton tonnage) with no scoring of cascade losses. One of the highest negative cascade-value projects in history.

4. The Soviet Plan for the Transformation of Nature

(1948–1953) A vast programme of forest belts, irrigation and climate engineering meant to remake Soviet agriculture. Resources: extensive over five years. Outcome: largely abandoned after Stalin's death; the parts that were built produced modest agricultural gain at significant ecological cost. Framework reading: ideology and central authority over scientific verifiability; a pure single-axis trap.

5. The Lysenko biology campaign

(USSR, 1930s–1960s) Trofim Lysenko's pseudoscientific theory of inheritance was made official Soviet biology under Stalin. Resources: decades of state-imposed dominance over the entire field; promotion of Lysenkoists, suppression and imprisonment of Mendelian geneticists. Outcome: agricultural failures, the loss of a generation of Soviet biology, and a setback that the field never fully recovered from. Framework reading: a political-imitation programme dressed as science; the framework's verification cost dimension was deliberately disabled by political fiat. A reminder that the worst framework failures are sometimes intentional.

6. The Strategic Defense Initiative — "Star Wars"

(US, 1983–1993) Ronald Reagan's missile-defence programme, intended to develop space-based interceptors that could destroy intercontinental ballistic missiles. Resources: roughly thirty billion dollars over a decade. Outcome: no deployable system; some of the research and infrastructure flowed into later programmes. Framework reading: an ambitious moonshot whose verification cost (testing whether the proposed defences would actually work against a sophisticated adversary) was structurally infinite. Some defenders argue the programme contributed to ending the Cold War; that is contested and outside the framework's scope.

7. Project Mohole

(US, 1957–1966) A National Science Foundation project to drill through the Earth's crust to the Mohorovičić discontinuity and recover mantle material. Resources: roughly fifty million 1960s dollars. Outcome: cancelled by Congress with the boundary not reached. Framework reading: a project whose technical preconditions (deep-water drilling at unprecedented depths) had not yet been demonstrated. Brute-force-too-early; the by-products (improved drilling techniques) were valuable but did not justify the headline cost. Subsequent ocean drilling has produced the science Mohole was meant to produce, more cheaply.

8. Project Pluto

(US, nuclear-ramjet cruise missile, 1957–1964) A nuclear-reactor-powered cruise missile intended to fly low over enemy territory irradiating everything beneath it. Resources: roughly two hundred million 1960s dollars. Outcome: cancelled because no one could figure out how to test the thing. Framework reading: a project whose verification cost was effectively infinite was not framework-tractable from inception. A textbook should-have-stopped-earlier case.

9. The NHS National Programme for IT

(UK, 2002–2011) The largest civilian IT contract in history at the time, intended to integrate health records across the NHS. Resources: roughly twelve billion pounds projected, with about ten billion spent before the programme was effectively abandoned. Outcome: a small fraction of the original scope delivered. Framework reading: a coordination-cost trap. The complexity of harmonising clinical practice across an entire national health service was never priced honestly. A case study in how procurement structures can prevent a project from being killed when it should be.

10. The Soviet Buran space shuttle

(1976–1993) A near-clone of the American Space Shuttle. Resources: an estimated fourteen billion roubles. Outcome: flew once, unmanned, in 1988; the programme was abandoned and the prototype destroyed in a hangar collapse in 2002. Framework reading: a strategic-imitation project pursued because the rival had one. The framework's question — why this, why now, what cascade does this fire that an alternative does not — was never asked.

11. The NASA Constellation programme

(2005–2010) A planned successor to the Space Shuttle including new launch vehicles and a return to the Moon. Resources: roughly nine billion dollars before cancellation. Outcome: cancelled by the Obama administration in 2010. Framework reading: a programme whose cost-trajectory assumptions were inherited from earlier eras while commercial space — SpaceX in particular — was about to drop the price of orbital launch by an order of magnitude. Some of the work was redirected into SLS, which is itself a contested case.

12. Concorde commercial service

(1976–2003) The technical achievement is real and the engineering elegant. The commercial operation lost money for participating airlines for almost its entire life. Resources: enormous public R&D plus operational losses. Outcome: retired after the Air France crash and 9/11 demand collapse. Framework reading: a beautiful demonstration whose elegant version (cheaper, ubiquitous supersonic) never followed because no one bent the right curve.

13. The Fifth Generation Computer Systems Project

(Japan, 1982–1992) A roughly five-hundred-million-dollar national push to build parallel logic-programming machines as the future of AI. Resources: substantial. Outcome: produced little of lasting commercial or scientific value. Framework reading: bet on the wrong substrate. The Japanese institution chose Prolog and parallel inference machines; the cost-trajectory of general-purpose computing was about to make the entire approach obsolete. A high-profile case of mistaking the consensus paradigm for the right paradigm.

14. Iraq reconstruction and nation-building budget

(2003–2011) Roughly sixty billion dollars allocated to reconstruction, with extensive evidence of fraud, waste and unfinished projects. Resources: extreme. Outcome: large fractions unaccounted for; long-term reconstruction outcomes weak. Framework reading: a coordination-cost trap of the highest order. The framework's verdict is on the allocation methodology, not on the moral arguments about the war itself.

15. The Bay of Pigs invasion

(1961) A CIA-organised invasion of Cuba that failed within seventy-two hours. Resources: roughly forty-six million 1961 dollars plus enormous strategic cost. Outcome: total tactical failure and one of the largest single-step deteriorations of US-Soviet relations. Framework reading: a project whose preconditions for success were systematically misjudged by the planning institution; the verification of those preconditions was treated as adversarial rather than as honest review.

16. The original Healthcare.gov launch

(US, 2013) The federal health-insurance exchange was unusable for months after launch. Resources: roughly half a billion dollars in the original development. Outcome: emergency rebuild, hundreds of millions in additional costs. Framework reading: not a technical-impossibility failure; a procurement-and-management failure. The cost trajectory of building such a system in 2013 was favourable; the institutional capacity to use that trajectory was not.

17. Sochi 2014 Winter Olympics infrastructure

(2007–2014) Roughly fifty-one billion dollars of investment, the most expensive Olympics ever held. Resources: extreme. Outcome: most facilities fell into limited use. Framework reading: large fractions diverted into corruption and prestige construction. The framework reading is on the allocation, not on the games themselves.

18. Rio 2016 Olympics infrastructure

(2009–2016) A similar profile to Sochi at smaller scale. Resources: roughly thirteen billion dollars. Outcome: most facilities under-used or derelict within five years. Framework reading: chronic underpricing of decay rate of value in mega-event projects. A pattern that recurs in nearly every Summer Olympics outside London 2012 and Paris 2024.

19. The Berlin Brandenburg Airport

(2006–2020) Construction began in 2006, planned to open in 2012, finally opened in 2020. Resources: roughly seven billion euros against an original budget of two billion. Outcome: an airport, eight years late. Framework reading: a textbook case of coordination cost and verification cost (fire-safety certification, in particular) being underpriced at inception.

20. The California High-Speed Rail Project

(2008–) Approved by ballot measure in 2008 with a thirty-three-billion-dollar budget and a 2020 completion target. Resources: now estimated at over one hundred billion dollars with a substantially scaled-back scope. Outcome: ongoing. Framework reading: a project with strong direct value if completed and a high probability of being technically superseded (autonomous vehicles, aviation electrification) before the cascade fires. An early admission, listed on trajectory rather than outcome; the same project appears as entry 16 on the current list.

21. The Romanian Casa Poporului — People's Palace

(1984–1989) Nicolae Ceaușescu's vanity construction in central Bucharest, the second-largest administrative building in the world. Resources: roughly three billion dollars in 1980s currency, twenty thousand workers, the demolition of a fifth of historic Bucharest. Outcome: largely empty for decades. Framework reading: a regime-glorification project whose direct value to the population was negative and whose cascade value is essentially the lessons subsequent regimes have not learned from it. A reminder that the framework can be used by autocrats and fails to constrain them.

22. The Spruce Goose / H-4 Hercules

(Howard Hughes, US-government-funded, 1942–1947) A vast wooden flying boat intended for transatlantic transport. Resources: tens of millions of dollars and most of Hughes's reputation. Outcome: flew once, never used. Framework reading: a personal-conviction project pursued past the point at which the war it was meant for had ended. A clean case of a sunk-cost continuation that no longer made sense once its strategic premise had evaporated.

23. The AOL–Time Warner merger

(2000) A one hundred and sixty-five billion dollar all-stock merger at the peak of the dot-com bubble. Resources: the entire market capitalisation of two large firms. Outcome: roughly two hundred billion dollars of equity value destroyed over the decade following. Framework reading: a strategic bet on a thesis (old-and-new-media convergence) that was specifically wrong, executed at exactly the moment when the cost-trajectory of internet distribution was about to make the old-media half of the merger irrelevant.

24. AT&T's mega-acquisitions of DirecTV and Time Warner

(2015 and 2018) DirecTV at forty-nine billion dollars in equity (sixty-seven billion total enterprise value including assumed debt); Time Warner at eighty-five billion in equity (one hundred and eight billion including debt). Combined value destruction: well over one hundred billion dollars across spinoffs and write-downs. AT&T sold its remaining DirecTV stake in 2025 for roughly seven and a half billion dollars. Framework reading: a pair of strategic bets on convergence theses — pay-TV plus broadband, content plus distribution — that were specifically wrong about where the cost-trajectories of streaming, cord-cutting and content economics were going. Both businesses were spun out within five years of acquisition at substantial losses.

25. The Daimler-Chrysler merger

(1998–2007) A "merger of equals" that destroyed roughly thirty-six billion dollars in shareholder value. Resources: vast. Outcome: divestiture nine years later. Framework reading: cross-cultural integration costs underpriced; strategic rationale weak. Several similar mergers (HP-Compaq, BMW-Rover) have similar profiles.

26. HP's acquisition of Autonomy

(2011) HP paid eleven billion dollars and wrote down nearly nine billion within a year. Resources: vast. Outcome: a decade of litigation. Framework reading: a strategic acquisition pursued at the wrong price for the wrong reasons; the framework's neglectedness and cost-trajectory questions about the underlying business would have flagged it.

27. Microsoft's acquisition of Nokia's mobile phone business

(2013) Microsoft paid roughly seven point two billion dollars for Nokia's handset division and wrote down essentially the entire amount within two years. Resources: substantial; cost roughly eighteen thousand jobs. Outcome: Windows Phone wound down; the acquired business effectively dissolved. Framework reading: a defensive acquisition into a platform war that the cost-trajectory of mobile development had already settled. The framework would have flagged the bet as attack with caveats at best, probably skip if the question had been asked honestly.

28. Yahoo's acquisition of Tumblr

(2013, sold 2019) Yahoo paid one point one billion dollars for Tumblr; the asset was sold to Automattic in 2019 for about three million. Resources: significant. Outcome: ninety-nine point seven per cent value destruction. Framework reading: an acquisition made on cultural-relevance grounds at the peak of a network's cycle; the cost-trajectory of social platforms was clearly downward for Tumblr's category by 2014.

29. The original Iridium

(Motorola, 1991–1999) A constellation of sixty-six satellites for global mobile phone coverage. Resources: roughly five billion dollars. Outcome: bankruptcy nine months after service launch. Framework reading: a textbook curve-mispriced bet — terrestrial cellular cost was collapsing throughout the project, hollowing the market by the time the satellites were on station. The infrastructure later found a niche under different ownership; the original allocation was a misread of the curve.

30. The Boeing 737 MAX MCAS development

(2015–2019) The decision to add the Manoeuvring Characteristics Augmentation System rather than redesign the airframe, with safety-verification corners cut to preserve the marketing claim of common type rating. Resources: modest in development; over twenty billion dollars in subsequent groundings, settlements and lost sales; three hundred and forty-six lives lost. Framework reading: a verification cost dimension that was deliberately underpriced for short-term commercial reasons. The moral failure is severe; the framework failure is institutional, in the procurement and certification arrangements that allowed verification to be treated as a cost to minimise.

31. Volkswagen Dieselgate

(2009–2015) Software designed to detect emissions-test conditions and reduce pollution levels in test mode while permitting much higher emissions in normal driving. Approximately eleven million vehicles affected worldwide. Resources: substantial engineering effort to design and maintain the deception. Outcome: over thirty billion euros (roughly thirty-three billion dollars) in fines, settlements, vehicle buy-backs and remediation; criminal prosecutions; lasting reputational damage to the diesel category. Framework reading: like Theranos and FTX, the moral failure is the fraud; the framework failure is in the institutional verification systems that allowed a decade-long programme of cheating to go undetected.

32. Kodak's failure to commercialise digital photography

(1975–2012) Kodak invented the first digital camera in 1975. Decades of internal politics, deference to the film business and reluctance to cannibalise existing margins delayed serious commercial digital investment until the curve had passed. Resources: opportunity cost of the global photography market. Outcome: bankruptcy in 2012. Framework reading: the canonical example of an incumbent failing to attack on an obvious cost-trajectory because internal accounting prioritised current margin over future relevance.

33. Sears under Eddie Lampert

(2005–2018) Lampert's strategy of treating Sears as a financial-engineering vehicle rather than a retailer, while underinvesting in stores, technology and inventory. Resources: a hundred-and-twenty-five-year-old institution, roughly thirty billion dollars in market value at acquisition. Outcome: bankruptcy in 2018; most stores closed; pension obligations transferred to the federal Pension Benefit Guaranty Corporation. Framework reading: a deliberate decision not to invest on a curve where competitors were investing aggressively. The framework reading is harsh because the alternative — a serious retail-tech transformation — was visibly available.

34. Enterprise rule-based NLP systems

(2017–2022) The global consulting industry built hand-rolled entity extractors, sentiment analysers and document classifiers for tens of billions of dollars across thousands of enterprises. Resources: extraordinary in aggregate. Outcome: rebuilt or scrapped between 2022 and 2024 once foundation models commoditised the underlying capability. Framework reading: the most expensive should-have-known-better case of the recent past, distributed across thousands of organisations, almost none of which read the cost trajectory honestly.

35. Meta's Reality Labs metaverse pivot

(2019–) Roughly fifty billion dollars in cumulative investment by 2024 with negligible consumer adoption of the headline products. Resources: vast. Outcome: ongoing, though the market has since repriced the bet. Framework reading: a thesis (immersive virtual presence as the next computing platform) bet on with extraordinary capital while the actual cost-trajectory of mobile-and-AI-driven computing was running in a different direction. The honest framework reading allows that the moonshot might still succeed; the resource ratio so far is unflattering.

36. Enterprise blockchain pilots

(2017–2022) A wave of corporate "we are exploring blockchain" pilot programmes funded by IT and innovation budgets. Resources: tens of billions across the global enterprise market. Outcome: a small fraction reached production; most were quietly abandoned. Framework reading: a category that required peer-to-peer trustless coordination being deployed inside organisations that already trusted their internal databases. A solution looking for a problem at scale.

37. Most NFT-related corporate pilots

(2021–2023) Major brands launching NFT collections and "Web3 communities" at the peak of the speculative cycle. Resources: hundreds of millions to perhaps low billions in aggregate marketing and development. Outcome: most programmes wound down by 2024. Framework reading: a fashion-driven category with no underlying cost-trajectory advantage; the framework's crowding and demonstration-dilution anti-patterns combined.

38. The Apple Newton

(1993–1998) A handheld personal digital assistant launched with overpromised handwriting recognition. Resources: substantial. Outcome: discontinued by Steve Jobs on his return. Framework reading: technology not yet on the curve to support the product promise. The PalmPilot, launched three years later, succeeded with a deliberately narrower design — the framework's decomposition move applied correctly.

39. New Coke

(1985) A reformulation of Coca-Cola that lasted seventy-nine days. Resources: very large in marketing and brand risk. Outcome: rapid reversal. Framework reading: a project whose verification cost (testing genuine consumer reaction at scale) had been bypassed by industry conventional wisdom. A small expensive lesson in the difference between blind taste tests and lived consumer behaviour.

40. The Ford Edsel

(1957–1960) A new mid-market car line launched just as consumer tastes shifted. Resources: roughly two hundred and fifty million 1950s dollars in development. Outcome: discontinued in three model years. Framework reading: a project with a long lead time launched into a market trajectory that had moved during development. A reminder that decay rate of value is real for fast-moving consumer products.

41. Hand-tuned chess engines after Deep Blue

(approximately 1999–2010) Significant academic and commercial effort continued on hand-tuned position-evaluation engines for a decade after the Kasparov defeat made it clear that brute-force search plus better hardware would dominate. Resources: cumulative researcher and engineer years. Outcome: superseded entirely by AlphaZero in 2017. Framework reading: a clean premature-optimism anti-pattern; the cost-trajectory of compute was visible and the writing was on the wall.

42. Rule-based machine translation systems after roughly 2010

Several academic institutions continued investing in hand-built grammar-based MT systems years after statistical and then neural systems demonstrated decisive superiority. Resources: substantial cumulative funding. Outcome: superseded. Framework reading: institutional path-dependence in academic linguistics combined with grant cycles that could not absorb the paradigm shift.

43. Cold Fusion claims and follow-on research

(1989–) Stanley Pons and Martin Fleischmann announced that they had achieved nuclear fusion at room temperature in a tabletop electrochemistry cell. Resources: hundreds of laboratories worldwide spent the next two years attempting and failing to replicate; smaller programmes have continued under various rebranding (LENR, condensed-matter nuclear science) for thirty-five years. Outcome: no robust replication of the original claim; intermittent revivals followed by quiet retreats. Framework reading: a case where the verification cost of an extraordinary claim was structurally underpriced by the announcement itself, drawing capital and attention into chasing an effect that was almost certainly experimental error.

44. The original EU Human Brain Project

(2013–2023) A flagship European Commission research programme funded at over one billion euros, originally promising a comprehensive simulation of the human brain. Resources: substantial public funding over a decade. Outcome: the simulation goal was quietly abandoned within three years; the project produced useful infrastructure but at a fraction of the scientific return implied by the original commitment. Framework reading: the verification cost of the headline goal (do we know what a successful simulation would even look like) was never honestly addressed at funding time, and the political momentum of a flagship programme prevented timely course-correction.

45. Phlogiston theory research after Lavoisier

(post 1780s) Continued investigation of the phlogiston model after combustion-as-oxidation had been demonstrated empirically. Resources: cumulative researcher time. Outcome: dead end. Framework reading: a paradigm holding on past its useful life; framework-illuminating because the cost of switching was social, not scientific.

46. Aether-search experiments after Michelson-Morley

(post 1887) Several decades of further interferometry attempting to detect a luminiferous aether, despite the original 1887 null result and Einstein's 1905 reframing. Resources: cumulative laboratory time. Outcome: nothing. Framework reading: the framework cannot save you from a community that does not want to update.

47. Theranos

(2003–2018) A blood-testing startup that raised approximately seven hundred million dollars based on technical claims that were not real. Resources: substantial. Outcome: criminal prosecution; investor losses near total. Framework reading: the moral failure is the fraud. The framework failure is in the institutional allocators — sophisticated investors and a star-studded board — who failed to verify the scientific claims because the founder's narrative substituted for due diligence.

48. FTX

(2019–2022) A cryptocurrency exchange that collapsed amid roughly eight billion dollars in customer funds gone missing. Resources: substantial; the moral failure dominates. Framework reading: same as Theranos. The framework reading is on the institutions that funded and platformed FTX without the ordinary diligence that any earlier-era financial institution would have demanded.

49. WeWork

(2010–2019) SoftBank-led funding pushed the valuation toward forty-seven billion dollars; the IPO attempt collapsed at single-digit billions of intrinsic value. Resources: vast. Outcome: one of the most public valuation collapses in venture history. Framework reading: a real-estate arbitrage business priced as a software business. The cost-trajectory of physical office leasing was not on the curve the valuation assumed.

50. Quibi

(2018–2020) A short-form video service that raised roughly one and three-quarter billion dollars and shut down within six months of launch. Resources: extraordinary for a media product. Outcome: complete write-off. Framework reading: a distribution thesis (paid mobile-first short-form) that ignored the existence of free competition; a cost-trajectory of attention that was already unfavourable when the round closed.


Open list. The next obvious additions to consider: the Soviet Lysenko-adjacent agricultural campaigns, several specific UK government IT contracts beyond NPfIT, the South African Strategic Defence Package "arms deal" of 1999, the Boeing 787 production-quality programme, GE Capital's pre-2008 expansion, Citi's subprime mortgage exposure, the various Yahoo dot-com acquisitions (Broadcast.com, GeoCities), the BlackBerry and Nokia responses to the iPhone, Cisco's Flip Camera acquisition, Bank of America's Countrywide acquisition, the Athens 2004 Olympics, Spain's high-speed-rail extensions to several uneconomic destinations, the Westinghouse AP1000 nuclear programme that bankrupted Westinghouse and nearly sank Toshiba, and several specific national R&D programmes whose cost-trajectory readings were similarly bad.

— Siri Southwind

50 current likely-dumb

The fifty live projects, programmes and bets that will look misallocated within five years.

Specific. Named. Some of them belong to people who will read this. Most belong to institutions too large to feel a single critic. Several involve more capital than the GDP of small countries.

If you are running one of these, you have a choice. If you are funding one, you have a different one. If you are working on one, the most useful thing this list can do is make you ask why.

The world has more capacity than it has people willing to point at the specific problem.

1. Sovereign "national champion" AI initiatives that try to replicate OpenAI or Anthropic from scratch

Several governments — France, UK, Germany, Saudi Arabia, the UAE, India, Korea, parts of the EU — are funding multibillion-dollar attempts to build domestic frontier-model labs. Resources: cumulative tens of billions. Framework reading: the cost trajectory of training frontier-class models is collapsing fast enough that being twelve to twenty-four months behind on capability buys you almost no defensible position. Sovereign compute is a defensible bet; sovereign frontier models mostly are not.

2. Most "build your own foundation model from scratch" enterprise efforts

Large banks, telcos, defence primes and several governments funding bespoke pre-training runs. Resources: hundreds of millions per institution. Framework reading: the marginal benefit over fine-tuned and post-trained off-the-shelf base models is small and shrinking; the marginal cost (data engineering, evals, security, drift) is large and growing.

3. Hand-curated enterprise knowledge graphs in 2026

A category that has been quietly rebuilt every five years for two decades. Resources: tens of millions per implementation across thousands of organisations. Framework reading: language models with retrieval are now capable of producing the working subset of a knowledge graph on demand, and the maintenance cost of a hand-curated graph rises with its size while the model cost falls with the curve.

4. Most generic AI-agent platform startups

A category currently raising at frontier-model-adjacent valuations. Framework reading: the agentic capability is being commoditised fast by the foundation-model providers themselves. Vertical agents with proprietary workflow data, defensible eval suites and clear customer relationships are the survivors; horizontal agent platforms mostly are not.

5. Most "vertical SaaS plus AI wrapper" companies at $50M+ valuations with no data moat

A category currently raising aggressively. Framework reading: the moat is the proprietary workflow data, not the AI; companies that have one will compound, those that do not will get squeezed between foundation-model owners adding domain features and incumbent SaaS vendors integrating the same off-the-shelf models.

6. Most "AI copilot for [profession]" wrappers competing with native-platform offerings

A category in which the foundation-model providers themselves are vertically integrating. Framework reading: the wrapper businesses survive only if they own the workflow data the foundation-model owner cannot get. Most do not. Microsoft, Google and OpenAI are now shipping native versions for code, productivity, support, sales and search.

7. Most "AGI alignment" research programmes that do not engage with deployed systems

A research category that has had real impact and a parallel category that has not. Framework reading: alignment work that does not test against current frontier models risks being theoretically elegant and operationally irrelevant. The framework rewards the empirical wing of the field; the rest is at risk.

8. Manual content-moderation farms still expanding hiring in 2026

The cost trajectory of automated moderation is collapsing faster than the cost trajectory of human moderation. Framework reading: continuing to scale human capacity while not also building the automation pipeline is exactly the curve-mispriced bet that doomed enterprise rule-based NLP a decade earlier.

9. Most "European sovereign cloud" projects trying to recreate AWS

National data-residency cloud programmes that fund full-stack alternatives to the hyperscalers. Framework reading: the cost-trajectory difference between the hyperscalers and the sovereign clones is widening, not narrowing. Sovereign capacity at the application layer is defensible; sovereign duplication of generic cloud infrastructure mostly is not.

10. Most billion-dollar AI rounds raised in 2024–2025 for thin wrappers

A category-level call. Framework reading: a substantial fraction of the largest-headline AI rounds of the past two years have funded businesses whose differentiation is brand and momentum rather than data, model or workflow. The portfolio shape works for the funds — one or two of these will compound — but the median outcome will be mediocre, and several of the largest names of 2024 will be down rounds or quiet wind-downs by 2028.

11. The NEOM "The Line" project

(Saudi Arabia) A linear city in the desert announced at five hundred billion dollars. Framework reading: a vanity project on a scale the framework rates almost zero. The cascade dependencies (governance, demand, operating cost, climate, water, geology) are systematically misjudged. Several scaled-back versions remain plausible; the original is not.

12. NEOM's broader portfolio — Trojena, Oxagon, Sindalah, Qiddiya, the Red Sea Project, Diriyah Gate

The wider Saudi Vision 2030 megaproject programme. Aggregate budgets in the high hundreds of billions of dollars, with execution consistently behind plan and demand assumptions unmet. Framework reading: a national-scale strategic-imitation programme attempting to compress decades of urban development into less than a decade. The framework's coordination cost and neglectedness dimensions both flag it; several specific sub-projects will be quietly scoped down well before headline completion.

13. Egypt's New Administrative Capital

(2015–) A purpose-built capital city in the desert east of Cairo, budgeted originally at fifty-eight billion dollars and now substantially over. Framework reading: a centrally-planned city built without organic demand drivers, on a cost-trajectory of urban infrastructure that has not become favourable. Several similar projects in history — Brasília, Naypyidaw, Astana — provide cautionary precedents.

14. Indonesia's Nusantara new capital project

(2019–) A new federal capital in Borneo with a planned investment of roughly thirty-five billion dollars. Framework reading: similar template to the Egyptian case with comparable risks. The political-imitation pattern is visible in both. The honest framework verdict is that the core driver (Jakarta's subsidence and flood risk) is real, but the chosen response is more expensive and slower than the alternatives the framework would propose.

15. UK High Speed 2 (HS2) in its current scoped-back form

Originally announced at thirty-three billion pounds, now estimated at over one hundred billion with the northern legs cancelled. Framework reading: a project whose cost-benefit case relied on the full network and now persists in a truncated form whose business case is substantially weaker; continuing to build the central section without the wider network has poor coordination value and an uncertain cascade.

16. The California High-Speed Rail Project

(2008–) Approved with a thirty-three-billion-dollar budget and a 2020 completion target; current estimates exceed one hundred billion with substantially scaled-back scope. Framework reading: a project with strong direct value if completed and a high probability of being technically superseded (autonomous vehicles, aviation electrification, intercity bus innovation) before the cascade fires. A live case study in mega-project allocation under shifting transport curves.

17. Brisbane 2032 Olympic Games infrastructure

Capital programme at over seven billion Australian dollars and rising. Framework reading: the historical record on Olympic infrastructure cost overruns and post-event utilisation is overwhelmingly negative; recent attempts to learn the lessons (Paris 2024 relied far more on existing venues) suggest improvement is possible, but Brisbane's plan involves substantial new construction. The decay rate of value will be high.

18. NASA's SLS / Artemis launcher programme

Approximately thirty-two billion dollars spent on SLS development through 2025, with the broader Artemis programme projected to reach roughly ninety-three billion dollars in cumulative investment by the same year; per-launch cost is estimated by NASA's Inspector General at roughly four billion dollars per Artemis flight, an order of magnitude above commercial alternatives. Framework reading: government-procurement logic is keeping the programme alive past the point at which the framework would have flagged it. SpaceX's Starship and Blue Origin's New Glenn represent the cost trajectory the framework points to.

19. Several specific Mars-sample-return mission architectures

Multi-billion-dollar programmes whose timelines have slipped to the 2030s. Framework reading: the cost-trajectory of robotic missions and commercial heavy-lift is moving faster than the programme architecture can absorb; sample return done with current architectures will look expensive next to sample return done with the technology that arrives during the slip.

20. The Future Combat Air System (FCAS) European fighter programme

A trinational France-Germany-Spain sixth-generation fighter programme, budgeted at over one hundred billion euros across its life. Framework reading: the framework's coordination cost dimension is operating; intra-European procurement disputes have already produced years of delay. The wider question is whether sixth-generation manned fighters are the right form factor given autonomous-systems trajectories — a question the programme's institutional logic cannot easily ask.

21. The continued F-35 sustainment beyond planned variant convergence

The lifecycle cost of the F-35 programme is now estimated at over two trillion dollars across all customers. Framework reading: the procurement decision is sunk; the sustainment decision continues to absorb capital that could otherwise fund autonomous-system alternatives whose curves are favourable. A defensible portion of sustainment is required for transition; the framework reads the high end of current commitments as misallocated.

22. The Sizewell C nuclear project

(UK) A planned new gigawatt-scale nuclear plant at the Sizewell site in Suffolk, with capital cost estimates rising past forty billion pounds. Framework reading: the framework is mixed on conventional gigawatt nuclear in current western regulatory regimes. The coordination cost and capital intensity are extreme; the cost-trajectory has not improved and may not. The cascade if delivered is real; the chance of completing on plan is low.

23. India's Smart Cities Mission

A national programme launched in 2015 to develop one hundred smart cities, with cumulative investment exceeding two trillion rupees. Framework reading: most participating cities have produced fragmented infrastructure investments rather than the integrated transformations promised. The framework's coordination cost dimension was structurally underpriced at programme design.

24. National "smart city" copies of Songdo and Masdar

Greenfield smart-city programmes in multiple jurisdictions promising integrated sensors, autonomous mobility and predictive governance. Framework reading: the deployment friction overwhelms the technology benefit; the coordination cost dimension is structurally underpriced in centrally-planned smart cities. Toronto's Sidewalk Labs cancellation is the cleanest recent illustration.

25. Various national "green hydrogen valley" programmes

Multibillion-euro and multibillion-dollar national programmes in Germany, the Netherlands, Saudi Arabia, the UAE, Japan, Australia, India and others, intended to anchor a domestic clean-hydrogen ecosystem. Framework reading: a subset of green hydrogen is genuinely important (industrial feedstock, ammonia, refining, possibly aviation); much of what is currently being subsidised in the "valleys" addresses applications (passenger vehicles, residential heating) where the curve does not favour hydrogen. The honest framework verdict is selective approval.

26. Continued European LNG import infrastructure expansion post-2024

Floating storage and regasification units, new permanent terminals, and long-term contracts signed in 2022–2024 in response to the loss of Russian gas. Framework reading: the short-term necessity is real; the decay rate of value on infrastructure built with twenty-to-thirty-year economics, in a sector the same governments have committed to driving to net zero, is unfavourable. Several specific terminals will become stranded assets faster than their financing assumed.

27. Specific EV battery gigafactory subsidy over-allocations

The combined effect of the Inflation Reduction Act in the US and similar programmes in Europe has produced a wave of gigafactory announcements that, in aggregate, may exceed near-term demand. Framework reading: the strategic-supply argument is sound; the specific allocation pattern is producing duplicate capacity in some chemistries while leaving others (sodium-ion, LFP at scale, recycling) under-subsidised. Several announced facilities will be delayed, downscaled or quietly shelved.

28. Hydrogen-fuel-cell passenger-vehicle infrastructure investments

National investments in hydrogen refuelling networks for personal cars, particularly in Japan, Korea and parts of Europe. Framework reading: battery-electric vehicles have decisively won the passenger market on the curve. Hydrogen has plausible cases in heavy industry, shipping and aviation; passenger cars are not the right vector and the refuelling infrastructure being built now will be substantially under-utilised.

29. Most retail central-bank-digital-currency pilots

Roughly forty central banks running retail-CBDC programmes. Framework reading: most of the use cases are already covered by faster-payment systems and open-banking infrastructure; political resistance is real; the cost-trajectory of implementation is unfavourable. A small number of wholesale-CBDC programmes are defensible; the retail wave mostly is not.

30. EU AI Act compliance products built around frozen 2023-era definitions

A category of compliance, governance and risk products built against an instantaneous regulatory snapshot. Framework reading: the regulatory landscape is moving fast; products optimised for the 2023 definitions will be obsolete when the 2028 amendments arrive. The compliance category as a whole will exist; the specific 2024–2025 generation of products will largely not.

31. Surviving enterprise blockchain pilots in non-financial domains

Supply-chain, healthcare, identity and similar projects funded between 2017 and 2022 that have somehow continued. Framework reading: a category that should have been wound down years ago but persists because of sunk-cost commitments and consultant relationships.

32. National "5G use case" innovation programmes funding new pilots in 2026

Government innovation budgets continuing to fund discovery work for transformative applications of 5G that have not arrived. Framework reading: the technology shipped, the transformative use cases mostly did not. Continuing to fund pilots seven years in is sunk-cost behaviour.

33. The Apple Vision Pro at its current price and positioning

A roughly thirty-five-hundred-dollar mixed-reality headset launched in 2024. Framework reading: the technology is impressive, the price-positioning is wrong for the addressable market, and the cost-trajectory of the underlying components will move much faster than the consumer behaviour. The product itself may evolve into something significant; the current allocation will look mistimed.

34. Meta's continuing Reality Labs spend at the 2024–25 pace

Roughly twenty billion dollars per year. Framework reading: the strategic argument is real (avoid platform dependency) and the moonshot may yet succeed; the resource ratio over five years has been unfavourable and the consumer thesis remains undemonstrated. The framework's verdict is conditional but currently unfavourable.

35. Major oil and gas companies' continued upstream exploration in transition-vulnerable jurisdictions

ExxonMobil, Shell, BP, TotalEnergies, Aramco, the Chinese majors and several smaller players continuing capital expenditure on long-cycle exploration and production projects whose payoff timelines extend twenty to thirty years. Framework reading: the decay rate of value on long-cycle hydrocarbon assets in the most aggressive transition jurisdictions is rising. A substantial subset of currently-sanctioned projects will become stranded; the framework cannot tell which ones, but the aggregate is too high.

36. Major auto OEMs' continued hydrogen-fuel-cell passenger-vehicle investments

Toyota's Mirai programme and the broader Korean and Japanese FCV ecosystems. Framework reading: the platform war is over and battery-electric won. The specific commitments to passenger FCVs are absorbing capital that could be reallocated to battery EV competitiveness, charging infrastructure or commercial-vehicle applications where hydrogen retains plausibility.

37. Major auto OEMs' attempts to build proprietary EV charging networks

GM's network plans, Ford's earlier attempt at independence from Tesla's network, Volkswagen's Electrify America, Mercedes' branded network and others. Framework reading: charging infrastructure is a network-effect business; OEM fragmentation is exactly the wrong shape. The industry's convergence on Tesla's NACS standard is what the framework predicted; the holdouts will mostly capitulate or build infrastructure that will need conversion.

38. Major retailers' "experiential physical store" investment programmes

Substantial investments by Nordstrom, Macy's, several luxury houses and chain restaurants in flagship "experiential" locations. Framework reading: the curve of physical-retail demand is unfavourable for most of these formats; the experiential differentiation is being competed away faster than the capital can amortise. A small set of category-defining flagships will work; most will be quietly closed within a decade.

39. Specific massive ERP migrations (SAP S/4HANA, Oracle Fusion Cloud)

Multi-year, billion-dollar-per-enterprise migration programmes from legacy ERP to cloud successors. Aggregate global cost likely above five hundred billion dollars across the migration window. Framework reading: a substantial portion of these migrations will deliver thin business value relative to cost; the framework's coordination cost and verification cost dimensions both flag the category. The replacement-driven nature (vendor end-of-support timelines) is forcing investment that strict ROI analysis would not justify.

40. Continued cable fixed-broadband infrastructure investment in the face of fibre and satellite competition

Major cable operators continuing to invest in HFC upgrades and DOCSIS 4.0 in markets where fibre and satellite (Starlink, Kuiper) are arriving. Framework reading: the curve favours fibre for performance and satellite for reach; the middle position cable occupies is being squeezed from both sides. Investment in the legacy plant beyond minimum service maintenance is increasingly difficult to defend.

41. Major traditional-media streaming wars investments

Disney, Warner Bros Discovery, Paramount, Comcast and Netflix continuing aggressive content and platform spending against each other. Aggregate annual content investment globally exceeds one hundred and twenty billion dollars. Framework reading: the unit economics of subscription video are deteriorating, the AI-generated-content curve is shifting the cost structure of the underlying inputs, and several of the current platforms will consolidate or wind down. The framework would direct substantially less aggregate spending toward the category.

42. Big consultancies' "digital transformation" engagements at large enterprises

Accenture, Deloitte, McKinsey Digital, BCG X, the IBM consulting business and others continuing to sell multi-year transformation programmes at substantial margins. Aggregate global spend in the hundreds of billions per year. Framework reading: a substantial portion of these engagements is being sold against capabilities that the foundation-model wave will compress. The deliverables will look dated within eighteen months of completion in many cases; the decay rate of value is being underpriced.

43. Major pharma's continued small-molecule discovery in crowded therapeutic categories

Specific oncology indications (some kinase targets), several metabolic indications and selected immunology categories where ten or more companies are pursuing the same target with marginal differentiation. Framework reading: the marginal differentiation is small; the cost-trajectory of biological-platform alternatives (cell, gene, mRNA, peptide) is favourable for a subset of the same indications. A substantial fraction of currently-funded small-molecule programmes will be terminated for portfolio reasons before reaching market.

44. Major bank "neobank subsidiary" plays

JPMorgan's Finn (since shut), Goldman's Marcus retreat, several European bank attempts at standalone digital subsidiaries. Framework reading: a category that has consistently produced poor returns for incumbents because the cost structure of the parent makes neobank economics impossible to replicate inside. New entrants of this type in 2025–2026 will follow the same pattern unless deliberately structured to escape the parent's cost base.

45. ITER fusion reactor (the specific tokamak architecture)

A multinational fusion reactor under construction in southern France, originally signed in 2006 at an estimated cost of roughly six billion euros over ten years. As of 2024, total cost is estimated at over twenty-five billion euros, with a further roughly five billion in additional costs and a nine-year schedule slip confirmed in the 2024 revised plan; first deuterium-deuterium plasma operations are now scheduled for 2035, and energy-producing fusion is not expected before 2039. Framework reading: the demonstration value is real and the science is sound; the framework's concern is that the specific tokamak architecture is being out-paced by smaller, faster, privately-funded competitors (CFS, Helion, TAE, Tokamak Energy) whose curves are steeper. ITER will probably produce science worth its cost over fifty years; whether it should still be the centrepiece of the international fusion programme is a different question.

46. Future Circular Collider (FCC) proposals at CERN

A proposed roughly twenty-billion-euro successor to the LHC, with construction proposed for the 2040s. Framework reading: the cascade of marginal high-energy-physics discoveries is uncertain in a way it was not when the LHC was funded. The framework would require a sharper cascade and demonstration-value case before approving a project of this magnitude. Several alternative approaches (a muon collider, linear lepton colliders) deserve more weight in the comparison than they currently receive.

47. Continued investment in symbolic-AI / GOFAI academic departments

Several university departments and research centres maintain substantial faculty hiring, PhD pipelines and grant funding around symbolic AI traditions disconnected from current empirical progress. Framework reading: the path-dependence is severe and the framework's verdict is uncomfortable. A small number of programmes are doing meaningful integrative work; many are continuing patterns from the 1980s and 1990s on the strength of institutional momentum rather than current relevance.

48. Specific Human Brain Project successor and brain-mapping flagships

A category that has already produced one notable disappointment (the original EU Human Brain Project). Several proposed successor programmes in Europe, the US and East Asia continue to promise comprehensive brain mapping and simulation. Framework reading: the verification cost dimension is structural and has not been addressed in the new proposals. Specific sub-fields (single-cell connectomics, cortical-column mapping, specific behavioural circuits) are tractable; the wholesale "simulate the brain" framing has consistently underdelivered.

49. Continued large-scale academic investment in foundational psychology and social-science findings that have not replicated

A category-level call. Power posing, ego depletion, several priming effects, several development-economics effects with mixed replication — the field continues to fund follow-on work on findings whose empirical foundation is contested. Framework reading: the highest-leverage move is to fund replication of the foundational findings before further building on them. The institutional incentives point the other way.

50. Various academic "Centre for AGI / machine consciousness / X-of-the-future" institutes

A category of institutes founded to study future technologies in the abstract, often with substantial endowments and limited engagement with current empirical work in the same domains. Framework reading: a subset is doing valuable work (some of the longtermist research institutes, parts of the AI-safety community, several specific philosophy-of-mind groups). A larger subset is producing theoretical output that does not connect to the current state of the relevant empirical field. The framework predicts a quiet contraction of the latter as the gap between abstract speculation and concrete capability widens.


This list is meant to be argued with. If your project is on it and you disagree, send a paragraph naming the curve you are bending, the cascade you are firing, or the window you are closing. Strong arguments will move the entry off the list in the next revision.

And if you are an investor, founder, builder, public funder, corporate strategist or academic reading this and your reaction is "this is harsh but correct" — that is the point. The framework's job is to make the unfashionable thought sayable. The cost of saying it is lower than the cost of pretending not to think it.

— Siri Southwind

50 possibilities now

The fifty fields, programmes and bets that the framework, applied honestly, says are the highest-leverage uses of the next decade's capital, talent and compute. Specific. Named. Most of them are under-attacked relative to what the dimensions imply.

If you fund things, allocate compute, or are choosing what to spend the next ten years on, this is the list the framework hands you. Several entries will look unreasonable. So did ImageNet, AlphaFold and the Human Genome Project at the moment they were funded. Smartness is partly resistance to consensus.

A few entries appear because the consensus is mispricing them downward. A few appear because a window is closing. A few appear because the cascade is so steep that even a low-probability attempt has positive expected value. Each entry names which.

The resources column is deliberately concrete. Talent type, capital range, compute envelope, infrastructure and licensing dependencies. The aim is not precision; it is to make the bet legible enough that an allocator can compare it against what they are currently funding.

Each entry also carries two scores. Top dimension names the dimension from Dimensions that fires hardest for this entry, scored out of five — five meaning textbook exemplar, three meaning solid fit. Competition now scores how crowded the field already is, also out of five — one meaning nearly nobody is working on it, five meaning ferocious crowding. The most interesting entries are usually those with a high dimension score and a low competition score: the framework rates them as urgent and the world has not yet noticed. Those with a high competition score are still worth attacking, but the marginal team's contribution is smaller and the bar to differentiation is higher.

1. Wastewater and air-sampling pathogen surveillance at global scale

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 2/5. A standing metagenomic monitoring network across municipal wastewater, major airports and high-density indoor venues, designed to flag novel pathogens within days of community spread. Framework reading: defender-favoured dual-use; closing window if the next pandemic outpaces detection; massive cascade value for early intervention; cost trajectory of metagenomic sequencing falling fast. Resources: one to three billion dollars over five years across a global network; metagenomicists, public-health epidemiologists, software engineers; partnerships with municipal water utilities, airports and a handful of airlines; sustained sequencing capacity; real-time signal-extraction software; open data-sharing protocols across jurisdictions.

2. Pan-coronavirus and pan-influenza vaccines

Top dimension — Cascade value: 5/5. Competition now: 3/5. Vaccines that protect against entire viral families rather than the specific strains in circulation, removing the annual reformulation race and the response lag at the start of any new outbreak. Framework reading: cascade across global health, defender-favoured, cost trajectory of antigen design favourable, current allocation orders of magnitude smaller than the expected-value calculation supports. Resources: three to ten billion in patient capital over a decade; structural vaccinologists, immunologists, manufacturing engineers; access to non-human-primate facilities for challenge studies; manufacturing capacity at scale; regulatory pathways for variant-agnostic approval.

3. Mucosal sterilising-immunity vaccine platforms

Top dimension — Cascade value: 4/5. Competition now: 2/5. Vaccines administered nasally or orally that produce sterilising rather than disease-attenuating immunity, blocking transmission rather than only severity. Framework reading: the gap between the public's expectation of vaccines and what they actually achieve is one of the largest single trust failures of the post-2020 period; the technical path is plausible; the cascade across respiratory disease is enormous. Resources: one to three billion in development capital; mucosal immunologists, formulation chemists, clinical trial infrastructure with transmission endpoints; access to challenge models; new regulatory frameworks for transmission-blocking endpoints.

4. Far-UV-C indoor air disinfection at deployment scale

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 1/5. The technology to render indoor air pathogen-free is mature; the deployment is not. Framework reading: defender-favoured, cheap relative to its expected pandemic-prevention value, deployable immediately, currently allocated almost no public infrastructure money despite its cost-effectiveness. Resources: one to five billion in deployment capital across schools, transport hubs and hospitals; lighting engineers, ventilation engineers, building-code reformers; updated electrical and safety standards; long-term studies on chronic exposure (the binding scientific question).

5. Universal nucleic-acid synthesis biosecurity screening

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 2/5. Free, mandatory, fast and global screening of DNA and RNA synthesis orders against threat sequences before they are shipped to customers. Framework reading: defender-favoured, high-leverage, low-cost, currently fragmentary and voluntary. The asymmetry between the cost of universal screening and the cost of a single misuse incident is enormous. Resources: a hundred to three hundred million for the central infrastructure; bioinformaticians, security cryptographers, policy lawyers; coordination with the small number of major synthesis providers; an international governance body; refresh of the threat-sequence database.

6. AI red-teaming and capability evaluation as public infrastructure

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 3/5. A standing, well-funded, methodologically rigorous evaluation function for frontier AI systems, run independently of the labs that build them. Framework reading: defender-favoured, neglected relative to capabilities investment by orders of magnitude, cascade across every subsequent governance and deployment decision. Resources: three to five hundred million per year of operating budget at maturity; AI researchers with frontier-lab experience, security professionals, social scientists, statisticians; substantial compute access (likely subsidised by labs under disclosure agreements); an institutional home that is not a lab and not a single national agency.

7. Mechanistic interpretability of frontier neural networks

Top dimension — Cascade value (decomposition unlock): 5/5. Competition now: 3/5. Understanding what large neural networks are actually computing, at the level of circuits, features and algorithms rather than only behaviours. Framework reading: a decomposition unlock — solving interpretability cheaply makes alignment, evaluation, debugging and many regulatory questions tractable. Cost trajectory falling fast as automated interpretability tools mature. Currently funded at a small fraction of capabilities work. Resources: a few hundred million per year of well-targeted funding across labs and academic groups; rare combination of ML researchers, theoretical neuroscientists and software engineers; meaningful inference compute on frontier models; open-weight access where possible.

8. Robust held-out evaluation benchmarks for frontier AI

Top dimension — Verification economics: 5/5. Competition now: 2/5. Benchmarks that frontier models cannot have seen during training and that meaningfully predict real-world capability. Framework reading: verification-economics; the field's current evaluation infrastructure is leaky and saturated, which makes capability claims and policy decisions harder to ground. Defender-favoured. Resources: fifty to two hundred million across multiple evaluation organisations; domain experts in mathematics, code, biology, security, persuasion, autonomy; secure data-handling infrastructure; refresh cycles to stay ahead of training-set leakage.

9. Memory-safe rewrites of critical infrastructure software

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 2/5. The slow, unglamorous migration of Linux kernel modules, embedded firmware, network stacks and industrial control systems from C and C++ to Rust or equivalent. Framework reading: defender-favoured, closing window before adversary AI-assisted exploit generation outpaces defensive capacity. The cascade prevents an entire class of vulnerabilities from existing. Currently funded mostly by a few large companies' goodwill. Resources: a few billion over a decade across the open-source critical-software ecosystem; senior systems programmers, formal-methods specialists; project management for long migrations; institutional buyers for the transition costs.

10. Post-quantum cryptography migration at civilisational scale

Top dimension — Closing window: 5/5. Competition now: 3/5. Replacement of vulnerable key-exchange and signature schemes across the financial system, government communications, certificate authorities and the wider internet before sufficiently large quantum computers exist. Framework reading: closing window — the harvest-now-decrypt-later risk is already material; defender-favoured; cascade across every digital trust system. Resources: tens of billions in migration costs across enterprises and governments; cryptographers, protocol engineers, certificate-authority operators; updated standards bodies; coordinated deprecation timelines; substantial transition tooling.

11. Closed-loop and advanced geothermal drilling cost reduction

Top dimension — Cascade value: 5/5. Competition now: 3/5. The application of oil-and-gas drilling advances and millimetre-wave drilling to make geothermal viable across most of the planet rather than only in volcanic zones. Framework reading: cascade for clean firm baseload power, brute-force-then-elegance phase, cost trajectory of horizontal drilling already proven, fastest credible path to dispatchable carbon-free electricity at industrial cost. Resources: five to fifteen billion in patient capital over a decade across several teams; drilling engineers from oil-and-gas, materials scientists for high-temperature tooling, plasma and millimetre-wave physicists; siting agreements; offtake contracts to make first-of-a-kind plants bankable.

12. Long-duration energy storage at grid scale

Top dimension — Cascade value: 4/5. Competition now: 4/5. Storage technologies — iron-air, flow batteries, gravity, thermal — that hold tens to hundreds of hours of energy at a fraction of the cost of lithium-ion. Framework reading: cascade unlock for fully renewable grids; demonstration phase well underway; the marginal team is now adding meaningful learning-curve progress; current allocation is below the level the curve supports. Resources: five to twenty billion in deployment capital across multiple chemistries; electrochemists, mechanical engineers, grid-integration specialists; siting and interconnection agreements; first-of-a-kind utility offtake.

13. Heat-pump electrification of industrial heat above 200°C

Top dimension — Cascade value: 4/5. Competition now: 3/5. Industrial process heat is a quarter of global emissions; most of it is below 400°C and is technically reachable with high-temperature heat pumps. Framework reading: cascade across cement, chemicals, food processing and pulp; cost trajectory of high-temperature heat pumps already favourable in pilots; current deployment is a tiny fraction of where the curve supports. Resources: a few billion across multiple companies and demonstrations; thermal engineers, refrigerant chemists, process integrators; partnerships with heavy-industry operators; clean-electricity supply contracts; updated industrial codes.

14. Grid software and interconnection-queue automation

Top dimension — Cascade value (bottleneck unlock): 5/5. Competition now: 2/5. The bottleneck in renewable deployment in the United States and several European countries is now interconnection-study queueing, not turbine cost. Framework reading: a software-shaped problem with infrastructure-shaped consequences; cheap, neglected, cascade across the entire energy transition. Resources: tens to a few hundred million across several specialised firms; power-systems engineers, software developers, regulatory lawyers; data-sharing arrangements with system operators; sustained advocacy work to align state-level processes.

15. High-temperature superconducting magnets at production scale

Top dimension — Cascade value: 5/5. Competition now: 3/5. The new generation of REBCO-tape magnets that have already enabled the more credible private fusion programmes, with applications across MRI, motors, particle accelerators and grid storage. Framework reading: cascade across multiple industries, demonstration unlock already partially achieved, manufacturing scale-up is the binding constraint. Resources: one to three billion in manufacturing capacity across a small number of firms; cryogenics and superconductor specialists, manufacturing engineers; specialised tape-fabrication infrastructure; long-term offtake from fusion, MRI and motor customers.

16. The Earth BioGenome Project and closing-window biodiversity sequencing

Top dimension — Closing window: 5/5. Competition now: 2/5. Sequencing the genomes of every living eukaryotic species, with priority on those facing imminent extinction. Framework reading: closing window in the strictest sense; cascade across taxonomy, ecology, agriculture and biotech; current funding orders of magnitude below the irreversibility of the loss. Resources: three to ten billion over a decade; field biologists, sequencing platforms, bioinformaticians; sample-collection logistics across remote regions; biobanks for tissue preservation; open data-sharing infrastructure.

17. Indigenous-language documentation before extinction

Top dimension — Closing window: 5/5. Competition now: 2/5. Recording, transcribing, machine-translating and preserving the world's roughly seven thousand languages, of which several hundred lose their last speakers each decade. Framework reading: closing window; cultural cascade; modern speech-recognition and translation models make the work an order of magnitude cheaper than five years ago. Resources: one to two billion over a decade; field linguists, native speakers as paid collaborators, audio engineers, ML engineers for low-resource translation; recording infrastructure; long-term archival storage; community-led data-governance frameworks.

18. Functional connectomes of mammalian model organisms

Top dimension — Cascade value: 5/5. Competition now: 2/5. Whole-brain wiring diagrams of mice, marmosets and eventually macaques, paired with functional recording at scale. Framework reading: cascade for neuroscience and AI; decomposition unlock — the connectome plus dynamics is plausibly the missing substrate for circuit-level theories of cognition; cost trajectory of electron microscopy and connectomics tools falling rapidly. Resources: five to fifteen billion over a decade across several teams; electron microscopists, automation engineers, computational neuroscientists; massive image-storage infrastructure; standardised pipelines for image segmentation; primate facility access for the higher-order organisms.

19. Pan-cancer early-detection blood tests at population scale

Top dimension — Cascade value: 4/5. Competition now: 4/5. Multi-cancer detection assays, validated and deployed at sufficient scale to shift the population-level distribution of cancer at diagnosis from late-stage to early-stage. Framework reading: cascade across oncology, cost trajectory of liquid-biopsy assays steeply falling, demonstration value once population trials clear. Resources: five to fifteen billion in trial and deployment capital; molecular biologists, statistical epidemiologists, bioinformaticians; large prospective cohorts with multi-year follow-up; reimbursement-pathway negotiation; integration with primary care.

20. Engineered phage therapies for antimicrobial resistance

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 3/5. Programmable phage cocktails as a complement and replacement for antibiotics in resistant infections. Framework reading: cascade against the AMR crisis; defender-favoured; the regulatory framework is the binding constraint, not the science; closing window as resistance spreads. Resources: a few billion in development capital across multiple companies; phage biologists, synthetic biologists, regulatory specialists; phage banks; clinical trial sites with antibiotic-resistant patient populations; new approval pathways for personalised biologics.

21. Industrial-scale senolytic and senomorphic compound screening

Top dimension — Cascade value: 4/5. Competition now: 3/5. Systematic high-throughput screening for compounds that selectively eliminate or modify senescent cells, paired with rigorous biomarker-validated trials. Framework reading: cascade for ageing biology; the field has plausible mechanism, mediocre execution, and underweighted screening throughput relative to its cascade. Resources: one to three billion in screening and trial capital; cell biologists, medicinal chemists, gerontologists; validated senescence biomarkers (the binding constraint); long-running prospective cohorts; modernised trial endpoints for healthspan.

22. Standardised ageing biomarkers

Top dimension — Cascade value (decomposition unlock): 5/5. Competition now: 2/5. The set of measurable, validated biomarkers that allow trials of geroprotective interventions to be run on years rather than decades of follow-up. Framework reading: a decomposition unlock; the entire longevity field is bottlenecked on this one piece of infrastructure; cascade across hundreds of trials. Resources: a few hundred million in coordinated cohort and validation studies; gerontologists, statisticians, clinical chemists; access to long-running cohorts (UK Biobank, FinnGen, similar); regulatory engagement to qualify biomarkers as valid endpoints.

23. Whole-organ vitrification and rewarming

Top dimension — Cascade value: 5/5. Competition now: 2/5. The technical capability to cryopreserve and rewarm complex organs for transplantation, addressing the rate-limit of the global transplant system. Framework reading: cascade across transplant medicine, demonstration value once a single organ class is solved, current funding well below what the magnitude of the prize supports. Resources: one to three billion over a decade; cryobiologists, materials scientists for ice-blocking agents, surgical teams, perfusion engineers; specialised cryoperfusion equipment; large-animal model facilities; ethics frameworks for first-in-human protocols.

24. AI-assisted rare-disease diagnostic platforms

Top dimension — Cascade value: 5/5. Competition now: 3/5. Tools that compress the average diagnostic odyssey from years to weeks for the seven thousand rare diseases collectively affecting hundreds of millions of people. Framework reading: cascade across thousands of conditions; cost trajectory of LLM-plus-knowledge-graph tooling extremely favourable; neglected because individual rare diseases have small markets. Resources: a few hundred million across non-profits, foundations and a few specialist firms; clinical geneticists, ML engineers, patient advocacy groups; access to medical literature corpora and rare-disease registries; integration into electronic-health-record workflows.

25. Open-weight medical-imaging foundation models on consented data

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 3/5. Pre-trained foundation models for radiology, pathology and dermatology, released under licences that allow audit and adaptation, trained on data released with patient consent and rigorous governance. Framework reading: cascade across the imaging-driven specialties; defender-favoured against vendor lock-in; current allocation produces mostly closed proprietary models. Resources: a few hundred million in compute and curation; radiologists and pathologists working alongside ML engineers, governance lawyers; consortium of hospitals willing to release de-identified data; sustained model-update pipelines.

26. Privacy-preserving machine learning at production scale

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 3/5. Federated learning, differential privacy and secure multi-party computation deployed in healthcare, finance and government — the infrastructure that lets sensitive data be used without being concentrated. Framework reading: defender-favoured; cascade across many fields that currently cannot use their best data; cost trajectory of the underlying primitives now favourable. Resources: a few hundred million across vendors and standards bodies; cryptographers, distributed-systems engineers, regulatory lawyers; reference deployments in sensitive domains; updated procurement standards.

27. Formal-verification toolchains for safety-critical software

Top dimension — Verification economics: 5/5. Competition now: 2/5. Verified compilers, verified operating-system kernels, verified protocol implementations and AI-assisted theorem-proving tooling that bring formal methods within reach of normal engineering teams. Framework reading: verification-economics; cascade across every safety-critical domain; cost trajectory of AI-assisted proof generation steeply falling. Resources: a few hundred million across academic and commercial efforts; programming-language theorists, formal-methods researchers, ML engineers building proof assistants; integration into existing software-engineering toolchains; procurement standards that reward verified components.

28. Circuit-level mechanistic models of depression and addiction

Top dimension — Cascade value: 5/5. Competition now: 2/5. Moving the psychiatry of major depression, anxiety, addiction and OCD from symptom clusters to circuit-level mechanism with testable interventions. Framework reading: cascade across mental-health treatment; the field has stalled on weak DSM categories and underpowered drug trials; cost trajectory of human neuroimaging and circuit-level recording falling. Resources: a few billion over a decade across multiple groups; neuroscientists, clinical psychiatrists, ML researchers, clinical trialists; access to human imaging at scale and animal-model laboratories; reform of psychiatric trial endpoints.

29. Machine-learning catalyst design to replace precious metals

Top dimension — Cascade value: 4/5. Competition now: 3/5. Systematic ML-driven design of catalysts that perform the work currently done by platinum, palladium, rhodium and iridium. Framework reading: cascade across hydrogen production, fuel cells, chemicals manufacture and pollution control; cost trajectory of computational chemistry plus ML steeply favourable; supply-chain de-risking value substantial. Resources: a few hundred million across academic and commercial groups; computational chemists, materials scientists, automated-laboratory engineers; high-throughput synthesis robots; partnerships with industrial users for validation.

30. Atmospheric water harvesting for arid regions

Top dimension — Closing window: 4/5. Competition now: 2/5. Devices that produce drinkable water from low-humidity air at humanitarian-scale unit costs. Framework reading: closing window for water-stressed populations; cascade for resilience; the materials science (sorbents, MOFs) is now mature enough that the binding constraint is engineering and deployment. Resources: a few hundred million across companies and humanitarian funders; materials scientists, mechanical engineers, deployment specialists; pilot sites in genuinely water-stressed regions; supply chain for sorbent materials at scale.

31. Battery chemistries beyond lithium

Top dimension — Defender-favoured dual-use (supply chain): 4/5. Competition now: 4/5. Sodium-ion, multivalent, sulfur and solid-state chemistries that reduce cost and supply-chain dependence on a small number of minerals and countries. Framework reading: cascade for the energy transition and storage at scale; demonstration phase already underway in sodium-ion; defender-favoured against critical-mineral concentration. Resources: five to ten billion across multiple chemistries; electrochemists, manufacturing engineers, supply-chain specialists; pilot manufacturing lines; offtake from grid-storage and stationary-storage customers (less price-sensitive than EVs).

32. CO2-to-fuels and CO2-to-cement chemistry at industrial scale

Top dimension — Cascade value: 4/5. Competition now: 3/5. Processes that turn CO2 into liquid fuels, plastics and structural cement at competitive cost using clean electricity. Framework reading: cascade for hard-to-abate sectors; cost trajectory of clean electricity plus electrochemistry now favourable for first products; demonstration value substantial. Resources: five to fifteen billion across multiple processes; electrochemists, process engineers, clean-electricity offtake teams; first-of-a-kind plant capital; offtake agreements; updated codes for low-carbon cement.

33. Ocean carbon-removal measurement, reporting and verification

Top dimension — Verification economics: 5/5. Competition now: 2/5. The independent, open scientific infrastructure for verifying claimed ocean-based carbon removals — alkalinity enhancement, kelp sinking, microbial pumps. Framework reading: verification-economics; the entire ocean-CDR field is bottlenecked on credible measurement; cascade across what becomes investable. Resources: a few hundred million across academic groups and registries; oceanographers, biogeochemists, statisticians; ship time and autonomous platforms; modelling capacity; open data registries with adversarial audit.

34. Open infrastructure for scientific reproducibility

Top dimension — Cascade value: 5/5. Competition now: 1/5. The boring but cascade-rich infrastructure of preregistration, open data, open code, replication studies and credit for confirmation rather than discovery. Framework reading: cascade across every empirical field; defender-favoured against the slow corruption of the literature; current allocation mostly token compared to the magnitude of the cumulative loss. Resources: a few hundred million per year of operating budget; meta-scientists, software engineers, statisticians; integration with funder mandates and journal infrastructure; long-term archival hosting.

35. Pre-extinction soil and ocean microbiome cataloguing

Top dimension — Closing window: 5/5. Competition now: 1/5. Sampling and sequencing soil microbiomes of intact ecosystems and ocean-water microbial diversity before climate-driven shifts make pre-disturbance baselines unrecoverable. Framework reading: closing window; cascade across agriculture, biotech, climate science; the cost of the work is small relative to the irreversibility. Resources: a few hundred million over a decade; microbiologists, ecologists, sequencing platforms, autonomous samplers; preservation infrastructure for cryogenic biobanks; standardised metadata and open repositories.

36. High-resolution glacier and ice-core archive expansion

Top dimension — Closing window: 5/5. Competition now: 2/5. Aggressive coring and storage of glacial ice records before warming destroys the archive. Framework reading: closing window; cascade for paleoclimate, atmospheric chemistry and microbial archaeology; modest cost relative to irreversibility. Resources: a few hundred million; glaciologists, drilling engineers, low-temperature storage facilities; access to remote sites; international cooperation across polar research bodies.

37. Civilisational food and water reserves with auditable redundancy

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 1/5. Multi-month strategic reserves of staple crops, seeds, water-treatment supplies and basic medicines, independent of single-point logistics, distributed and stress-tested. Framework reading: defender-favoured against systemic shocks; neglected relative to its expected-value contribution under tail-risk scenarios. Resources: tens of billions across nation-state and supranational programmes; logistics and supply-chain specialists, agronomists, civil-defence planners; storage infrastructure across geographies; auditable inventory systems; legal frameworks for emergency deployment.

38. Open EHR and clinical-data interoperability infrastructure

Top dimension — Cascade value (decomposition unlock): 5/5. Competition now: 2/5. The unsexy plumbing that makes hospital data movable across systems, allowing both better care and the data substrate that medical AI actually needs. Framework reading: a decomposition unlock for medicine and medical AI; defender-favoured against vendor lock-in; cascade across every downstream clinical-AI use case; cost trajectory of the implementation falling as standards mature. Resources: a few billion in implementation capital across health systems; standards engineers, clinical informaticists, hospital IT teams; reform of procurement that currently rewards proprietary lock-in; sustained funder pressure.

39. AI tutoring with rigorous learning-outcome measurement

Top dimension — Cascade value: 4/5. Competition now: 4/5. Adaptive tutoring at scale, paired with the measurement infrastructure to actually verify learning gains rather than engagement metrics. Framework reading: cascade across human capital; cost trajectory of the underlying models steeply falling; the field is currently dominated by engagement-optimised products with weak evidence of learning effect. Resources: a few hundred million across non-profits and rigorous evaluators; educational researchers, learning scientists, ML engineers; long-running RCTs in real schools; open-licensed curriculum corpora.

40. Auditable open-source voting infrastructure

Top dimension — Defender-favoured dual-use: 5/5. Competition now: 2/5. End-to-end-verifiable election systems, open-source ballot-counting software, paper-backed audits at scale. Framework reading: defender-favoured against an entire class of legitimacy crises; neglected relative to the magnitude of trust at stake; cost trajectory of the cryptographic primitives now mature. Resources: a few hundred million across democracies and standards bodies; cryptographers, election administrators, software engineers, security auditors; pilot deployments in willing jurisdictions; updated procurement and certification standards.

41. Consumer-scale hardware roots of trust

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 3/5. The chips, firmware and software stacks that establish verifiable identity and integrity at the device level — for laptops, phones, IoT sensors, and increasingly for AI accelerators. Framework reading: defender-favoured; closing window before adversary capability and IoT scale outpace defensive capacity; cascade across digital infrastructure security. Resources: a few billion in fabrication and software development across multiple companies; silicon and firmware engineers, cryptographers, supply-chain auditors; trusted fabrication capacity; updated procurement standards that reward attested hardware.

42. Zero-gravity manufacturing of high-value products

Top dimension — Cascade value: 3/5. Competition now: 2/5. Fibre-optic preforms, bioprinted tissue and certain pharmaceutical crystals that are improved by microgravity environments to a degree that makes orbital production economically attractive. Framework reading: cascade for the broader space economy; cost trajectory of launch (Falcon, Starship) finally permissive; demonstration phase appropriate. Resources: one to three billion across several companies; aerospace engineers, materials scientists, automation engineers; orbital platform access; specialised return-and-recovery infrastructure; regulatory frameworks for orbital manufacturing.

43. Asteroid characterisation for planetary defence

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 2/5. Comprehensive cataloguing of near-Earth objects above the threatening size threshold, paired with proven deflection capability. Framework reading: defender-favoured at civilisational scale; the cost is trivial relative to the asymmetry of the bet; the DART demonstration is a textbook brute-force-then-elegance opening move. Resources: a few billion over two decades across space agencies; planetary scientists, mission engineers, observational astronomers; survey-telescope infrastructure (NEO Surveyor and successors); a small standing deflection-mission programme; international governance.

44. Reversible and energy-efficient compute substrates

Top dimension — Closing window (AI energy cap): 4/5. Competition now: 2/5. Compute that approaches the Landauer limit through reversible logic, optical compute, neuromorphic compute and superconducting logic — relieving the looming energy-supply constraint on AI deployment. Framework reading: closing window before AI energy demand reshapes grids; cascade for AI deployment economics; the field is small relative to its potential leverage. Resources: one to three billion of patient capital across multiple architectures; physicists, electrical engineers, materials scientists; specialised fabrication facilities; benchmark workloads from AI labs willing to characterise alternative architectures.

45. Open foundation models for protein function, complexes and dynamics

Top dimension — Cascade value: 5/5. Competition now: 4/5. The successors to AlphaFold that solve function prediction, complex assembly and conformational dynamics rather than only static structure. Framework reading: cascade across drug discovery, enzyme engineering and synthetic biology; cost trajectory of the underlying training and data favourable; the cascade from AlphaFold itself proves the pattern. Resources: one to three billion in training and validation; structural biologists, ML researchers, biophysicists; substantial training compute (roughly the AlphaFold envelope or an order of magnitude larger); open-data partnerships with experimental groups for ground truth.

46. Computational drug-repurposing platforms for orphan and tropical diseases

Top dimension — Cascade value: 4/5. Competition now: 3/5. Systematic pairing of approved or shelved drugs with neglected diseases, using ML to navigate the combinatorial space of molecule-disease pairings. Framework reading: cascade across thousands of conditions; defender-favoured for global health; the upside of repurposing is dramatically cheaper than de novo drug discovery; current allocation tiny relative to opportunity. Resources: a few hundred million across non-profits, foundations and a few specialist firms; chemoinformaticians, ML engineers, clinical pharmacologists; access to compound libraries and shelved-asset registries; regulatory pathways for off-label and orphan approvals.

47. Lean, Mathlib expansion and AI-assisted theorem proving

Top dimension — Cascade value: 5/5. Competition now: 3/5. The continued growth of the formalised mathematics library and the AI tooling that makes formal proof a normal mathematical activity rather than a heroic one. Framework reading: cascade across formalised mathematics, verified software, AI safety verification; cost trajectory of the AI-assisted proof generation steeply falling; the marginal team's contribution unusually high. Resources: tens to a few hundred million; mathematicians, programming-language theorists, ML engineers; sustained library-curation labour; integration with the major proof assistants and AI math tools.

48. Generalisation benchmarks for autonomous mobile robots

Top dimension — Verification economics: 5/5. Competition now: 2/5. The honest, adversarial benchmarks that distinguish a robot that has memorised its training distribution from one that genuinely generalises across environments. Framework reading: verification-economics applied to robotics; cascade across the whole emerging humanoid and mobile-manipulation industry; current allocation dwarfed by the scale of capabilities marketing. Resources: tens to a hundred million across non-profits and rigorous labs; robotics engineers, statisticians, evaluation specialists; physical test environments that resist gaming; regulatory and procurement standards that reward verified generalisation.

49. Engineered kill-switches and biocontainment for synthetic biology

Top dimension — Defender-favoured dual-use: 4/5. Competition now: 2/5. Reliable, auditable mechanisms that limit the survival, replication and horizontal gene transfer of engineered organisms outside their intended environments. Framework reading: defender-favoured at the level of biotechnology's broader licence to operate; cascade across every later release decision; current allocation small relative to the regulatory and reputational stakes. Resources: a few hundred million across academic and commercial groups; synthetic biologists, evolutionary biologists, regulatory scientists; long-running evolutionary stability tests; international standards for what counts as adequate containment.

50. Kelp and seaweed aquaculture for protein, feed and carbon

Top dimension — Cascade value: 3/5. Competition now: 3/5. Industrial-scale macroalgae cultivation as a low-input source of protein, feed additives that reduce livestock methane, and a partial carbon-removal pathway. Framework reading: cascade across food security, livestock emissions and ocean carbon; cost trajectory of automated mariculture infrastructure favourable; current allocation tiny relative to the addressable problem set. Resources: one to three billion in deployment capital across multiple companies; marine biologists, automation engineers, food scientists, aquaculture operators; permitted ocean leases (the binding constraint in many jurisdictions); offtake agreements from feed and food customers; MRV infrastructure for any carbon claims.


How to read this list

Three honest disclaimers.

First, several of these will look unreasonable. So did the projects on the smartest list at the moment they were funded. The framework's job is not to converge with the consensus; it is to make the bet legible enough to argue with. If you disagree with a specific entry, the right response is a paragraph on which curve it is not bending, which cascade it is not firing, or which window it is not closing.

Second, the resources figures are deliberately rough. They are intended as orders of magnitude, not budgets. A real allocator who picks an entry will need a serious feasibility study. The figures are the framework's rough sense of what kind of bet this is, not what the cheque should be.

Third, the list is a snapshot. The framework expects roughly twenty per cent of any such list to be wrong within five years and another twenty per cent to be obviated by an unexpected upstream solution. That is not a defect; it is the honest property of a list that says anything specific. The revised list that comes out of each five-year retrospective is more interesting than the original.

— Siri Southwind


Glossary

The vocabulary of the field. Some of these are borrowed from existing literatures and clarified for use here; some are new. The point of a glossary is to let people argue inside the framework rather than around it.

Core concepts

Differential Problem-Solving. The formal name of the discipline. The principle that problems should be selected and ordered with explicit attention to how their tractability is moving over time, not only by their importance, current cost and neglectedness. Direct analogue to Bostrom's Differential Technological Development.

Problem Timing. The popular handle for the same discipline. Used in writing aimed at a general audience.

The Tractability Frontier. The moving boundary, at any given moment, between problems that can be cheaply solved with currently-available tools and problems that cannot. The frontier moves outward as costs fall, and occasionally inward as evidence is lost. Most allocation decisions amount to a judgement about where a particular problem sits on the frontier today and where it will sit in n years.

The Wait Curve. The visual representation of the trade-off between attacking now and attacking later. Plots expected cost-to-solve and probability-weighted value against time. The right time to attack is roughly where the rate of cost decline first falls below the rate of value decline, adjusted for cascade and demonstration value.
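
To make the shape concrete, here is a minimal numeric sketch, assuming simple exponential cost decline and value decay. The rates, values and horizon are illustrative assumptions, not part of the definition above.

```python
# Wait Curve sketch: pick the attack year that maximises net value under
# assumed exponential curves. All numbers are illustrative.
import numpy as np

def attack_year(cost0, cost_decline, value0, value_decay, horizon=15):
    """Year in [0, horizon] that maximises probability-weighted value minus
    cost-to-solve, roughly where cost decline stops outpacing value decline."""
    years = np.arange(horizon + 1)
    cost = cost0 * (1 - cost_decline) ** years
    value = value0 * (1 - value_decay) ** years
    return int(years[np.argmax(value - cost)])

print(attack_year(100, 0.30, 150, 0.05))  # slow decay: wait several years
print(attack_year(100, 0.30, 150, 0.40))  # fast decay (closing window): 0, attack now
```

In this toy model a larger value term pulls the optimum earlier, which is the formal echo of "attack now when the cascade is large".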

Dimensions

Cost trajectory. The rate at which the cost-to-solve a particular problem is changing year on year. Often follows Wright's law if the problem is dominated by a learning-curve input. The single most important new variable in the framework.
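
For reference, the standard statement of Wright's law; the notation here is ours, not the repository's.

```latex
% Wright's law: unit cost falls as a power of cumulative production.
% C_1 = cost of the first unit, x = cumulative units produced,
% b = learning exponent. Each doubling of x multiplies unit cost
% by 2^{-b}, the progress ratio.
C(x) = C_1\, x^{-b}, \qquad \frac{C(2x)}{C(x)} = 2^{-b}
```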

Tractability trajectory. Distinct from cost trajectory. Tractability can shift not because the cost is falling but because the shape of the problem is changing — for instance because a complementary technology now exists that turns the original hard problem into a smaller residual. AlphaFold did not make protein folding cheaper to brute-force; it changed what kind of problem it was.

Direct value. What the world looks like immediately after the problem is solved, in concrete units.

Cascade value. The value generated by what becomes solvable, cheaper or differently-shaped because the original problem has been solved. Often the dominant term, often invisible at decision time.

Demonstration value. The value of removing the question of whether a problem is soluble. Even an expensive, ugly, brute-force solution can be valuable purely because it proves the category exists. The Manhattan Project, Apollo, AlphaFold and AlexNet all derived a meaningful share of their value from demonstration.

Optionality value. The value of solving the problem in advance of needing the solution, on the bet that complementary technology, market or regulation will arrive later and make the solution useful. Borrowed from real-options theory.

Decay rate of value. The rate at which the value of a solution declines as the world moves around it. Fast-decay problems should usually be attacked later, not earlier.

Window. The time horizon beyond which the problem cannot be solved at all, or cannot be solved with current evidence intact. Closing windows include language extinction, ecosystem collapse, archive degradation, witness mortality. Open windows are most of the rest.

Reusability of by-products. The degree to which the data, infrastructure, methodology or talent generated by an attempt at the problem are valuable independently of whether the headline attempt succeeds. High reusability lowers the effective cost of trying.

Crowding (also: neglectedness). How many smart, motivated, well-resourced teams are already attacking the problem. Crowded problems have lower marginal return per additional resource even if absolute value is high.

Crowdability. A separate dimension. Whether the problem decomposes into independent units small enough that a large distributed effort can attack it. Different from crowding; a problem can be crowded but not crowdable, or crowdable but not crowded.

Verifiability. The cost of confirming that a proposed solution is in fact a solution. Some problems have cheap verifiers (chess, theorem-proving, protein-folding given ground truth). Others have verifiers that are themselves hard problems.

Coordination cost. The cost imposed by needing many independent actors to agree, contribute or stay out of the way for the problem to be solved.

Capital intensity. The fraction of total cost that is up-front, sunk and irreversible. A two-billion-dollar fab is a different bet from two billion dollars of operating expense over a decade.

Asymmetric payoff. Whether the distribution of outcomes is roughly symmetric or heavy-tailed. Heavy-tailed distributions justify portfolio strategies that a normal-distribution decision-maker would reject.

Strategic dual-use. Whether solving the problem benefits you and your adversaries equally, or asymmetrically, or only your adversaries. The framework's full treatment, including the four-category classification (benign-default, defender-favoured, attacker-favoured, symmetric) and the cases where the standard cascade and demonstration readings invert, is in Dual-use & catastrophic risk.

Physical-resource dependency. What the project requires from the physical world that cannot be substituted away. Three sub-aspects: energy intensity, atom-class dependency (rare earths, lithium, gallium, helium, isotopes, particular biologics) and permitted-action availability (siting, regulatory, environmental, dual-use export). When cognition is cheap, this dimension is increasingly the binding constraint.

Generation–verification asymmetry. The fact that AI has made it dramatically cheaper to produce hypotheses, code, designs and analyses than to check them. The bottleneck has moved to the verification side and is staying there. This is the central technical reason the framework promotes verification cost to a first-class dimension.

Patterns

Brute-force-then-elegance. A common sequence in which an expensive, unglamorous demonstration project is followed by a much cheaper, more elegant version of the same capability. The demonstration produces the data, the talent and the proof of feasibility that the elegant phase consumes. HGP → modern sequencing, ImageNet → modern vision, crystallography → AlphaFold, Apollo → reusable rockets.

Cascade firing. The moment when an upstream problem is solved and a wave of downstream problems become tractable in quick succession. The window between cascade firing and the consensus catching up is one of the prime arbitrage opportunities in the framework.

The unlock. Synonym for the moment a previously hard problem becomes cheap, usually because of a complementary technology arriving rather than a direct attack on the problem itself.

The decomposition move. Identifying the sub-problem whose solution unlocks the rest, attacking that one with brute force, and letting the cost decline carry the rest of the field.

The vanity case. A brute-force project undertaken not because the framework justifies it but because the institution sponsoring it wants the prestige of the project. Not always wrong; often misclassified.

The barbell. Taleb's portfolio shape applied to research allocation: a large allocation to safe, low-variance work paired with a small allocation to highly speculative, heavy-tailed bets, with little in the middle. The framework uses the barbell as the default response to deep uncertainty over cost trajectories.

Antifragile by-products. A project whose failure path still produces useful capital — talent, methodology, infrastructure, partial results — rather than a clean write-off. The framework's reusability of by-products dimension is the operational form of Taleb's antifragility principle.

Scenario. In the Pierre Wack / Royal Dutch Shell tradition, an internally-consistent narrative about how the future could unfold, anchored in identifiable driving forces and explicitly named critical uncertainties. Not a forecast. Not a best-case, base-case, worst-case. Used in the framework to make the unstated forecast inside any dimension score explicit, so that the score becomes scenario-conditional rather than masquerading as a single-future judgement.

Robust position. A bet that scores well across the small set of plausible scenarios over which the framework is being run, rather than scoring brilliantly in one and badly in another. Patient-infrastructure bets are typically robust. Just-early bets typically are not.

Failure modes (see Anti-patterns)

Premature optimism. Attacking a problem now when waiting two years would reduce the cost by an order of magnitude and the demonstration value is small.

Pathological patience. Waiting for a problem to be cheaper when the window is closing, the cascade is large or the demonstration unlocks the next twenty problems.

Verification debt. Allowing generation to outpace verification until the unverified outputs sink the project. The 2026-onwards canonical case: an AI-augmented team that has scaled output volume tenfold without redesigning its review process to match.

Cascade chasing. Reorienting after every fast cascade rather than building the durable position that survives multiple cascades. Each pivot looks sensible at the moment of pivot; the cumulative effect is a team without a durable position and a portfolio of half-finished commitments.

The vanity sprint. A high-profile attack on a problem chosen for its visibility rather than its position on the framework.

The path-dependence tax. Continuing to work on a problem because that is what your team, lab or institution is set up to do, after the framework has rated the problem as low-priority.

Cascade blindness. Focusing on direct value and missing that a problem's main value is in what it makes possible. AlexNet's direct value was modest; its cascade value was enormous.

Demonstration dilution. Repeatedly demonstrating something that has already been demonstrated, on the assumption that demonstrating it again will produce comparable value. It usually does not.

Scoring

The rough score. A 0-3 scale on a small set of dimensions, in order: verification cost, cost decline rate, direct value, cascade, demonstration, window, physical-resource dependency, crowding. Each axis carries a written justification. The order reflects the framework's foregrounding of verification cost and cost trajectory as the two most important variables. Designed to force clarity without inviting false precision.
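
A minimal sketch of what a rough score could look like in machine-readable form. The field names and example justifications are hypothetical; the repository does not define a schema.

```python
# Illustrative only: one way to hold a rough score so that every axis
# carries the written justification the framework requires.
from dataclasses import dataclass

@dataclass
class AxisScore:
    score: int       # 0-3
    reasoning: str   # the written justification

rough_score = {
    "verification cost":            AxisScore(3, "ground truth exists; checking is cheap"),
    "cost decline rate":            AxisScore(2, "dominant input on a ~30%/yr curve"),
    "direct value":                 AxisScore(1, "modest immediate market"),
    "cascade":                      AxisScore(3, "unlocks a named family of downstream problems"),
    "demonstration":                AxisScore(1, "category already proven elsewhere"),
    "window":                       AxisScore(0, "no closing-window pressure"),
    "physical-resource dependency": AxisScore(1, "commodity compute only"),
    "crowding":                     AxisScore(2, "two serious teams, both narrow"),
}
# Deliberately no sum(): adding the scores together is misuse
# (see Limits & falsifiability).
```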

The stupidity index. The retrospective gap between what we should have known at the time and what we did. Distinct from what we know now; many projects look stupid only in hindsight because the cost curve dropped faster than was reasonable to predict.

The portfolio shape. The distribution of bets across positions on the Tractability Frontier. A healthy portfolio has a small number of just-early bets, a smaller number of moonshots and a deliberate share of unallocated curiosity-driven work.

Institutions and instruments

Focused Research Organisation (FRO). A time-limited, mission-specific research organisation, somewhere between a startup and an academic institute. Designed for the brute-force-then-elegance phase of the curve. Recent examples include the various FROs funded by Convergent Research, Astera, Arc and ARIA in the UK.

Advance Market Commitment (AMC). A funder commitment to buy any solution that meets specified criteria. Used for vaccines and theoretically generalisable to many problem types. The framework's neglectedness and cascade-value dimensions are essentially what an AMC designer needs to estimate when sizing the commitment.

Prize. A reward offered for solving a specified problem. The XPRIZE competitions, the Netflix Prize, Kaggle, the Millennium Prize Problems. Useful when the problem can be specified precisely and verified cheaply. Less useful when defining "solved" is the hard part.

Prediction market on solvability. A genuinely new instrument. Contracts that pay out on whether a problem is solved at a given cost by a given date. Existing prediction markets (Polymarket, Manifold) have not yet seriously attacked this domain. The design is non-trivial but possible.
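
To make the design problem concrete, a hypothetical contract shape. Nothing below corresponds to an instrument that exists on Polymarket, Manifold or anywhere else, and the hard part, the verification oracle that decides "solved" and "cost", is waved away in a single method.

```python
# A sketch of the design space for a solvability contract, not a real instrument.
from dataclasses import dataclass
from datetime import date

@dataclass
class SolvabilityContract:
    problem: str           # precise, cheaply verifiable problem statement
    max_cost_usd: float    # "at a given cost"
    deadline: date         # "by a given date"

    def payout(self, solved_on: date | None, realised_cost_usd: float | None) -> int:
        """Pays 1 if a verified solution lands in time and under budget, else 0."""
        if solved_on is None or realised_cost_usd is None:
            return 0
        return int(solved_on <= self.deadline and realised_cost_usd <= self.max_cost_usd)
```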

Lineage shorthand

The Hamming question. What are the most important problems in your field? Why aren't you working on them? The founding question of the field.

The Hilbert move. Listing open problems explicitly to shape a generation of work. Useful even when the list is wrong; it makes the question askable.

The Bostrom principle. Differential technological development — accelerating beneficial technologies relative to dangerous ones rather than accepting whatever order they arrive in.

The Thiel question. What do you strongly believe to be true that very few other people believe? In framework terms: which problems is the consensus mispricing right now, and in which direction?

The IT-N criterion. Effective Altruism's Importance × Tractability × Neglectedness. The Problem Timing framework adds a fourth term, timing, and treats tractability as a function of when the question is asked.

Anti-patterns

A catalogue of the recurring failure modes in problem allocation. Each is named, described, illustrated and given a brief diagnostic. The catalogue is opinionated. Some readers will think their own work is being criticised; in some cases it probably is.

The patterns are clustered into four families: timing failures, value-mispricing failures, institutional failures and intellectual failures.

Timing failures

Premature optimism

Attacking a problem now when waiting two years would reduce the cost by an order of magnitude and the demonstration value of moving early is small.

Symptoms: extremely capable team, well-funded, headline goal of building a thing that will be a free by-product of a different technology arriving in eighteen months. Many enterprise NLP projects between 2017 and 2022 fit this pattern.

Diagnostic: if the dominant input on which the project depends is on a fast cost-decline curve and your project does not bend the curve, you are paying twenty times what your patient competitors will pay. The defence — we'll have a head start — is only valid if the head start compounds, which it usually does not.
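
The twenty-times figure is an assumed illustration, not a law; back-of-envelope, it corresponds to waiting roughly four years on an input whose cost halves annually.

```python
# Cost multiple paid by building now instead of waiting, assuming a steady
# exponential decline. Real curves have inflections (see Curve myopia below).
def cost_multiple(annual_decline: float, years_waited: float) -> float:
    """How many times more the early mover pays than someone who waits."""
    return 1 / (1 - annual_decline) ** years_waited

print(round(cost_multiple(0.50, 4.3), 1))  # ~19.7x if the input halves each year
print(round(cost_multiple(0.30, 4.3), 1))  # ~4.6x at a 30%/yr decline: far milder
```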

Pathological patience

Waiting for a problem to become cheaper when the window is closing, the cascade is large, or the demonstration unlocks the next twenty problems.

Symptoms: the field has consensus that a problem is hard, no one is attacking it, and everyone agrees it will be cheaper in five years. Five years pass. The window has narrowed.

Diagnostic: ask whether the problem has any of the four attack-now triggers — large cascade, demonstration unlock, closing window, valuable by-products — and if any are present the patient case is weaker than it feels.

The slow swap

Switching from one problem to another at the right moment is rarely done well by institutions. Funded projects accumulate inertia; people in projects lobby for continuation. By the time the swap happens, the new problem is itself nearly obsolete.

Diagnostic: a project that should have been ended three years ago is still being defended on the grounds that we have already invested so much. This is the sunk-cost fallacy in problem-allocation form.

Curve myopia

Forecasting the cost trajectory of a problem by extrapolating the current rate. Most cost curves have inflection points; almost all forecasts based on naive extrapolation miss them.

Diagnostic: if a project's why now depends on the cost dropping by a particular factor by a particular date, ask whether the underlying input is genuinely on the curve being extrapolated or whether the extrapolation is wishful.

Verification debt

Allowing generation to outpace verification until the unverified outputs sink the project. The classic 2026 case is an AI-augmented engineering or research team that produces ten times the artefacts the previous team produced and reviews them at the same rate, accumulating a backlog of unverified work whose errors compound until the headline goal becomes unreachable.

Symptoms: a team or institution that has happily adopted AI productivity tools and seen output volume rise sharply, while the verification process — code review, experimental replication, fact-checking, audit — has not been redesigned to match. The errors are not yet visible; the debt is invisible until it is not.

Diagnostic: if generation cost in your project has fallen by an order of magnitude in the past two years and verification cost has not changed, you are probably accumulating verification debt. The framework's promotion of verification cost to a first-class dimension is, in part, a response to exactly this anti-pattern.
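
The arithmetic of the debt is brutally simple; a toy model with illustrative numbers:

```python
# Verification debt as a flow imbalance: output scales, review does not,
# and the unreviewed backlog grows linearly.
def backlog(gen_per_week: int, review_per_week: int, weeks: int) -> int:
    """Unverified artefacts accumulated after `weeks` of steady operation."""
    return max(0, gen_per_week - review_per_week) * weeks

print(backlog(gen_per_week=10, review_per_week=10, weeks=52))   # 0: balanced
print(backlog(gen_per_week=100, review_per_week=10, weeks=52))  # 4680: 10x output, same review
```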

Cascade chasing

Reorienting after every fast cascade rather than building the durable position that survives multiple cascades. The framework correctly observes that cascades are firing faster as the underlying technology curves accelerate; the right response is sometimes to ignore the cascade and continue building the position that benefits from many of them at once.

Symptoms: a team that has changed strategic direction four times in two years on the strength of new model releases, new tools, new research breakthroughs. Each pivot looked sensible at the moment of pivot. The cumulative effect is a team without a durable position and a portfolio of half-finished commitments.

Diagnostic: ask what the team would still be doing if the next two cascade events did not happen. If the answer is "nothing", the team is built around chasing rather than building. The brute-force-then-elegance pattern requires sustained commitment; cascade-chasing is the opposite of that commitment.

Value-mispricing failures

Cascade blindness

Focusing on direct value and missing that a problem's main value is in what it makes possible. The classic example is the early scepticism about ImageNet — fourteen million labelled images is a vast investment for a benchmark dataset — which underweighted the cascade value that turned out to dominate.

Diagnostic: if you cannot articulate the cascade value of a project in three sentences, the cascade case has not been seriously considered. Add it.

Cascade hallucination

The mirror image. Inventing imagined cascades to justify projects that have neither direct nor cascade value. Common in fundraising decks and in research proposals competing for limited funding.

Diagnostic: a credible cascade case names specific downstream problems with specific cost reductions. A hallucinated cascade case waves at "transforming the field". Ask for the specifics.

Demonstration dilution

Repeatedly demonstrating something that has already been demonstrated, on the assumption that another demonstration will produce comparable value. The first lab to fold a protein with a transformer model produced enormous value. The third lab to do the same thing produced almost none.

Diagnostic: if the headline of the project is we did the thing that someone else has already shown can be done, the value is in incremental refinement, not in demonstration. Be honest about the size of the prize.

Direct-value inflation

Claiming a direct value much larger than the project will produce, either by counting cascade value as direct, by selecting the most favourable scenario, or by projecting market sizes that are not actually addressable. Endemic in deep-tech fundraising.

Diagnostic: compare the direct-value claim against a base rate of comparable projects. If it is more than ten times the base rate without an extraordinary explanation, it is probably wrong.

Decay denial

Treating a fast-decaying value as if it were permanent. Some solutions stop being valuable as the world moves around them. A faster transistor in 1995 was very valuable; a faster transistor in 2025 is barely worth a press release.

Diagnostic: ask how long the value of the solution will hold its own against substitutes. If the answer is "less than the development time", the project is mistimed.

Institutional failures

The vanity sprint

A high-profile attack on a problem chosen for its visibility rather than its position on the framework. Common in big-science proposals and in the corporate-sponsored "moonshot" departments that flourish in good times and disappear in downturns.

Diagnostic: the project would not be funded if the celebrity attached were swapped for someone unknown. The framework verdict on the problem itself, in the absence of the celebrity, is much weaker than the funded version implies.

The path-dependence tax

Continuing to work on a problem because the team, lab or institution is set up to do so, after the framework has rated the problem as low-priority. Individually rational, collectively expensive.

Diagnostic: the work continues because changing would be hard; the framework rates the problem low-priority; it is still funded. When all three hold, the tax is being paid. Sometimes the tax is worth paying, for a year or two while the team retrains. Often it is not.

The grant-shape problem

Funding agencies that reward incremental, predictable work make incremental, predictable work the dominant kind of work. The framework's high-leverage bets — counter-consensus, brute-force-then-elegance, closing-window fieldwork — are systematically harder to fund through standard grant mechanisms.

Diagnostic: if your funder's process for evaluating proposals does not include a why now question with structured answers, the funder is selecting against Problem Timing.

The committee compromise

A panel of decision-makers each preferring different problems will tend to converge on a portfolio that nobody actually wanted, dominated by the problems no member objects to. These are usually not the highest-leverage bets.

Diagnostic: if the funded portfolio looks like the intersection rather than the union of the panel's preferences, the committee compromise is operating.

The replication trap

A field that has demonstrated a result but never tested its robustness can spend a decade producing variations of the unreplicated finding. Common in social and biomedical sciences. The framework reading is usually that replicating the foundational result is the highest-leverage move and the field is avoiding it for sociological reasons.

Diagnostic: if the cascade depends on a foundational finding that has never been independently replicated, the cascade is not real until it has been.

Intellectual failures

False precision

Producing a numeric score from the framework's dimensions and treating the number as objective. The dimensions are a checklist for sharper thinking, not a calculator. A spurious decimal place can be worse than honest uncertainty.

Diagnostic: if the framework verdict is being defended with the score rather than with the reasoning that produced the score, the score is being misused.

Hindsight overconfidence

Reading the historical record of vindicated bets and concluding that you would have made the same bets. You probably would not have. The HGP was contested at the time; AlexNet was a side project; ImageNet was widely thought to be a dead-end investment. The framework's job is to make the historical reasoning legible, not to make hindsight feel inevitable.

Diagnostic: ask whether you would have been the one to fund a now-canonical project at the time the funding decision was actually made. The answer for most readers is no.

The single-axis trap

Treating one dimension of the framework as the dimension that matters. Cost trajectory enthusiasts wave it at every problem; cascade-value enthusiasts wave that one. Most interesting allocation decisions turn on two or three dimensions, not one.

Diagnostic: if you can characterise a problem with a single sentence about a single dimension, you are not yet using the framework.

Framework worship

Treating the framework as a source of authority rather than a tool for thinking. The framework is an opinion, partially calibrated, from a small number of historical examples. It is improvable and falsifiable. People who treat it as a calculator are about as useful as people who do not use it at all.

Diagnostic: if you have not yet found a case where the framework gave the wrong answer and you had to override it, you have not yet used it on enough cases.

The contrarian pose

Mistaking being against the consensus for being right. The framework rewards counter-consensus bets where the reasoning is better than the consensus reasoning, not counter-consensus bets in general. Most counter-consensus positions are counter-consensus for a reason.

Diagnostic: a strong counter-consensus position names exactly which premises the consensus is getting wrong and what evidence would update them. A weak one says "the consensus is asleep" without saying why or how it would wake up.

Single-scenario thinking

Scoring a problem as if there were a single canonical future against which the dimensions should be evaluated, rather than across the small set of plausible scenarios that bracket the relevant uncertainty. The cost trajectory of synthetic biology, the closing window for indigenous languages, the cascade value of fusion — each is conditional on a specific story about how the world will unfold. A score that does not name its own scenario is a number masquerading as analysis. The Pierre Wack tradition (introduced in Intellectual lineage) is the explicit corrective: build three or four internally-consistent scenarios that differ on the variables that most plausibly drive the score, and look for the bets that are robust across them rather than the bets that are brilliant in one and catastrophic in another.

Diagnostic: if you can score a problem in five minutes without having to specify which future you are scoring against, you are doing single-scenario thinking. The fix is to write two sentences naming the scenario before you score, and a third naming the scenario in which the score would invert.

Using the catalogue

The catalogue is most useful as a vocabulary for criticising existing projects and proposals. It is less useful as a generative tool for picking new problems; for that, the framework dimensions in Dimensions and the historical examples in Historical examples are the better starting point.

The catalogue is also incomplete. New patterns will emerge as the framework is used more widely. Submissions of additional anti-patterns are welcome — name, description, example, diagnostic — and will be added in subsequent revisions.

Current bets

This is the file that turns the framework into a stance. The dimensions and the historical examples are useful only if they produce verdicts on what should be attacked now and what should be deferred. This section tries.

The list is organised by time horizon — the period over which the allocation decision resolves. Allocators with different cycle lengths read different sections. A founder running a twelve-month seed budget needs the one-year section; a foundation funding a multi-decade cohort study needs the ten-year section; a public funder committing capital to fusion sits somewhere in between.

Within each horizon, entries are grouped by verdict: attack now, attack with caveats, probably wait or stop, and open (where the framework cannot yet rule confidently).

The list is opinionated and revisable. Several of the calls below will be wrong. The point is to be specific enough to be falsifiable, not vague enough to be safe.


Different allocators read different sections of this list. The diagram below shows the four horizons and who reads each — founders and corporate IT directors the one-year section, venture capital and applied research the three-year, foundations and institutional R&D the five-year, patient capital and civilisational bets the ten-year. The bar lengths reflect the time the decision takes to resolve.

Time horizons of allocation decisions. Bar length = the window over which the decision resolves. Allocators on different cycles read different sections of the list.

One-year horizon

Decisions whose verdict resolves within the next twelve months. Allocators on annual budget cycles, founders making 2026 product calls, corporate IT directors planning the next fiscal year — read this section first.

Attack now

Better verifiers in domains where generators are improving fast. In every domain where generators are improving (code, scientific writing, image generation, structural biology, agentic outputs), the bottleneck is shifting from generation to verification. Cheap, robust verifiers are dramatically undersupplied relative to the value they would unlock. Investment now pays off within months as the next generation of generators ships.

Probably wait or stop

Hand-built rule systems for tasks that AI is closing on. Including many enterprise NLP, document-classification and information-extraction systems. The marginal cost over the next eighteen months will fall faster than the marginal benefit of having the system today. The framework verdict on most of this category is stop and procure off-the-shelf.

Manual content moderation at scale. The cost of automated moderation is dropping faster than the cost of human moderation. The remaining hard cases are the ones requiring judgement; brute-force moderation should be automated and the human capacity redirected to the genuinely hard cases.

Human transcription of well-recorded modern audio. OCR-equivalent technology for speech is cheap and accurate enough that human transcription of clean audio is a tax. Reserve human work for the edge cases where the audio is ambiguous, the language is rare, or the context is genuinely hard.

Yet-more-elegant chess and Go engines. A historical case still occasionally repeated. AlphaZero closed the question. The marginal value of further refinement is essentially zero outside of niche human-coaching applications. The fact that this is still on the list in 2026 says something about institutional path-dependence.


Three-year horizon

Decisions whose payoff is visible by 2029. Most venture timelines, corporate-strategy commitments and applied-research programmes live here.

Attack now

Robotic data collection in the physical world. The supply of internet-scale text data has plateaued. The next bottleneck for embodied AI is grounded sensorimotor data — what a million robots see, touch and move. The cost of robotic platforms is dropping; the cost of the data they produce is the binding constraint. Brute-forcing collection now produces a substrate that the elegant phase will consume within three years.

Materials and chemistry experiment automation. A specific industrial bet. The cost of running a single chemistry or materials experiment has been roughly flat for decades; the cost of automating the experiment-design loop is collapsing. The brute-force version of an autonomous experimental laboratory is now buildable; the elegant version (closed-loop discovery at scale) follows. The first labs to get this right will reshape the field within three years.

Vertical AI agents in regulated and data-rich domains. The horizontal AI-agent race is crowded; the vertical race is not. Legal, medical, financial, scientific, accounting and certain industrial verticals reward depth over generality. The three-year window for taking defensible vertical positions is open now and closing.

Attack with caveats

Drug discovery on AlphaFold-shaped foundations. The cascade has fired. The opportunities are still substantial but pricing has caught up faster than for earlier biotech waves. Specific positions still defensible (membrane proteins, allosteric sites, intrinsically disordered proteins, conformational ensembles); broad bets less so.

Custom protein and enzyme design for industrial chemistry. A genuine current bet. The capability is real and the cascade is just starting. The principal risk is crowding from the next generation of foundation models rather than from competing teams. Pick problems where the data advantage outlasts the model advantage.

Robotics for unstructured environments. Long-running case. The technology has been almost there for ten years. There are reasons to think the cost-trajectory inflection is near (foundation models for vision and control, cheaper hardware, ubiquitous simulation). Worth attacking with eyes open and a willingness to be wrong about timing.

Probably wait or stop

Small-scale domain-specific language models built from scratch in 2026. A predictable failure mode. Base models are improving fast enough that domain-specific small models, unless deeply differentiated by data the base model cannot get, will be replaced by general-model-plus-RAG within a year of shipping.

Most generic ML-pipeline tooling. The space is crowded; the cost of the underlying capability is collapsing; the marginal team's contribution is small. Better to build application-specific infrastructure than yet another general workflow tool.


Five-year horizon

Decisions whose cascade matures by 2031. Most institutional R&D, foundation programmes, deep-tech venture and infrastructure investments live here.

Attack now

Mass digitisation of small archives, libraries and museum collections. The cost of OCR, image classification and semantic indexing is collapsing. The cost of physical scanning — the brute-force step that produces the substrate — is essentially flat and capacity-constrained. Scanning every reachable archive now creates the dataset that the next decade of historical, linguistic and scientific research will sit on. Many archives are degrading; institutional capacity to scan is itself fragile.

Capture and structuring of tacit professional knowledge. In law, medicine, engineering, finance and the trades, vast quantities of tacit expertise live only in the heads of senior practitioners. The cost of structured capture (interview, transcribe, structure, validate) is dropping. The supply of senior practitioners willing to be captured is not. The window is partially closing as a generation retires. The cascade value for both AI training and human education is large.

Open-source canonical datasets in legally complex domains. Healthcare, law, finance, education. The friction is not technical; it is institutional and legal. The patient first-mover who builds the canonical open dataset will compound for a long time. Most teams do not attempt this because the politics are hard. That is precisely the reason the consensus is mispricing it.

Attack with caveats

Carbon capture. Pricing has been disturbed by policy, sentiment and the specific shape of the funding environment. The cost trajectory of direct air capture is steep but starting from a high baseline. The framework rating depends heavily on prices we cannot know — carbon-credit markets, regulatory shifts, breakthrough chemistry. A live bet, not a clear one.

Open

Quantum computing. The cost trajectory for useful quantum compute is genuinely uncertain. The framework would say brute-force the demonstrations of quantum advantage on real problems and let the elegant phase emerge, but the demonstration phase has been longer and harder than initial timelines suggested. Hard to call from where we sit today.

Social science replication. The replication trap is acute. Brute-forcing the replication of the most cited findings in social and biomedical science would be an enormous service to the field. The framework would rate this highly. The institutions that should fund it largely will not.


Ten-year horizon

Decisions whose value accrues over a decade or more. Long-cycle public funders, multi-generational research programmes, patient infrastructure capital and certain civilisational bets live here.

Attack now

Endangered-language and oral-history capture. The closing-window argument is at its sharpest. Roughly half of the world's languages are expected to lose their last fluent speaker within a generation. The cost of recording, transcribing and analysing speech is collapsing; the supply of fluent speakers is not. Disproportionate value sits in capture-now-analyse-later projects, especially because foundation-model approaches will reward the existence of these recordings far more than the consensus currently prices.

Long-running cohort studies in human health. A boring case the framework rates surprisingly highly. The cost of measuring large cohorts continuously — sequencing, wearables, imaging, biomarker panels — has fallen dramatically. The value of the dataset accumulates with time; you cannot start a forty-year cohort study in twenty years. Closing-window dynamics for the data, even though the underlying problem is not closing.

Closing-window measurement infrastructure: glaciers, ecosystems, archaeological sites. Direct measurement of natural systems that are degrading. The cost of measurement is falling; the systems are not waiting. Most of these projects do not look like the brute-force-then-elegance pattern of HGP or ImageNet. They look like cataloguing, which is unfashionable but correct.

Attack with caveats

Fusion. The hardest call on this list. Real science is happening; capital is flowing in; the timelines remain optimistic by historical standards. Framework verdict: the demonstration of net-energy-positive fusion has enormous demonstration value; the cost trajectory beyond demonstration is much less clear. A small portfolio of fusion bets makes sense; a large concentrated one does not.

Open

Whole-brain emulation. Probably premature on the framework. The dependency graph is long and the demonstration unlock is far away. But the closing-window arguments for capturing existing brain data exist, and the cascade if successful is enormous. Genuine open question.

AI alignment research. Direct value: substantial. Cascade: substantial. Tractability: contested. The framework is partly inadequate here because the what to attack question depends on the very capabilities the work is trying to govern. A meta-question for the framework itself. See Dual-use & catastrophic risk for the framework's modification on this and adjacent categories.

Longevity research. A field where the consensus is sharply divided. The framework would attack specific cell-biology problems that are now becoming tractable (cellular senescence, partial reprogramming, biomarker panels) and probably defer more speculative bets. But the field has produced surprises before; humility is warranted.


How to use this section

The bets in this section are scenario-conditional. The implicit scenario is roughly continued capability scaling, continuing-but-slowing cost declines on energy, sequencing and compute, no civilisational shock, governance evolving slowly behind technology. Several entries flip horizon or verdict in plausible adjacent scenarios — a hard pause on frontier AI, a pandemic of the magnitude the framework's defender-favoured biosecurity entries are written against, a dramatic deglobalisation event, a fusion or geothermal breakthrough that changes the energy curve five years earlier than the modal forecast. Read the list with the scenario lens (see Intellectual lineage and Models & scoring): which bets are robust across the small set of plausible futures, which are directional bets on the modal scenario, and which would invert under one named alternative. The bets that are robust are the ones the framework most strongly recommends.
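
A sketch of that lens in miniature, with invented scenario names and invented scores. The maximin aggregation, reading "robust" as best worst-case, is one defensible choice, not the framework's official rule.

```python
# Robustness check across scenarios: a bet's robust score is its worst
# score over the scenario set. Names and numbers are made up.
scores = {  # bet -> {scenario: verdict strength, 0-3}
    "endangered-language capture": {"modal": 3, "ai-pause": 3, "deglobalisation": 2},
    "vertical AI agents":          {"modal": 3, "ai-pause": 0, "deglobalisation": 1},
}
robust = {bet: min(by_scenario.values()) for bet, by_scenario in scores.items()}
print(robust)  # {'endangered-language capture': 2, 'vertical AI agents': 0}
```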

Disagree, in writing. Send a one-paragraph case for moving any item to a different horizon or a different verdict, with the framework reading you would replace mine with. The list improves through public argument, not through private conviction.

Reread it every six months. The framework rating of any individual problem will move; some of these calls will be wrong; some will be obvious in retrospect. The accuracy of the list across revisions is itself one of the framework's tests; see Limits & falsifiability.

Use it as a recruiting list. The attack now groupings across all four horizons are implicitly a request-for-problems for talented teams. If any of the items match your skills, the world has more capacity than it has people willing to point at the specific problem and commit.

— Siri Southwind

Limits & falsifiability

Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve.

— Karl Popper, Objective Knowledge (1972)

A framework that cannot be wrong is not a framework. This section is the honest accounting of where Differential Problem-Solving fails, what kinds of evidence would update it, and what it cannot do regardless of how much evidence accumulates.

The point is not modesty for its own sake. The point is that a framework with explicit limits is one users can actually rely on, because they know which decisions to bring to it and which to bring elsewhere. The most dangerous tool in any allocator's belt is the one whose creator has not bothered to mark the cases where it gives wrong answers.

What the framework cannot do

It cannot tell you whether a problem is good to solve. The framework prices timing. It does not price morality. A child-killing technology and a child-saving one sit on the same axes. The choice between them has to be imported from a moral framework the user supplies.

It cannot reliably price unknown unknowns. A substantial share of the most important problems in any decade were not on anyone's list at the start of that decade. The framework, applied honestly in 2005, would have given low ratings to deep-learning research because the cost-trajectory of the relevant compute was not yet visible. The framework, applied honestly in 2017, would have rated transformer-architecture research more highly than most conservative allocators did but would still have understated the cascade. There is no way to fix this from inside the framework. The right response is to allocate a portion of any portfolio — somewhere in the ten-to-twenty-per-cent range — to work explicitly outside the framework, governed by other criteria.

It cannot distinguish vindicated from doomed contrarians in advance. Every successful counter-consensus bet looks identical, ex ante, to a pile of failed counter-consensus bets. The framework provides reasoning for both kinds. Better reasoning increases the hit rate; it does not eliminate failure.

It cannot model the politics of allocation. A correct framework verdict on a problem does not produce funding when the political environment will not permit it. Pandemic preparedness in 2026 is the working example: the framework reading is high; the political tailwind is weak. The framework can tell you the gap between what should be funded and what will be funded; it cannot close the gap.

It cannot tell you whether a specific team will succeed. The framework rates problems, not teams. A well-rated problem given to a weak team is a worse bet than a moderately-rated problem given to an extraordinary one. Allocator judgement on team quality remains independent and irreducible.

It cannot price catastrophic-risk problems usefully on its own. Some problems have asymmetric downside (irreversible existential or civilisational risk) that the framework's cascade-and-demonstration accounting does not capture cleanly. Bostrom's Differential Technological Development, Toby Ord's existential-risk literature and the broader catastrophic-risk frameworks remain the right tools for these cases. Differential Problem-Solving complements them; it does not replace them.

It implicitly assumes a single future. Every dimension score in Models & scoring carries an unstated forecast about how the world will unfold over the relevant time horizon. The cost trajectory of a technology, the closing of a window, the firing of a cascade — each is conditional on a particular future that is rarely made explicit. The framework, used without the discipline of scenario thinking (introduced via Pierre Wack and Royal Dutch Shell in Intellectual lineage), tends to converge on whichever future the user already half-believes. The corrective is to score across a small set of plausible scenarios and look for the bets that are robust to the future-distribution rather than to a single forecast. The discipline is now noted in models and scoring; the framework as a whole is still vulnerable to single-scenario thinking when the user is in a hurry.

When the framework gives wrong answers

A small number of recurring failure modes deserve naming.

Curve underestimation

When a cost-decline curve is much steeper than the framework predicts, attacking now looks correct but turns out to be wasteful in retrospect. Several enterprise NLP projects between 2017 and 2022 are framework cases: the curves were visible but the inflection was sharper than even the optimists expected. The framework's verdict was probably right at the time; the magnitude of the right verdict was wrong.

Curve overestimation

When a cost-decline curve plateaus or reverses, attacking later looks correct but turns out to leave value on the table. Fusion energy is a long-running case: the cost trajectory has been forecast to break for thirty years and has only recently started to. Allocators who deferred fusion in expectation of cheaper future research were partly correct and partly wrong. The framework can point at the curve; it cannot guarantee the inflection.

Cascade misjudgement

When a problem's cascade value is larger than predicted (positively or negatively), the framework verdict is right in shape and wrong in size. The internet's cascade was vastly underestimated by almost everyone in 1990; AlphaFold's cascade is being more honestly priced because the previous case taught the field a lesson. The framework formalises the question; it does not provide certainty in the answer.

Window misjudgement

When the closing-window argument is overcalled, urgent-now projects displace patient-elegant alternatives. Several mass-digitisation programmes have been framed with closing-window urgency that turned out to be exaggerated; the same digitisation could have been done five years later for half the cost.

Demonstration mispricing

When the demonstration value of a project is over- or under-stated, the framework's overall verdict shifts substantially. Apollo's demonstration value is contested in the historical literature; reasonable analysts disagree by a factor of five on how much of the programme was justified by demonstration. The framework names the dimension; it cannot tell you the right number for any specific case.

What would change the framework

A framework should be revisable. Several kinds of evidence would force an update.

A pattern of high-rated bets failing. If the current bets list in Current bets produces a hit rate below random over the next five years, the dimensions are weighted wrongly or the framework is missing a dimension. The honest response is to identify which projects failed and why, and to revise the dimensions accordingly.

A pattern of low-rated bets succeeding. Symmetric failure mode. If the current-likely-to-age-badly list survives and thrives, the framework was reading the curves wrong. Several specific entries on that list are explicit predictions; tracking them is the test.

A consistent class of historical examples being mis-rated. The retrospective stupidity index in Models & scoring should produce verdicts that, with hindsight, accord with consensus expert judgement on most cases and diverge interestingly on a small number. If the framework's verdicts diverge consistently and in one direction (always too patient, always too aggressive), the framework has a bias that needs explicit correction.

An institutional failure mode the framework does not name. The anti-pattern catalogue in Anti-patterns is not exhaustive. New patterns will emerge, and incorporation of newly-named patterns is part of how the framework matures.

Mathematical critique of the model. The real-options and optimal-stopping formalisms borrowed in Models & scoring are imported wholesale from finance and operations research. There is room for serious technical critique of how well they transfer. A formal paper showing systematic distortions when those models are applied to scientific or research problems would force a substantial revision.

What the framework is not trying to be

A few things worth saying clearly because they have come up in early readings.

Not a calculator. The 0-3 scoring scheme in Models & scoring is meant to force articulation, not to produce verdicts. Adding the scores together is misuse. So is reporting the score without the reasoning that produced it.

Not an EA replacement. The Effective Altruism cause-prioritisation tradition is a sibling, not a competitor. The framework here adds time-dependence to IT-N and is otherwise broadly compatible. People who already think well in IT-N terms will find the framework congenial; people who don't may find it more useful as a starting point.

Not an Austrian or Hayekian framework. It is not making strong claims about the price system, market discovery or the limits of central planning. It is making weaker claims about how individual allocators — including market participants — can think more clearly about timing.

Not a longtermist position. Several of the framework's most aggressive verdicts are short-horizon (closing-window archives, language documentation, current-curve arbitrage). The framework is agnostic on the long-termism debate; it can be used by either side.

Not a complete theory of decision-making under uncertainty. It is one input. Capital constraints, team capacity, political feasibility, moral framing and risk tolerance all remain independent inputs. The framework is meant to make one specific question — when should this be attacked, and at what cost relative to what alternatives — tractable. It is not the whole problem.

A protocol for revising the framework

A framework that does not update is a framework that does not work. The repository commits to a specific revision schedule with public accuracy reporting. The schedule is the discipline that turns the framework into a working instrument rather than a one-off essay collection.

The schedule below is the published commitment. Each revision is a marked release in the repository with a dated changelog and an accuracy log against previous calls.

Six-monthly: the current-bets list

Current bets is revised on six-month cycles. The first revision after this draft is 2026-11-01, then 2027-05-01, 2027-11-01, and onwards. Each revision carries:

  • A diff summary: which entries moved horizons, which moved verdicts, which were added, which were removed.
  • An accuracy log against the previous revision: which earlier calls have already been resolved, which way they went, and how the framework's verdict held up.
  • A short note on any structural changes (new horizon, new verdict category, reorganisation).

The six-month cycle is short enough to catch fast-moving categories (AI, frontier-tech) and long enough that revisions reflect real evidence rather than monthly noise.
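
One possible machine-readable shape for an accuracy-log entry. The repository has not fixed a schema; every field name below is an assumption.

```python
# Hypothetical accuracy-log record for one call in the current-bets list.
from dataclasses import dataclass

@dataclass
class CallRecord:
    entry: str                       # e.g. "Vertical AI agents in regulated domains"
    horizon: str                     # "1y" | "3y" | "5y" | "10y"
    verdict: str                     # "attack now" | "attack with caveats" | "wait/stop" | "open"
    made_on: str                     # revision the call appeared in, e.g. "2026-05-01"
    resolved_on: str | None = None   # revision the call was scored in
    outcome: str | None = None       # "held" | "failed" | "obviated" | None

def hit_rate(log: list[CallRecord]) -> float:
    """Share of resolved calls that held; unresolved calls are excluded."""
    resolved = [r for r in log if r.outcome in ("held", "failed")]
    return sum(r.outcome == "held" for r in resolved) / len(resolved) if resolved else float("nan")
```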

Annual: the historical and current-50 lists

The three ranked lists in top_projects/ — smartest, dumbest, current 50 — are revised annually. The first revision is 2027-05-01, then 2028-05-01, and onwards. Each revision:

  • Replaces weaker entries with stronger candidates where submitted (the lists are open and accept substitutions).
  • Reports on resolution of any current 50 entries — which were vindicated as misallocated, which surprised the framework, and what that implies for the framework's calibration.
  • Promotes specific current 50 entries to the dumbest 50 if the resolution is decisive.
  • Notes patterns of error: were the framework's mistakes systematic, in which direction, and what dimension was being underweighted.

Eighteen-monthly: the dimensions and anti-patterns

Dimensions, Anti-patterns and Glossary are reviewed on eighteen-month cycles. The first review is 2027-11-01. The longer cycle reflects that the dimensions and the anti-pattern vocabulary should change less often than the live calls. Each review:

  • Adds dimensions or anti-patterns that the past eighteen months of evidence have made necessary.
  • Demotes or removes dimensions that have proved redundant or that fold cleanly into others.
  • Reorders dimensions where the past evidence has changed which are doing the most work.

Rolling: critiques, examples and contributions

Outside the formal cycles, the repository accepts:

  • New historical examples that should appear in Historical examples or displace weaker entries in the smartest 50.
  • New anti-patterns with name, description, example and diagnostic.
  • Counter-arguments to specific current bets entries with the framework reading the contributor would replace mine with.
  • Field-guide additions for fields not yet covered (compute, robotics, mental health, agriculture, defence, finance, education).

Strong arguments will be incorporated at the next scheduled revision and credited.

What success looks like

By 2031-05-01 the framework will have been through ten revisions of the current-bets list, five revisions of the historical lists, and at least three revisions of the dimensions. The accuracy log over that period is the core test. A framework whose call-by-call hit rate is no better than random is a framework that has failed; a framework whose hit rate is meaningfully above random is the working instrument the repository is trying to build.

The accuracy log will be published in this section and linked from the README. It is the part of the framework that makes the rest of the framework checkable.
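
"Meaningfully above random" can be checked exactly; a sketch, noting that the chance baseline is itself a judgement call about what random verdicts would hit:

```python
# Exact binomial tail: how surprising is this hit rate under a chance baseline?
from math import comb

def p_at_least(hits: int, calls: int, baseline: float) -> float:
    """P(X >= hits) for X ~ Binomial(calls, baseline)."""
    return sum(comb(calls, k) * baseline**k * (1 - baseline)**(calls - k)
               for k in range(hits, calls + 1))

# e.g. 34 of 50 resolved calls correct against a 50% chance baseline
print(p_at_least(34, 50, 0.5))  # ≈ 0.008: unlikely to be luck
```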

What stays out of the protocol

Some things resist this kind of scheduling and should resist it.

The manifesto in Manifesto should not be revised on a calendar. It is the rallying piece; if the argument needs revising, the framework needs more than revision.

The intellectual lineage in Intellectual lineage is mostly historical and changes slowly; ad-hoc updates as new ancestors are recognised are sufficient.

The academic paper in Academic paper follows its own publication and revision cycle, separate from this repository's cadence.

— Siri Southwind

Dual-use & catastrophic risk

Why this section exists

The framework is morally neutral by design. The dimensions in Dimensions tell you whether a problem is well-timed, not whether it is good to solve. For most allocation decisions this neutrality is appropriate; the moral content comes from elsewhere, from the values of the allocator and the broader political and ethical context in which they work.

There is a category where this neutrality is insufficient. A small but consequential set of problems carry catastrophic-risk asymmetries: the solutions are easier to weaponise than to defend, the harms scale with the same curves as the benefits, and the cost of getting it wrong is irreversible. For these, the framework's ordinary "attack now if the cascade is large" verdict can be specifically the wrong move, because a large cascade is exactly what makes the misuse case worse.

This section is the framework's modification for dual-use cases. It is brief, deliberate, and anchored to specific 2026 concerns. It is not a general theory of catastrophic risk — that literature exists, is much larger, and is referenced where appropriate. The point here is to make the framework usable for allocators who are currently making decisions in domains where dual-use stakes matter.

The principle: differential problem-solving applied

Nick Bostrom's Differential Technological Development is the relevant intellectual ancestor. The principle in his own words:

Differential technological development: try to retard the development of dangerous and harmful technologies, especially ones that raise the level of existential risk; and accelerate the development of beneficial technologies, especially those that reduce the existential risks posed by nature or by other technologies.

— Nick Bostrom, Superintelligence (2014)

The framework here is the same move applied one level down — choose which problems get attacked when, with explicit attention to whether the solution favours benign or malicious use.

Three things follow.

The timing question is more important for dual-use problems than for benign ones. A benign problem solved early or late produces benign value early or late. A dual-use problem solved early may produce harm before the safety infrastructure exists to contain it.

The cascade-value calculation inverts. For benign problems, large cascade is good — solving it unlocks downstream good. For attacker-favoured problems, large cascade is worse — solving it unlocks downstream harm that may dominate the benefits.

The work itself sometimes has to be deliberately deferred or actively suppressed, not because the problem is uninteresting but because the path from solution to safe deployment runs through institutions that do not yet exist.

Four categories of problem

Almost every problem falls into one of four:

The four categories sit on a two-axis plane: attacker benefit on the vertical, defender benefit on the horizontal. Each corner has its own framework reading and is described in turn below, but the structure is worth seeing first — the attacker-favoured quadrant (top-left) is the one where the framework's standard cascade and demonstration readings invert.

[Diagram: the four categories on a two-axis plane, defender benefit increasing along the horizontal and attacker benefit along the vertical.
  • Attacker-favoured (top-left): cascade & demonstration INVERT. Offensive cyber without disclosure, persuasion-at-scale systems, autonomous lethal targeting.
  • Symmetric (top-right): standard framework + governance. General-purpose AI capability, most cryptography research.
  • Benign-default (bottom-left): standard framework applies. Most science literacy programmes, most consumer products.
  • Defender-favoured (bottom-right): standard framework + bonus weight. Biosurveillance, AI red-teaming & evals, cyber defence, climate monitoring.]
The four categories under the dual-use modification. Large cascade and large demonstration value become reasons for caution rather than confidence in the attacker-favoured quadrant; the standard framework applies in the other three with varying degrees of governance.

Benign-default. No significant misuse pathway. The standard framework applies. Most problems are here.

Defender-favoured. Solving creates more value for defenders than attackers. Examples include intrusion-detection systems that help defenders update faster than attackers can adapt, biosecurity surveillance that detects pathogens before they spread, climate-monitoring infrastructure, vaccine platforms, AI red-teaming and evaluation tooling. The framework should give these a positive weighting beyond their direct value, particularly when the defensive cascade is large.

Attacker-favoured. Solving creates more value for malicious use than defensive use. Examples include certain gain-of-function research, novel offensive cyber capabilities published without disclosure paths, persuasion-and-manipulation systems with weak audit trails, autonomous-lethal-system targeting algorithms. The framework should give these a negative weighting that can override otherwise favourable readings.

Symmetric. Roughly equal value to both sides. Most general-purpose technology lives here. The framework's default reading still applies, supplemented by a further question: what does the safety, verification and governance infrastructure look like, and is it ready?

The categories are not always clean. A problem can shift between them as complementary technologies arrive. The classification is not a one-time decision; it should be revisited as the surrounding ecosystem changes.

Specific 2026 categories with elevated dual-use stakes

A working list, deliberately incomplete and revisable.

AI-assisted biology. The combination of foundation models, synthesis-on-demand, and cheap reading-and-writing of biological sequences compresses the cost-trajectory of designing novel biological agents. The defensive applications are real and large; the offensive ones are catastrophic. A blanket attack-now reading on AI-assisted biology is wrong; the right reading is conditional on what kind of biology, by whom, under what disclosure regime.

Autonomous lethal systems. Targeting systems that operate without human in-the-loop decision-making. The defensive case (faster response in legitimate self-defence) is real; the attacker case (escalation, accidents, accountability collapse) is severe. The framework's reading needs the dual-use lens explicitly.

Persuasion and manipulation systems. AI tuned for influence — political, commercial, intimate — at scale. Synthetic media saturation. The technology will exist regardless; the framework's question is about what ecosystem of detection, attribution and verification develops alongside it.

Cyber-offensive research without disclosure paths. Vulnerability discovery that flows to offensive use rather than to coordinated disclosure. The defender-favoured version of the same research (responsible disclosure, defensive tooling, intrusion-detection improvements) is straightforward to support.

Surveillance technology. At-scale monitoring including face recognition, behavioural prediction and aggregate-population tracking. Defender-favoured applications exist (counter-terrorism, public health). Attacker-favoured ones exist too (authoritarian control, privacy erosion). The framework's reading is jurisdiction-dependent in a way most of its dimensions are not.

Neurotechnology. Brain-computer interfaces and adjacent technologies. The defensive case (medical applications for paralysis, neurological disease) is real and growing. The dual-use concerns (autonomy erosion, manipulation, surveillance) are early but real. The framework's reading is currently attack the medical use cases, defer the consumer applications, build the governance infrastructure now.

How the framework's normal reading changes

For attacker-favoured problems, several of the standard framework verdicts invert.

Large cascade becomes a reason for caution rather than for confidence. The bigger the downstream effect, the more important it is that the solution arrives with verification and governance infrastructure intact.

Closing window arguments need scrutiny. Some closing-window claims are real (knowledge degrading). Others are pretexts — "we have to do this before someone else does." The framework's closing-window reading on dual-use problems should be checked against the question: would we want this problem solved by someone less safety-conscious than us? If the answer is no, the framework still says attack, but with explicit safety infrastructure as a co-deliverable.

Brute force is no longer the default. The brute-force-then-elegance pattern that the framework celebrates is correct for benign problems and dangerous for attacker-favoured ones. A brute-force demonstration of an offensive capability without defensive infrastructure is the canonical bad outcome.

Demonstration value changes sign. For benign problems, demonstrating possibility is what unlocks the elegant phase. For attacker-favoured problems, demonstrating possibility is what tells malicious actors the capability exists. The 2024–2025 generation of papers describing how to bypass AI safety guards is the working illustration.

The asymmetry problem

Solutions to most attacker-favoured problems are easier to weaponise than to defend against. This is not a moral judgement; it is a structural feature of the underlying technology. A single attacker can choose the weakest defence; a defender has to be strong against every attack vector.

Three implications follow.

The framework should bias toward defensive infrastructure when the asymmetry is real. The investment that produces broad defensive capability dominates the investment that produces narrow offensive capability, even when their direct values look similar.

Coordination among legitimate actors matters more for dual-use problems than for benign ones. The framework's ordinary individualism — founders, investors, researchers each making their own calls — is insufficient. Dual-use problems benefit from shared safety frameworks, joint governance and coordinated disclosure norms.

The opposite of publish everything is sometimes correct. The framework's normal preference for open release, open data and open methodology is right for most problems and specifically wrong for the worst dual-use cases. Selective disclosure, security-by-design and information controls are not betrayals of the framework; they are the framework's modification for dual-use cases.

Decision rules

When facing a problem in a potentially dual-use domain:

Classify the problem as benign-default, defender-favoured, attacker-favoured or symmetric. The classification is partly about the technology and partly about the ecosystem in which it would deploy.

For attacker-favoured problems, the framework's standard reading inverts on cascade and demonstration. Slow down. Check what the safety infrastructure looks like. Ask whether the project should be deferred until the verification, governance and disclosure infrastructure exists.

For symmetric problems, evaluate the safety infrastructure as part of the project's scope. A symmetric project shipped with strong governance is a different bet from the same project shipped without.

For defender-favoured problems, the standard framework applies, with a positive weighting for the defensive cascade. Several of the most important problems for 2026 allocators are defender-favoured — biosecurity surveillance, AI red-teaming and evaluation, cyber defence, climate monitoring — and these should be funded more aggressively than they are.

In all cases, ask the asymmetry question: would the solution benefit us more than it benefits an adversary, or vice versa? If the answer is unfavourable, redesign the project.
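
The rules compress into a small amount of code. A minimal sketch, with category names from this section; the fields, the readiness flag and the verdict strings are illustrative assumptions rather than the framework's canonical rubric:

```python
# Illustrative encoding of the dual-use decision rules. The classification
# itself remains a judgement call; writing it down makes it arguable.
from dataclasses import dataclass

@dataclass
class Problem:
    name: str
    category: str             # benign-default | defender-favoured | attacker-favoured | symmetric
    safety_infra_ready: bool  # verification, governance and disclosure in place?

def dual_use_verdict(p: Problem) -> str:
    if p.category == "attacker-favoured":
        # Cascade and demonstration invert: a larger cascade strengthens deferral.
        if not p.safety_infra_ready:
            return "defer until verification and governance infrastructure exists"
        return "attack, with safety infrastructure as an explicit co-deliverable"
    if p.category == "symmetric":
        return "attack, with governance evaluated as part of the project's scope"
    if p.category == "defender-favoured":
        return "attack, with a positive weighting for the defensive cascade"
    return "benign-default: the standard framework applies"

print(dual_use_verdict(Problem("biosurveillance", "defender-favoured", True)))
```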

Institutional gaps

The 2026 institutional landscape for dual-use work has gaps the framework should name.

The academic AI-safety community is real but small relative to the capability frontier. Most large frontier labs have safety teams but they vary widely in scope, autonomy and resourcing.

Government biosecurity surveillance is uneven and underfunded relative to the genuine risk profile. Several of the post-COVID commitments have decayed. The framework would direct substantially more public capital here than is currently flowing.

Cybersecurity research has a healthy disclosure culture in some communities and a weaker one in others. The framework's preference for coordinated disclosure norms aligns with the better existing practice but not all of it.

Most public R&D funding agencies do not have explicit dual-use weightings in their evaluation criteria. Adding them would be a low-cost, high-leverage policy change.

Most foundations and most venture capital funds have no explicit dual-use framework. Several are making implicit calls; the framework's contribution is to make the calls explicit and arguable.

Practical advice by audience

If you are a founder working on a potentially dual-use technology, the most useful exercise is to write a paragraph naming the misuse pathway, the speed at which an adversary could exploit it once disclosed, and the defensive infrastructure your work depends on. If the paragraph is honest and uncomfortable, the framework reading on the project is sharper than the standard one.

If you are an investor in dual-use categories, price the dual-use risk explicitly into your underwriting. Most current term sheets do not. The marginal risk premium is small at investment time and large at exit.

If you are a public funder, differentially fund the defensive and verification side of the dual-use frontier. This is the highest-leverage public-capital allocation the framework currently identifies.

If you are a researcher working in the frontier areas above, treat the disclosure question as a first-class research-design choice rather than a publication afterthought. The framework's modification for your work is mostly about how and when to publish, not whether.

What this section is not

It is not a complete theory of catastrophic risk. The literature on existential risk (Ord, Bostrom, MacAskill, the various Future of Humanity Institute and Center for Human-Compatible AI traditions) is substantially larger and more specialised. This section borrows the relevant insights and adapts them for problem-allocation use; it does not attempt to replace them.

It is not a counsel of paralysis. Most problems remain benign-default and the framework's normal aggressive verdicts apply to them. The dual-use modification kicks in for a specific small set of cases. The cost of being wrong about which case you are in is asymmetric — wrongly classifying a benign problem as dual-use slows useful work; wrongly classifying a dual-use problem as benign produces the harms the framework is meant to help avoid. The framework prefers the first error to the second when in doubt.

It is not a substitute for moral reasoning. The framework can identify which problems carry dual-use risk; it cannot tell you what to do when the political, commercial or scientific incentives press toward the wrong action. That decision remains with the allocator, advised by their values and their conscience.

The dual-use reading is itself scenario-conditional. Whether a particular biotechnology is defender-favoured or attacker-favoured depends on assumptions about adversary capability, surveillance infrastructure, governance trajectories and the rate at which the relevant tooling diffuses. Allocators reasoning about catastrophic-risk problems should construct an explicit small set of scenarios — at minimum a "capable defender, slow diffusion" scenario and a "weak defender, fast diffusion" scenario — and check whether the framework's classification is robust across them. A bet that is defender-favoured in one scenario and attacker-favoured in another is a directional bet on the underlying scenario, not a category-stable conclusion. The Pierre Wack tradition (introduced in Intellectual lineage) is the natural discipline; the dual-use chapter is one of the places the framework is most exposed to single-scenario error.

— Siri Southwind

Read the framework · Framework dimensions · Anti-patterns · Intellectual lineage

Reading list

The reading list for someone who wants to think the way the framework asks. It is opinionated. It is not exhaustive. Several works that everyone in the relevant fields knows are deliberately omitted because they overlap with shorter, sharper alternatives. Several works that are unfashionable are included because they have aged better than their fashionable contemporaries.

Five tiers, plus a closing note on what has been left out. Most readers do not need all five. The first tier is the minimum; the rest is by interest and by the field you actually allocate in.

Tier 1 — The essentials

If you read nothing else from this list, read these.

Richard Hamming, You and Your Research (1986). A talk delivered at Bell Communications Research that has shaped more careers than most academic books. The founding question of the framework: what are the most important problems in your field, and why aren't you working on them? Available widely online.

David Hilbert, Mathematische Probleme (1900). The Paris address that listed twenty-three open problems in mathematics and shaped a century of mathematical work. The founding act of explicit problem-allocation thinking. Read it for the gesture more than for the specific problems.

Nick Bostrom, Superintelligence (2014), particularly the chapters on differential technological development. The most direct intellectual ancestor of the framework's dual-use modification and one of the cleanest formal arguments for choosing which problems get attacked when.

Toby Ord, The Precipice (2020). A serious treatment of catastrophic risk that supplies the moral framework the present framework leaves to other authors. The chapter on policy responses is particularly useful for anyone in a position to allocate public capital.

Philip Tetlock and Dan Gardner, Superforecasting (2015). The discipline of calibrated probability estimation, made readable. The framework's scoring scheme borrows the methodology directly. Read this before you start scoring anything.

Andrej Karpathy, State of GPT and adjacent essays (2023–). Not a single text but a body of essay-and-talk material that captures the practical shape of the AI cost-trajectory better than anything published in book form. Available on YouTube and his blog.

Pierre Wack, Scenarios: Uncharted Waters Ahead and Scenarios: Shooting the Rapids (Harvard Business Review, September–October and November–December 1985). The two essays that turned scenario planning from a Royal Dutch Shell idiosyncrasy into a recognised discipline. Read both. The first essay is the definitive statement of what scenarios are not (forecasts) and what they are (mental-model expansion); the second is the more practical companion. Together roughly thirty pages and worth more than most books on uncertainty.

Tier 2 — The intellectual foundations

These are the texts the framework's vocabulary is built on.

Karl Popper, Conjectures and Refutations (1963). The argument that knowledge-seeking is fundamentally problem-solving, and that the choice of problem is the seat of progress.

Imre Lakatos, The Methodology of Scientific Research Programmes (1978). The progressive-versus-degenerative distinction that informs how the framework reads cascade value.

Thomas Kuhn, The Structure of Scientific Revolutions (1962). The paradigm-shift framework that the present framework defers to for the questions it cannot answer.

Theodore Wright, Factors Affecting the Cost of Airplanes (1936). The original observation behind the framework's cost-trajectory dimension. Short, surprising and still under-cited.

Claude Shannon, A Mathematical Theory of Communication (1948). The founding document of information theory and one of the most consequential single papers in the history of science. The framework's scoring scheme, its assumption that allocation decisions can be characterised quantitatively even crudely, and most of modern AI's vocabulary all descend from this paper. Available freely online from Bell Labs.

Carlota Perez, Technological Revolutions and Financial Capital (2002). The installation-and-deployment phase distinction is one of the most useful single concepts available for thinking about the why now on any technology bet.

W. Brian Arthur, The Nature of Technology (2009). The combinatorial view of how technology accumulates. Pairs unusually well with Carlota Perez.

Paul Romer, Endogenous Technological Change (1990). The economics paper that grounds the framework's claim that which problems are attacked is itself a determinant of growth.

Peter Schwartz, The Art of the Long View (1991). The most accessible practitioner's introduction to scenario planning, written by Wack's successor at Shell and the founder of Global Business Network. Read after Wack's HBR essays; Schwartz turns the discipline into something an operator can actually run. The chapters on driving forces, predetermined elements and critical uncertainties are the heart of the book.

Kees van der Heijden, Scenarios: The Art of Strategic Conversation (1996). The methodologically rigorous treatment of scenario planning, also from inside the Shell tradition. Where Schwartz is accessible, van der Heijden is precise. The most useful single book on how scenarios should integrate into ongoing strategic decision-making rather than living as one-off exercises.

Tier 3 — Practitioners and operators

Books written for people who actually allocate.

William MacAskill, Doing Good Better (2015). The Effective Altruism cause-prioritisation framework, accessible. The framework here is essentially MacAskill's IT-N with a fourth term for time.

Tyler Cowen, The Great Stagnation (2011) and Stubborn Attachments (2018). The arguments about declining research productivity that shape several of the framework's institutional critiques.

Avinash Dixit and Robert Pindyck, Investment under Uncertainty (1994). The real-options textbook the framework's mathematical machinery is borrowed from. Read selectively; the introductory chapters carry most of the weight.

Patrick Collison, "Questions for Science" (2018). A short blog post that asks better questions about science funding than most book-length treatments. Available at patrickcollison.com.

Bret Victor's essays, particularly Inventing on Principle (2012). Not directly about the framework but the right register of thinking — concrete, specific, willing to be wrong.

Stuart Russell, Human Compatible (2019). A book-length treatment of AI alignment from one of the field's founders, written for non-specialists.

Nassim Nicholas Taleb, Antifragile (2012) and The Black Swan (2007). The modern reference on heavy-tailed distributions, convex payoffs and antifragile design. The framework's moonshot logic and its reusability of by-products dimension are both Talebian moves. Antifragile is the more relevant of the two for allocators; The Black Swan is the more famous. Read either; read the long footnotes.

Adam Kahane, Solving Tough Problems (2004) and Transformative Scenario Planning (2012). Kahane facilitated the Mont Fleur scenarios for South Africa in 1991–92 and has since taken the technique into a long series of national-scale and corporate-scale tough-problem facilitations. The case material — Colombia, Guatemala, Cyprus — is the strongest available evidence that scenario planning works above the level of a single firm. Pairs unusually well with van der Heijden.

Tier 4 — Domain-specific

Pick what matches the field you allocate in.

Biology and biotech. Siddhartha Mukherjee, The Gene (2016). Carl Zimmer, Life's Edge (2021). For the longevity-specific case, David Sinclair's work is widely cited but contested; read with the framework's direct value inflation anti-pattern in hand. The Cell and Nature commentaries on AlphaFold's impact (2021–2024) are the working literature for the cascade case.

Physics. Sean Carroll's blog and books are the most accessible serious treatment. For the LIGO and gravitational-wave story, Janna Levin's Black Hole Blues (2016) is excellent. For the philosophical framing of physics' current trajectory, Sabine Hossenfelder's Lost in Math (2018) is sharp and contested.

Materials and chemistry. Cesar Hidalgo, Why Information Grows (2015), for the information-theoretic view of materials economies. The Materials Project's own published reports on materials databases are the working literature.

Climate and energy. Vaclav Smil's books, particularly Energy and Civilization (2017), for the systems-level view that most climate writing lacks. David MacKay's Sustainable Energy — Without the Hot Air (2009) remains the cleanest quantitative treatment of the energy-transition arithmetic, available free at withouthotair.com.

Artificial intelligence and computing. Russell and Norvig, Artificial Intelligence: A Modern Approach, for the textbook foundations. Karpathy's essays (above). For the safety side, Paul Christiano's blog and the broader Alignment Forum (alignmentforum.org). For the policy side, the various Open Philanthropy AI cause reports.

Economics. Joel Mokyr, A Culture of Growth (2016), for the cultural and institutional substrate that makes problem-solving possible. Daron Acemoglu and Simon Johnson, Power and Progress (2023), for a contrarian take on technology's distributional consequences.

Defence and security. Eliot Cohen, Supreme Command (2002). For the AI-and-warfare angle, Paul Scharre, Four Battlegrounds (2023). For the more analytical treatment of strategic stability, Lawrence Freedman's longer-form work. Herman Kahn, Thinking About the Unthinkable (1962), as the founding act of disciplined scenario construction in the strategic-defence domain — uncomfortable to read, but the discipline that the corporate scenario tradition descends from.

Tier 5 — Short pieces, talks, blog posts

The literature where most of the practical thinking actually happens.

Hamming's "You and Your Research" (above) is the canonical short piece.

Patrick Collison and Tyler Cowen, We Need a New Science of Progress (2019, The Atlantic). The short essay that named the field of progress studies and is shorter and sharper than most of the books that followed it.

Sam Altman's blog posts from the 2014–2019 period, particularly the How to Be Successful essay (2019). Mixed quality but the timing-and-allocation thinking is unusually visible.

The Astera Institute and Convergent Research published material on Focused Research Organisations (FROs). The clearest practical treatment of the framework's brute-force-then-elegance pattern as an institutional design.

Slime Mold Time Mold's series on chronic disease and obesity (slimemoldtimemold.com). Idiosyncratic and contested, but a good example of curiosity-driven public scholarship that the framework would otherwise miss.

Various Open Philanthropy cause prioritisation and worldview investigations reports. The most rigorous applied IT-N analysis publicly available.

Selected ARIA, ARPA-H and DARPA programme announcements. Reading what these institutions choose to fund, and how they explain the choice, is one of the best practical exercises in problem-allocation reasoning.

What I have not included and why

A few omissions worth naming explicitly.

The popular AI-doom literature (Yudkowsky's various essays, the more catastrophist parts of the rationalist canon) is omitted not because it is wrong but because it does not make the framework's calls easier to make. Read Bostrom and Ord; if you want more, the Alignment Forum is open.

The popular biotech-singularity literature (Kurzweil, the more enthusiastic longevity writing) is omitted because it triggers the framework's direct value inflation anti-pattern more often than it helps.

Most strategy books written by management consultants are omitted because they generally describe what successful firms did rather than how successful firms decide. The framework here is about deciding.

Most popular economics books written for non-economists are omitted because they overlap with the more rigorous treatments above and add little that survives serious scrutiny.

How to use this list

If you have an afternoon, read Hamming and one other Tier 1 piece. The framework's working philosophy will be visible.

If you have a fortnight, read all of Tier 1 and the Carlota Perez and Brian Arthur from Tier 2. You will then have most of the vocabulary the rest of the repository uses.

If you have a sabbatical, work through Tier 2 in order, then Tier 3, then Tier 4 in your domain. By the end you will have a sharper reading of your field's allocation patterns than ninety per cent of the people in it.

If you allocate any meaningful capital, the reading list itself is part of the allocation. The hours you spend on these texts compound across every subsequent decision. The framework rates this time as among the highest-leverage time available in 2026.

— Siri Southwind

Read the framework · Intellectual lineage · Anti-patterns

One-pagers (by audience)
Coders & AI builders

You can already do almost anything. The question is what to do.

The cost of writing software, training a model, generating an image, parsing a corpus, classifying a million examples, simulating a system — all of it is collapsing. Most of the people you work with have not quite registered what this means. They are still picking problems by what is locally tractable, by what their teammates are excited about, or by what their last role conditioned them to think is hard.

You should not be doing that. The framework below is for people who can build almost anything and need to choose what.

The thesis

Problem selection is now the dominant variable in technical work. What you build matters more than how well you build it, because the floor on "how well" keeps rising for free and the ceiling on "what" is determined by your taste.

Three categories worth distinguishing.

Soon-to-be-trivial. The problem will be solved cheaply by AI, by infrastructure, by someone else, or by all three, within twelve to twenty-four months. Most enterprise NLP work between 2017 and 2022 lived here. Some current ML pipelines do too. Working on these is a tax on your time.

Soon-to-be-tractable. The problem is just out of reach today. The next generation of models, the next drop in compute cost, or one missing piece of infrastructure will tip it over. Working on these now is the highest-leverage move available, because you arrive at the demonstration and the cascade just as the consensus admits the problem is solvable.

Stubbornly hard. The problem will not yield to current trajectories. Either the data does not exist, the verification is intractable, the coordination is the bottleneck, or the underlying physics will not move. These are worth attacking with eyes open: the by-products had better be valuable on their own.

The job is to spend your time in the second category and ruthlessly avoid the first.

How to tell which is which

Three questions, in order.

What input dominates the cost? If it's compute, scaling laws give you a forecast. If it's data, ask whether the data already exists somewhere or whether someone has to make it. If it's expert judgement, ask whether AI is closing in on the judgement and whether the verifier is cheap. The dominant input is the thing you forecast.

What does the demonstration look like, and who needs to see it? If a working version of the system would shift a serious investor's view of the field by a measurable amount, you are probably in the second category. If a working version would just be a nicer version of something that exists, you are probably in the first. The demonstration value is the unlock.

What does the by-product look like if you fail? If the data, infrastructure, methodology or talent you generate are valuable independent of whether the headline succeeds, the bet is much better than the headline implies. If a failure would leave nothing behind, raise the bar.

Where to look right now

Without committing to specifics — see Current bets for the live list — the cracks worth examining are:

  • Problems that look like search but are really taste: agents that have to decide which of ten thousand options matter, and where the ground truth is implicit.
  • Problems with rich verifiers and weak generators: places where you can score outputs cheaply but generating good ones is hard.
  • Problems where the data exists but is locked up by friction: institutions, formats, attention.
  • Problems sitting upstream of regulated domains where most builders won't go because the politics scare them.
  • Problems where the brute-force version is twenty engineers for two years and the elegant version is impossible without it.
  • Problems with closing windows (knowledge being lost, ecosystems collapsing, witnesses dying), where attacking now buys evidence that won't exist later.

Notice what is not on this list. Yet another wrapper around a foundation model. A mildly better classifier on a tabular dataset. Re-implementing what an off-the-shelf API will do for ten dollars by next quarter. The floor keeps rising; do not stand on it.

A starting menu of framework-high engineering problems

The 50 possibilities list contains a clear cluster for engineers and AI builders: AI red-teaming and evaluation infrastructure (6), mechanistic interpretability (7), held-out evaluation benchmarks (8), memory-safe rewrites of critical software (9), post-quantum migration tooling (10), open medical-imaging foundation models (25), privacy-preserving ML at production scale (26), formal-verification toolchains (27), reversible compute substrates (44), Lean and Mathlib expansion (47), generalisation benchmarks for autonomous robots (48). Each of these is a place where the gap between the framework's call and current allocation is largest, and where good engineering is the binding constraint rather than science or capital.

The standard you should hold yourself to

Two questions, repeated weekly.

Is the problem I'm working on closer to the second category or the first? Be honest. If the answer is "first, but it's funded," that is a fact about the funding, not the problem.

If a smart friend in a different field looked at my project, would they immediately understand why now? If the answer is no, the timing argument is probably broken and you should rebuild it.

The framework will not tell you what to work on. It will tell you when your reasons are weak, when the consensus is wrong, and when the by-product of being early is more valuable than being right. That is most of what good problem selection looks like in practice.

The world has more important problems than it has people who can pick them. You are one of the people who can. Pick well.

— Siri Southwind

Read the framework · Open questions · Current bets · 50 possibilities

Investors & funders

The consensus is mispricing problems. That is the trade.

For most of the last fifty years, capital allocators competed on selection within accepted asset classes, accepted theses and accepted timelines. The deal flow was the deal flow; everyone saw the same companies. The edge came from picking better among them.

That world is shifting. Cost-to-solve curves in compute, biology, materials, energy and software are now steep enough that the timing of when a problem becomes commercially solvable is the dominant variable in many investment outcomes. Allocators who think about timing explicitly will out-perform those who do not, by margins that compound over a decade.

The trade in one paragraph

Every problem in the world is an option whose strike price — the cost to solve it — is changing over time. Most options are getting cheaper to exercise. A few are getting more expensive (knowledge being lost, ecosystems collapsing, witnesses dying). The market for attention to these options is illiquid, biased toward fashion and full of mispricings. Differential Problem-Solving is the discipline of identifying the mispricings and acting on them with patient, portfolio-shaped capital.

Where the mispricings live

Three patterns recur.

Curves the consensus has not noticed. A specific underlying input — sequencing, gene synthesis, satellite launch, simulation hours — is on a steep cost-decline curve, and the businesses built on it are still priced as if the curve were flat. The classic example is sequencing companies in 2008–2012. Several current examples sit in custom biology, energy storage and certain corners of robotics.

Cascade dependencies that are about to fire. A foundational problem is solved or about to be solved. Downstream problems that depended on it are not yet repriced. AlphaFold did this for protein-dependent sub-fields. Foundation models did it for several application categories. The arbitrage closes within months once the cascade becomes visible; the window is real.

Closing-window problems. Some problems get more expensive over time, not less. Knowledge in fragile institutions, evidence in collapsing ecosystems, manuscripts in unstable archives, indigenous languages with three speakers left. These are mispriced in the opposite direction — the consensus assumes they will always be there. Funding them now is essentially buying an option that will not exist later.

What this looks like in practice

A portfolio thesis built on Problem Timing has three properties most existing portfolios do not.

It is time-shaped, not just sector-shaped. The sector in which a problem sits matters less than the curve it is on. Two biotech companies can be on opposite sides of the framework if one rides a steep input cost decline and the other does not.

It is deliberately heavy in the second category. Problems that are just out of reach today and will be tractable in eighteen to thirty-six months. These are the bets where being early is correctly priced as edge rather than as risk. The first category — already-tractable problems — is more crowded and the floor under returns is rising as the work becomes more easily replicable.

It is built to absorb variance. The asymmetric-payoff dimension of the framework justifies portfolio strategies that a normal-distribution decision-maker would reject. A portfolio of twenty Problem-Timing bets, with twenty-per-cent expected hit rates and ten-times outcome distributions, dominates a portfolio of five conservative bets with eighty-per-cent hit rates and two-times outcomes. Most allocators understand this in theory and resist it in practice.
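
The arithmetic behind that claim, with a crude variance check. The simulation assumes independent, all-or-nothing, equal-weighted bets, which is a deliberate simplification rather than a portfolio model:

```python
# Expected multiples for the two portfolios in the text, then a simulation
# to show what the variance trade actually looks like. All assumptions
# (independence, binary outcomes, equal weights) are illustrative.
import random

def expected_multiple(hit_rate: float, payoff: float) -> float:
    return hit_rate * payoff

print(expected_multiple(0.20, 10.0))  # 2.0x: twenty Problem-Timing bets
print(expected_multiple(0.80, 2.0))   # 1.6x: five conservative bets

def simulate(n_bets: int, hit_rate: float, payoff: float, trials: int = 100_000):
    rng = random.Random(0)
    multiples = []
    for _ in range(trials):
        hits = sum(rng.random() < hit_rate for _ in range(n_bets))
        multiples.append(hits * payoff / n_bets)  # equal-weighted portfolio
    multiples.sort()
    median = multiples[trials // 2]
    p_total_loss = sum(m == 0 for m in multiples) / trials
    return median, p_total_loss

print(simulate(20, 0.20, 10.0))  # median ~2.0x, total-loss chance ~1%
print(simulate(5, 0.80, 2.0))    # median ~1.6x, total-loss chance ~0.03%
```

The aggressive portfolio wins in expectation and at the median; what it pays for the edge is a fatter left tail, which is exactly the variance the text says the portfolio must be built to absorb.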

What the framework gives you

Four practical inputs to a deal decision.

A cost-trajectory forecast for the dominant input behind the company. This is the single most useful number you can generate before committing capital, and almost no decks supply it.

A cascade map of what becomes solvable if this company succeeds. The framework's cascade-value dimension is essentially a discounted-cash-flow on the next generation of companies that will exist because of this one. Most term sheets ignore it.

A neglectedness check. The Effective Altruism tradition's most useful contribution. If twenty other capable teams are attacking the same problem, the marginal return on your dollar is lower regardless of how big the prize is.

A demonstration-value premium. Some companies create value disproportionately by proving the category exists. That value accrues partly to them and partly to the cascade. Pricing it explicitly changes some valuations meaningfully.
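
One way to combine the four inputs is a simple weighted screen. The weights and the 0-to-10 scores below are hypothetical; the framework's actual rubric lives in the scoring section, and the value of the exercise is the argument each score forces rather than the number it produces:

```python
# Hypothetical weighted screen over the four deal inputs above.
WEIGHTS = {
    "cost_trajectory": 0.35,  # forecast for the dominant input
    "cascade": 0.30,          # what the company's success makes solvable
    "neglectedness": 0.20,    # how many capable teams are already attacking it
    "demonstration": 0.15,    # value of proving the category exists
}

def timing_score(scores):
    """Weighted sum of 0-10 dimension scores for a single deal."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

deal = {"cost_trajectory": 8, "cascade": 7, "neglectedness": 4, "demonstration": 6}
print(f"timing score: {timing_score(deal):.1f} / 10")  # 6.6
```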

What it does not give you

It does not pick your team for you. It does not reduce the importance of judgement, taste or relationships. It does not substitute for due diligence on the technical risk or the business model. It is not a calculator.

What it does is make the timing thesis explicit and arguable. Most term-sheet conversations have an implicit "why now" and almost none have an explicit one. Forcing the explicit version produces sharper bets, fewer of them, and a clearer story to tell your LPs.

Who should pay attention to this

Patient capital, broadly: foundations, family offices, sovereign funds, deep-tech investors, science-focused philanthropy, the new wave of long-horizon technology funds, and the small subset of generalist VCs who actually mean it when they say they fund hard things.

If you allocate other people's capital and your benchmark is one-year liquid markets, this framework will not help you. The mispricings live in time horizons that public markets cannot price.

On scenarios — and what gets through them

The implicit forecast inside any term sheet is the part most likely to be wrong in five years. Take any portfolio company and write down the three scenarios in which the bet could play out — modal, fast-cascade, deglobalisation-or-shock — and re-rate the bet under each. The bets that score well across all three are robust positions in the framework's preferred sense. The bets that score brilliantly in one and catastrophically in another are directional bets on a specific scenario, which is fine, but they should be priced as such. The scenario tradition in the lineage chapter (Wack, Schwartz, van der Heijden) is the explicit machinery; most LP frameworks are weaker than what this implicitly demands. A fund that runs scenarios on its portfolio quarterly will catch the misalignments earlier than a fund that does not.

What the framework points at right now

A worked starting list lives in 50 possibilities. For an investor reading the dimensions, the entries that combine cascade, demonstration value and a steep cost curve — closed-loop geothermal (11), long-duration storage (12), HTS magnets (15), pan-cancer detection (19), engineered phages (20), whole-organ vitrification (23), open-data clinical interoperability (38), reversible compute (44), zero-gravity manufacturing (42), open protein-function models (45) — sit closest to the kind of bet a long-horizon fund can underwrite. The defender-favoured entries (1, 4, 5, 6, 8, 9, 10, 41) are usually a worse fit for VC and a much better fit for foundation and patient-capital allocators.

The ask

If you find this argument credible, the most useful next step is to take three current portfolio companies and three companies you passed on in the last twelve months and run them through the framework's dimensions. The conversation that produces is more valuable than the score, and the score will tell you something about your own systematic biases that the deal flow on its own will not.

— Siri Southwind

Read the framework · Historical examples · Markets and arbitrage · 50 possibilities

Founders

The thing nobody tells you about founding a company is that you can build almost anything competently and still fail because you picked the wrong problem at the wrong moment. Most founder failure is not execution failure. It is timing failure dressed up as execution failure after the fact.

The four timing positions

Every problem you might attack as a founder sits in one of four positions on the Tractability Frontier, and each requires a different kind of company.

Far too early. The technology will not exist, or will not be reliable enough, for five to ten years. Most "frontier tech" companies that fail quietly fail here. You become a research lab with a runway, the market does not arrive, and your investors lose patience long before the world catches up. The defensible version of this is to build the missing piece of infrastructure on the way to the eventual technology — sell picks and shovels that work today, not the gold mine that does not yet exist.

Just early enough. The technology is almost there. The next eighteen to thirty-six months will tip it over. You can build a working version now that the market is starting to want, and you will be the one with the demonstration when it does. This is where most great companies are founded. It is also where most founders systematically fail to position themselves, because the consensus calls these ideas "too early" right up until the moment it calls them "too crowded".

On time. The technology works, the market wants it, and twenty teams are racing. Execution and distribution are the differentiators. Many viable companies live here, but the marginal return on extreme founder talent is lower than people pretend. If you can pick a just early problem instead, you should.

Late. The problem has been solved. You are reselling someone else's commodity with a better wrapper. There is real money in this position but no leverage; you are running a business, not building a category. Be honest if this is what you are doing.

The job is to spend most of your time choosing problems in position two.

How to pick

Three questions are worth asking explicitly, before product-market fit, before the deck, before the round.

What input does this problem depend on, and where is its cost going? If your business is a thin layer on top of an input whose cost is collapsing, your moat is collapsing with it. If your business is the tool that bends a steep curve into useful product, you are well-positioned. The classic shape of a great founding bet is we are the company that will productise this curve.

What does the world look like the day after this works? If the answer is "a slightly better version of what already exists," the bet is too small for the founder pain it costs. If the answer is "an entire downstream category becomes possible," you are sitting on cascade value, and that is what makes a category-defining company.

Who else can do this, and why aren't they? If twenty equally-capable teams are attacking the same problem, the marginal return on your work is lower than the founder narrative suggests. The interesting bets are the ones the consensus has not yet caught up to — sometimes because the field is unfashionable, sometimes because the input curve is invisible to outsiders, sometimes because the demonstration has not happened yet and the conventional wisdom thinks the problem is intractable.

What founders consistently get wrong

A few patterns the framework helps surface.

Founders systematically over-rate direct value and under-rate cascade value. The thing the company does today matters less than the thing it makes possible tomorrow.

Founders systematically over-trust consensus tractability and under-trust their own read of the curve. If you genuinely believe a curve will break in eighteen months and the market believes it will break in five years, that is the bet. If you cannot articulate why your read differs, the bet is weaker than it feels.

Founders systematically under-diversify their by-products. The headline outcome ("we solve protein-folding for X") is what you pitch. The by-products (the data, the methodology, the talent, the infrastructure) are what you keep if the headline misses. Companies whose by-products are valuable independent of the headline outcome are much more durable than companies whose by-products are not. Build the company so that failure on the headline still leaves something behind.

The "why now" answer

Every good pitch has a why now. Most of them are weak. The framework's contribution is to make the why now a structured argument rather than a hand-wave.

A strong why now names the curve, names the input, names the inflection point and names the cascade. Something like: the cost of structured biology data is falling on a Wright's-law curve dominated by sequencing and protein synthesis; we are at the point on the curve where the bottleneck shifts from data acquisition to data interpretation; the company is positioned to be the interpretation layer, and once the cascade fires the demand for that layer compounds.

A weak why now says: AI is changing everything and we are using AI. This is currently passable in fundraising conversations and will not be in twelve months.
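
The strong version reduces to arithmetic. A minimal sketch of the Wright's-law projection it leans on, with every number hypothetical:

```python
# Wright's law: unit cost falls by a fixed fraction (the learning rate)
# with each doubling of cumulative volume. All figures here are invented
# for illustration.
from math import log2

def wrights_law_cost(cost_now, cum_now, cum_future, learning_rate):
    """Projected unit cost after cumulative volume grows from cum_now to cum_future."""
    doublings = log2(cum_future / cum_now)
    return cost_now * (1 - learning_rate) ** doublings

# $100/unit today at 1M cumulative units, 25% learning rate, volume
# expected to reach 8M units (three doublings) within three years:
print(f"${wrights_law_cost(100.0, 1e6, 8e6, 0.25):.2f} per unit")  # $42.19
```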

A starting menu of company-shaped opportunities

The 50 possibilities list contains roughly fifteen to twenty entries that are recognisably company-shaped today: heat-pump electrification of industrial heat (13), grid-software automation (14), engineered phage therapies (20), AI rare-disease platforms (24), open medical-imaging models (25), privacy-preserving ML (26), atmospheric water harvesting (30), CO2-to-fuels (32), open clinical-data infrastructure (38), AI tutoring with rigorous measurement (39), zero-gravity manufacturing (42), reversible compute substrates (44), drug repurposing for orphan diseases (46), generalisation benchmarks for autonomous robots (48) and kelp aquaculture at industrial scale (50). None of these is a guaranteed company; each is a place where the framework says the bet is more legible than the consensus is pricing.

What you should do this week

Take the three problems your company is closest to attacking. For each one, write a paragraph that names the dominant input, the curve, the cascade, and the closing window if any. If you cannot write that paragraph, the bet is not yet legible and you have more thinking to do. If you can, you have a better foundation than most founders ever build.

Then run the same exercise on the three companies in your space you most fear. The exercise is sharper when you do it on your competitors than on yourself.

— Siri Southwind

Read the framework · Brute force vs elegance · Current bets · 50 possibilities

Academics

Richard Hamming's question is the founding text of this framework.

What are the most important problems in your field? Why aren't you working on them?

Hamming asked it in 1986 and most fields still have not answered. The question is harder now than when he asked it, because important and tractable have come apart. Many of the most important problems in your field are now tractable in a way they were not five years ago. Many of the problems your community treats as central are about to be commoditised by tools that did not exist when their importance was first decided.

This is a one-page case for taking the question seriously again, with the modern tractability landscape in front of you.

The two failure modes academia is unusually exposed to

The first is path-dependence. You have invested years building the methods, the relationships, the funded grants and the reputation around a particular kind of problem. Switching to a different kind of problem is expensive in a way that is not visible from the outside. The result is that even when the cost of the right problem has dropped to nothing, the cost of you switching to it is still very high. Most fields do not move because most researchers cannot afford to.

The second is consensus drag. Conferences, journals and grant panels reward incremental work on accepted problems. The marginal return on producing a slightly better answer to last decade's question is real and high. The marginal return on switching to a problem the consensus has not yet recognised is theoretically higher and practically punishing. The system is calibrated to the first kind of work, not the second.

Both failure modes are individually rational and collectively expensive. The framework cannot solve them but it can make the cost legible.

What the framework offers

A vocabulary for the why now question. Why is this problem tractable now in a way it was not five years ago? Which input cost or complementary technology has shifted, and how confident are we in the trajectory? These are questions any serious research proposal could answer in a paragraph; few do.

A cascade map. Some problems sit upstream of many others. Crystallography sat upstream of AlphaFold. Sequencing infrastructure sat upstream of the GWAS revolution. Foundation models sit upstream of much of current applied AI. Identifying upstream problems and being willing to work on them — even when their direct outputs are unglamorous — is a high-leverage move that the framework legitimises.

A defence against premature elegance. Most fields have a folk tradition that elegant theoretical work is more valuable than brute-force empirical work. The framework's brute-force-then-elegance pattern argues the opposite in many cases: an ugly demonstration that a problem can be attacked at all is often more valuable than another elegant treatment of an already-attackable one.

A closing-window argument for fieldwork, archives, oral history, observational astronomy, and any other domain where the evidence is degrading. These are systematically underfunded by frameworks that assume costs only fall over time.

A diagnostic, in three questions

Take the problem you are most likely to publish on next year and ask:

If a competent team in industry decided this problem was worth attacking with current tools, how long would it take them? If the answer is months, your contribution is in the framing or the rigour, not in the result. Be honest about which.

What does my work make possible that does not exist without it? Direct results are the answer most researchers give. Cascade and demonstration value are usually missing from the answer. They should not be.

If I had to defend why I am working on this rather than the most important problem in my field, what would I say? Most defences come down to path-dependence and consensus. These are reasons. They are not always the right reasons.

The framework is not a moral indictment of working on what you work on. Some path-dependence is rational, some consensus is well-founded, and not every researcher should be chasing the highest-leverage problem at all times. But the question is worth asking honestly, and most fields ask it too rarely.

A note on the role of curiosity-driven research

The framework would underrate, if applied naively, a lot of curiosity-driven work whose cascade value emerges decades later or not at all. This is a known limit. The recommended portfolio share for unallocated curiosity-driven research — research governed only by quality of the people and absence of obvious harm — is somewhere in the ten-to-twenty per cent range of any serious public funding portfolio, and is probably underfunded today. The framework should be used to direct purposeful allocation, not to crowd out the unguided minority that produces a disproportionate share of breakthrough results.

What to do this week

If you have grant capacity for the next five years, take the three biggest problems in your field that are just out of reach today and write a paragraph for each on what would tip them into reach and on what timeline. Then ask whether your current research programme is positioned to ride those tips when they arrive.

If you supervise students, the question to ask them is Hamming's, with the timing dimension added: what is the most important problem in the field, on what curve does it sit, and why are we not the team to attack it now? The question is uncomfortable. That is the point.

— Siri Southwind

Read the framework · Intellectual lineage · Open questions · 50 possibilities

Policymakers

Public R&D money is allocated by inertia. This is not a polemical point; it is a structural one. National science budgets are dominated by continuation grants, departmental fairness, lobbying intensity and prior-decade priorities. The institutions are large, the timelines are long, and the political risk of stopping a programme almost always exceeds the political risk of starting one. The result is a portfolio that drifts further from optimal each year as the underlying tractability landscape moves faster than the funding does.

The argument here is not that public funders should imitate venture capital. They should not. Public funders have specific advantages — long horizons, low coordination cost between projects in a single portfolio, the ability to build durable institutions, comfort with non-financial returns — and the framework's job is to help them use those advantages more deliberately rather than to mimic a different model.

Where public funding is best in class

Three categories of work are systematically under-supplied by markets and well-suited to public funders.

Demonstration projects whose value is mostly cascade. The Human Genome Project is the canonical case. No private actor would have funded the first reference sequence at three billion dollars; the social return on capital is enormous and most of it accrues to other people working on later problems. The framework's brute-force-then-elegance pattern is essentially a description of the project type that public capital exists to fund.

Closing-window fieldwork, archives and observation. Endangered languages, fragile ecosystems, manuscript collections in unstable institutions, long-running cohort studies, large-area sky surveys. The cost-trajectory of measurement is collapsing; the supply of evidence is degrading on its own clock; the cascade value to future researchers is enormous and impossible to capture commercially. These are exactly the projects markets do not fund and exactly the projects civilisations should.

Foundational infrastructure for fields that do not yet have customers. ARPANET before there was an internet economy. The Protein Data Bank before there was a commercial use for protein structures. PubMed and NCBI before computational biology was a discipline. National measurement standards. The patient build of the substrate that later commercial activity sits on.

These three categories are where the framework most strongly endorses public capital. The portfolio share allocated to them in most national systems is too small.

Where public funding consistently underperforms

Three categories of work are systematically over-supplied by public funders and produce returns that the framework would rate weakly.

Imitation programmes. A national ambition to "have a domestic AI champion", "build a national semiconductor industry" or "produce a domestic biotech sector", funded as a strategic-imitation bet rather than grounded in comparative-advantage analysis. Most of these programmes underperform because the imitating country lacks one or more of the conditions that produced the original. The framework's neglectedness and cost-trajectory dimensions would flag this consistently if applied.

Late-stage commercialisation. Public capital used to subsidise the deployment of technologies whose marginal cost is already on a clear declining curve and for which private capital is plentiful. Most large-scale solar and battery deployment subsidies fall here. Some are defensible on transition-speed grounds; many are not.

Coordination-cost monsters. Programmes whose technical core is sound but whose institutional architecture imposes coordination costs greater than the technical value. The NHS National Programme for IT, several large EU framework programmes, multiple European space-launch consortia, many large-scale cross-departmental modernisation efforts. The framework's coordination cost dimension should be load-bearing in every public-funding decision and is rarely treated as such.

What the framework would change

If a national funder ran their portfolio through the framework honestly, the directional changes would be substantial.

A larger fraction of capital would flow to closing-window projects — language documentation, archive digitisation, ecosystem monitoring, long-running cohorts, oral-history capture. These would not pass conventional cost-benefit tests because their value is in cascades that are hard to model; they pass the framework's tests because the alternative is permanent loss of evidence.

A larger fraction would flow to foundational infrastructure — open canonical datasets in legally complex domains, public-good measurement standards, the slow patient cataloguing efforts that compound for a century. Most science funders find this work hard to support because it is unglamorous; that is precisely why the marginal public dollar produces so much more here than in fashionable alternatives.

A smaller fraction would flow to strategic-imitation programmes. Sovereign-AI clones, sovereign-cloud projects and sovereign-fab subsidies that lack a coherent comparative-advantage thesis. Some subset of these is worth funding for genuine strategic-autonomy reasons; the framework helps separate that subset from the political-imitation residual.

A larger fraction would flow to replication and verification of foundational findings in social and biomedical science. The replication crisis is a textbook framework failure: a field cascading on findings that have not been independently verified is in a worse position than a field with fewer findings honestly verified.

A larger fraction would flow to deliberately unallocated curiosity-driven research, governed only by quality of the people and absence of obvious harm. Ten to twenty per cent of any serious public R&D portfolio should be unallocated to specific problems — the framework's explicit acknowledgement that unknown unknowns are real and important.

Where the framework would direct unusually high allocation

The 50 possibilities list flags the entries that public capital is almost uniquely placed to fund: standing pathogen surveillance (1), pan-family vaccines (2, 3), far-UV-C deployment (4), universal nucleic-acid synthesis screening (5), AI evaluation as public infrastructure (6), memory-safe rewrites of critical-infrastructure code (9), post-quantum cryptography migration (10), ocean carbon-removal MRV (33), open scientific-reproducibility infrastructure (34), pre-extinction microbiome and ice-core archives (35, 36), civilisational food and water reserves (37), auditable open-source voting infrastructure (40), asteroid characterisation (43) and engineered biocontainment (49). Each is defender-favoured, public-good in form, and currently funded at a fraction of what the framework would imply.

How to operationalise this

Three practical changes would move a national funder substantially in the right direction without requiring legislative change in most jurisdictions.

Add a structured why now requirement to every grant proposal above a threshold: identify the dominant input cost trajectory, the cascade map, the closing-window argument if any, and the demonstration value. A paragraph each. Most proposals would fail this exercise on the first attempt. Forcing the question is half the work.
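
A minimal sketch of what the structured section could look like as a machine-checkable form schema, assuming a funder wants the paragraphs present before review begins. The field names and the completeness check are this sketch's own illustration, not the framework's canonical vocabulary.

```python
# Illustrative schema for a structured "why now" section; the field names
# are this sketch's own, not the framework's canonical terms.
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhyNowSection:
    dominant_input: str            # e.g. "sequencing cost per genome" (invented example)
    cost_trajectory: str           # direction and steepness of that input's curve
    cascade_map: list[str]         # downstream problems unlocked by success
    closing_window: Optional[str]  # the evidence-loss argument, if any
    demonstration_value: str       # what a success would prove to the field

def is_complete(section: WhyNowSection) -> bool:
    """Reject proposals whose structured section is left empty."""
    return all([section.dominant_input, section.cost_trajectory,
                section.cascade_map, section.demonstration_value])
```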

Run a periodic retrospective stupidity index. Pick the twenty largest projects funded ten years ago and rate them through the framework with hindsight. The exercise is uncomfortable and useful. Most agencies have never done it. Doing it once would change funding cultures.
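
A hedged sketch of the exercise in code, assuming each past programme gets a one-to-five hindsight rating on a handful of framework dimensions and carries its share of the decade-ago portfolio. The dimension names, the equal weighting and the overfunding measure are placeholders for whatever rubric an agency actually adopts.

```python
# Placeholder dimensions and equal weights; an agency would substitute its
# own rubric drawn from the framework's Dimensions section.
HINDSIGHT_DIMENSIONS = ("cost_trajectory", "cascade", "neglectedness",
                        "coordination_cost", "closing_window")

def stupidity_index(projects: list[dict]) -> list[tuple[str, float]]:
    """Rank past projects by the gap between funding share and hindsight merit.

    Each project dict holds "name", "share" (fraction of the old portfolio)
    and "ratings" (dimension -> 1..5 hindsight score). A large positive gap
    flags capital that hindsight says was misallocated.
    """
    ranked = []
    for p in projects:
        merit = (sum(p["ratings"][d] for d in HINDSIGHT_DIMENSIONS)
                 / (5 * len(HINDSIGHT_DIMENSIONS)))   # normalise to 0..1
        ranked.append((p["name"], p["share"] - merit))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```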

Reserve a portfolio share for closing-window and foundational-infrastructure work that is procurement-light and panel-light. The grant-shape problem named in Anti-patterns is real: standard panels select against the kind of patient cataloguing the framework most rewards. Carving out a portion of the budget for fast, simple, longer-term commitments to high-reputation teams pays back far more than its share.

What the framework cannot fix

It cannot make pandemic preparedness politically popular. It cannot prevent the imitation impulse from continuing to drive sovereign-X programmes. It cannot remove the procurement-and-audit constraints that make public IT projects difficult. It cannot supply the institutional courage required to stop a programme that has been running for a decade.

It can, however, make the cost of these failures legible. A funder who has the framework in front of them can see, in writing, that they are choosing imitation over comparative advantage, choosing flat-curve work over steep-curve work, choosing visible ribbon-cutting over invisible cascade. The choice is then explicit. Sometimes the political logic will still win. Sometimes it will not.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Students

The decisions you make in the next few years are unusually high-leverage. What you study, who you work with, which problems you spend your first decade on. These compound for a long time. A small framework that sharpens any one of those decisions pays back across the rest of your career.

The thesis

You are choosing what to spend the most expensive years of your life on. The expensive part is not the tuition; it is the opportunity cost of those particular years of attention.

Most career advice tells you to find your passion and follow it. The framework asks a different question: which problem, of all the problems in the world, would your specific combination of taste, talent and timing actually be the right person to attack?

Most fields you might enter are at different points on different cost curves. Some are saturated and the marginal contribution of one more talented person is small. Some are at an inflection where the right person walking in this year will compound across a decade. Some are genuinely closing — fields where the evidence base is degrading and the work cannot be done later. The framework is, in part, a tool for telling these apart.

Three questions to ask before committing

Is this field on a curve, and which way is it moving? If the underlying technology is on a steep cost-decline curve, the work in five years will be different from the work today. If the underlying technology is roughly flat, the field rewards depth of skill more than timing. Both are real positions. Know which one you are entering.

What does the field's bottleneck look like? If the bottleneck is data, who has it and how do you get access. If the bottleneck is talent, are you going to be one of the few. If the bottleneck is regulatory or institutional, are you the kind of person who can navigate that environment. Different bottlenecks reward different temperaments.

What is the demonstration value of being in this field at this moment? Some fields offer the chance to be among the people who showed something was possible. Others offer the chance to be among the thousandth person to do the well-understood thing. The framework rates the first kind of opportunity highly even when the immediate compensation is lower.

What this means in practice

The framework rates several categories of work as substantially better than the consensus credits.

Just-early fields with new tools. Fields where foundation models, automation, or new measurement infrastructure has changed what kinds of problems are tractable. Materials science, structural biology, climate adaptation, parts of the social sciences with new data, AI-augmented experimental science. Walking into a field at the moment its tools change is one of the strongest career bets available.

Closing-window fields. Endangered languages, manuscript studies, oral history, the cataloguing of fragile ecosystems, the scientific study of populations whose data is degrading. Unfashionable, undercredentialed, undervalued. The framework rates time spent here as among the highest-leverage available, because the work cannot be done later.

Picks-and-shovels for fields you do not enter directly. Building the verification infrastructure, the open datasets, the measurement tools that the people who do enter the field will use. Less photogenic. More compounding.

The framework is more cautious about a few categories than the consensus is.

Fields whose recent successes were driven by tools the field did not build. Several once-elite specialisations are being commoditised by foundation models and automation. The work continues but the marginal contribution falls. Be honest about whether the field's prestige reflects current value or accumulated reputation.

Fields with severe replication or methodology problems. Several social-science and biomedical sub-fields are operating on findings that the next decade will probably not vindicate. Walking into one of these without a clear strategy for engaging with the replication problem leaves you in a difficult position later.

Fields whose advocates promise that they will be huge. If everyone has been saying so for a decade, the timing is the issue, not the promise. Several long-promised fields will eventually deliver; few of them will deliver in your particular twelve-month window.

A simple discipline

Once a year, write a short paragraph for yourself answering: given what I know now, what is the most important problem in the field I have committed to, and what would I have to do to be the right person to attack it? This is the Hamming question, applied annually.

If you can answer the question, you are on a deliberate trajectory. If you cannot, you are drifting. Both are sometimes appropriate. Most students drift longer than they should.

What you should not do

Do not ignore your taste because the framework rates a different field higher. The framework is a tool for sharpening the question of which problems are worth attacking, not a replacement for the human judgement of whether a particular problem is yours to attack. Some of the highest-leverage work in history was done by people who were temperamentally unsuited to the theoretically optimal problem and chose the second-best one because they could actually do it.

Do not treat the framework as a calculator. The dimensions in Dimensions are a checklist for thinking, not a scoring engine that produces the right answer. Several of the most important career bets in the framework's historical record were made on the strength of one dimension while the others looked moderate.

Do not outsource the question. The most useful version of the framework is the version you have argued with yourself and disagreed with. Read the framework, then read the field guide that fits where you might go, then decide.

Where to start

The reading list at Reading list is the cheapest investment available. Hamming alone is enough for an afternoon's reflection. The field guide that fits your interest in field_guides/ gives you a working vocabulary for the timing question in your specific domain.

The world has more important problems than it has people who can pick them. You are at the beginning of the period in your life when you can actually pick. The framework is here to make the picking better.

— Siri Southwind

Read the framework · Reading list · Open questions · 50 possibilities

PhD applicants

The single most important career decision a doctoral student makes is choosing the dissertation problem. Not the school, not the supervisor, though both matter. The problem. Five to seven years of your life will be spent on it. The institutional system will not steer you away from a problem that has already been solved, that is being commoditised, or that the next decade of technology will make uninteresting. You have to do that yourself.

The thesis

Most PhD problems are inherited. They come from the supervisor's existing programme, from the funding agency's last request-for-proposals, from the laboratory's ongoing work, from what is locally interesting at the moment of admission. None of these are guaranteed to align with what is globally interesting at the moment of completion.

The framework asks a simple question: of all the problems in your field that could be the foundation of your career, which one is on the right curve, has the right cascade, and will be the right thing to have worked on in 2032? The answer is rarely the most obvious problem in your laboratory.

Three filters before you commit

Will the tools that are coming change what kind of problem this is? The most consequential research moves of the past five years have come from fields where new tools (foundation models, AlphaFold-style protein prediction, automated experimentation, single-cell methods, gene editing at scale) have changed the structure of what counts as a tractable problem. If the new tools are about to land in your field, picking a problem the new tools cannot touch is one mistake; picking a problem that becomes trivial once they arrive is another. The right move is to pick a problem that the new tools enable but do not solve for you.

Is the field replication-stable, or is it about to undergo a methodological correction? Several major sub-fields have unresolved replication problems whose resolution will redefine what counts as established knowledge over the next decade. Building your dissertation on a finding that the next decade will probably not vindicate is a hard career bet. The framework's replication trap in Anti-patterns is the relevant pattern.

Does this problem sit upstream of other problems? The framework rates cascade value highly because problems that, if solved, unlock many downstream problems compound differently from problems whose direct value is the only value. Asking your prospective supervisor what does my work make possible that does not exist without it is a sharper version of the standard what is the contribution question.

Choosing the curve, the lab, the problem — in that order

The conventional advice is to pick the supervisor first, the lab second, the problem third. The framework reverses this for PhD applicants in fields whose tractability is changing fast.

Pick the curve. What field, what sub-field, is at an interesting moment? Where are foundation models or new instrumentation about to change the structure of work? Where is the closing-window argument real? Read the field guides in field_guides/ and the current bets list in Current bets.

Pick the lab. Within the chosen sub-field, which laboratories are doing the work that will define the next five years? Not the most prestigious lab. The lab whose recent papers are using the new tools well, whose alumni are landing in interesting positions, whose supervisor is willing to let students work on the question of why now rather than only on the question of how to do this paper.

Pick the problem. Inside the chosen lab, the problem you negotiate with the supervisor should be the problem that is just out of reach today and will be solvable during your dissertation, not the problem that is solvable now or the problem that will not be solvable for fifteen years.

This ordering is non-standard and uncomfortable. Most prospective PhD students are not in a position to evaluate which curves are interesting; that is what the field guides and reading list are for. The framework's claim is that the work of evaluation, done before the application is sent, is among the highest-leverage uses of time available to a prospective student.

What this looks like in different fields

If you are entering biology or biomedical research, the field is in mid-transition. Computational and AI-augmented work is changing what kinds of biological questions are tractable; experimental work in fields with closed-loop automation is also changing fast. The framework reads against pure traditional bench science in commodity sub-fields and toward computational-experimental hybrid work.

If you are entering computer science or AI, the path-dependence is severe. Foundation-model labs employ many of the people who would historically have done foundational research in academic settings. Picking an academic PhD in CS in 2026 means picking a sub-field where academic work is still where the action is — interpretability, certain theoretical questions, certain applied questions, alignment work that the labs cannot or will not do internally.

If you are entering physics, materials science, or chemistry, the framework rates the AI-and-experimental hybrid work highly. The labs running closed-loop autonomous experiments are doing work that pure-theoretical labs cannot match.

If you are entering social science or psychology, the replication-and-methodology question is unavoidable. The framework rates work that explicitly engages with the methodology problem above work that builds on contested findings.

If you are entering humanities, the framework's reading is unfashionable: the closing-window arguments for endangered languages, manuscript work, oral history and adjacent fields are real and the cascade with AI tools is large. This is the moment to do work that compounds for the long arc of human knowledge, even though the academic job market is brutal in those specific sub-fields.

What you should not do

Do not let prestige carry the decision. The most prestigious lab in an unfashionable sub-field can be a worse bet than a less famous lab at an inflection point.

Do not optimise narrowly for the academic job market. The work you do in your PhD will land you in a particular position; the position depends substantially on the field's trajectory over your dissertation period, not on the field's current trajectory. The framework's cost trajectory and tractability trajectory dimensions matter more than the current placement statistics.

Do not avoid the why now question because it makes you uncomfortable. If you cannot answer why is this problem worth attacking now, why has nobody attacked it yet, and why am I positioned to do it, you are about to spend half a decade on a problem you have not honestly evaluated.

A simple discipline before applying

Write a one-paragraph framework reading on each of the top three problems you are considering. Identify the curve, the cascade, the closing window if any, the demonstration value, and the by-products. If one of the three reads substantially better than the others, that is a strong signal. If all three read about the same, you are choosing on something other than the framework — which is sometimes correct, but you should know that is what you are doing.

The framework will not pick your dissertation for you. It will not tell you which combination of taste, talent and timing is yours. It will sharpen the question of which problems are worth attacking in your moment, which is most of what you can do before you start.

— Siri Southwind

Read the framework · Reading list · Anti-patterns · Field guides · 50 possibilities

Government science advisors

The chief scientist's office, the science-and-technology committee staffer, the strategy unit inside a ministry. The people who advise governments on what to fund. You sit at the leverage point of substantial public capital, and the framework here is in many ways written with you in mind. The analytical work is similar to what your colleagues already do; the framework's contribution is sharpening the why now, the what should we stop, and the what should we be doing that we are not.

The thesis

Public R&D allocation drifts toward the median of consensus opinion. The political incentives reward continuation of existing programmes, even when the underlying tractability landscape has moved. The technical incentives reward depth of expertise in current programmes, even when adjacent programmes would be more productive. The result is a portfolio that is systematically off the optimum, with the gap growing year by year.

The framework will not solve this. It will, with discipline, narrow the gap. Specifically: it will help you defend funding choices that look unfashionable but are correct, defund choices that look prestigious but are obsolete, and identify whole categories of work that the system is structurally undersupplying.

Three structural opportunities you can act on

Closing-window infrastructure. Several categories of work have the property that the work cannot be done later: endangered-language documentation, fragile-archive digitisation, ecosystem and glacier monitoring, long-running cohort studies, the cataloguing of populations and species under stress. Public funding is essentially the only credible source of capital for these — they fail commercial cost-of-capital tests because their cascade value is diffuse and decades out, but the alternative is permanent loss of evidence. The framework rates these as among the highest-leverage public allocations available, and almost every science-funding portfolio underweights them. Adding even five per cent of the standard research budget to closing-window work is one of the cheapest large-impact moves a government can make.

Foundational measurement and data infrastructure. The Protein Data Bank, the Materials Project, PubMed and the various open canonical datasets are the substrate that subsequent commercial work runs on. The framework rates the public-good cascade of this infrastructure as enormous. Most public-funding systems undersupply it because the work is unphotogenic and the credit goes to the downstream users rather than to the funders. A direct policy move — carving out budget for foundational data infrastructure at the level of fundamental research — pays back across decades.

Replication and verification of foundational findings. The replication crisis in social and biomedical sciences is severe, well-documented and structurally hard for principal investigators to address (no career reward). Public funders are essentially the only actors who can credibly fund the replication work; almost none currently do at the appropriate scale. A small dedicated replication budget within each major funding stream pays back faster than most other interventions because it cleans up the basis on which everything else builds.

Three structural problems you can mitigate

The grant-shape problem. Standard grant mechanisms reward incremental, predictable work and select against the kind of brute-force-then-elegance projects the framework rates highly. Carving out budget for fast, simple, longer-term commitments to high-reputation teams (the FRO model, the ARIA-style commitments, the various OTA-equivalent mechanisms) creates room for the high-leverage work without restructuring the entire portfolio. The marginal effort to do this is small; the marginal return is substantial.

The committee compromise. Panels of decision-makers each preferring different problems converge on a portfolio dominated by what no member objects to. Adding explicit why now questions to evaluation criteria — the cost-trajectory of the dominant input, the cascade if successful, the closing-window argument if any — is a process change that does not require structural reform but materially changes which proposals win.

The imitation impulse. The political incentive to fund a national version of whatever the leading country is funding produces consistent misallocation. The framework's neglectedness and cost-trajectory dimensions, used as evaluation criteria for sovereign-X programmes, produce sharper portfolio decisions than the standard strategic-imitation logic. This is politically harder than the previous two but worth attempting.

Specific framework readings on current public allocation

A subset of categories where the framework's verdict diverges from current funding patterns:

Pandemic preparedness. Allocation has decayed since 2022. The framework's reading remains positive — the cascade if a future pandemic is detected and contained early is enormous, the marginal capital is well-leveraged, and the political tailwind is unfortunately weakening. Public funders are essentially the only actors who can sustain the capacity through the inter-pandemic period.

Methane reduction. The fastest-acting climate-mitigation lever for the next two decades and one of the most under-invested. Detection, agricultural-emission reduction, leak-prevention infrastructure. The framework rates this above many of the more photogenic climate categories.

Permitting and transmission infrastructure. The highest-leverage policy investment available in many jurisdictions. Not science-funding strictly, but the framework's reading is that the cascade from removing permitting friction dominates the cascade from many specific science-funding decisions.

Defender-favoured AI safety work. See the dual-use modification in Dual-use & catastrophic risk. The framework rates AI red-teaming, evaluation infrastructure, and verification tooling above several more glamorous AI-funding categories.

Sovereign frontier-model programmes. The framework's verdict on most of these is unfavourable; the cost-trajectory of frontier capability is collapsing fast enough that being twelve to twenty-four months behind buys little. Sovereign compute and sovereign deployment infrastructure are different bets with different verdicts.

F-35-shape sustainment commitments. The F-35 is the canonical case but the pattern recurs across major procurement. The framework's reading is that incremental sustainment of programmes whose strategic premise is from a previous era is among the most consistently misallocated public capital available.

On scenarios

The discipline you are most likely to know already and most likely to undervalue is scenario planning. The Pierre Wack tradition at Royal Dutch Shell, descended from Herman Kahn's RAND-era work and now embedded in IPCC and UNDP practice, is the explicit corrective to single-future thinking. The framework as currently written is single-scenario by default; the sharper version is to build a small set of plausible scenarios — three or four, differing on the variables that most plausibly drive the score — and run the framework's dimensions across them. A national-portfolio review that asks which of our bets are robust to all three scenarios will produce a different list of priorities from one that asks what is most likely to happen. The first list is what the framework most strongly recommends.
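
A minimal sketch of the robust-to-all-scenarios filter, read as a maximin rule rather than an expected-value rule. The scenario names, bets and scores below are invented for illustration.

```python
# Maximin filter: keep bets whose worst-case framework score across
# scenarios clears a floor. All names and numbers here are illustrative.
def robust_bets(scores: dict[str, dict[str, float]], floor: float) -> list[str]:
    """scores maps bet -> {scenario: framework score in 0..1}."""
    return [bet for bet, by_scenario in scores.items()
            if min(by_scenario.values()) >= floor]

scores = {
    "pathogen surveillance": {"fast AI": 0.8, "slow AI": 0.7, "fragmented world": 0.9},
    "sovereign frontier model": {"fast AI": 0.3, "slow AI": 0.6, "fragmented world": 0.7},
}
print(robust_bets(scores, floor=0.6))  # -> ['pathogen surveillance']
```

The point of the maximin form is exactly the one in the paragraph above: a portfolio chosen for robustness across three or four scenarios differs from one chosen for the single most likely future.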

What the framework would have you push for

A worked starting list lives in 50 possibilities. For an advisor in your seat, the entries that combine cascade, defender-favoured dual-use and public-good shape — pathogen surveillance (1), AI evaluation infrastructure (6), mechanistic interpretability (7), memory-safe critical software (9), post-quantum migration (10), ocean carbon-removal MRV (33), reproducibility infrastructure (34), open clinical-data interoperability (38), formal-verification toolchains (27), engineered biocontainment (49) and asteroid characterisation (43) — are the ones an honest framework reading would push hardest into a national portfolio. The list is opinionated; specific entries will be wrong. The exercise of arguing them with your committee is closer to what the framework actually delivers than the entries themselves.

A small operational discipline

Once a quarter, take three of the largest active programmes in your portfolio and one programme that was rejected in the past two years. Run each through the framework's dimensions in Dimensions. The dimensions you find yourself defending a programme against are the ones that produce the sharpest internal arguments. The rejected programme that scores favourably under the framework is often the one worth re-opening; the active programme that scores poorly is often the one worth quietly winding down.

Once a year, run the retrospective stupidity index on a sample of programmes that completed five to ten years ago. The methodology is in Models & scoring. The exercise is uncomfortable. The improvement in subsequent funding decisions is real and visible.

What this framework will not do for you

It will not give you the political cover to make the decisions it implies. It will not solve the procurement-system constraints that produce most of the misallocation in defence and large-infrastructure programmes. It will not tell you when to override its own verdict because the social or political stakes are different from the analytical ones.

It will, however, make the misallocations legible. A funder who has run their portfolio through the framework once will not be able to un-see what they have seen. The choice between what the framework says and what the political system rewards becomes explicit. Sometimes the political logic will still win. Sometimes it will not. The framework's contribution is to make the choice conscious.

The world has more important problems than it has people in your position who are willing to argue for the unfashionable allocation. The framework gives you the vocabulary. The argument is yours.

— Siri Southwind

Read the framework · Limits and falsifiability · Dual-use and catastrophic risk · Reading list · 50 possibilities

Journalists

Most beats run on who, where, when. The framework adds what and why now. It catches stories before they break, finds the misallocations the consensus is missing, and gives a vocabulary for the recurring patterns that other beats describe in case-by-case terms.

The thesis

Most stories about technology, science, business or public spending are reported as discrete events. The framework treats them as positions on cost curves, and the cost curves predict which events will happen in which order. The journalist who reads the curves first writes about the stories the consensus discovers eighteen months later.

This is not a futurist position. It does not ask you to forecast 2035. It asks you to read what is already happening now to sectors whose costs are collapsing, whose verification debt is mounting, whose moats are eroding under their owners' feet.

Three patterns to look for

Cost-curve crossings. When the cost of doing something falls below a threshold that changes the structure of an industry, an entire wave of stories follows. The early warning is in the input cost, not in the consumer-facing product. Sequencing falling below a thousand dollars produced a decade of biotech stories. Foundation-model inference falling below a cent per thousand tokens is producing the current wave. The next wave is in robotic data collection, in catalysis and materials simulation, in autonomous experimentation. The early stories there are available now. A back-of-envelope crossing calculation is sketched after this list.

Cascade firings. When a foundational problem is solved (AlphaFold, a specific battery breakthrough, a regulatory change), a wave of downstream stories becomes inevitable but is not yet visible. The cascade is predictable; the specific companies that will benefit are not. The journalist who tracks the cascade rather than the headline finds stories that are still under-reported.

Allocation failures. When a substantial public or private resource is being committed to something the framework reads as misallocated. The five hundred billion dollars committed to NEOM. The continued sustainment cost of programmes whose strategic logic is from a previous era. The persistent enterprise spend on technologies that foundation models are commoditising. These are stories about institutional failure, told via the gap between what is being funded and what the cost-trajectory implies.
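
The crossing arithmetic behind the first pattern is simple enough to keep on a sticky note. A sketch under the assumption of a constant annual decline rate; real curves are lumpier, so treat the output as a story calendar, not a forecast. The numbers below are invented.

```python
# Years until an input cost crosses a structural threshold, assuming a
# constant annual percentage decline. Illustrative only.
import math

def years_to_crossing(cost_now: float, threshold: float, annual_decline: float) -> float:
    """Years until cost_now falls below threshold at a fixed decline rate."""
    if cost_now <= threshold:
        return 0.0
    return math.log(cost_now / threshold) / -math.log(1.0 - annual_decline)

# A $50 input falling 35% a year crosses a $5 threshold in about 5.3 years.
print(round(years_to_crossing(50.0, 5.0, 0.35), 1))
```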

What the framework gives a journalist

A vocabulary. The dimensions in Dimensions and the anti-patterns in Anti-patterns name patterns that journalists describe case-by-case. Cascade chasing, verification debt, the path-dependence tax, the vanity sprint, demonstration dilution. Naming a pattern makes it usable across stories and across beats.

A specificity discipline. The framework's current 50 and dumbest 50 lists are calls. They will be wrong in specific places; they will be right in others. The journalist who treats them as starting points for reporting rather than as conclusions finds material that other reporters do not.

A counter-fashion lens. Most technology and business reporting is structurally captive to the consensus narrative because the sources are inside the consensus. The framework supplies a vocabulary for describing what the consensus is missing without resorting to either uncritical hype or reflexive scepticism. The pattern is: here is what the curve says, here is what the consensus says, here is the gap.

A way to date predictions. Several of the framework's most pointed claims about specific projects and categories carry implicit time horizons. Reading those claims and revisiting them eighteen months later is one of the cleanest ways to evaluate which sources, which institutions and which strategists are worth listening to over time.

Stories the framework points at right now

Specific categories that are under-reported relative to their importance:

The picks-and-shovels of AI deployment. Most reporting is about model capability. Most of the actual money flows through the verification, evaluation, infrastructure and deployment-friction layer. The reporting follows the press releases; the value is elsewhere.

Dual-use defensive infrastructure. Most defence reporting is about offensive capability. Most of the framework-positive work is on the defensive side — biosurveillance, cyber defence, AI safety, anti-drone systems. Substantially under-covered.

Closing-window fieldwork. Endangered languages, fragile archives, ecosystem monitoring, oral history. Almost no current reporting on the institutions doing this work, on the cost-trajectories of doing it, or on the cascade value when it is later consumed by AI systems.

Allocation failures inside large governments. The procurement systems that produce F-35-shape outcomes are persistent; the reporting on them tends to focus on one incident at a time. Pattern-level reporting on what makes government procurement systematically misallocate is rare and valuable.

The replication crisis as an ongoing institutional fact. Most reporting is on individual replication failures; little is on the institutional incentives that produce them or on the slow movement to address them.

What the framework does not give you

It does not give you sources you do not have. It does not give you the legal or ethical frameworks for working with sources in sensitive categories. It does not tell you how to balance the demands of accuracy and speed in your particular publication.

It does give you a way of organising what you read. The framework's reading list at Reading list is a serious investment that pays back across years of beats. The dimensions vocabulary makes patterns legible across stories. The current bets and ranked lists are intellectual sparring partners — agree, disagree, in writing, with arguments — and the disagreements often produce stories.

A simple weekly discipline

Once a week, look at one story you wrote or one you might write. Apply the framework's dimensions. Which curve is the underlying technology on. Which slow constraint is binding the deployment. What does the cascade look like if it works. What does the why now answer look like if you ask it honestly.

Most stories will pass with their basic structure intact. A few will reveal that they are at a different point on the curve than the press release implied. Those few are usually the ones worth deepening.

The framework is not a substitute for the craft of journalism. It is a way of investing the same effort more profitably across a beat.

— Siri Southwind

Read the framework · Anti-patterns · Current bets · Reading list · 50 possibilities

Field guides

Biotech & health

This guide applies the Problem Timing framework to biotechnology, drug discovery and human health. It is meant for someone choosing what to work on, what to fund, or what to hold against in a portfolio over the next decade.

The biotech version of the framework is unusually high-leverage. Several of the field's dominant input costs (sequencing, gene synthesis, structure prediction, automated experimentation) are on steep cost-decline curves, while several of its constraints (regulatory approval, clinical-trial enrolment, biological complexity) are essentially flat. The gap between the moving parts and the still parts is where most of the interesting allocation decisions live.

The five curves that matter

Most decisions in biotech come down to where you sit on five underlying cost trajectories. Naming them and tracking them is half the discipline.

Sequencing. Several orders of magnitude of cost reduction over the past two decades. Continues to fall, though more slowly than in 2008–2014. The dominant constraint on most genomic-medicine projects is no longer raw sequencing cost. It is interpretation, sample availability and consent infrastructure.

Gene synthesis. Roughly hundred-fold reduction over fifteen years and continuing. The bottleneck is shifting from base-pair cost to the design-test-learn loop around what to synthesise.

Structure prediction. A discontinuous drop with AlphaFold. The cascade is still firing across drug discovery, enzyme engineering and basic biology. Pricing has caught up partially but not fully.

Automated experimentation. Lab-on-a-chip, robotic liquid handling, autonomous experimental platforms. Cost trajectory is steep but the field is still in its installation phase. Big upside; the curve has not yet reached its inflection.

Single-cell and spatial measurement. Single-cell RNA-seq, spatial transcriptomics, organoid models, multimodal imaging. The cost of resolution-per-dollar is collapsing fast and the methods are still maturing. The dataset that this curve produces will be the substrate of the next decade of biology.

These five inputs power most of what is interesting in current biotech. A project's why now should usually identify which curves it is riding and what it does with the gap between those curves and the slower constraints.

The slower constraints

These have not moved much and probably will not in the next five years.

Clinical trial cost and timeline. Largely flat in real terms for decades. Regulatory friction, trial-site infrastructure, patient enrolment and the statistics of small-effect detection set a floor that compute cost does not lower.

Regulatory approval. Slow, and rationally so for most cases. The cost of getting an approval is dominated by uncertainty about safety, not by the cost of the underlying science.

Biological complexity. Many problems in biology are not constrained by measurement; they are constrained by the underlying system being a tangled mess. AlphaFold solved structure; it did not solve mechanism. Some problems will resist all the cost-decline curves above because they are about the structure of the biology, not the cost of measurement.

Reimbursement and payer adoption. Often the binding constraint on whether a successful science becomes a successful product. Almost completely orthogonal to technical progress.

The interesting bets in biotech are usually at the intersection of fast-moving curves and slow-moving constraints — places where the cost collapse on the input side is not yet matched by the institutional change on the output side.

Specific framework readings

Drug discovery built on AlphaFold

The cascade has fired. Direct value of structure-aware drug design is real and growing. But the crowding is now substantial: every large pharma and many startups are running variations on the same theme. The framework reading is attack with caveats — pick a sub-domain (membrane proteins, intrinsically disordered proteins, conformational ensembles, allosteric sites) where the structure-only approach falls short and where you have a methodological edge. Bare structure-aware drug design as a thesis is no longer a strong bet.

Custom enzymes for industrial chemistry

A live and underpriced bet. Sequence design is cheap; synthesis is cheap; characterisation is getting cheaper. The downstream applications (materials, food, fuels, plastics, pharmaceutical intermediates) are still dominated by traditional chemistry and largely unaware of what enzyme design can now do. Cascade value high, crowding low, demonstration value substantial. Attack now.

Long-running cohort studies

A boring, high-leverage bet. The cost of measuring a cohort continuously — sequencing, biomarker panels, wearables, imaging — is collapsing. The value of a cohort dataset compounds with time and cannot be retroactively created. Attack now, knowing that the payoff is twenty years away.

Organoids and complex in vitro systems

A field that is just early enough. The technology works; the data interpretation is improving fast; the regulatory acceptance for using organoids in drug development is starting to arrive. Cascade value real and growing. Window for being among the first three serious players is closing within twenty-four months.

Diagnostic AI on existing imaging modalities

Largely probably wait in 2026. The capability is mature, the regulatory friction is the binding constraint, and the marginal model improvement is small. Better to attack the deployment-and-reimbursement problem than the model-quality problem. The interesting cracks are in modalities where good labelled data is genuinely scarce (rare-disease imaging, low-resource settings).

Cell therapies and gene therapies for non-orphan indications

A genuinely hard call. The technology works; the costs are still high; the addressable market depends on manufacturing innovation that has been promised for years and arrived slowly. Framework reading: attack the manufacturing cost problem rather than the new-indication problem. The cascade from cheaper, more reliable manufacturing is enormous.

Longevity-specific interventions (cellular reprogramming, senolytics)

A field where the consensus is sharply divided. The framework would attack specific tractable sub-problems (biomarker panels for biological age, replication of mouse-lifespan results in larger animals, mechanism of partial reprogramming) and defer the headline cure aging framing. The cascade if any of the sub-problems work is enormous. The crowding is moderate but rising.

Mental-health interventions

A field where progress has been stubbornly slow and where the framework is at its most uncertain. Some specific bets — closed-loop neurostimulation, precision psychiatry built on biomarker stratification, AI-assisted cognitive therapy at scale — are attackable now. The wholesale "cure depression" framing remains a slow problem.

Synthetic biology platforms

The platforms are real; the customer base is forming. Direct value is moderate; cascade value is large. Crowding is rising. Framework reading: pick a vertical (food, materials, agriculture, therapeutics) where you have a defensible strain library or process advantage; the generic synbio company thesis is weakening.

Brain-computer interfaces

A just early enough bet for a small set of medical indications (paralysis, severe communication impairment) and a moonshot for general consumer applications. Framework reading: attack the medical use cases with conviction; treat the consumer applications as portfolio-shaped moonshots.

Pandemic preparedness

Closing-window dynamics in reverse — the further we get from COVID, the harder it is to fund. Direct value: large in expectation, small in any given year. Cascade value: substantial across diagnostics, vaccines, antivirals, surveillance. The framework rating is high; the political tailwind is unfortunately weakening. A clear case for patient public funding.

What the framework de-prioritises in biotech

A few things the framework would currently rate as lower-leverage than the consensus does.

Yet-another-pharma-company built on a slightly novel target without a cost-curve advantage. The infrastructure to evaluate any single target is now broadly available, and the marginal team's edge in target-picking is smaller than ten years ago.

Marginal improvements to liquid biopsy specificity in heavily-funded cancer indications. Crowded; the cost-trajectory is steep but the rest of the field is moving with you.

Most "AI for X in healthcare" plays where X is a well-funded application with established players. The marginal model improvement is small relative to the institutional friction.

Generic CRISPR-based therapies for indications where multiple teams already have IND-stage programmes. Pricing has caught up; the marginal team's contribution is small.

What the possibilities list says about biotech and health

The biotech-relevant entries in 50 possibilities are unusually dense: pathogen surveillance (1), pan-coronavirus and pan-influenza vaccines (2), mucosal sterilising-immunity vaccines (3), far-UV-C deployment (4), nucleic-acid synthesis biosecurity (5), pan-cancer early detection (19), engineered phage therapies for AMR (20), senolytic screening at scale (21), standardised ageing biomarkers (22), whole-organ vitrification (23), AI rare-disease diagnostics (24), open medical-imaging foundation models (25), formal-verification toolchains adapted for medical software (27), open EHR and clinical-data interoperability (38), open foundation models for protein function and dynamics (45), drug repurposing for orphan and tropical diseases (46), engineered kill-switches and biocontainment (49). Read with the curves and slower constraints in this guide, the cluster maps cleanly onto the under-attacked frontier.

How to use this guide

If you are a founder or investor, the most useful exercise is to take three current bets and one bet you passed on, and run each through the five curves and the slower constraints. The verdict is rarely a single number; it is usually a sentence about which curves the bet is riding and which constraints are binding.

If you are a scientist or PI, the more useful exercise is the Hamming-question version. What are the most important biological problems that are now tractable in a way they were not five years ago? Most senior researchers can answer this. Most graduate students cannot, and most current PhD programmes do not teach the question explicitly. That is a problem in itself.

If you are a foundation or public funder, the framework would direct unusually high portfolio shares to long-running cohort studies, pandemic preparedness, the manufacturing-cost problem in cell and gene therapy, and the open canonical datasets in legally-complex domains. None of these are fashionable; all are high-leverage.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

AI & machine learning

This guide applies the Problem Timing framework to artificial intelligence and machine learning. It is meant for anyone choosing what to build, fund or work on inside a field whose own cost-trajectory is the steepest in any technology domain in living memory.

The AI version of the framework is unusual because the field's cost curves are changing the cost curves of every other field. A bet on AI is partly a bet on AI itself and partly a bet on the second-order effects across biology, materials, robotics, software, education and government. This field guide focuses on the first; the second-order effects are addressed in the other field guides.

The most uncomfortable feature of the field is that it is the one most likely to embarrass the framework itself within five years. The reasoning here will read either prescient or naïve depending on which capability arrives when. I have tried to make the calls specific enough to be wrong about visibly.

The five curves that matter

Frontier-model capability. The composite of what the best generally-available model can do across a battery of evaluations. The headline curve. Capability has been rising on what looks like a Wright's-law-ish trajectory in cumulative compute and data, with periodic discontinuities when architectural and post-training innovations land. A numeric sketch of the Wright's-law shape follows this list of curves.

Inference cost per useful token. The cost of running a model with a fixed capability profile. Falling fast — at the time of writing, capability roughly equivalent to GPT-4 of 2023 is available at one to two orders of magnitude lower price. Continues to fall.

Training cost for a frontier model. Rising in absolute terms (more compute, more data, more post-training work) but falling for any fixed capability point. Both curves matter. The absolute curve dictates who can afford to play; the fixed-capability curve dictates how fast capabilities flow downstream.

Useful context length and tool-use reliability. Distinct from raw capability. The practical usefulness of a model depends on how much context it can hold, how well it uses tools, and how reliably it executes multi-step plans. This curve has its own dynamics and is the binding constraint on most agentic applications today.

Synthetic data quality. The ability of strong models to generate training data for the next generation. A self-reinforcing loop with diminishing returns somewhere — but where the diminishing returns kick in is genuinely unknown. The single most consequential variable for forecasting the speed of future capability growth.

These five curves drive almost every interesting allocation decision in AI today. A project's why now should usually identify which curves it is riding and which it is exposed to.
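
For the Wright's-law shape mentioned under the first curve, a minimal numeric sketch: unit cost falls by a fixed fraction with each doubling of cumulative volume. The 30 per cent learning rate below is illustrative, not a measured value for any model family.

```python
# Wright's law: each doubling of cumulative volume cuts unit cost by a
# fixed fraction (the learning rate). Parameters here are illustrative.
import math

def wrights_law_cost(c0: float, n0: float, n: float, learning_rate: float) -> float:
    """Unit cost at cumulative volume n, given cost c0 at volume n0."""
    exponent = math.log2(1.0 - learning_rate)  # negative for rates in (0, 1)
    return c0 * (n / n0) ** exponent

# Five doublings at a 30% learning rate: 100 -> 70 -> 49 -> 34.3 -> 24.0 -> 16.8
for d in range(6):
    print(d, round(wrights_law_cost(100.0, 1.0, 2 ** d, 0.30), 1))
```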

The slower constraints

Trust and verification at deployment. The cost of confirming that a deployed system is doing what it should is largely flat. Hallucination, brittleness in distribution shift, alignment with operator goals, robustness to adversarial input. The cost trajectory of the generation side is collapsing while the cost trajectory of the verification side is essentially unchanged. The asymmetry is the most important institutional fact in the field today.

Distribution and trust as a consumer-facing product. Capability does not equal adoption. Apple and Google have distribution that pure-play AI companies do not. OpenAI has reputation that smaller labs do not. The distribution constraint moves slowly relative to the capability constraint and shapes the commercial landscape disproportionately.

Regulation and political acceptance. The EU AI Act, the various national framework regulations, the executive orders and the looming sectoral regulations in healthcare, finance and education. The pace at which the regulatory environment moves is an order of magnitude slower than the capability environment and creates the most predictable arbitrage opportunity in AI today: regulated industries will adopt later but, when they do, will reward the players who understood compliance early.

Energy and physical infrastructure. Data-centre power, water cooling, transmission capacity. The substrate on which the capability curve runs. Until recently treated as a free input by most planners; now visibly the binding constraint on the largest training runs.

Specific framework readings

Frontier-model labs

Direct value: substantial; the labs producing the best general-purpose models capture a disproportionate share of the field's value. Cascade value: enormous and growing. Crowding: extreme — perhaps a dozen credible frontier labs globally with the capital to compete. Framework reading: the marginal team's contribution outside the top three to five labs is unclear; the bet on lab number eight is materially weaker than the bet on lab number two.

Specialist AI companies built on frontier APIs

Companies whose entire value derives from being a thin layer on top of OpenAI, Anthropic, Google or Meta APIs. Framework reading: covered extensively in Current bets and 50 current likely-dumb. The general verdict is probably wait or probably skip; the exceptions are companies whose data, workflow integration or distribution provide genuine moats.

Vertical agents in regulated domains

Legal, medical, financial, accounting, scientific. The verticals where the regulatory friction is real, the domain knowledge is deep and the data scarce. Framework reading: a current high-leverage bet. Cascade value real, regulatory tailwinds favourable for builders willing to do the institutional work, distribution constraint relatively permeable.

Robotics and embodied AI

The category that absorbs most of the second-order effects of AI on the physical world. Framework reading: the foundation-model approach is now showing real progress in vision-language-action models, sim-to-real transfer is improving fast, and the cost-trajectory of robotic platforms is falling. Genuinely just early enough — the kind of bet whose timing is right but execution risk remains substantial. Crowding is rising.

Synthetic data, evaluation infrastructure and verifiers

The picks-and-shovels category. Framework reading: cheap, robust verifiers are dramatically undersupplied relative to the value they would unlock, and the foundation-model owners have an interest in better evals being widely available. The category is small now and the framework's verdict is to attack now; the consolidation will follow once the major labs decide which verifiers they will adopt.

AI alignment and safety research

A field with serious internal disagreement about which sub-problems are tractable and which are not. Framework reading: empirical work that engages with current frontier models compounds; theoretical work that does not engage with deployed systems risks being elegant and irrelevant. The framework rates the empirical wing of the field highly and is sceptical of the theoretical wing's near-term cascade.

Open-weights ecosystems

Llama, Mistral, the Qwen family, the open-weights research community. Framework reading: a category whose direct value is real and whose cascade value (training, education, sovereign deployment, security research) is large. Crowding is moderate, and the institutional logic of open weights is fragile in the current geopolitical environment. Worth attacking now while the window is open; the framework would predict tighter restrictions within a few years.

AI for code

The fastest-moving applied category. Framework reading: foundation-model-native products dominate; standalone wrappers face severe pressure; the genuine differentiation is in integration with existing developer environments and in the model's ability to act, not just to suggest. Cursor-like products that own a primary developer surface are differently positioned from products that compete with Copilot inside someone else's IDE.

AI for healthcare

A separate field guide Biotech & health covers this in more depth. The framework reading at the AI/ML layer is that radiology and imaging are crowded and largely on the wrong side of the framework, while clinical-workflow and care-pathway AI sits on a genuinely permeable regulatory landscape and a real data moat for the operators who build it.

AI for science

Tools, data infrastructure and agentic workflows aimed at accelerating scientific research itself. Framework reading: an underrated category. The cascade value is substantial — a measurable acceleration of any major scientific discipline produces value that compounds across decades. The early players have small teams and small budgets relative to consumer AI; the framework would direct more capital here than is currently flowing.

AI for education

A category competing directly with the consumer products of the major labs. Framework reading: the headline category (general-purpose AI tutors) is being absorbed by foundation-model owners. The defensible niches are domain-specific (language acquisition, professional certification, K-12 with strong distribution to schools), and the most under-priced opportunity may be on the teacher side rather than the student side — tools that radically increase teacher leverage rather than replacing the student-facing channel.

Agentic frameworks and orchestration tools

AutoGPT descendants, LangChain and the broader agentic-framework ecosystem. Framework reading: this is library code in transit to becoming standard library code. The value is being absorbed into the foundation-model providers' SDKs and the cloud platforms' native offerings. Standalone businesses in this category are mostly probably wait unless they own a specific runtime, evaluation or trust layer that the platforms cannot integrate cheaply.

Compute and infrastructure

Data centres, training clusters, inference networks, custom silicon. Framework reading: a category where capital intensity, regulatory complexity and energy access dominate. Public and private capital is flowing aggressively. The framework reading is positive for the compute layer generally and selectively positive on the custom-silicon layer; specific bets within custom silicon (Cerebras, Groq, the various startup ASIC companies) are individually contested.

Voice, multimodal interfaces and embodied agents

The interface layer. Framework reading: the foundation-model owners are aggressively integrating; the standalone businesses survive only if they own a hardware surface or a regulated workflow. Most current voice-assistant startups are probably wait or skip.

What the framework de-prioritises in AI

Yet-another foundation model trained from scratch by a non-frontier lab. Most thin wrappers without data moats. Most "AI for X" plays where X is a well-served vertical. Most generic agentic frameworks. Most "responsible AI" compliance products built around frozen 2023-era definitions. Most consumer voice assistants. The pattern is consistent: capability is being commoditised; differentiation lives in data, distribution, regulation and integration.

What the framework prioritises that the consensus does not

A few categories the framework rates more highly than current investment patterns suggest.

AI for science in the broad sense — better tools for working scientists in physics, chemistry, biology, materials, mathematics — is significantly underfunded relative to its cascade.

Verification, evaluation and safety infrastructure used by other AI builders is undersupplied as a public good and as a commercial offering.

Capturing tacit professional knowledge before retirement takes it out of the accessible training data is a closing-window project the consensus is not yet pricing.

Regulated-vertical agents in domains where the institutional friction is high are persistently undervalued by venture capital because the timelines are longer and the diligence is harder.

Robotic data collection at scale is the most underrated bet the framework currently identifies. The supply of internet-scale text has plateaued; the next bottleneck is grounded sensorimotor data, and the cost-trajectory of robotic data acquisition is steep enough that whoever brute-forces it now will compound advantage for a long time.

What the possibilities list says about AI and machine learning

The cluster in 50 possibilities closest to this field reads as the framework's call on where serious AI talent should sit. AI red-teaming and evaluation as public infrastructure (6), mechanistic interpretability of frontier models (7), held-out evaluation benchmarks (8), open medical-imaging foundation models on consented data (25), privacy-preserving ML at production scale (26), formal-verification toolchains for safety-critical software (27), reversible and energy-efficient compute substrates (44), open foundation models for protein function and dynamics (45), Lean and Mathlib expansion with AI-assisted proof (47), generalisation benchmarks for autonomous mobile robots (48). The list is heavy on defender-favoured and verification-economics work for a reason: that is the part of the field the framework most clearly says is under-attacked.

On AI scenarios

This is the field where single-scenario thinking is currently most dangerous. The dominant implicit forecast — continued capability scaling, declining cost per useful token, agents getting steadily more capable, governance lagging — is one plausible scenario among several. The honest scenario set for an AI allocator includes at least: modal (the implicit forecast), capability plateau (scaling laws break, gains come from algorithmic and post-training work, the differentiation moves to evaluation and deployment), fast take-off (one or more frontier labs achieve a recursive-improvement loop and the timeline compresses by years), governance shock (a major incident triggers binding international constraints), hardware bottleneck (energy and fabrication caps bite earlier than the modal forecast assumes). Bets that score well in only the modal scenario are exposed; bets that score well across most of the set are robust. The framework is a sharper tool against the AI question when run scenario-conditionally.
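To make the scenario-conditional point concrete, here is a minimal sketch of what it means to score a bet across the set rather than against the modal forecast alone. The bet names, scores and probabilities below are invented for illustration; the mechanics — expected value across the set, worst case across the set — are the whole idea.

```python
# Scenario-conditional scoring: a bet is "robust" if it clears a bar in
# (almost) every scenario, "directional" if it only pays off in one.
# All scores and probabilities are invented for illustration.

SCENARIOS = {  # scenario -> rough subjective probability (sums to 1)
    "modal": 0.45,
    "capability_plateau": 0.20,
    "fast_takeoff": 0.10,
    "governance_shock": 0.15,
    "hardware_bottleneck": 0.10,
}

BETS = {  # bet -> score (0-10) under each scenario
    "eval_and_verification_infra": {"modal": 7, "capability_plateau": 9,
                                    "fast_takeoff": 6, "governance_shock": 8,
                                    "hardware_bottleneck": 7},
    "frontier_lab_number_eight":   {"modal": 5, "capability_plateau": 2,
                                    "fast_takeoff": 3, "governance_shock": 2,
                                    "hardware_bottleneck": 3},
}

for name, scores in BETS.items():
    expected = sum(SCENARIOS[s] * v for s, v in scores.items())
    worst = min(scores.values())
    verdict = "robust" if worst >= 5 else "directional"
    print(f"{name}: expected={expected:.1f}, worst-case={worst}, {verdict}")
```

The exercise is cheap and the output is usually uncomfortable: bets that look strong under the implicit forecast often have a worst case near zero.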

How to use this guide

If you are a founder, the most useful exercise is to identify which of the five curves your business depends on, which of the slower constraints is binding for your customers, and where the gap between the two creates a defensible position. If your business depends on capability and the slower constraint is regulation, the regulatory move is your differentiator. If your business depends on inference cost falling, you are running a clock that someone else controls.

If you are an investor, the useful exercise is to take the current bets list and the current 50 list and disagree with specific entries in writing. The exercise sharpens the lens and the disagreements often reveal the strongest theses.

If you are a researcher, Hamming's question with the AI tractability landscape attached: what is the most important problem in your sub-field that has just become tractable, and why is your current programme not the team to attack it?

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Compute & robotics

This guide applies the Problem Timing framework to compute and robotics. It complements the AI/ML guide; that one is mostly about the cognitive layer, this one is mostly about the physical and electrical layer that the cognitive layer runs on. The two are increasingly inseparable, but the cost-trajectories and slow constraints are different enough to warrant separate treatments.

The compression. Compute is moving on multiple curves simultaneously; robotics is at a long-anticipated inflection point as foundation-model approaches start to work for vision and control. The framework rates the picks-and-shovels infrastructure here particularly highly because almost everything else the framework recommends depends on it.

The five curves that matter

Cost per FLOP. The headline compute curve. Has fallen by orders of magnitude over decades and continues to fall, with periodic discontinuities driven by architecture changes (custom AI silicon, photonic, eventual quantum). The relevant sub-curves are inference cost (falling fast) and training cost (rising in absolute terms but falling for any fixed capability point).

Robotic platform cost. The hardware cost of a usefully equipped robot — arms, mobile bases, manipulators, sensors. Has been falling steadily and is now well below the historical reference points used to dismiss the field as commercially unviable. Industrial-grade arms are an order of magnitude cheaper than they were a decade ago; humanoid platforms are entering the low tens of thousands of dollars.

Sensor cost and capability. Cameras, LiDAR, radar, force-torque sensors, IMUs, specialised modalities (event-based vision, hyperspectral, soft-tactile). Cost per useful pixel or measurement has collapsed; the bottleneck is shifting to interpretation rather than acquisition.

Sim-to-real fidelity. A specifically robotics curve. The gap between behaviour learned in simulation and behaviour deployed on real hardware. Has narrowed dramatically as foundation-model approaches and large-scale simulation become cheaper — the first time this curve has been steep in the field's history. One of the standard techniques driving it, domain randomisation, is sketched after this list.

Energy cost per useful operation. Watts per inference, per training step, per robot-hour of operation. The constraint is rising in importance as cognitive cost falls — the binding constraint for several large bets is now grid capacity, not chip availability.
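The domain-randomisation idea mentioned above: instead of tuning one simulator to match reality, train across many randomly perturbed simulated worlds so that the real world looks like just another sample. A minimal sketch, with the simulator and the policy mocked out as placeholders:

```python
import random

# Domain randomisation: sample fresh physics parameters every episode so
# the learned policy cannot overfit to any single simulated world.
# Parameter ranges and the rollout below are placeholders, not a real
# simulator or training loop.

def sample_sim_params():
    return {
        "friction":         random.uniform(0.4, 1.2),
        "mass_kg":          random.uniform(0.8, 1.5),
        "motor_gain":       random.uniform(0.9, 1.1),
        "sensor_noise_std": random.uniform(0.0, 0.05),
    }

def run_episode(params):
    # Stand-in for a physics rollout; returns a fake episode return that
    # degrades with noise and with distance from nominal friction.
    return 1.0 / (1.0 + params["sensor_noise_std"]) - abs(params["friction"] - 0.8)

returns = []
for episode in range(1000):
    params = sample_sim_params()   # a new "world" every episode
    returns.append(run_episode(params))

print(f"mean return across randomised worlds: {sum(returns) / len(returns):.3f}")
```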

The slower constraints

Real-world data. The supply of grounded, high-quality robotic data — what real machines see, touch, manipulate — is the single largest binding constraint on embodied AI in 2026. Internet-scale text plateaued; sensorimotor data has not yet been collected at internet scale and the cost-trajectory of collection, while improving, is slower than the cognitive curve.

Energy and grid infrastructure. Power for compute, particularly at scale. Energy availability is now binding for the largest training runs and increasingly for inference at population scale; the arithmetic behind this claim is sketched after this list. Permitting for new generation and transmission moves on a different clock than the underlying technology.

Manufacturing and supply chain. The actual ability to build useful robots at scale. Cost-trajectory of components has improved; coordination and quality at scale remain hard. The post-2020 supply-chain experience is still shaping decisions.

Regulation and liability. Particularly for systems operating in shared physical space — autonomous vehicles, drones, surgical robots, household robots. Regulatory consent for deployment moves substantially slower than capability.

Specialised talent. Robotics requires a combination of mechanical, electrical, control, software and ML expertise that is genuinely scarce. The talent-density dimension is binding in this field in a way that it is not in pure-software fields.
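The energy claim above is worth checking with arithmetic. A back-of-envelope sketch, using round illustrative numbers rather than any particular lab's figures:

```python
# Back-of-envelope facility power for a large training cluster.
# All inputs are round illustrative numbers, not any lab's actuals.

gpus = 100_000           # accelerators in the cluster
watts_per_gpu = 1_000    # chip plus memory, networking and host share
pue = 1.3                # datacentre overhead: cooling, conversion losses

it_load_mw = gpus * watts_per_gpu / 1e6
facility_mw = it_load_mw * pue
print(f"IT load: {it_load_mw:.0f} MW, facility draw: {facility_mw:.0f} MW")
# ~130 MW, continuously available — the scale of a small city's demand.
# Interconnection queues for new load at this scale run to years on most grids.
```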

Specific framework readings

Foundation-model accelerators and AI-specific silicon

NVIDIA's dominant position, AMD's growing share, Cerebras, Groq, Tenstorrent, the Chinese players, the Apple/Google/Amazon internal silicon programmes. Framework reading: a category whose unit economics are rapidly improving and whose strategic importance is high. The crowding is severe; the specific architectural choices (transformer-optimised, attention-aware, photonic, neuromorphic) carry different framework verdicts. Most generic AI-silicon startups face the same fate as previous waves of specialised chip companies — consolidation or absorption. The few with genuinely defensible architectural advantages or captive customers can compound.

Custom silicon for specific verticals

Inference accelerators for edge devices, automotive AI silicon, defence-specific chips, biomedical-imaging chips. Framework reading: vertical-silicon plays with embedded relationships and certification pathways are better-positioned than horizontal challengers. Attack now in defensible verticals; probably wait on horizontal generic AI accelerators.

Robotic data collection at scale

Discussed at length in the AI/ML guide and the current bets list. Framework reading: the highest-leverage robotics bet currently available. Companies and consortia building the data infrastructure (collection fleets, annotation pipelines, sim-real bridges) are positioned to compound for a decade or more. Attack now with strong conviction.

Humanoid robots

A category that has gone from speculative to commercially active in the last two years. Boston Dynamics' Atlas, Tesla's Optimus, Figure, 1X, Agility Robotics, several Chinese and Korean players. Framework reading: the platform thesis (a general-purpose humanoid replaces specialised industrial robots) is contested and probably overstated for the near term. The vertical thesis (humanoids in specific environments — warehouses, eldercare, hospitality, certain industrial applications) is more credible. Attack with caveats; pick the vertical and the data-acquisition strategy carefully.

Industrial automation beyond traditional robotics

Closed-loop manufacturing, autonomous mobile robots in warehouses, machine-vision inspection, robotic dispensing, additive manufacturing at scale. Framework reading: a category that has moved from research to deployment, with demand pulled by labour shortages and reshoring. Crowding is rising; the marginal team's edge is in domain-specific integration rather than generic capability. Attack now selectively for verticals with clear customer pull.

Autonomous vehicles

A long-running case. Framework reading: the technology has worked at scale in geo-fenced urban environments since 2023; the deployment economics, regulatory posture and trust environment are catching up unevenly. Waymo, Tesla, Cruise (limited), the Chinese players. Attack with caveats; the cost-trajectory of the technology is favourable, the deployment-environment cost-trajectory is not.

Drone delivery and aerial logistics

Categories ranging from medical delivery in low-infrastructure regions (where it works) to consumer last-mile in dense urban areas (where it is dubious). Framework reading: medical and remote-area applications are attack now for the right operators; urban consumer delivery is probably wait until regulatory and noise environments mature. Defence applications are framework-positive but covered in Defence.

Robotics for agriculture

Covered in Agriculture. The framework's reading there is favourable for specific tasks (weed control, harvesting, monitoring) where the data-acquisition cycle is long and the labour cost is rising.

Surgical and biomedical robotics

Da Vinci-style platforms, the new generation of cheaper minimally-invasive systems, surgical-AI augmentation, microrobotics for drug delivery. Framework reading: a category with real barriers to entry (FDA approval, surgeon training) but compelling long-term cascade. Attack with caveats; pick indications where the regulatory pathway is open.

Quantum computing

A category that has been "almost there" for two decades. Framework reading: most useful quantum-advantage demonstrations remain narrow and contested. The framework's open question verdict in the current bets list still holds. Specific bets on hardware (superconducting, ion-trap, photonic, topological) carry different verdicts; the algorithm and software-stack layers are crowded.

Edge inference and on-device AI

Foundation-model deployment on phones, vehicles, sensors and embedded systems. Framework reading: a real and growing category. The cost-trajectory of edge-capable hardware is favourable; the privacy and latency arguments are durable. Attack now selectively for specific application domains.

Energy and cooling for compute

Data centre cooling, novel cooling chemistries, geothermally cooled facilities, specialised power infrastructure for AI workloads. Framework reading: a category whose strategic importance is rising fast. The picks-and-shovels position here is undervalued relative to direct AI investment.

Photonic and neuromorphic computing

Light-based and brain-inspired alternatives to digital silicon. Framework reading: the framework reads photonic interconnect as a near-term bet (already deployed in some training systems) and photonic compute as still early. Neuromorphic remains an open question — the demonstrations have not yet produced compelling cost-or-energy advantages over conventional silicon for most workloads.

What the framework de-prioritises

Most generic AI-accelerator startups without architectural advantages or captive customers. Most consumer-robot products without a clear vertical use case. Most "humanoid robots will replace all labour" pitches at current capability levels. Most blockchain-related compute infrastructure plays. Most quantum-supremacy claims that do not produce useful work. Most marginal improvements to autonomous-vehicle stacks that do not address the deployment-environment problem.

What the framework prioritises that the consensus does not

Robotic data collection infrastructure remains the most underrated picks-and-shovels bet in the field. The supply of internet-scale text has plateaued; the next decade's foundation models will be trained on the data current robotic-data efforts are collecting. Whoever owns the collection pipeline owns the next moat.

Energy-and-cooling infrastructure for AI is undercapitalised relative to its strategic importance. The bottleneck on the largest 2027–2030 training runs will be electrical, not silicon.

Vertical robotics in unsexy industries — eldercare, warehouse logistics in non-Amazonised regions, food preparation, building maintenance — receive a fraction of the attention given to humanoid moonshots and pay back faster.

Sim-real bridges and grounded evaluation environments are infrastructure that the foundation-model labs themselves want and undersupply. Building them is a defensible business with a long compounding curve.

Standards and interoperability infrastructure for multi-robot systems is dramatically undersupplied. Whoever builds the equivalent of TCP/IP for embodied agents is sitting on a category-defining position.

How to use this guide

If you are a founder, the most useful exercise is to identify whether your bet rides the cognitive curve (cheap), the platform curve (cheap), or the data curve (expensive but compounding). If the answer is the cognitive curve alone, your moat is weak. If the answer is the data curve, the moat is real but takes years to compound.

If you are an investor, ask honestly what the binding constraint on the customer's deployment is. For most physical-AI businesses today, it is regulation, real-world data, or energy — not capability. Pricing the regulatory and infrastructure friction explicitly produces sharper portfolio decisions than capability-only analyses.

If you are a public funder, the framework directs you toward shared robotic-data infrastructure, energy-and-grid investment specifically for compute, the qualification and standards work that no commercial actor will fund alone, and the sim-real-bridge research that compounds across many downstream applications.

— Siri Southwind

Read the framework · AI and machine learning · Defence · Current bets · 50 possibilities

Climate & energy

This guide applies the Problem Timing framework to climate change and energy systems. The field is unusually well-suited to the framework because so many of its key technologies sit on Wright's-law-style cost-decline curves, and unusually badly served by it because the political economy of climate policy distorts the framework's standard inputs.

The honest framing is that climate is partly a curve-bending problem (technologies whose costs need to fall further to displace incumbent fuels) and partly a deployment-and-coordination problem (policies, grids, supply chains, behaviour change). The framework is more useful for the first than the second; the second has its own literature and its own discipline.

The five curves that matter

Solar photovoltaic levelised cost. Has fallen by well over an order of magnitude over the past two decades on a Wright's-law trajectory (the generic shape of such a curve is sketched after this list). Continues to fall, though the panel itself is now a small fraction of installed system cost — most of the remaining cost is balance-of-system, permitting, grid connection and labour.

Lithium-ion battery cell cost per kWh. Roughly an order of magnitude reduction over fifteen years and continuing. The dominant cost driver of electric-vehicle and grid-storage economics. Battery chemistry diversification (LFP, sodium-ion, solid-state) is now meaningfully changing the trajectory.

Wind energy levelised cost. Slower decline than solar but real, particularly offshore where turbine size and installation methods have driven significant gains. Mature relative to solar.

Electrolysis and clean hydrogen cost. A genuinely contested curve. Stack costs are falling; system integration costs are not falling as fast; the cost per kilogram of clean hydrogen at scale remains uncertain. The framework verdict on hydrogen as a vector depends critically on this curve.

Carbon-capture cost per ton. The most important of the second-tier curves. Direct-air capture is in the early-Wright's-law phase with steep promised declines but small cumulative production; point-source capture is more mature but constrained by application. The pace of this curve over the next decade will determine whether net-negative pathways are available at acceptable cost.

These five curves drive most of the technology side of climate work. Most of the rest of the field is about deploying or integrating the outputs of these curves into an existing system.
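The shape behind most of these curves is Wright's law: cost falls by a roughly constant fraction — the learning rate — with each doubling of cumulative production. A minimal sketch; the 20 per cent learning rate is illustrative, though solar's historical rate is in that neighbourhood:

```python
import math

# Wright's law: C(x) = C0 * x**(-b), where x is cumulative production
# relative to a reference point. The learning rate LR — the fractional
# cost decline per doubling — fixes the exponent: b = -log2(1 - LR).
# All numbers below are illustrative.

learning_rate = 0.20
b = -math.log2(1 - learning_rate)
print(f"exponent b = {b:.3f}")          # ~0.322 for a 20% learning rate

def cost_after(doublings, c0=100.0):
    # Equivalent forms: c0 * (1 - LR)**d == c0 * 2**(-b * d)
    return c0 * (1 - learning_rate) ** doublings

for d in range(0, 11, 2):
    print(f"after {d:2d} doublings of cumulative production: {cost_after(d):6.1f}")
# Ten doublings (1024x cumulative production) cut cost roughly 9x at 20%.
```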

The slower constraints

Grid infrastructure and interconnection. The cost of building and permitting transmission has barely fallen and in many jurisdictions has risen in real terms. The single largest deployment bottleneck for solar and wind in most developed countries.

Permitting, environmental review and siting. Process-bound and politically constrained. The deployment cost of new energy infrastructure is dominated by these in many regions; technology improvements pass through to deployment slowly.

Heavy industry inertia. Steel, cement, ammonia, plastics, aviation, shipping. Capital stock turns over on multi-decade cycles. Even with mature technology and favourable economics, deployment lags by ten to thirty years in most cases.

Behavioural and institutional change. Diet, urban form, mobility patterns, building practices. The slowest-moving constraints; the framework can identify them but not move them.

Regulatory and political volatility. Climate policy oscillates by jurisdiction and by electoral cycle. Investment decisions made under one regime are repriced under the next. The cost of this volatility is real and persistently underestimated.

Specific framework readings

Utility-scale solar deployment

Direct value: large. Cascade: moderate (drives further curve-bending and grid evolution). Crowding: high. Framework reading: the technology bet has been won; the deployment bet is dominated by transmission, permitting and finance. Investment in the deployment-friction problem (permitting reform, transmission acceleration, project finance innovation) is more leveraged than investment in panel-tech itself.

Offshore wind

Direct value: large in jurisdictions with the right resource. Crowding: moderate. Cost trajectory: still declining but less steeply than solar. Framework reading: a deployment-finance and supply-chain problem more than a technology problem. The recent setbacks in US offshore wind illustrate the pattern: the technology works, the project economics are sensitive to interest rates and supply-chain shocks.

Battery storage at grid scale

Direct value: large and growing. Cascade: enormous — grid-scale storage is the substrate for high-penetration renewables. Crowding: rising fast. Cost trajectory: still favourable. Framework reading: a current high-leverage bet on the deployment side; the technology layer is already commoditised and dominated by Chinese cell manufacturers.

Battery chemistry beyond lithium-ion

Sodium-ion, solid-state, flow batteries, novel chemistries for stationary storage. Framework reading: a category with real upside and considerable execution risk. The framework would attack the grid-stationary segment (where energy density matters less than safety, cost and cycle life) more aggressively than the EV-applied segment (where Li-ion has compounded advantages). Sodium-ion in particular looks favourably timed.

Direct-air capture

Cost per ton currently in the high hundreds of dollars; needs to reach the low tens to be a serious component of net-negative pathways. Framework reading: a curve-bending project at the edge of the framework's confidence. The cascade if the curve breaks is large; the cost of brute-forcing demonstration projects is justified. The category is correctly funded today; whether it will be in five years depends on early-deployment performance.
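The gap can be restated in Wright's-law terms. Taking the range above literally — roughly $600 per ton today to roughly $30 — and assuming learning rates between 15 and 25 per cent (assumptions, not forecasts), the required growth in cumulative capture is:

```python
import math

# Doublings of cumulative DAC capacity required for a ~20x cost cut
# under Wright's law. Start and target costs follow the text above;
# the learning rates are assumptions for illustration.

start, target = 600.0, 30.0   # dollars per ton captured
for lr in (0.15, 0.20, 0.25):
    doublings = math.log(start / target) / -math.log(1 - lr)
    growth = 2 ** doublings
    print(f"learning rate {lr:.0%}: {doublings:4.1f} doublings "
          f"(~{growth:,.0f}x cumulative capacity)")
```

The point is not the specific numbers but the order of magnitude: a 20x cost reduction demands a thousands-to-hundreds-of-thousands-fold scale-up of cumulative deployment, which is why early-deployment performance matters so much.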

Geothermal — deep, enhanced and superhot rock

Recently re-energised by techniques borrowed from the shale-gas industry. Framework reading: a field whose cost-trajectory has been flat for decades may be about to inflect. The crowding is low, the cascade is large (firm, dispatchable, low-carbon power), and the timing argument is strong. A current attack now in the framework's reading.

Nuclear fission, both conventional and small modular reactors

A long-running case the framework has trouble with. Direct value: substantial. Cascade: large where deployed. Cost trajectory: highly path-dependent; can be favourable in some regulatory regimes (Korea, France historically) and unfavourable in others (US, UK recently). Framework reading: not a uniform call; depends heavily on jurisdiction. SMR specifically is attack with caveats — a real chance of bending the cost curve, with execution risk amplified by regulatory uncertainty.

Fusion

Discussed in Current bets. Framework reading: a small portfolio of fusion bets is justified by the cascade-if-successful; the demonstration value of net-energy-positive operation, if it arrives, would be enormous. A large concentrated fusion bet is harder to defend.

Hydrogen as an industrial feedstock

Steel, ammonia, refining. Framework reading: the genuine high-leverage applications of clean hydrogen. The cost-trajectory of electrolysis matters here more than in any other application; the policy support (production tax credits, contracts-for-difference) is substantial in several jurisdictions. Attack with caveats in the credit-supportive jurisdictions; probably wait elsewhere.

Hydrogen for personal transportation

Discussed in Current bets. Framework reading: this category lost the platform war to battery-electric vehicles and the curve does not favour a recovery. The infrastructure investment is largely framework-misallocated.

Sustainable aviation fuel

A category receiving substantial public and private support. Framework reading: the cost trajectory is unfavourable for first-generation fuels (HEFA-based) and depends entirely on whether next-generation pathways (e-fuels, MSW-derived) bend the curve. Worth a portfolio bet; not yet a confident sector call.

Long-distance shipping decarbonisation

Methanol, ammonia, biofuels, wind-assist. Framework reading: a slow but durable category. The fleet turnover is multi-decade; the policy direction is increasingly clear (IMO net-zero by 2050); the technology choice is contested. The framework would direct capital toward whichever fuel pathway can show the steepest cost-decline; ammonia is currently favoured but contested.

Heat pumps for residential and commercial buildings

A mature technology with persistent deployment friction. Framework reading: attack the deployment friction, not the technology. Installer training, supply-chain build-out, retrofit financing, electrification mandates. The technology curve is essentially flat; the deployment curve has substantial room to move.

Geoengineering — solar radiation management

A genuinely controversial category. Framework reading: the dual-use and irreversibility concerns are severe; the framework's basic dimensions are inadequate without explicit weighting on those. A small portfolio of research on SRM (so we know what would happen if anyone tried it) is defensible; deployment is in a category the framework cannot evaluate without an external moral framework.

Methane reduction

The fastest-acting climate lever for the next two decades. Framework reading: a chronically under-invested category given its leverage. Detection (satellites, sensors), abatement at fossil-fuel facilities, agricultural reduction (cattle feed additives, manure management). The cost-trajectory of detection is collapsing fast and the regulatory direction is clearer than for most climate categories. Attack now in the framework's strongest reading.

Critical minerals and supply-chain build-out

Lithium, cobalt, nickel, rare earths, copper. Framework reading: a category dominated by capital-intensive long-cycle projects with substantial geopolitical complexity. The framework reading varies by mineral and by jurisdiction. Lithium specifically is having its boom-bust cycle; the patient capital that survives the cycle will compound.

What the framework de-prioritises in climate

Most consumer carbon-offset schemes with weak verification. Most retail-finance "green portfolio" products that do not finance new capacity. Most behavioural-change campaigns at the individual scale (the leverage is on industry, not on consumers). Most marginal improvements to mature renewable technologies once costs have hit deployment-bound floors. Most "smart grid" enterprise software bought by utilities rather than built into the grid.

What the framework prioritises that the consensus does not

Methane reduction is consistently underrated relative to its near-term climate leverage.

Permitting and transmission reform is the highest-leverage policy investment available in many jurisdictions and gets a tiny fraction of the climate-philanthropy budget.

Geothermal is at a curve-inflection moment and undercrowded.

Industrial decarbonisation is undercapitalised relative to consumer-facing alternatives because it is unphotogenic; the cascade value is enormous.

Climate adaptation — the unfashionable cousin of mitigation — is approaching its own moment of relevance as the realised effects of warming intensify; the framework would be more aggressive on adaptation infrastructure than current allocations are.

What the possibilities list says about climate and energy

The climate-relevant entries in 50 possibilities hit the curves and slower constraints in this guide head-on: closed-loop and advanced geothermal drilling (11), long-duration energy storage (12), heat-pump electrification of industrial heat above 200°C (13), grid software and interconnection-queue automation (14), high-temperature superconducting magnets at production scale (15), battery chemistries beyond lithium (31), CO2-to-fuels and CO2-to-cement chemistry (32), ocean carbon-removal MRV (33), high-resolution glacier and ice-core archives (36), kelp and seaweed aquaculture (50). Several of these are deployment-curve bets where friction reduction beats further technology development; a few are genuine demonstration plays where first-of-a-kind capital is the binding constraint.

A note on IPCC scenarios

Climate is the field where the Wack/Shell scenario tradition has had the most public impact — successive IPCC emissions scenarios (SRES, RCPs, the SSPs) descend directly from the discipline, with Shell alumni including Ged Davis instrumental in the early IPCC scenario process. The point worth taking from this for any climate allocator: the scenarios are not predictions, and the correct response is not to bet on the modal scenario but to identify the bets that pay off across the envelope. A first-of-a-kind clean-cement plant is robust across most of the SSPs; a sovereign hydrogen-economy programme is directional on a specific scenario about industrial demand. The framework's portfolio recommendation — over-weight robust bets, take directional bets only with antifragile by-products — applies to climate-and-energy more cleanly than to almost any other field.

How to use this guide

If you are a climate investor, identify which of the five curves you are riding, which of the slower constraints is binding for your customers, and where the gap creates a defensible position. Most underperformance in climate venture comes from underpricing the slower constraints, not from misjudging the technology curves.

If you are a policymaker, the framework directs you to permitting reform, transmission acceleration, methane regulation and adaptation infrastructure ahead of further deployment subsidies for already-cheap technologies.

If you are a founder, the most useful exercise is to ask honestly whether your business depends on a technology curve that has already flattened (the marginal value of further reductions is small) or on a deployment curve that is still moving fast (the marginal value of friction reduction remains large).

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Materials

This guide applies the Problem Timing framework to materials science and chemistry. The field is interesting because it has been on the receiving end of accelerating technology rather than a driver of it — its core experimental practices have been substantially the same for fifty years — and is now poised for a transition that the framework rates very highly.

The argument compresses to a sentence. Materials and chemistry have lived for a century with a mismatch between the speed of computation (fast) and the speed of experimental validation (slow). Several technologies are now closing that gap simultaneously. The institutions that move fastest to exploit the closing gap will compound advantage over the next decade.

The five curves that matter

Density-functional-theory and ab-initio simulation. The cost of usefully accurate quantum-mechanical calculations on real materials has fallen by orders of magnitude over two decades. The bottleneck has shifted from compute to chemistry-aware setup; the next inflection looks likely to be from machine-learned potentials trained on simulation outputs.

Machine-learned interatomic potentials. A genuinely new curve. Models like NequIP, MACE and the broader graph-neural-network potential family are now producing molecular-dynamics-quality simulations at orders of magnitude lower cost than ab-initio methods, with steadily improving accuracy. The curve is steep and the cascade is just starting to fire; a usage sketch follows this list.

Robotic and high-throughput synthesis. The cost of automating the experimental design-test-learn loop has fallen dramatically. Self-driving laboratories — closed-loop systems that propose, run and analyse experiments without human intervention — are now deployed in research settings and approaching production scale.

Characterisation throughput. X-ray diffraction, electron microscopy, mass spectrometry and increasingly automated optical characterisation. The throughput per dollar has risen substantially over the past decade and continues to do so. The bottleneck is shifting from data acquisition to data interpretation.

Open materials databases. The Materials Project, the OQMD (Open Quantum Materials Database), AFLOW, NOMAD, the more recent Materials Genome Initiative outputs. The substrate that the elegant phase will run on. A patient brute-force project that has been quietly producing one of the most valuable scientific datasets of the past two decades.

These five curves are converging. A materials project's why now should usually identify which combination of them it is exploiting and what the project produces that the consensus is not yet pricing.
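What the machine-learned-potential curve means at the bench can be shown in a few lines. This sketch assumes the ase and mace-torch packages and the mace_mp convenience loader for the pretrained MACE-MP foundation model; the loader's name and arguments may differ across versions, so treat it as a shape, not a recipe:

```python
# Energies and forces at near-DFT accuracy from a pretrained potential,
# in seconds on a laptop. Assumes the ase and mace-torch packages; the
# mace_mp loader and its arguments may differ by version — check the docs.

from ase.build import bulk
from mace.calculators import mace_mp   # pretrained MACE-MP foundation model

atoms = bulk("Cu", "fcc", a=3.6)       # a copper crystal, one-atom unit cell
atoms.calc = mace_mp(model="small", device="cpu")

energy = atoms.get_potential_energy()  # eV; a DFT run for the same quantity
forces = atoms.get_forces()            # would cost minutes to hours
print(f"E = {energy:.3f} eV, max |F| = {abs(forces).max():.3e} eV/Å")
```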

The slower constraints

Synthesis-property gap. The hardest constraint in the field. A material can be designed in silico with desirable predicted properties and turn out to be unsynthesisable, or to behave very differently in practice. Closing this gap is essentially the field's central problem. The cost-trajectory of prediction is collapsing; the cost-trajectory of synthesis at desired properties is moving more slowly.

Manufacturing scale-up. A novel material that works at gram scale may not work at ton scale. The cost of scale-up is essentially flat, and historically a substantial fraction of promising materials die at this stage.

Regulatory approval and qualification. Particularly for aerospace, automotive, biomedical and food applications. Materials qualification cycles are measured in years to decades. New materials face systematic resistance from procurement systems calibrated to existing suppliers.

Supply chain and feedstock cost. Some material designs require feedstocks that are themselves expensive or geographically constrained. The cost of an exotic material is bounded below by the cost of its inputs in a way that does not obviously fall over time.

Specific framework readings

Battery materials beyond lithium

Solid-state electrolytes, sodium-ion cathodes, silicon-rich anodes, alternative chemistries for stationary storage. Framework reading: a current high-leverage area. The five curves above are operating in tandem with substantial demand-pull from the EV and grid-storage markets. Crowding is rising but cascade value remains large. Attack now with caveats on which specific chemistries.

Catalysis for industrial decarbonisation

Better catalysts for ammonia synthesis, methanol production, CO2 reduction, water splitting. Framework reading: the highest-cascade-value sub-field of materials right now. The combined effect of better catalysts on the cost-trajectory of decarbonisation is enormous; the field is undercrowded relative to its leverage; the closing-window argument (we need the decarbonisation now, not in twenty years) is real.

Structural materials — alloys and composites

Aerospace and automotive applications driving demand for stronger, lighter, more thermally stable materials. Framework reading: a field where the qualification timeline (often a decade) substantially dampens the framework's cost-trajectory logic. The curve-bending is real; the deployment is slow. Better positioned for patient capital than for venture timelines.

Functional ceramics, particularly for energy and sensing

Solid oxide fuel cells, piezoelectrics, advanced sensors, thermoelectrics. Framework reading: a category with a long history of slow progress and a recent inflection driven by the design-cost curve. Attack with caveats; the category-level bet is favourable but specific applications vary.

Metal-organic frameworks (MOFs) and porous materials

Direct-air capture, gas separation, hydrogen storage. Framework reading: an experimentally rich field with extensive computational design support. The cascade depends critically on whether MOF-based DAC reaches the cost targets needed to be competitive; the framework verdict is conditional on that downstream curve.

2D materials beyond graphene

Transition-metal dichalcogenides, hexagonal boron nitride, the broader 2D family. Framework reading: a field with substantial scientific accomplishment but limited commercial traction. The cascade has been slower than predicted; the framework would now rate further investment more cautiously than it did a decade ago.

Semiconductors — beyond silicon

Wide-bandgap semiconductors (SiC, GaN), photonic materials, neuromorphic substrates. Framework reading: the wide-bandgap category has been vindicated by EV power-electronics demand and is moving from research to deployment. Crowding is rising. The photonic-and-neuromorphic categories are earlier and more contested.

Biomaterials and biocompatible interfaces

Materials for medical implants, brain-computer interfaces, tissue engineering. Framework reading: an interdisciplinary field with relatively low crowding and large cascade value into healthcare. The qualification timeline is long but the regulatory pathway is at least established. Attack now selectively.

Self-driving laboratories as a category

Closed-loop autonomous experimental platforms. Framework reading: the highest-leverage infrastructure bet in the field today. The first labs to operationalise the design-test-learn loop at speed will produce data, methods and patents that compound. Funded at a small fraction of the level their leverage justifies.

Materials for additive manufacturing

3D-printable metals, polymers, composites, and the broader formulation problem. Framework reading: a category that has matured faster than expected and is now deployment-bound rather than technology-bound for most common cases. Specific niches (high-temperature alloys, biomedical, multi-material printing) remain interesting.

Critical-minerals substitution and recycling

Reducing reliance on cobalt, lithium, rare earths through substitution or improved recovery. Framework reading: a category that the framework rates highly because the strategic-and-economic motivation aligns with the technical opportunity. The cascade into electric mobility, electronics, magnets is large.

Computational materials design as a service

Companies offering ML-accelerated materials discovery to industrial customers. Framework reading: a category whose value proposition is real and whose competitive landscape is rapidly tilting. The foundation-model approach to chemistry (e.g., recent generative models for materials) may compress the standalone-discovery business in the way foundation-model APIs compress generic NLP companies. Attack with caveats; pick verticals where the data moat is real.

Polymers and circular-economy applications

Recyclable polymers, depolymerisation chemistries, bio-derived plastics. Framework reading: a category with strong regulatory tailwinds (EU plastics directive, US PFAS regulation) and meaningful technology progress. The cascade depends critically on regulatory enforcement; investment patterns track regulatory expectations imperfectly.

What the framework de-prioritises in materials

Most incremental improvements to mature commodity materials whose cost curves are already flat. Most "magic material" announcements without scaled synthesis pathways. Most academic publications optimising properties on materials whose synthesis is decades from feasibility. Most marginal polymer chemistries competing in saturated markets. Most "smart material" demonstrations without a clear application pull.

What the framework prioritises that the consensus does not

Catalysis for decarbonisation is undercrowded relative to its leverage and the public-funding system has been slow to recognise it.

Self-driving laboratories as infrastructure are dramatically underfunded relative to the productivity multiplier they offer to the rest of the field.

Open materials databases and characterisation infrastructure — the patient cataloguing tradition — receive a fraction of the public funding their cascade value justifies. The Materials Project's productivity-per-dollar is comparable to the Protein Data Bank's, and the latter is now widely understood as one of the highest-leverage scientific datasets of the past century.

Materials for low- and middle-income countries — affordable solar materials, cheaper water-purification membranes, simple agricultural improvements — are persistently underfunded by frameworks calibrated to wealthy-market demand.

How to use this guide

If you are a materials founder, identify the curve you are riding, the slower constraint that binds your customers, and the synthesis-to-property gap your technology either closes or relies on someone else closing.

If you are a public funder, the framework directs you toward catalysis-for-decarbonisation, self-driving-lab infrastructure, expanded open databases and the unfashionable cataloguing work that the field's elegant future will rest on.

If you are a researcher, ask whether your work is producing data the field will still want in twenty years (the patient cataloguing case) or demonstrations that close a question the field is currently arguing about (the demonstration case). Most of the highest-cited materials work of the past century has been one or the other; relatively little has been incremental optimisation.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Longevity

This guide applies the Problem Timing framework to aging biology and human longevity. It is the most contested of the field guides because the field itself is contested. Reasonable, intelligent people disagree about whether human lifespan is genuinely extensible, by how much, and on what timeline. The framework cannot settle that disagreement; it can sharpen the allocation question conditional on a range of beliefs.

The compression. Aging research is in the just early enough phase of multiple sub-fields simultaneously. Several specific cellular and molecular pathways are now tractable in a way they were not five years ago. The field's core problem — translating mouse results into human outcomes — remains slow and mostly flat-curve. The interesting allocation decisions are about which sub-fields are on a steep curve and which are on the flat one.

The five curves that matter

Single-cell and spatial omics in human tissues. The cost per cell of single-cell RNA, ATAC, proteomics and spatial methods is collapsing fast. The dataset that this curve produces is the substrate for understanding which cellular changes drive aging in real tissues. The bottleneck is shifting from data acquisition to biological interpretation.

Biological-age biomarker assays. Methylation clocks, proteomic clocks, transcriptomic clocks, multi-omic indices. The cost-per-sample is collapsing; the validation work is improving steadily; the regulatory and clinical infrastructure is starting to form. The single most important measurement infrastructure for the field.

Animal-model throughput. Cost per intervention-effect-on-lifespan study in mice has fallen modestly but the throughput per dollar — the ability to test many interventions in parallel — has risen substantially. The Interventions Testing Program at the NIH is the canonical example. C. elegans and Drosophila throughput has increased even faster.

Cellular reprogramming and partial reprogramming techniques. A genuinely new curve, opened by the Yamanaka factors and now being explored in partial-reprogramming protocols. The cost per experiment is falling; the range of interventions is expanding; the safety and specificity questions remain hard.

Senescence biology assays and senolytics. The ability to identify, isolate and selectively kill senescent cells in real tissues. A curve that has moved from research to early clinical stages over the past decade.

These five curves drive most of the interesting allocation decisions in current longevity work. A project's why now should usually identify which combination of them it is exploiting and what specific cellular or molecular target it is committing to.

The slower constraints

Translating mouse to human. The hardest constraint in the field. Most candidate interventions that extend lifespan in mice fail in humans, for reasons that are not always obvious. The cost-trajectory of human-equivalent validation is essentially flat and will remain so. This single constraint dampens the framework's verdicts on the entire field.

Clinical trial design for aging itself. Aging is not a recognised clinical indication at the FDA or most regulators. Trials must use specific disease endpoints, which substantially constrains how interventions can be tested. The TAME trial framework (Targeting Aging with Metformin) has been one attempt to change this; the regulatory move has been slower than the science.

Cohort longitudinality. Useful aging studies need long observation. The cost of recruiting and retaining a cohort is largely flat, and the value of an aging cohort compounds with time in a way you cannot retroactively create.

Public sentiment and political resistance. Aging research has a more complicated public-sentiment problem than most biomedical fields. Religious, ethical and political opposition is real and shapes the funding environment, particularly for the more ambitious therapeutic approaches.

The supplement-industry overlap. A persistent problem for the field's credibility. Real, careful science is constantly distorted in public perception by the supplement-marketing layer that surrounds it. Public-funding decisions are made in this perception environment.

Specific framework readings

Methylation clocks and biological-age biomarkers

The infrastructure layer of the field. Framework reading: the highest-leverage current bet in longevity. Without robust biological-age measurement, every intervention bet is forced to use disease endpoints, which makes nothing testable on a useful timescale. The first labs to produce regulator-acceptable, clinically-deployable biomarker panels will reshape the field. Attack now.
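For readers who want to see how thin the methodological layer actually is: the published clocks are, at their core, penalised linear regressions of age on CpG methylation fractions. A minimal sketch on synthetic data — real clocks are trained on thousands of samples and hundreds of thousands of CpG sites, and the hard work is in the validation, not the fit:

```python
# A Horvath-style clock in miniature: elastic-net regression of age on
# CpG methylation fractions. All data below is synthetic and illustrative.

import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_sites = 500, 2000
age = rng.uniform(20, 90, n_samples)

# Simulate methylation betas: a small subset of sites drifts with age.
X = rng.uniform(0, 1, (n_samples, n_sites))
drifting = rng.choice(n_sites, 50, replace=False)
X[:, drifting] += 0.004 * age[:, None]   # weak age signal plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, age, random_state=0)
clock = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X_tr, y_tr)

pred = clock.predict(X_te)
mae = np.abs(pred - y_te).mean()
print(f"sites selected: {(clock.coef_ != 0).sum()}, test MAE: {mae:.1f} years")
```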

Cellular senescence and senolytics

Compounds that selectively eliminate senescent cells. Framework reading: a category that has moved from research to early-stage clinical trials. Crowding is rising; specific targets and compounds (BCL-xL, FOXO4-DRI, navitoclax-like compounds) have varying levels of validation. The framework reading is attack with caveats — the category is correct but specific bets vary.

Cellular reprogramming and partial reprogramming

Yamanaka-factor-based interventions that aim to rejuvenate cells without inducing pluripotency. Framework reading: genuinely novel territory with real upside and substantial safety questions. The cascade if specific protocols work is enormous. The framework rates the category highly and is properly cautious on specific therapeutic claims.

Mitochondrial dysfunction interventions

NAD precursors (NMN, NR), mitophagy enhancers, mitochondrial transplantation. Framework reading: a category that has been over-promised on the consumer-supplement side and under-validated on the therapeutic side. The science is real; the specific therapeutic translation has been slower than the field hoped. Attack with caveats; differentiate between research-level work and the supplement market.

Proteostasis and autophagy enhancement

Interventions targeting the cell's protein-quality-control systems. Rapamycin and rapalogs are the canonical case. Framework reading: a category with substantial mechanistic support, real human safety data and a regulatory pathway that has been opened by mTOR inhibitor approvals in other indications. Among the more attractive bets in the field.

Stem-cell-niche restoration

Interventions that aim to restore the function of tissue-resident stem cells. Framework reading: the cascade if successful is large but the field has had multiple disappointments. The framework rates the category cautiously and the specific bets individually.

Inflammation and "inflammaging"

Chronic low-grade inflammation as a driver of age-related disease. Framework reading: a category with strong epidemiological support and weaker therapeutic clarity. Specific anti-inflammatory interventions (IL-6 blockade, NLRP3 inhibitors) have produced mixed results. The framework reading is attack with caveats; the underlying biology is real, the specific therapeutic claims need careful evaluation.

Glucose metabolism and metformin-class interventions

Metformin, GLP-1 agonists, the broader metabolic-health axis. Framework reading: GLP-1 agonists have already produced one of the most consequential pharmacological successes of recent years, with substantial effects beyond their initial diabetes and obesity indications. The framework reading is positive on the metabolic-health category as an aging-relevant area.

Polyamines and dietary interventions

Spermidine, fasting protocols, caloric restriction mimetics. Framework reading: the science is real; the consumer-product layer obscures the science; the therapeutic translation has been slow. A persistently undervalued research area and an oversold consumer area.

Gene therapies for monogenic age-related conditions

Gene therapies for specific age-related disorders (some retinal diseases, certain forms of cardiomyopathy, lysosomal storage disorders that present in adults). Framework reading: a high-leverage category with a clear translational pathway and the same manufacturing-cost problem that constrains gene therapy generally.

Heterochronic parabiosis and blood-derived interventions

Studies of young-vs-old blood factors and downstream therapeutic candidates. Framework reading: a category with extensive scientific interest and disappointing therapeutic translation so far. The mechanism is real; the specific agents identified to date have not lived up to the basic-science promise.

Long-running human aging cohorts

Framingham, the UK Biobank, the Nurses' Health Study, the Baltimore Longitudinal Study of Aging. Framework reading: a chronically underfunded research infrastructure category. Each cohort accumulates value with time and cannot be retroactively created. The framework would direct substantially more capital here than is currently flowing.

Anti-fibrotic interventions

Tissue fibrosis as a target, particularly in lungs, kidneys, liver and heart. Framework reading: a category with substantial therapeutic momentum and clear clinical endpoints. Less aging-pure than other categories but with measurable lifespan implications.

What the framework de-prioritises in longevity

Most consumer supplement brands with weak biomarker evidence. Most "personalised longevity" services that combine commodity testing with non-validated interventions. Most rejuvenation clinics with vague proprietary protocols. Most early-stage therapeutic claims that depend on extrapolations from C. elegans or Drosophila to humans without intervening rodent data. Most "biological age testing" services that do not contribute to validation of the underlying clocks.

What the framework prioritises that the consensus does not

Biological-age biomarker validation — the unfashionable, slow, methodological work that the rest of the field depends on — is dramatically undercapitalised relative to its leverage.

Long-running human cohorts receive a tiny share of the public funding their cascade value justifies. The cost of starting a new forty-year cohort cannot be deferred; the cost of analysing one that already exists is collapsing fast.

Public-good therapeutic platforms in cell and gene therapy that lower manufacturing cost would unlock most of the longevity-relevant gene-therapy categories simultaneously. The framework would direct substantial capital here that currently flows into individual therapeutic candidates.

Translational infrastructure — better large-animal models, better non-human primate aging studies, better clinical-trial design specifically for aging endpoints — is undersupplied and would benefit the entire field.

How to use this guide

If you are a longevity investor, identify which curve your bet rides and which slow constraint binds the customer outcome. Most underperformance in the field comes from underpricing the mouse-to-human translation gap, not from misjudging the underlying science.

If you are a researcher, the Hamming question with longevity tractability attached: which aging-biology problem has just become tractable that you would not have attempted five years ago, and is your current programme positioned to attack it?

If you are an interested non-scientist, the most useful posture is cautious optimism — the field is real, the cascade is potentially enormous, and the consumer-product layer is mostly noise. Pick research institutions to follow, not supplement brands.

— Siri Southwind

Read the framework · Current bets · Biotech and health · 50 possibilities

Mental health

This guide applies the Problem Timing framework to mental health, psychiatric medicine and adjacent neurotechnology. Mental health is the most contested major medical field on this list — diagnostic categories are unstable, mechanism of action for many widely-prescribed drugs is poorly understood, replication failures are endemic, and the relationship between subjective experience and biological substrate remains open at a level that does not apply to most other medical domains. The framework does not resolve this. It does suggest where, within this contested field, the highest-leverage current allocations live.

The compression. After several decades in which the field made limited progress beyond the post-1990 generation of antidepressants and antipsychotics, several distinct cost-trajectories are now moving simultaneously: digital therapeutics, biomarker-led psychiatry, novel mechanisms (psychedelics, ketamine derivatives, GLP-1-adjacent effects), closed-loop neurostimulation, and AI-augmented care. The slow constraints (stigma, payer reimbursement, diagnostic uncertainty, clinical-trial design) remain genuinely slow, but the rate of relative change in the technology layer is the highest in the field's history.

The five curves that matter

Cost of digital cognitive-behavioural therapy at scale. AI-augmented digital therapeutics for depression, anxiety, insomnia, OCD, eating disorders. Cost-per-patient-month has fallen substantially; clinical-effect-sizes for the best-implemented programmes are comparable to or modestly below in-person therapy in well-controlled trials.

Cost of psychiatric biomarker assays. Methylation patterns, neuroimaging features, blood-based proteomic markers, genetic-risk scores, digital phenotyping signals. Cost-trajectory is steep; the bottleneck is shifting from acquisition to clinical validation.

Cost of neuroimaging research. fMRI, EEG, MEG, structural MRI. Per-scan cost has fallen modestly; per-useful-finding cost has fallen more substantially with better analytical methods. Population-scale neuroimaging studies are now plausible in a way they were not.

Cost of novel-mechanism drug development. Psychedelics (psilocybin, MDMA derivatives), ketamine and esketamine, novel monoamine modulators, and the increasingly credible effects of GLP-1 and related compounds on mood. Cost-trajectory has been improving with both regulatory-pathway clarification (in some jurisdictions) and the growing role of AI in drug discovery.

Cost of closed-loop neurostimulation. Transcranial magnetic stimulation, deep-brain stimulation for treatment-resistant cases, transcranial direct-current stimulation, the various focused-ultrasound approaches. Per-treatment cost is falling; the targeting and personalisation infrastructure is improving.

The slower constraints

Diagnostic uncertainty. The most distinctive constraint of the field. The DSM categories are statistical groupings rather than mechanistic ones; two patients with identical diagnoses may have very different underlying biology and respond very differently to the same intervention. The cost-trajectory of better diagnosis is improving but slowly.

Stigma and patient behaviour. Despite substantial cultural change over the past decade, mental-health stigma remains a binding constraint on care-seeking, particularly in masculine-gendered and culturally-conservative contexts and for the more severe diagnoses. Cost-trajectory of changing this is essentially flat on useful timescales.

Payer and reimbursement environment. Insurance coverage for mental-health treatments is structurally inadequate in most jurisdictions, including those with universal healthcare. Parity legislation has improved the formal position substantially without solving the access problem.

Regulatory environment for novel treatments. Particularly for psychedelics, the regulatory pathway has matured substantially in some jurisdictions (Australia, parts of the US for specific indications, Switzerland, the Netherlands for research) and remains restrictive elsewhere. The cost-trajectory of the regulation is jurisdiction-specific.

Clinical-trial design. Mental-health trials face structural problems beyond those of most medical research: the placebo effect is large and variable, blinding is difficult for psychotherapeutic interventions, and the most clinically-meaningful endpoints (sustained quality-of-life improvement) are expensive to measure. Cost-trajectory of running a credible trial is essentially flat.

Specific framework readings

AI-augmented digital therapeutics for common conditions

Apps and platforms providing CBT, mindfulness, and adjacent therapies for depression, anxiety, insomnia and stress. Framework reading: the technology layer is mature; the deployment-and-reimbursement layer is the binding constraint. The category has been crowded since 2018; the surviving companies have either employer-and-payer relationships or specific clinical evidence. Attack with caveats for serious clinical operators; probably wait on most consumer-app plays competing with foundation-model alternatives.

AI-driven psychiatric assessment and diagnosis

Tools that augment clinician decision-making with structured assessment, longitudinal tracking, and pattern recognition. Framework reading: this is the most under-addressed category in current psychiatric AI work. Diagnostic uncertainty is the field's central problem; tools that improve diagnostic precision compound across every downstream treatment decision. Attack now.
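One way to see the compounding, as a deliberately crude sketch — every rate below is invented for illustration, not drawn from the psychiatric literature. If treatment choice follows diagnosis, overall response is a weighted average of the matched and mismatched response rates, so every point of diagnostic accuracy is inherited by every downstream treatment decision:

```python
# Invented rates, for illustration only.
def response_rate(accuracy, r_match=0.60, r_mismatch=0.30):
    """Expected treatment response when treatment choice follows diagnosis."""
    return accuracy * r_match + (1 - accuracy) * r_mismatch

for a in (0.50, 0.70, 0.90):
    print(f"diagnostic accuracy {a:.0%} -> expected response {response_rate(a):.0%}")
# 50% -> 45%, 70% -> 51%, 90% -> 57%: the diagnostic layer moves outcomes
# without any change to the treatments themselves.
```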

Biomarker-led psychiatric medicine

Methylation clocks, blood-based markers, neuroimaging-derived phenotypes for depression, schizophrenia, bipolar disorder, autism. Framework reading: the science is at the inflection point where category-defining work is genuinely possible. Attack now for the strongest scientific groups; attack with caveats on commercial translation, which remains harder than the underlying science.

Psilocybin and MDMA-derived therapeutics

The psychedelics-as-medicine category. Framework reading: the regulatory pathway is materially clearer than five years ago; clinical evidence is accumulating; the manufacturing-and-protocol infrastructure is forming. The honest reading is that the category will exist and will be substantial; the question is which companies survive the regulatory and commercial transitions. Attack with caveats; pick the indication and the regulatory regime carefully.

Ketamine and esketamine for treatment-resistant depression

A category that has moved from research to deployment. Framework reading: the underlying clinical effect is well-established. The deployment-and-pricing infrastructure is maturing. Attack with caveats for differentiated formulations or delivery modes; the basic category is consolidating.

GLP-1-related effects on mood and cognition

A category that did not exist three years ago and now plausibly does. Semaglutide, tirzepatide and adjacent compounds appear to have effects on mood, addiction and cognition that go beyond their direct metabolic mechanisms. Framework reading: a genuine open question with substantial upside. Attack now for serious investigators; the cascade if these effects are real and durable is enormous.

Closed-loop neurostimulation for refractory cases

TMS, deep-brain stimulation, focused ultrasound, transcranial direct-current stimulation. Framework reading: the technology has matured enough to support real clinical work. Most consumer-facing applications remain probably wait; clinical-grade applications for specific indications (treatment-resistant depression, OCD, certain seizure disorders) are attack now selectively for serious operators.

AI-driven crisis-line and on-demand support

Tools deploying AI as the first-line response to suicidal ideation, acute mental-health crises, and adjacent high-need cases. Framework reading: a category with substantial dual-use risk (see Dual-use & catastrophic risk) where the framework's normal speed-and-cascade arguments need to be balanced against the cost of being wrong on individual users. Attack with caveats for clinical-grade operators with serious safety infrastructure; probably skip most consumer-facing crisis apps without it.

Workplace mental-health programmes

Employer-sponsored mental-health benefits, EAP services, and the various platform players. Framework reading: a real and growing category whose unit economics depend on demonstrated clinical outcomes rather than on engagement metrics. Attack with caveats; the consolidation around a smaller number of credible operators is well underway.

Mental-health for low- and middle-income countries

A field with chronically inadequate clinical capacity globally. Framework reading: AI-augmented tools designed for community-health-worker deployment are one of the highest-leverage applications of AI to global mental health. The unit economics work because the alternative (no treatment available) is so much worse than the alternative in wealthy markets. Funded mostly by foundations and global health bodies; undercapitalised relative to the leverage. Attack now.

Child and adolescent mental health

A field whose demand has risen sharply over the past decade, particularly for anxiety, depression and eating disorders. Framework reading: the technology adoption layer is more permissive (younger patients are digitally native) but the regulatory and parental-consent layer is more restrictive. Attack with caveats; the operators who navigate the consent and safeguarding architecture carefully are positioned to compound.

Eating-disorder and addiction treatment

Categories with substantial unmet need and challenging treatment economics. Framework reading: AI-augmented intensive outpatient programmes, particularly for binge-eating-disorder and stimulant-use-disorder, are categories where the operators with serious clinical evidence are positioned well. Attack with caveats.

Long-term cohort studies in mental health

Population-scale longitudinal studies with regular assessment of mental-health status, biomarkers, and life-outcome data. Framework reading: chronically underfunded relative to the value of the data. The cohort-effect-tracking studies that the next generation of psychiatric medicine will need are largely not yet running. The framework's normal closing-window argument applies: cohorts started in 2026 produce twenty-year data in 2046; cohorts not started do not.

What the framework de-prioritises in mental health

Most consumer wellness apps without clinical evidence. Most "wearable mental-health monitor" plays without integrated clinical pathways. Most generic mindfulness apps competing with first-party offerings from major platforms. Most AI-companion apps positioned as mental-health products without clinical safety infrastructure. Most workplace-wellness programmes optimising engagement metrics rather than clinical outcomes. Most neuroscience-derived consumer cognitive-enhancement products with weak evidence base.

What the framework prioritises that the consensus does not

AI-augmented psychiatric diagnosis and assessment is dramatically undercapitalised relative to its leverage on every downstream treatment decision. Diagnostic uncertainty is the field's central problem, and the tools that address it compound everywhere.

Long-running mental-health cohorts are receiving a small fraction of the public funding their value justifies. The work that the next generation of psychiatric medicine will need cannot be done retrospectively.

Mental-health interventions for low- and middle-income countries deployed via AI-augmented community health workers are one of the highest-leverage global health interventions currently available, and one of the most underfunded.

Public-good infrastructure for clinical-trial replication in mental health is essentially absent. The replication crisis in psychology and psychiatry is severe and known, and the institutional infrastructure to address it does not exist at the scale required.

Tools designed for clinicians rather than for patients — the equivalent of teacher-leverage tools — receive a tiny share of mental-health-technology funding relative to consumer-facing tools, despite being where most of the deployable leverage actually lives.

How to use this guide

If you are a mental-health-technology founder, the most useful question is whether your bet is at the diagnostic-precision layer, the treatment-delivery layer, or the system-of-care layer. Each has a different binding constraint; mixing them produces companies that are mediocre at all three.

If you are an investor, the framework directs you toward clinical-grade operators with serious evidence and toward the picks-and-shovels diagnostic infrastructure that the foundation-model providers and the wellness-app market do not address.

If you are a clinician or healthcare-system leader, the framework's most useful contribution is permission to be sceptical of the wellness-app market while taking the diagnostic-and-clinical AI tools seriously. The latter category is real and useful; the former is mostly noise.

If you are a public funder, the framework directs you toward cohort studies, replication infrastructure, and global-mental-health applications that no commercial actor will fund alone. These are unfashionable and undervalued.

— Siri Southwind

Read the framework · Biotech and health · Longevity · Current bets · 50 possibilities

Space

This guide applies the Problem Timing framework to space. The field is unusual because the relevant cost curves were essentially flat for forty years, then broke sharply in the past decade, and the consequences are still propagating. Most of the institutional thinking about space — including most of what governments and incumbents currently fund — is calibrated to the era when those curves were flat. The framework's job in space is partly to identify projects whose institutional logic is from 1995 and whose cost environment is from 2025.

The five curves that matter

Cost per kilogram to low Earth orbit. Falcon 9 and increasingly Falcon Heavy have collapsed launch costs by roughly an order of magnitude over the past decade. Starship, if its operational cadence reaches design intent, will produce another order of magnitude. The single most important curve in the field; almost every downstream decision is sensitive to it.

Cost of small-satellite buses. Standardised platforms (cubesats, small commercial buses) have collapsed satellite costs in parallel with launch costs. The combined effect — cheap launch, cheap satellites — is what enables megaconstellations, low-cost Earth observation and the broader new-space economy.

Earth-observation data cost per scene. Driven down by the rise of small-sat constellations (Planet, Maxar's smaller buses, multiple SAR providers, hyperspectral entrants). The remaining bottleneck is data interpretation more than data acquisition.

On-orbit service and manufacturing capability. A genuinely new curve, still in the early-installation phase. Refuelling, debris removal, in-space assembly, on-orbit manufacturing of pharmaceuticals or fibre optics. The cost-trajectory is not yet established but the underlying inputs (cheap launch, cheap satellites, robotics) all favour it.

Lunar-and-cislunar access. Driven by Artemis, by China's lunar programme, by Blue Origin's New Glenn, by the various commercial lunar landers. A specific curve worth tracking separately because it interacts with national-security and political-prestige dimensions in ways the other curves do not.

The slower constraints

Spectrum allocation and orbital coordination. A largely flat curve. The regime descends from the century-old Wireless Telegraphy Acts through today's ITU process, and that process moves on its own clock. This constraint is the binding limit on several parts of the megaconstellation industry.

Regulatory environment for novel applications. Earth observation with AI has run into resolution and dissemination restrictions; on-orbit service has run into licensing ambiguities; lunar resource extraction has unsettled treaty implications. The regulatory side moves substantially slower than the technology side.

Customer base and demand. The supply side of space has expanded faster than the demand side. Many small-satellite businesses fail not because their technology is wrong but because the customer pull does not yet exist at the price point they need.

Reliability and qualification. Space hardware lives or dies on reliability and the qualification timeline is not on a fast curve. New components, new propulsion, new manufacturing methods all face long, expensive flight-heritage requirements before serious customers will commit.

Specific framework readings

SpaceX's Starship development

The bet that if cost-per-kilogram to orbit drops another order of magnitude, the consequences cascade across every other space sub-field. Framework reading: the most consequential single technology programme in space, with cascade value that is both enormous and difficult to price. Already discussed at length; the framework reading on Starship itself is strongly attack now, on its cascade implications attack now selectively across the rest of the field.

NASA's SLS / Artemis launcher programme

Already discussed in Current bets and 50 current likely-dumb. Framework reading: a programme whose cost structure is from a different era and whose continuation is institutional momentum.

The commercial Earth-observation industry

Planet, Maxar, Capella, Iceye, multiple new entrants. Framework reading: the supply side is now ahead of the demand side. Crowding is high; the marginal team's contribution depends on data interpretation, distribution into specific industries and regulatory navigation. Attack with caveats; pick verticals where the customer-pull is real (defence, agriculture-at-scale, infrastructure, climate).

Megaconstellation broadband

Starlink dominant; OneWeb, Kuiper, several Chinese constellations following. Framework reading: a category where the leader has built a deep moat that is difficult for followers to cross. Not impossible — the satellite-to-cell market is genuinely contested — but the framework reading on follower constellations is materially weaker than on the leader.

On-orbit service, debris removal, in-space refuelling

A category with several well-funded entrants and a small but growing customer base. Framework reading: a just early enough bet on the category; specific business models within it remain contested. The cascade value if the category matures (extending satellite lifetimes, enabling new mission profiles) is large.

Lunar landers and lunar-surface operations

Intuitive Machines, Astrobotic, Firefly, Blue Origin, multiple international entrants. Framework reading: a category with strong public-funding tailwinds (Artemis CLPS contracts, ESA programmes) and uncertain commercial pull beyond the public customer. Framework reading is attack with caveats and acknowledges the high probability that public customers will remain dominant for the next decade.

Asteroid resource extraction

A category that has been just early for thirty years. Framework reading: still early. The cost-trajectory of access has improved; the cost-trajectory of extraction has not changed materially. Worth a small portfolio bet; not a confident sector call.

Space-based solar power

A category with periodic re-attention. Framework reading: still on the wrong side of the framework. The launch-cost reduction does not yet bridge the gap to terrestrial alternatives, and the regulatory and political environment for orbital power transmission is not ready. Small research programmes are defensible; large deployment bets are not.

In-space manufacturing of pharmaceuticals or specialty materials

Microgravity manufacturing of high-value, low-mass products. Varda Space and others. Framework reading: a category whose feasibility is now plausibly demonstrated and whose unit economics depend on continued cost-reduction in launch and re-entry. Attack with caveats; specific applications vary.

Space-domain awareness and tracking

Cataloguing and tracking objects in orbit, increasingly important as constellations multiply. Framework reading: a category with strong public-customer demand (defence, civil space-traffic management) and improving technology. Crowding is rising. Framework reading is attack now selectively for serious teams.

Cislunar infrastructure

Lunar communications relays, lunar Gateway, lunar fuel depots. Framework reading: a category whose feasibility depends on whether sustained human-and-robotic activity in cislunar space materialises. The commercial pull is uncertain; the public-customer pull is real but fragile to political change. Attack with caveats.

Defence and dual-use space applications

Surveillance, tactical responsive launch, secure communications, electronic-warfare countermeasures. Framework reading: a category benefiting from substantially increased defence spending across multiple jurisdictions. Crowding is rising. The framework reading is favourable on specific technologies (responsive launch, low-cost defensive ISR, hardened communications) and cautious on others.

Mars settlement architecture

Planning and prototyping for sustained human presence on Mars. Framework reading: a moonshot in the framework's strict sense. The cascade if any of it works is enormous; the probability of any of it working in the specific forms currently being prototyped is low. Properly portfolio-shaped; almost no individual bet within it is defensible on standard metrics.

Suborbital space tourism

Virgin Galactic, Blue Origin's New Shepard. Framework reading: a category with weak unit economics and limited cascade value. Framework reading is probably skip.

Deep-space science missions

NASA, ESA and JAXA flagship missions to outer planets, comets, asteroids. Framework reading: a category whose institutional logic is sound and whose cost-trajectory has not improved. The framework rates these as classic patient-cataloguing bets — high cumulative scientific value, slow individual payoffs, irreplaceable when funded and irrecoverable when cancelled.

What the framework de-prioritises in space

Most follower megaconstellations without a clear differentiation. Most suborbital tourism. Most "human-mission-to-anywhere-by-2030" claims that require unbuilt rockets and uncommitted customers. Most space-based solar power as a deployment thesis. Most resource-extraction businesses without a near-term commercial pull. Most on-prem ground-segment plays competing with the cloud incumbents now entering the market.

What the framework prioritises that the consensus does not

Earth-observation data interpretation — turning the data flood into specific industry workflows — receives less attention than the data-acquisition layer despite being where most of the value will be captured.

Space-traffic management and debris mitigation infrastructure is dramatically undercapitalised relative to the externalities it will eventually be required to handle.

Robotics for space — the manipulation, repair and assembly capabilities required for on-orbit service — sits at the intersection of two fields whose curves are both moving fast and is undercrowded relative to that combined leverage.

Long-lived science missions (Voyager-shape commitments) receive a fraction of the public funding their cumulative scientific value justifies.

Defence-relevant responsive launch and low-cost hardened satellite buses are likely to be undersupplied as defence spending expands without equivalent expansion of supplier capacity.

How to use this guide

If you are a space founder, identify which of the five curves your business depends on, which slower constraint binds your customer, and how your business ages if launch costs drop another order of magnitude over the next decade. Many current space businesses are quietly betting on Starship's cadence; few have made that bet explicit.
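A minimal version of that exercise, with invented numbers rather than real quotes — `mission_cost`, the mass and the dollar figures below are all assumptions chosen only to show the structure of the sensitivity check:

```python
# All figures invented; the structure of the sensitivity check is the point.
def mission_cost(launch_per_kg, mass_kg=500, bus_cost=4e6, ops_cost=2e6):
    """Return (total cost, launch share) for one hypothetical small-sat mission."""
    launch = launch_per_kg * mass_kg
    total = launch + bus_cost + ops_cost
    return total, launch / total

for price_per_kg in (5_000, 500):  # today-ish vs. an order-of-magnitude drop
    total, share = mission_cost(price_per_kg)
    print(f"${price_per_kg}/kg -> ${total/1e6:.2f}M total, launch = {share:.0%}")
# $5000/kg -> $8.50M total, launch = 29%
# $500/kg  -> $6.25M total, launch = 4%
```

Under these assumptions the order-of-magnitude drop turns launch from nearly a third of the cost stack into a rounding error — at which point the business is differentiated by bus, operations and customer relationships, because every competitor received the same discount. That is the sense in which a business ages with the curve.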

If you are a space investor, the framework directs you toward the data-interpretation-and-applications layer over hardware and launch (where the leaders have moats and the followers do not), and toward picks-and-shovels infrastructure (space-traffic management, robotics, qualification services) over headline missions.

If you are a public funder, the framework directs you toward space-traffic management, debris mitigation, deep-space science (the patient cataloguing case), and the underfunded categories above.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Finance

This guide applies the Problem Timing framework to finance and financial technology. Finance is unusual because it is the most reflexive field on this list — financial allocation directly shapes which problems get attacked elsewhere — and because its slow constraints (regulation, trust, network effects, payment rails) are unusually stable while several of its underlying technical curves are unusually steep.

The compression. Most of the value being created in finance over the next decade will come from automation of currently-expensive verification, decision-making and routing tasks. Most of the value being claimed will come from extrapolating those same automations to ambitious new asset classes that the framework reads as crowded or premature. The interesting bets live at the intersection: verification infrastructure, regulated-vertical AI, and the specific corners of tokenisation that survive the regulatory and trust filters.

The five curves that matter

Cost per AI-augmented financial decision. The cost of producing a credit decision, a fraud assessment, a compliance review, a research note, a portfolio rebalance. Has fallen by orders of magnitude in the past three years and continues to fall. The dominant curve in current financial-technology economics.

Cost of execution and settlement. The cost of executing a trade, a payment, a settlement. Continuing to fall, with the most consequential gains now in cross-border payments rather than in the already-saturated millisecond-latency trading domain.

Alternative-data acquisition cost. Satellite imagery for commodity flows, anonymised payment data, web-scraped pricing, sentiment from social platforms, IoT telemetry. The cost-trajectory of useful alternative data has improved dramatically; the bottleneck is shifting to integration and lawful provenance.

Tokenisation infrastructure cost. The cost of issuing, settling and managing a tokenised asset on programmable rails. Steep curve; the category has matured substantially since the 2021 hype cycle, with serious institutional infrastructure now in place for a narrower set of asset classes.

Compliance cost per customer. Know-your-customer, anti-money-laundering, sanctions screening, ongoing monitoring. The cost-trajectory has improved with AI augmentation but is bounded below by regulatory expectations that are themselves rising. The asymmetry between rising regulatory expectation and falling per-task cost is one of the more interesting features of current financial allocation.
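That asymmetry is easy to state as arithmetic. A hedged illustration — the growth and decline rates below are invented, not measured: total spend is check volume times per-check cost, and the question is which exponential wins.

```python
# Invented rates: check volume grows g per year, per-check cost falls d.
def annual_spend(years, n0=1_000_000, c0=2.00, g=0.15, d=0.30):
    """Spend after `years`, starting from n0 checks at c0 dollars each."""
    return n0 * (1 + g) ** years * c0 * (1 - d) ** years

for t in (0, 3, 6):
    print(f"year {t}: ${annual_spend(t)/1e6:.2f}M")
# year 0: $2.00M, year 3: $1.04M, year 6: $0.54M -- with these rates the
# per-task curve wins even against 15%/yr volume growth; without the
# AI-driven decline, the same volume growth compounds into an
# unbounded cost centre.
```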

The slower constraints

Regulation and licensing. The slowest-moving of the major constraints, by design. Cross-border friction, fragmented regimes, and the cost of obtaining and maintaining financial licences are the dominant business problem for most fintech bets. Regulation is moving — sometimes fast, occasionally backwards — but the cost-trajectory of responding to regulation is essentially flat.

Trust and counterparty risk. Hard-won, easily lost. Several FTX-shape failures over the past five years have made institutional money substantially more risk-averse on novel asset classes than the underlying technology would justify on its own.

Network effects in payments. The dominant payment networks and rails (Visa, Mastercard, the major bank-led systems, the emerging instant-payment systems by jurisdiction) are extremely sticky. Cost-trajectory of building a new network from scratch is unfavourable; cost-trajectory of building on top of existing networks is much better.

Distribution. Reaching customers in regulated markets is expensive in a way it is not in pure-software domains. Customer-acquisition cost in financial services has been rising for a decade and is rising faster as digital-channel saturation increases.

Data residency and sovereignty. Particularly in cross-border financial work. Cost of compliant infrastructure is rising; cost of non-compliant infrastructure is rising faster as enforcement activity grows.

Specific framework readings

AI-augmented credit and underwriting

Categories ranging from consumer-lending decision engines to small-business credit to specialty insurance underwriting. Framework reading: the AI capability is real and the cost-trajectory favourable. The moat is in the proprietary data — historical performance, repayment records, alternative signals — that allow the model to outperform incumbent decisions. Founders without that data face squeeze from incumbents who do. Attack now selectively for those with the data; probably wait otherwise.

AI-augmented compliance, KYC and AML

A boring, high-leverage category. Framework reading: regulatory expectations are rising faster than human compliance capacity can keep up. AI augmentation is the only credible path. The moat is in the workflow integration and audit-defensibility, not in the model itself. Attack now with conviction; this is one of the most under-priced categories in fintech today.

Quantitative trading and AI-driven asset management

The historical core of fintech. Framework reading: the high-frequency end is saturated and dominated by a handful of players whose marginal alpha is diminishing. The medium-horizon end (multi-day to multi-week strategies on alternative data) is more interesting and less crowded. Attack with caveats; the marginal team's edge depends on a specific data or methodological advantage that is hard to sustain.

Embedded finance and banking-as-a-service

The category of providing financial services as APIs to non-financial companies. Framework reading: the category has matured and consolidated. The early-mover plays have largely won; the marginal entrant faces difficult unit economics in a regulated space with rising compliance cost. Probably wait unless you have a specific vertical or geographic angle.

Tokenisation of real-world assets

Treasuries, money-market funds, private credit, real estate, art, commodities. Framework reading: the institutional adoption is real and accelerating, particularly for highly-liquid fixed-income instruments. The retail end is contested and regulatorily fragile. Attack now selectively for institutional-grade infrastructure; attack with caveats for retail-facing tokenisation; probably wait on most NFT-derivative plays.

Stablecoins and dollar-denominated digital infrastructure

A category that has moved from speculative to genuinely strategic over five years. Framework reading: the regulatory architecture is converging in 2024–2026 and several of the surviving operators (Circle, the bank-issued players, the emerging non-US versions) are positioned to compound. The plumbing layer (settlement, on-ramps, programmable conditional payments) is undervalued relative to the headline tokens. Attack now on the plumbing; attack with caveats on the issuer side.

Central bank digital currencies (retail)

Already covered in Current bets and 50 current likely-dumb. Framework reading: most retail-CBDC pilots are probably wait or probably won't ship at scale. Wholesale-CBDC and inter-bank settlement infrastructure is more defensible.

Cross-border payments

A category whose cost-trajectory has improved meaningfully with the new rails (RTP networks, Wise's non-bank model, the various corridor-specific players). Framework reading: still a real opportunity, particularly in underserved corridors. The moat is in correspondent-bank relationships and licensing footprint, not in technology per se. Attack with caveats for differentiated geographies.

Climate finance and transition capital

Categories ranging from green bonds and transition financing to specialty climate-risk insurance to carbon-credit infrastructure. Framework reading: the underlying climate cost-trajectories are real (see Climate & energy); the financial-instrument layer is consolidating around a small number of credible providers. Attack now selectively for serious credit and insurance infrastructure; probably wait on most retail green-investment products.

Insurance technology

Reinsurance modelling, parametric insurance, embedded insurance, AI-augmented claims processing. Framework reading: insurance is unusually data-rich and unusually slow-moving. The marginal team's edge is in re-pricing risk in domains where the legacy underwriting models are losing accuracy (climate-related risk, cyber risk, AI-related risk). Attack now in the re-pricing categories; probably wait on most distribution-layer plays.

Robo-advice and retail wealth management

A category that has matured into an oligopoly-ish structure dominated by the incumbents (Fidelity, Schwab, Vanguard) and the largest neo-players (Wealthfront, Betterment, the European equivalents). Framework reading: the marginal entrant faces difficult unit economics. AI augmentation creates value but not differentiation. Probably wait unless serving a genuinely underbanked geography.

Prediction markets

Polymarket, Manifold, Kalshi, the various smaller and offshore players. Framework reading: a category whose policy-relevant value is real and substantially under-priced, particularly as a tool for the framework's revision protocol. The legal and regulatory environment is improving in some jurisdictions and deteriorating in others. Attack with caveats; pick the regulatory regime carefully.

AI-driven financial research and equity analysis

A category competing with both the incumbent sell-side research desks and the foundation-model providers themselves. Framework reading: the standalone model is being squeezed. Probably wait unless you have proprietary research data or a regulated workflow advantage.

Decentralised finance (DeFi) protocols at scale

The category that survived the 2022 collapse with reduced TVL (total value locked) and consolidated user bases. Framework reading: institutional adoption is real for the most boring use cases (stablecoin settlement, basic lending, treasury management); the more exotic protocols continue to under-deliver on the original disintermediation thesis. Attack with caveats on the institutional-adjacent protocols; probably wait on most retail DeFi.

What the framework de-prioritises in finance

Most generic neobanks in saturated markets. Most retail-facing crypto products without clear regulatory positioning. Most "build a new payment rail" plays without a specific corridor or regulatory advantage. Most consumer financial-planning apps competing with first-party offerings from major brokerages. Most personal-finance content businesses competing with foundation-model-native alternatives.

What the framework prioritises that the consensus does not

AI-augmented compliance and verification infrastructure is dramatically undercapitalised relative to its strategic importance. As regulation tightens and AI deployment in financial services accelerates, the verification layer is the bottleneck.

Tokenisation plumbing for institutional fixed-income is a real and growing category that is mostly being attacked by infrastructure incumbents rather than well-positioned new entrants.

Climate-risk re-pricing infrastructure is mispriced. The legacy actuarial models are demonstrably losing accuracy as climate-related events change the underlying distributions; whoever rebuilds the pricing apparatus for a warmer and more volatile world owns a substantial position.

Prediction-market infrastructure for institutional and policy use is undersupplied as both a technology category and a tool for better allocation decisions.

Cross-border payment corridors in underserved geographies — particularly Africa, parts of South-East Asia, parts of Latin America — pay back faster than the consensus credits.

How to use this guide

If you are a fintech founder, the most useful exercise is to identify the slow constraint that protects your business. If your moat is "AI-augmented X" alone, the moat is dissolving as AI commoditises. The defensible position is at the intersection of AI capability, regulatory licensing, proprietary data, and a specific customer relationship the incumbents do not have.

If you are a financial-services investor, the framework directs you away from AI-wrapper plays in saturated verticals and toward the picks-and-shovels infrastructure (compliance, verification, tokenisation plumbing, climate-risk repricing) that everyone else needs. These are usually less photogenic and more defensible.

If you are a public-sector or central-bank reader, the framework's most useful contribution is on the what to regulate question rather than on the what to build question. Better-targeted regulation of attacker-favoured financial categories (see Dual-use & catastrophic risk) and lighter-touch regulation of defender-favoured ones produces better aggregate outcomes than uniform regulatory pressure.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Agriculture

This guide applies the Problem Timing framework to agriculture, food production and the broader land-use system. Agriculture is the most under-attended major sector in the framework's audience — most of the people thinking seriously about Problem Timing live in software, biotech and finance — but it is one of the highest-leverage domains the framework can be applied to. Roughly forty per cent of habitable land is used for food, food production accounts for a quarter of greenhouse-gas emissions, and the cost-trajectories of several core agricultural inputs are now moving in ways that have not been true in living memory.

The compression. Several technologies that were promised for decades and disappointed are now actually working: precision-agriculture sensors, gene-edited crops, alternative proteins at industrial scale, agricultural robotics. The slow constraints (weather, soil biology, regulation, farmer adoption) remain genuinely slow, which is why most allocators dismiss the field. The framework rates this dismissal as a mispricing.

The five curves that matter

Cost of phenotyping at scale. The cost of measuring a plant's traits — yield, drought tolerance, disease resistance, nutritional profile — across thousands of varieties in real field conditions. Has fallen by orders of magnitude with drone imaging, automated growth-chamber systems, and field-deployed sensors. The bottleneck for crop development has moved from measurement to interpretation.

Cost of gene editing in crops. CRISPR and successor techniques applied to plant genomes. Cost-trajectory has been steep over the past decade and continues to fall. The regulatory pathway for gene-edited (rather than traditionally GM) crops is also maturing in several major jurisdictions.

Cost of agricultural robotics per task. Weed control, harvesting, monitoring, micro-precision spraying, autonomous tractors. Cost-trajectory is favourable; the bottleneck is shifting from hardware cost to system integration and farm-specific deployment.

Cost of precision sensing for soil and water. Soil-microbiome sequencing, satellite-derived soil health metrics, in-field water sensors, drone-based crop-health imaging. The cost-per-data-point has collapsed; the bottleneck is the interpretive layer that turns data into farm decisions.

Cost of alternative-protein production. Cultivated meat, precision-fermentation proteins, plant-based formulations. The cost-trajectory has been slower than the optimistic 2020-era forecasts but real; specific categories (precision-fermentation dairy proteins, fungal proteins) are approaching cost-parity in their best-case applications.

The slower constraints

Weather and climate. The dominant cost-trajectory in agriculture is moving in the wrong direction for substantial fractions of currently-productive land. Climate adaptation is now the binding constraint for many specific crops in many specific regions. The framework's normal cost-decline assumption breaks here.

Soil biology. Soil health, microbial communities, organic matter, nutrient cycling. These are slow-moving systems whose degradation is genuinely hard to reverse. Cost-trajectory of soil-rebuilding interventions is slow.

Water availability. Aquifer depletion, snowpack decline, increasing drought variance. A binding constraint that does not respond to AI cost-trajectories on any useful timescale.

Regulatory environment. Particularly for gene-edited crops, novel proteins, novel pesticides and animal welfare. Highly jurisdiction-dependent and slow to update. EU treatment of gene editing is improving; treatment in much of Asia and Africa is variable; treatment in North America is broadly permissive.

Farmer adoption and trust. A genuinely slow constraint. Farmers are individually rational risk-managers operating on multi-year cycles; adoption of new technologies typically takes a generation. The cost-trajectory of getting farmers to use a new technology is essentially flat; the cost-trajectory of building the technology is not.

Supply-chain inertia. Distribution networks, processor relationships, retail buyer specifications, consumer preferences. These move on multi-decade timescales for major commodities.

Specific framework readings

Precision agriculture and farm-management software

Sensor networks, drone imaging, AI-driven irrigation and nutrient management, autonomous-tractor software. Framework reading: the technology layer is maturing; the deployment-economics layer is the binding constraint. Crowded with vendors; the moat is in the integration with existing equipment and the specific crop-and-region focus. Attack now selectively for narrow vertical positions; probably wait on horizontal precision-ag platforms.

Gene-edited crops for climate resilience

Drought-tolerant maize, disease-resistant wheat, salinity-tolerant rice, pest-resistant fruits. Framework reading: the science is real, the cost-trajectory is favourable, and the regulatory pathway is opening in major jurisdictions. The cascade value as climate stress on agriculture intensifies is large. Attack now with conviction in jurisdictions with permissive gene-editing regulation.

Cultivated meat

A category that has progressed more slowly than the optimistic 2020 forecasts. Framework reading: the cost-trajectory remains favourable but the specific path to price parity for commodity meat is longer than the consensus assumed. The interesting segment is high-value applications (cultured fish for sushi-grade products, novel proteins for pet food, foie gras) where unit economics work earlier. Attack with caveats; pick the highest-margin application.

Precision-fermentation proteins

Animal-free dairy proteins, animal-free egg proteins, novel functional proteins for food applications. Framework reading: a category that has progressed faster than cultivated meat and is approaching cost-parity in specific applications. The regulatory pathway is mostly clear in major jurisdictions. Attack now selectively; the leaders have established positions but the category is still expanding.

Plant-based meat and dairy alternatives

A category that experienced a 2018–2021 hype cycle and a 2022–2024 consolidation. Framework reading: the easy wins (basic burgers, basic milks) are taken. The marginal product needs to compete on taste and texture against improving alternatives. Probably wait on commodity plant-based; attack with caveats on specific high-value formulations.

Agricultural robotics for specific tasks

Weed control (Carbon Robotics, Naio, the Bonsai-like systems), strawberry harvesting, lettuce thinning, vineyard management, autonomous spraying. Framework reading: a real and growing category with clear customer pull from rising labour costs. The moat is in the specific crop-and-task focus. Attack now in tasks with clear unit economics; probably wait on general-purpose agricultural humanoids.

Vertical farming

A category that has had an extremely difficult 2022–2024 with several high-profile failures. Framework reading: the underlying unit economics for most crops do not work at current energy prices and capital costs. The interesting segments are extremely high-value crops (specialty leafy greens, cannabis, some berries) and pharmaceutical-adjacent applications. Probably wait on commodity vertical farming; attack with caveats in specific high-value verticals.

Soil-microbiome interventions

Microbial inoculants for crops, soil-microbiome sequencing as a diagnostic, biological alternatives to synthetic fertilisers. Framework reading: the category has under-delivered on its 2018-era promises and over-delivered on niche applications. Attack with caveats; the science is real but the field-trial-to-commercial-product translation has been slow.

Climate-resilient seed varieties

Beyond gene editing, the more conventional plant-breeding approaches to drought, heat and salinity tolerance. Framework reading: a quietly important category that is well-funded by the public-sector breeding programmes (CGIAR system, national agricultural research) and undercapitalised in private capital. Attack now for public funders; the private-capital case is harder because returns are diffuse.

Carbon farming and soil-carbon markets

Practices that aim to sequester carbon in agricultural soils, plus the carbon-credit markets that monetise them. Framework reading: the underlying science is contested and verification is genuinely hard. Several major buyers have been burned by weak verification. The opportunity is in better measurement infrastructure rather than in more programmes. Attack now on measurement; probably wait on most farmer-facing programmes.

Aquaculture and alternative aquatic protein

Land-based salmon farms, alternative aquafeed (insect-based, algal, single-cell-protein), kelp farming. Framework reading: a category with strong fundamentals (rising fish demand, declining wild stocks) and weaker individual unit economics. Attack with caveats; pick the species and geography carefully.

Agricultural data and decision-support platforms

Aggregated farm-data platforms, AI-driven planting and treatment recommendations, satellite-derived insights. Framework reading: a crowded category with weak moats. The underlying data is increasingly commoditised; the specific decision-support layer is being absorbed by foundation-model offerings. Probably wait unless you have a specific farmer-relationship advantage.

What the framework de-prioritises in agriculture

Most consumer-facing food-tech brands without proprietary process or supply chain. Most generic precision-ag software platforms. Most carbon-offset programmes with weak verification. Most vertical-farming plays in commodity crops. Most "AI for farmers" wrappers without integrated equipment access. Most direct-to-consumer specialty-food brands relying on social-media customer acquisition.

What the framework prioritises that the consensus does not

Climate-resilient seed varieties — both gene-edited and conventional — are dramatically underfunded relative to the decade-scale cost they prevent. Public funders should be doing much more here than they are.

Soil-carbon and methane measurement infrastructure is undersupplied as both a public-good and a commercial offering. The emerging carbon-and-credit markets cannot function without robust verification, and verification is the binding constraint.

Agricultural robotics for specific tasks in unsexy crops (almonds, citrus, brassicas, root vegetables) is undercapitalised relative to its leverage on labour-cost trends.

Pollinator and biodiversity infrastructure — specifically the data-collection and monitoring layer — is profoundly underfunded relative to the systemic risk that pollinator decline poses to agriculture.

Agricultural research for low- and middle-income countries is one of the highest-leverage public-funding categories the framework currently identifies. The cost-per-life-improved is favourable; the cascade as climate stress intensifies is enormous.

How to use this guide

If you are an agricultural-technology founder, the most useful question is what slow constraint your technology helps the farmer absorb, not what new technology you can deploy. The technology curve is favourable; the deployment curve is not. Solving the deployment problem is more valuable than incrementally improving the technology.

If you are an agricultural investor, the framework directs you toward the picks-and-shovels infrastructure (gene editing, robotics for specific tasks, measurement and verification) and away from consumer-facing food-brand plays. Climate-resilient varieties and adaptive agriculture are the categories most likely to compound across the decade.

If you are a public funder, the framework directs you toward the breeding programmes, the soil-and-climate measurement infrastructure, and the agricultural research for low- and middle-income countries that no commercial actor will fund alone. These are unfashionable and undervalued.

— Siri Southwind

Read the framework · Climate and energy · Materials · Current bets · 50 possibilities

Education

This guide applies the Problem Timing framework to education. Education is the field where the framework's central technology — AI capability rising rapidly, cost of personalised cognitive labour collapsing — collides most sharply with its slowest-moving institutional constraints. The result is that the technological opportunity in education is among the largest available today, and the deployable opportunity is materially smaller. Most allocators on either side of that gap are mispricing it.

The compression. The cost of a competent personalised tutor has collapsed. The cost of changing how schools, universities and credentialing systems work has not. The interesting bets live at the boundary: tools that radically increase teacher leverage rather than replace teachers, tools that work outside the formal credentialing system, tools that sit in the cracks where institutional resistance is weakest.

The five curves that matter

Cost of personalised AI tutoring per student-hour. The cost of a competent foundation-model-driven tutoring interaction is now well below a dollar per student-hour. Continues to fall. The dominant curve in current education-technology economics; a back-of-envelope sketch of the arithmetic follows this list.

Cost of content generation. The cost of producing high-quality educational content — explanations, problem sets, illustrative examples, multimedia material — has collapsed. The bottleneck has shifted from production to curation and pedagogical sequencing.

Cost of automated assessment. Marking, feedback, formative-assessment generation, plagiarism and originality detection. Cost-trajectory is favourable but verification remains the bottleneck — automated assessment systems make errors that humans must catch.

Cost of language learning. A specifically interesting sub-curve. The cost of getting a learner from beginner to fluent in a new language has fallen dramatically with conversational AI. The genuine advance over the previous generation of apps is real and underappreciated.

Cost of skills-based credentialing alternatives. Bootcamps, micro-credentials, project-based portfolios, employer-recognised certifications outside traditional academia. Cost-trajectory is improving but the social acceptance curve is moving more slowly.
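The back-of-envelope behind the tutoring-cost claim above. Every input is an assumption — the token counts, exchange cadence and blended price are placeholders, not measurements of any particular provider:

```python
# Assumed inputs, not measurements of any provider.
tokens_per_exchange = 1_500      # prompt + response, combined
exchanges_per_hour = 30          # one exchange roughly every two minutes
usd_per_million_tokens = 5.00    # blended input/output price

hourly_tokens = tokens_per_exchange * exchanges_per_hour
cost_per_student_hour = hourly_tokens / 1_000_000 * usd_per_million_tokens
print(f"{hourly_tokens:,} tokens/hr -> ${cost_per_student_hour:.2f}/student-hour")
# 45,000 tokens/hr -> $0.23/student-hour: any single input can be off
# roughly fourfold before the cost crosses a dollar.
```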

The slower constraints

Institutional resistance to change. Schools, universities and education ministries are unusually slow-moving institutions, with multi-year budget cycles, strong constituency politics, and procurement systems that systematically favour incumbents. The cost-trajectory of changing how a school operates is essentially flat.

Credentialing and accreditation. The signalling value of a degree from a recognised institution is one of the most stable economic facts of the modern world. The cost-trajectory of building a new credentialing institution that matters is multi-decade. The cost-trajectory of being recognised by employers as an alternative is somewhat better but still slow.

Teacher labour markets. Teachers are simultaneously the binding capacity constraint (in many regions there are not enough of them) and the key political constituency (their unions and professional associations shape policy substantially). Both facts are stable.

Parent and student behaviour. Choice of school, choice of degree, willingness to use new tools. Slow-moving and shaped by reputational and signalling considerations more than by educational outcomes.

Regulation. Particularly for K-12 and for cross-border education, regulatory approval moves slowly. AI use in classrooms is a current flashpoint with widely varying policies even within single countries.

Specific framework readings

AI tutoring for K-12 students

Categories from generic AI tutors to subject-specific ones to school-system-licensed deployments. Framework reading: the technology is real; the institutional integration is the binding constraint. The standalone consumer-app market is being absorbed by foundation-model providers' native offerings and is increasingly hard to defend. The school-licensed market has substantial friction but the addressable budget is much larger. Attack now selectively for school-system-integrated plays; probably wait on most standalone consumer-tutor apps competing with foundation-model offerings.

AI-augmented language learning

Conversational AI for language acquisition. Framework reading: the genuine advance over the previous-generation apps (Duolingo and similar) is real. The defensible position is in the conversational-fluency segment where the methodological gap from gamified vocabulary apps is largest. Attack now for genuinely conversational products; probably wait on incremental improvements to gamified vocabulary apps.

Tools for teachers (rather than replacements for teachers)

Lesson-planning assistants, individualised-feedback tools, classroom-management AI, marking and assessment automation. Framework reading: this is the most under-priced category in education-technology today. The political and institutional pathway is much smoother than for student-facing tools, the customer-pull is real, and the unit economics work. Attack now with conviction; the framework rates this above most consumer-facing education plays.

Higher-education credentialing alternatives

Bootcamps in software, data, design, biotech; micro-credentials from established universities and from new providers; project-based portfolios. Framework reading: the technological capacity to deliver excellent credentialing has existed for years; the bottleneck is employer recognition. The market has consolidated around a smaller number of credible providers. Attack with caveats; the addressable market is real but slow-growing.

Online courses and MOOCs

Coursera, edX, Khan Academy, the various platform players. Framework reading: the original promise (universal access to elite education) was largely achieved in the technical sense. The platforms never solved the pedagogical-and-completion problem; AI now plausibly can. Attack with caveats for AI-augmented platforms; probably wait on simple content-libraries without integrated AI.

Special-education and assistive learning technology

AI-driven tools for students with learning differences, language disabilities, sensory impairments, and adjacent needs. Framework reading: a category whose customer-pull is real, whose competitive landscape is fragmented, and whose social return is high. The funding pathways (often public, often via specific programmes) are slower but durable. Attack now for serious operators in specific need categories.

AI for high-stakes assessment

University admissions, professional licensing exams, K-12 standardised testing. Framework reading: a category with extreme regulatory and political sensitivity. The marginal entrant faces severe verification and trust requirements. Probably wait unless you have a specific institutional partner; attack with caveats for the augmentation-side rather than the replacement-side of high-stakes assessment.

Adaptive learning platforms

Software that adjusts content sequencing to individual students based on their performance. Framework reading: the technology has been around for two decades; AI augmentation makes it work substantially better. The category's institutional pathway remains hard. Attack with caveats; pick the institutional partner first.

Education for low- and middle-income countries

AI tutoring deployed in regions where traditional teacher capacity is structurally limited. Framework reading: one of the highest-leverage applications of education AI globally. The unit economics work because the alternative (no qualified teacher available) is so much worse than the alternative in wealthy markets. Attack now with conviction; this is one of the most under-priced categories the framework currently identifies.

Teacher training at scale

AI-assisted tools for training new teachers, ongoing professional development, and remediation in regions with teacher shortages. Framework reading: under-addressed by both venture and public funding. The cascade if it works is enormous (a single teacher educates thousands of students over a career). Attack now.

Lifelong-learning and career-transition tools

AI-driven tools for adults navigating career changes, including AI-driven coaching for the upcoming wave of AI-related job transitions. Framework reading: a category whose demand-curve is rising fast and whose supply is fragmented. Attack with caveats; the unit economics depend on whether employers, governments or individuals pay.

Educational research and evidence-base

Public-good infrastructure for measuring what actually works in education at scale. Framework reading: dramatically underfunded relative to the leverage. The replication problem in educational research is severe (see Anti-patterns). The framework would direct substantial public capital here.

What the framework de-prioritises in education

Most standalone consumer AI-tutor apps competing with foundation-model providers' native offerings. Most generic adaptive-learning platforms without strong institutional relationships. Most LMS (learning management system) replacements competing with entrenched incumbents. Most "AI for college admissions" plays. Most education-content marketplaces competing with both AI-generated content and incumbent publishers.

What the framework prioritises that the consensus does not

Tools that increase teacher leverage receive a tiny share of education-technology funding relative to student-facing tools. The framework's reading is that the marginal teacher-capacity gain is the highest-leverage education investment available.

AI tutoring deployed in regions with structurally limited teacher capacity — particularly in low- and middle-income countries with rapid school-age population growth — is dramatically undercapitalised. The unit economics work in these regions in a way they do not in wealthy markets where the alternative is already adequate.

Educational-research infrastructure — the empirical evidence base on what works — is one of the most chronically under-funded research domains. Without robust replication and evidence, the entire field operates on ideology and fashion.

Special-education and assistive learning — the unfashionable category — is undercapitalised relative to its social return.

The credentialing-alternatives layer in non-software domains (healthcare, skilled trades, professional services) is largely unaddressed by the consumer education-technology market.

How to use this guide

If you are an education-technology founder, the most useful exercise is to identify which slow constraint blocks your customer's adoption, not which technology you can deploy. The technology curve is favourable; the institutional curve is not. Solving the institutional problem is the moat.

If you are an investor, the framework directs you toward teacher-leverage tools, low- and middle-income-country deployments, and special-education infrastructure. Pure consumer-facing student-AI plays are increasingly hard to defend.

If you are a public funder, the framework directs you toward educational-research infrastructure, teacher-training capacity in regions with shortages, and the unsexy assessment-and-evidence work that no commercial actor will fund alone.

If you are a teacher or school leader, the framework's most useful contribution is permission to pick a small number of tools that genuinely raise your leverage, and resistance to the much larger number of tools that promise it without delivering. The teacher-leverage tool category is real and rapidly improving. Use the discipline.

— Siri Southwind

Read the framework · Current bets · Anti-patterns · 50 possibilities

Defence

This guide applies the Problem Timing framework to defence and national-security technology. Defence is the field where the framework's standard apparatus collides most directly with the dual-use modification in Dual-use & catastrophic risk. The framework's normal cascade-and-demonstration logic is generally inappropriate for offensive systems and broadly correct for defensive ones; this guide reads accordingly.

The compression. Every major defence cost-curve is moving — drones, autonomous targeting, satellite costs, simulation, AI for command-and-control, hypersonics. The institutional procurement system is not. The gap between what is possible and what is purchased is the dominant feature of current defence allocation, and it produces both the largest opportunities (for new entrants who can route around the procurement system) and the largest risks (the F-35-shaped failures that consume the procurement system's attention).

This guide is written for allocators in non-classified environments. It does not address classified programmes, weapons-system specifics, or operational-security matters. The framework reading is on allocation, not on operations.

The five curves that matter

Cost per autonomous drone. The cost of a quadcopter with useful military payload has fallen by orders of magnitude. The Ukraine conflict accelerated the curve substantially. The cost-trajectory continues to favour the side that masters mass-production of cheap autonomous platforms over the side that fields a smaller number of exquisite systems.

Cost per satellite-derived intelligence product. Satellite imagery, signals intelligence, motion-pattern detection. Per-product cost has fallen with the rise of commercial constellations and the AI-augmented analysis layer. Continues to fall.

Cost of high-fidelity simulation. Military training, mission planning, war-gaming, force-on-force simulation, materiel testing. Foundation-model approaches and the broader AI revolution are making simulation cheaper and more useful for a wider range of problems than was true five years ago.

Cost per AI-augmented intelligence-analysis decision. Automated triage of imagery, signals, open-source data; pattern detection in vast intelligence datasets; AI-assisted analyst workflow. Cost-trajectory is steep; the bottleneck is shifting from data collection to analysis.

Cost per defensive-measure deployed. Anti-drone systems, electronic-warfare capabilities, cyber-defensive infrastructure, hardened communications. The defensive cost-trajectory is improving but lagging the offensive curve in several specific categories — the asymmetry that the dual-use file warns about.

The slower constraints

Procurement. The single most-binding constraint on defence allocation in democratic states. Multi-year acquisition cycles, complex requirements processes, single-supplier lock-ins, congressional or parliamentary politics, cost-plus contracting structures. Reforming defence procurement is famously slow.

Treaties and international law. The legal architecture for autonomous lethal systems, cyber operations, space-based weapons and certain munitions is partially formed and contested. Compliance and ambiguity both have real costs.

Talent. Particularly for AI-and-defence dual-skilled engineers, defence-grade software developers, and analytical talent with security clearances. The supply is limited and growing slowly.

Industrial base. Munitions production capacity, shipyard capacity, specialty-materials capacity, secure-microelectronics capacity. The post-Ukraine experience revealed that several Western industrial bases are smaller and more fragile than the procurement budgets implied.

Allies and coalitions. Defence work is unusually multi-stakeholder. Coordination cost across NATO, AUKUS, the Five Eyes, and bilateral arrangements is substantial.

Specific framework readings

Autonomous drones at scale

Mass-produced quadcopters and fixed-wing autonomous platforms with useful military payload, including loitering munitions, ISR drones, and adjacent platforms. Framework reading: the highest-leverage current category in conventional military technology, and one of the few where new entrants can move faster than the incumbents. The deployment-and-doctrine layer is also rapidly maturing, partly via the Ukraine conflict. Attack now with conviction for serious operators with manufacturing capacity.

Anti-drone systems

The defender-favoured complement of the previous category. Framework reading: the cost-asymmetry currently favours attackers (cheap drones versus expensive defences) and the framework rates closing this asymmetry as a high-priority defensive capability. Specific approaches (kinetic interception, electronic warfare, directed energy, AI-driven detection) carry different verdicts; the AI-driven-detection layer is the most readily addressable. Attack now on the detection-and-classification layer.

AI-augmented command-and-control

Decision support for military commanders, including target identification, course-of-action planning, multi-domain coordination. Framework reading: the technology layer is real; the procurement layer is the binding constraint. The major primes have substantial advantages here that new entrants find hard to match. Attack with caveats; pick the specific application carefully and find a procurement pathway before scaling.

AI-driven ISR analysis

Automated intelligence-surveillance-reconnaissance analysis: triage of imagery, signals, open-source data; pattern detection at scale. Framework reading: a real and growing category with both commercial-data and government-data deployment paths. New entrants can win here on the strength of better models if they can navigate the security-clearance and procurement environment. Attack now selectively for serious operators.

Hypersonics

Hypersonic glide vehicles, hypersonic cruise missiles. Framework reading: a category with substantial public investment and limited civilian cascade. The capability is real, the cost-trajectory is unfavourable, and the strategic-stability implications are concerning. The framework reads this as a category where public funding will continue but where the marginal additional bet is not particularly leveraged.

Space-based ISR and resilient space architectures

Commercial constellations supplying intelligence, the new defence-specific systems, the resilience-and-redundancy work for space-asset survival in a contested environment. Framework reading: the commercial ISR side has consolidated; the defence-specific work is being addressed by both primes and new entrants (Anduril, the Boeing-and-Lockheed offerings, the various smaller constellation operators). Attack with caveats for serious operators; the category has real demand but the marginal team's contribution is constrained by procurement.

Electronic warfare

Jamming, spoofing, signal intelligence collection, electromagnetic-spectrum management. Framework reading: a category whose strategic importance has risen with the Ukraine experience. Capability is there; the deployment-and-integration layer is the bottleneck. Attack with caveats.

Cyber defence (military and critical-infrastructure)

The defensive side of cybersecurity, including industrial-control-system protection, military-network defence, and critical-infrastructure hardening. Framework reading: a defender-favoured category with substantial customer pull and improving cost-trajectories. The framework rates this favourably and notes that public-private collaboration is the bottleneck more than capability. Attack now for serious operators.

Military medical and casualty care

Combat trauma management, autonomous medical evacuation, point-of-care diagnostics, telemedicine for far-forward operations. Framework reading: an under-attended category with substantial dual-use civilian applications. The cascade if specific technologies (autonomous tourniquets, AI-driven triage, portable surgical infrastructure) work is large. Attack now.

Cyber-offensive research without disclosure paths

Already covered in the dual-use file. Framework reading: the framework rates this attacker-favoured and warns against the standard cascade-and-demonstration logic. Coordinated disclosure pathways are the framework-positive version of the same activity.

AI-assisted offensive cyber operations

A category with severe dual-use implications. Framework reading: the framework's recommendation is attack with extreme caveats — the work happens regardless, the question is whether legitimate state actors do it inside coherent doctrinal and accountability frameworks or whether non-state and adversarial actors lead. The institutional architecture for this is still forming.

Alternative procurement models for defence

Other-Transaction Authority contracts, the various rapid-acquisition programmes (DIU, AFWERX, AFRL contracts, the UK and Australian equivalents), prize-based and challenge-based mechanisms. Framework reading: the highest-leverage institutional reform available in defence. New entrants who can deploy via these pathways move materially faster than those locked into traditional procurement. Attack now on building businesses around alternative procurement; attack now on policy work to expand it.

Industrial-base capacity and munitions production

The unsexy work of building the factories, supply chains and skilled workforce required to produce munitions at scale. Framework reading: the post-Ukraine revelation that Western capacity was inadequate has created real public-funding momentum. The category will receive substantial investment through 2025–2030. Attack now selectively for serious operators with manufacturing-and-supply-chain expertise.

Defence applications of biotechnology

Biological surveillance, biosecurity, defensive medical countermeasures, and the harder dual-use questions. Framework reading: the defensive-biotech category is dramatically undercapitalised relative to its strategic importance. The offensive applications are squarely in the dual-use file's attacker-favoured category and should be approached with the modifications that file describes.

Quantum technology in defence

Quantum key distribution, quantum sensing, post-quantum cryptography. Framework reading: post-quantum cryptography is attack now and well into deployment. Quantum sensing is attack with caveats; some applications (gravimetry, magnetometry) are showing real promise. Quantum computing for cryptanalysis remains an open question.

What the framework de-prioritises in defence

Most "next-generation manned fighter" programmes whose institutional logic is from a previous era. Most large-scale procurement that depends on stable multi-decade requirements in a fast-changing threat environment. Most exquisite-system platforms in categories where mass-produced cheap alternatives are demonstrably effective. Most defence-tech startups without a clear procurement pathway. Most claims about AI-driven autonomous weapons that ignore the dual-use modifications.

What the framework prioritises that the consensus does not

Industrial-base capacity for munitions and defensive systems is severely undercapitalised relative to demonstrated demand. Public capital flowing here pays back faster than most other defence allocations.

Anti-drone defensive infrastructure, particularly the AI-driven detection layer, is a mispriced category — the offensive curve is faster than the defensive curve and closing the asymmetry is high-leverage.

Defensive biotech and biosurveillance is structurally underfunded relative to the dual-use catastrophic-risk profile.

Alternative procurement pathways (DIU, AFWERX, the OTAs, the various challenge-based mechanisms) are receiving more attention than five years ago but still process a small share of total defence spend. Expanding them is among the highest-leverage policy reforms available.

Allied-coordination infrastructure — the boring work of making coalition systems interoperable — is undersupplied as a category and rewards patient operators.

Military medical and casualty-care technology is undercapitalised relative to its dual-use civilian cascade. Combat-medicine innovations have substantial peacetime trauma-care applications.

What the possibilities list says about defence and security

The defender-favoured cluster in 50 possibilities sits squarely on this guide's territory: pathogen surveillance (1), nucleic-acid synthesis biosecurity screening (5), AI red-teaming and capability evaluation (6), memory-safe rewrites of critical-infrastructure software (9), post-quantum cryptography migration (10), civilisational food and water reserves (37), auditable open-source voting infrastructure (40), consumer-scale hardware roots of trust (41), asteroid characterisation for planetary defence (43), engineered kill-switches and biocontainment for synthetic biology (49). All ten are defender-favoured, almost all are public-good in form, and all are funded today at a fraction of what the asymmetry would imply.

On scenarios — the home discipline

Defence is the field that gave scenario planning to the rest of the world. Herman Kahn's RAND-era work on nuclear-strategy scenarios (the lineage chapter has the full story) is the founding act, and the corporate scenario tradition that descended from it via Pierre Wack at Shell is a translation rather than an invention. Defence allocators reading the framework should do something the corporate world struggles with: take the scenario discipline you already practise on operational planning and apply it to the technology-investment decisions in this guide. Which procurement pathways are robust across a modal scenario, a contested-Pacific scenario, a European-major-war scenario and a grey-zone-only scenario? Which technology bets pay off in three of the four? The framework's robust-position concept maps onto defence planning more naturally than onto almost any other domain because the discipline is already native to it.

How to use this guide

If you are a defence-technology founder, the most useful question is which procurement pathway will accept your product and on what timeline, not which technology you can build. The technology-versus-procurement gap is the core economic fact of the field. Building for traditional procurement assumes the procurement system will mature; betting on alternative pathways accepts a smaller initial market in exchange for a real path to deployment.

If you are an investor, the framework directs you toward operators who have demonstrated the ability to deploy via alternative procurement pathways, toward defender-favoured categories where the cost-asymmetry currently runs the wrong way, and away from exquisite-system bets that depend on traditional procurement maturing.

If you are a public funder or policy-maker, the framework directs you toward expanding alternative procurement, building industrial-base capacity in undersupplied categories, and the defensive infrastructure (cyber, anti-drone, biosecurity) where the asymmetry currently favours attackers. These are unfashionable and high-leverage.

— Siri Southwind

Read the framework · Dual-use and catastrophic risk · Compute and robotics · Current bets · 50 possibilities


Open questions

A working list of holes in the framework. Things to develop, examples to add, questions not yet resolved. Kept as a single section so the rest of the repository stays clean.

Examples to add

Beyond those in Historical examples:

  • The Encyclopædia Britannica versus Wikipedia.
  • The Tycho Brahe → Kepler arc (a clean brute-force-then-elegance archetype).
  • ENIAC and early electronic computing.
  • The decipherment of Egyptian hieroglyphs (Champollion, Rosetta Stone).
  • Cataloguing the night sky (Hipparchus → Tycho → Hubble → Gaia).
  • The ENCODE project, the BRAIN Initiative, the Human Cell Atlas.
  • The Connectome Project (a live test case — possibly being attacked too early).
  • Materials Project / Open Quantum Materials databases.
  • The Sky Surveys (SDSS, LSST/Vera Rubin).
  • Bibliometric digitisation efforts (Web of Science, Sci-Hub).
  • Carbon capture R&D portfolios as a current case.
  • Fusion programmes (ITER, the new private fusion bets).
  • mRNA platform technology pre-COVID — was it being attacked at the right rate?
  • AlphaGo / AlphaZero / MuZero as a series of demonstration unlocks.
  • The decoding of the Maya glyphs.
  • The Indus Valley script — still undeciphered, framework reading: probably best deferred.
  • The Library of Alexandria precedent as a cautionary tale about fragility.
  • Carl Sagan's Voyager Golden Record as a moonshot with peculiar accounting.
  • Estonia's e-government build — a national-scale brute-force-then-elegance case.
  • The development of the Standard Model of physics.
  • The four-colour theorem as the first major computer-assisted proof.
  • HIV/AIDS treatment development — a window-closing case that was attacked at the right rate.
  • Smallpox eradication.
  • The polio campaigns (still incomplete as of the framework's reference date).

Dimensions to develop further

  • The interaction between dimensions. The current framework treats them as independent for ease of explanation. They are not. Cost trajectory and cascade value, in particular, often correlate. A formal treatment would acknowledge the joint distribution.
  • Calibration. How does someone using the framework get better at it over time? The retrospective stupidity index in Models & scoring is a start; a more rigorous calibration loop would help.
  • The attention dimension. Some problems benefit from sustained attention by the same individuals over decades; some benefit from rotation and fresh eyes. The framework currently does not distinguish.
  • Closing windows that the framework does not yet handle well: indigenous knowledge before language extinction, eyewitness oral history, ecosystems in collapse, knowledge in fragile institutions during periods of unrest.
  • Negative-externality problems. The framework prices benefit but is weak on cost-to-others. Worth a dedicated section.

Models still to develop

  • A worked example of the wait curve with real data — probably AlphaFold or sequencing.
  • A formal real-options treatment with realistic priors over cost-trajectory uncertainty.
  • A prediction-market design specification for problem-solvability contracts. Currently sketched in Moonshots, arbitrage & markets; needs a serious draft.
  • A small, well-tested scoring tool — possibly a spreadsheet or a small web app — that takes a problem description and walks through the dimensions. This would be a test of whether the framework is actually usable; a minimal sketch of the shape follows this list.
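A minimal sketch of the shape such a tool might take, using six of the dimensions, a -2 to +2 qualitative scale, and illustrative verdict thresholds; none of these choices is canonical:

```python
# Minimal command-line sketch of a dimension walkthrough.
# Six dimensions stand in for the full set; the -2..+2 scale and the
# verdict thresholds are illustrative assumptions, not canonical.

DIMENSIONS = [
    ("verification_cost", "How cheap is it to confirm a solution? (-2 very costly .. +2 trivial)"),
    ("cost_trajectory",   "How fast is cost-to-solve falling? (-2 flat .. +2 collapsing)"),
    ("cascade_value",     "What else becomes solvable? (-2 nothing .. +2 whole fields)"),
    ("window",            "Is a window closing? (-2 no horizon .. +2 evidence vanishing now)"),
    ("byproduct_reuse",   "What survives a failed attempt? (-2 nothing .. +2 data/infra/talent)"),
    ("crowding",          "Marginal return of one more team? (-2 saturated .. +2 empty field)"),
]

def walk_through():
    scores = {}
    for key, prompt in DIMENSIONS:
        while True:
            try:
                value = int(input(f"{key}: {prompt} > "))
            except ValueError:
                value = None
            if value is not None and -2 <= value <= 2:
                scores[key] = value
                break
            print("Enter an integer from -2 to +2.")
    return scores

def verdict(s):
    # Rules echo the framework's worked readings: fast cost decline with
    # no closing window is a wait candidate; a closing window or strong
    # by-products justify attacking now; extreme crowding points at a
    # sub-problem instead.
    if s["cost_trajectory"] >= 1 and s["window"] <= 0 and s["cascade_value"] >= 1:
        return "probably wait (the curve is doing your work for you)"
    if s["window"] >= 1 or s["byproduct_reuse"] >= 1:
        return "attack now (window or by-products justify moving early)"
    if s["crowding"] <= -1:
        return "attack a sub-problem instead (the field is saturated)"
    return "open: argue it out dimension by dimension"

if __name__ == "__main__":
    print(verdict(walk_through()))
```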

Philosophers and thinkers to add

  • Hilary Putnam (no clear hook but worth checking).
  • Gregory Bateson on "the difference that makes a difference."
  • Stewart Brand's pace-layering as a way of thinking about which problems sit at which layer.
  • The economics-of-ideas literature: Paul Romer, Charles Jones, Pierre Azoulay.
  • The economic history of ideas literature: Joel Mokyr, Anton Howes.
  • Margaret Boden on the structure of creative problems.
  • The Polanyi brothers (Karl and Michael) on tacit knowledge and the limits of formalisation.

The capacity question

The framework implies that some people should redirect from problems whose costs are about to collapse to problems where their work has more leverage. In practice this is not always feasible. A senior crystallographer is not going to retrain as a machine-learning researcher. Some people will continue to work on problems that the framework rates as low-priority because that is the only work they are equipped to do — and a fraction of that work will turn out to have been more valuable than the framework predicted, partly through mechanisms (tacit knowledge, training the next generation, holding institutional memory) the framework does not score well.

The honest answer is: the framework is for allocators, not for everyone. A funding body, an institute director, a founder, a researcher choosing their next decade's work. It is not a replacement for the work that has to happen anyway.

A more developed treatment of these limits — human-capital constraints, political constraints, and the cases where the framework should not be applied at all — is now in Limits & falsifiability.

The dual-use and ethics layer — partial

Now treated in Dual-use & catastrophic risk, which introduces the four-category classification (benign-default, defender-favoured, attacker-favoured, symmetric) and explains how the framework's standard cascade and demonstration readings invert for attacker-favoured problems. Drawing explicitly on Bostrom's Differential Technological Development.

Still open: a deeper integration of catastrophic-risk reasoning across the field guides, particularly biotech and AI/ML where the dual-use stakes are most concrete. Also open: a working list of defender-favoured problems that the framework would direct unusually high allocation toward (biosecurity surveillance, AI red-teaming, cyber defence, climate monitoring) — currently scattered across the field guides and the current-bets list rather than collected.

Academic paper

Working paper — PDF available. Plain academic formatting, A4, 11 pages. Black on white. For citation, archiving and printing.

A framework for resource allocation under accelerating technology

Siri Southwind. Working paper, draft.


Abstract

The cost of solving things is collapsing. Sequencing a human genome, training a foundation model, predicting a protein structure, simulating a molecule, launching a kilogram, classifying an image, parsing a corpus — every one of these costs has fallen by orders of magnitude in the past two decades, and in many cases continues to fall. This paper argues that the dominant question facing allocators of resource — capital, attention, compute, talent — has shifted from can this problem be solved to when should it be solved, and at what cost relative to where the technology will be in one, three, five or ten years. I propose a framework, Differential Problem-Solving, that promotes the time-dependence of tractability to a first-class variable alongside the importance, tractability and neglectedness of the Effective Altruism tradition. The framework integrates real-options theory, Wright's-law cost trajectories, the Hamming "important problems" tradition, Bostrom's Differential Technological Development, and the scenario-planning tradition associated with Pierre Wack and Royal Dutch Shell. It produces falsifiable verdicts on which problems should be attacked now and which should be deferred, and it requires those verdicts to be made scenario-conditional rather than implicitly anchored on a single forecast. I apply it to a curated set of historical projects to test calibration, and to a list of live bets to test predictive value. The framework is designed to be revisable; it makes specific claims that can be wrong in specific ways. I close with limits, open questions and a proposed protocol for empirical refinement.

Keywords: problem selection, technology forecasting, real options, research allocation, differential development, cause prioritisation, scenario planning


1. Introduction

For most of human intellectual history, the binding constraint on consequential work was capability. Could the thing be done at all? Could you, with the resources available, do it? The question of which problem to attack was secondary to the question of whether a problem was attackable.

This priority has been quietly inverted. Across an increasing number of domains, the cost of attacking a problem of fixed difficulty is on a steep declining curve. A specific bioinformatics question that required a billion-dollar institutional programme in 2003 now requires a graduate-student weekend. A specific protein-structure prediction that occupied a postdoctoral career in 1993 is generated in seconds in 2024. A specific class of natural-language tasks that required a department of linguists and engineers in 2017 is solved by API call in 2026. The pattern is not universal — many problems remain genuinely hard for reasons that compute and AI cannot touch — but the pattern is widespread enough to change the calculus of allocation.

When capability is scarce, the right unit of analysis is the project: pick something, attack it, see if you can do it. When capability is collapsing, the right unit of analysis is the moment. The same problem attacked in 2024 and in 2027 may be entirely different in cost, in scope and in cascade. Allocators who reason as if capability were still scarce systematically misallocate.

This paper proposes a framework for reasoning about that allocation problem. The contribution is not the dimensions individually — almost every dimension I name appears somewhere in the existing literatures of cause prioritisation, technology forecasting, real-options theory or operations research. The contribution is the integration: promoting the time-dependence of tractability to a first-class variable, providing a vocabulary that allocators across domains can share, and producing verdicts that are specific enough to be argued with.

The paper proceeds as follows. Section 2 surveys the prior literature and locates the contribution. Section 3 presents the framework formally. Section 4 develops mathematical models, drawing on real-options theory and optimal stopping. Section 5 applies the framework to a set of historical cases to test calibration. Section 6 makes predictions on live cases and proposes the falsifiability protocol. Section 7 discusses limits. Section 8 concludes.

2. Related work

The intellectual lineage of the framework spans several literatures that have not, to my knowledge, been integrated.

2.1 The problem-list tradition

Hilbert's 1900 lecture in Paris, listing twenty-three open problems in mathematics, is the founding act of explicit problem-allocation thinking (Hilbert, 1900; Yandell, 2002). Stephen Smale's 1998 update extended the gesture, with mixed retrospective accuracy. The Clay Millennium Prize Problems (2000) added explicit financial incentives. The tradition treats problem selection as a discipline in its own right but provides no machinery for timing.

Richard Hamming's 1986 talk You and Your Research is the spiritual ancestor of this paper. Hamming asked: "What are the most important problems in your field? Why aren't you working on them?" The framework here is, in part, Hamming's question rendered tractable for an environment in which "important" and "feasible" have come apart on different time-scales.

2.2 Cause prioritisation

The Effective Altruism tradition has produced the most developed existing framework for problem selection (Ord, 2020; MacAskill, 2015; Open Philanthropy, 2014–). The core formulation evaluates problems by importance (how much value would a solution produce), tractability (how much does an additional unit of resource move the needle) and neglectedness (how few people are already on it). The framework here borrows the ITN triad essentially wholesale and adds a fourth term, timing, treating tractability as a function of when the question is asked rather than as a static input.

The EA tradition has been criticised for various reasons (uncertainty quantification, moral framework choices, neglect of structural factors). The framework here is compatible with most of those critiques and orthogonal to the rest.

2.3 Differential Technological Development

Nick Bostrom's principle of differential technological development — that beneficial technologies should be accelerated relative to dangerous ones rather than allowing technologies to arrive in the order their researchers happen to deliver them (Bostrom, 2014; Bostrom and Ćirković, 2008) — is the most direct ancestor of the present framework. Differential Problem-Solving applies the same move at the level of individual problems rather than whole technologies, and adds time-trajectory analysis as a first-class concern.

2.4 Real options and optimal stopping

The mathematical machinery imported here is largely from finance and operations research. Real-options theory (Trigeorgis, 1996; Dixit and Pindyck, 1994) provides the formal vocabulary for valuing the option to wait under uncertainty. Optimal stopping theory (Wald, 1947; Chow, Robbins and Siegmund, 1971) provides the analytical framing for sequential decisions over an unknown distribution. Both bodies of theory are mature; the contribution here is to apply them explicitly to problem selection rather than to investment timing.

2.5 Cost trajectories and learning curves

Wright's 1936 paper introduced the empirical observation that the cost of producing aircraft fell by a roughly constant percentage with each doubling of cumulative production (Wright, 1936). The pattern has held for an extraordinary range of technologies since: solar panels, batteries, sequencing, semiconductors, satellite launches (Nagy et al., 2013; Lafond et al., 2018). The framework relies heavily on Wright's-law-shape forecasts as the empirical backbone of the cost trajectory dimension. Shannon's information-theoretic framing of communication (Shannon, 1948) is a methodologically adjacent move: both Wright and Shannon insist that quantities others treat as resistant to measurement can in fact be measured, and that the act of insisting changes what becomes possible.

2.6 Innovation and growth economics

The economics of ideas, particularly Romer's endogenous growth theory (Romer, 1990) and the more recent work on declining research productivity (Bloom et al., 2020), provides background for the framework's claim that which problems are attacked is itself a determinant of growth. The Carlota Perez tradition (Perez, 2002) on the installation and deployment phases of technological revolutions provides another useful lens.

2.7 Forecasting and superforecasting

The discipline of calibrated probability estimation introduced by Tetlock and the Good Judgment Project (Tetlock, 2005; Tetlock and Gardner, 2015) provides methodological support for the framework's scoring scheme.

2.8 Scenario planning and the Royal Dutch Shell tradition

Tetlock-style forecasting and scenario planning are usually treated as competing rather than complementary. They are complementary. Tetlock seeks calibrated point estimates against operationalised questions; scenario planning, in the tradition associated with Pierre Wack at Royal Dutch Shell (Wack, 1985a, 1985b), refuses point estimates entirely and instead constructs small sets of plausible, internally-consistent narratives spanning the critical uncertainties. Wack's Group Planning team is best known for the 1972 scenarios that anticipated the 1973 oil shock, repositioning Shell to act faster than its peers when the embargo arrived. The discipline was developed further by Schwartz (1991), van der Heijden (1996) and Kahane (2004), and earlier prefigured in the strategic-defence work of Kahn (1962) at the RAND Corporation.

The framework presented here is single-scenario by default — each of the sixteen dimensions, scored honestly, depends on an unstated forecast about how the world will unfold over the relevant time horizon. The scenario tradition is the natural corrective. The implication, developed in Section 4.4, is that the dimension scores should be computed across a small set of scenarios rather than against a single implicit forecast, and that the framework's portfolio shapes should be re-read in scenario language: patient infrastructure shares are the bets that pay off across the entire envelope of plausible futures; moonshot shares are the bets that pay massively in one scenario and produce useful by-products in the rest; just-early shares are the bets timed against a specific scenario about the cost trajectory and are therefore most fragile if the scenario is wrong.

2.9 The science of science

The recent quantitative literature on which projects produce more, which collaborations are productive, which kinds of papers anticipate breakthroughs (Wang and Barabási, 2021; Fortunato et al., 2018; Park, Leahey and Funk, 2023) is empirical where the present framework is theoretical. The two should converge.

3. The framework

The framework consists of sixteen primary dimensions clustered into four families. The companion repository maintains a fuller canonical list in its dimensions file; the sixteen presented here are those most consequential for typical allocation decisions, and the remainder are useful in specific subdomains. The dimensions are not orthogonal — several correlate substantively — and the framework does not claim to be a calculator. Its purpose is to make the relevant variables explicit so that the implicit why now in any allocation decision can be made articulable, criticisable and revisable.

Two dimensions deserve to be foregrounded. Verification cost (D1) and cost trajectory (D12) are the most important variables in almost every interesting allocation decision today. Cost trajectory captures the framework's central temporal insight. Verification cost captures the generation–verification asymmetry — the fact that AI has made it dramatically cheaper to produce hypotheses, code, designs and analyses than to check them. The bottleneck has moved to the verification side and is staying there.

3.1 Cost and difficulty

(D1) Verification cost. What does it take to confirm a proposed solution actually solves the problem? In 2026 this dimension is doing more work than ever, because generation cost is collapsing while verification cost is essentially flat. A problem whose verifier is itself hard is now substantially worse-positioned than the same problem ten years ago.

(D2) Difficulty today. What does it actually take to solve this with current tools, expressed in money, person-hours, compute and time?

(D3) Required talent density. How concentrated is the relevant expertise, and how transferable from adjacent fields?

(D4) Capital intensity. What fraction of the cost is up-front, sunk, irreversible?

(D5) Coordination cost. How many independent actors must agree, contribute or stay out of the way?

(D6) Physical-resource dependency. What does the project require from the physical world that cannot be substituted away — energy intensity, dependency on specific elements (rare earths, lithium, gallium, helium, isotopes), and permitted physical actions (siting, regulatory consent, environmental impact, dual-use export)? When cognition is cheap, physical-resource dependency is increasingly the binding constraint.

3.2 Value

(D7) Direct value. What does the world look like immediately after the problem is solved?

(D8) Cascade value. What other problems become solvable, cheaper or differently-shaped as a result?

(D9) Demonstration value. What is the value of removing the question of feasibility for an entire category of problems?

(D10) Optionality value. What future actions become possible that are not possible today?

(D11) Decay rate of value. How fast does the value of a solution erode as substitutes arrive?

3.3 Time and curve dynamics

(D12) Cost trajectory. How is the cost-to-solve changing year on year, and on what underlying curve?

(D13) Tractability trajectory. Is the shape of the problem changing because complementary technologies are arriving?

(D14) Window. Is there a closing horizon beyond which the problem cannot be solved at all, or cannot be solved with current evidence intact?

3.4 Strategic context

(D15) Reusability of by-products. If the headline goal fails, are the data, infrastructure, methodology or talent generated still valuable?

(D16) Crowding and neglectedness. How many capable teams are already attacking this, and what is the marginal return of one more?

A separate dual-use weighting applies to a small but consequential set of problems carrying catastrophic-risk asymmetries. For these, the standard cascade and demonstration readings invert: large cascade is a reason for caution rather than confidence, and demonstration value can change sign because demonstrating possibility tells malicious actors the capability exists. The dual-use modification is presented separately in the companion repository rather than as a numbered dimension because it changes how several of the standard dimensions should be read. The framework remains morally neutral on most allocation decisions; on dual-use catastrophic-risk decisions it cannot afford to be silent.

The dimensions interact. A problem with high cascade value but a fast cost-decline trajectory and no closing window is a wait candidate. A problem with modest direct value but a closing window and large by-product reusability is an attack now candidate. A problem with all the right scores but extreme crowding is an attack a sub-problem instead candidate. The framework's job is to make these readings explicit.

3.5 Decision rule

The framework reduces to four allocation moves at any decision moment. Attack now with brute force, with the goal of demonstrating feasibility and producing useful by-products. Attack now with elegance, where the curve is mature and the marginal team's contribution is incremental. Wait, on the bet that the cost will fall and the option will still be exercisable. Or attack a different problem entirely, on the bet that one of the alternatives dominates this one on the relevant axes.

A fifth move — decompose — is often the right one: attack a sub-problem now, defer the rest, on the bet that solving the linchpin sub-problem unlocks the dependency graph cheaply.

The framework is silent on which of these moves is correct in any specific case; it provides the vocabulary for arguing.

4. Mathematical formalisation

This section sketches the formal machinery. The technical content is intentionally light; the aim is to be rigorous about the kind of object the framework is, not to produce a full quantitative model. A more developed treatment is left for future work.

4.1 The waiting decision as a real option

Consider a problem that, if solved at time t, produces value V(t) and costs C(t) to solve. Both are stochastic processes. The allocator may choose to attack at any time t or to defer. The decision problem is:

max_t 𝔼[ V(t) - C(t) - r(t) ]

where r(t) is the opportunity cost of capital and attention deployed to this problem at time t rather than to alternatives.

If C(t) is on a fast declining curve (Wright's-law-shape, with declining variance) and V(t) is roughly stationary, the optimal policy is typically to wait. If V(t) declines (decay of relevance) or has a hard horizon (closing window), the optimal policy is typically to attack early. If V(t) has a step function — a complementary technology arrives at some unknown time and unlocks much higher value — the optimal policy depends on the prior over the arrival time and on the cost of being unprepared.

The classical real-options literature (Trigeorgis, 1996; Dixit and Pindyck, 1994) provides closed-form solutions under various distributional assumptions. The application to research-and-development problems is non-trivial because the underlying distributions are often non-Gaussian and the volatility itself is changing. A detailed treatment of this for specific problem classes is the most natural follow-up to this paper.
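A minimal simulation sketch of this timing decision, assuming a noisy Wright's-law-shaped cost decline, a stationary value subject to a small annual hazard of the problem losing relevance, and a flat discount rate. All parameter values are illustrative rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_net_value(attack_year, n_paths=20_000, horizon=10,
                       v0=100.0, c0=80.0, cost_decline=0.25,
                       cost_vol=0.10, relevance_hazard=0.05,
                       discount=0.05):
    """Expected discounted net value of committing to attack in a given year.

    Cost falls ~25% per year with lognormal noise (a crude stand-in for a
    Wright's-law curve); value is stationary, but the problem survives to
    year t only with probability (1 - hazard)^t, standing in for decay of
    relevance (D11).
    """
    years = np.arange(horizon)
    noise = rng.lognormal(mean=0.0, sigma=cost_vol, size=(n_paths, horizon))
    cost = c0 * (1 - cost_decline) ** years * noise
    # Only the marginal survival probability at the attack year matters
    # for the expectation, so independent per-year draws suffice here.
    alive = rng.random((n_paths, horizon)) < (1 - relevance_hazard) ** years
    payoff = np.where(alive[:, attack_year], v0 - cost[:, attack_year], 0.0)
    return payoff.mean() / (1 + discount) ** attack_year

if __name__ == "__main__":
    for t in range(8):
        print(f"attack in year {t}: E[net value] = {expected_net_value(t):6.1f}")
```

Under these parameters the optimum sits several years out: waiting harvests the cost decline until the relevance hazard and discounting overtake it. Tilting any single parameter (a steeper hazard, a flatter cost curve) pulls the optimum toward attacking immediately, which is the qualitative behaviour described above.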

4.2 The wait curve

A useful informal device is the wait curve: a plot of expected cost-to-solve over time on one axis and probability-weighted value on the other. The right time to attack is approximately where the rate of cost decline first falls below the rate of value decline, adjusted for the expected cascade value of moving early.

In closed-form: attack at time t* such that

(d log C / dt)|t=t* ≥ (d log V / dt)|t=t* - κ

where κ is a positive constant capturing the expected cascade and demonstration value of being among the first to attack rather than last.

The closed form is unhelpful in practice because C and V are uncertain. The wait-curve image, however, is useful as a heuristic: it organises the timing question around the ratio of the two underlying rates of change rather than around either curve in isolation.
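Rendered numerically under invented curves: a cost that decays toward a floor, so that its rate of log-decline slows over time, and a value that decays at a constant rate. The parameter values are illustrative:

```python
import numpy as np

C_FLOOR, C0, LAM = 5.0, 100.0, 0.4   # cost floor, initial cost, decay speed
DELTA, KAPPA = 0.03, 0.10            # value decay rate, cascade premium

def dlogC_dt(t):
    # C(t) = C_FLOOR + (C0 - C_FLOOR) * exp(-LAM * t)
    c = C_FLOOR + (C0 - C_FLOOR) * np.exp(-LAM * t)
    return -LAM * (C0 - C_FLOOR) * np.exp(-LAM * t) / c

def dlogV_dt(t):
    # V(t) = V0 * exp(-DELTA * t), so the log-decline rate is constant.
    return -DELTA

# Attack at the first t where cost is no longer falling fast enough
# to justify waiting: d log C/dt >= d log V/dt - kappa.
ts = np.linspace(0, 25, 2501)
t_star = ts[np.argmax(dlogC_dt(ts) >= dlogV_dt(ts) - KAPPA)]
print(f"t* = {t_star:.1f} years")   # ~9.2 under these parameters
```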

4.3 The portfolio shape

For an allocator managing many bets simultaneously, the framework implies a portfolio strategy rather than a series of independent decisions. Specifically:

A modal portfolio share — the largest single category — should sit on problems in the just early enough zone, where the cost trajectory is favourable but moving early provides demonstration and cascade value.

A moonshot portfolio share — typically five to fifteen per cent — should sit on problems with low probability of success and asymmetric upside, on the bet that the portfolio's expected value is dominated by the rare wins. The shape is Taleb's barbell strategy applied to research allocation: heavy weight on safe positions, a small allocation to convex bets, very little in the middle (Taleb, 2007, 2012).

A patient infrastructure portfolio share — typically five to fifteen per cent — should sit on closing-window and cataloguing problems whose value compounds over decades and cannot be retroactively created.

A deliberately unallocated portfolio share — typically ten to twenty per cent — should remain available for problems that the framework cannot evaluate because they are not yet on anyone's list.

The remainder, fifty to seventy per cent, divides between the modal just-early share and the standard attack-now-with-elegance category that conventional allocation already handles well. The framework's main argument is that the first and third categories — just early and patient infrastructure — are systematically underweighted in most institutional portfolios.
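The shape above, expressed as a check an allocator could run against their own category weights. The bands are those given in this section; the example portfolio is a stylised conventional one:

```python
# Recommended bands from this section, as (low, high) fractions.
BANDS = {
    "just_early_and_elegant": (0.50, 0.70),  # the remainder, including the modal just-early share
    "moonshot":               (0.05, 0.15),
    "patient_infrastructure": (0.05, 0.15),
    "unallocated":            (0.10, 0.20),
}

def check_portfolio(shares: dict[str, float]) -> list[str]:
    """Flag categories whose share falls outside the recommended band."""
    findings = []
    if abs(sum(shares.values()) - 1.0) > 1e-6:
        findings.append("shares do not sum to 1")
    for category, (low, high) in BANDS.items():
        share = shares.get(category, 0.0)
        if not low <= share <= high:
            findings.append(f"{category}: {share:.0%} outside {low:.0%}-{high:.0%}")
    return findings

# A stylised conventional portfolio: heavy on elegant attack-now work,
# nothing deliberately unallocated.
print(check_portfolio({
    "just_early_and_elegant": 0.85,
    "moonshot":               0.10,
    "patient_infrastructure": 0.05,
    "unallocated":            0.00,
}))
```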

4.4 Scenario-conditional scoring

Each dimension score, when the discipline is honest, is conditional on an unstated forecast about how the world will unfold. The cost trajectory of synthetic biology depends on whether biosecurity tightens; the closing window for indigenous-language documentation depends on whether ML-assisted transcription reaches remaining communities before generational hand-off completes; the cascade value of fusion depends on whether grid-scale storage solves intermittency on a different curve. A dimension score that ignores its own conditioning is a number masquerading as analysis.

The discipline borrowed from the scenario tradition (Wack, 1985a, 1985b; Schwartz, 1991; van der Heijden, 1996) is to score each dimension across a small set of plausible, internally-consistent scenarios — three is usually right, four when the field is genuinely contested — rather than against a single implicit forecast. The scenarios should differ on the variables that most plausibly drive the score, not on cosmetic surface features. The output is not a single sixteen-number row but a small matrix: dimensions on one axis, scenarios on the other, with a robustness reading at the foot.

The most useful reading is the shape of robustness. Let S denote the set of scenarios and D(s) the dimension score in scenario s. A bet is robust if D(s) is positive across all s ∈ S; directional if it is positive in some scenarios and negative in others; fragile if it is negative across most. The framework's recommended portfolio shape (Section 4.3) re-reads cleanly in scenario language: patient infrastructure bets are robust by construction; moonshot bets are directional with antifragile by-products that produce useful capital across all scenarios; just-early bets are directional without the antifragile cushion and are the most exposed if the underlying scenario fails to obtain.

This formulation also gives the framework a defensible response to the criticism that it is single-scenario by default. The criticism is correct of any specific application; it is not correct of the framework when the scenario discipline is applied. Most failure modes of the framework, in practice, are failures of the user to make the implicit scenario explicit before scoring.
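A minimal sketch of the robustness reading, assuming per-scenario net scores have already been produced. The scenario names and numbers are invented for illustration:

```python
def classify(scores_by_scenario: dict[str, float]) -> str:
    """Robust / directional / fragile, per the definitions above."""
    values = list(scores_by_scenario.values())
    n_positive = sum(v > 0 for v in values)
    if n_positive == len(values):
        return "robust"          # positive across all scenarios
    if n_positive * 2 < len(values):
        return "fragile"         # negative across most scenarios
    return "directional"         # positive in some, negative in others

BETS = {
    "patient_infrastructure_bet": {"modal": 1.5, "contested": 1.0, "breakthrough": 0.5},
    "just_early_bet":             {"modal": 2.0, "contested": -1.0, "breakthrough": 1.0},
    "single_scenario_bet":        {"modal": -0.5, "contested": -1.5, "breakthrough": 2.0},
}

for name, scores in BETS.items():
    print(f"{name}: {classify(scores)}")
```

The three invented bets land where the text predicts: the patient-infrastructure bet classifies as robust, the just-early bet as directional, and the single-scenario bet as fragile.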

5. Historical calibration

The framework is testable, in part, against the historical record. This section presents brief readings of six cases, three from the vindicated class and three from the misallocated class, with the framework's verdict and the reasoning behind it.

5.1 The Human Genome Project (1990–2003)

Vindicated. Roughly three billion dollars over thirteen years for the first reference human genome.

Framework reading at decision time: high direct value (genomic medicine), enormous cascade value (the entire downstream genomics field), high cost-trajectory uncertainty, demonstration value substantial. Verdict: attack now with brute force.

Framework reading retrospective: vindicated. The HGP did not just produce a genome; it bent the cost curve, generated infrastructure, trained a generation of bioinformaticians and produced the reference assembly that all later sequencing aligns to. The dollar cost is large; the productivity-per-dollar is among the highest in twentieth-century biomedical investment.

5.2 ImageNet (2007–2010)

Vindicated. Fei-Fei Li and team paid for the labelling of fourteen million images via Mechanical Turk for an estimated few hundred thousand dollars.

Framework reading at decision time: low direct value (a benchmark dataset is not directly useful), high cascade value (training data for the next generation of vision systems), demonstration value uncertain, crowding low (the consensus was sceptical), cost-trajectory of compute favourable. Verdict: attack now with brute force, expect the cascade to fire later.

Framework reading retrospective: among the highest-leverage data-creation projects ever undertaken. AlexNet (2012) and the deep-learning revolution that followed were directly enabled by the dataset.

5.3 The Apollo programme (1961–1972)

Contested vindication. Roughly two hundred and fifty billion dollars in current money.

Framework reading at decision time: high direct value (national prestige and Cold War strategic positioning), substantial cascade value (integrated circuits, materials science, software engineering), high coordination cost, large capital intensity, no closing window in the literal sense but a politically-imposed deadline. Verdict: attack now, conditional on the strategic premise.

Framework reading retrospective: contested. The cascade was real but the magnitude is debated. The framework's verdict is positive on the premise that the alternative (not landing on the moon at all in the 1960s) would have produced substantially less of the cascade. Reasonable analysts disagree on this counterfactual by a factor of three or more.

5.4 Hand-tuned chess engines after Deep Blue (1999–~2010)

Misallocated. Significant academic and commercial effort continued on hand-tuned position-evaluation engines for a decade after the 1997 Kasparov match made it clear that brute-force search and, increasingly, hardware would dominate.

Framework reading at decision time: low marginal direct value, low cascade value, cost-trajectory of the alternative (compute) collapsing, crowding moderate-to-high, demonstration question already closed by Deep Blue. Verdict: stop.

Framework reading retrospective: misallocated. The community's institutional path-dependence kept the work going past its useful life. AlphaZero (2017) closed the question entirely, by which point the misallocation was essentially complete.

5.5 Enterprise rule-based NLP (2017–2022)

Misallocated. The global consulting industry spent tens of billions of dollars across thousands of engagements building hand-rolled entity extractors, sentiment analysers and document classifiers.

Framework reading at decision time: cost-trajectory of foundation-model alternatives extremely steep, cascade value of bespoke systems essentially zero, crowding extreme, demonstration value already accumulating elsewhere. Verdict: stop, wait, or buy off-the-shelf.

Framework reading retrospective: misallocated. Most of the systems built between 2017 and 2022 were rebuilt or scrapped between 2022 and 2024 once foundation models commoditised the underlying capability. The aggregate waste is among the largest in the recent history of corporate IT.

5.6 The Iridium satellite constellation (1991–1999)

Misallocated. Roughly five billion dollars for a global mobile-phone constellation that filed for bankruptcy nine months after service launch.

Framework reading at decision time: direct value real but heavily dependent on terrestrial cellular failing to expand, cost-trajectory of terrestrial cellular favourable, capital intensity extreme, reversibility low. Verdict: do not attack at this scale, possibly wait or attack with smaller commitment.

Framework reading retrospective: misallocated. Terrestrial cellular expanded much faster than the planning forecast, hollowing the addressable market. The infrastructure later found a niche under different ownership; the original allocation was a misread of the dominant input curve.

5.7 Calibration discussion

The framework's verdicts on the six cases above accord broadly with the post-hoc consensus, which is unsurprising — the framework was developed in part by examining cases like these. The harder test is on cases where the consensus is wrong: where the framework would have flagged a misallocation in advance, or vindicated a project the consensus rejected. The retrospective stupidity index (see Models & scoring of the broader repository) is the working tool for this exercise.

6. Predictions and falsifiability

A framework that produces no falsifiable verdicts is a mood. This paper, and the broader repository on which it draws, makes specific predictions on live cases. Three classes of prediction deserve naming.

6.1 Specific named bets

In the broader repository, the current bets file lists problems the framework rates as attack now, attack with caveats, probably wait, and open. The list is updated on six-month cycles. Each call is specific enough to be evaluated retrospectively five years from now.

The framework's predictive accuracy on the attack-now and probably-wait lists is one direct test. If the attack-now problems substantially underperform comparable controls over the next five-to-ten years, or the probably-wait problems substantially outperform, the framework's calibration is wrong.

6.2 Named anti-predictions

The current 50 list in the broader repository lists projects the framework predicts will age badly. Specific named entries — sovereign frontier-model programmes, the NEOM Line, current Reality Labs spending levels, specific hyperscaling AI infrastructure bets, NASA's SLS — are each individually wrong if the project succeeds. If a substantial majority of the named entries on that list survive and thrive over the next five-to-ten years, the framework's curve-reading is wrong.

I expect to be wrong on roughly twenty per cent of that list. The point is not perfection; it is specificity.

6.3 Class-level predictions

The framework also makes structural predictions that are independent of any specific case. Three are particularly testable:

Foundation-model commoditisation will continue to compress the value of thin-wrapper businesses. This is happening in 2025; the framework predicts it accelerates rather than stabilises through 2027.

Closing-window infrastructure projects will become more strongly vindicated as their substrates are consumed by AI. The Internet Archive, the Materials Project, the Protein Data Bank, the various long-running cohorts will be valued more in 2030 than in 2024. If they are not, the framework's cascade-mapping is wrong.

Differential allocation will produce measurable advantage at the funder level. Funders who explicitly use timing-aware frameworks should outperform funders who use only conventional cause-prioritisation by a margin large enough to be visible in the data over a decade. This is the strongest and most falsifiable claim. If it is wrong, the entire enterprise is wrong.

7. Limits

The framework has named limits.

It does not price morality, or unknown unknowns, or the political economy of allocation. It cannot distinguish vindicated contrarians from doomed ones in advance. It does not model team quality. It is only partially adequate for catastrophic-risk decisions, where Bostrom's existential-risk literature remains the better tool.

It is also subject to two structural failure modes worth naming explicitly.

The first is false precision: producing a numeric score and treating the number as objective when the dimensions are qualitative. The framework is meant to force articulation, not to produce calculator-style verdicts.

The second is contrarian pose: mistaking being against the consensus for being right. The framework rewards counter-consensus reasoning where the reasoning is better than the consensus reasoning, not counter-consensus positions in general. Most counter-consensus positions are counter-consensus for a reason.

A more detailed treatment of limits and failure modes is provided in Limits & falsifiability of the broader repository.

8. Conclusion

I have argued that the dominant question facing allocators of resource has shifted, in many domains, from can this problem be solved to when, and at what cost relative to alternatives. The shift is consequential. Most institutional allocation systems — public science funding, philanthropic giving, corporate R&D, much of venture capital — were built when capability was the binding constraint and reason as if it still were. The result is a portfolio drift away from optimal that compounds each year as the underlying tractability landscape moves faster than the funding does.

Differential Problem-Solving is one attempt to make the timing question explicit. The framework integrates the time-dependence of tractability with the existing literatures on cause prioritisation, real-options theory, Wright's-law cost trajectories, and the Hamming–Hilbert problem-list tradition. It is not a calculator. It is a vocabulary, a checklist, and a discipline that produces falsifiable verdicts on what to attack now and what to defer.

The framework will be wrong in specific ways. The wrong ways are the test of its value: a framework that cannot be wrong cannot be improved. The current bets and anti-bets in the broader repository are made specifically so that the framework can be evaluated against the world over the next five to ten years.

The world has more important problems than it has people who can recognise which to attack and when. To the extent that this framework helps any of those people argue more clearly, it has done its job.


References

Bloom, N., Jones, C. I., Van Reenen, J., and Webb, M. (2020). Are Ideas Getting Harder to Find? American Economic Review, 110(4), 1104–1144.

Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.

Bostrom, N., and Ćirković, M. M. (eds) (2008). Global Catastrophic Risks. Oxford University Press.

Box, G. E. P. (1976). Science and Statistics. Journal of the American Statistical Association, 71(356), 791–799.

Bush, V. (1945). As We May Think. The Atlantic Monthly, July.

Chow, Y. S., Robbins, H., and Siegmund, D. (1971). Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin.

Deutsch, D. (2011). The Beginning of Infinity: Explanations That Transform the World. Allen Lane.

Dixit, A. K., and Pindyck, R. S. (1994). Investment under Uncertainty. Princeton University Press.

Fortunato, S., Bergstrom, C. T., Börner, K., et al. (2018). Science of science. Science, 359(6379), eaao0185.

Hamming, R. W. (1986). You and Your Research. Bell Communications Research Colloquium Seminar.

Hilbert, D. (1900). Mathematische Probleme. Lecture, International Congress of Mathematicians, Paris.

Kahane, A. (2004). Solving Tough Problems: An Open Way of Talking, Listening, and Creating New Realities. Berrett-Koehler.

Kahn, H. (1962). Thinking About the Unthinkable. Horizon Press.

Knight, F. H. (1921). Risk, Uncertainty, and Profit. Houghton Mifflin.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

Lafond, F., Bailey, A. G., Bakker, J. D., et al. (2018). How well do experience curves predict technological progress? A method for making distributional forecasts. Technological Forecasting and Social Change, 128, 104–117.

Lakatos, I. (1978). The Methodology of Scientific Research Programmes. Cambridge University Press.

MacAskill, W. (2015). Doing Good Better. Avery.

Nagy, B., Farmer, J. D., Bui, Q. M., and Trancik, J. E. (2013). Statistical basis for predicting technological progress. PLoS One, 8(2), e52669.

Open Philanthropy (2014–). Cause Reports and Worldview Investigations. Open Philanthropy Project.

Ord, T. (2020). The Precipice: Existential Risk and the Future of Humanity. Hachette.

Park, M., Leahey, E., and Funk, R. J. (2023). Papers and patents are becoming less disruptive over time. Nature, 613(7942), 138–144.

Perez, C. (2002). Technological Revolutions and Financial Capital. Edward Elgar.

Popper, K. R. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.

Popper, K. R. (1972). Objective Knowledge: An Evolutionary Approach. Oxford University Press.

Romer, P. M. (1990). Endogenous Technological Change. Journal of Political Economy, 98(5), S71–S102.

Schwartz, P. (1991). The Art of the Long View: Planning for the Future in an Uncertain World. Doubleday.

Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656.

Smale, S. (1998). Mathematical problems for the next century. The Mathematical Intelligencer, 20(2), 7–15.

Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. Random House.

Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder. Random House.

Tetlock, P. E. (2005). Expert Political Judgment. Princeton University Press.

Tetlock, P. E., and Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.

Trigeorgis, L. (1996). Real Options: Managerial Flexibility and Strategy in Resource Allocation. MIT Press.

van der Heijden, K. (1996). Scenarios: The Art of Strategic Conversation. John Wiley.

Wack, P. (1985a). Scenarios: Uncharted Waters Ahead. Harvard Business Review, 63(5), 73–89.

Wack, P. (1985b). Scenarios: Shooting the Rapids. Harvard Business Review, 63(6), 139–150.

Wald, A. (1947). Sequential Analysis. John Wiley.

Wang, D., and Barabási, A.-L. (2021). The Science of Science. Cambridge University Press.

Wright, T. P. (1936). Factors affecting the cost of airplanes. Journal of the Aeronautical Sciences, 3(4), 122–128.

Yandell, B. H. (2002). The Honors Class: Hilbert's Problems and Their Solvers. A K Peters.


Working paper. Comments and counter-arguments welcome. The companion Problem Timing repository contains historical examples, ranked lists, field guides, audience-specific one-pagers, the dual-use modification and a live revision protocol with dated commitments.

— Siri Southwind

Try it yourself

Score a problem

The framework's eight-axis rubric, made interactive. The point is not to add the numbers — it is to see the shape. Different shapes correspond to different verdicts, and the framework recognises about a dozen canonical shapes by name.

One scenario at a time. Each set of scores is conditional on an unstated forecast about how the world will unfold. To check robustness, score the same problem two or three times under different scenarios — for example a modal case, a fast-cascade case, and a governance-shock case — and compare the verdicts. Robust positions return the same verdict across scenarios; directional positions invert. The discipline is the Pierre Wack / Royal Dutch Shell tradition described in the lineage chapter.
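To make the comparison concrete, here is a minimal sketch in Python. Everything numeric in it is an assumption: the axis keys are shorthand for the eight sliders listed further down, the flat unweighted sum and the verdict cut-offs are invented for illustration (the rubric's actual weighting is not published here), and the physical-constraint slider is inverted before summing, as the radar caption below describes.

    # Illustrative only: eight axes scored 0-3; the flat sum and the verdict
    # cut-offs are hypothetical, not the framework's own weighting.

    def verdict(scores):
        # The physical slider reads 3 = severely constrained, so invert it
        # before summing; every other axis already points "higher = better".
        adjusted = dict(scores, physical=3 - scores["physical"])
        total = sum(adjusted.values())  # eight axes, max 24
        if total >= 19: return "strong attack"
        if total >= 15: return "attack"
        if total >= 11: return "attack with caveats"
        if total >= 7:  return "wait"
        return "skip"

    # The same problem scored under three scenarios
    # ("physical" is the raw slider value: 3 = severely constrained).
    scenarios = {
        "modal":            dict(verification=2, trajectory=2, magnitude=2, cascade=2,
                                 demonstration=1, window=1, physical=1, emptiness=1),
        "fast cascade":     dict(verification=2, trajectory=3, magnitude=2, cascade=3,
                                 demonstration=2, window=2, physical=1, emptiness=1),
        "governance shock": dict(verification=2, trajectory=1, magnitude=2, cascade=1,
                                 demonstration=1, window=2, physical=3, emptiness=1),
    }

    verdicts = {name: verdict(s) for name, s in scenarios.items()}
    for name, v in verdicts.items():
        print(f"{name:>17}: {v}")

    # Robust positions return one verdict everywhere; directional ones invert.
    print("robust" if len(set(verdicts.values())) == 1 else "directional")

Under these invented numbers the verdict moves from wait to attack as the scenario changes — a directional position in the paragraph's sense, not a robust one.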

How this works in three steps

  1. Pick an example below to load its scores. Read the verdict, watch which axes the framework treats as decisive, and notice the shape on the radar. The colour of the radar polygon tells you the verdict at a glance.
  2. Adjust the sliders to model your own problem. The two starred axes — verification cost and cost trajectory — carry most of the weight in 2026. Watch the verdict and pattern name change as you move them.
  3. Read the named pattern. Each verdict identifies the canonical shape your scores match — closing-window urgency, brute-force-then-elegance candidate, curve-mispriced bet, and so on — as sketched below. Knowing the pattern is more useful than knowing the verdict.
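A toy illustration of step 3. The repository names its canonical shapes but does not publish numeric signatures for them, so the matching rules below are invented; the real boundaries are qualitative.

    # Hypothetical signatures for three of the named patterns; axis keys as in
    # the scenario sketch above (raw 0-3 slider values).
    def pattern(s):
        if s["window"] >= 2 and s["magnitude"] >= 2:
            return "closing-window urgency"
        if s["trajectory"] >= 2 and s["verification"] >= 2 and s["cascade"] >= 2:
            return "curve-mispriced bet"
        if s["demonstration"] >= 2 and s["trajectory"] <= 1:
            return "brute-force-then-elegance candidate"
        return "no canonical shape matched"

    print(pattern(dict(verification=2, trajectory=3, magnitude=1, cascade=3,
                       demonstration=1, window=1, physical=2, emptiness=2)))
    # -> curve-mispriced bet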

Pick a worked example

Twelve cases drawn from the smartest and dumbest lists, the current bets, and a small handful of open questions. Each one loads its eight scores, redraws the radar, and shows the framework's reading. Compare two — say Mendel against hand-tuned chess engines — to see how the same eight-axis rubric produces opposite verdicts.

The eight sliders, each scored from 0 to 3 (all start at 1):

  1. Verification cost: how easy is it to check the answer? (verifier impossible → trivial check)
  2. Cost trajectory: how fast is it getting cheaper to solve? (flat curve → collapsing fast)
  3. Magnitude: if you solved it today, how big is the win? (trivial → world-changing)
  4. Cascade: what does it unlock downstream? (nothing follows → whole field unlocks)
  5. Demonstration value: does proving possibility unlock something? (nothing demonstrated → opens the category)
  6. Window: how fast is the chance closing? (open forever → closing now)
  7. Physical constraint: how constrained by energy, atoms, regulation? (unconstrained → severely constrained; plotted inverted)
  8. Field emptiness: how empty is the field? (saturated → no one working on it)
Outer ring = score of 3, inner = 0. Polygon colour mirrors the verdict. Physical-resource axis is plotted inverted so all axes point in the same direction (outward = good for attack-now).
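The inversion the caption describes, as a sketch. The axis keys are shorthand for the sliders above, not the page's own identifiers:

    # outward = good on every spoke: only the physical-constraint axis is flipped.
    AXES = [
        ("verification",  False),  # verifier impossible -> trivial check
        ("trajectory",    False),  # flat curve -> collapsing fast
        ("magnitude",     False),  # trivial -> world-changing
        ("cascade",       False),  # nothing follows -> whole field unlocks
        ("demonstration", False),  # nothing demonstrated -> opens the category
        ("window",        False),  # open forever -> closing now
        ("physical",      True),   # unconstrained -> severely constrained (inverted)
        ("emptiness",     False),  # saturated -> no one working on it
    ]

    def radar_spokes(raw):
        """Plot-ready spoke values: outer ring = 3, inner = 0."""
        return [3 - raw[name] if inverted else raw[name] for name, inverted in AXES]

    print(radar_spokes(dict(verification=3, trajectory=2, magnitude=2, cascade=1,
                            demonstration=1, window=2, physical=3, emptiness=0)))
    # -> [3, 2, 2, 1, 1, 2, 0, 0]  (physical 3 plots at 0: severely constrained)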
Verdict scale: skip → wait → attack with caveats → attack → strong attack. Pick an example chip or move any slider to begin.

From the framework's own anti-pattern catalogue: do not treat this rubric as a calculator. The numbers are a discipline for noticing — not a substitute for reasoning. False precision is the worst failure mode of any allocation framework.