In every technology wave the money is made twice — first on the infrastructure, then on the applications.
A compiled, source-verified research digest — every claim cites a downloaded source, every figure is drawn from the data behind it. Not a personal essay.
Historical technology revolutions follow a pattern: an installation phase that over-funds infrastructure, then a deployment phase in which value is realised through broad application and usage.9 AI is mid-cycle. So far, revenue has pooled in the infrastructure layers (semiconductors, hyperscaler capex, cloud), dropping by roughly an order of magnitude at each step up toward applications.6 Two forces are now redistributing it. First, the cost of a fixed level of machine intelligence is collapsing: roughly 10× per year, a 1,000× fall in three years for GPT-3-class output1 and a more-than-280× fall in ~1.5 years for GPT-3.5-equivalent output.3 Second, the foundation-model layer is commoditising as capabilities converge and enterprise share reshuffles.5 The durable moats are moving up the stack to proprietary data, workflow integration, distribution, and the interface,7,13 though a sharp dissent argues the model and inference layer itself captured operating leverage in 2025–26.19
Carlota Perez’s history of technological revolutions gives the canonical map. Each 50–60-year cycle splits into an installation period — when financial capital floods into a new technology, builds its infrastructure, and inflates an asset bubble — and a deployment period, when production capital takes over and the technology’s value is realised broadly across the economy through applications and usage.9 Railways, steel and electricity, and oil and mass production each ran this arc; so did information and telecommunications from 1971.9 The turning point between the two phases is typically a financial crash.9
Two complementary frameworks explain where in a value chain the money settles once a wave matures. The smiling curve, which Ben Thompson applies to technology markets and which Acer’s Stan Shih originated to explain the PC market, holds that value concentrates at the two ends of a chain (differentiated components on one side; brand, aggregation and distribution on the other) while the undifferentiated middle of assembly and integration gets margin-compressed.8 Thompson’s Aggregation Theory sharpens the right-hand end: once the internet makes distribution of digital goods free, “suppliers can be commoditized leaving consumers/users as a first order priority,” and the firm that owns the customer relationship and user experience commoditises the suppliers beneath it.7
Why does the value lag the infrastructure rather than arriving with it? Brynjolfsson, Rock and Syverson’s productivity J-curve gives the peer-reviewed mechanism. General-purpose technologies “enable and require significant complementary investments, including co-invention of new processes, products, business models and human capital” — investments that are largely intangible, mismeasured, and slow to pay off.10 Productivity dips before it accelerates, and the firms that ultimately capture the gains are those that build the complementary assets around the technology, not those that supply the raw technology.10 That is, by construction, the application and workflow layer.
The scale of value in play at that layer is large. McKinsey estimates generative AI could add $2.6–4.4 trillion in annual economic value across 63 use cases, with about 75% of it concentrated in just four functions — customer operations, marketing and sales, software engineering, and R&D.11 That value is realised in how the technology is embedded in work, not in the model weights themselves — which is precisely why the question of which layer captures it matters.
“The real opportunity isn’t in bigger models but in how AI models are applied.”13 (Summit Partners, “Beyond Foundation Models: The Real Value of AI Lies in Applications,” 2025)
Map the AI stack as a vertical chain — semiconductors, hyperscaler infrastructure, cloud AI services, then foundation models and applications — and 2024 revenue falls by roughly an order of magnitude at each step up.6 Semiconductors earned an estimated $100–200B in AI revenue, led by Nvidia at a ~$105B run rate.6 The four largest hyperscalers spent more than $175B on capex over four quarters, roughly half of it on data centres and real estate.6 Cloud AI services did ~$20–25B. AI models and applications together sold only $5–10B — “a STEEP drop off in revenue from semiconductors, to data centers, to the cloud.”6
That distribution is the setup for the whole debate, because it is not obviously sustainable. Sequoia’s David Cahn priced the implied bill: take Nvidia’s run-rate data-centre revenue, double it for total cost of ownership, double it again for a 50% end-user gross margin, and you get the AI end-revenue the build-out implicitly requires.12 By mid-2024 that figure had risen from a “$200B question” to a “$600B question,” with the annual gap between implied and actual revenue widening from a $125B “hole” in September 2023 to roughly $500B in June 2024.12 The infrastructure spend is only justified if the value eventually shows up at the application and end-customer layer.12 The question this paper is really about is whether — and where — it will.
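Cahn’s arithmetic can be sketched in a few lines. This is a minimal illustration, not his published model: the function name is hypothetical, and the run-rate inputs ($50B and $150B) are an assumption obtained by working backward from the $200B and $600B headline figures rather than numbers quoted in this digest.

```python
def implied_end_revenue(gpu_run_rate_bn, tco_multiple=2.0, end_user_gross_margin=0.5):
    """Return the end-user AI revenue (in $B) implied by a GPU revenue run rate."""
    # Step 1: scale chip revenue up to total data-centre cost of ownership.
    data_centre_cost_bn = gpu_run_rate_bn * tco_multiple
    # Step 2: the end user buying that compute needs its own gross margin, so
    # revenue = cost / (1 - margin); at a 50% margin this doubles the figure.
    return data_centre_cost_bn / (1.0 - end_user_gross_margin)


# ASSUMPTION: run-rate inputs are back-solved from the $200B and $600B headlines
# (headline / 4); they are not quoted in this digest. Gaps are the cited "holes".
scenarios = {
    "September 2023": {"gpu_run_rate_bn": 50.0, "gap_bn": 125.0},
    "June 2024": {"gpu_run_rate_bn": 150.0, "gap_bn": 500.0},
}

for label, s in scenarios.items():
    required_bn = implied_end_revenue(s["gpu_run_rate_bn"])
    # Actual revenue implied by the stated gap: required minus the "hole".
    implied_actual_bn = required_bn - s["gap_bn"]
    print(f"{label}: required ${required_bn:.0f}B, implied actual ${implied_actual_bn:.0f}B")
```

On these inputs the required revenue triples while the revenue implied by the stated gaps rises only from about $75B to about $100B, which is why the hole widens.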
The first force redistributing value is the falling price of machine intelligence. a16z’s Guido Appenzeller coined “LLMflation” for it: “for an LLM of equivalent performance, the cost is decreasing by 10× every year.”1 The cost of reaching MMLU-42 performance fell from $60 per million tokens (GPT-3, November 2021) to $0.06 (Llama 3.2 3B) — a 1,000× reduction over three years.1 Stanford’s AI Index puts an independent number on a higher capability tier: inference for a model scoring GPT-3.5-equivalent (MMLU 64.8) “dropped from $20.00 per million tokens in November 2022 to just $0.07 per million tokens by October 2024 — a more than 280-fold reduction in approximately 1.5 years.”3 The drivers are stacked: better GPU cost-performance, 16-bit-to-4-bit quantisation, software optimisation, smaller models surpassing larger predecessors, improved tuning, and open-source competition compressing margins.1
The honest framing is a range, not a single curve. Epoch AI’s task-by-task study found prices falling “between 9× per year and 900× per year, with a median of 50× per year.” The fastest trends began after January 2024; after that date the median rate rose from 50× to 200× per year.2 Different tasks commoditise at different speeds: the price of GPT-4-level performance on PhD-level science questions fell 40× per year, while the price of the frontier itself does not fall at all. OpenAI’s o1 launched at the same $60 per million output tokens as GPT-3 in 2021.1 Chamath Palihapitiya, from the opposite side of the value-capture argument, supplies a corroborating figure: “the price of running a model has dropped 1,500× in six years, and intelligence is becoming free.”16
A frequently-repeated figure holds that the cost of GPT-3.5-level intelligence fell ~10,000× over 2022–2026. No source in this corpus states that. The strongest primary (a16z) supports ~1,000× over three years for GPT-3-quality output;1 Stanford documents >280× in ~1.5 years for GPT-3.5-equivalent output;3 Epoch’s range tops out at 900×/year on specific tasks;2 Chamath cites 1,500× over six years.16 Treat 10,000× as an aggressive extrapolation, not a sourced number. The defensible claim is “roughly 10× per year, two-to-three orders of magnitude over the era — unevenly, by task.”
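One way to compare these figures is to put each on the same annualised basis. The sketch below assumes smooth exponential decline over the stated period, and it treats “2022–2026” as four years; both are simplifying assumptions, and the function name is hypothetical.

```python
def annualised_factor(total_factor, years):
    """Constant yearly price-decline factor that compounds to total_factor."""
    return total_factor ** (1.0 / years)


# (label, total decline factor, period in years) as stated in the text above.
# ASSUMPTION: the unsourced 10,000x claim is treated as spanning four years.
claims = [
    ("a16z, GPT-3-class output", 1_000.0, 3.0),
    ("Stanford AI Index, GPT-3.5-equivalent", 280.0, 1.5),
    ("Chamath, cost of running a model", 1_500.0, 6.0),
    ("Unsourced 10,000x claim, 2022-2026", 10_000.0, 4.0),
]

for label, total, years in claims:
    rate = annualised_factor(total, years)
    print(f"{label}: {total:,.0f}x over {years:g} years is about {rate:.1f}x per year")
```

On these inputs the sourced figures annualise to very different rates (roughly 10×, 43×, and 3.4× per year), which is the unevenness the defensible claim hedges against. The 10,000× figure is what a 10× annual rate would give only if it held for four full years.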
The second force is convergence at the model layer. EQT’s investors put the thesis plainly: “over time the hype around the foundational models will probably subside and they’re likely to become commoditized … a convergence point where all the models are good enough for most standard business applications.”14 Their portfolio companies already “use either open source or just very cheap models because the volume … tends to be very high.”14 Summit Partners describes the same dynamic from the supply side: foundation-model builders “push for better performance at lower costs, often with shrinking margins.”13
The clearest evidence is enterprise share volatility. Menlo Ventures’ usage data shows OpenAI’s enterprise model share falling from 50% at the end of 2023 to 25% by mid-2025, recovering only to 27% by year-end; Anthropic rose from 12% to 40% and Google from 7% to 21% over the same window.5,20 A layer where the leader can lose half its share in eighteen months is a contested input, not a fortress.
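The size of the reshuffle is easier to see as point changes and relative changes side by side. This is a small illustrative calculation on the shares cited above; the helper name is hypothetical, and the start and end values are read as the two ends of the cited window.

```python
def relative_change(start_share, end_share):
    """Change in share as a fraction of the starting share."""
    return (end_share - start_share) / start_share


# (share at start of window, share at end of window), in percent, as cited above.
shares = {
    "OpenAI": (50.0, 27.0),
    "Anthropic": (12.0, 40.0),
    "Google": (7.0, 21.0),
}
OPENAI_MID_2025_TROUGH = 25.0

for provider, (start, end) in shares.items():
    points = end - start
    print(f"{provider}: {points:+.0f} points, {relative_change(start, end):+.0%} relative")

# The "lost half its share" claim is the start-to-trough move for OpenAI.
print(f"OpenAI start to trough: {relative_change(50.0, OPENAI_MID_2025_TROUGH):+.0%}")
```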
Commoditisation does not mean buyers always chase the cheapest token; multi-model orchestration enters as a hedge against lock-in. Menlo’s data shows churn happening within providers more than between them: 66% of enterprises upgraded models inside their existing provider, 23% made no switch, and only 11% switched vendors. Within a provider, upgrades are fast: within a month of its release Claude 4 had taken 45% of Anthropic users, while Sonnet 3.5 dropped from 83% to 16%.20 Builders chase performance fast, but they hold the option to move. The same study records the compute centre of gravity shifting from training to inference: 74% of startups now run majority-inference workloads, up from 48% a year earlier.20 Independent of any one report, Innovation Endeavors describes models becoming “10× cheaper, faster, and more capable year over year,” with capability churn rapid enough that models are characterised as becoming obsolete within weeks.18
If models are a converging, cheapening input, defensibility has to come from somewhere else. The corpus is consistent on where: proprietary data, workflow integration, distribution, and control of the interface. EQT’s conclusion is that “the value is likely to accrue to the application layer and the product companies,” with the best-positioned firms being “those with pre-existing contracts, proprietary data or physical infrastructure.”14 Summit Partners frames durable advantage as a property of application design — narrow domain focus, measurable ROI, and integration with existing workflows — not model access.13 The Brynjolfsson J-curve is the academic foundation underneath both: the returns flow to the complementary intangible assets built around the technology.10
The counter-tension is essential and the corpus does not paper over it. M Accelerator’s case against thin “wrappers” is blunt: “AI wrappers don’t have moats because anyone can call the same APIs you’re using — your entire business model is one OpenAI update away from irrelevance.”15 Better prompts are discoverable, superior UI does not prevent switching when costs drop, and first-mover advantage is irrelevant when switching takes minutes; real defensibility appears only at the layers of proprietary data accumulation and network effects.15 The application-layer thesis is therefore conditional: value is real where there is a hard problem, proprietary data, and deep workflow lock-in — not for the generic middle.
Figure: the smiling curve, applied to AI. Value concentrates at the ends of the chain; the undifferentiated middle compresses.8
Differentiated components (down the stack): scarce inputs such as leading-edge chips, fabrication equipment, energy, and HBM. Chamath’s “fulcrum assets.”16 Captures value through scarcity and chokepoint control.
Customer relationship and workflow surface (up the stack): Aggregation Theory’s prize, which is to commoditise suppliers and own the user.7 Proprietary data and lock-in.14
Adobe is the most-cited example of an incumbent monetising AI through an existing product surface rather than as a standalone model. On its Q1 FY’25 call, CEO Shantanu Narayen described a three-stage approach (innovate, track usage, then “ensuring value and monetization”) and argued that “when somebody buys Creative Cloud or when somebody buys Document Cloud, in effect, they are actually monetizing AI.”17 AI-first products generated “more than $125 million” exiting Q1, a figure expected to double by the end of fiscal ’25.17 The mechanism, value captured at the application and interface layer rather than the model, is exactly the thesis.
A widely-circulated line attributes to Narayen the words “we make our money at the interface layer.” That verbatim phrasing was not found in the cited Diginomica reporting or any source located for this corpus, and is treated as unverified.17 The same article does not contain a “$5B AI-influenced ARR” figure (Adobe’s $5.71B is total Q1 revenue; AI-first products were “>$125M”). Use the verbatim quotes above, not the paraphrase.
The clearest single proof that baseline intelligence is commoditising is that it now runs on a phone. Microsoft’s Phi-3 technical report introduces phi-3-mini, a 3.8-billion-parameter model trained on 3.3 trillion tokens, “whose overall performance … rivals that of models such as Mixtral 8×7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.”4 Quantised to 4 bits it occupies roughly 1.8GB; the authors deployed it on an iPhone 14 with the A16 Bionic chip “running natively on-device and fully offline,” at more than 12 tokens per second.4 The “runs on a phone” claim, unlike the 10,000× claim, is fully verified.
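The 1.8GB figure is roughly what back-of-envelope arithmetic on parameter count and bit width predicts. The sketch below counts weights only; it assumes every parameter is stored at the stated precision and ignores quantisation metadata, activations, and the KV cache, so it is an approximation rather than the report’s own accounting.

```python
def weight_bytes(n_params, bits_per_weight):
    """Storage for the weights alone, assuming uniform precision."""
    return n_params * bits_per_weight / 8


PHI3_MINI_PARAMS = 3.8e9  # parameter count cited above

for bits in (16, 4):
    size = weight_bytes(PHI3_MINI_PARAMS, bits)
    # Report both decimal gigabytes and binary gibibytes, since sources differ.
    print(f"{bits}-bit weights: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

At 4 bits the weights come to about 1.9 GB in decimal units, or just under 1.8 GiB, in line with the figure quoted from the report. The same model at 16 bits would need four times as much.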
This is reinforced by the underlying unit-economics trend. Stanford’s AI Index records ML hardware costs for a fixed performance level dropping 30% per year while energy efficiency improves 40% annually; the B100 is 33.8× more energy-efficient than the 2016 P100.3 When frontier-2022 capability fits in 1.8GB on commodity silicon and the silicon itself gets 30% cheaper a year, the pricing power of the model layer is eroded from below as well as by competition.4,3
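The compounding behind those annual rates can be sketched as follows. The function names are hypothetical, and the sketch assumes the 30% and 40% rates hold constant from year to year, which the source does not guarantee.

```python
import math


def relative_cost(years, annual_decline=0.30):
    """Hardware cost for fixed performance, relative to today, after `years`."""
    return (1.0 - annual_decline) ** years


def years_to_reach(target_fraction, annual_decline=0.30):
    """Years until cost falls to `target_fraction` of today's level."""
    return math.log(target_fraction) / math.log(1.0 - annual_decline)


def relative_efficiency(years, annual_gain=0.40):
    """Energy efficiency relative to today after `years` of compounding gains."""
    return (1.0 + annual_gain) ** years


for n in (1, 3, 5):
    print(f"after {n} years: cost x{relative_cost(n):.2f}, "
          f"efficiency x{relative_efficiency(n):.2f}")
print(f"years for cost to halve: {years_to_reach(0.5):.1f}")
```

At a steady 30% annual decline, the cost of a fixed level of hardware performance halves in a little under two years.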
The application-layer thesis is the corpus’s center of gravity, but it is contested from two directions, and intellectual honesty requires holding both. The first dissent says value pools down, not up. Chamath argues computing eras are won by those controlling fulcrum assets — chokepoints where value concentrates — invoking the pattern of “Rockefeller had 90% of refining by 1880, Cisco had 85% of routing by 2000,” and locating today’s chokepoints in chip-manufacturing equipment, specialty materials, and energy rather than applications.16
The second dissent is sharper still: the model and inference layer itself captured the value in the current window. Clarice Qiu’s “The Year the Commodity Layer Ate the Stack” argues that under specific 2025–26 conditions — high-willingness-to-pay workloads (Claude Code reaching $2.5B ARR in nine months), persistent quality gaps favouring frontier models, and serving-stack innovation cutting cost-per-token faster than retail prices fell — the supposed commodity middle became “the operating-leverage layer,” like oil refining when demand outran capacity.19 Crucially, her conclusion does not rescue the thin middle: the most-exposed position is “applications with no proprietary data, workflow lock-in, distribution advantage, or control over failure modes,” squeezed by labs capturing margin from above and cheap inference from below.19
Reconciled honestly, the evidence supports a layered conclusion rather than a slogan. Value pools at the ends of the chain — scarce physical inputs at the bottom, owned data, workflow and interface at the top — consistent with the smiling curve.8 The model layer is genuinely contested: converging toward “good enough” for standard tasks,14 yet capable of capturing operating leverage where quality and willingness-to-pay stay high.19 What is consistent across every source — bull and bear — is that the thin, undifferentiated middle has no durable claim on the value, and that enterprise spend is, for now, splitting roughly evenly between applications and infrastructure as the deployment phase gets underway.5
Whether 2025–26 model-layer operating leverage19 is a durable structural shift or a temporary spread that closes as capacity catches up is unresolved in the corpus. Qiu herself flags that semiconductor process nodes create stickier constraints than the temporary refining spreads in her analogy. Resolving it would require margin and pricing data the saved sources do not contain.