In the End, It May Just Be Judgment That Matters Most.
An honest assessment of my AI musings and where the puck has moved so very fast — for now.
AI 1.0 is over.
What I’m calling the ChatGPT Age — that initial phase where we all marveled at chatbots that could write poetry and summarize documents — has run its course. We’ve moved at a pace I genuinely didn’t think was possible from mildly useful copilots to agentic interfaces sitting on top of metadata to smart agents that humans direct to — and this is the one that keeps me up at night — semi-autonomous systems of agents that can work independently for days, self-organize, and complete applications, tasks, and entire job functions without much more than periodic check-ins from a human.
AI 2.0 — the Claude Code / CoWork / Codex Era — is a fundamentally different animal.
Before I start writing about what this new era means for software investors and founders — and I have a lot to say — I wanted to check my homework first. I’ve been writing about the impact of AI on application software for 18 months across six posts on this Substack, and I’ve made some big, specific, sometimes uncomfortable predictions. Some of which I’ve been told are bold, reckless, or “VC dreck,” depending on who’s talking.
But there’s another reason I’m doing this exercise. I’m trying to internalize what all of this means while trying not to lose sleep over what happens to a US economy propped up by consumer spending when massive swaths of truck drivers, car drivers, and white-collar workers get displaced by AI. Not to mention a political system that couldn’t survive NAFTA and China entering the WTO. That anxiety is real, and if you don’t feel at least a twinge of it, you’re not paying attention.
So before I tackle what’s next, let’s see if my assumptions about AI 1.0 held up. If not, I should save all of us the time and effort and just stop.
The Core Argument: Why I Said 75% of App Companies Would Disappear
In my very first post — The Three Ages of Applications Software (July 2024) — I made a prediction that a lot of people thought was nuts:
75% fewer employee-facing application software companies. 50% fewer employee-facing application software categories. A return to “winner take most.”
This wasn’t a throwaway number. The reasoning behind it is what matters, and too many people latched onto the headline without processing the actual logic chain.
For 30 years, application software has operated on the same fundamental user experience model: hunt and peck. Users translate what they actually want to accomplish into navigating tabs, views, screens, and fields — what I call the CRUD interface era. Cloud and SaaS made deployment and purchasing dramatically easier, but the core UX — the actual experience of using the software — barely changed. You still had to learn how to use the software. The software never learned how to work for you.
The result? The average knowledge worker uses 10-15 applications. Most of those applications aren’t actually that helpful — they’re repositories and workflow shells that require YOU to do the work of translating your intent into their structure. And companies spend a staggering $750 billion annually on professional services and administrative personnel just to insert their context into the software they’ve purchased. That’s ass-backwards if we’re honest with ourselves.
AI flips all of that. When software can actually do the job — not just support you while you do the job — several things happen simultaneously.
First, users will gravitate to one primary agent interface per function. Not one underlying LLM — one interface. One agentic experience that serves as their primary assistant. Humans have a context-switching constraint that machines don’t. The second a user gets a taste of a genuinely helpful agent, they don’t want to juggle five others. They want that one agent to do more. And in this new world, it’s dramatically easier for software companies to say yes — you don’t need to hire scrum teams to build arrays of screens and workflows. I laid this out in The Implications of the AI Age (August 2024), and this was before MCP came out and accelerated that consolidation dynamic by 1000x.
Second, if you don’t have meaningful context, you disappear as a useful agent — and likely as a company entirely. If your product serves a micro-category without enough meaningful metadata to justify a standalone purchase, you’ve got two endgames: either a broader player absorbs your capability, or your customer builds it themselves.
Third — and this is one I want to make sure lands because it’s playing out exactly as predicted — the return of buy versus build. I talked about this across multiple posts. If an application vendor doesn’t provide meaningful context to you as a customer, or it takes too much time and effort to insert YOUR context into their tool, you’re better off assembling it yourself. Twenty-five years ago, the counter-argument was that building was expensive, painful, and you didn’t have the expertise. Those arguments have largely collapsed. Vibe coding, agent-building platforms, Lovable, Cursor — the lift to build is a fraction of what it was. One of our founders recently built a new product module over a single weekend that his team had spent months debating whether to pursue — without bothering their 75+ strong engineering org. They plan to launch it next week with a 50% upcharge.
And we’re seeing this validated broadly with the explosion of FDEs (forward-deployed engineers, i.e., essentially free professional services) and the massive PS deals that Anthropic, OpenAI, and Palantir are signing with enterprises to custom-build their AI solutions rather than buy packaged software from vendors. Sure seems like the buy vs. build reckoning I predicted has arrived.
Chasing the Moats
If there’s one word that runs through all six of my posts, it’s context. But here’s the thing I didn’t fully appreciate until I looked at these posts together: I’ve spent the last 18 months chasing moats. Each time I thought I’d identified the durable competitive advantage in AI software, it became table stakes or got almost completely commoditized — in months, not years. If you consider the evolution of technology over the last 25 years, nothing has moved this fast. Every moat I identified was real when I wrote about it. And then it wasn’t.
Let me trace the chase:
Moat 1: Metadata (mid-2024). In those early posts, I was hammering on metadata as the moat. More metadata means more context for your agent. More context means better AI. Better AI means a defensible advantage. In AI Unlocks New Super Powers for Founders (November 2024), we were passing instantly on pure LLM wrappers — the metadata moat really mattered. I also predicted that infra providers like Databricks and the hyperscalers would move aggressively up the stack, and that many startups trying to solve the gaps that existed to make AI work correctly would find those gaps filled by the platform providers themselves. Status: commoditized. MCP and LLM-native integrations made everyone’s metadata accessible to everyone else.
Moat 2: Speed and customer insight (early 2025). In The Change Economy (March 2025), I shifted focus to how the speed of product evolution was rewriting every aspect of running a software company. The success formula became Speed of Execution × Level of Insightfulness × Density of High-Quality Talent. JTBD was the organizing framework throughout. Deep customer insight into jobs to be done was the right to play. Status: necessary but insufficient. Speed is the price of entry now, not the moat.
Moat 3: Systems of Action (mid-2025). Then came Schema, Shema: Context for the Win (July 2025), where I said something that surprised even me: my own metadata-as-moat thesis was becoming dead on arrival. Schema doesn’t matter. CRUD doesn’t matter. What matters is what you do with context. Systems of Record have interesting metadata, but Systems of Action actually get work done. Even Satya Nadella was essentially making the same argument about where value was migrating in the stack. Status: correct for about a year, then commoditized. Systems of record fought back with their own agentic layers, and it became so easy to build software that anyone could stand up a system of action in a week.
Moat 4: Judgment (late 2025). And by It’s The End of the (ARR) World and I Feel Fine (October 2025), I’d arrived at what I believe is the real answer — the one that’s not getting commoditized anytime soon. But more on that shortly.
The pattern here is striking. In previous technology eras, a competitive advantage might hold for five or ten years before getting competed away. In AI, I watched moats I wrote about with conviction get commoditized in months. Metadata, integration capability, even the ability to take action on data — all of it became table stakes faster than any technology shift I’ve seen in 30 years. Which is what makes the judgment thesis so important: it may be the first moat in the AI era that actually holds.
The Honest Scorecard
Let me be direct about how I did.
What I got mostly correct:
SaaS consolidation direction — I called the mechanics, not just the headline.
When I wrote about this in July 2024, I wasn’t just saying “there will be fewer companies.” I laid out the specific mechanisms for why: software’s hunt-and-peck UX was fundamentally broken, users were drowning in apps that weren’t actually helpful, and the second AI could genuinely do the work, users would consolidate around one primary agent per function. Adjacent categories would get absorbed or built internally. That reasoning chain — not the headline number — is what I got right, over a year before “SaaS Apocalypse” became consensus.
One agent interface per function — and I want to be precise.
I didn’t mean one underlying LLM. I meant one agentic interface — one primary experience per functional user, with other agents operating as sub-agents underneath. I wrote about this Boss/Helper/Conductor taxonomy in August 2024. MCP didn’t exist yet. When it arrived, it accelerated this pattern by orders of magnitude, but the user behavior I predicted — gravitating to fewer interfaces, not more — was already happening before the plumbing caught up.
Buy versus build returning — and it’s bigger than I originally framed it.
The core logic was simple: if a vendor doesn’t provide meaningful context, or it takes too much effort to insert your context into their tool, just build it yourself. In the on-premise days, we won that debate with “you’re not experts at building software” and “there are no economies of scale.” Both arguments have collapsed. And we’re seeing it validated at enterprise scale — the FDE explosion, the massive PS deals from Anthropic, OpenAI, and Palantir. When the biggest AI companies in the world are running services businesses to help enterprises build rather than buy, that’s the reckoning I predicted playing out at a scale I honestly didn’t expect.
GTM transformation.
I said it was going to be dramatically transformed and made far more efficient — because when you have a product that visibly does a job, there’s less labor required to explain and deploy it. What was actually counterintuitive is how important GTM became — especially on the marketing and brand-building side. We’re seeing pro-sumer type growth working for enterprise-level ASP products for the first time in B2B software history. Brand, distribution, and recall matter enormously when technical defensibility is diminishing — which partly explains these enormous financing rounds for companies with small ARR in a world where public SaaS companies trade at 3x multiples.
Capital efficiency — founders can build dramatically more with dramatically less. We’re seeing companies skip entire funding rounds — going straight from seed to Series B — because the economics are unrecognizable when 80% of your code is LLM-generated, your product ships in weeks, and you don’t need the traditional army of SDRs, SEs, CSMs, and account managers to sell and support something that visibly does the job on its own. The old SaaS efficiency metrics — magic number, cash burn multiple, T2D3 — are becoming relics.
Vertical AI wins (and horizontal gets commoditized fastest).
I predicted that vertical, domain-specific AI companies would emerge as the durable winners — and the logic holds, but the reality is more nuanced than I expected. The biggest early AI breakout stars have been overwhelmingly horizontal: Cursor, Glean, Lovable, Perplexity. The vertical winners — Harvey in legal, Abridge in clinical documentation, EvenUp in demand letters, Supio in PI litigation — are real, growing, and generating serious revenue, but none are in the same cultural conversation yet.

Here’s the thing though: that’s actually a feature of the thesis, not a bug. The horizontal plays won first because they grabbed the wide-surface-area jobs — search, code completion, content generation — i.e., they built moats on speed and UX. And those are exactly the moats now getting commoditized fastest. Cursor built “AI-powered coding UX” and now Claude Code and OpenAI Codex are eating that layer alive. Lovable and Bolt built “vibe code an app” and the foundation model companies are shipping that natively. The vertical winners are scaling slower but they’re building on the judgment layer — legal reasoning, clinical decision-making, regulatory compliance — which is genuinely harder to replicate from a standing start.

And here’s the part nobody talks about: the gross margin profiles are structurally different. Horizontal AI companies are largely reselling foundation model intelligence with a UX wrapper — which means their margins compress as the models get cheaper and the platform companies ship competing features. Vertical AI companies are selling domain-specific judgment and workflow automation where the model cost is a small fraction of the value delivered — a $500 legal memo that costs $0.12 in compute has a very different margin structure than a code completion that’s one pricing war away from free.

My bet: breakout velocity has been horizontal, but durable value — and durable margins — accrues vertical. The vertical winners just take longer to scale because the domains are harder to crack. Give it 18 more months.
JTBD as the organizing framework — it’s no longer just a product methodology, it’s the entire operating system.
In the agentic world, your product strategy is JTBD (what jobs does the agent do?), your pricing is JTBD (new job for a new role = new charge), your adoption metrics are JTBD (how many jobs is the customer actually using?), and your retention signal is JTBD (are they using fewer jobs than they should be?). It’s the one framework that survived every moat I chased. And here’s what’s wild: JTBD is now literally how you build the software. You tell the agents what jobs you want done and they just do it. The framework and the product have become the same thing.
What I evolved on:
Metadata as THE moat
I was right that metadata mattered — harmonizing messy, siloed data into clean ontologies was genuinely hard and genuinely valuable. For about five minutes. MCP essentially became the USB-C of enterprise AI — suddenly any agent could plug into any system and pull structured context without needing a dedicated middleware layer. Integration tools like Merge, Finch, and the foundation model companies themselves commoditized the very access layer I thought would be durable. To my credit, I evolved this in real-time across my own posts — you can literally watch me downgrade metadata from “the moat” to “table stakes” over the course of three essays. The lesson: when you identify something as a moat, start the clock on how fast the ecosystem will commoditize it. In the AI age, that clock runs faster than anyone expects.
Data moats broadly
I still believe data moats exist — but they’re far more fragile than I initially argued, and the fragility is mostly self-inflicted. The companies that closed down API access or tried to monetize their data aggressively (looking at you, Reddit, Twitter/X) discovered that AI companies just… routed around them. The real moat in this initial phase of first-time AI buyers has turned out to be something much more old-fashioned: customer love and brand. When a CFO is choosing their first AI-powered FP&A tool, they’re not running a feature matrix — they’re asking their CFO friends what they use and whether they trust it. That’s a brand and community moat, not a data moat. Whether that holds up once we move from first-time buying to retention and renewal cycles is a genuinely open question — my instinct is that the judgment layer eventually becomes the retention moat, but we’re not there yet for most categories.
Systems of Action as the moat
The concept was dead right — systems that actually do work beat systems that just store data. That thesis held for about a year, which in AI time felt like a decade. Two things killed it as a durable moat. First, the systems of record fought back — Salesforce, Workday, ServiceNow all shipped their own agentic capabilities, and when the incumbent already has the data AND can now act on it, your “action layer” advantage evaporates fast. Second, vibe coding and agent-built software made it so easy to assemble a system of action that there’s no long-term defensibility in just being one — especially when a motivated founder or internal team can rebuild yours in a week using Claude Code or Codex. The system of action isn’t the moat. The judgment about which actions to take, when, and why — that’s the moat. Which is exactly where I ended up.
What I underestimated / missed:
The speed of all of this.
I admitted as much in my November 2024 post — I underestimated how fast customers would embrace consolidation. But even that mea culpa was an understatement. We went from ChatGPT-as-a-party-trick to semi-autonomous agent systems in roughly 18 months. Nobody predicted that pace.

Consider the timeline: in mid-2024 I was writing about metadata as a moat. By early 2025 MCP had commoditized the entire access layer. By mid-2025 Claude Code and Codex were writing production software and founders were vibe coding MVPs in a weekend.

And then the last 60 days happened. Anthropic shipped Claude Code and CoWork — autonomous agents that don’t just write code but collaborate across entire workflows. OpenAI launched Codex as a standalone coding agent. Both companies dropped new foundation models that made the ones from six months ago feel like toys. We’re not talking incremental improvements — capability step-functions arriving every few weeks.

The compression of adoption cycles is the part most people still haven’t internalized. In traditional SaaS, you had years between “interesting demo” and “enterprise deployment.” In the AI age, that collapsed to weeks. A CFO sees a demo, runs a pilot with one analyst, and within a quarter the whole team has shifted. That’s not a normal adoption curve — that’s a phase change. And the knock-on effects are brutal: by the time you’ve shipped your copilot, the market has moved to autonomous agents and your customers are asking why they still need to click buttons.

I got the direction right. I got the magnitude spectacularly wrong. And by the time you read this, even the last 60 days will probably look quaint.
How easy it became for anyone to vibe code application interfaces and prototypes. The absolute speed at which founders — or frankly anyone with taste and a clear sense of the problem — can now spin up working application UIs and prototypes has been staggering. It’s forcing all of us to wrestle with a fundamental question: what is the role of product managers and designers in a world where the technical lift to build an interface is nearly zero but taste still matters enormously? The answer isn’t that product and design don’t matter — it’s that their roles are being completely redefined from specifying screens and wireframes to being the arbiters of judgment, taste, and customer insight. That redefinition happened way faster than I expected.
The ability of founders to spin up self-organizing agent teams.
I’ll be honest — this one caught me off guard in a “fall out of your chair” kind of way. I was thinking about agents as tools. Better tools, smarter tools, but tools. What’s actually happening is founders are deploying teams of agents that self-organize, divide work, execute independently for days, and check in periodically for guidance and course correction — like managing a team of junior employees who never sleep, never complain, and get exponentially better every month. I still remember when Sarah Franklin at Lattice said we need to start thinking of agents as employees and got laughed at for it. She was not wrong. She was early — and the gap between “early” and “right” closed faster than anyone expected.

A solo founder can now spin up an agent team that handles research, writes code, drafts go-to-market materials, and iterates on product — simultaneously — while the founder focuses on taste, strategy, and customer conversations. That’s not a productivity improvement. That’s a fundamentally different model for how companies get built. The org chart used to be founder → first hires → team. Now it’s founder → agent team → first hires when you actually need humans for things agents can’t do yet. And “things agents can’t do yet” is a list that gets shorter every week.

I didn’t predict this at the speed it arrived, and I think most investors, and sadly many founders, still haven’t fully processed what it means for how we evaluate founding teams, burn rates, and time-to-market assumptions.
How Fast This Moved — Including How I Wrote This Post
The underlying models have become dramatically more effective, especially combined with multi-agent architectures. This isn’t like the old ChatGPT days where your chat window slightly improved over a conversation — like a day-4 employee versus a day-1 employee. These new systems return astonishing improvements over their initial output when you give them feedback and direction.
Case in point: this very blog post was created using Claude’s multi-agent capabilities. I set up a team of four specialized agents — a Researcher to catalog my predictions across all six posts, an Analyst to score them against what actually happened, a Thesis Developer to build out the defensibility framework, and a Writer to draft the piece. They self-organized around the work, divided tasks, and produced a first draft. That draft was the starting point.
Was it perfect? Hell no. It was actually pretty funky in places. But here’s what’s fascinating — and this directly illustrates the thesis I’m about to lay out: the more rounds of feedback I gave, the better the output got. Not incrementally better. Substantially better. Which then motivated me to keep investing in giving it feedback and guidance — because I now had trust that it was going to deliver something meaningfully improved. The agents learned my voice. They learned what I cared about. They learned what I meant versus what I said. They made for the best editor and thought partner I’ve had for fleshing out ideas – 50x better than AI could do this just 60 days ago. For the record: it is still too sycophantic. So we added a skeptic agent to the team to fix that. That trust → feedback → improvement → more trust → more autonomy loop? Remember it. It’s the whole ballgame.
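For the technically curious, here is roughly what that pattern boils down to. To be clear, this is a simplified sketch of the idea using the standard Anthropic Python SDK, not Anthropic’s actual multi-agent product, and it is not literally my setup: the agent names, prompts, and model alias are illustrative stand-ins.

```python
# A minimal sketch of the "team of agents" pattern: each agent is just a
# role-specific system prompt, run in sequence, with the accumulated work
# handed forward. Names, prompts, and the model alias are illustrative.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-5"  # substitute whichever model you have access to

AGENTS = {
    "researcher": "Catalog every prediction made in the six source posts, with dates.",
    "analyst": "Score each prediction against what actually happened. Be blunt.",
    "thesis_developer": "Turn the scorecard into a defensibility framework.",
    "writer": "Draft the essay in the author's voice, using everything above.",
    "skeptic": "Challenge weak claims and flag sycophancy before the human sees it.",
}

def run_agent(role: str, instructions: str, context: str) -> str:
    """Run one role-specific pass and return its text output."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        system=f"You are the {role} on a small editorial team. {instructions}",
        messages=[{"role": "user", "content": context}],
    )
    return response.content[0].text

def draft_post(source_material: str) -> str:
    """Chain the agents; each step sees the accumulated work so far."""
    context = source_material
    for role, instructions in AGENTS.items():
        context += f"\n\n--- {role} output ---\n" + run_agent(role, instructions, context)
    return context
```

The real systems self-organize and run for much longer, but the loop is the same: give direction, get output, give feedback, let it run again.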
And here’s the uncomfortable implication: what I just described — the ability to direct, guide, course-correct, and invest in that loop — is itself the differentiating skill now. That’s exactly the type of person we’re all looking to hire across our portfolio companies. Not specialized roles. High-agency people who solve for outcomes. Anyone who has gone deep into vibe coding or Claude Code completely gets the staggering level of capability available now. What I know for certain is that a large swath of employees don’t bring the level of thought, agency, and guidance required to master these tools — and they may become unemployable. That’s sad but very true. The workforce is splitting into those who can direct and guide AI systems and those who can’t.
Where the Real Game Has Moved: The Last Potential Technical Moat for Application Software Companies?
After 18 months of chasing technical moats (because brand and network effects are super powerful ones – see Jake Saper’s recent post), here’s the stack as I see it now — and pay attention to the pattern, because it mirrors exactly what I lived through in real-time across those six posts:
Layer 1: Data Access / Context.
Can you connect to and ingest the relevant data for your customer’s business? Eighteen months ago, this was genuinely hard — getting clean access to a customer’s ERP, HRIS, CRM, and financial systems required months of integration work, dedicated engineering teams, and a lot of begging for API keys. Companies that had built those connectors had a real head start. That’s over. MCP became the USB-C of enterprise AI — a universal protocol that lets any agent plug into any system and pull structured context. Between MCP, the integration platforms like Merge and Finch, and the foundation model companies building native connectors, data access has gone from competitive advantage to table stakes in about a year. If your pitch deck still says “proprietary data integrations” as a moat, you need a new pitch deck.
Layer 2: API / MCP.
Can your systems talk to other systems and take action? This is the plumbing layer — the ability to not just read data but actually do things across systems. Trigger a workflow in one tool based on a signal from another. Update a record, send a notification, kick off a process. Six months ago, companies that had built robust action layers across multiple systems had a real edge. Then MCP standardized it, the foundation model companies shipped native tool-use capabilities, and suddenly any agent could orchestrate across systems without needing a dedicated middleware vendor. The plumbing got commoditized even faster than the data access layer — because once you can read from any system, writing to it is just the next obvious step.
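To make the commoditization concrete, here is a stripped-down, hypothetical sketch of what exposing both reads and actions looks like over MCP, assuming the FastMCP helper from the official Python MCP SDK. The expense-approval tools are made up; the point is how little plumbing is left to defend.

```python
# A rough sketch of why the plumbing commoditized: with MCP, exposing both
# "read my data" and "take an action" is a handful of decorated functions.
# Assumes the FastMCP helper from the Python MCP SDK; the expense logic is invented.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("expense-system")

@mcp.tool()
def list_pending_expenses(employee_id: str) -> list[dict]:
    """Read: return expense reports awaiting approval for an employee."""
    # In a real server this would query your ERP or expense system.
    return [{"id": "exp-001", "amount": 420.00, "status": "pending"}]

@mcp.tool()
def approve_expense(expense_id: str, approver: str) -> dict:
    """Write: approve an expense and kick off reimbursement."""
    # In a real server this would update the record and trigger a downstream workflow.
    return {"id": expense_id, "status": "approved", "approved_by": approver}

if __name__ == "__main__":
    mcp.run()  # any MCP-capable agent can now discover and call these tools
```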
Layer 3: Harmonization / Ontology.
What’s the appropriate ontology for a given business you serve? How do you translate raw data into meaningful structure for the domain? This layer is bifurcating fast. For horizontal use cases — CRM data, marketing analytics, general business intelligence — the models are commoditizing ontology work quickly. Feed Claude a messy pile of sales data and it’ll infer the schema and map the relationships in minutes. But for deep vertical use cases, the ontology is the product — and it’s far more durable than most people appreciate. Healthcare is the clearest example: a claims ontology isn’t just “what fields exist.” It’s understanding that a CPT code means something different depending on the payer, the state, the provider type, and whether it’s a primary or secondary claim. A foundation model can organize claims data into reasonable columns — that’s a schema, not an ontology. The ontology is the meaning layer on top: the business rules, the edge cases, the domain-specific relationships that only exist in the heads of people who’ve spent years in that vertical. Legal, insurance, supply chain — same pattern. The messier and more regulated the domain, the harder the ontology is to commoditize. And here’s what’s interesting: in deep vertical use cases, Layer 3 and Layer 4 start to blur together — because the ontology itself is a form of embedded judgment about what matters and how things relate. Which is exactly why the best vertical AI companies are harder to displace than they look from the outside — for now.
Layer 4: The Judgment Layer.
This is where the real game has moved. And unlike the layers below it, this one is NOT likely to get commoditized anywhere near as fast, I believe. Judgment breaks down into two levels of agent skills.
The first level is domain skills — your product. Your agents come out of the box with skillful capabilities for the domain they serve. A legal AI that knows how to draft a demand letter. An FP&A agent that knows how to build a revenue model. A recruiting agent that knows how to screen candidates. These are your features now. Not buttons. Not workflows. Skills. And they improve over time as you aggregate learning across your entire customer base — every deployment, every interaction, every correction makes the generic skill better for everyone. This is where the network effect lives: more customers → better domain skills → harder for a new entrant to match your baseline capability.
The second level is customer-specific judgment — their way. Every company operates differently. Every user has their own preferences, processes, edge cases, and institutional knowledge about how they run things. So you need a second layer that captures individual feedback and maps the agent’s skills to exactly how this customer or this user wants things done. The CFO who caps projections at 8%. The sales team that routes West Coast deals differently. The legal team that softens the indemnification clause for a specific client. If you get the interface right, customers will just give this to you — and you deliver on it immediately. The feedback-to-value cycle is instantaneous.
And here’s what makes Level 2 so powerful: it’s not a black box. The customer has a visible judgment layer and memory model they can see, inspect, and understand. Their preferences, their corrections, their accumulated decisions — all transparent, all theirs. They know exactly what the system knows about them and how it’s being applied. The returns are immediate and obvious — every correction improves their experience in real time, which motivates more corrections, which improves it further. That visibility is what drives the FEED ME dynamic. Nobody invests in training a system they can’t see learning.
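If it helps to see the shape of it, here is an illustrative toy sketch of the two-level split (not any particular vendor’s implementation): a shared Level 1 skill with defaults learned across customers, and a per-customer Level 2 overlay the customer can inspect. The FP&A example and field names are hypothetical.

```python
# An illustrative sketch of the two-level judgment split. Level 1 skills are
# shared across all customers; Level 2 is a per-customer, inspectable overlay
# of preferences and corrections applied on top of them at run time.
from dataclasses import dataclass, field

@dataclass
class DomainSkill:
    """Level 1: a baseline capability, improved by learning across all customers."""
    name: str
    default_policy: dict

@dataclass
class CustomerJudgment:
    """Level 2: one customer's visible overrides and corrections."""
    customer_id: str
    overrides: dict = field(default_factory=dict)    # e.g. {"growth_cap": 0.08}
    corrections: list = field(default_factory=list)  # audit trail the customer can inspect

    def apply(self, skill: DomainSkill) -> dict:
        """Merge this customer's judgment over the generic skill's defaults."""
        return {**skill.default_policy, **self.overrides}

# Example: a generic FP&A projection skill, shaped by one CFO's 8% cap.
projection = DomainSkill("revenue_projection", {"growth_cap": 0.15, "horizon_quarters": 8})
acme = CustomerJudgment("acme", overrides={"growth_cap": 0.08})
effective_policy = acme.apply(projection)  # {"growth_cap": 0.08, "horizon_quarters": 8}
```

The point of keeping the overlay as plain, visible structure is exactly the FEED ME dynamic: the customer can see what the system knows about them and what each correction changed.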
Your job as the vendor is to do both simultaneously: take the signal from Level 2 across all your customers and continuously improve the Level 1 domain skills that everyone benefits from, while maintaining the customer-specific judgment that makes each individual deployment feel like theirs. The vendors who nail both levels will be extraordinarily hard to displace. The ones who only nail Level 1 will have good products that feel generic. The ones who only nail Level 2 will have personalized experiences built on mediocre skills. You need both. Every other layer in this stack was, at some point, a genuine competitive advantage, but no more. Judgment — at both levels — is probably the last technical moat standing as of today.
Why Judgment Is the Key — To Adoption AND Defense
The judgment layer isn’t just your competitive moat. It’s actually the key to unlocking rapid adoption of AI across all businesses — and it’s an absolute requirement if you want people to actually hire a system of agents that is ultimately coming for the jobs they currently perform themselves.
Think about it. If I’m a manager and you’re asking me to hand over a chunk of my team’s work to an AI system, I need to trust two things. First, that the agents are genuinely skilled at the domain — that the baseline capabilities are there. That’s Level 1. But Level 1 alone isn’t enough, because my company doesn’t operate like every other company. So second, I need to trust that the system reflects my judgment about how that work should be done. Not generic best practices. MY way. My company’s way. The edge cases we’ve learned the hard way. The tribal knowledge that exists only in the heads of our best people. That’s Level 2.
Without both levels, adoption stalls — no one wants to hand over their job functions to a system that’s either unskilled or doesn’t understand how they operate. With both, adoption accelerates — and then compounds. Which brings us to the uncomfortable part.
The Portability Paradox
Swapping out your agent vendor isn’t like replacing one productive employee with another equally smart employee. It’s like replacing a 5-year veteran of your company — someone who knows how things work around here, who knows the shortcuts and the landmines and the unwritten rules — with a brand new hire who doesn’t know jack about any of it. That new hire might be brilliant, might have every raw capability in the world, but they don’t know your way of doing things. And you might not even remember your way anymore, because you outsourced that knowledge to the last agent.
That’s the 5-year veteran problem — and it gets worse over time. When you use an agent layer to automate away jobs you used to do yourself, you lose the knowledge of how to do that task. Your institutional memory erodes. The more you feed the system, the more you lose the ability to operate without it.
And here’s the uncomfortable truth that makes this all click: your data is portable. Your judgment is likely not.
MCP is the USB-C of AI. You can export your data. You can migrate your schemas. Theoretically, everything is portable. But now think about the two levels.
Level 1 — the domain skills — was never yours to begin with. That’s the vendor’s product, built from aggregated learning across thousands of customers. No one expects to export that, any more than you’d expect to take Salesforce’s recommendation engine with you when you leave. But it means that switching to a vendor with worse domain skills is an immediate downgrade — regardless of what data you bring with you.
Level 2 — your customer-specific judgment — feels like it should be portable. It’s visible. It’s yours. You can see your preferences, your corrections, your accumulated decisions right there in the memory model. And customers will absolutely push for portability here, just as they push for data portability today. Regulators will follow. I’m not naïve about that.
But here’s why it’s harder than it sounds. Yes, you can export the structured preferences — the rules you consciously stated, the corrections you deliberately made. That’s the easy part. The hard part is that your Level 2 judgment was built on top of a specific vendor’s Level 1 skills. Your preferences were shaped by how that system responded. Your corrections were calibrated to that system’s behavior. Importing your preferences into a competitor’s system only works if their domain skills are good enough to act on them the same way — and they won’t be, because they were built from different data across different customers. Exporting your preferences to a vendor with different Level 1 skills is like handing your playbook to a team that runs a completely different system. They have the instructions but can’t execute the same way.
And because the judgment layer is visible — because the customer can literally see everything they’ve taught the system — they know exactly what they’d be walking away from. That’s not legal lock-in like an annual contract. That’s cognitive lock-in — leaving means abandoning the judgment layer you built with your own hands.
The Trust → Autonomy Flywheel
What I’ve observed — both in our portfolio companies and in my own experience creating this piece — is a very specific virtuous cycle that operates across both levels simultaneously:
You give the agent feedback → it reflects your judgment better → you trust it more → you let it run more autonomously → it delivers more value → you invest more in training it → it gets even better.
This is the FEED ME dynamic. The best agentic products actively prompt their human managers to inject know-how — preferences, edge cases, process exceptions, tribal knowledge, historical decisions that inform how things should be done. And because the system delivers on that feedback immediately — not next quarter, not after retraining, right now — the motivation to keep investing is obvious. Every correction makes the experience better in real time. That instant return is what powers the loop.
And here’s the part that makes this defensible at scale: every piece of Level 2 feedback doesn’t just improve that customer’s experience — it also feeds back into Level 1. The vendor gets smarter about the domain with every customer interaction. Which means the next customer starts with better baseline skills. Which means they trust the system faster. Which means they invest in Level 2 sooner. Which means more signal flowing back into Level 1. It’s a flywheel within a flywheel.
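A toy sketch of that flywheel-within-a-flywheel, with all names and thresholds invented for illustration: each correction lands in that customer’s visible Level 2 memory immediately, while an anonymized signal accumulates toward the shared Level 1 baseline that every future customer starts from.

```python
# A toy sketch of the flywheel within a flywheel. Names and the promotion
# threshold are invented for illustration, not any vendor's implementation.
from collections import Counter

level1_signal: Counter = Counter()    # aggregated, anonymized cross-customer signal
level2_memory: dict[str, dict] = {}   # per-customer judgment, visible to that customer

def record_correction(customer_id: str, rule: str, value) -> None:
    """Apply the customer's correction immediately, then feed the aggregate signal."""
    level2_memory.setdefault(customer_id, {})[rule] = value  # Level 2: instant, and theirs
    level1_signal[rule] += 1                                 # Level 1: shared learning

    # When enough customers correct the same thing, promote it into the baseline skill.
    if level1_signal[rule] >= 50:
        promote_to_default(rule)

def promote_to_default(rule: str) -> None:
    print(f"promoting '{rule}' into the Level 1 baseline for all future deployments")

record_correction("acme", "growth_cap", 0.08)  # the CFO's rule takes effect right now
```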
Systems of agents amplify this exponentially. When you have not just one agent but a coordinated team of agents — each learning your preferences for their specific domain, each building judgment about how your organization operates — the accumulated institutional knowledge becomes massive. And massively defensible.
When reinforcement learning truly arrives at scale, the winners will have something far more valuable than proprietary data. They’ll have proprietary know-how at both levels — Level 1 domain skills built from thousands of deployments, and Level 2 customer-specific judgment that makes every individual deployment feel irreplaceable. That’s the ultimate moat: accumulated wisdom that gets smarter with every customer and stickier with every interaction. If you can deliver that, it should make the “should I just build this myself” quandary from your customer a much more difficult one.
Your Product Is Never Finished — And That’s the Point
Here’s what’s crazy and also beautiful about this new era: your product is never shipped as “finished.” It’s finished when the full job to be done is actually done in the manner that your customer wants. And that’s the whole point of the two-level system — Level 1 gives you the skilled baseline, and Level 2 lets the customer shape it to their way. What’s cool is that the customer is willing to guide you and tell you exactly what that looks like — because they can see the judgment layer, they can see the system learning, and they can see the immediate returns on every piece of feedback they give.
For 40-plus years, the feedback loop in software has been a black hole. You submit a support ticket. You request a feature. It disappears into a backlog. Maybe it ships in 18 months. Maybe never. You learn to stop asking.
With agentic systems, that loop closes instantly. Tell the agent how you want something done, and it learns. Right now. Not in the next quarterly release — now. That’s Level 2 working in real time. And in the background, your feedback is making Level 1 better for every customer who comes after you. Just think about what this process looked like even 24 months ago — after someone, god forbid, figured out all the admin rules, configuration settings, and user permissions. And thank god you don’t need to understand the tool’s workflow builder with its cascading and often conflicting sets of rules. Sayonara. Just tell me what you want done and how you want it done and we’ll sort all of that out for you. Now.
So What Do You Do About It?
There is a great deal of public gnashing of teeth, essentially à la “oh well, it’s kaput for software companies and I guess Claude, Gemini, and OpenAI will represent/replace 100% of the world’s application software.” That’s some weak AF sauce. Winning big at software has always been very war-like and take-no-prisoners. It’s why when I worked at Siebel and salesforce.com, I used to give new managers on my teams two books: Sun Tzu’s The Art of War (the battle plan) and Daniel Pink’s Drive (how to create warriors). So go out and fight, smartly understanding how the stakes have changed. It’s not like there were strong technical moats for SaaS when the second company in a category showed up — the moats were customer love, network effects, and brand / distribution. Embrace the fact that AI now takes on what wasn’t defensible (building screens) and what wasn’t fun (managing a 1,000-FTE org full of explainers with interminably long feedback loops), at unbelievable velocity.
For Founders
Build Level 1 domain skills that compound.
Your agents need to arrive skilled — not generic. Every customer deployment should make your baseline capabilities better for the next customer. This is your network effect. If your hundredth customer gets the same out-of-the-box experience as your first, you’re not learning from your own deployments and you have a serious problem.
Make Level 2 judgment capture the core of your product.
Not an afterthought. Not a nice-to-have for V2. This is the feature. Make it dead simple for users to inject their preferences, correct the agent, and teach it their way — and deliver on that feedback immediately. The companies that nail this will see dramatically higher engagement and dramatically lower churn.
Own the judgment layer.
Don’t outsource it to a platform. If your orchestration and judgment are sitting in someone else’s infrastructure, you’re renting your moat. Own it.
Make the feedback loop visible.
Show users their judgment layer. Surface the moments where their input made the system better. Make teaching the agent feel like collaboration, not data entry. When users can see the system learning, they invest more — and they never want to leave.
Go vertical and deep.
Find a vertical where judgment is high-value, tribal knowledge is specific, and edge cases are numerous. That’s where both levels of the flywheel spin fastest — the most to learn at Level 1, the most to teach at Level 2, and the most to lose if you switch.
Be prepared to be headless.
You may not be the agent or interface your customer directly uses. A customer might use Claude or another super-agent as their primary interface — one that orchestrates across your product, their calendar, their email, their phone, everything. In that world, your product is a skill that gets called, not an app that gets opened. And that's fine — because all of the rules above still apply. Your Level 1 domain skills still matter. Your Level 2 customer-specific judgment still matters. The FEED ME loop still works — feedback just flows through a different interface. Think Stripe: nobody sees Stripe's UI, but Stripe's accumulated judgment about fraud, routing, and risk is deeply embedded and extraordinarily hard to replace. Build your product so that it's indispensable whether or not you own the screen.
For Investors
Five questions I’m now asking every company we evaluate:
How strong are the Level 1 domain skills — and are they compounding? If the product isn’t meaningfully better for customer 100 than it was for customer 10, there’s no cross-customer learning happening. That’s a product without a network effect.
How deep is the Level 2 judgment capture at month six? If a company’s product isn’t meaningfully smarter about each customer’s specific way of doing things after six months, the FEED ME loop isn’t working.
What’s the switching terror metric? If you pulled the agent out tomorrow, how much institutional knowledge walks out the door? The higher this number, the more durable the business.
How sophisticated is the feedback mechanism? I don’t mean data integrations. I mean the mechanism for capturing judgment and delivering on it immediately. Is it thoughtful and embedded, or is it an afterthought?
How much do they own the orchestration and judgment layer? If the answer is “we use [platform X] for that,” that’s a yellow flag. The judgment layer is the moat. You can’t outsource your moat.
The Comet
There are two sets of context that create real, durable value for software companies in this new era — and if you’ve been following along, you already know what they are.
The first is the vendor’s accumulated cross-customer wisdom — Level 1. What works for companies like yours. What processes drive results in your industry. What pitfalls to avoid. Software that arrives smarter on day one because of everything it’s learned across thousands of deployments.
The second is your specific customer’s organizational context — Level 2. Their preferences, their processes, their tribal knowledge, their exceptions to the rule, their “we’ve always done it this way and here’s the story behind why.”
Both create value. Both create stickiness. Together, they create something that looks a lot less like software and a lot more like an indispensable member of the team — one who has the wisdom of seeing how hundreds of similar companies operate AND the specific knowledge of how YOUR company wants things done.
The speed at which all of this is happening feels like a comet — beautiful and terrifying, moving way faster than anyone expected. What I’ve learned from 18 months of chasing the moats — being wrong about some things, evolving on others, and getting a few big calls right early — is that the game has fundamentally shifted, yet again.
The context era taught us what information matters. The judgment era will teach us who wins.

