
In investing, some ideas age gracefully. The notion that data can be a source of competitive advantage is one of them. But as artificial intelligence (AI) and large language models (LLMs) reshape the landscape, and, in particular, as market sentiment has shifted violently over the past few weeks against asset-light, software-based businesses, it’s worth asking: what does a true “data moat” look like today…and is it still as defensible as it once was?
The Four Pillars of a Data Advantage
Not all data is created equal. When evaluating companies through the lens of data moats, four categories stand out, each with its own history, mechanics, and vulnerabilities.
1. Proprietary Data: The Unreplicable Asset
Proprietary data is the classic moat: unique, exclusive, and hard to replicate. In the digital era, the most familiar example is Meta’s social graph: an intricate web of Facebook, Instagram, and WhatsApp user relationships, preferences, and behaviors. With enough interactions, these platforms can predict your interests, sometimes before you’re consciously aware of them. A widely cited 2015 study by researchers at the University of Cambridge and Stanford University found that, given just 300 Facebook “likes,” an algorithm could judge your personality better than your spouse can!
But proprietary data isn’t limited to social media. Pharmaceutical companies, for example, amass vast troves of experimental results: outcomes from clinical trials, molecular folding experiments, and real-world patient data. This information is not only expensive to generate but also protected by regulatory and competitive barriers. Similarly, Tesla’s self-driving program is built on billions of miles of sensor data collected from real-world driving, not available to would-be competitors without years of effort and scale.
The common thread: proprietary data is an intangible asset, often invisible on the balance sheet, but central to a company’s ability to innovate, target, and defend its market position.
2. Continuously Refreshed Data: The Pulse of the Business
If proprietary data is the castle, continuously refreshed data is the moat’s flowing water: dynamic, always moving, and essential for defense. In sectors like lending and fintech, the value of data is directly tied to its freshness. A credit score from three years ago is about as useful as last year’s weather forecast. Lenders now ingest streams of real-time data—transaction histories, geolocation, even mobile phone usage—to assess risk with greater precision.
Online marketplaces and classified platforms are another case in point. The utility of a housing or car listing depends on its recency; stale data is not just useless, it’s misleading. Companies that can ingest, process, and act on the latest information—think Baltic Classifieds Group, Zillow, AutoTrader, or Alibaba—are better positioned to serve customers and outmaneuver rivals.
The shift to dynamic data isn’t new, but AI has raised the stakes. Models now thrive on up-to-the-minute information, and the ability to refresh, clean, and integrate data at scale is itself a competitive differentiator.
3. High-Dimensional Interactive Data: The Messy Goldmine
Some of the most valuable data is also the messiest. High-dimensional interactive data refers to datasets with many variables and two-way interactions: customer service logs, buyer-supplier communications, exception-handling records, and more. Unlike static transaction logs, these datasets capture the nuance of real-world business: how problems are escalated, how negotiations unfold, how exceptions are resolved.
For example, a B2B software company might collect detailed logs of every support ticket, chat, and escalation. Over time, this creates a rich, proprietary dataset that not only improves customer service but also trains AI models to anticipate and resolve issues more effectively. In supply chain management, the interplay between buyers and suppliers (e.g., negotiations, delays, substitutions) generates a high-dimensional record that can be mined for insights and optimization.
The challenge is that this data is rarely “clean.” It is unstructured, context-dependent, and often siloed. But for those who can harness it, the rewards are substantial: better products, stickier customer relationships, and a feedback loop that’s hard for competitors to replicate.
4. Closed-Loop Data: The Feedback Advantage
Closed-loop data links actions to outcomes, creating a feedback loop that is invaluable for learning and improvement. In procurement, for example, tracking which suppliers’ components lasted longer or failed sooner allows companies to refine their sourcing strategies and negotiate better terms. In healthcare, linking treatment protocols to patient outcomes enables more effective, personalized care.
This feedback loop is especially powerful in the age of AI. Models trained on closed-loop data can not only predict outcomes but also recommend actions with a higher degree of confidence. For instance, an industrial equipment manufacturer that tracks maintenance actions and subsequent machine performance can optimize service schedules, reduce downtime, and build a moat around its analytics capabilities.
Closed-loop data is often the hardest to assemble—it requires discipline, integration across functions, and a long-term mindset. But once established, it becomes a self-reinforcing source of advantage.
When Moats Erode: The Limits of Data Defensibility and Why Humans Still Hold the Veto
Many data advantages are not built to last. Static or slow-moving datasets—e.g., scientific journals, legal databases, widely available medical information—are increasingly vulnerable as LLMs ingest and recombine the world’s knowledge. The “mosaic theory” applies: even without direct access to proprietary sources, a model can often piece together enough from public data to approximate much of the value.
This is already visible in areas like legal research and medical diagnostics, where general-purpose AI models can match the performance of specialized, paywalled databases. As the cost of synthesizing and reasoning over large datasets falls, the bar for data-only defensibility rises and pure information moats erode.
Yet in high-stakes fields such as law and medicine, even subtle differences in quality still matter enormously. Trust, accountability, and precision are not optional. Here, the moat is less about exclusive datasets and more about relationships and reliability. Firms such as Wolters Kluwer and RELX pair extensive data assets with decades of trust and deep integration into client workflows. That embeddedness makes them harder to dislodge, even as their underlying information becomes more commoditized.
The more immediate constraint, however, is not model capability but human and organizational behavior. Enterprise software adoption is still an exercise in career-risk management. The old cliché persists because it captures this reality: nobody ever got fired for choosing IBM. Even when an AI-first product is objectively better, it must survive security and governance reviews, procurement cycles measured in quarters, and pervasive fears about data leakage, compliance failures, or an autonomous “agent” behaving unpredictably in production without a clean audit trail.
Incentives often cut against efficiency as well. In large organizations, headcount can translate into status and influence. The executive who “owns” a function may be reluctant to embrace a tool that appears to shrink their domain, especially when the downside risk is personal and the upside is shared across the firm.
Incumbent data providers cannot stand still in this environment. They need to move from selling raw data to selling outcomes and integrated solutions, using AI to automate some of the analysis that used to require manual effort. AI-native entrants like Harvey and Open Evidence have gained early traction by moving quickly and serving enthusiastic adopters while larger vendors retool. But winning pilots is different from winning procurement. At scale, long sales cycles, compliance requirements, and integration work tend to favor vendors with existing contracts, embedded distribution, and established trust.
Meanwhile, open-source projects and API access to frontier models continue to flatten the “intelligence” layer. As core model capabilities become broadly available, advantage shifts away from model quality alone and toward go-to-market strength, governance readiness, workflow integration, and operational reliability. In that world, the ultimate veto still sits with humans: the risk committees, buyers, and users whose preferences—and fears—determine which tools actually make it into production.
The Usual Suspects (and the Usual Risks)
So who is best positioned? Unsurprisingly, the incumbents. Tencent, for example, sits atop a mountain of proprietary, dynamic, interactive, and closed-loop data spanning social, payments, e-commerce, and more. With 1.3 billion users, a unified platform, and control over both the top and bottom of the customer funnel, Tencent can target, personalize, and transact with unmatched precision. Its “mini-program” ecosystem, payment rails, and chat logs create a closed loop that few can rival.
Google, too, combines vast datasets with control over the hardware and infrastructure that power AI. Its reach extends from search and advertising to cloud computing and custom AI chips (TPUs), giving it a multi-layered moat that’s both broad and deep.
But even giants aren’t invincible. Platform shifts, like the rise of new interfaces (think AR/VR), could change how users interact with data and AI, threatening even the most entrenched players. Having a head start is not the same as a guaranteed win. The history of technology is littered with incumbents who failed to adapt when the ground shifted beneath them.
The Bottom Line
Data moats still matter. But in a world where AI can synthesize, infer, and reason with astonishing speed, the nature of the moat is shifting. Proprietary, dynamic, interactive, and closed-loop data remain the gold standard. Yet trust, execution, and reliability are just as critical.
For investors, it’s about understanding not just what data a company has, but how—and how well—it uses it. The best moats are built on a foundation of unique data, refreshed continuously, enriched by real-world interactions, and closed by feedback loops. But moats don’t end with data alone; in B2B enterprise sales, the castle walls are defended by the human elements of trust, reputation, and good old corporate inertia.
In the end, the companies that win will be those that treat data not as a static asset, but as a living, breathing source of insight, one that must be cultivated, protected, and, above all, put to work.
