The Shelf Life of Private Company Data (And Why Most of What You're Paying For Is Already Wrong)

Private company databases advertise millions of profiles. But how much of that data is actually current? We dug into the freshness problem and found that the majority of paid data decays within 6 months of collection.

Vlad Shostak

· 7 min read

[Figure: data freshness decay curve for private company financial information]

A $35,000 Annual Subscription With a 6-Month Expiration Date

A PE associate at a well-known lower-middle-market fund told me something last quarter that stuck. She pulled a target's revenue from a paid database, used it in her IC memo, and then discovered during QofE that the actual number was 40 percent higher. The database figure was from a 2022 survey response. The company had grown significantly in the interim. The memo went to committee with stale data underpinning the valuation range.

Nobody caught it until diligence. The deal nearly died over the gap between expected and actual EBITDA because the team had anchored on wrong numbers from a "trusted" source.

This is not an isolated incident. It is the default experience for anyone relying on static private company databases in the lower middle market.

How Private Company Data Actually Gets Collected

Most people do not think about the supply chain behind the numbers they are using. It matters.

The major private company data providers collect financial data through a mix of methods, each with a different reliability and freshness profile.

Self-reported surveys. Companies receive annual emails asking them to update their profiles. Response rates are typically 15 to 30 percent. The companies that respond tend to be larger, more institutionally oriented, and more interested in being found by investors. The ones that do not respond (the majority) retain whatever figure was last captured, which might be 2 or 3 years old.

Algorithmic inference. When survey data is unavailable, providers use models that estimate revenue from headcount, web traffic, and other proxy signals. These models are updated on varying schedules. Some refresh quarterly. Some annually. The model outputs carry a timestamp but are often displayed without any indicator of confidence or freshness.

News and filing extraction. Some data comes from press releases, SEC filings (for companies with public debt), state filings, or court records. This data is accurate at the time of capture but rarely updated once captured.

Human research teams. Premium tiers sometimes involve analysts manually researching companies. This produces the highest quality data but scales poorly. Even the largest providers cannot manually refresh more than a fraction of their total coverage annually.

The Freshness Problem by the Numbers

We looked at this systematically across several hundred lower-middle-market companies (the $10M to $150M revenue band targeted by most PE firms in our ideal customer profile).

| Data Source | Median Age of Revenue Figure | Coverage in $10M to $150M Band |
| --- | --- | --- |
| Paid databases (survey-based) | 14 to 18 months | ~20 to 35% have any revenue data |
| Paid databases (model-based) | 12 to 16 months | ~45% have a figure, many are inferred |
| Business registries and aggregators | 20+ months | ~60% have something but accuracy is low |
| LinkedIn (employees) | Real-time | ~90% have countable profiles |
| USASpending (contracts) | Real-time | ~12% have government contracts |

The critical number is the median age. Fourteen months might not sound terrible, but for a company growing 20 to 30 percent a year (common in the LMM), fourteen months of staleness means actual revenue is roughly 24 to 36 percent higher than what the database shows once growth compounds. That can be the difference between a $40M company and a $52M company. It changes the multiple math. It changes the comp set. It changes whether the company fits your mandate at all.
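The staleness arithmetic is easy to check for yourself. The sketch below is a minimal illustration, assuming steady compound growth (which real companies rarely deliver); the function name and the $40M starting figure are illustrative, not taken from any provider's methodology.

```python
def revenue_drift(stale_revenue: float, annual_growth: float, months_stale: float) -> float:
    """Project a stale revenue figure forward to today.

    Assumes growth compounds at a constant annual rate, which is a
    simplification for illustration only.
    """
    return stale_revenue * (1 + annual_growth) ** (months_stale / 12)


if __name__ == "__main__":
    database_figure = 40.0  # $M, hypothetical stale figure
    for growth in (0.20, 0.30):
        current = revenue_drift(database_figure, growth, months_stale=14)
        gap = current / database_figure - 1
        print(f"{growth:.0%} growth, 14 months stale: "
              f"${current:.1f}M today, {gap:.0%} above the database figure")
```

The useful habit is to run this before trusting any figure: take the as-of date, take a plausible growth range, and see how wide the band around the database number really is.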

The coverage illusion

Providers advertise "millions of company profiles" but the relevant number for a PE buyer is what percentage of their actual target universe has current, accurate revenue data. In the $10M to $150M band, that number is typically 20 to 35 percent for any single provider.
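One way to measure this for your own target list is to count only the profiles whose revenue figure exists and is recent enough to use. The sketch below assumes a hypothetical `Profile` record with a revenue figure and the date that figure describes; the field names, the 12-month threshold, and the sample companies are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class Profile:
    name: str
    revenue_musd: Optional[float]   # None when the provider has no figure
    revenue_as_of: Optional[date]   # period the figure describes, not "last updated"


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def effective_coverage(universe: Sequence[Profile], today: date,
                       max_age_months: int = 12) -> float:
    """Share of the target universe with a revenue figure no older than max_age_months."""
    if not universe:
        return 0.0
    fresh = sum(
        1 for p in universe
        if p.revenue_musd is not None
        and p.revenue_as_of is not None
        and months_between(p.revenue_as_of, today) <= max_age_months
    )
    return fresh / len(universe)


if __name__ == "__main__":
    # Made-up sample universe, for illustration only.
    universe = [
        Profile("Alpha Services", 42.0, date(2024, 9, 1)),
        Profile("Beta Logistics", 18.0, date(2022, 6, 1)),
        Profile("Gamma Controls", None, None),
        Profile("Delta Staffing", 75.0, date(2023, 11, 1)),
    ]
    today = date(2025, 1, 1)
    any_figure = sum(p.revenue_musd is not None for p in universe) / len(universe)
    print(f"Has any figure: {any_figure:.0%}")
    print(f"Has a figure under 12 months old: {effective_coverage(universe, today):.0%}")
```

The gap between the two printed numbers is the coverage illusion in miniature: a profile with a figure is not the same as a profile with a usable figure.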

Five Gotchas to Watch For

1. "Revenue" Might Mean Different Things

Some databases report total revenue. Others report estimated ARR for SaaS companies. Others report a "revenue range," which is a bucket (e.g., "$25M to $50M") rather than a point estimate. When you see a revenue figure, you often cannot tell whether it includes one-time project revenue, whether it is net or gross, or whether it accounts for recent divestitures or acquisitions. The number looks precise but the methodology behind it is opaque.

2. Employee Count Figures Lag Reality

Most databases scrape LinkedIn on a monthly or quarterly schedule and cache the result. But LinkedIn itself shows real-time data. If you are relying on a database for headcount, you are seeing last month's (or last quarter's) snapshot. For rapidly growing or shrinking companies, this matters. Go to the source directly.

3. "Last Updated" Does Not Mean "Verified"

Many platforms show a "Last Updated" timestamp that refreshes whenever their algorithm re-runs, even if the underlying data did not change. A company profile that says "Updated January 2025" might simply mean the algorithm looked at it in January and decided the old figure was still its best guess. It does not mean anyone verified the number or that new information was incorporated.

4. Growth Rates Are Calculated From Stale Baselines

If a database shows "25% YoY growth," it is typically calculated from two stale data points rather than from real-time information. A revenue figure from 18 months ago compared to one from 30 months ago gives you the growth rate for a twelve-month window that closed a year and a half ago, not current growth. This is worse than useless for deal evaluation because it creates false confidence in a precise-looking but outdated number.
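To make the stale-baseline problem concrete, the sketch below computes an annualized growth rate from two dated observations and reports how long ago the measured window closed. The function name and the $20M and $25M inputs are hypothetical, chosen only so the result matches a "25% YoY" label.

```python
def stale_growth(rev_old: float, rev_new: float,
                 age_old_months: float, age_new_months: float) -> tuple[float, float]:
    """Return (annualized growth rate, months since the measured window closed).

    Ages are how many months ago each figure was captured, so the older
    observation must have the larger age.
    """
    if age_old_months <= age_new_months:
        raise ValueError("the older observation must have the larger age")
    span_years = (age_old_months - age_new_months) / 12
    annualized = (rev_new / rev_old) ** (1 / span_years) - 1
    return annualized, age_new_months


if __name__ == "__main__":
    rate, closed_months_ago = stale_growth(20.0, 25.0, age_old_months=30, age_new_months=18)
    print(f"Growth shown: {rate:.0%}, for a window that closed "
          f"{closed_months_ago:.0f} months ago")
```

Nothing in the "25%" tells you what happened in the most recent 18 months, which is the period a live deal actually depends on.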

5. Survivorship Bias in Coverage

Companies that actively maintain their profiles on data platforms tend to be companies seeking investor attention. Companies that do not (founder-owned, family-owned, profitable businesses that do not need capital) are exactly the targets most LMM PE firms want, and they are exactly the ones with the worst coverage. Database coverage is inversely correlated with a company's attractiveness to buy-and-hold acquirers.

The Alternative Mental Model

Instead of thinking about private company data as something you look up in a database, think about it as something you triangulate from real-time signals. The distinction matters because triangulation gets fresher over time (as underlying signals update) while databases get staler.

Real-time signals that remain current.

  • LinkedIn employee count (updates daily as people join and leave)
  • Job postings (real-time hiring velocity)
  • Government contract awards (updated within days of award)
  • State filings (annual but predictable refresh cycles)
  • Press releases and news (captured as published)
  • Web traffic trends (continuously measured)

The key insight is that these signals are always available, always current, and in combination they produce revenue estimates that are more accurate than a single stale database figure. This is not controversial. It is just arithmetic applied to the freshness problem.
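As a rough illustration of what triangulation can look like, the sketch below combines several independent revenue estimates into one number, weighting each by how fresh it is. This is a generic freshness-weighted average and not a description of any particular product's method. It assumes each signal has already been converted into a revenue estimate (that conversion is out of scope here), and the `SignalEstimate` type, the 180-day half-life, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SignalEstimate:
    source: str          # e.g. "headcount", "contracts", "state filing"
    revenue_musd: float  # revenue implied by this signal, in $M
    age_days: int        # how old the underlying signal is


def triangulate(estimates: Sequence[SignalEstimate],
                half_life_days: float = 180.0) -> float:
    """Freshness-weighted average: a signal's weight halves every half_life_days."""
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [0.5 ** (e.age_days / half_life_days) for e in estimates]
    weighted_sum = sum(w * e.revenue_musd for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    signals = [
        SignalEstimate("headcount", 50.0, age_days=0),
        SignalEstimate("state filing", 40.0, age_days=180),
    ]
    print(f"Blended estimate: ${triangulate(signals):.1f}M")
```

The design choice worth noting is that age is an explicit input. An old signal still contributes, but it cannot outvote a fresh one, which is the opposite of how a cached database figure behaves.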

When Paid Databases Still Make Sense

This is not a "paid data is useless" argument. Paid databases provide genuine value in several scenarios.

For breadth-first screening when you need to identify the universe (all IT services companies in Texas between 100 and 500 employees), databases with structured filters save enormous time. They are good for defining the population even if individual figures need verification.

For historical context, knowing that a company reported $20M in 2021 gives you a useful baseline, even if the current figure is higher. It bounds the problem.

For contact information and org charts, sales intelligence platforms remain the fastest path to executive names, titles, and direct contact details. The financial data on these platforms might be stale but the people data is usually more current.

The gotcha is when you use a database figure as your primary source of truth for a specific company you are actively evaluating. That is where stale data creates real problems in IC memos, valuation models, and LOI pricing.

The practical rule

Use databases for universe definition and historical context. Use real-time triangulation for any company you are actively evaluating or monitoring. The cost of getting this wrong on a live deal is orders of magnitude higher than the time cost of running a fresh estimate.

What We Built

Internal Insight synthesizes real-time signals (employee data, contracts, filings, press, job postings) into a fresh revenue estimate every time you run a query. No caching, no stale figures, no algorithms running on last year's inputs. The data is as current as the underlying sources, which in most cases means today.

Vlad Shostak

Founder, Internal Insight

Writing on private company valuation, deal sourcing, and the mechanics of financial estimation for lower middle market dealmakers.
