Borrowing Predictive Strength, In Practice
TL;DR & connection to the prior post
In Borrowing Predictive Strength we argued that manager → fund → deal hierarchies make private-equity forecasting saner than trying to divine truth from noisy fund IRRs. This post picks up where that one ended: we synthesize a realistic PE universe, run the PyMC version of the hierarchy, and look at what the posterior actually says about variance shares, shrinkage, and “skill.”
Everything below mirrors that stack.
Data structure: managers → funds → deals
To stress the hierarchy with something that feels like PE reality, I spun up a tiny SQLite universe of acronym-style managers. Each one carries a strategy tilt (venture, buyout, multi) that bleeds into fund stages, deal volumes, and eventual IRRs. SQLModel keeps the schema lightweight, but the interesting bits are the distributions: ≈6 funds per manager, 10–35 deals per fund depending on stage, and stage-aware check sizes so venture funds really do look noisier.
Managers. Metadata plus the strategy lever that determines later stage mix.
Funds. Vehicles inherit their manager and stage bias, track regions/sectors, and keep the return levers the model consumes.
Deals. Each fund spawns a pile of deals with sector tags and entry/exit dates—the raw material the hierarchy shrinks.
Strategy-aware seeding
Managers draw from {Venture, Buyout, Multi-Strategy} and each choice pushes both fund stages and deal check sizes into different bands. That means the hierarchy has to reconcile genuinely different populations instead of uniform noise.
Managers get a truncated Normal number of funds (mean ≈ 6, capped at 12) and each fund draws a stage-aware deal count (venture 15–35, buyout 10–20, growth 12–24). The end result is an intentionally messy training set where venture gunslingers and buyout grinders coexist, giving the Bayes stack something lifelike to chew on.
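That seeding logic is compact enough to sketch; the function and band names below are illustrative, not the actual SQLModel seeder, and for simplicity each fund draws its own stage rather than inheriting a manager-level bias:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stage-aware deal-count bands; the numbers mirror the prose, but the
# real generator lives in the SQLModel seeder.
DEAL_BANDS = {"venture": (15, 35), "buyout": (10, 20), "growth": (12, 24)}

def n_funds(rng):
    """Truncated-Normal fund count: mean ~6, clipped to [1, 12]."""
    return int(np.clip(round(rng.normal(6, 2)), 1, 12))

def n_deals(stage, rng):
    """Uniform deal count within the stage's band."""
    lo, hi = DEAL_BANDS[stage]
    return int(rng.integers(lo, hi + 1))

# Six managers, each a list of per-fund deal counts
stages = list(DEAL_BANDS)
universe = [
    [n_deals(rng.choice(stages), rng) for _ in range(n_funds(rng))]
    for _ in range(6)
]
```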
Building the PyMC hierarchy
The PyMC model is a straight translation of the hierarchy we introduced earlier: the market-wide anchor μ₀, manager offsets scaled by τₘ, fund offsets scaled by τ_f, and residual deal noise σ_d all show up verbatim. In code:
Once sampled, those coordinates give the familiar shrinkage ladder

μ̂_f = w_f · r̄_f + (1 − w_f) · μₘ, with w_f = (n_f/σ_d²) / (n_f/σ_d² + 1/τ_f²),

while the manager variance contracts according to

Var(μₘ | data) = (1/τₘ² + Σ_f 1/(τ_f² + σ_d²/n_f))⁻¹.
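The shrinkage ladder is easy to check by hand; a quick sketch with hypothetical posterior point estimates (not the run's actual values):

```python
import numpy as np

# Hypothetical posterior point estimates (not the run's actual values)
tau_f, sigma_d = 0.06, 0.25   # fund-level spread, deal-level noise
mu_m = 0.16                   # this fund's manager anchor

def shrunken_mean(raw_mean, n_deals):
    """Precision-weighted compromise between a fund's raw mean and its manager anchor."""
    w = (n_deals / sigma_d**2) / (n_deals / sigma_d**2 + 1 / tau_f**2)
    return w * raw_mean + (1 - w) * mu_m
```

With these numbers a 39%-IRR fund with ten deals gets pulled well over half the way back toward the 16% anchor; quadruple the deal count and it keeps much more of its raw mean.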
Sample it (small numbers shown; increase draws/tune for real inference) and the CLI will report posterior variance components, manager anchors, and shrunken fund means—the same ingredients we visualized in the prior article. In the current run, for example, it prints
manager share ≈ 17.1%, fund share ≈ 12.9%, deal share ≈ 70.0%; top posterior fund mean = UIR Holdings Fund 09 (36.1%); manager anchors ≈ 16.4%, 19.9%, 23.3%, 27.6%, 17.0%, 21.3%
Those numbers line up exactly with the tables in the snapshot section, and the JSON powering the React components is just a serialized copy of that output.
Posterior variance shares
Where uncertainty comes from
Shrinkage from raw fund means to hierarchical posteriors
Cross-level shrinkage (top movers)
Predicting a new fund under the hierarchy
New fund predictor (posterior vs global)
Curves reflect the actual PyMC posterior means/dispersion for each manager after sharing strength with the hierarchy.
Holdout predictive performance
Predictive scorecard (real holdout deals)
These scores come directly from the PyMC posterior predictive vs. the actual held-out deals (75/25 split).
Snapshot of the latest run
The tables below mirror the dataset that powers the React plots so the prose and visuals stay in sync.
Variance decomposition (new data)
| Layer | Share (%) |
|---|---|
| Manager (τₘ²) | 17.06 |
| Fund (τ_f²) | 12.94 |
| Deal (σ_d²) | 70.00 |
Variance shares confirm what intuition hinted at: almost 70% of the uncertainty lives at the deal level, leaving ~17% for persistent manager effects and ~13% for fund-to-fund drift within a manager. That’s exactly the regime where partial pooling should shine.
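Given posterior draws of the three scale parameters, those shares are just averaged ratios of squared scales; a sketch with stand-in draws (values hypothetical, chosen to land near the run's shares):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in posterior draws; in practice these come from idata.posterior
tau_m = np.abs(rng.normal(0.063, 0.008, 2000))    # manager spread
tau_f = np.abs(rng.normal(0.055, 0.007, 2000))    # fund spread
sigma_d = np.abs(rng.normal(0.128, 0.005, 2000))  # deal noise

# Per-draw variance shares, then averaged over the posterior
total = tau_m**2 + tau_f**2 + sigma_d**2
shares = {
    "manager": float(np.mean(tau_m**2 / total)),
    "fund": float(np.mean(tau_f**2 / total)),
    "deal": float(np.mean(sigma_d**2 / total)),
}
```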
Biggest shrinkage moves (top 5, new data)
| Manager / Fund | Raw mean (%) | Posterior (%) | Manager anchor (%) | Train deals |
|---|---|---|---|---|
| OTUI Investments / Fund 12 | 39.28 | 32.41 | 16.36 | 14 |
| UIR Holdings / Fund 09 | 42.06 | 36.14 | 23.26 | 13 |
| OTUI Investments / Fund 20 | 1.99 | 6.95 | 16.36 | 11 |
| OTUI Investments / Fund 15 | 4.99 | 9.12 | 16.36 | 10 |
| ZANO Investments / Fund 22 | 12.66 | 16.39 | 21.34 | 8 |
Shrinkage reorders the leaderboard immediately: venture rockets like OTUI Fund 12 give up seven percentage points once they borrow strength from a mediocre manager anchor, while underdogs like Fund 20 gain five points simply because the hierarchy refuses to believe 2% IRR is destiny.
Manager anchors (new data)
| Manager | Posterior mean (%) | Posterior sd (%) |
|---|---|---|
| OTUI Investments | 16.36 | 2.89 |
| XNRX Holdings | 19.93 | 5.47 |
| UIR Holdings | 23.26 | 2.75 |
| RQF Investments | 27.56 | 3.06 |
| UOQ Partners | 17.01 | 3.66 |
| ZANO Investments | 21.34 | 2.54 |
Manager anchors settle into clean tiers: RQF around 27½%, UIR in the low 20s, OTUI languishing in the mid-teens. The posterior sds tell us who’s still volatile (XNRX with only a handful of funds) versus who has enough history to pin down a skill estimate.
New-fund predictive vs global baseline (OTUI Investments)
| Deals | Manager mean (%) | Manager sd (%) | P(r̄_new < 0) | Global mean (%) | Global sd (%) | Global P(r̄_new < 0) |
|---|---|---|---|---|---|---|
| 6 | 16.36 | 9.88 | 4.79% | 22.57 | 6.50 | 0.03% |
| 12 | 16.36 | 8.75 | 3.01% | 22.57 | 4.60 | <0.01% |
| 24 | 16.36 | 8.12 | 2.19% | 22.57 | 3.25 | ≈0% |
Walking the OTUI numbers through the new-fund predictor shows why hierarchy matters: a six-deal OTUI vehicle carries a ~5% chance of going negative, while the pooled global baseline is practically never below zero. That spread is the cost of insisting on manager-specific priors.
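If you summarize the predictive by its mean and sd, that probability is just normal tail mass; a quick check against the six-deal row (normal approximation, so it will not match the Monte-Carlo number exactly):

```python
import math

def p_negative(mean, sd):
    """Normal-approximation P(r̄_new < 0) given a predictive mean/sd in percent."""
    return 0.5 * (1 + math.erf((0 - mean) / (sd * math.sqrt(2))))

# Six-deal OTUI vehicle from the table: mean 16.36%, sd 9.88%
print(100 * p_negative(16.36, 9.88))  # lands close to the table's Monte-Carlo 4.79%
```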
Holdout predictive scores (171 deals)
| Model | Avg log score ↑ | Brier (r < 0) ↓ |
|---|---|---|
| No pooling[1] | 0.3933 | 0.0782 |
| Complete pooling[2] | 0.2639 | 0.0850 |
| Hierarchical[3] | 0.3978 | 0.0784 |
On held-out deals the hierarchy delivers the best average log score while no-pooling still edges it on Brier[4] by a whisker (0.0782 vs 0.0784). Both easily beat complete pooling, and the hierarchy remains interpretable at the manager/fund level.
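For reference, both metrics can be computed from posterior-predictive draws along these lines (this uses a Gaussian summary for the log score; the actual script may evaluate the predictive density differently):

```python
import numpy as np

def scorecard(pred_draws, actual):
    """Average log score and Brier(r < 0) from posterior-predictive draws.

    pred_draws: (n_draws, n_deals) samples of held-out deal IRRs
    actual:     (n_deals,) realized IRRs
    """
    # Log score via a Gaussian summary of each deal's predictive draws
    mu = pred_draws.mean(axis=0)
    sd = pred_draws.std(axis=0)
    log_scores = -0.5 * np.log(2 * np.pi * sd**2) - (actual - mu) ** 2 / (2 * sd**2)
    # Brier: predicted P(r < 0) vs the binary outcome
    p_neg = (pred_draws < 0).mean(axis=0)
    brier = np.mean((p_neg - (actual < 0)) ** 2)
    return float(log_scores.mean()), float(brier)
```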
What this model buys you
Structured shrinkage is the headline: the hierarchy refuses to crown a fund champion because of one hot streak and drags tiny samples back toward their manager anchor. That behavior surfaced the OTUI funds that looked heroic in raw form but now sit squarely in the middle of the pack.
Because each manager gets its own posterior, the predictive story finally respects strategy differences. A new RQF venture fund is allowed to be adventurous (and still likely positive) while a global baseline remains far more conservative. Those probability curves are exactly what deal teams ask for when debating capital allocations.
We also get transparent trade-offs. The variance table quantifies why deal noise dominates, and the holdout scorecard shows that the hierarchy earns its keep on log score while conceding a hair of Brier to no-pooling. That explicit accounting is far easier to sell internally than “trust me, Bayes works.”
Taken together, the hierarchy is no longer a thought experiment—it’s a runnable PyMC module with a repeatable data pipeline, interpretable posteriors, and ready-made hooks for experimentation. Let me know what extension you want to stress-test next (macro shocks, co-invest overlays, cashflow timing, etc.).
Stress testing the hierarchy
To keep myself honest, I perturb the posterior in three simple ways and recompute the predictive metrics:
- Volatility spike: inflate the deal-level noise (σ_d) by 50%.
- Sector downturn: translate every manager/fund mean down by 600 bps.
- Data scarcity: pretend we only observed a quarter as many training deals (and inflate deal noise by 20% to mimic the added uncertainty).
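Each perturbation is a one-line transformation of the posterior draws before re-scoring; a sketch with stand-in draws (numbers hypothetical, not the run's actual posterior):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in posterior draws: one manager's anchor plus the deal-noise scale
mu_draws = rng.normal(0.2756, 0.03, 4000)
sigma_draws = np.abs(rng.normal(0.22, 0.02, 4000))

def p_neg_mc(mu, sigma_d, n_deals=6):
    """Monte-Carlo P(r̄_new < 0) for an n-deal fund, one sample per posterior draw."""
    rbar = rng.normal(mu, sigma_d / np.sqrt(n_deals))
    return float(np.mean(rbar < 0))

baseline = p_neg_mc(mu_draws, sigma_draws)
vol_spike = p_neg_mc(mu_draws, 1.5 * sigma_draws)   # inflate σ_d by 50%
downturn = p_neg_mc(mu_draws - 0.06, sigma_draws)   # shift means down 600 bps
```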
The visualization below shows how the hierarchical log score and Brier error[4] move relative to the baseline and how those shifts translate into the predictive curve for our reference manager (RQF Investments with a 6-deal fund).
Stress testing the hierarchy
Positive log-score bars mean the hierarchical model holds up under the scenario; positive Brier bars mean lower (better) error relative to baseline.
Effect on RQF Investments (n ≈ 6 deals):
| Scenario | Mean (%) | Δ Mean (pp) | sd (%) | Δ sd (pp) | P(r̄ < 0) (%) | Δ P (pp) |
|---|---|---|---|---|---|---|
| Volatility spike | 27.56 | 0.00 | 12.31 | 2.38 | 1.33 | 0.96 |
| Sector downturn | 21.56 | -6.00 | 9.93 | 0.00 | 1.64 | 1.27 |
| Data scarcity | 27.56 | 0.00 | 21.00 | 11.07 | 9.39 | 9.02 |
Turning posteriors into manager skill
The same hierarchy that shrinks fund noise also gives us a principled “skill curve” for each manager. We treat manager skill as the posterior of μₘ relative to the complete-pooling anchor μ_pool:

Skill(m) = P(μₘ > μ_pool | data) = ∫ 1{μₘ > μ_pool} p(μₘ, μ_pool | data) dμₘ dμ_pool
Sampling from PyMC makes those integrals trivial: each draw of μₘ just becomes a Bernoulli trial for “beats pooling”, and we can tally how often a manager is the global leader by checking which μₘ is largest per draw. That yields:
- Credible lifts — posterior mean ±80% interval for every manager anchor, expressed in net IRR percentage points above the pooled baseline.
- Skill probabilities — P(μₘ > μ_pool) and P(manager m is top) straight from the joint posterior draws.
- Data depth context — the same table keeps the number of funds and training deals that informed each anchor, so high skill scores with thin data are easy to flag.
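Those tallies amount to a few lines over the joint draws; a sketch with stand-in posterior samples (means echo the table, the common 3-point sd is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in joint posterior draws: 4000 draws x 6 managers (RQF .. OTUI)
means = np.array([0.276, 0.233, 0.213, 0.199, 0.170, 0.164])
draws = rng.normal(means, 0.03, size=(4000, 6))
mu_pool = 0.226                                # complete-pooling anchor

p_beats_pool = (draws > mu_pool).mean(axis=0)  # P(mu_m > mu_pool) per manager
p_top = np.bincount(draws.argmax(axis=1), minlength=6) / len(draws)  # P(top mgr)
```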
The component below pulls directly from the new JSON payload exported by the PyMC script:
Posterior view of manager skill
Red line = complete pooling anchor 22.6%; shaded ridge area represents posterior mass where μ exceeds that anchor.
| Manager | Posterior (%) | Lift vs pool (pp)[5] | P(μ>μ_pool) | P(top mgr) | Expected rank[6] | Funds | Deals |
|---|---|---|---|---|---|---|---|
| RQF Investments | 27.6 | 5.0 | 94.3% | 78.7% | 1.3 | 7 | 162 |
| UIR Holdings | 23.3 | 0.7 | 59.1% | 10.2% | 2.6 | 7 | 82 |
| ZANO Investments | 21.3 | -1.2 | 31.0% | 2.7% | 3.3 | 9 | 109 |
| XNRX Holdings | 19.9 | -2.6 | 30.8% | 7.5% | 3.8 | 1 | 12 |
| UOQ Partners | 17.0 | -5.6 | 6.0% | 0.6% | 4.9 | 4 | 49 |
| OTUI Investments | 16.4 | -6.2 | 2.1% | 0.4% | 5.2 | 8 | 101 |
The ridge chart shows each manager’s posterior density, with the shaded region highlighting the probability mass above the pooled anchor (μ_pool ≈ 22.6%). That makes “skill” literally the area of the curve past the red line, while the table still gives ranks and odds. In this seed, RQF Investments keeps ~94% of its mass above the anchor (posterior mean 27.6%) and is the most likely top manager at ~79%, UIR Holdings sits near a coin-flip at 59%, and OTUI Investments barely clears 2% despite having plenty of data—exactly the nuance we wanted when translating the hierarchy into an investable ranking.
Next experiment: slicing skill by stage, geo, and sector
The fun part about having the hierarchy in place is that we can start slicing it in richer ways without redesigning the entire model. Three experiments are on deck:
- Manager × stage/geo skill curves. Rather than a single μₘ, give each manager stage- and region-specific anchors (venture vs buyout, North America vs Europe). Funds already carry `stage` and `focus_region`, so we can write

  μ_{m,s} = μₘ + δ_{m,s}, with δ_{m,s} ~ Normal(0, τ_s²).

  Skill then becomes P(μ_{m,s} > μ_{pool,s})—“does this manager outperform the global benchmark for stage s?” That would let us paint a ridge chart per stage and show, for example, that RQF’s venture funds are the real edge while its buyout funds simply match market medians.
- Deal × sector effects. At the deal layer we can bolt on sector offsets

  r_d = μ_{f(d)} + γ_{sector(d)} + ε_d, with γ_s ~ Normal(0, τ_sector²),

  and score the probability that a sector effect is positive globally or within a manager. Think of it as a sector attribution view: “this manager’s healthcare deals outperform their own anchor with Y% probability,” or “fund 14 has negative exposure to SaaS” even if the overall fund still looks strong.
- Scenario overlays. With posterior draws in hand we can stress-test macro scenarios directly. Examples: shock all venture funds down 300 bps to mimic a liquidity freeze, inflate σ_d for a single sector, or ask “what happens to each manager’s ranking if we remove their largest fund?” Mathematically it’s just shifting or zeroing out subsets of draws, but the business story becomes concrete.
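The first overlay really is a one-liner on the draws; a sketch with stand-in anchors (hypothetical managers and strategy flags):

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in manager-anchor draws (4000 x 3), flagged by strategy (hypothetical)
draws = rng.normal([0.276, 0.233, 0.213], 0.03, size=(4000, 3))
is_venture = np.array([True, False, True])

# Liquidity-freeze overlay: shock venture anchors down 300 bps, then re-rank
shocked = draws - 0.03 * is_venture
new_order = shocked.mean(axis=0).argsort()[::-1]  # manager indices, best first
```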
These sketches are parked here so we can pick them up in a future post. Each one only needs a couple of latent parameters in PyMC, a new aggregation in the JSON payload (manager×stage, manager×geo, manager×sector, scenario deltas), and another ridge/table component like the ones above to make the story visible.
- Note [1]: No pooling = estimate each fund in isolation using only its own sample mean.
- Note [2]: Complete pooling = collapse the entire dataset into one global mean, ignoring manager and fund structure.
- Note [3]: Hierarchical = partial pooling; funds borrow strength from their manager anchor and the global prior.
- Note [4]: The Brier score is the mean squared error between predicted probabilities and binary outcomes (lower is better).
- Note [5]: Lift vs pool = posterior manager mean minus the complete-pooling baseline (positive values indicate outperformance).
- Note [6]: Expected rank = posterior expectation of each manager’s ordering (1 = most likely top performer).