Borrowing Predictive Strength, In Practice
TL;DR & connection to the prior post
In Borrowing Predictive Strength we argued that manager → fund → deal hierarchies make private-equity forecasting saner than trying to divine truth from noisy fund IRRs. This post picks up where that one ended: we synthesize a realistic PE universe, run the PyMC version of the hierarchy, and look at what the posterior actually says about variance shares, shrinkage, and “skill.”
Everything below mirrors that stack.
Data structure: managers → funds → deals
To stress the hierarchy with something that feels like PE reality, I spun up a tiny SQLite universe of acronym-style managers. Each one carries a strategy tilt (venture, buyout, multi) that bleeds into fund stages, deal volumes, and eventual IRRs. SQLModel keeps the schema lightweight, but the interesting bits are the distributions: ≈6 funds per manager, 10–35 deals per fund depending on stage, and stage-aware check sizes so venture funds really do look noisier.
Managers. Metadata plus the strategy lever that determines later stage mix.
Funds. Vehicles inherit their manager and stage bias, track regions/sectors, and keep the return levers the model consumes.
Deals. Each fund spawns a pile of deals with sector tags and entry/exit dates—the raw material the hierarchy shrinks.
Strategy-aware seeding
Managers draw from {Venture, Buyout, Multi-Strategy} and each choice pushes both fund stages and deal check sizes into different bands. That means the hierarchy has to reconcile genuinely different populations instead of uniform noise.
Managers get a truncated Normal number of funds (mean ≈ 6, capped at 12) and each fund draws a stage-aware deal count (venture 15–35, buyout 10–20, growth 12–24). The end result is an intentionally messy training set where venture gunslingers and buyout grinders coexist, giving the Bayes stack something lifelike to chew on.
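That seeding logic is compact enough to sketch; the function and band names below are illustrative, not the actual SQLModel seeder, and for simplicity each fund draws its own stage rather than inheriting a manager-level bias:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stage-aware deal-count bands; the numbers mirror the prose, but the
# real generator lives in the SQLModel seeder.
DEAL_BANDS = {"venture": (15, 35), "buyout": (10, 20), "growth": (12, 24)}

def n_funds(rng):
    """Truncated-Normal fund count: mean ~6, clipped to [1, 12]."""
    return int(np.clip(round(rng.normal(6, 2)), 1, 12))

def n_deals(stage, rng):
    """Uniform deal count within the stage's band."""
    lo, hi = DEAL_BANDS[stage]
    return int(rng.integers(lo, hi + 1))

# Six managers, each a list of per-fund deal counts
stages = list(DEAL_BANDS)
universe = [
    [n_deals(rng.choice(stages), rng) for _ in range(n_funds(rng))]
    for _ in range(6)
]
```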
Building the PyMC hierarchy
The PyMC model is a straight translation of the hierarchy we introduced earlier: the market-wide anchor μ₀, manager offsets scaled by τₘ, fund offsets scaled by τ_f, and residual deal noise σ_d all show up verbatim. In code:
Once sampled, those coordinates give the familiar shrinkage ladder

μ̂_f = w_f · r̄_f + (1 − w_f) · μₘ, with w_f = (n_f/σ_d²) / (n_f/σ_d² + 1/τ_f²),

while the manager variance contracts according to

Var(μₘ | data) = (1/τₘ² + Σ_f 1/(τ_f² + σ_d²/n_f))⁻¹.
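The shrinkage ladder is easy to check by hand; a quick sketch with hypothetical posterior point estimates (not the run's actual values):

```python
import numpy as np

# Hypothetical posterior point estimates (not the run's actual values)
tau_f, sigma_d = 0.06, 0.25   # fund-level spread, deal-level noise
mu_m = 0.16                   # this fund's manager anchor

def shrunken_mean(raw_mean, n_deals):
    """Precision-weighted compromise between a fund's raw mean and its manager anchor."""
    w = (n_deals / sigma_d**2) / (n_deals / sigma_d**2 + 1 / tau_f**2)
    return w * raw_mean + (1 - w) * mu_m
```

With these numbers a 39%-IRR fund with ten deals gets pulled well over half the way back toward the 16% anchor; quadruple the deal count and it keeps much more of its raw mean.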
Sample it (small numbers shown; increase draws/tune for real inference) and the CLI will report posterior variance components, manager anchors, and shrunken fund means—the same ingredients we visualized in the prior article. In the current run, for example, it prints
manager share ≈ 17.1%, fund share ≈ 12.9%, deal share ≈ 70.0%; top posterior fund mean = UIR Holdings Fund 09 (36.1%); manager anchors ≈ 16.4%, 19.9%, 23.3%, 27.6%, 17.0%, 21.3%
Those numbers line up exactly with the tables in the snapshot section, and the JSON powering the React components is just a serialized copy of that output.
Posterior variance shares
Where uncertainty comes from
Shrinkage from raw fund means to hierarchical posteriors
Cross-level shrinkage (top movers)
Predicting a new fund under the hierarchy
New fund predictor (posterior vs global)
Curves reflect the actual PyMC posterior means/dispersion for each manager after sharing strength with the hierarchy.
Holdout predictive performance
Predictive scorecard (real holdout deals)
These scores come directly from the PyMC posterior predictive vs. the actual held-out deals (75/25 split).
Snapshot of the latest run
The tables below mirror the dataset that powers the React plots so the prose and visuals stay in sync.
Variance decomposition (new data)
| Layer | Share (%) |
|---|---|
| Manager (τₘ²) | 17.06 |
| Fund (τ_f²) | 12.94 |
| Deal (σ_d²) | 70.00 |
Variance shares confirm what intuition hinted at: almost 70% of the uncertainty lives at the deal level, leaving ~17% for persistent manager effects and ~13% for fund-to-fund drift within a manager. That’s exactly the regime where partial pooling should shine.
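Given posterior draws of the three scale parameters, those shares are just averaged ratios of squared scales; a sketch with stand-in draws (values hypothetical, chosen to land near the run's shares):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in posterior draws; in practice these come from idata.posterior
tau_m = np.abs(rng.normal(0.063, 0.008, 2000))    # manager spread
tau_f = np.abs(rng.normal(0.055, 0.007, 2000))    # fund spread
sigma_d = np.abs(rng.normal(0.128, 0.005, 2000))  # deal noise

# Per-draw variance shares, then averaged over the posterior
total = tau_m**2 + tau_f**2 + sigma_d**2
shares = {
    "manager": float(np.mean(tau_m**2 / total)),
    "fund": float(np.mean(tau_f**2 / total)),
    "deal": float(np.mean(sigma_d**2 / total)),
}
```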
Biggest shrinkage moves (top 5, new data)
| Manager / Fund | Raw mean (%) | Posterior (%) | Manager anchor (%) | Train deals |
|---|---|---|---|---|
| OTUI Investments / Fund 12 | 39.28 | 32.41 | 16.36 | 14 |
| UIR Holdings / Fund 09 | 42.06 | 36.14 | 23.26 | 13 |
| OTUI Investments / Fund 20 | 1.99 | 6.95 | 16.36 | 11 |
| OTUI Investments / Fund 15 | 4.99 | 9.12 | 16.36 | 10 |
| ZANO Investments / Fund 22 | 12.66 | 16.39 | 21.34 | 8 |
Shrinkage reorders the leaderboard immediately: venture rockets like OTUI Fund 12 give up seven percentage points once they borrow strength from a mediocre manager anchor, while underdogs like Fund 20 gain five points simply because the hierarchy refuses to believe 2% IRR is destiny.
Manager anchors (new data)
| Manager | Posterior mean (%) | Posterior sd (%) |
|---|---|---|
| OTUI Investments | 16.36 | 2.89 |
| XNRX Holdings | 19.93 | 5.47 |
| UIR Holdings | 23.26 | 2.75 |
| RQF Investments | 27.56 | 3.06 |
| UOQ Partners | 17.01 | 3.66 |
| ZANO Investments | 21.34 | 2.54 |
Manager anchors settle into clean tiers: RQF around 27½%, UIR in the low 20s, OTUI languishing in the mid-teens. The posterior sds tell us who’s still volatile (XNRX with only a handful of funds) versus who has enough history to pin down a skill estimate.
New-fund predictive vs global baseline (OTUI Investments)
| Deals | Manager mean (%) | Manager sd (%) | P(r̄_new < 0) | Global mean (%) | Global sd (%) | Global P(r̄_new < 0) |
|---|---|---|---|---|---|---|
| 6 | 16.36 | 9.88 | 4.79% | 22.57 | 6.50 | 0.03% |
| 12 | 16.36 | 8.75 | 3.01% | 22.57 | 4.60 | <0.01% |
| 24 | 16.36 | 8.12 | 2.19% | 22.57 | 3.25 | ≈0% |
Walking the OTUI numbers through the new-fund predictor shows why hierarchy matters: a six-deal OTUI vehicle carries a ~5% chance of going negative, while the pooled global baseline is practically never below zero. That spread is the cost of insisting on manager-specific priors.
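If you summarize the predictive by its mean and sd, that probability is just normal tail mass; a quick check against the six-deal row (normal approximation, so it will not match the Monte-Carlo number exactly):

```python
import math

def p_negative(mean, sd):
    """Normal-approximation P(r̄_new < 0) given a predictive mean/sd in percent."""
    return 0.5 * (1 + math.erf((0 - mean) / (sd * math.sqrt(2))))

# Six-deal OTUI vehicle from the table: mean 16.36%, sd 9.88%
print(100 * p_negative(16.36, 9.88))  # lands close to the table's Monte-Carlo 4.79%
```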
Holdout predictive scores (171 deals)
| Model | Avg log score ↑ | Brier (r < 0) ↓ |
|---|---|---|
| No pooling[1] | 0.3933 | 0.0782 |
| Complete pooling[2] | 0.2639 | 0.0850 |
| Hierarchical[3] | 0.3978 | 0.0784 |
On held-out deals the hierarchy delivers the best average log score while no-pooling still edges it on Brier[4] by a whisker (0.0782 vs 0.0784). Both easily beat complete pooling, and the hierarchy remains interpretable at the manager/fund level.
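For reference, both metrics can be computed from posterior-predictive draws along these lines (this uses a Gaussian summary for the log score; the actual script may evaluate the predictive density differently):

```python
import numpy as np

def scorecard(pred_draws, actual):
    """Average log score and Brier(r < 0) from posterior-predictive draws.

    pred_draws: (n_draws, n_deals) samples of held-out deal IRRs
    actual:     (n_deals,) realized IRRs
    """
    # Log score via a Gaussian summary of each deal's predictive draws
    mu = pred_draws.mean(axis=0)
    sd = pred_draws.std(axis=0)
    log_scores = -0.5 * np.log(2 * np.pi * sd**2) - (actual - mu) ** 2 / (2 * sd**2)
    # Brier: predicted P(r < 0) vs the binary outcome
    p_neg = (pred_draws < 0).mean(axis=0)
    brier = np.mean((p_neg - (actual < 0)) ** 2)
    return float(log_scores.mean()), float(brier)
```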
What this model buys you
Structured shrinkage is the headline: the hierarchy refuses to crown a fund champion because of one hot streak and drags tiny samples back toward their manager anchor. That behavior surfaced the OTUI funds that looked heroic in raw form but now sit squarely in the middle of the pack.
Because each manager gets its own posterior, the predictive story finally respects strategy differences. A new RQF venture fund is allowed to be adventurous (and still likely positive) while a global baseline remains far more conservative. Those probability curves are exactly what deal teams ask for when debating capital allocations.
We also get transparent trade-offs. The variance table quantifies why deal noise dominates, and the holdout scorecard shows that the hierarchy earns its keep on log score while conceding a hair of Brier to no-pooling. That explicit accounting is far easier to sell internally than “trust me, Bayes works.”
Taken together, the hierarchy is no longer a thought experiment—it’s a runnable PyMC module with a repeatable data pipeline, interpretable posteriors, and ready-made hooks for experimentation. Let me know what extension you want to stress-test next (macro shocks, co-invest overlays, cashflow timing, etc.).
Stress testing the hierarchy
To keep myself honest, I perturb the posterior in three simple ways and recompute the predictive metrics:
- Volatility spike: inflate the deal-level noise (σ_d) by 50%.
- Sector downturn: translate every manager/fund mean down by 600 bps.
- Data scarcity: pretend we only observed a quarter as many training deals (and inflate deal noise by 20% to mimic the added uncertainty).
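Each perturbation is a one-line transformation of the posterior draws before re-scoring; a sketch with stand-in draws (numbers hypothetical, not the run's actual posterior):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in posterior draws: one manager's anchor plus the deal-noise scale
mu_draws = rng.normal(0.2756, 0.03, 4000)
sigma_draws = np.abs(rng.normal(0.22, 0.02, 4000))

def p_neg_mc(mu, sigma_d, n_deals=6):
    """Monte-Carlo P(r̄_new < 0) for an n-deal fund, one sample per posterior draw."""
    rbar = rng.normal(mu, sigma_d / np.sqrt(n_deals))
    return float(np.mean(rbar < 0))

baseline = p_neg_mc(mu_draws, sigma_draws)
vol_spike = p_neg_mc(mu_draws, 1.5 * sigma_draws)   # inflate σ_d by 50%
downturn = p_neg_mc(mu_draws - 0.06, sigma_draws)   # shift means down 600 bps
```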
The visualization below shows how the hierarchical log score and Brier error[4] move relative to the baseline and how those shifts translate into the predictive curve for our reference manager (RQF Investments with a 6-deal fund).
Stress testing the hierarchy
Positive log-score bars mean the hierarchical model holds up under the scenario; positive Brier bars mean lower (better) error relative to baseline.
Effect on RQF Investments (n ≈ 6 deals):
| Scenario | Mean (%) | Δ Mean (pp) | sd (%) | Δ sd (pp) | P(r̄ < 0) (%) | Δ P (pp) |
|---|---|---|---|---|---|---|
| Volatility spike | 27.56 | 0.00 | 12.31 | 2.38 | 1.33 | 0.96 |
| Sector downturn | 21.56 | -6.00 | 9.93 | 0.00 | 1.64 | 1.27 |
| Data scarcity | 27.56 | 0.00 | 21.00 | 11.07 | 9.39 | 9.02 |
Turning posteriors into manager skill
The same hierarchy that shrinks fund noise also gives us a principled “skill curve” for each manager. We treat manager skill as the posterior of μₘ relative to the complete-pooling anchor μ_pool:

Skill(m) = P(μₘ > μ_pool | data) = ∫ 1{μₘ > μ_pool} p(μₘ, μ_pool | data) dμₘ dμ_pool
Sampling from PyMC makes those integrals trivial: each draw of μₘ just becomes a Bernoulli trial for “beats pooling”, and we can tally how often a manager is the global leader by checking which μₘ is largest per draw. That yields:
- Credible lifts — posterior mean ±80% interval for every manager anchor, expressed in net IRR percentage points above the pooled baseline.
- Skill probabilities — P(μₘ > μ_pool) and P(manager m is top) straight from the joint posterior draws.
- Data depth context — the same table keeps the number of funds and training deals that informed each anchor, so high skill scores with thin data are easy to flag.
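Those tallies amount to a few lines over the joint draws; a sketch with stand-in posterior samples (means echo the table, the common 3-point sd is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in joint posterior draws: 4000 draws x 6 managers (RQF .. OTUI)
means = np.array([0.276, 0.233, 0.213, 0.199, 0.170, 0.164])
draws = rng.normal(means, 0.03, size=(4000, 6))
mu_pool = 0.226                                # complete-pooling anchor

p_beats_pool = (draws > mu_pool).mean(axis=0)  # P(mu_m > mu_pool) per manager
p_top = np.bincount(draws.argmax(axis=1), minlength=6) / len(draws)  # P(top mgr)
```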
The component below pulls directly from the new JSON payload exported by the PyMC script:
Posterior view of manager skill
Red line = complete pooling anchor 22.6%; shaded ridge area represents posterior mass where μ exceeds that anchor.
| Manager | Posterior (%) | Lift vs pool (pp)[5] | P(μ>μ_pool) | P(top mgr) | Expected rank[6] | Funds | Deals |
|---|---|---|---|---|---|---|---|
| RQF Investments | 27.6 | 5.0 | 94.3% | 78.7% | 1.3 | 7 | 162 |
| UIR Holdings | 23.3 | 0.7 | 59.1% | 10.2% | 2.6 | 7 | 82 |
| ZANO Investments | 21.3 | -1.2 | 31.0% | 2.7% | 3.3 | 9 | 109 |
| XNRX Holdings | 19.9 | -2.6 | 30.8% | 7.5% | 3.8 | 1 | 12 |
| UOQ Partners | 17.0 | -5.6 | 6.0% | 0.6% | 4.9 | 4 | 49 |
| OTUI Investments | 16.4 | -6.2 | 2.1% | 0.4% | 5.2 | 8 | 101 |
The ridge chart shows each manager’s posterior density, with the shaded region highlighting the probability mass above the pooled anchor (μ_pool ≈ 22.6%). That makes “skill” literally the area of the curve past the red line, while the table still gives ranks and odds. In this seed, RQF Investments keeps ~94% of its mass above the anchor (posterior mean 27.6%) and is the most likely top manager at ~79%, UIR Holdings sits near a coin-flip at 59%, and OTUI Investments barely clears 2% despite having plenty of data—exactly the nuance we wanted when translating the hierarchy into an investable ranking.
Next experiment: slicing skill by stage, geo, and sector
The fun part about having the hierarchy in place is that we can start slicing it in richer ways without redesigning the entire model. Three experiments are on deck:
- Manager × stage/geo skill curves. Rather than a single μₘ, give each manager stage- and region-specific anchors (venture vs buyout, North America vs Europe). Funds already carry `stage` and `focus_region`, so we can write

  μ_{m,s} = μₘ + δ_{m,s}, with δ_{m,s} ~ Normal(0, τ_s²).

  Skill then becomes P(μ_{m,s} > μ_{pool,s})—“does this manager outperform the global benchmark for stage s?” That would let us paint a ridge chart per stage and show, for example, that RQF’s venture funds are the real edge while its buyout funds simply match market medians.
- Deal × sector effects. At the deal layer we can bolt on sector offsets

  r_d = μ_{f(d)} + γ_{sector(d)} + ε_d, with γ_s ~ Normal(0, τ_sector²),

  and score the probability that a sector effect is positive globally or within a manager. Think of it as a sector attribution view: “this manager’s healthcare deals outperform their own anchor with Y% probability,” or “fund 14 has negative exposure to SaaS” even if the overall fund still looks strong.
- Scenario overlays. With posterior draws in hand we can stress-test macro scenarios directly. Examples: shock all venture funds down 300 bps to mimic a liquidity freeze, inflate σ_d for a single sector, or ask “what happens to each manager’s ranking if we remove their largest fund?” Mathematically it’s just shifting or zeroing out subsets of draws, but the business story becomes concrete.
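The first overlay really is a one-liner on the draws; a sketch with stand-in anchors (hypothetical managers and strategy flags):

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in manager-anchor draws (4000 x 3), flagged by strategy (hypothetical)
draws = rng.normal([0.276, 0.233, 0.213], 0.03, size=(4000, 3))
is_venture = np.array([True, False, True])

# Liquidity-freeze overlay: shock venture anchors down 300 bps, then re-rank
shocked = draws - 0.03 * is_venture
new_order = shocked.mean(axis=0).argsort()[::-1]  # manager indices, best first
```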
These sketches are parked here so we can pick them up in a future post. Each one only needs a couple of latent parameters in PyMC, a new aggregation in the JSON payload (manager×stage, manager×geo, manager×sector, scenario deltas), and another ridge/table component like the ones above to make the story visible.
- Note [1]: No pooling = estimate each fund in isolation using only its own sample mean.
- Note [2]: Complete pooling = collapse the entire dataset into one global mean, ignoring manager and fund structure.
- Note [3]: Hierarchical = partial pooling; funds borrow strength from their manager anchor and the global prior.
- Note [4]: The Brier score is the mean squared error between predicted probabilities and binary outcomes (lower is better).
- Note [5]: Lift vs pool = posterior manager mean minus the complete-pooling baseline (positive values indicate outperformance).
- Note [6]: Expected rank = posterior expectation of each manager’s ordering (1 = most likely top performer).