November 9, 2025

Borrowing Predictive Strength, In Practice

Walking the data flow from p(θ | y) draws to JSON moments m = E[g(θ) | y].

Tags: private equity · bayesian · pymc · hierarchical · sqlmodel · synthetic data · stress testing

TL;DR & connection to the prior post

In Borrowing Predictive Strength we argued that manager → fund → deal hierarchies make private-equity forecasting saner than trying to divine truth from noisy fund IRRs. This post picks up where that one ended: we synthesize a realistic PE universe, run the PyMC version of the hierarchy, and look at what the posterior actually says about variance shares, shrinkage, and “skill.”

\begin{aligned} r_{d,f,m} &\sim \mathcal{N}(\mu_{f,m}, \sigma_d^2) \\ \mu_{f,m} &\sim \mathcal{N}(\mu_m, \tau_f^2) \\ \mu_m &\sim \mathcal{N}(\mu_0, \tau_m^2) \end{aligned}

Everything below mirrors that stack.


Data structure: managers → funds → deals

To stress the hierarchy with something that feels like PE reality, I spun up a tiny SQLite universe of acronym-style managers. Each one carries a strategy tilt (venture, buyout, multi) that bleeds into fund stages, deal volumes, and eventual IRRs. SQLModel keeps the schema lightweight, but the interesting bits are the distributions: ≈6 funds per manager, 10–35 deals per fund depending on stage, and stage-aware check sizes so venture funds really do look noisier.

Managers. Metadata plus the strategy lever that determines later stage mix.

from datetime import date
from typing import List

from sqlmodel import Field, Relationship, SQLModel


class Manager(SQLModel, table=True):
    __tablename__ = "managers"

    id: int | None = Field(default=None, primary_key=True)
    name: str = Field(index=True, max_length=128)
    strategy_focus: str | None = Field(default=None, max_length=64)
    capital_under_management_bil: float | None = None

    funds: List["Fund"] = Relationship(back_populates="manager")

Funds. Vehicles inherit their manager and stage bias, track regions/sectors, and keep the return levers the model consumes.

class Fund(SQLModel, table=True):
    __tablename__ = "funds"

    id: int | None = Field(default=None, primary_key=True)
    name: str = Field(index=True, max_length=128)
    manager_id: int = Field(foreign_key="managers.id")
    vintage_year: int = Field(index=True)
    stage: str | None = Field(default=None, max_length=64)
    focus_region: str | None = Field(default=None, max_length=64)
    focus_sector: str | None = Field(default=None, max_length=64)
    invested_capital_musd: float | None = None
    realized_value_musd: float | None = None
    unrealized_value_musd: float | None = None
    irr_net: float | None = Field(default=None)

    manager: "Manager" = Relationship(back_populates="funds")
    deals: List["Deal"] = Relationship(back_populates="fund")

Deals. Each fund spawns a pile of deals with sector tags and entry/exit dates—the raw material the hierarchy shrinks.

class Deal(SQLModel, table=True):
    __tablename__ = "deals"

    id: int | None = Field(default=None, primary_key=True)
    fund_id: int = Field(foreign_key="funds.id")
    company_name: str = Field(index=True, max_length=128)
    deal_stage: str | None = Field(default=None, max_length=32)
    entry_date: date
    exit_date: date | None = None
    cost_musd: float | None = None
    value_musd: float | None = None
    irr_net: float | None = Field(default=None, description="Deal IRR in decimals")

    fund: "Fund" = Relationship(back_populates="deals")

Strategy-aware seeding

Managers draw from {Venture, Buyout, Multi-Strategy} and each choice pushes both fund stages and deal check sizes into different bands. That means the hierarchy has to reconcile genuinely different populations instead of uniform noise.

# python_experiments/python_experiments/data_layer/populate_fake_data.py
from random import Random

MANAGER_STRATEGIES = ["Venture", "Buyout", "Multi-Strategy"]

STAGE_WEIGHTS_BY_STRATEGY = {
    "Venture": {"Venture": 0.65, "Growth": 0.25, "Buyout": 0.07, "Infrastructure": 0.03},
    "Buyout": {"Buyout": 0.7, "Growth": 0.15, "Infrastructure": 0.1, "Venture": 0.05},
    "Multi-Strategy": {"Buyout": 0.35, "Growth": 0.2, "Venture": 0.35, "Infrastructure": 0.1},
}

def build_funds(managers: list[Manager], rng: Random) -> list[Fund]:
    for manager in managers:
        strategy = manager.strategy_focus or "Multi-Strategy"
        stage = _weighted_choice(STAGE_WEIGHTS_BY_STRATEGY[strategy], rng)
        ...
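The helper `_weighted_choice` is elided above; here is a minimal sketch of what it might look like (this implementation is my assumption, not the repo's actual code):

```python
from random import Random


def _weighted_choice(weights: dict[str, float], rng: Random) -> str:
    """Pick a key with probability proportional to its weight."""
    total = sum(weights.values())
    threshold = rng.random() * total
    cumulative = 0.0
    for key, weight in weights.items():
        cumulative += weight
        if cumulative >= threshold:
            return key
    return next(iter(weights))  # guard against float round-off


stage_weights = {"Venture": 0.65, "Growth": 0.25, "Buyout": 0.07, "Infrastructure": 0.03}
rng = Random(42)
draws = [_weighted_choice(stage_weights, rng) for _ in range(10_000)]
```

With 10,000 draws the empirical frequencies land close to the weight table, which is exactly why venture-focused managers end up skewed toward venture-stage funds.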

Managers get a truncated Normal number of funds (mean ≈ 6, capped at 12) and each fund draws a stage-aware deal count (venture 15–35, buyout 10–20, growth 12–24). The end result is an intentionally messy training set where venture gunslingers and buyout grinders coexist, giving the Bayes stack something lifelike to chew on.
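The fund-count and deal-count draws can be sketched like this (function and constant names are mine for illustration, not the repo's):

```python
from random import Random

# Stage-aware deal-count bands described above (illustrative constants)
DEAL_COUNT_RANGES = {"Venture": (15, 35), "Buyout": (10, 20), "Growth": (12, 24)}


def draw_fund_count(rng: Random, mean: float = 6.0, sd: float = 2.5, cap: int = 12) -> int:
    """Truncated-Normal-ish fund count: redraw until the value lands in [1, cap]."""
    while True:
        n = round(rng.gauss(mean, sd))
        if 1 <= n <= cap:
            return n


def draw_deal_count(rng: Random, stage: str) -> int:
    """Stage-aware deal count, uniform within the stage's band."""
    lo, hi = DEAL_COUNT_RANGES.get(stage, (10, 20))
    return rng.randint(lo, hi)


rng = Random(7)
fund_counts = [draw_fund_count(rng) for _ in range(1000)]
deal_counts = [draw_deal_count(rng, "Venture") for _ in range(1000)]
```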


Building the PyMC hierarchy

The PyMC model is a straight translation of the hierarchy we introduced earlier: the market-wide anchor μ₀, manager offsets scaled by τₘ, fund offsets scaled by τ_f, and residual deal noise σ_d all show up verbatim. In code:

# python_experiments/python_experiments/models/hierarchical_bayes.py
with pm.Model() as model:
    mu0 = pm.Normal("mu0", mu=global_mean, sigma=global_sd * 2)
    tau_m = pm.HalfNormal("tau_m", sigma=max(global_sd, 0.2))
    tau_f = pm.HalfNormal("tau_f", sigma=max(global_sd / 2, 0.1))
    sigma_d = pm.HalfNormal("sigma_d", sigma=max(global_sd, 0.2))

    manager_offset = pm.Normal("manager_offset", mu=0.0, sigma=1.0, shape=n_managers)
    mu_manager = pm.Deterministic("mu_manager", mu0 + manager_offset * tau_m)

    fund_offset = pm.Normal("fund_offset", mu=0.0, sigma=1.0, shape=n_funds)
    mu_fund = pm.Deterministic("mu_fund", mu_manager[manager_of_fund] + fund_offset * tau_f)

    pm.Normal("deal_returns", mu=mu_fund[fund_idx], sigma=sigma_d, observed=returns)

Once sampled, those coordinates give the familiar shrinkage ladder:

\hat{\mu}_{f} = w_f \bar{y}_f + (1-w_f)\hat{\mu}_m, \qquad w_f = \frac{n_f / \sigma_d^2}{n_f / \sigma_d^2 + 1 / \tau_f^2},

while the manager variance contracts according to

\mathrm{Var}(\mu_m \mid \text{data}) = \left[ \frac{1}{\tau_m^2} + \sum_f \frac{1}{\tau_f^2 + \sigma_d^2 / n_f} \right]^{-1}.
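Plugging illustrative numbers into the fund-level weight makes the mechanics concrete (the values below are made up for the example, not taken from the run):

```python
# Hypothetical fund: 14 deals, deal noise sigma_d = 0.31, fund spread tau_f = 0.15
n_f, sigma_d, tau_f = 14, 0.31, 0.15

precision_data = n_f / sigma_d**2   # information in the fund's own deals
precision_prior = 1 / tau_f**2      # information in the manager anchor
w_f = precision_data / (precision_data + precision_prior)

raw_mean, manager_anchor = 0.39, 0.16  # noisy fund mean vs its manager anchor
posterior_mean = w_f * raw_mean + (1 - w_f) * manager_anchor
```

Here w_f ≈ 0.77, so a 14-deal fund keeps most of its own signal but still gets pulled a few points toward its anchor; a 3-deal fund with the same noise would shrink far harder.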

Sample it (small numbers shown; increase draws/tune for real inference) and the CLI will report posterior variance components, manager anchors, and shrunken fund means—the same ingredients we visualized in the prior article. In the current run, for example, it prints

manager share ≈ 17.1%, fund share ≈ 12.9%, deal share ≈ 70.0%; top posterior fund mean = UIR Holdings Fund 09 (36.1%); manager anchors ≈ 16.4%, 19.9%, 23.3%, 27.6%, 17.0%, 21.3%

Those numbers line up exactly with the tables in the snapshot section, and the JSON powering the React components is just a serialized copy of that output.
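For reference, those variance shares are just the normalized squared scales averaged over posterior draws. A numpy sketch with stand-in draws (in the real script you would pull flattened draws of the scale parameters from the sampled trace; the numbers below are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for flattened posterior draws of the three scale parameters
tau_m = np.abs(rng.normal(0.045, 0.008, 2000))
tau_f = np.abs(rng.normal(0.039, 0.007, 2000))
sigma_d = np.abs(rng.normal(0.091, 0.006, 2000))

# Per-draw variance shares, then posterior-mean share per layer
total = tau_m**2 + tau_f**2 + sigma_d**2
shares = {
    "manager": float((tau_m**2 / total).mean()),
    "fund": float((tau_f**2 / total).mean()),
    "deal": float((sigma_d**2 / total).mean()),
}
```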

Posterior variance shares

[Figure: "Where uncertainty comes from." Variance shares from the PyMC posterior: manager τₘ² 17.1%, fund τ_f² 12.9%, deal σ_d² 70.0%.]

Shrinkage from raw fund means to hierarchical posteriors

[Figure: "Cross-level shrinkage (top movers)." Raw fund means vs hierarchical posteriors vs manager anchors for the 14 largest moves; avg |shrink| ≈ 3.22 pp.]

Predicting a new fund under the hierarchy

[Figure: "New fund predictor (posterior vs global)." Predictive distribution for a brand-new fund's average return r̄_new, global-only vs manager-informed. Manager-informed mean ≈ 16.36%, sd ≈ 9.88%; P(r̄_new < 0): manager 4.8% vs global 0.0%.]
Curves reflect the actual PyMC posterior means/dispersion for each manager after sharing strength with the hierarchy.

Holdout predictive performance

[Figure: "Predictive scorecard (real holdout deals)," N = 171. Avg log predictive density (higher is better): no pool 0.393, complete 0.264, hierarchical 0.398. Brier score for r < 0 (lower is better): no pool 0.078, complete 0.085, hierarchical 0.078.]

These scores come directly from the PyMC posterior predictive vs. the actual held-out deals (75/25 split).
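Both holdout metrics are cheap to compute once you have a predictive distribution per deal. A sketch with simulated stand-ins (the returns and predictive parameters below are made up; the post's numbers come from the actual PyMC posterior predictive):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
holdout = rng.normal(0.15, 0.25, 171)   # stand-in holdout deal returns
pred_mu, pred_sd = 0.16, 0.26           # stand-in predictive mean / sd

# Average log predictive density under a Normal predictive
log_dens = -0.5 * np.log(2 * np.pi * pred_sd**2) - (holdout - pred_mu) ** 2 / (2 * pred_sd**2)
avg_log_score = float(log_dens.mean())

# Brier score for the binary event r < 0
p_neg = 0.5 * (1 + erf((0 - pred_mu) / (pred_sd * sqrt(2))))
outcomes = (holdout < 0).astype(float)
brier = float(((p_neg - outcomes) ** 2).mean())
```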


Snapshot of the latest run

The tables below mirror the dataset that powers the React plots so the prose and visuals stay in sync.

Variance decomposition (new data)

| Layer | Share (%) |
| --- | --- |
| Manager (τₘ²) | 17.06 |
| Fund (τ_f²) | 12.94 |
| Deal (σ_d²) | 70.00 |

Variance shares confirm what intuition hinted at: almost 70% of the uncertainty lives at the deal level, leaving ~17% for persistent manager effects and ~13% for fund-to-fund drift within a manager. That’s exactly the regime where partial pooling should shine.

Biggest shrinkage moves (top 5, new data)

| Manager / Fund | Raw mean (%) | Posterior (%) | Manager anchor (%) | Train deals |
| --- | --- | --- | --- | --- |
| OTUI Investments / Fund 12 | 39.28 | 32.41 | 16.36 | 14 |
| UIR Holdings / Fund 09 | 42.06 | 36.14 | 23.26 | 13 |
| OTUI Investments / Fund 20 | 1.99 | 6.95 | 16.36 | 11 |
| OTUI Investments / Fund 15 | 4.99 | 9.12 | 16.36 | 10 |
| ZANO Investments / Fund 22 | 12.66 | 16.39 | 21.34 | 8 |

Shrinkage reorders the leaderboard immediately: venture rockets like OTUI Fund 12 give up seven percentage points once they borrow strength from a mediocre manager anchor, while underdogs like Fund 20 gain five points simply because the hierarchy refuses to believe 2% IRR is destiny.

Manager anchors (new data)

| Manager | Posterior mean (%) | Posterior sd (%) |
| --- | --- | --- |
| OTUI Investments | 16.36 | 2.89 |
| XNRX Holdings | 19.93 | 5.47 |
| UIR Holdings | 23.26 | 2.75 |
| RQF Investments | 27.56 | 3.06 |
| UOQ Partners | 17.01 | 3.66 |
| ZANO Investments | 21.34 | 2.54 |

Manager anchors settle into clean tiers: RQF around 27½%, UIR in the low 20s, OTUI languishing in the mid-teens. The posterior sds tell us who’s still volatile (XNRX with only a handful of funds) versus who has enough history to pin down a skill estimate.

New-fund predictive vs global baseline (OTUI Investments)

| Deals | Manager mean (%) | Manager sd (%) | P(r̄_new < 0) | Global mean (%) | Global sd (%) | Global P(r̄_new < 0) |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | 16.36 | 9.88 | 4.79% | 22.57 | 6.50 | 0.03% |
| 12 | 16.36 | 8.75 | 3.01% | 22.57 | 4.60 | <0.01% |
| 24 | 16.36 | 8.12 | 2.19% | 22.57 | 3.25 | ≈0% |

Walking the OTUI numbers through the new-fund predictor shows why hierarchy matters: a six-deal OTUI vehicle carries a ~5% chance of going negative, while the pooled global baseline is practically never below zero. That spread is the cost of insisting on manager-specific priors.
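The P(r̄_new < 0) column follows from a Normal predictive for a new fund's average over n deals, r̄_new ~ N(μₘ, τ_f² + σ_d²/n). The sketch below plugs in point estimates rather than integrating over the posterior, and the τ_f/σ_d values are made up, so it will not reproduce the table exactly:

```python
from math import erf, sqrt


def p_new_fund_below_zero(mu: float, tau_f: float, sigma_d: float, n: int) -> float:
    """P(average return of a new n-deal fund < 0) under a Normal predictive."""
    sd = sqrt(tau_f**2 + sigma_d**2 / n)
    # Standard-normal CDF evaluated at (0 - mu) / sd
    return 0.5 * (1.0 + erf((0.0 - mu) / (sd * sqrt(2.0))))


# Hypothetical OTUI-like inputs in decimals: anchor 16.36%, illustrative tau_f and sigma_d
probs = [p_new_fund_below_zero(0.1636, 0.08, 0.20, n) for n in (6, 12, 24)]
```

More deals shrink the predictive sd, so the downside probability falls monotonically with n, exactly the pattern in the table.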

Holdout predictive scores (171 deals)

| Model | Avg log score ↑ | Brier (r < 0) ↓ |
| --- | --- | --- |
| No pooling[1] | 0.3933 | 0.0782 |
| Complete pooling[2] | 0.2639 | 0.0850 |
| Hierarchical[3] | 0.3978 | 0.0784 |

On held-out deals the hierarchy delivers the best average log score while no-pooling still edges it on Brier[4] by a whisker (0.0782 vs 0.0784). Both easily beat complete pooling, and the hierarchy remains interpretable at the manager/fund level.


What this model buys you

Structured shrinkage is the headline: the hierarchy refuses to crown a fund champion because of one hot streak and drags tiny samples back toward their manager anchor. That behavior surfaced the OTUI funds that looked heroic in raw form but now sit squarely in the middle of the pack.

Because each manager gets its own posterior, the predictive story finally respects strategy differences. A new RQF venture fund is allowed to be adventurous (and still likely positive) while a global baseline remains far more conservative. Those probability curves are exactly what deal teams ask for when debating capital allocations.

We also get transparent trade-offs. The variance table quantifies why deal noise dominates, and the holdout scorecard shows that the hierarchy earns its keep on log score while conceding a hair of Brier to no-pooling. That explicit accounting is far easier to sell internally than “trust me, Bayes works.”

Taken together, the hierarchy is no longer a thought experiment—it’s a runnable PyMC module with a repeatable data pipeline, interpretable posteriors, and ready-made hooks for experimentation. Let me know what extension you want to stress-test next (macro shocks, co-invest overlays, cashflow timing, etc.).


Stress testing the hierarchy

To keep myself honest, I perturb the posterior in three simple ways and recompute the predictive metrics:

  • Volatility spike: inflate the deal-level noise (σ_d) by 50%.
  • Sector downturn: translate every manager/fund mean down by 600 bps.
  • Data scarcity: pretend we only observed a quarter as many training deals (and inflate deal noise by 20% to mimic the added uncertainty).
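Each perturbation is just a transform applied to posterior draws before recomputing the metrics. A simulated-draw sketch of the first two scenarios (array contents are stand-ins, not the run's actual posterior):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.normal(0.20, 0.03, 5000)               # stand-in draws of a fund-level mean
sigma_d = np.abs(rng.normal(0.10, 0.01, 5000))  # stand-in draws of deal noise


def p_negative(mu_draws: np.ndarray, sd_draws: np.ndarray) -> float:
    """Monte-Carlo P(deal return < 0): one simulated deal per posterior draw."""
    simulated = rng.normal(mu_draws, sd_draws)
    return float((simulated < 0).mean())


scenarios = {
    "baseline": p_negative(mu, sigma_d),
    "volatility_spike": p_negative(mu, sigma_d * 1.5),  # sigma_d x 1.5
    "sector_downturn": p_negative(mu - 0.06, sigma_d),  # shift returns -600 bps
}
```

Both stresses push downside probability up, which is why the deltas in the chart below the fold are negative for log score and positive for Brier.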

The visualization below shows how the hierarchical log score and Brier error[4] move relative to the baseline and how those shifts translate into the predictive curve for our reference manager (RQF Investments with a 6-deal fund).

[Figure: "Stress test deltas vs baseline (hierarchical model)." Δ log score and Δ Brier for each scenario, relative to the unstressed baseline.]
Volatility spike: Deal noise +50% (σ_d × 1.5)
Sector downturn: Shift returns −600 bps (μ−0.06)
Data scarcity: Quarter as many training deals (n × 0.25)

Positive log-score bars mean the hierarchical model holds up under the scenario; positive Brier bars mean lower (better) error relative to baseline.

Effect on RQF Investments (n ≈ 6 deals):

| Scenario | Mean (%) | Δ Mean (pp) | sd (%) | Δ sd (pp) | P(r̄ < 0) (%) | Δ P (pp) |
| --- | --- | --- | --- | --- | --- | --- |
| Volatility spike | 27.56 | 0.00 | 12.31 | 2.38 | 1.33 | 0.96 |
| Sector downturn | 21.56 | -6.00 | 9.93 | 0.00 | 1.64 | 1.27 |
| Data scarcity | 27.56 | 0.00 | 21.00 | 11.07 | 9.39 | 9.02 |

Turning posteriors into manager skill

The same hierarchy that shrinks fund noise also gives us a principled “skill curve” for each manager. We treat manager skill as the posterior of μₘ relative to the complete-pooling anchor μ₀:

\Delta_m = \mu_m - \mu_0, \qquad \Pr(\mu_m > \mu_0 \mid y) = \int \mathbf{1}\{\mu_m > \mu_0\} \, p(\mu_m, \mu_0 \mid y) \, d\mu_m \, d\mu_0.

Sampling from PyMC makes those integrals trivial: each draw of (μₘ, μ₀) just becomes a Bernoulli trial for “beats pooling”, and we can tally how often a manager is the global leader by checking which μₘ is largest per draw. That yields:

  1. Credible lifts — posterior mean ± 80 % interval for every manager anchor, expressed in net IRR percentage points above the pooled baseline.
  2. Skill probabilities — P(μₘ > μ₀) and P(manager m is top) straight from the joint posterior draws.
  3. Data depth context — the same table keeps the number of funds and training deals that informed each anchor, so high skill scores with thin data are easy to flag.
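Those tallies reduce to a few lines of numpy over the joint draws. A sketch with a simulated draw matrix (in the real script this would come from the sampled trace; the anchors here are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in joint posterior draws: one column per manager anchor mu_m
anchors = np.array([0.164, 0.233, 0.276])
mu_m = rng.normal(anchors, 0.03, size=(4000, 3))
mu0 = rng.normal(0.226, 0.015, size=4000)  # pooled-anchor draws

p_beats_pool = (mu_m > mu0[:, None]).mean(axis=0)              # P(mu_m > mu0) per manager
p_top = np.bincount(mu_m.argmax(axis=1), minlength=3) / len(mu_m)  # P(manager is top)
ranks = (-mu_m).argsort(axis=1).argsort(axis=1) + 1            # 1 = best, per draw
expected_rank = ranks.mean(axis=0)
```

The double-argsort turns each draw's ordering into ranks, so expected rank is just the mean rank across draws, the quantity the table below reports.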

The component below pulls directly from the new JSON payload exported by the PyMC script:

Posterior view of manager skill

Red line = complete pooling anchor 22.6%; shaded ridge area represents posterior mass where μ exceeds that anchor.

[Figure: "Manager skill ridges (shaded area = P(μ>μ_pool))," ranked 1. RQF Investments, 2. UIR Holdings, 3. ZANO Investments, 4. XNRX Holdings, 5. UOQ Partners, 6. OTUI Investments; x-axis net IRR (%).]
| Manager | Posterior (%) | Lift vs pool (pp)[5] | P(μ>μ_pool) | P(top mgr) | Expected rank[6] | Funds | Deals |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RQF Investments | 27.6 | 5.0 | 94.3% | 78.7% | 1.3 | 7 | 162 |
| UIR Holdings | 23.3 | 0.7 | 59.1% | 10.2% | 2.6 | 7 | 82 |
| ZANO Investments | 21.3 | -1.2 | 31.0% | 2.7% | 3.3 | 9 | 109 |
| XNRX Holdings | 19.9 | -2.6 | 30.8% | 7.5% | 3.8 | 1 | 12 |
| UOQ Partners | 17.0 | -5.6 | 6.0% | 0.6% | 4.9 | 4 | 49 |
| OTUI Investments | 16.4 | -6.2 | 2.1% | 0.4% | 5.2 | 8 | 101 |

The ridge chart shows each manager’s posterior density, with the shaded region highlighting the probability mass above the pooled anchor, Pr(μₘ > μ₀). That makes “skill” literally the area of the curve past red, while the table still gives ranks and odds. In this seed, RQF Investments keeps ~79% of the mass above the anchor (posterior mean 27.6%), UIR Holdings sits near a coin-flip at 59%, and OTUI Investments barely clears 2% despite having plenty of data—exactly the nuance we wanted when translating the hierarchy into an investable ranking.


Next experiment: slicing skill by stage, geo, and sector

The fun part about having the hierarchy in place is that we can start slicing it in richer ways without redesigning the entire model. Three experiments are on deck:

  1. Manager × stage/geo skill curves. Rather than a single μₘ, give each manager stage- and region-specific anchors μ_{m,s} (venture vs buyout, North America vs Europe). Funds already carry stage and focus_region, so we can write
\mu_{m,s} \sim \mathcal{N}(\mu_m, \tau_{\text{stage}}^2), \qquad \mu_{f} \mid s_f = s \sim \mathcal{N}(\mu_{m,s}, \tau_f^2).

Skill then becomes Pr(μ_{m,s} > μ_s^pool)—“does this manager outperform the global benchmark for stage s?” That would let us paint a ridge chart per stage and show, for example, that RQF’s venture funds are the real edge while its buyout funds simply match market medians.

  2. Deal × sector effects. At the deal layer we can bolt on sector offsets
r_{d} \sim \mathcal{N}(\mu_{f} + \beta_{\text{sector}(d)}, \sigma_d^2), \qquad \beta_{c} \sim \mathcal{N}(0, \tau_{\text{sector}}^2),

and score the probability that a sector effect is positive globally or within a manager. Think of it as a sector attribution view: “this manager’s healthcare deals outperform their own anchor with Y% probability,” or “fund 14 has negative exposure to SaaS” even if the overall fund still looks strong.

  3. Scenario overlays. With posterior draws in hand we can stress-test macro scenarios directly. Examples: shock all venture funds down 300 bps to mimic a liquidity freeze, inflate σ_d for a single sector, or ask “what happens to each manager’s ranking if we remove their largest fund?” Mathematically it’s just shifting (μₘ, τₘ, τ_f, σ_d) or zeroing out subsets of draws, but the business story becomes concrete.

These sketches are parked here so we can pick them up in a future post. Each one only needs a couple of latent parameters in PyMC, a new aggregation in the JSON payload (manager×stage, manager×geo, manager×sector, scenario deltas), and another ridge/table component like the ones above to make the story visible.


  1. Note [1]

    No pooling = estimate each fund in isolation using only its own sample mean.

    [back]
  2. Note [2]

    Complete pooling = collapse the entire dataset into one global mean, ignoring manager and fund structure.

    [back]
  3. Note [3]

    Hierarchical = partial pooling; funds borrow strength from their manager anchor and the global prior.

    [back]
  4. Note [4]

    The Brier score is the mean squared error between predicted probabilities and binary outcomes (lower is better).

    [back]
  5. Note [5]

    Lift vs pool = posterior manager mean minus the complete-pooling baseline (positive values indicate outperformance).

    [back]
  6. Note [6]

    Expected rank = posterior expectation of each manager’s ordering (1 = most likely top performer).

    [back]