Borrowing Predictive Strength: Hierarchical Bayes across Managers, Funds, and Deals
10/12/2025
Belief is a team sport. A deal lives inside a fund, which lives inside a manager. Hierarchical Bayes turns that nesting into math so thin data can lean on thicker neighbors without collapsing into one-size-fits-all averages. This post explains the interactives with equations and step-by-step reading guides.
TL;DR
- Model deal returns inside funds, and funds inside managers.
- Use partial pooling to shrink noisy estimates toward manager and global anchors.
- Score predictions on held-out deals to confirm that strength sharing pays.
Quick Bayesian reminder
Updating beliefs is multiplication in disguise:
- The prior encodes what you believed before any evidence.
- The likelihood says how probable the observed data are if a given parameter value were true.
- The posterior blends the two, weighting each by its relative certainty.
In a hierarchy, priors are built from higher levels, so information flows up and down.
Model in one picture
We use a three-level Normal model:
- Deal: $r_{ijk} \sim \mathcal{N}(\theta_{ij}, \sigma_d^2)$
- Fund: $\theta_{ij} \sim \mathcal{N}(\mu_i, \tau_f^2)$
- Manager: $\mu_i \sim \mathcal{N}(\mu_0, \tau_m^2)$
Interpretation:
- $\sigma_d$ is deal noise.
- $\tau_f$ is dispersion across funds within a manager.
- $\tau_m$ is dispersion across managers.
- $\mu_0$ is the global sector anchor.
We simulate a full hierarchy and let you control its knobs. These parameters act both as data-generators and as priors the model uses to share strength.
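If you want to poke at the same setup offline, here is a minimal NumPy sketch of the generative process above; the knob values and counts are illustrative stand-ins, not the interactive's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hierarchy knobs (illustrative, not the interactive's defaults)
mu0, tau_m, tau_f, sigma_d = 0.04, 0.03, 0.04, 0.10
n_managers, n_funds, n_deals = 8, 6, 12

mu = rng.normal(mu0, tau_m, size=n_managers)                         # manager means
theta = rng.normal(mu[:, None], tau_f, size=(n_managers, n_funds))   # fund means
r = rng.normal(theta[:, :, None], sigma_d,
               size=(n_managers, n_funds, n_deals))                  # deal returns

print(r.shape)  # (8, 6, 12): managers x funds x deals
```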
Interactive - Hierarchy controls
Hierarchy controls
These parameters generate a synthetic hierarchy and serve as priors for pooling. Bigger tau values imply more real dispersion to respect; bigger sigma_d implies noisier deals to shrink.
What to change and what happens:
- Increase managers, funds per manager, or deals per fund to thicken the dataset. Scores should stabilize.
- Increase $\tau_m$ or $\tau_f$ to encode more true dispersion. Pooling should respect real differences more.
- Increase $\sigma_d$ to make deals noisier. Shrinkage should increase.
- Change the train fraction to alter how much data is held out for scoring.
Where the variance actually lives
For a fresh deal drawn from the hierarchy, the unconditional variance decomposes as (a quick numeric check follows the list)

$$\operatorname{Var}(r_{\text{new}}) = \tau_m^2 + \tau_f^2 + \sigma_d^2.$$

- If $\tau_m^2$ dominates, managers differ a lot.
- If $\tau_f^2$ dominates, funds differ inside each manager.
- If $\sigma_d^2$ dominates, deals are noisy even within a fund.
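A minimal sketch of that check, with illustrative parameter values:

```python
tau_m, tau_f, sigma_d = 0.03, 0.04, 0.10   # illustrative values
total = tau_m**2 + tau_f**2 + sigma_d**2
shares = {name: 100 * v**2 / total
          for name, v in [("manager", tau_m), ("fund", tau_f), ("deal", sigma_d)]}
print({k: round(s, 1) for k, s in shares.items()})
# {'manager': 7.2, 'fund': 12.8, 'deal': 80.0}
```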
Interactive - Variance decomposition
Where uncertainty comes from
How to read the chart:
- Bars show the percent share of total variance attributed to manager, fund, and deal components. They sum to 100 percent.
- If the manager bar grows while others shrink, cross-manager dispersion is the main thing to model. Expect bigger gains from manager-level pooling.
Shrinkage: raw to posterior at the fund level
Given a fund with $n_{ij}$ training deals and sample mean $\bar r_{ij}$, the posterior mean for the fund-level parameter shrinks toward its manager:

$$\hat\theta_{ij} = w_{ij}\,\bar r_{ij} + (1 - w_{ij})\,\hat\mu_i, \qquad w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

The manager posterior aggregates fund evidence with its own prior:

$$\hat\mu_i = \frac{\sum_j \bar r_{ij}\big/\!\left(\tau_f^2 + \sigma_d^2/n_{ij}\right) + \mu_0/\tau_m^2}{\sum_j 1\big/\!\left(\tau_f^2 + \sigma_d^2/n_{ij}\right) + 1/\tau_m^2}.$$

Smaller $n_{ij}$ or larger $\sigma_d$ makes $w_{ij}$ smaller, so the fund pulls harder toward the manager anchor.
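A minimal sketch of the fund-level update, assuming known $\sigma_d$ and $\tau_f$; the helper name `fund_posterior_mean` and the inputs are illustrative.

```python
def fund_posterior_mean(rbar, n, mu_hat_mgr, sigma_d, tau_f):
    """Shrink a fund's raw mean toward its manager anchor.

    w -> 1 when the fund has many clean deals; w -> 0 when the
    fund is thin or deals are noisy.
    """
    prec_data = n / sigma_d**2      # precision of the fund sample mean
    prec_prior = 1.0 / tau_f**2     # precision of the manager-level prior
    w = prec_data / (prec_data + prec_prior)
    return w * rbar + (1.0 - w) * mu_hat_mgr, w

theta_hat, w = fund_posterior_mean(rbar=0.09, n=6, mu_hat_mgr=0.04,
                                   sigma_d=0.10, tau_f=0.04)
print(round(w, 2), round(theta_hat, 3))  # 0.49 0.064
```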
Interactive - Shrinkage ladder
Cross-level shrinkage
Data-poor or high-vol funds move the most. The triangle shows the manager anchor each fund leans toward.
How to read the chart:
- Each row is a fund. The circle is the raw mean $\bar r_{ij}$. The square is the shrunken posterior $\hat\theta_{ij}$. The triangle is its manager anchor $\hat\mu_i$.
- The line between circle and square is the amount of shrinkage. Long lines are data-poor or high-volatility funds.
- Sorting emphasizes the largest movers. If many long lines all point toward their manager, partial pooling is doing work.
What to try:
- Increase $\sigma_d$ or decrease deals per fund; lines should lengthen.
- Increase $\tau_f$; funds get more autonomy, so lines shorten.
Predict a brand-new fund under a known manager
For a new fund under manager $i$ with $m$ expected deals, the predictive distribution of its average return is

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_i,\; \tau_f^2 + \frac{\sigma_d^2}{m} + \operatorname{Var}(\hat\mu_i)\right).$$

Contrast with the global-only baseline that ignores manager identity:

$$\bar r_{\text{new}} \sim \mathcal{N}\!\left(\bar r_0,\; \frac{\sigma_d^2}{m}\right),$$

where $\bar r_0$ is the global training mean. Tail odds follow from the Normal cdf, e.g. $P(\bar r_{\text{new}} < 0) = \Phi\!\big((0 - \hat\mu_i)/\mathrm{sd}\big)$.
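A sketch of the manager-informed predictive and its tail odds, assuming the Normal form above; `mu_hat_i` and `sd_mu_hat_i` stand in for a manager's posterior mean and sd, and the numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

def new_fund_predictive(mu_hat_i, sd_mu_hat_i, tau_f, sigma_d, m):
    """Manager-informed predictive for the average return of a new fund
    with m deals: mean mu_hat_i, variance tau_f^2 + sigma_d^2/m + Var(mu_hat_i)."""
    var = tau_f**2 + sigma_d**2 / m + sd_mu_hat_i**2
    return mu_hat_i, np.sqrt(var)

mean, sd = new_fund_predictive(mu_hat_i=0.05, sd_mu_hat_i=0.015,
                               tau_f=0.04, sigma_d=0.10, m=10)
p_loss = norm.cdf(0.0, loc=mean, scale=sd)   # P(average return < 0)
print(round(mean, 3), round(sd, 3), round(p_loss, 3))
```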
Interactive - New fund predictor
New fund predictor
Hierarchical prediction respects manager identity and fund dispersion. Global-only is optimistic when cross-fund spread is real.
How to read the chart:
- Two curves: manager-informed vs global-only. Means can differ; the manager-informed curve is often a bit wider because it admits cross-fund dispersion and manager uncertainty.
- The badges report the predictive mean and sd, plus $P(\bar r_{\text{new}} < 0)$ under each model.
- If the manager has strong evidence, the manager-informed mean will move away from the global baseline and the sd may shrink.
What to try:
- Increase $m$; both curves tighten as $\sigma_d^2/m$ shrinks, but only the manager-informed curve keeps $\tau_f^2$ and the uncertainty in $\hat\mu_i$.
- Pick a manager with many and strong funds; the manager-informed curve should shift meaningfully relative to global.
Out-of-sample scoring: does strength sharing pay?
We hold out a subset of deals and compare three models:
- No pool: predict each deal using only its fund training mean.
- Complete pooling: predict every deal with the single global training mean.
- Hierarchical: borrow strength across manager and fund structure.
We score with two proper scoring rules: the average log predictive density (higher is better) and the Brier score for the loss event $\{r < 0\}$ (lower is better). Both are defined in the reading guide below.
Interactive - Predictive scorecard
Predictive scorecard
Absolute view shows the raw metrics. Delta view re-expresses both metrics so higher is better: for log score, improvement = model - baseline; for Brier, improvement = baseline - model. The dotted line marks zero improvement.
Reading the scorecard
Two metrics, both proper scoring rules:
- Avg log predictive density (higher is better).
- Brier score for the event $\{r < 0\}$ (lower is better).
The menu lets you switch between Absolute values and Delta vs a baseline (Complete or No pool). In Delta view both charts are normalized so that higher is better:
- Log score improvement = $\mathrm{LPD}_{\text{model}} - \mathrm{LPD}_{\text{baseline}}$.
- Brier improvement = $\mathrm{Brier}_{\text{baseline}} - \mathrm{Brier}_{\text{model}}$.
The dotted horizontal line marks zero improvement.
What the log score measures
For each holdout deal with realized return $r_k$ and a predictive density $p_k(\cdot)$, the contribution is $\log p_k(r_k)$. The chart displays the average:

$$\mathrm{LPD} = \frac{1}{N}\sum_{k=1}^{N} \log p_k(r_k).$$
This rewards forecasts that put high probability mass near the truth, and it penalizes overconfidence. Reporting your true predictive distribution maximizes expected score.
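A minimal sketch, assuming each holdout deal gets its own Normal predictive mean and sd:

```python
import numpy as np
from scipy.stats import norm

def avg_log_predictive_density(r_holdout, pred_mean, pred_sd):
    """Average log density of the realized returns under each deal's
    Normal predictive (one mean/sd pair per holdout deal)."""
    return float(np.mean(norm.logpdf(r_holdout, loc=pred_mean, scale=pred_sd)))

lpd = avg_log_predictive_density([0.05, -0.02], pred_mean=[0.04, 0.03], pred_sd=[0.08, 0.08])
```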
What the Brier score is and why we use it
The Brier score evaluates probability forecasts for a binary event. Here the event is a loss: $\{r < 0\}$. For each prediction we compute the model's probability $p_k = P_k(r < 0)$ and the realized outcome $y_k = \mathbf{1}\{r_k < 0\}$. The Brier is the mean squared error of those probabilities:

$$\mathrm{Brier} = \frac{1}{N}\sum_{k=1}^{N}\left(p_k - y_k\right)^2.$$

A small code sketch of this computation appears after the interpretation tips below.
Why this matters here:
- Action relevance. Many portfolio actions hinge on a tail event (loss vs no loss, breach vs no breach). Brier directly scores the quality of those event probabilities, not just point predictions.
- Calibration sensitive. If your probabilities are systematically off (say you predict 30% loss but it happens 50% of the time), Brier will punish you in proportion to the miscalibration.
- Proper rule. Like the log score, Brier is proper: your expected score is best when you report your true probability, so it discourages hedged or exaggerated forecasts.
- Complement to log score. Log score cares about full density shape; Brier isolates the decision boundary. Seeing both gives a fuller picture of sharpness and calibration.
Interpretation tips:
- Lower Brier is better in Absolute view.
- In Delta view we flip it to an improvement scale so higher is better: $\mathrm{Brier}_{\text{baseline}} - \mathrm{Brier}_{\text{model}}$.
- A Brier of 0.25 corresponds to flipping a fair coin for a balanced event; competent models on financial returns should target far lower than that for meaningful edges.
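A matching sketch for the Brier computation, again assuming Normal predictives and a loss threshold of zero:

```python
import numpy as np
from scipy.stats import norm

def brier_for_loss(r_holdout, pred_mean, pred_sd, threshold=0.0):
    """Brier score for the event r < threshold under Normal predictives."""
    p_loss = norm.cdf(threshold, loc=pred_mean, scale=pred_sd)   # model probabilities
    y = (np.asarray(r_holdout) < threshold).astype(float)        # realized outcomes
    return float(np.mean((p_loss - y) ** 2))
```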
How to diagnose with the charts
- If No pool beats Complete on both metrics, your cross-fund differences are real and large.
- If Hierarchical beats both, pooled estimates are balancing variance (No pool) and bias (Complete).
- If Hierarchical wins on log score but not on Brier, the densities may be sharp but miscalibrated near the boundary; consider heavier tails or revisiting variance terms.
Implementation notes
- Work in log-return units for additivity. Convert to percents only for display.
- Treat $\sigma_d$, $\tau_f$, and $\tau_m$ as learnable scale parameters in production; here they are dials for pedagogy.
- Posterior predictive variance must include parameter uncertainty. For a new deal, $\operatorname{Var}(r_{\text{new}}) = \sigma_d^2 + \operatorname{Var}(\theta_{ij} \mid \text{data})$.
- Unit test edge cases: empty funds, tiny $n_{ij}$, and very large $\sigma_d$ values (see the test sketch after this list).
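Two of those edge cases as pytest-style checks, reusing the illustrative `fund_posterior_mean` helper from the shrinkage sketch above:

```python
def test_empty_fund_falls_back_to_manager():
    # No deals means zero data precision: the posterior should sit on the anchor.
    theta_hat, w = fund_posterior_mean(rbar=0.0, n=0, mu_hat_mgr=0.04,
                                       sigma_d=0.10, tau_f=0.04)
    assert w == 0.0
    assert abs(theta_hat - 0.04) < 1e-12

def test_huge_deal_noise_shrinks_hard():
    # Very noisy deals should push the weight toward zero.
    _, w = fund_posterior_mean(rbar=0.20, n=5, mu_hat_mgr=0.04,
                               sigma_d=10.0, tau_f=0.04)
    assert w < 0.01
```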
Connecting the dots
- The variance decomposition explains why partial pooling should help: when $\tau_m^2$ or $\tau_f^2$ is sizable, sharing information reduces estimation error.
- The shrinkage ladder shows the micro-mechanism that creates the macro win in scores: data-poor funds move toward more reliable anchors, stabilizing out-of-sample predictions.
- The new fund predictor translates the structure into actionable odds for underwriting or pacing, via the predictive mean and sd and tail odds like $P(\bar r_{\text{new}} < 0)$.
Notes: multi-strategy managers and whether to share parameters
Some managers run multiple strategies—buyout, growth, credit, special opportunities. The question is how much information should flow across strategies.
Model extension. Add a strategy index $s$ and let funds live inside $(i, s)$:

$$r_{ijk} \sim \mathcal{N}(\theta_{ij}, \sigma_{d,s}^2), \qquad \theta_{ij} \sim \mathcal{N}(\mu_{i,s}, \tau_{f,s}^2), \qquad \boldsymbol{\mu}_i \sim \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_m).$$

- $\sigma_{d,s}$ lets deal noise differ by strategy.
- $\tau_{f,s}$ lets fund dispersion differ by strategy.
- $\boldsymbol{\mu}_i$ is an $S$-vector of manager means across strategies.
- $\Sigma_m = D\,R\,D$ with $D = \operatorname{diag}(\tau_{m,1}, \ldots, \tau_{m,S})$ and $R$ a correlation matrix (e.g., LKJ prior). Correlations in $R$ encode how much a house style carries across strategies. A short simulation sketch follows the list.
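A simulation sketch of the correlated manager means; the strategy labels, per-strategy $\tau_m$ values, and the fixed correlation matrix standing in for an LKJ draw are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Manager means across S = 3 strategies: mu_i ~ N(mu0_vec, Sigma_m),
# with Sigma_m = D R D and D = diag(tau_m per strategy).
mu0_vec = np.array([0.05, 0.03, 0.02])     # e.g. buyout, growth, credit (illustrative)
tau_m_s = np.array([0.03, 0.03, 0.02])
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])            # fixed stand-in for an LKJ draw
Sigma_m = np.diag(tau_m_s) @ R @ np.diag(tau_m_s)

mu_i = rng.multivariate_normal(mu0_vec, Sigma_m, size=500)   # managers x strategies
print(np.round(np.corrcoef(mu_i, rowvar=False), 2))          # should resemble R
```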
When to keep shared parameters. Keep cross-strategy sharing if posteriors for off-diagonal correlations in $R$ are materially positive and concentrated. A practical rule:
- If $\rho_{ss'}$ is moderately large (0.2–0.6) and the 80% interval stays away from 0, keep sharing across strategies $s$ and $s'$.
- If $\rho_{ss'}$ is near 0 with wide uncertainty, treat strategies as independent: set $R = I$ or separate the models.
What shrinks where. For a new fund in strategy $s$ under manager $i$:

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_{i,s},\; \tau_{f,s}^2 + \frac{\sigma_{d,s}^2}{m} + \operatorname{Var}(\hat\mu_{i,s})\right).$$

If $R$ has positive correlations, $\hat\mu_{i,s}$ benefits from evidence in the other strategies through the multivariate posterior. If strategies are truly distinct, the posterior naturally shuts down that pathway.
Diagnostics to add later.
- Posterior of $R$ with uncertainty bands.
- Per-strategy shrinkage weights $w_{ij}$.
- Out-of-sample scorecards stratified by strategy.
Notes: why hierarchical Bayes beats normalization and lasso here
People often normalize returns (demean by sector or vintage, z-score by volatility) or fit a lasso on flattened data. Both moves are helpful, but they do not solve the nesting or the uncertainty accounting.
Normalization limits. Demeaning and z-scoring remove average level and scale, but they do not adaptively shrink noisy group means. They also do not propagate parameter uncertainty into predictions.
Lasso limits. The lasso solves an optimization with a global penalty:

$$\hat\beta = \arg\min_{\beta}\; \tfrac{1}{2}\sum_{i}\left(y_i - x_i^\top\beta\right)^2 + \lambda \lVert\beta\rVert_1,$$

which encourages some coefficients to be exactly zero. That is great for sparse feature selection, not for borrowing strength across nested groups whose reliability varies with sample size.
What the hierarchy does instead. The posterior for a fund mean is a data-weighted average:

$$\hat\theta_{ij} = w_{ij}\,\bar r_{ij} + (1 - w_{ij})\,\hat\mu_i, \qquad w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

- The weight $w_{ij}$ depends on $n_{ij}$ and $\sigma_d$: thin and noisy funds shrink more.
- The manager posterior $\hat\mu_i$ is itself a shrinkage estimate that pools across funds, with uncertainty that flows down into $\hat\theta_{ij}$.
Ridge connection, lasso contrast. If you write a random effect $\theta_{ij} = \mu_i + u_{ij}$ with $u_{ij} \sim \mathcal{N}(0, \tau_f^2)$, the posterior mean of $u_{ij}$ equals a ridge solution with penalty $\lambda = \sigma_d^2/\tau_f^2$. Hierarchical Bayes learns $\tau_f$ from the data and adjusts shrinkage per group through $n_{ij}$ and $\sigma_d$. Lasso has a fixed $\lambda$ that does not care whether a fund had 6 or 60 deals and does not yield full predictive distributions.
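A small numeric check of that equivalence on one synthetic fund; the values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_i, tau_f, sigma_d, n = 0.04, 0.04, 0.10, 12
r = rng.normal(mu_i + 0.03, sigma_d, size=n)    # one fund's deals, true tilt +3pp

# Bayesian posterior mean of the random effect u_ij (prior N(0, tau_f^2))
prec_data, prec_prior = n / sigma_d**2, 1.0 / tau_f**2
u_bayes = prec_data * (r.mean() - mu_i) / (prec_data + prec_prior)

# Ridge solution: argmin_u sum((r_k - mu_i - u)^2) + lam * u^2, lam = sigma_d^2 / tau_f^2
lam = sigma_d**2 / tau_f**2
u_ridge = np.sum(r - mu_i) / (n + lam)

print(np.isclose(u_bayes, u_ridge))  # True: same estimator, different vocabulary
```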
Prediction matters. For a new deal, the hierarchical predictive variance includes both deal noise and parameter uncertainty:

$$\operatorname{Var}(r_{\text{new}}) = \sigma_d^2 + \operatorname{Var}(\theta_{ij} \mid \text{data}).$$

Normalization and lasso give you a point estimate plus residual variance, but they do not decompose or propagate group-level uncertainty in a principled way.
Notes: from hierarchy to systematic manager factors and a scorecard
The same scaffolding gives you cleaner style estimates and a defendable manager scorecard.
Hierarchical factor model. Let $F_t$ be a set of systematic factors (market, rates, credit, sector). For returns indexed by time $t$:

$$r_{ij,t} = \alpha_{i,s} + (\beta_i + \gamma_{ij})^\top F_t + \varepsilon_{ij,t},$$

with priors like

$$\alpha_{i,s} \sim \mathcal{N}(\alpha_{0,s}, \tau_\alpha^2), \qquad \beta_i \sim \mathcal{N}(\beta_0, \Sigma_\beta), \qquad \gamma_{ij} \sim \mathcal{N}(0, \Sigma_\gamma).$$

- $\alpha_{i,s}$ is a manager-by-strategy intercept that shrinks to the strategy base rate.
- $\beta_i$ are manager style loadings that shrink to a cross-manager mean.
- $\gamma_{ij}$ are fund idiosyncratic tilts that shrink to 0.

You can make the loadings dynamic with a random walk if you need time variation:

$$\beta_{i,t} = \beta_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim \mathcal{N}(0, \Sigma_\eta).$$
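A simulation sketch of the static version of this model for a single fund; the factor count, loadings, and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 40, 3                                    # periods, factors (illustrative)
F = rng.normal(0.0, 0.02, size=(T, K))          # factor returns

alpha_is = 0.01                                 # manager-by-strategy intercept
beta_i = np.array([0.8, -0.2, 0.4])             # manager style loadings
gamma_ij = rng.normal(0.0, 0.1, size=K)         # fund tilt (shrinks to 0 in the model)
eps = rng.normal(0.0, 0.03, size=T)             # idiosyncratic noise

r_fund = alpha_is + F @ (beta_i + gamma_ij) + eps   # one fund's return series
print(r_fund.shape)  # (40,)
```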
Systematic factors for a manager. The posterior of $\beta_i$ is a stable estimate of the manager's systematic tilts. You can:
- Use $\mathbb{E}[\beta_i \mid \text{data}]$ as the manager's style vector.
- Track $\beta_{i,t}$ over time for drift.
- Form a manager factor-mimicking portfolio by regressing fund returns on $F_t$ with the hierarchical prior to stabilize exposures.
Manager scorecard blueprint. Build tiles from posterior and predictive objects:
- Skill and uncertainty: $\mathbb{E}[\alpha_{i,s}]$, its posterior sd, and $P(\alpha_{i,s} > 0)$.
- Cross-fund dispersion: $\tau_{f,s}$ and a league table of shrinkage weights $w_{ij}$.
- Tail risk: $P(r < L)$ for a user threshold $L$ and predictive quantiles for next period.
- Calibration: average log score and Brier on rolling holdouts.
- Stability: change in $\beta_{i,t}$ over time, e.g., $\lVert \beta_{i,t} - \beta_{i,t-1} \rVert$.
- House style: correlation matrix of manager effects across strategies to quantify internal consistency.
Why this is systematic. Shrinkage keeps exposures from overfitting thin histories, and the hierarchical prior aligns managers to a common yardstick. Scores are not just numbers; they are posterior quantities with uncertainty that you can audit and track.
Manager benchmark: a simple map from hierarchy to decisions
We benchmark a manager against a reference mean $b$.
- Today: $b = \mu_0$ (the cross-manager base rate).
- Factor-ready: $b = \alpha_{0,s} + \hat\beta_i^\top \hat F$ once you estimate exposures.
1) Skill vs benchmark
Manager skill relative to a benchmark is the difference in pooled means:

$$\Delta_i = \hat\mu_i - b.$$

We report the posterior mean and a band, plus a credibility number: $P(\Delta_i > 0 \mid \text{data})$ (a small sketch of this arithmetic follows the bullets).
- Today, $b = \mu_0$ (global).
- Factor-ready note: when styles are in play, set $b = \alpha_{0,s} + \hat\beta_i^\top \hat F_{t+1}$ and replace $\hat\mu_i$ with $\hat\alpha_{i,s} + \hat\beta_i^\top \hat F_{t+1}$ for your next-period factor view $\hat F_{t+1}$.
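The sketch, assuming an approximately Normal posterior for the manager mean; the numbers are illustrative.

```python
from scipy.stats import norm

# Skill vs benchmark under an approximately Normal posterior for mu_i
mu_hat_i, sd_mu_hat_i = 0.052, 0.012     # posterior mean and sd (illustrative)
mu0 = 0.035                              # today's benchmark: the global base rate
delta = mu_hat_i - mu0
p_positive = 1.0 - norm.cdf(0.0, loc=delta, scale=sd_mu_hat_i)
print(round(delta, 3), round(p_positive, 2))   # skill estimate and P(delta > 0)
```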
2) New‑fund odds (underwriting lens)
For a new fund average $\bar r_{\text{new}}$ with $m$ deals, the manager-informed predictive is

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_i,\; \tau_f^2 + \frac{\sigma_d^2}{m} + \operatorname{Var}(\hat\mu_i)\right).$$

We show the density, shade the tail $P(\bar r_{\text{new}} < 0)$, and display the predictive mean, sd, and tail probability.
Factor-ready note: replace $\hat\mu_i$ as above, and optionally add a factor-view term (e.g., $\hat F^\top \operatorname{Cov}(\hat\beta_i)\,\hat F$) to the variance if you want factor-view uncertainty.
3) Pooling anatomy (why the estimates are stable)
Funds shrink toward their manager with precision weights

$$w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

A quick MoM estimate for within-manager dispersion helps sanity-check the prior:

$$\hat\tau_f^2 = \max\!\left(0,\; \widehat{\operatorname{Var}}_j\!\left(\bar r_{ij}\right) - \overline{\sigma_d^2 / n_{ij}}\right).$$
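A sketch of that MoM check; the fund means and sizes are illustrative.

```python
import numpy as np

def tau_f_mom(fund_means, fund_sizes, sigma_d):
    """Method-of-moments check: spread of fund means minus the part
    explained by measurement noise sigma_d^2 / n_ij."""
    means = np.asarray(fund_means, dtype=float)
    noise = np.mean(sigma_d**2 / np.asarray(fund_sizes, dtype=float))
    return float(np.sqrt(max(0.0, np.var(means, ddof=1) - noise)))

print(round(tau_f_mom([0.12, 0.01, 0.08, -0.02, 0.05], [12, 6, 20, 5, 9], 0.10), 3))
# roughly 0.043 with these illustrative inputs
```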
4) Calibration vs a baseline
We evaluate the hierarchical predictive on holdouts with two proper scores:
- Avg log predictive density (higher better): $\frac{1}{N}\sum_k \log p_k(r_k)$.
- Brier for the event $\{r < L\}$ (lower better): $\frac{1}{N}\sum_k (p_k - y_k)^2$.
The scorecard also shows Delta vs global for quick benchmarking: $\Delta\,\text{log} = \text{hier} - \text{global}$ and $\Delta\,\text{Brier} = \text{global} - \text{hier}$.
Why Brier here? Many actions hinge on the loss event. Brier directly tests the quality of $P(r < L)$, punishing miscalibration at the threshold even if the mean looks fine.
5) What the tiles mean at a glance
- Skill tile: $\hat\alpha_i = \hat\mu_i - \mu_0$ and $P(\alpha_i > 0)$.
- Dispersion tile: prior $\tau_f$ vs the per-manager estimate $\hat\tau_f$.
- Underwriting tile: $P(\bar r_{\text{new}} < 0)$ with $m$ deals.
- Calibration tiles: avg log and Brier, plus deltas vs global.
- Shrinkage table: top movers with weights $w_{ij}$.
Manager scorecard
| Fund | n | raw mean (%) | posterior mean (%) | weight w | abs. shrink (pp) |
|---|---|---|---|---|---|
| Fund 1 | 9 | 9.34 | 6.63 | 0.50 | 2.71 |
| Fund 3 | 9 | 0.51 | 2.21 | 0.50 | 1.71 |
| Fund 6 | 9 | 6.67 | 5.29 | 0.50 | 1.38 |
| Fund 5 | 9 | 2.16 | 3.04 | 0.50 | 0.88 |
| Fund 2 | 9 | 4.46 | 4.19 | 0.50 | 0.27 |
| Fund 4 | 9 | 3.86 | 3.89 | 0.50 | 0.03 |
Alpha is the manager mean minus the global base rate. tau_f compares the spread of fund means within this manager to the measurement noise implied by sigma_d and fund sample sizes. Calibration uses only holdout deals and the hierarchical predictive; the Brier event is r < L. Use the toggle to estimate tau_f per manager rather than the prior.
Factor-ready checklist
To upgrade the benchmark from global to style‑matched:
- Estimate $\beta_i$ and optionally $\gamma_{ij}$ with hierarchical shrinkage.
- Swap means in formulas: $\mu_0 \to \alpha_{0,s} + \hat\beta_i^\top \hat F$ and $\hat\mu_i \to \hat\alpha_{i,s} + \hat\beta_i^\top \hat F$.
- Optionally add the factor-view uncertainty term to the predictive variance.
- Reuse the same tiles and scores. Only the benchmark changed.