Borrowing Predictive Strength: Hierarchical Bayes across Managers, Funds, and Deals
10/12/2025
Belief is a team sport. A deal lives inside a fund, which lives inside a manager. Hierarchical Bayes turns that nesting into math so thin data can lean on thicker neighbors without collapsing into one-size-fits-all averages. This post explains the interactives with equations and step-by-step reading guides.
TL;DR
- Model deal returns inside funds, and funds inside managers.
- Use partial pooling to shrink noisy estimates toward manager and global anchors.
- Score predictions on held-out deals to confirm that strength sharing pays.
Quick Bayesian reminder
Updating beliefs is multiplication in disguise:
- The prior encodes what you believed before any evidence.
- The likelihood says how probable the observed data are if a given parameter value were true.
- The posterior blends the two, weighting each by its relative certainty.
In a hierarchy, priors are built from higher levels, so information flows up and down.
Model in one picture
We use a three-level Normal model:
- Deal: $r_{ijk} \sim \mathcal{N}(\theta_{ij}, \sigma_d^2)$
- Fund: $\theta_{ij} \sim \mathcal{N}(\mu_i, \tau_f^2)$
- Manager: $\mu_i \sim \mathcal{N}(\mu_0, \tau_m^2)$
Interpretation:
- $\sigma_d$ is deal noise.
- $\tau_f$ is dispersion across funds within a manager.
- $\tau_m$ is dispersion across managers.
- $\mu_0$ is the global sector anchor.
We simulate a full hierarchy and let you control its knobs. These parameters act both as data-generators and as priors the model uses to share strength.
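If you want to poke at the same setup offline, here is a minimal NumPy sketch of the generative process above; the knob values and counts are illustrative stand-ins, not the interactive's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hierarchy knobs (illustrative, not the interactive's defaults)
mu0, tau_m, tau_f, sigma_d = 0.04, 0.03, 0.04, 0.10
n_managers, n_funds, n_deals = 8, 6, 12

mu = rng.normal(mu0, tau_m, size=n_managers)                         # manager means
theta = rng.normal(mu[:, None], tau_f, size=(n_managers, n_funds))   # fund means
r = rng.normal(theta[:, :, None], sigma_d,
               size=(n_managers, n_funds, n_deals))                  # deal returns

print(r.shape)  # (8, 6, 12): managers x funds x deals
```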
Interactive - Hierarchy controls
Hierarchy controls
These parameters generate a synthetic hierarchy and serve as priors for pooling. Bigger tau values imply more real dispersion to respect; bigger sigma_d implies noisier deals to shrink.
What to change and what happens:
- Increase managers, funds per manager, or deals per fund to thicken the dataset. Scores should stabilize.
- Increase $\tau_m$ or $\tau_f$ to encode more true dispersion. Pooling should respect real differences more.
- Increase $\sigma_d$ to make deals noisier. Shrinkage should increase.
- Change the train fraction to alter how much data is held out for scoring.
Where the variance actually lives
For a fresh deal drawn from the hierarchy, the unconditional variance decomposes as (a quick numeric check follows the list)

$$\operatorname{Var}(r_{\text{new}}) = \tau_m^2 + \tau_f^2 + \sigma_d^2.$$

- If $\tau_m^2$ dominates, managers differ a lot.
- If $\tau_f^2$ dominates, funds differ inside each manager.
- If $\sigma_d^2$ dominates, deals are noisy even within a fund.
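A minimal sketch of that check, with illustrative parameter values:

```python
tau_m, tau_f, sigma_d = 0.03, 0.04, 0.10   # illustrative values
total = tau_m**2 + tau_f**2 + sigma_d**2
shares = {name: 100 * v**2 / total
          for name, v in [("manager", tau_m), ("fund", tau_f), ("deal", sigma_d)]}
print({k: round(s, 1) for k, s in shares.items()})
# {'manager': 7.2, 'fund': 12.8, 'deal': 80.0}
```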
Interactive - Variance decomposition
Where uncertainty comes from
How to read the chart:
- Bars show the percent share of total variance attributed to manager, fund, and deal components. They sum to 100 percent.
- If the manager bar grows while others shrink, cross-manager dispersion is the main thing to model. Expect bigger gains from manager-level pooling.
Shrinkage: raw to posterior at the fund level
Given a fund with $n_{ij}$ training deals and sample mean $\bar r_{ij}$, the posterior mean for the fund-level parameter shrinks toward its manager:

$$\hat\theta_{ij} = w_{ij}\,\bar r_{ij} + (1 - w_{ij})\,\hat\mu_i, \qquad w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

The manager posterior aggregates fund evidence with its own prior:

$$\hat\mu_i = \frac{\sum_j \bar r_{ij}\big/\!\left(\tau_f^2 + \sigma_d^2/n_{ij}\right) + \mu_0/\tau_m^2}{\sum_j 1\big/\!\left(\tau_f^2 + \sigma_d^2/n_{ij}\right) + 1/\tau_m^2}.$$

Smaller $n_{ij}$ or larger $\sigma_d$ makes $w_{ij}$ smaller, so the fund pulls harder toward the manager anchor.
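A minimal sketch of the fund-level update, assuming known $\sigma_d$ and $\tau_f$; the helper name `fund_posterior_mean` and the inputs are illustrative.

```python
def fund_posterior_mean(rbar, n, mu_hat_mgr, sigma_d, tau_f):
    """Shrink a fund's raw mean toward its manager anchor.

    w -> 1 when the fund has many clean deals; w -> 0 when the
    fund is thin or deals are noisy.
    """
    prec_data = n / sigma_d**2      # precision of the fund sample mean
    prec_prior = 1.0 / tau_f**2     # precision of the manager-level prior
    w = prec_data / (prec_data + prec_prior)
    return w * rbar + (1.0 - w) * mu_hat_mgr, w

theta_hat, w = fund_posterior_mean(rbar=0.09, n=6, mu_hat_mgr=0.04,
                                   sigma_d=0.10, tau_f=0.04)
print(round(w, 2), round(theta_hat, 3))  # 0.49 0.064
```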
Interactive - Shrinkage ladder
Cross-level shrinkage
Data-poor or high-vol funds move the most. The triangle shows the manager anchor each fund leans toward.
How to read the chart:
- Each row is a fund. The circle is the raw mean $\bar r_{ij}$. The square is the shrunken posterior $\hat\theta_{ij}$. The triangle is its manager anchor $\hat\mu_i$.
- The line between circle and square is the amount of shrinkage. Long lines are data-poor or high-volatility funds.
- Sorting emphasizes the largest movers. If many long lines all point toward their manager, partial pooling is doing work.
What to try:
- Increase $\sigma_d$ or decrease deals per fund; lines should lengthen.
- Increase $\tau_f$; funds get more autonomy, so lines shorten.
Predict a brand-new fund under a known manager
For a new fund under manager $i$ with $m$ expected deals, the predictive distribution of its average return is

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_i,\; \tau_f^2 + \frac{\sigma_d^2}{m} + \operatorname{Var}(\hat\mu_i)\right).$$

Contrast with the global-only baseline that ignores manager identity:

$$\bar r_{\text{new}} \sim \mathcal{N}\!\left(\bar r_0,\; \frac{\sigma_d^2}{m}\right),$$

where $\bar r_0$ is the global training mean. Tail odds follow from the Normal cdf, e.g. $P(\bar r_{\text{new}} < 0) = \Phi\!\big((0 - \hat\mu_i)/\mathrm{sd}\big)$.
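A sketch of the manager-informed predictive and its tail odds, assuming the Normal form above; `mu_hat_i` and `sd_mu_hat_i` stand in for a manager's posterior mean and sd, and the numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

def new_fund_predictive(mu_hat_i, sd_mu_hat_i, tau_f, sigma_d, m):
    """Manager-informed predictive for the average return of a new fund
    with m deals: mean mu_hat_i, variance tau_f^2 + sigma_d^2/m + Var(mu_hat_i)."""
    var = tau_f**2 + sigma_d**2 / m + sd_mu_hat_i**2
    return mu_hat_i, np.sqrt(var)

mean, sd = new_fund_predictive(mu_hat_i=0.05, sd_mu_hat_i=0.015,
                               tau_f=0.04, sigma_d=0.10, m=10)
p_loss = norm.cdf(0.0, loc=mean, scale=sd)   # P(average return < 0)
print(round(mean, 3), round(sd, 3), round(p_loss, 3))
```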
Interactive - New fund predictor
New fund predictor
Hierarchical prediction respects manager identity and fund dispersion. Global-only is optimistic when cross-fund spread is real.
How to read the chart:
- Two curves: manager-informed vs global-only. Means can differ; the manager-informed curve is often a bit wider because it admits cross-fund dispersion and manager uncertainty.
- The badges report the predictive mean and sd, plus $P(\bar r_{\text{new}} < 0)$ under each model.
- If the manager has strong evidence, the manager-informed mean will move away from the global baseline and the sd may shrink.
What to try:
- Increase $m$; both curves tighten as $\sigma_d^2/m$ shrinks, but only the manager-informed curve keeps $\tau_f^2$ and the uncertainty in $\hat\mu_i$.
- Pick a manager with many and strong funds; the manager-informed curve should shift meaningfully relative to global.
Out-of-sample scoring: does strength sharing pay?
We hold out a subset of deals and compare three models:
- No pool: predict each deal using only its fund training mean.
- Complete pooling: predict every deal with the single global training mean.
- Hierarchical: borrow strength across manager and fund structure.
We score with two proper scoring rules: the average log predictive density (higher is better) and the Brier score for the loss event $\{r < 0\}$ (lower is better). Both are defined in the reading guide below.
Interactive - Predictive scorecard
Predictive scorecard
Absolute view shows the raw metrics. Delta view re-expresses both metrics so higher is better: for log score, improvement = model - baseline; for Brier, improvement = baseline - model. The dotted line marks zero improvement.
Reading the scorecard
Two metrics, both proper scoring rules:
- Avg log predictive density (higher is better).
- Brier score for the event $\{r < 0\}$ (lower is better).
The menu lets you switch between Absolute values and Delta vs a baseline (Complete or No pool). In Delta view both charts are normalized so that higher is better:
- Log score improvement = $\mathrm{LPD}_{\text{model}} - \mathrm{LPD}_{\text{baseline}}$.
- Brier improvement = $\mathrm{Brier}_{\text{baseline}} - \mathrm{Brier}_{\text{model}}$.
The dotted horizontal line marks zero improvement.
What the log score measures
For each holdout deal with realized return $r_k$ and a predictive density $p_k(\cdot)$, the contribution is $\log p_k(r_k)$. The chart displays the average:

$$\mathrm{LPD} = \frac{1}{N}\sum_{k=1}^{N} \log p_k(r_k).$$
This rewards forecasts that put high probability mass near the truth, and it penalizes overconfidence. Reporting your true predictive distribution maximizes expected score.
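A minimal sketch, assuming each holdout deal gets its own Normal predictive mean and sd:

```python
import numpy as np
from scipy.stats import norm

def avg_log_predictive_density(r_holdout, pred_mean, pred_sd):
    """Average log density of the realized returns under each deal's
    Normal predictive (one mean/sd pair per holdout deal)."""
    return float(np.mean(norm.logpdf(r_holdout, loc=pred_mean, scale=pred_sd)))

lpd = avg_log_predictive_density([0.05, -0.02], pred_mean=[0.04, 0.03], pred_sd=[0.08, 0.08])
```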
What the Brier score is and why we use it
The Brier score evaluates probability forecasts for a binary event. Here the event is a loss: $\{r < 0\}$. For each prediction we compute the model's probability $p_k = P_k(r < 0)$ and the realized outcome $y_k = \mathbf{1}\{r_k < 0\}$. The Brier is the mean squared error of those probabilities:

$$\mathrm{Brier} = \frac{1}{N}\sum_{k=1}^{N}\left(p_k - y_k\right)^2.$$

A small code sketch of this computation appears after the interpretation tips below.
Why this matters here:
- Action relevance. Many portfolio actions hinge on a tail event (loss vs no loss, breach vs no breach). Brier directly scores the quality of those event probabilities, not just point predictions.
- Calibration sensitive. If your probabilities are systematically off (say you predict 30% loss but it happens 50% of the time), Brier will punish you in proportion to the miscalibration.
- Proper rule. Like the log score, Brier is proper: your expected score is best when you report your true probability, so it discourages hedged or exaggerated forecasts.
- Complement to log score. Log score cares about full density shape; Brier isolates the decision boundary. Seeing both gives a fuller picture of sharpness and calibration.
Interpretation tips:
- Lower Brier is better in Absolute view.
- In Delta view we flip it to an improvement scale so higher is better: $\mathrm{Brier}_{\text{baseline}} - \mathrm{Brier}_{\text{model}}$.
- A Brier of 0.25 corresponds to flipping a fair coin for a balanced event; competent models on financial returns should target far lower than that for meaningful edges.
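A matching sketch for the Brier computation, again assuming Normal predictives and a loss threshold of zero:

```python
import numpy as np
from scipy.stats import norm

def brier_for_loss(r_holdout, pred_mean, pred_sd, threshold=0.0):
    """Brier score for the event r < threshold under Normal predictives."""
    p_loss = norm.cdf(threshold, loc=pred_mean, scale=pred_sd)   # model probabilities
    y = (np.asarray(r_holdout) < threshold).astype(float)        # realized outcomes
    return float(np.mean((p_loss - y) ** 2))
```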
How to diagnose with the charts
- If No pool beats Complete on both metrics, your cross-fund differences are real and large.
- If Hierarchical beats both, pooled estimates are balancing variance (No pool) and bias (Complete).
- If Hierarchical wins on log score but not on Brier, the densities may be sharp but miscalibrated near the boundary; consider heavier tails or revisiting variance terms.
Implementation notes
- Work in log-return units for additivity. Convert to percents only for display.
- Treat $\sigma_d$, $\tau_f$, and $\tau_m$ as learnable scale parameters in production; here they are dials for pedagogy.
- Posterior predictive variance must include parameter uncertainty. For a new deal, $\operatorname{Var}(r_{\text{new}}) = \sigma_d^2 + \operatorname{Var}(\theta_{ij} \mid \text{data})$.
- Unit test edge cases: empty funds, tiny $n_{ij}$, and very large $\sigma_d$ values (see the test sketch after this list).
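Two of those edge cases as pytest-style checks, reusing the illustrative `fund_posterior_mean` helper from the shrinkage sketch above:

```python
def test_empty_fund_falls_back_to_manager():
    # No deals means zero data precision: the posterior should sit on the anchor.
    theta_hat, w = fund_posterior_mean(rbar=0.0, n=0, mu_hat_mgr=0.04,
                                       sigma_d=0.10, tau_f=0.04)
    assert w == 0.0
    assert abs(theta_hat - 0.04) < 1e-12

def test_huge_deal_noise_shrinks_hard():
    # Very noisy deals should push the weight toward zero.
    _, w = fund_posterior_mean(rbar=0.20, n=5, mu_hat_mgr=0.04,
                               sigma_d=10.0, tau_f=0.04)
    assert w < 0.01
```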
Connecting the dots
- The variance decomposition explains why partial pooling should help: when $\tau_m^2$ or $\tau_f^2$ is sizable, sharing information reduces estimation error.
- The shrinkage ladder shows the micro-mechanism that creates the macro win in scores: data-poor funds move toward more reliable anchors, stabilizing out-of-sample predictions.
- The new fund predictor translates the structure into actionable odds for underwriting or pacing, via the predictive mean and sd and tail odds like $P(\bar r_{\text{new}} < 0)$.
Notes: multi-strategy managers and whether to share parameters
Some managers run multiple strategies—buyout, growth, credit, special opportunities. The question is how much information should flow across strategies.
Model extension. Add a strategy index $s$ and let funds live inside $(i, s)$:

$$r_{ijk} \sim \mathcal{N}(\theta_{ij}, \sigma_{d,s}^2), \qquad \theta_{ij} \sim \mathcal{N}(\mu_{i,s}, \tau_{f,s}^2), \qquad \boldsymbol{\mu}_i \sim \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_m).$$

- $\sigma_{d,s}$ lets deal noise differ by strategy.
- $\tau_{f,s}$ lets fund dispersion differ by strategy.
- $\boldsymbol{\mu}_i$ is an $S$-vector of manager means across strategies.
- $\Sigma_m = D\,R\,D$ with $D = \operatorname{diag}(\tau_{m,1}, \ldots, \tau_{m,S})$ and $R$ a correlation matrix (e.g., LKJ prior). Correlations in $R$ encode how much a house style carries across strategies. A short simulation sketch follows the list.
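A simulation sketch of the correlated manager means; the strategy labels, per-strategy $\tau_m$ values, and the fixed correlation matrix standing in for an LKJ draw are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Manager means across S = 3 strategies: mu_i ~ N(mu0_vec, Sigma_m),
# with Sigma_m = D R D and D = diag(tau_m per strategy).
mu0_vec = np.array([0.05, 0.03, 0.02])     # e.g. buyout, growth, credit (illustrative)
tau_m_s = np.array([0.03, 0.03, 0.02])
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])            # fixed stand-in for an LKJ draw
Sigma_m = np.diag(tau_m_s) @ R @ np.diag(tau_m_s)

mu_i = rng.multivariate_normal(mu0_vec, Sigma_m, size=500)   # managers x strategies
print(np.round(np.corrcoef(mu_i, rowvar=False), 2))          # should resemble R
```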
When to keep shared parameters. Keep cross-strategy sharing if posteriors for off-diagonal correlations in $R$ are materially positive and concentrated. A practical rule:
- If $\rho_{ss'}$ is moderately large (0.2–0.6) and the 80% interval stays away from 0, keep sharing across strategies $s$ and $s'$.
- If $\rho_{ss'}$ is near 0 with wide uncertainty, treat strategies as independent: set $R = I$ or separate the models.
What shrinks where. For a new fund in strategy $s$ under manager $i$:

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_{i,s},\; \tau_{f,s}^2 + \frac{\sigma_{d,s}^2}{m} + \operatorname{Var}(\hat\mu_{i,s})\right).$$

If $R$ has positive correlations, $\hat\mu_{i,s}$ benefits from evidence in the other strategies through the multivariate posterior. If strategies are truly distinct, the posterior naturally shuts down that pathway.
Diagnostics to add later.
- Posterior of $R$ with uncertainty bands.
- Per-strategy shrinkage weights $w_{ij}$.
- Out-of-sample scorecards stratified by strategy.
Notes: why hierarchical Bayes beats normalization and lasso here
People often normalize returns (demean by sector or vintage, z-score by volatility) or fit a lasso on flattened data. Both moves are helpful, but they do not solve the nesting or the uncertainty accounting.
Normalization limits. Demeaning and z-scoring remove average level and scale, but they do not adaptively shrink noisy group means. They also do not propagate parameter uncertainty into predictions.
Lasso limits. The lasso solves an optimization with a global penalty:

$$\hat\beta = \arg\min_{\beta}\; \tfrac{1}{2}\sum_{i}\left(y_i - x_i^\top\beta\right)^2 + \lambda \lVert\beta\rVert_1,$$

which encourages some coefficients to be exactly zero. That is great for sparse feature selection, not for borrowing strength across nested groups whose reliability varies with sample size.
What the hierarchy does instead. The posterior for a fund mean is a data-weighted average:

$$\hat\theta_{ij} = w_{ij}\,\bar r_{ij} + (1 - w_{ij})\,\hat\mu_i, \qquad w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

- The weight $w_{ij}$ depends on $n_{ij}$ and $\sigma_d$: thin and noisy funds shrink more.
- The manager posterior $\hat\mu_i$ is itself a shrinkage estimate that pools across funds, with uncertainty that flows down into $\hat\theta_{ij}$.
Ridge connection, lasso contrast. If you write a random effect $\theta_{ij} = \mu_i + u_{ij}$ with $u_{ij} \sim \mathcal{N}(0, \tau_f^2)$, the posterior mean of $u_{ij}$ equals a ridge solution with penalty $\lambda = \sigma_d^2/\tau_f^2$. Hierarchical Bayes learns $\tau_f$ from the data and adjusts shrinkage per group through $n_{ij}$ and $\sigma_d$. Lasso has a fixed $\lambda$ that does not care whether a fund had 6 or 60 deals and does not yield full predictive distributions.
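A small numeric check of that equivalence on one synthetic fund; the values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_i, tau_f, sigma_d, n = 0.04, 0.04, 0.10, 12
r = rng.normal(mu_i + 0.03, sigma_d, size=n)    # one fund's deals, true tilt +3pp

# Bayesian posterior mean of the random effect u_ij (prior N(0, tau_f^2))
prec_data, prec_prior = n / sigma_d**2, 1.0 / tau_f**2
u_bayes = prec_data * (r.mean() - mu_i) / (prec_data + prec_prior)

# Ridge solution: argmin_u sum((r_k - mu_i - u)^2) + lam * u^2, lam = sigma_d^2 / tau_f^2
lam = sigma_d**2 / tau_f**2
u_ridge = np.sum(r - mu_i) / (n + lam)

print(np.isclose(u_bayes, u_ridge))  # True: same estimator, different vocabulary
```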
Prediction matters. For a new deal, the hierarchical predictive variance includes both deal noise and parameter uncertainty:

$$\operatorname{Var}(r_{\text{new}}) = \sigma_d^2 + \operatorname{Var}(\theta_{ij} \mid \text{data}).$$

Normalization and lasso give you a point estimate plus residual variance, but they do not decompose or propagate group-level uncertainty in a principled way.
Notes: from hierarchy to systematic manager factors and a scorecard
The same scaffolding gives you cleaner style estimates and a defendable manager scorecard.
Hierarchical factor model. Let $F_t$ be a set of systematic factors (market, rates, credit, sector). For returns indexed by time $t$:

$$r_{ij,t} = \alpha_{i,s} + (\beta_i + \gamma_{ij})^\top F_t + \varepsilon_{ij,t},$$

with priors like

$$\alpha_{i,s} \sim \mathcal{N}(\alpha_{0,s}, \tau_\alpha^2), \qquad \beta_i \sim \mathcal{N}(\beta_0, \Sigma_\beta), \qquad \gamma_{ij} \sim \mathcal{N}(0, \Sigma_\gamma).$$

- $\alpha_{i,s}$ is a manager-by-strategy intercept that shrinks to the strategy base rate.
- $\beta_i$ are manager style loadings that shrink to a cross-manager mean.
- $\gamma_{ij}$ are fund idiosyncratic tilts that shrink to 0.

You can make the loadings dynamic with a random walk if you need time variation:

$$\beta_{i,t} = \beta_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim \mathcal{N}(0, \Sigma_\eta).$$
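A simulation sketch of the static version of this model for a single fund; the factor count, loadings, and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 40, 3                                    # periods, factors (illustrative)
F = rng.normal(0.0, 0.02, size=(T, K))          # factor returns

alpha_is = 0.01                                 # manager-by-strategy intercept
beta_i = np.array([0.8, -0.2, 0.4])             # manager style loadings
gamma_ij = rng.normal(0.0, 0.1, size=K)         # fund tilt (shrinks to 0 in the model)
eps = rng.normal(0.0, 0.03, size=T)             # idiosyncratic noise

r_fund = alpha_is + F @ (beta_i + gamma_ij) + eps   # one fund's return series
print(r_fund.shape)  # (40,)
```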
Systematic factors for a manager. The posterior of $\beta_i$ is a stable estimate of the manager's systematic tilts. You can:
- Use $\mathbb{E}[\beta_i \mid \text{data}]$ as the manager's style vector.
- Track $\beta_{i,t}$ over time for drift.
- Form a manager factor-mimicking portfolio by regressing fund returns on $F_t$ with the hierarchical prior to stabilize exposures.
Manager scorecard blueprint. Build tiles from posterior and predictive objects:
- Skill and uncertainty: $\mathbb{E}[\alpha_{i,s}]$, its posterior sd, and $P(\alpha_{i,s} > 0)$.
- Cross-fund dispersion: $\tau_{f,s}$ and a league table of shrinkage weights $w_{ij}$.
- Tail risk: $P(r < L)$ for a user threshold $L$ and predictive quantiles for next period.
- Calibration: average log score and Brier on rolling holdouts.
- Stability: change in $\beta_{i,t}$ over time, e.g., $\lVert \beta_{i,t} - \beta_{i,t-1} \rVert$.
- House style: correlation matrix of manager effects across strategies to quantify internal consistency.
Why this is systematic. Shrinkage keeps exposures from overfitting thin histories, and the hierarchical prior aligns managers to a common yardstick. Scores are not just numbers; they are posterior quantities with uncertainty that you can audit and track.
Manager benchmark: a simple map from hierarchy to decisions
We benchmark a manager against a reference mean $b$.
- Today: $b = \mu_0$ (the cross-manager base rate).
- Factor-ready: $b = \alpha_{0,s} + \hat\beta_i^\top \hat F$ once you estimate exposures.
1) Skill vs benchmark
Manager skill relative to a benchmark is the difference in pooled means:

$$\Delta_i = \hat\mu_i - b.$$

We report the posterior mean and a band, plus a credibility number: $P(\Delta_i > 0 \mid \text{data})$ (a small sketch of this arithmetic follows the bullets).
- Today, $b = \mu_0$ (global).
- Factor-ready note: when styles are in play, set $b = \alpha_{0,s} + \hat\beta_i^\top \hat F_{t+1}$ and replace $\hat\mu_i$ with $\hat\alpha_{i,s} + \hat\beta_i^\top \hat F_{t+1}$ for your next-period factor view $\hat F_{t+1}$.
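The sketch, assuming an approximately Normal posterior for the manager mean; the numbers are illustrative.

```python
from scipy.stats import norm

# Skill vs benchmark under an approximately Normal posterior for mu_i
mu_hat_i, sd_mu_hat_i = 0.052, 0.012     # posterior mean and sd (illustrative)
mu0 = 0.035                              # today's benchmark: the global base rate
delta = mu_hat_i - mu0
p_positive = 1.0 - norm.cdf(0.0, loc=delta, scale=sd_mu_hat_i)
print(round(delta, 3), round(p_positive, 2))   # skill estimate and P(delta > 0)
```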
2) New‑fund odds (underwriting lens)
For a new fund average $\bar r_{\text{new}}$ with $m$ deals, the manager-informed predictive is

$$\bar r_{\text{new}} \mid \text{data} \sim \mathcal{N}\!\left(\hat\mu_i,\; \tau_f^2 + \frac{\sigma_d^2}{m} + \operatorname{Var}(\hat\mu_i)\right).$$

We show the density, shade the tail $P(\bar r_{\text{new}} < 0)$, and display the predictive mean, sd, and tail probability.
Factor-ready note: replace $\hat\mu_i$ as above, and optionally add a factor-view term (e.g., $\hat F^\top \operatorname{Cov}(\hat\beta_i)\,\hat F$) to the variance if you want factor-view uncertainty.
3) Pooling anatomy (why the estimates are stable)
Funds shrink toward their manager with precision weights

$$w_{ij} = \frac{n_{ij}/\sigma_d^2}{n_{ij}/\sigma_d^2 + 1/\tau_f^2}.$$

A quick MoM estimate for within-manager dispersion helps sanity-check the prior:

$$\hat\tau_f^2 = \max\!\left(0,\; \widehat{\operatorname{Var}}_j\!\left(\bar r_{ij}\right) - \overline{\sigma_d^2 / n_{ij}}\right).$$
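A sketch of that MoM check; the fund means and sizes are illustrative.

```python
import numpy as np

def tau_f_mom(fund_means, fund_sizes, sigma_d):
    """Method-of-moments check: spread of fund means minus the part
    explained by measurement noise sigma_d^2 / n_ij."""
    means = np.asarray(fund_means, dtype=float)
    noise = np.mean(sigma_d**2 / np.asarray(fund_sizes, dtype=float))
    return float(np.sqrt(max(0.0, np.var(means, ddof=1) - noise)))

print(round(tau_f_mom([0.12, 0.01, 0.08, -0.02, 0.05], [12, 6, 20, 5, 9], 0.10), 3))
# roughly 0.043 with these illustrative inputs
```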
4) Calibration vs a baseline
We evaluate the hierarchical predictive on holdouts with two proper scores:
- Avg log predictive density (higher better): $\frac{1}{N}\sum_k \log p_k(r_k)$.
- Brier for the event $\{r < L\}$ (lower better): $\frac{1}{N}\sum_k (p_k - y_k)^2$.
The scorecard also shows Delta vs global for quick benchmarking: $\Delta\,\text{log} = \text{hier} - \text{global}$ and $\Delta\,\text{Brier} = \text{global} - \text{hier}$.
Why Brier here? Many actions hinge on the loss event. Brier directly tests the quality of $P(r < L)$, punishing miscalibration at the threshold even if the mean looks fine.
5) What the tiles mean at a glance
- Skill tile: $\hat\alpha_i = \hat\mu_i - \mu_0$ and $P(\alpha_i > 0)$.
- Dispersion tile: prior $\tau_f$ vs the per-manager estimate $\hat\tau_f$.
- Underwriting tile: $P(\bar r_{\text{new}} < 0)$ with $m$ deals.
- Calibration tiles: avg log and Brier, plus deltas vs global.
- Shrinkage table: top movers with weights $w_{ij}$.
Manager scorecard
| Fund | n | raw mean (%) | posterior mean (%) | weight w | abs. shrink (pp) |
|---|---|---|---|---|---|
| Fund 1 | 9 | 9.34 | 6.63 | 0.50 | 2.71 |
| Fund 3 | 9 | 0.51 | 2.21 | 0.50 | 1.71 |
| Fund 6 | 9 | 6.67 | 5.29 | 0.50 | 1.38 |
| Fund 5 | 9 | 2.16 | 3.04 | 0.50 | 0.88 |
| Fund 2 | 9 | 4.46 | 4.19 | 0.50 | 0.27 |
| Fund 4 | 9 | 3.86 | 3.89 | 0.50 | 0.03 |
Alpha is the manager mean minus the global base rate. tau_f compares the spread of fund means within this manager to the measurement noise implied by sigma_d and fund sample sizes. Calibration uses only holdout deals and the hierarchical predictive; the Brier event is r < L. Use the toggle to estimate tau_f per manager rather than the prior.
Factor-ready checklist
To upgrade the benchmark from global to style‑matched:
- Estimate $\beta_i$ and optionally $\gamma_{ij}$ with hierarchical shrinkage.
- Swap means in formulas: $\mu_0 \to \alpha_{0,s} + \hat\beta_i^\top \hat F$ and $\hat\mu_i \to \hat\alpha_{i,s} + \hat\beta_i^\top \hat F$.
- Optionally add the factor-view uncertainty term to the predictive variance.
- Reuse the same tiles and scores. Only the benchmark changed.