Stage Tilts to State Space: GP Skill the Bayesian Way

From $y_{f,m}\sim\mathcal{N}(\mu_m,\sigma^2)$ to stage offsets, Student-$t$ tails, $\mu_s$-anchored managers, $\gamma_{\text{vintage}}$ cycles, and a $\mu_{m,t}$ random walk, all inside one workflow.

11/16/2025

TL;DR & why another manager model?

In Borrowing Predictive Strength, In Practice we treated the manager → fund → deal hierarchy as a mostly-settled object: simulate a lifelike PE universe, drop a hierarchical Normal stack on top, and read off shrinkage, variance shares, and “skill.” That post was about what happens after you’ve chosen a model.

Synthetic data warning: every chart in this post uses generated data seeded from the Python experiments folder. The point is to walk the Bayesian workflow, not to claim these fits describe any actual asset class behavior. Treat the diagnostics as model-building scaffolding, not backtests.

This one rewinds to an earlier point in the story: how do you actually build that model in the first place, step by step, when you don’t know yet what structure you need? I lean on the standard Bayesian workflow:

  1. Write down a generative probability model.
  2. Condition on data and compute the posterior.
  3. Check model fit; expand or revise when it fails.

We’ll walk that workflow on a stripped-down version of the manager problem:

  • Managers $m = 1,\dots,M$ each run $n_m$ funds.
  • Each fund belongs to a stage bucket (Buyout, Venture, Infrastructure, or Growth in this seed dataset).
  • Each fund has a net IRR outcome $y_{f,m}$ (measured in decimal terms).

The key is not to jump straight to the “correct” model with strategy effects, vintages, fat tails, and dynamic skill. Instead, we’ll start with an embarrassingly naive Normal model, let posterior predictive checks tell us how it fails, and only then add structure.

Formally, the zeroth-order guess looks like:

$$
\begin{aligned}
y_{f,m} &\sim \mathcal{N}(\mu_m, \sigma^2) \\
\mu_m &\sim \mathcal{N}(\mu_0, \tau^2) \\
\mu_0 &\sim \mathcal{N}(0, 10^2), \quad \sigma, \tau \sim \text{HalfNormal}(5).
\end{aligned}
$$

Read that as the most boring PE story imaginable:

  • Every manager has a “true” return level $\mu_m$.
  • All funds around that manager are just noisy Normal draws with a single global volatility $\sigma$.
  • Managers themselves are exchangeable around a global mean $\mu_0$.

Obviously wrong for real data (no stage effects, no cycles, no heavy tails), but it’s a clean starting point and easy to code in PyMC:

The rest of the post will live in the tension between this toy model and actual PE reality. We’ll:

  1. Sample from the posterior of this basic stack.
  2. Use posterior predictive checks to ask, “Does this generate fund outcomes that look anything like the Buyout/Venture/Infrastructure/Growth buckets?”
  3. Let the misfit drive the next refinement: stage-level offsets, strategy-specific volatilities, heavier tails, and eventually vintages or dynamic skill.

The point isn’t that this particular model is special; it’s that the Bayesian workflow gives you a repeatable way to iterate on it every time the data say “nope.”

What the naive fit thinks the data look like

The seed database gives us net IRR observations by fund and stage. Before we ask PyMC to learn anything, it’s helpful to look at the raw stage distributions. Buyout and Venture dominate the sample, Infrastructure creeps in with narrower dispersion, and the “Growth” bucket is the runt of the litter.

How the naive model sees the data
Fund outcomes by stage: net IRR (%) densities, faceted by Buyout, Venture, Growth, and Infrastructure.

Even without a model you can see the problem: Venture and Growth funds both throw out fat right tails, while Infrastructure funds hug the mid-teens. A single Normal with a shared $\sigma$ will have to smash those shapes together.

Parameter view: shrinkage and scatter

Now drop the naive model on top of those net IRRs. Because each manager’s funds are scarce, the posterior pulls everyone toward the global mean. That’s exactly what the shrinkage story says should happen; the question is whether it preserves any real stage structure.

Manager posterior vs raw means
Manager posterior vs raw mean net IRR: raw mean (%) against posterior mean (%), one point per manager.
Largest shrinkage moves (top 8)

Fund                       Manager            Stage           Raw net IRR   Manager posterior   Δ (shrink)
XNRX Holdings Fund 16      XNRX Holdings      Venture         35.1%         19.1%               -16.0%
UIR Holdings Fund 22       UIR Holdings       Growth          32.2%         18.5%               -13.7%
RQF Investments Fund 18    RQF Investments    Venture         36.3%         24.1%               -12.2%
RQF Investments Fund 15    RQF Investments    Venture         34.2%         24.1%               -10.1%
UOQ Partners Fund 09       UOQ Partners       Venture         28.8%         20.0%               -8.8%
RQF Investments Fund 13    RQF Investments    Venture         15.6%         24.1%               +8.5%
UIR Holdings Fund 09       UIR Holdings       Infrastructure  10.3%         18.5%               +8.2%
OTUI Investments Fund 16   OTUI Investments   Infrastructure  0.4%          8.6%                +8.2%
The scatter plot compares raw fund means to the manager-level posterior for net IRR. Managers with wild funds (e.g., those with a single 35% net IRR Venture fund) get yanked back toward the population. The table shows the biggest fund-level moves—in every case the posterior is shouting “slow down, that one fund is an outlier.”

Posterior predictive checks: where it breaks

Posterior predictive draws sit nicely on top of the pooled sample because the model is literally a pooled Normal. That’s fine as a zeroth-order calibration check, but it hides the fact that Venture and Growth are supposed to be spikier than Infrastructure.

Overall posterior predictive check
Posterior predictive vs observed net IRR: pooled densities over net IRR (%).

Break the same check apart by stage and the failure is obvious. The model can’t create heavy right tails for Venture or the tighter spread for Infrastructure because it doesn’t know stages exist. Every bucket borrows the same $\mu_0$ and $\sigma$.

Stage-wise posterior predictive check
Stage-wise posterior predictive check: observed vs posterior predictive net IRR (%), faceted by Buyout, Venture, Growth, and Infrastructure.

That’s precisely the “ouch” moment we needed. The workflow is doing its job: start simple, check fit, and let misfit tell you what to add. Up next we’ll introduce stage-level offsets (so Venture can live above Buyout) and stage-specific volatilities (so Infrastructure doesn’t inherit Venture’s tails). That expansion is the on-ramp to more realistic manager → fund modeling.

Letting stages breathe: adding strategy-level structure

The stage-wise PPC was the first honest “nope” from the workflow. The naive model thought all funds were noisy draws around a manager anchor with a single global volatility. The data clearly disagreed: Venture and Growth throw fat right tails, Infrastructure hugs a tight mid-band, and Buyout sits in between. If the generative story doesn’t know stages exist, there’s no way it can reproduce that picture.

The smallest fix is to teach the model about stages without changing anything else. Concretely:

  • Each fund still lives under a manager-level mean $\mu_m$.
  • Each stage $s \in \{\text{Buyout}, \text{Venture}, \text{Growth}, \text{Infrastructure}\}$ gets its own offset $\alpha_s$, so Venture can sit above Buyout, Growth can float somewhere in between, and Infra can sink a bit.
  • Each stage also gets its own volatility $\sigma_s$, so Venture can be genuinely louder than Infra.

In symbols:

$$
\begin{aligned}
y_{f,m} &\sim \mathcal{N}\big(\mu_m + \alpha_{s(f)}, \,\sigma_{s(f)}^2\big) \\
\mu_m &\sim \mathcal{N}(\mu_0, \tau^2) \\
\alpha_s &\sim \mathcal{N}(0, \sigma_\alpha^2), \quad s \in \{\text{B}, \text{V}, \text{G}, \text{I}\} \\
\mu_0 &\sim \mathcal{N}(0, 10^2), \quad \tau \sim \text{HalfNormal}(5), \quad \sigma_s \sim \text{HalfNormal}(5).
\end{aligned}
$$

Same manager story as before, but now the mean for fund $f$ is $\mu_m + \alpha_{s(f)}$ instead of just $\mu_m$, and the noise level is allowed to depend on stage. If Venture really is a regime of wild swings and Infra really is a sleepy coupon clipper, the posterior should push $\alpha_{\text{Venture}}$ up and $\sigma_{\text{Venture}}$ out, while shrinking $\sigma_{\text{Infrastructure}}$ down.

The PyMC translation is just a small extension of the previous model:

Nothing exotic: the stage offsets alpha_stage are just four extra Normal parameters, and the sigma_stage vector lets Venture and Infra choose their own noise scale. But this tiny structural change gives the posterior predictive a lot more room to match reality.

In the next step I reuse the same PPCs as before:

  • pooled outcomes vs pooled posterior predictive, and
  • stage-wise outcomes vs stage-wise posterior predictive,

now under the stage-aware model. If the workflow is doing its job, the pooled picture will look roughly the same, but the facet-by-stage view should finally let Venture breathe and Infra tighten up. From there, the remaining misfit (tails, cycles, and skill drift) tells us what to add next.

Stage-aware manager diagnostics

Posterior predictive check #2

Stage-wise posterior predictive check

Posterior predictive check #2

The stage-aware model nails the easy win: posterior predictive draws finally track the spread between Venture/Growth and Infrastructure/Buyout. But even with strategy-level breathing room, the diagnostics still point to three gaps:

  1. Venture tails are too light. A Normal likelihood lets the posterior catch the mean and variance, but it can’t reproduce the occasional 3–4× outliers. We’ll need a heavier tail (Student-$t$) or an explicit mixture.
  2. Everything is symmetric. The PPC facets show nice centered bells, yet Venture in real allocations is right-skewed. That asymmetry needs a skewed likelihood or latent scale mixture.
  3. Still no vintages. Outcomes drift quietly across macro regimes, but this model is static; there’s no way for the posterior predictive to wobble with cycle changes. The next iteration will inject a vintage term so the workflow can flag cycle drift explicitly.

So the workflow verdict is “better, but not done.” Stage offsets and volatilities fix the glaring PPC #1 miss, which clears the runway for heavier tails and time structure next.

Letting tails speak: swapping Normal for Student-$t$

The stage-aware model buys us a lot: Buyout vs Venture vs Growth vs Infrastructure finally show up as distinct noise regimes. But the PPC facets still have a very “Gaussian finance textbook” feel to them—nice symmetric bells, decently matched variances, and a conspicuous lack of the lopsided blow-ups we actually see in allocations.

In this seed, two features stand out:

  • Venture and Growth occasionally throw 3–4× winners that sit way out in the right tail.
  • Even Infrastructure can have the rare ugly drawdown, but most of the time it’s sleepy.

A Normal likelihood has to treat those events as near-impossible flukes. That pushes the posterior into a bind: either inflate $\sigma_{\text{stage}}$ to accommodate the outliers (and over-widen the mid-mass), or pull the tails in and pretend the outliers didn’t happen. Neither looks great in PPC space.

The standard Bayesian fix is to upgrade the likelihood, not the story about managers. Instead of

$$
y_{f,m} \sim \mathcal{N}\big(\mu_m + \alpha_{s(f)}, \sigma_{s(f)}^2\big),
$$

we let each stage have a Student-$t$ error model:

$$
\begin{aligned}
y_{f,m} &\sim t_{\nu_{s(f)}}\!\big(\mu_m + \alpha_{s(f)}, \sigma_{s(f)}\big) \\
\mu_m &\sim \mathcal{N}(\mu_0, \tau^2) \\
\alpha_s &\sim \mathcal{N}(0, \sigma_\alpha^2), \quad s \in \{\text{B}, \text{V}, \text{G}, \text{I}\} \\
\nu_s &\sim \text{Exponential}(\lambda_\nu), \quad \sigma_s \sim \text{HalfNormal}(5).
\end{aligned}
$$

Now each stage gets:

  • a location shift $\alpha_s$ (capturing “Venture lives higher than Infra”),
  • a scale $\sigma_s$ (how loud the stage is), and
  • a degrees-of-freedom parameter $\nu_s$ (how fat or skinny the tails are).

Low $\nu_s$ means heavy tails; high $\nu_s$ pushes the stage back toward Normal behavior. If the data agree with our story, we’d expect something like:

  • $\nu_{\text{Venture}}$ small (fatter tails),
  • $\nu_{\text{Infrastructure}}$ larger (almost Gaussian),
  • Buyout and Growth somewhere in between.

In PyMC that’s just a small change to the likelihood and a couple of priors:

The workflow doesn’t change:

  1. Fit the $t$ version on the same seed.
  2. Draw posterior predictive funds.
  3. Re-run the pooled and stage-wise PPCs.

What changes is the way the posterior is allowed to explain “weird but plausible” funds. Venture no longer has to blow up $\sigma_{\text{Venture}}$ to house a handful of rockets; it can keep a sensible core spread and let the $t$ tails take the hit. Similarly, Infrastructure can stay tight while still admitting the occasional drawdown in the far left.

Narratively, PPC #3 looks like this:

  • Pooled: almost unchanged. The overall distribution stays anchored around the same global mean and variance.
  • By stage: Venture’s right tail finally reaches into the 3–4× band with nonzero mass, Growth stops looking artificially well-behaved, and Infra keeps its narrow middle while admitting rare uglies.
  • Outliers: single extreme funds stop dragging entire stages or managers around; the tail model soaks up that energy.
Student-t manager diagnostics

Posterior predictive check #3

Stage-wise posterior predictive (Student-t model)

The model is still missing vintage structure and any notion of manager skill evolving over time, but those are the next layers of the stack. At this point the likelihood is finally respectful of PE reality: noisy, skewed, and occasionally wild, especially in the venture corner. That’s exactly the sort of “domain-guided modification” the Bayesian workflow is built to support.

Shrinking managers toward their stage: hierarchical priors for skill

So far the workflow has mostly been about the fund layer: get the basic manager anchor, let stages breathe with their own offsets and volatilities, and then swap in a Student-$t$ to stop Venture rockets from blowing up the likelihood. All of that helps the posterior predictive align with PE reality at the fund level.

But the manager layer is still a little unsatisfying. In the current $t$-model every manager’s baseline $\mu_m$ is drawn from the same global prior and then we splice in stage effects at the fund level via $\alpha_{s(f)}$. A buyout grinder and a venture gunslinger both shrink toward the same $\mu_0$, even though we know their “home base” should probably be different.

The natural next step is to let managers shrink toward stage-specific anchors, not a single global mean. In other words:

  • Each stage $s \in \{\text{Buyout}, \text{Venture}, \text{Growth}, \text{Infrastructure}\}$ gets its own mean skill level $\mu_s$.
  • Managers inherit their prior from the stage they mostly live in.
  • The amount of shrinkage toward the stage mean is itself stage-specific: high-variance stages get looser shrinkage; boring stages get tighter.

Mathematically, that looks like:

$$
\begin{aligned}
\mu_{m} &\sim \mathcal{N}\!\big(\mu_{s(m)}, \tau_{s(m)}^2\big), \\
y_{f,m} &\sim t_{\nu_{s(f)}}\!\big(\mu_m + \alpha_{s(f)}, \sigma_{s(f)}\big),
\end{aligned}
$$

with priors like

$$
\mu_s \sim \mathcal{N}(0, 10^2), \quad \tau_s \sim \text{HalfNormal}(5).
$$

Here $s(m)$ is the “home stage” for manager $m$ (e.g. the stage_focus in your seed data), and $s(f)$ is still the fund’s stage. A Growth-heavy manager shrinks toward $\mu_{\text{Growth}}$, a buyout house shrinks toward $\mu_{\text{Buyout}}$, and so on. Stages with inherently wild manager dispersion (say, early-stage venture) will push $\tau_{\text{Venture}}$ up so shrinkage is gentler; sleepy Infra will push $\tau_{\text{Infra}}$ down so Infra managers bunch more tightly around their stage anchor.

One way to encode that in PyMC, building on the t-stage model, is:

The only new ingredient is manager_stage_idx: a length-$M$ integer array mapping each manager to its primary stage (in the seed I just take the manager’s declared strategy_focus and map it into {B, V, G, I}). Everything else is the same t-stage model with one extra hierarchy on top.

Once this is in place, the workflow loop repeats:

  1. Fit the hierarchical stage-aware t-model.
  2. Look at the posterior for:
    • $\mu_s$ (stage anchors),
    • $\tau_s$ (how spread-out managers are within each stage),
    • $\mu_m$ (manager-specific skill, now clearly grouped by stage).
  3. Re-run the PPCs, but now with an emphasis on manager-level behavior rather than just fund distributions.

Posterior predictive check #4 looks a bit different conceptually:

  • At the fund level, nothing explodes; we’re still using the same Student-$t$ likelihood and stage-wise scales, so the stage-by-stage PPCs don’t radically change.
  • At the manager level, the simulation finally captures:
    • realistic clusters of managers by stage (Infra houses a tight band of anchors, Venture spreads more),
    • manager-to-manager variability that respects the stage they operate in,
    • a more believable ordering of manager “skill” distributions: Infra grinders might be tightly packed in the mid-teens while Venture gunslingers fan out with higher upside and downside.
Stage anchors and between-manager spread

Stage-hierarchical manager diagnostics

Posterior predictive check #4

Stage-wise posterior predictive (stage-hier)

The remaining misfit now lives in places we’ve explicitly chosen not to model yet:

  • Persistent drifts over time. Skill and opportunity aren’t static; 2010-vintage venture is not 2021-vintage venture.
  • Macro/market effects. Some vintages exist in ZIRP land, others in rate shock regimes; that seeps into outcomes in ways our static model can’t express.
  • Regime sensitivity. Certain managers or stages are more cyclical than others, and the current stack treats them as if the environment were constant.

Those gaps are what the next layer of the workflow will chew on: vintage-year effects, latent time processes, or fully dynamic manager skill. For this step, the win is simpler: managers aren’t just shrunk toward a single global mean anymore; they’re shrunk toward the part of the PE universe they actually live in.

Giving the cycle a voice: adding vintage-year effects

Reminder: the seed dataset in this section is synthetic and does not encode real-world macro behavior for the named vintages. Use the plots as directional guidance for model design, not as proof that this particular parameterization fits historical PE data. The fits below look especially cartoonish because the fake vintage bumps are aggressive—read them as a prompt to add time structure, not as evidence of a real deployment.

At this point the model has three big pieces in place:

  • Stage structure so Buyout / Venture / Growth / Infra don’t share a single volatility.
  • Heavy tails so Venture and Growth can throw the occasional 3–4× winner without wrecking the fit.
  • Stage-specific manager shrinkage so managers cluster around the part of the PE universe they actually live in.

What it doesn’t have yet is time. A 2009-vintage buyout fund and a 2019-vintage buyout fund currently look identical ex ante, which is not how the world works. Zero-interest regimes, liquidity waves, tech bubbles, and rate shocks all show up as quiet shifts in the background level of outcomes.

The easiest way to let the cycle speak is to add a vintage effect $\gamma_v$ for each vintage year $v$ and plug it into the fund-level mean:

  • Each fund carries a vintage index $v(f)$ (e.g. 2008–2022 mapped into $0,\dots,V-1$).
  • Each vintage gets a latent offset $\gamma_v$ that captures “how good was this macro environment?”
  • That offset sits alongside the manager anchor and the stage shift in the location parameter.

On the fund likelihood, that means we go from

$$
y_{f,m} \sim t_{\nu_{s(f)}}\big(\mu_m + \alpha_{s(f)}, \sigma_{s(f)}\big)
$$

to

$$
\begin{aligned}
y_{f,m} &\sim t_{\nu_{s(f)}}\big(\mu_m + \alpha_{s(f)} + \gamma_{v(f)}, \,\sigma_{s(f)}\big), \\
\gamma_v &\sim \mathcal{N}(0, \sigma_\gamma^2) \quad \text{or with time structure (AR(1) / random walk)}.
\end{aligned}
$$

In words:

  • $\mu_m$ says “how good is this manager in general?”
  • $\alpha_{s(f)}$ says “how much does this stage tilt the mean up or down?”
  • $\gamma_{v(f)}$ says “how much did this particular vintage help or hurt everyone?”

For the first pass we can treat vintages as exchangeable random effects: each one gets its own Normal bump around zero with a shared variance $\sigma_\gamma^2$. If we want to acknowledge that 2011 is probably more like 2010 than 2002, we can upgrade $\gamma_v$ into an AR(1) or a random walk in a later iteration.

A fixed-effects version in PyMC, layered on top of the hierarchical stage/manager t-model, looks like:

This is the “fixed effects” version: each vintage can float up or down, but they don’t talk to each other. If we want to encode the idea that the cycle drifts smoothly over time, a random-walk variant is only a couple more lines:

  • order vintages chronologically,

  • set

    gamma_0 ~ Normal(0, 10)
    gamma_t ~ Normal(gamma_{t-1}, tau_gamma)
    

and treat gamma_vintage as a latent time series rather than independent bumps.

Either way, the workflow loop is the same:

  1. Fit the vintage-aware model.
  2. Inspect the posterior over gamma_vintage as a function of calendar year.
  3. Run posterior predictive checks stratified by vintage: does the model now reproduce booms and busts?

Conceptually, PPC #5 looks like this:

  • Boom vintages (e.g. 2010–2014 in the seed) pull $\gamma_v$ up; their posterior predictive funds sit noticeably higher, especially in Venture and Growth.
  • Bust vintages push $\gamma_v$ down; the predictive distribution shifts left and the chance of mediocre or negative funds climbs.
  • Venture cycles look more realistic: the same stage and manager can deliver very different fund outcomes depending on when they were raised.
  • Credit / Infra remain relatively stable, with smaller swings in $\gamma_v$ or weaker sensitivity in the tails.

Because the synthetic data has cartoonish boom/bust gaps, this particular fit is especially egregious: the $\gamma_v$ line swings far harder than anything you’d expect from real PE vintages. The workflow is telling us exactly what we wanted to hear (“add time structure and be careful with these priors”), but don’t take the magnitudes literally.

Posterior mean macro/vintage effects

Posterior predictive check #5 (pooled)

Posterior predictive check #5 (vintage facets)

At this point the model is starting to line up with how PE actually feels in practice: managers anchored by stage, fat-tailed stage noise, and a macro dial that nudges everyone up or down by vintage. The remaining gaps are higher up the stack:

  • Latent factor structure. Some of what we call “vintage” is really underlying factors (rates, IPO windows, sector booms) that cut differently across stages and geos.
  • Skill evolution. A manager’s $\mu_m$ probably isn’t static; teams change, playbooks evolve, and some shops genuinely improve (or decay) over time.

Those are the ingredients for a fully dynamic manager model—state-space skill, factor structure, maybe even stochastic volatility on top of the t-likelihood—that we can explore in a future iteration. For now, we’ve at least given time a voice and let the Bayesian workflow tell us which vintages were surfing a wave and which were swimming upstream.

Letting managers move: dynamic skill as a state-space model

Everything so far has treated manager skill as static:

  • each manager has a single anchor $\mu_m$,
  • possibly shrunk toward a stage-specific mean $\mu_{s(m)}$,
  • and that anchor applies to every fund they’ve ever raised.

That’s already better than eyeballing fund IRRs, but it’s still too rigid if you believe any of the following:

  • team composition changes over time,
  • playbooks get better (or stale),
  • cultures drift,
  • some managers genuinely level up, others burn bright once and fade.

Vintage effects $\gamma_v$ gave the macro environment a voice. The next natural step is to let manager skill itself evolve through time. The simplest version is a random walk:

$$
\mu_{m,t} = \mu_{m,t-1} + \eta_{m,t},
$$

where

  • $t$ indexes vintage (or coarse time buckets),
  • $\eta_{m,t} \sim \mathcal{N}(0, \tau_\eta^2)$ is a small drift term.

Interpretation:

  • if $\tau_\eta$ is very small, managers are almost static around their initial $\mu_{m,0}$,
  • if $\tau_\eta$ is larger, managers can wander over time: some drift up, some drift down.

Pair that with the existing stage and vintage pieces and the fund-level mean turns into

$$
y_{f,m} \sim t_{\nu_{s(f)}}\big(\mu_{m, v(f)} + \alpha_{s(f)} + \gamma_{v(f)}, \,\sigma_{s(f)}\big),
$$

where $\mu_{m, v(f)}$ is the skill of manager $m$ in the vintage year of fund $f$.

One way to encode this in PyMC, leaning on the discrete vintage grid, looks like:

This is very much “optional expansion” territory:

  • you need enough funds per manager across vintages to identify the drift,
  • the posterior gets higher-dimensional and more correlated,
  • and priors on $\tau_\eta$ matter a lot (too loose and manager paths wander unrealistically; too tight and you’re back to static skill).

When it works, though, the payoff is a different kind of posterior predictive story. PPC #6 doesn’t just ask “do fund distributions look right?”; it asks:

  • Do we see realistic persistence patterns? Good managers tend to stay good, but not perfectly; bad ones sometimes regress.
  • Do we see random-walk skill trajectories that match the idea of gradual improvement or decay, rather than violent jumps?
  • Can the model distinguish evergreen GPs (skill paths that stay high across vintages) from one-hit wonders (a single spike in $\mu_{m,t}$ followed by mean reversion)?
  • Does skill adapt to the cycle in a believable way? I.e., managers that leaned into a hot regime may look great in those vintages but less so in others.

You can visualize this in a few ways:

  • plot posterior draws of $\mu_{m,t}$ for a handful of managers as time-series ribbons,
  • simulate new sequences of funds for each manager and look at how often they stay in the top quartile over multiple vintages,
  • compare static-skill vs dynamic-skill posterior predictive distributions for “future” funds.

The important meta-point isn’t that every production model needs a random-walk skill process. It’s that the Bayesian workflow gives you a principled way to bolt one on, check whether it buys you anything, and roll it back if the data just don’t support that level of complexity. For some managers, the posterior over $\tau_\eta$ will effectively say “you’re flat; no evidence of meaningful drift.” For others, the state-space layer will tell a richer story about how their edge has evolved through different regimes.

Dynamic manager skill (top trajectories)

Posterior predictive check #6 (pooled)

Posterior predictive check #6 (vintage facets)


A fully iterative Bayesian workflow recap

We just walked a cartoon manager → fund model from “embarrassingly naive” to “honestly complicated.” It’s worth pausing to notice the pattern, because this is the whole point of the Bayesian workflow:

  1. Start with a simple model. A single Normal around a manager mean, no stages, no tails, no time.

  2. Let posterior predictive checks reveal misfit. Pooled PPC looks fine; stage-wise PPC screams “you forgot strategy.”

  3. Add structure to address that misfit. Stage offsets and stage-specific volatilities give Buyout / Venture / Growth / Infra room to separate.

  4. Re-check. Stage-wise PPC improves, but tails and symmetry are still wrong.

  5. Inject domain knowledge. Student-$t$ likelihoods for fat tails, stage-specific manager anchors, and a macro vintage term all come directly from how PE actually feels to work in.

  6. Re-check again. Tails relax, strategy clusters look sane, booms and busts show up in the right vintages.

  7. Optionally, push into dynamic territory. Random-walk skill lets you ask whether managers evolve in time, and PPCs move from “do the histograms overlap?” to “do persistence patterns and trajectories look believable?”.

At each step:

  • the model grows organically, driven by what the data and diagnostics are complaining about,
  • posterior predictions improve in ways you can see, not just in a log-score summary,
  • uncertainty becomes more honest, because you’re acknowledging more of the structure that actually drives returns,
  • and the generative story aligns more closely with reality, instead of staying stuck in a toy world.

That loop—model → fit → check → expand—is the essence of modern Bayesian data science. The math changes as you add layers (conjugate Normal turns into hierarchical t-state-space), but the workflow doesn’t.


Why this workflow is inevitable in private markets

The reason this all feels so natural on a manager model is that private markets practically beg for this style of modeling.

  • Sparse data → hierarchical modeling. Most managers don’t have twenty funds; they have three. Hierarchy and partial pooling are the only way to get stable estimates without overfitting one hot or one dead fund.

  • Fat tails → heavy-tailed likelihoods. Venture and late-stage growth have real 3–5× winners and ugly zeros. Forcing them through a thin-tailed Normal either erases that behavior or lets a few outliers dominate everything. A t-likelihood is a minimal concession to that reality.

  • Heterogeneity → strategy-level structure. Buyout, venture, growth, and infrastructure are not the same population. Strategy-specific offsets and volatilities are the modeling translation of “we know these are different games.”

  • Cycles → vintage-year effects. A 2011 fund and a 2021 fund are not operating in the same macro environment. Vintage effects—and eventually more structured time processes—are how you let that into the model without hard-coding macro views.

  • Skill drift → latent time series. Some managers are evergreen; some had one good cycle plus a lot of marketing. A static $\mu_m$ can’t tell those stories apart. A state-space layer at least gives the model the vocabulary to say “this GP drifted up, that one drifted down.”

  • Story-first modeling → posterior predictive checking. The point of writing down a generative story isn’t purity, it’s testability. PPCs are the reality check: “If this were how PE worked, would we see data that look like this?” When the answer is no, you don’t throw away Bayes—you update the story.

Bayesian modeling here isn’t about discovering the one true model of manager outcomes. It’s about building an evolving generative story that becomes more truthful every time you check it against reality:

  • start simple so you can see your assumptions,
  • let the data tell you where those assumptions fail,
  • add structure where it buys you explanatory or predictive power,
  • and stay willing to back off when the data don’t support your ambitions.

In a domain with thin samples, fat tails, structural heterogeneity, and macro cycles, this kind of iterative, story-driven modeling is a particularly natural fit. It doesn’t replace all the tools already in use, but it gives you one more way to make assumptions explicit, probe how fragile they are, and turn uncertainty into something you can reason about instead of something you have to hand-wave away.