Tool Calling Is Just Function Composition
In my previous post on agents, I framed intelligence as control under uncertainty: an agent maximizing expected return over a POMDP. That framing is correct, but it treats tool calling as a black box — "the action space gets bigger."
This post zooms in on the structure of tool calling itself. The punchline:
Tool calling is function composition. The "with uncertainty" part is what makes it a monad.
If you've written Haskell or Rust, this will feel familiar. If you haven't, don't worry — we'll build up from pure functions and see why the monadic structure emerges naturally when things can fail.
Pure composition: the happy path
In functional programming, composition is the bread and butter: `(g . f) x = g (f x)`.
You pipe outputs to inputs. Types line up. The world is clean.
A three-tool pipeline looks like `pipeline = tool3 . tool2 . tool1`.
If each `tool_i :: A_i -> A_{i+1}`, then `pipeline :: A_1 -> A_4`. Simple.
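In Rust terms, a minimal sketch (the tool names and types are made up for illustration):

```rust
// Three pure "tools": each output type feeds the next input type.
fn tool1(x: i64) -> i64 { x + 1 }            // A1 -> A2
fn tool2(x: i64) -> String { x.to_string() } // A2 -> A3
fn tool3(s: String) -> usize { s.len() }     // A3 -> A4

// Composition is just nesting: pipeline = tool3 . tool2 . tool1
fn pipeline(x: i64) -> usize {
    tool3(tool2(tool1(x)))
}

fn main() {
    // 99 + 1 = 100, and "100" has three characters.
    assert_eq!(pipeline(99), 3);
    println!("pipeline(99) = {}", pipeline(99));
}
```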
This is the mental model most people have of agent pipelines. "Call tool A, get result, call tool B, get result, ..."
But real tools don't work this way. They fail. They timeout. They return garbage. They cost money. The structure of composition survives, but wrapped in uncertainty.
The Unix philosophy: small tools, big pipes
Before we get to monads, let's look at an older precedent. The Unix philosophy says: write programs that do one thing well, and connect them via pipes.
```shell
cat server.log | grep "ERROR" | cut -d' ' -f3 | sort | uniq -c | sort -rn | head -10
```
This pipeline:
- `cat` — reads the file (one thing)
- `grep` — filters lines (one thing)
- `cut` — extracts fields (one thing)
- `sort` — sorts lines (one thing)
- `uniq -c` — counts duplicates (one thing)
- `sort -rn` — sorts numerically descending (one thing)
- `head` — takes first N (one thing)
Each tool is tiny and composable. The pipe | is the composition operator. The shell handles the plumbing — buffering, process management, signal handling.
Sound familiar? MCP tools are the same pattern:
- Each tool does one thing well (query database, send email, execute code)
- The agent orchestrator is the shell
- Tool calls are connected via the agent's reasoning
The difference: Unix pipes are (mostly) deterministic. Tool calls can fail, timeout, or return unexpected results. We need machinery to handle that.
When tools fail: Enter the monad
Real tool calling is `tool :: A -> Either Error B`.
The tool might succeed and return `Right b`, or fail and return `Left err`. Now composition breaks — you can't just feed an `Either Error B` into a function expecting a raw `B`.
The fix is monadic bind — `(>>=) :: Either e a -> (a -> Either e b) -> Either e b` in Haskell, `and_then` in Rust, `flatMap` in Scala, `.then()` in JS promises.
In words: "If the first computation succeeded, unwrap the value and feed it to the next computation. If it failed, propagate the error."
Haskell
```haskell
pipeline :: Input -> Either Error Output
pipeline x = tool1 x >>= tool2 >>= tool3
-- The >>= ("bind") operator handles the plumbing:
--   if tool1 fails, short-circuit
--   if tool1 succeeds, feed its result to tool2
```
Rust
Rust makes this beautiful with the ? operator. Result<T, E> is Rust's Either:
```rust
fn pipeline(input: Input) -> Result<Output, Error> {
    let a = tool1(input)?; // Early return on Err
    let b = tool2(a)?;     // Early return on Err
    let c = tool3(b)?;     // Early return on Err
    Ok(c)
}
```
The ? is syntactic sugar for "if this is Err, return early; if it's Ok, unwrap and continue." It's monadic bind with better ergonomics.
Rust also has Option<T> for "might not exist" (like Haskell's Maybe):
```rust
fn find_tool_config(name: &str) -> Option<Config> {
    let registry = load_registry()?;     // None propagates
    let entry = registry.get(name)?;     // None propagates
    let config = entry.parse_config()?;  // None propagates
    Some(config)
}
```
The ? works on both Result and Option. Same pattern, different error types.
The pattern
This is exactly what MCP tool calling does under the hood. Each tool call returns a result or an error. The agent (or framework) decides whether to continue, retry, or bail.
The monad laws (left identity, right identity, associativity) guarantee that composition is well-behaved — you can refactor pipelines without changing semantics. This is why functional programmers care about monads: they're a disciplined way to handle effects.
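Associativity is the law that makes refactoring safe: regrouping a chain of binds cannot change its result. A small Rust check with toy fallible tools (`parse` and `halve` are illustrative names, not from the post):

```rust
// Toy fallible tools.
fn parse(s: &str) -> Result<i64, String> {
    s.parse().map_err(|_| format!("not a number: {s}"))
}
fn halve(n: i64) -> Result<i64, String> {
    if n % 2 == 0 { Ok(n / 2) } else { Err(format!("{n} is odd")) }
}

fn main() {
    for input in ["42", "7", "oops"] {
        // Left-nested: (return input >>= parse) >>= halve
        let left = parse(input).and_then(halve);
        // Right-nested: return input >>= (\s -> parse s >>= halve)
        let right = Ok::<&str, String>(input).and_then(|s| parse(s).and_then(halve));
        // Associativity: the grouping doesn't matter.
        assert_eq!(left, right);
    }
    println!("associativity holds on all three inputs");
}
```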
Reliability compounds multiplicatively
Here's where it gets quantitative. If each tool succeeds with probability p_i, and failures are independent, the pipeline success probability is P(success) = ∏ p_i.
This is brutal. Five tools at 95% reliability each: 0.95⁵ ≈ 0.77.
You've lost 23% of your runs to failures somewhere in the chain.
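You can check the arithmetic directly — a sketch where the 95% reliability and the retry count are assumptions, not measurements:

```rust
// P(pipeline success) = product of per-tool reliabilities, assuming independence.
fn pipeline_reliability(ps: &[f64]) -> f64 {
    ps.iter().product()
}

// With k retries, a tool with reliability p succeeds with probability 1 - (1-p)^(k+1).
fn with_retries(p: f64, k: u32) -> f64 {
    1.0 - (1.0 - p).powi(k as i32 + 1)
}

fn main() {
    let tools = [0.95; 5];
    println!("five 95% tools end-to-end: {:.3}", pipeline_reliability(&tools)); // ~0.774

    let retried: Vec<f64> = tools.iter().map(|&p| with_retries(p, 1)).collect();
    println!("with one retry each: {:.3}", pipeline_reliability(&retried)); // ~0.988
}
```

One retry per tool recovers most of the lost runs, but every retry costs latency and money on the failure paths.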
Interactive — Pipeline composition
[Interactive widget: pipeline controls]
Each tool in the pipeline has reliability p_i = p_0 (1 − decay)^i. The theoretical pipeline reliability is the product: P(success) = ∏ p_i. With k retries, each tool succeeds with probability 1 − (1 − p_i)^(k+1).
Play with the parameters above. Notice how:
- More tools → lower success rate (multiplicative decay)
- Retries help, but cost money and time
- Reliability decay along the pipeline (later tools less reliable) shifts failure mass toward the end
Composition as funnel
[Interactive funnel chart: cumulative reach per stage]
The funnel chart shows cumulative reach: what fraction of runs make it to each stage. Each bar shows individual tool reliability. The dashed line shows theoretical cumulative reliability, P(reach T_k) = ∏_{i≤k} p_i, which assumes independence; the solid line shows observed reach from Monte Carlo simulation.
Uncertainty propagation: where do failures cluster?
The monadic view gives us structure; Monte Carlo gives us numbers. Let's look at the distribution of outcomes.
[Interactive widget: uncertainty propagation — latency percentiles, mean cost, and expected value E[V] = P(success) × 100 − P(fail) × 20 − E[cost]]
Failures cluster early when reliability decays along the pipeline. The latency distribution shows how variance compounds through composition. Cost includes retries — more retries means higher cost on failure paths.
Key observations:
- Latency is right-skewed — retries add mass to the tail
- Failures cluster early when reliability decays along the pipeline
- Cost correlates with latency — failed attempts still cost money
- Expected value captures the tradeoff: P(success) × reward − P(fail) × penalty − E[cost]
This is the same variance decomposition logic from hierarchical Bayes: total variance = sum of component variances, but here the components are pipeline stages rather than manager/fund/deal levels.
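A minimal Monte Carlo sketch of the same idea. All numbers are toy assumptions (decaying reliabilities, a flat 100ms per call), and a hand-rolled LCG stands in for a proper RNG to keep the sketch dependency-free:

```rust
// Tiny deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }
}

fn main() {
    let ps = [0.95, 0.93, 0.91, 0.89, 0.87]; // reliability decays along the pipeline
    let (runs, mut successes, mut total_latency) = (100_000, 0u32, 0.0);
    let mut rng = Lcg(42);
    for _ in 0..runs {
        let mut latency = 0.0;
        let mut ok = true;
        for &p in &ps {
            latency += 100.0; // assumed 100ms per call; failed calls still cost time
            if rng.next_f64() > p { ok = false; break; } // short-circuit on failure
        }
        total_latency += latency;
        if ok { successes += 1; }
    }
    let p_hat = successes as f64 / runs as f64;
    println!("observed success: {:.3} (theory: {:.3})",
             p_hat, ps.iter().product::<f64>());
    println!("mean latency: {:.0}ms", total_latency / runs as f64);
}
```

The observed rate should land near the theoretical product (~0.62 for these numbers), and mean latency sits below the full five-call cost because failed runs exit early.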
Shrinkage for tool reliability: a worked example
Here's where the hierarchical Bayes connection gets concrete.
Say you're running an agent that calls tools from three vendors: vendor_A, vendor_B, vendor_C. Each vendor provides multiple tools. You've observed some success/failure data:
| Tool | Vendor | Calls | Successes | Raw rate |
|---|---|---|---|---|
| A.query | A | 50 | 47 | 94% |
| A.write | A | 12 | 11 | 92% |
| A.delete | A | 3 | 3 | 100% |
| B.query | B | 200 | 186 | 93% |
| B.write | B | 80 | 71 | 89% |
| C.query | C | 8 | 6 | 75% |
The problem: Should you trust that A.delete is 100% reliable? That C.query is only 75%?
No. The sample sizes are tiny. A.delete has 3 observations — it could easily fail 10% of the time and you just got lucky. C.query might be fine; 6/8 is within normal variance of a 90% tool.
The fix: shrinkage. Instead of using raw rates, pool information hierarchically:

θ̂_tool = w · p̂_tool + (1 − w) · p̂_vendor

where the weight w = τ² / (τ² + σ²/n) depends on sample size:
- n = number of observations for this tool
- σ² = observation noise (binomial variance)
- τ² = variance across tools within a vendor

Small n → small w → shrink toward the vendor mean. Large n → large w → trust the data.
For our example, if vendor A's pooled rate is ~94% and vendor C's is ~85%:
| Tool | Raw rate | Shrunken estimate | Why |
|---|---|---|---|
| A.delete | 100% | ~95% | Shrink toward vendor A mean (n=3 is tiny) |
| C.query | 75% | ~82% | Shrink toward vendor C mean (n=8 is small) |
| B.query | 93% | ~93% | Barely shrinks (n=200 is plenty) |
This is the same math from Borrowing Predictive Strength, but applied to tool reliability instead of fund returns. The hierarchy is: per-tool rates nest within vendor means, which nest within a global mean.
The agent that uses shrunken estimates will make better decisions than one that trusts raw rates. It won't over-rely on tools with suspiciously high rates from tiny samples, and it won't abandon tools that had a few bad runs.
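The shrinkage formula can be sketched in a few lines. The τ² value is an assumption, the binomial noise is evaluated at the vendor mean (so a 3/3 raw rate doesn't pretend to have zero variance), and exact estimates depend on those choices — expect numbers near, not exactly equal to, the table above:

```rust
// Shrink a tool's raw success rate toward its vendor's pooled rate.
// w = tau2 / (tau2 + sigma2 / n): more data -> w closer to 1 -> trust the raw rate.
fn shrink(raw: f64, n: f64, vendor_mean: f64, tau2: f64) -> f64 {
    // Binomial observation noise, evaluated at the vendor mean so that a raw
    // rate of exactly 1.0 doesn't claim zero variance.
    let sigma2 = vendor_mean * (1.0 - vendor_mean);
    let w = tau2 / (tau2 + sigma2 / n);
    w * raw + (1.0 - w) * vendor_mean
}

fn main() {
    let tau2 = 0.002; // assumed between-tool variance within a vendor

    // A.delete: 3/3 = 100% raw, vendor A pooled ~94%
    println!("A.delete: {:.3}", shrink(1.0, 3.0, 0.94, tau2)); // pulled well below 1.00
    // C.query: 6/8 = 75% raw, vendor C pooled ~85%
    println!("C.query:  {:.3}", shrink(0.75, 8.0, 0.85, tau2)); // pulled up toward 0.85
    // B.query: 186/200 = 93% raw, vendor B pooled ~91%
    println!("B.query:  {:.3}", shrink(0.93, 200.0, 0.91, tau2)); // barely moves
}
```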
The OODA loop: why faster feedback wins
John Boyd was a fighter pilot and military strategist. His key insight: the side that cycles through Observe-Orient-Decide-Act faster wins, even with worse individual components.
| Phase | What happens | Agent equivalent |
|---|---|---|
| Observe | Gather data from environment | API calls, sensor reads, user input |
| Orient | Update mental model of reality | LLM processes context, updates beliefs |
| Decide | Choose action from options | Policy selects tool + arguments |
| Act | Execute the decision | MCP tool call |
The loop repeats. Each iteration updates your model and takes action. Faster loops = more iterations = better adaptation.
Why this matters for engineering organizations
Boyd's insight applies beyond dogfights. Consider two engineering teams:
Team Slow (monthly deploys):
- Observe: collect metrics monthly
- Orient: analyze in quarterly reviews
- Decide: plan features for next quarter
- Act: deploy once a month
- Cycle time: ~90 days
Team Fast (continuous deployment):
- Observe: real-time monitoring, feature flags
- Orient: daily standups, instant dashboards
- Decide: small batch decisions, A/B tests
- Act: deploy multiple times per day
- Cycle time: ~1 day
Team Fast runs 90× more OODA cycles per quarter. They:
- Detect problems faster (shorter observe latency)
- Update understanding faster (shorter orient latency)
- Course-correct faster (shorter decide-act latency)
- Learn faster (more iterations through the loop)
This is why CI/CD wins. It's why feature flags beat big-bang releases. It's why startups can outmaneuver incumbents: they're operating inside the incumbent's OODA loop.
For AI agents, the same logic applies
An agent with:
- Faster inference → more OODA cycles per task
- Better observation tools → lower observe noise
- Better world model → lower orient noise
- Better policy → lower decide noise
- More reliable tools → lower act noise
The compound effect is huge. Treat each OODA cycle as an independent chance to make a correct update. An agent running 10 cycles with 80% per-cycle accuracy outperforms one running 2 cycles with 95% accuracy:
10 cycles at 80%: 10 × 0.8 = 8 expected successful updates
2 cycles at 95%: 2 × 0.95 = 1.9 expected successful updates
More iterations beat higher per-iteration accuracy. Speed compounds.
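Under that (admittedly crude) independence model, the arithmetic is a one-liner, and the faster loop also wins on the chance of making at least one successful correction:

```rust
// Expected number of successful updates across n cycles: n * p.
fn expected_updates(cycles: u32, p: f64) -> f64 {
    cycles as f64 * p
}

// Probability that at least one cycle succeeds: 1 - (1-p)^n.
fn at_least_one(cycles: u32, p: f64) -> f64 {
    1.0 - (1.0 - p).powi(cycles as i32)
}

fn main() {
    println!("10 cycles @ 80%: {:.1} expected updates, P(>=1) = {:.4}",
             expected_updates(10, 0.8), at_least_one(10, 0.8));
    println!(" 2 cycles @ 95%: {:.1} expected updates, P(>=1) = {:.4}",
             expected_updates(2, 0.95), at_least_one(2, 0.95));
}
```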
OODA Loop: Latency + Uncertainty
[Interactive widget: per-phase latency (ms) and uncertainty (σ) controls]
Faster loops enable quicker adaptation. Lower uncertainty means more reliable state estimates.
Observe: gather sensor/API data. Orient: update world model (LLM inference). Decide: select action (policy evaluation). Act: execute tool call. Uncertainty compounds through the loop: σ²_total ≈ σ²_O + (1 + σ²_O)(σ²_R + …).
The uncertainty compounding formula, expanded across the four phases, is:

σ²_total ≈ σ²_O + (1 + σ²_O)(σ²_R + (1 + σ²_R)(σ²_D + (1 + σ²_D) σ²_A))
This is not simple addition — later stages amplify earlier uncertainty because they operate on corrupted inputs. A 10% error in observation can become a 30% error in action after passing through a noisy world model and policy.
The cure: faster loops let you correct errors before they compound too far. Each new observation partially resets the error accumulation.
Connecting the threads
Let's tie this back to the POMDP framing. There we had an agent maintaining a belief state b_t over the hidden world state and selecting actions a_t = π(b_t) to maximize expected return.
Now we can be more precise about what "select action" means when a_t is a tool call:
- Action selection is choosing which tool to call with which arguments
- Execution is the stochastic map from the chosen tool call to its outcome (success, error, timeout)
- Composition is chaining multiple tool calls via monadic bind (Rust's `?`, Haskell's `>>=`)
- Uncertainty propagates through the chain multiplicatively
The agent's job is to choose which composition to attempt, given:
- Estimated reliability of each tool (use shrinkage!)
- Latency and cost constraints
- Value of success vs. cost of failure
- How many OODA cycles it can afford
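That decision rule can be made concrete as an expected-value comparison over candidate compositions. A toy sketch — every number here (reliabilities, rewards, costs) is an assumed input, in practice coming from the shrunken estimates above:

```rust
// Expected value of attempting a pipeline:
//   E[V] = P(success) * reward - P(fail) * penalty - cost
fn expected_value(reliabilities: &[f64], reward: f64, penalty: f64, cost: f64) -> f64 {
    let p: f64 = reliabilities.iter().product();
    p * reward - (1.0 - p) * penalty - cost
}

fn main() {
    // Two candidate compositions for the same task.
    let short_cheap = expected_value(&[0.95, 0.93], 100.0, 20.0, 2.0);
    let long_thorough = expected_value(&[0.95, 0.93, 0.91, 0.89], 120.0, 20.0, 6.0);
    println!("short pipeline E[V] = {:.1}", short_cheap);
    println!("long pipeline  E[V] = {:.1}", long_thorough);
    // Pick the composition with the higher expected value: here the longer
    // pipeline's extra reward doesn't cover its multiplicative reliability loss.
    assert!(short_cheap > long_thorough);
}
```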
The Lisp connection
In Cyborg Lisps, I wrote about embedded Lisps in host languages — Clojure on the JVM, Hy on Python, Fennel on Lua. The appeal is metaprogramming: code that writes code, macros that transform syntax.
Tool calling has the same flavor. A tool schema (MCP's typed interface) is like a function signature. A tool call is like a function application. An agent orchestrator is like a macro system that generates and executes tool-calling code at runtime.
The difference is uncertainty. Macros expand deterministically (or fail to compile). Tool calls succeed probabilistically. The monadic wrapper handles what macros can't: runtime failure, retry logic, fallback strategies.
You could imagine a language where tool calls are first-class and composition is syntactically supported:
```rust
// Hypothetical Rust-like agent DSL
let result = tool1(x)?
    .retry(3)
    .timeout(Duration::from_secs(5))
    .fallback(|| default_value)
    .and_then(tool2)?
    .and_then(tool3)?;
```
Some agent frameworks are converging on this. The functional programming community got here decades ago with IO, Either, Maybe. Rust brought it to systems programming with Result and Option. Agents are rediscovering the same abstractions.
Takeaways
- Tool calling is function composition with failure modes — Unix pipes, Haskell's `>>=`, and Rust's `?` are all the same pattern
- Monads (`Result`, `Option`, `Either`, `Maybe`) are the right abstraction for handling effects and failures
- Reliability compounds multiplicatively — five 95% tools give you 77% end-to-end
- Use shrinkage to estimate tool reliability — don't trust raw rates from small samples
- OODA loops explain why faster feedback wins — more cycles beat higher per-cycle accuracy
- Engineering orgs with faster OODA loops (CI/CD, feature flags) learn faster than slow-cycle competitors
- Uncertainty compounds through the loop, but faster iterations let you correct before errors snowball
The agent revolution isn't inventing new math. It's applying old math — control theory, functional programming, Bayesian inference, Unix philosophy — to a new substrate: LLMs connected to tools via typed protocols.
The math doesn't care whether you're composing Haskell functions, Rust futures, Unix pipes, or MCP tool calls. It's the same diagram, the same laws, the same failure modes. That's what makes it beautiful.
Further reading
- Intelligence Increase as Control Under Uncertainty — the POMDP framing
- Borrowing Predictive Strength — hierarchical Bayes and shrinkage
- Cyborg Lisps — metaprogramming and embedded languages
- Boyd, John. Patterns of Conflict — the original OODA loop presentation
- Wadler, Philip. Monads for functional programming — the classic tutorial
- The Rust Book, Error Handling — Result, Option, and the ? operator