Wednesday, November 12, 2025

Intelligence Increase as Control Under Uncertainty

Why $\operatorname{Intelligence} \approx \arg\max_\pi \mathbb{E}[\sum_t \gamma^{t-1} R(s_t, a_t)]$ under tool-augmented constraints.
ai, agents, control, philosophy, smile, pomdp, mcp, plotly, systems

Timothy Leary’s “Intelligence Increase” has the vibe of a mystical unlock. You can restate it in a much more boring, much more useful way:

Intelligence is the ability of an agent to select actions that drive the world into high-value states, under uncertainty and resource constraints.

Once you phrase it like that, the whole modern AI stack — language models, MCP-style tool servers, agents interacting with infrastructure and robots — looks like a very particular answer to a very old control problem.

This post is about that control problem:

  • how to model it,
  • how modern agents and tools fit into it,
  • why “touching the physical world” is just the same structure with messier physics,
  • and where “intelligence increase” shows up in actual numbers.

Along the way, we’ll use a few simple visuals and mental models to keep the math grounded.


The basic object: an agent in a messy world

We’ll start with a standard partially observable Markov decision process (POMDP):

  • Hidden state: $s_t \in \mathcal{S}$
  • Observation: $o_t \sim O(\cdot \mid s_t)$
  • Action: $a_t \in \mathcal{A}$
  • Transition: $s_{t+1} \sim T(\cdot \mid s_t, a_t)$
  • Reward: $r_t = R(s_t, a_t)$

An agent is a policy $\pi$ that picks actions from the full history:

$$a_t \sim \pi(\cdot \mid h_t), \quad h_t = (o_1, a_1, \dots, o_{t-1}, a_{t-1}, o_t).$$

The agent’s “intelligence,” in this formal sense, is how high it can drive the expected return:

$$\mathbb{E}_\pi \left[ \sum_{t=1}^{T} \gamma^{t-1} r_t \right],$$

subject to constraints: computation, information, safety, and so on. Here $\gamma \in (0,1]$ is the familiar discount factor from control theory: a knob that trades off patience vs. urgency by down-weighting far-future rewards (and keeps the infinite-horizon sum finite when $\gamma < 1$). Thinking about “intelligence increase” is really thinking about how to make that discounted return larger under the same or tighter constraints.
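
To make the objective concrete, here is a minimal sketch of computing that discounted return from a single rollout; the gym-style environment with `reset()` and `step(action)` methods is a hypothetical interface, not something defined in this post:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^(t-1) * r_t for one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def rollout(env, policy, horizon=100):
    """Run one episode; the policy only ever sees the observation/action history."""
    history = [env.reset()]          # h_1 = (o_1)
    rewards = []
    for _ in range(horizon):
        action = policy(history)     # a_t ~ pi(. | h_t)
        obs, reward, done = env.step(action)
        history += [action, obs]     # h_{t+1} = (..., a_t, o_{t+1})
        rewards.append(reward)
        if done:
            break
    return discounted_return(rewards)
```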

When we talk about “intelligence increase” in 2025, we are really talking about pushing up this value for a whole distribution of tasks and agents:

  • better approximate world models $\hat{T}, \hat{O}, \hat{R}$,
  • better policies $\pi$ that can use those models,
  • larger and more expressive action spaces $\mathcal{A}$ via tools and actuators.

That skeleton fits everything from pure software agents to robots and labs.

(Figure: the agent–environment loop. The policy $\pi$ maps the history $h_t$ to an action $a_t$, which drives the hidden state $s_t$; the environment emits an observation $o_t$ and reward $r_t$ that feed back into the history.)

Language models as approximate world models

A large language model is, at minimum, a conditional distribution

$$p_\theta(x_t \mid x_{<t})$$

trained to minimize cross-entropy on sequences from some data distribution.

You can reinterpret this in control-theory language:

  • The history $x_{\le t}$ is compressed into some latent state $z_t = f_\theta(x_{\le t})$.
  • The model learns an approximate predictive distribution over “what happens next,” conditioned on that latent.

If the text you train on includes:

  • bug reports and code reviews,
  • tickets and resolutions,
  • lab notebooks and experimental outcomes,
  • contracts and counteroffers,

then the model is a learned world model over human procedures. It’s not just predicting grammar; it is predicting how humans tend to evolve states in task-space.

Call that learned model $\hat{P}_\theta$. When you prompt it with:

“Given the following codebase and bug report, propose a fix and patch…”

you are implicitly querying

$$\hat{P}_\theta(\text{‘good patch’} \mid \text{‘bug + context’}).$$

On its own, that is a powerful but passive object: a stochastic simulator of plausible next moves in human workflows.
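
To make “querying $\hat{P}_\theta$” concrete, here is a sketch of ranking candidate patches by the log-probability the model assigns to them; the `score_continuation` helper and the `model.logprob(...)` call are hypothetical, not any particular vendor’s API:

```python
def score_continuation(model, context: str, continuation: str) -> float:
    """Average token log-probability of `continuation` given `context`.

    Higher means "this looks more like what competent humans do next"
    under the learned distribution p_theta(x_t | x_<t).
    """
    # Hypothetical API: per-token log-probs of the continuation given the context.
    logprobs = model.logprob(context=context, continuation=continuation)
    return sum(logprobs) / max(len(logprobs), 1)


def rank_patches(model, bug_report: str, candidate_patches: list[str]) -> str:
    """Use the passive world model to pick the most plausible patch."""
    return max(candidate_patches,
               key=lambda patch: score_continuation(model, bug_report, patch))
```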

To get agents, you wrap it in a control loop.

(Figure: the history $x_{\le t}$ is compressed by an encoder $f_\theta$ into a latent state $z_t$, which is used to predict and then sample or rank the next continuation.)

From models to agents: policies, value, and tools

Imagine a simple agent built around an LLM:

  1. At time $t$, history $h_t$ is encoded as a prompt $x_t$.
  2. The model samples or searches over continuations, producing a candidate action description $\tilde{a}_t$.
  3. A parser maps $\tilde{a}_t$ into a concrete action $a_t \in \mathcal{A}$.
  4. The environment responds with a new observation and reward.

Formally, the policy is:

$$\pi_\theta(a_t \mid h_t) = \int_{\tilde{a}_t} \mathbf{1}\{\text{parse}(\tilde{a}_t) = a_t\} \; p_\theta(\tilde{a}_t \mid \text{prompt}(h_t)) \, d\tilde{a}_t.$$

So far, the action space $\mathcal{A}$ is “emit text and hope someone else makes sense of it.” The intelligence increase here is limited by how many humans you can point at the outputs.
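
Here is a minimal sketch of that loop, assuming a hypothetical `llm.sample(prompt)` text sampler and a gym-style `env.step(action)`; the trivial `parse` below just strips whitespace, standing in for a real action grammar:

```python
def render_prompt(history) -> str:
    """Flatten the observation/action history h_t into a prompt x_t."""
    return "\n".join(str(item) for item in history)


def parse(draft: str) -> str:
    """Map the model's free-form text a~_t to a concrete action a_t (identity here)."""
    return draft.strip()


def run_text_agent(llm, env, max_steps=20):
    """Text-only agent loop: encode history, sample, parse, act, observe."""
    history = [env.reset()]
    total_reward = 0.0
    for _ in range(max_steps):
        prompt = render_prompt(history)      # h_t -> x_t
        draft = llm.sample(prompt)           # a~_t ~ p_theta(. | prompt(h_t))
        action = parse(draft)                # a~_t -> a_t
        obs, reward, done = env.step(action)
        history += [action, obs]
        total_reward += reward
        if done:
            break
    return total_reward
```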

Now introduce tools.

Each tool $k$ is a conditional operator:

$$T_k : \mathcal{X}_k \rightarrow \mathcal{Y}_k,$$

which might be:

  • a database query,
  • a code-execution sandbox,
  • an internal HTTP API,
  • a job scheduler,
  • or a robot controller.

The agent can choose between:

  • a language action (produce text), and
  • a tool action specifying $(k, x_k)$ — “call tool $k$ with input $x_k$”.

The action space becomes:

$$\mathcal{A} = \mathcal{A}_{\text{text}} \;\cup\; \bigcup_k \{ (k, x_k) : x_k \in \mathcal{X}_k \}.$$

The Model Context Protocol (MCP) is a way to turn these tools into typed, discoverable, permissioned endpoints. From the agent’s perspective, MCP turns “the world of tools” into a graph of callable stochastic operators with schemas.

Mathematically, nothing mystical happens. You have:

  • expanded the set of admissible actions,
  • introduced new observations (tool outputs),
  • given the agent more ways to influence the environment.

(Figure: the agent policy $\pi$ plans, then chooses between a text action (emit text) and a tool action (pick a tool such as mcp://repo, mcp://crm, mcp://infra, mcp://warehouse, or mcp://lab); each call returns a result/observation.)
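
In code, the expanded action space might look like the sketch below; the dataclasses and the dispatch function are illustrative assumptions, with the `mcp://...` names taken from the examples in this post:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass
class TextAction:
    content: str                 # A_text: emit text for humans or the transcript


@dataclass
class ToolAction:
    tool: str                    # which tool k, e.g. "mcp://repo"
    payload: Dict[str, Any]      # input x_k in X_k, matching the tool's schema


Action = Union[TextAction, ToolAction]


def execute(action: Action, tools: Dict[str, Callable[[Dict[str, Any]], Any]]):
    """Route an action: tool calls go to their MCP server, text just gets recorded."""
    if isinstance(action, ToolAction):
        return tools[action.tool](action.payload)   # new observation y_k ~ T_k(x_k)
    return {"text": action.content}                 # no tool side effects
```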

Digital vs physical: same math, uglier kernel

From the agent’s point of view, there is no metaphysical difference between:

  • calling a repo tool to refactor a codebase, and
  • calling a robot to move a box in a warehouse.

Both are just actions with stochastic consequences:

$$s_{t+1} \sim T(s_{t+1} \mid s_t, a_t).$$

The difference is the shape of $T$.

For purely digital tools (say deterministic code transforms or database reads), the transition kernel is almost deterministic and cheap:

$$s_{t+1} \approx f(s_t, a_t) \quad \text{with low variance and short latency.}$$

For physical tools (robots, lab equipment, vehicles), you get:

$$s_{t+1} \sim T_{\text{physics}}(\cdot \mid s_t, a_t)$$

with:

  • heavy-tailed noise (slip, friction, collisions),
  • partial observability (occlusions, sensor limits),
  • latency and dead time,
  • nonstationarities (wear, temperature, changing layout).

From the policy’s eye-view:

  • The expanded action set looks the same — “I can call move_box() just like I can call run_migration().”
  • The risk and uncertainty profiles are different — one lives in a low-noise, reversible subspace; the other in a high-noise, sometimes-irreversible subspace.

Robotics, autonomous labs, self-driving fleets — all of these are the same control problem with nastier $T$, not a fundamentally different intelligence problem.

(Figure: digital actions sit in a narrow, low-variance, fast, reversible part of the transition kernel; physical actions sit in a wide, heavy-tailed, latency-bound, sometimes-irreversible part; the same policy class $\pi$ handles both.)
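
A toy illustration of “same interface, different kernel”: both step functions below have the same signature, but the noise models (made up for illustration) differ in exactly the ways listed above:

```python
import random


def digital_step(state: float, action: float) -> float:
    """Near-deterministic kernel: s' ~= f(s, a) plus a little Gaussian noise."""
    return state + action + random.gauss(0.0, 0.01)


def physical_step(state: float, action: float) -> float:
    """Heavy-tailed kernel: the same nominal dynamics, plus occasional large slips."""
    noise = random.gauss(0.0, 0.1)
    if random.random() < 0.05:                      # rare slip/collision-sized error
        noise += random.choice([-1.0, 1.0]) * random.expovariate(0.5)
    return state + action + noise
```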

Why agents + tools feel like a phase transition

The last few years delivered three coupled changes:

Better approximate Bayesian predictors

LLMs are not exact Bayes filters, but empirically they are strong estimators of

$$p(x_{t+1} \mid x_{\leq t})$$

over human-generated trajectories.

That gives agents cheap access to powerful priors over what tends to work next in human task-space. Instead of planning from scratch, they can sample from “what competent humans do” and refine from there.

Larger, more structured action spaces (via MCP)

Each new MCP server adds a set of typed tools:

  • mcp://repo → code and version control,
  • mcp://infra → deployment, scaling, configuration,
  • mcp://crm → customer records, workflows,
  • mcp://warehouse → picking and routing,
  • mcp://lab → experiment design and execution.

In the abstract, each of these is a bundle of new actions and transition dynamics stitched into the global kernel $T$.

$$\mathcal{A}_{\text{with MCP}} \supset \mathcal{A}_{\text{text only}}$$

The reachable state region $\mathcal{R}_\pi$ — all states the agent can drive the system into within a horizon — gets bigger and richer.

Cheap, compositional planning

Because the agent has a world model that is “good enough,” it can do cheap approximate planning:

  • sample candidate action sequences (plans),
  • evaluate them with a value head or a secondary model,
  • pick the best one under a heuristic return estimate.

This is Monte Carlo tree search with a learned policy/value prior, but the tree is spanned in natural language and tool calls instead of board moves.
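
A sketch of that sample-and-rank planner; `llm.sample_plan` and `value_model.score` are hypothetical stand-ins for the policy prior and the value head:

```python
def plan(llm, value_model, context: str, n_candidates: int = 8) -> str:
    """Cheap approximate planning: sample candidate plans from the prior,
    score each with a heuristic return estimate, keep the best one."""
    candidates = [llm.sample_plan(context) for _ in range(n_candidates)]
    scored = [(value_model.score(context, candidate), candidate)
              for candidate in candidates]
    return max(scored, key=lambda pair: pair[0])[1]
```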

Taken together:

  • the policy class $\pi_\theta$ we can realize got larger,
  • the learned dynamics model $\hat{P}_\theta$ we can query got richer,
  • the action space $\mathcal{A}$ got deeper and more compositional.

“Intelligence increase” is not one knob; it’s a correlated push along all three axes.

(Figure: the reachable region of a text-only agent is a subset of the larger region reachable once you add tools and a better policy.)

Callout: BCIs as just another channel

BCIs: extra I/O in the same POMDP

Brain–computer interfaces fit neatly into the same picture:

  • A recording BCI adds a new observation channel $o_t^{\text{BCI}}$ encoding some function of neural state.
  • A stimulation BCI adds an action component $a_t^{\text{BCI}}$ that influences the user’s brain state.

From the environment’s perspective, it is still:

$$s_{t+1} \sim T(s_{t+1} \mid s_t, a_t), \quad a_t = (a_t^{\text{tools}}, a_t^{\text{BCI}}).$$

Today’s BCIs mostly restore missing channels (e.g. letting paralyzed people emit meaningful actions again). That is a genuine intelligence increase at the system level: the human-agent pair has a larger, more reliable action and observation space than the injured human alone.

Long term, you can treat BCIs as yet another class of MCP servers — wired into the nervous system instead of a database, warehouse, or robot arm.


Where “intelligence increase” shows up in numbers

All of this is nice concept art, but where do you actually see intelligence increase?

You can track it in at least four quantitative ways.

Value uplift on a task distribution

Fix some distribution over tasks (bugs, tickets, analyses, experiments). Let:

  • $V_{\text{human}}$ = expected return with humans alone.
  • $V_{\text{aug}}$ = expected return with agents + tools + human oversight.

The first and most direct quantity is:

$$\Delta V = V_{\text{aug}} - V_{\text{human}}.$$

For some domains, this looks like “higher resolution rate”; for others, “more revenue,” “fewer safety incidents,” or “more high-quality hypotheses per week.”

(Figure: human-only value $V_{\text{human}}$ versus augmented value $V_{\text{aug}}$; track the uplift $\Delta V = V_{\text{aug}} - V_{\text{human}}$ per task distribution.)
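
Estimating that uplift from logged episodes is just a difference of sample means; a minimal sketch, with hypothetical return lists:

```python
from statistics import mean


def value_uplift(human_returns, augmented_returns) -> float:
    """Delta V = E[return | agents + tools + oversight] - E[return | humans alone]."""
    return mean(augmented_returns) - mean(human_returns)


# e.g. value_uplift([0.62, 0.58, 0.71], [0.74, 0.69, 0.81]) -> ~0.11
```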

Cost per correct action (learning curve)

Let:

  • $C_{\text{human}}$ be the cost per episode with humans alone.
  • $C_{\text{aug}}$ be the total cost (compute, infra, supervision) per episode with agents.

Then compare:

$$\frac{C_{\text{human}}}{V_{\text{human}}} \quad \text{vs} \quad \frac{C_{\text{aug}}}{V_{\text{aug}}}.$$

As you accumulate more episodes, you often see a learning-curve-like decline in cost per correct action for the augmented system, roughly:

$$\text{cost} \propto (\text{cumulative successful actions})^{-b}.$$

When the augmented system wins that race, you have not only more capability but structurally cheaper capability.

(Figure: early episodes have a high cost per correct action; with more usage the cost declines and the learning curve flattens; compare against the human baseline to see whether the augmented system wins or loses.)
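
One way to check for that decline is to fit the exponent $b$ on a log-log scale; a sketch using NumPy least squares, with a hypothetical cost series:

```python
import numpy as np


def fit_learning_curve(cumulative_successes, cost_per_correct_action) -> float:
    """Fit cost ~= c * N^(-b) by linear regression in log-log space; return b."""
    x = np.log(np.asarray(cumulative_successes, dtype=float))
    y = np.log(np.asarray(cost_per_correct_action, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)     # y ~= slope * x + intercept
    return -slope                               # b > 0 means cost falls with usage


# e.g. fit_learning_curve([10, 100, 1000], [5.0, 2.5, 1.25]) -> ~0.30
```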

Share of agent-completed work

For a specific workflow, track:

  • the fraction of tasks completed end-to-end by agents (with human review), and
  • the rollback rate (cases where humans have to discard the agent’s work and redo it).

Plot the share of agent-completed work over time. You usually get an S-curve:

  • Assistant phase: agents help but rarely run workflows end-to-end.
  • Co-worker phase: agents own 20–50% of tasks, with humans mostly supervising.
  • Infrastructure phase: agent involvement is ubiquitous and boring, like databases.

(Figure: the S-curve from the assistant phase, to the co-worker phase with 20–50% of tasks owned by agents, to the infrastructure phase where agents are ubiquitous.)
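
Both numbers fall out of a simple task log; a sketch with hypothetical record fields:

```python
def workflow_stats(tasks):
    """tasks: list of dicts like {"completed_by_agent": bool, "rolled_back": bool}.

    Returns (share of tasks completed end-to-end by agents, rollback rate among those).
    """
    agent_done = [t for t in tasks if t["completed_by_agent"]]
    share = len(agent_done) / len(tasks) if tasks else 0.0
    rollback_rate = (sum(t["rolled_back"] for t in agent_done) / len(agent_done)
                     if agent_done else 0.0)
    return share, rollback_rate     # plot both over time to see the S-curve
```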

Predictive uncertainty

For a task-relevant function $f(s_{t+k})$ (e.g. “is the customer satisfied?”, “is the system in a safe state?”), look at:

$$\text{Var}\big[ f(s_{t+k}) \mid h_t \big]$$

under your best model. If better agents + tools steadily reduce that variance (and your calibration checks say it’s honest), then you have tighter control over futures that matter.
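
A sketch of estimating that variance by Monte Carlo rollouts under the model; the `model.rollout(history, steps=k)` call and the outcome function `f` are hypothetical:

```python
from statistics import pvariance


def predictive_variance(model, history, f, k: int, n_samples: int = 256) -> float:
    """Monte Carlo estimate of Var[f(s_{t+k}) | h_t] under the learned model."""
    outcomes = [f(model.rollout(history, steps=k))   # sample s_{t+k} ~ model
                for _ in range(n_samples)]
    return pvariance(outcomes)
```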

Better models, richer tools, and smarter policies all contribute to shrinking those error bars.


So what does “intelligence increase” amount to?

Strip away the acronyms and diagrams, and the picture is:

  • We’ve trained big sequence models that are surprisingly good at predicting what competent humans do next in a huge variety of contexts.
  • We’ve wired those models into MCP servers that expose databases, code, infrastructure, and increasingly, physical actuators.
  • We’ve wrapped the whole thing in agent loops that can plan, call tools, observe results, and adjust.

From the agent’s point of view, hitting a Kubernetes cluster and hitting a robot arm are just two different tools with different noise models. The math is the same; physics just makes the transition kernel uglier, slower, and more expensive to explore.

What people experience as “AI getting smarter” is:

  • policies $\pi$ that can exploit better world models,
  • over richer action spaces,
  • under more realistic constraints,
  • on more and more of the state space we care about.

Leary talked about Intelligence Increase like a moment of awakening. What we seem to be building is a layered control stack:

  1. Approximate world models over human behavior and environments,
  2. Agents that use those models to plan and choose actions,
  3. MCP tools that let those actions reshape both digital and physical reality.

The mathematics does not tell us what to value. It just gives us a sharper, cheaper way to steer whatever we decided to value in the first place. That is the real weight behind intelligence increase: not that we become godlike, but that we become dangerously competent at optimizing the particular reward functions we enshrine in code.

(Figure: the layered stack: humans provide goals and oversight to agent policies $\pi$; those policies act through MCP tools (APIs, infra, robots) on an environment of digital and physical state, which returns observations and rewards.)