Explore vs Exploit in the Age of AI
Balancing exploration vs exploitation by tuning the exploration share $\rho$ so long-run output stays above the ship-it-only baseline.
9/24/2025
When a team works on a greenfield project, every hour can go to one of two buckets:
- Exploit: double down on what we already know, refine execution, push features out the door.
- Explore: read broadly, try new tools, sketch architectures that might pay off later.
This is the classic explore–exploit tradeoff: spend too much time exploiting and you risk missing the higher hill next door; spend too much time exploring and nothing ships.
Building a simple model
Let’s build a toy model from first principles so we can reason about the policy knob $\rho$ (the share of time spent on exploration).
1) States and choices
At time $t$ a developer has:
- $S_t$: execution skill (how fast and clean they ship).
- $B_t$: breadth (how many patterns, tools, and mental models they can reach for).
They split the next unit of time:
- $\rho_t$ on exploration (reading, prototyping).
- $1-\rho_t$ on exploitation (building).
2) How skill and breadth evolve
Practice helps with diminishing returns; knowledge decays without reinforcement:
$$S_{t+1} = S_t + a\,(1-\rho_t)\,g(S_t), \qquad B_{t+1} = (1-\delta)\,B_t + b\,\rho_t\,h(B_t),$$
with $g$ and $h$ as simple monotone choices (for example, $g(S) = 1/(1+S)$ so practice has diminishing returns, and $h \equiv 1$). Parameters $a, b, \delta > 0$.
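A minimal sketch of one update step in Python, assuming the concrete choices above ($g(S) = 1/(1+S)$, $h \equiv 1$) and placeholder values for $a$, $b$, $\delta$:

```python
def step(S: float, B: float, rho: float,
         a: float = 0.5, b: float = 0.6, delta: float = 0.1) -> tuple[float, float]:
    """One period of the toy dynamics for skill S and breadth B."""
    S_next = S + a * (1 - rho) / (1 + S)   # practice builds skill, with diminishing returns
    B_next = (1 - delta) * B + b * rho     # breadth decays unless refreshed by exploration
    return S_next, B_next
```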
3) Where output comes from
Two channels:
- Core execution (planned work), with complementarity between skill and breadth:
$$Y^{\text{core}}_t = c\,(1-\rho_t)\,S_t^{\alpha} B_t^{\beta}.$$
- Opportunistic wins (serendipity), where idea arrivals rise with breadth and payoff grows with both:
$$Y^{\text{opp}}_t = \lambda(B_t)\,v(S_t, B_t),$$
with $\lambda(B) = \lambda_0 B$ and $v(S, B) = \kappa\,S^{\alpha} B^{\beta}$.
Total output per step:
$$Y_t = Y^{\text{core}}_t + Y^{\text{opp}}_t.$$
We track discounted payoff $\sum_{t \ge 0} \gamma^t Y_t$ for $\gamma \in (0,1)$.
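Putting the pieces together, here is a sketch that simulates the whole model and reports the discounted payoff. Every number below (the `Params` defaults, the horizon, the policies) is an illustrative assumption, not something pinned down in the text.

```python
from dataclasses import dataclass

@dataclass
class Params:
    a: float = 0.5       # skill gained per unit of building time
    b: float = 0.6       # breadth gained per unit of reading time
    delta: float = 0.1   # breadth decay when not reinforced
    c: float = 1.0       # scale of core execution output
    alpha: float = 0.6   # returns to skill
    beta: float = 0.3    # returns to breadth
    lam0: float = 0.2    # idea arrivals per unit of breadth
    kappa: float = 0.5   # payoff scale of an opportunistic win
    gamma: float = 0.95  # discount factor

def simulate(rho_schedule, T: int = 60, S0: float = 1.0, B0: float = 1.0,
             p: Params = Params()):
    """Run the toy model for T steps; rho_schedule(t) is the exploration share."""
    S, B = S0, B0
    outputs, total, discount = [], 0.0, 1.0
    for t in range(T):
        rho = rho_schedule(t)
        # Two output channels: planned core work plus expected serendipitous wins.
        core = p.c * (1 - rho) * S**p.alpha * B**p.beta
        opportunistic = p.lam0 * B * p.kappa * S**p.alpha * B**p.beta
        Y = core + opportunistic
        outputs.append(Y)
        total += discount * Y
        discount *= p.gamma
        # State updates: practice has diminishing returns; breadth decays
        # unless refreshed by exploration time.
        S = S + p.a * (1 - rho) / (1 + S)
        B = (1 - p.delta) * B + p.b * rho
    return outputs, total

# Constant 20% exploration vs. a no-reading baseline.
_, payoff_explore = simulate(lambda t: 0.2)
_, payoff_ship_only = simulate(lambda t: 0.0)
print(f"discounted payoff, rho=0.2: {payoff_explore:.1f}")
print(f"discounted payoff, rho=0.0: {payoff_ship_only:.1f}")
```

With $\rho = 0$, breadth is multiplied by $(1-\delta)$ every step with no inflow, so it decays geometrically toward zero; that is what eventually drags the no-reading baseline down in this sketch.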
How AI changes the picture
AI does not change the structure; it changes the slopes.
- Execution gets cheaper. Autocomplete, tests, and scaffolding make routine work less differentiating. The execution-side parameters ($c$, $\alpha$) go down.
- Exploration pays faster. Prototypes get cheaper, so breadth turns into wins more directly. The exploration-side parameters ($\lambda_0$, $\beta$) go up.
In symbols, the same form with primed parameters: $c' < c$, $\alpha' \le \alpha$, $\lambda_0' > \lambda_0$, $\beta' \ge \beta$.
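Continuing the sketch above (reusing the `Params` and `simulate` definitions), one way to express "same form, primed parameters" is two parameter sets. Which parameters move, and by how much, is an assumption chosen only to illustrate the direction of the shift.

```python
# Hypothetical shift: execution less differentiating (c, alpha down),
# breadth converts to wins faster (lam0, beta up). Magnitudes are made up.
pre_ai  = Params(c=1.0, alpha=0.6, lam0=0.2, beta=0.3)
with_ai = Params(c=0.7, alpha=0.5, lam0=0.4, beta=0.4)

for label, p in [("pre-AI", pre_ai), ("with-AI", with_ai)]:
    _, payoff_explore = simulate(lambda t: 0.2, p=p)
    _, payoff_ship_only = simulate(lambda t: 0.0, p=p)
    print(f"{label:8s} exploration premium: {payoff_explore - payoff_ship_only:.1f}")
```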
Try it yourself
The interactive simulator lets you adjust the controls and watch three panels:
- Output over time: compare your chosen policy against a no-reading baseline. Toggle "Compare worlds" to overlay Pre-AI and With-AI.
- State trajectories: skill $S_t$ on the left axis, breadth $B_t$ on the right. Notice how a small floor on $\rho$ prevents $B_t$ from eroding.
- Discounted payoff: cumulative $\sum_t \gamma^t Y_t$. This makes the long-horizon effect visible: with AI, exploration's compounding shows up sooner.
Policy in plain English
- Front-load exploration early in a greenfield project (20 to 30 percent of time), then taper.
- Maintain a floor on exploration (for example, 10 percent) so breadth does not decay under deadline pressure; one such schedule is sketched in code after this list.
- Demand artifacts from exploration (memos, experiments, small frameworks).
- Measure step-change wins: deletions that simplify code, cost cliffs, 2x speedups.
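The first two bullets translate directly into a schedule. A minimal sketch, where the start level, floor, and taper half-life are knobs (the 25 and 10 percent values just echo the ranges above):

```python
def rho_schedule(t: int, start: float = 0.25, floor: float = 0.10,
                 half_life: float = 12.0) -> float:
    """Front-load exploration, then taper toward a floor so breadth never fully erodes."""
    return floor + (start - floor) * 0.5 ** (t / half_life)

# Exploration share at a few points in time:
print([round(rho_schedule(t), 2) for t in (0, 6, 12, 24, 48)])
```

This drops straight into the earlier `simulate` sketch as the `rho_schedule` argument.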
Why breadth multiplies execution
If breadth raises the number of viable options from $n$ to $n' > n$, the expected best choice improves roughly like the expected maximum of $n$ draws:
$$\mathbb{E}\big[\max(X_1, \dots, X_n)\big] = \frac{n}{n+1} \quad \text{for i.i.d. } X_i \sim \mathrm{Uniform}(0,1),$$
rising with $n$ but with diminishing returns.
That “better choice” bump is what the $B^{\beta}$ term is standing in for.
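A quick numeric check of that claim, using uniform draws as the stand-in for option quality (any light-tailed distribution shows the same shape):

```python
import random

def expected_max(n: int, trials: int = 100_000) -> float:
    """Monte Carlo estimate of E[max of n uniform(0,1) draws]."""
    return sum(max(random.random() for _ in range(n)) for _ in range(trials)) / trials

# Doubling the number of options keeps helping, but each doubling helps less
# (the closed form is n / (n + 1)).
for n in (1, 2, 4, 8, 16):
    print(n, round(expected_max(n), 3), round(n / (n + 1), 3))
```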
Closing thought
Pre‑AI, explore vs exploit was a knife edge. With AI, execution is cheaper and exploration is more valuable. The trick is not to abandon building, but to raise the floor on reading and thinking so today’s breadth becomes tomorrow’s speed.