Anthropic PeakXV meetup
These are notes from the Anthropic AI engineers meetup I attended at the Bangalore PeakXV office (9th Oct 2025). This is a very rough note based on whatever was being spoken about. If I get time I will refine and elaborate on these notes and add my impressions. Hope this is helpful to whoever reads it! NOTE: This does not reflect my opinion of LLMs or Anthropic; all opinions are explicitly tagged and mentioned.
Speaker: Daniel Delaney
Multi agent systems
- Anthropic agent SDK
- single LLM features -> workflows -> agents
- agents vs workflows: there's a difference
- agents are LLMs using “tools” dynamically making decisions in their own environment
- open ended problems where you need to iterate
- suited to cases where you can't predict the required number of steps
- multi agents: open ended, parallelisable tasks, lot of info, lot of tools involved
- multi agents: leverage collective intelligence
- Opus as “lead agent” sonnet as “sub agents”
- run tasks for longer -> what does that even mean :skull: more tokens used for them xD
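The agent-vs-workflow distinction above can be sketched as a loop where the model dynamically decides which tool to call (or when to stop), rather than following a fixed pipeline. `call_llm` and the `search` tool below are hypothetical stubs I wrote so the loop runs end to end; this is not the Anthropic agent SDK:

```python
def call_llm(prompt, tools):
    # Stub standing in for a real model call. Returns either a tool
    # invocation or a final answer, mimicking an agent's decision step.
    if "search results" not in prompt:
        return {"tool": "search", "args": {"query": "example"}}
    return {"answer": "done"}

# Toy tool registry; a real agent would have many tools with rich descriptions.
TOOLS = {"search": lambda query: f"search results for {query!r}"}

def run_agent(task, max_steps=10):
    """Agent loop: the model picks tools dynamically until it decides it is done."""
    prompt = task
    for _ in range(max_steps):
        decision = call_llm(prompt, tools=list(TOOLS))
        if "answer" in decision:      # the model decided it is finished
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        prompt += f"\n{result}"       # feed the tool output back into context
    return "step budget exhausted"
```

A fixed workflow would hard-code the search-then-answer sequence; the agent version lets the model choose the steps, which is why the step count is open ended.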
Case Study: Multi agent research system
- agentic search is an open ended problem
- multiple pathways to go down
- clear separation of concerns
- parallelise for speed
- need to manage context intelligently
- independent exploration
- lead agent spawns as many subagents as it thinks are required
- memory system
- citations agent, for e.g.
- orchestrator workflow
- similar to a single agent system?
- lead does most of the thinking
- spins up as many subagents as it needs
- subagents iterate as much as required
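A minimal sketch of the orchestrator pattern described above: a lead agent (Opus in the talk) decomposes the query, spawns subagents (Sonnet) that explore independently in parallel, and synthesises their findings. All function names and the planning logic are invented placeholders, with model calls stubbed:

```python
from concurrent.futures import ThreadPoolExecutor

def lead_plan(query):
    # Stub for the lead agent: decompose the query into subtasks.
    # A real system would let the model decide how many subagents it needs.
    return [f"{query}: angle {i}" for i in range(3)]

def subagent(subtask):
    # Stub for a subagent: iterate on its subtask independently.
    return f"findings for {subtask}"

def research(query):
    subtasks = lead_plan(query)
    with ThreadPoolExecutor() as pool:   # parallelise for speed
        findings = list(pool.map(subagent, subtasks))
    return "\n".join(findings)           # lead agent would synthesise + cite these
```

The separation of concerns is the point: the lead agent only plans and merges, while each subagent manages its own context window.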
Downsides
- coordination complexity
- simple queries resulted in 50 subagents apparently in early systems
- solution: leverage prompt engineering
Solutions:
- Good prompt engineering, think like your agents
- Build sims to understand prompt effects
- early agents were way too verbose
- selected incorrect tools
- effective prompting requires an accurate mental model of the agent
- scale effort to complexity, don’t let agents overuse resources
- the lead agent needs a good estimate of effort so it understands where resources go and how
Scale effort to query complexity
- Be explicit with what difficulty looks like
Teach the orchestrator how to delegate
- explicitly direct subagents about how to operate
- best prompts are frameworks for collaboration: define division of labour, budgets, and problem solving approaches
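As an illustration of "frameworks for collaboration", a lead-agent prompt might spell out difficulty tiers and per-subagent budgets explicitly. The wording and the toy classifier below are my own invention, not Anthropic's actual prompts:

```python
# Hypothetical lead-agent instructions: explicit about what each difficulty
# tier looks like and what budget the orchestrator should delegate.
LEAD_AGENT_PROMPT = """\
Classify the query before delegating:
- simple (one fact): answer directly, spawn 0 subagents.
- comparison (2-4 entities): spawn 1 subagent per entity, <=5 tool calls each.
- open-ended research: spawn up to 5 subagents with distinct angles,
  <=15 tool calls each.
Give each subagent: its objective, preferred sources, and its budget.
"""

def pick_tier(query):
    # Toy classifier standing in for the lead agent's own judgement.
    return "simple" if len(query.split()) < 6 else "open-ended research"
```

The value is less in the exact thresholds than in making the scaling rules explicit, so simple queries stop fanning out into dozens of subagents.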
Focus on good heuristics rather than rigid rules
- decompose difficult decisions into smaller tasks
- evaluate source quality (device db -> qdrant)
- Operating heuristics/ways to learn trump rigid rules
- Get the agent to think; trust the general intelligence of the model and use good heuristics to instruct it
- This has only been possible in the last 6 months, but that trust is hard to keep up with.
- tool descriptions matter so that subagents can understand what to use
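A sketch of what a load-bearing tool description might look like, in the common JSON-schema tool convention. The tool name, fields, and the `internal_docs_search` cross-reference are illustrative assumptions, not a specific SDK's definitions:

```python
# The description does the heavy lifting: it tells subagents when to use
# this tool, when NOT to, and what shape of output to expect.
WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": (
        "Search the public web. Use for recent or external facts. "
        "Prefer internal_docs_search for company-internal questions. "
        "Returns the top 5 results as title/url/snippet."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Plain-language search query",
            },
        },
        "required": ["query"],
    },
}
```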
Build agents to improve other agents
- agents are excellent prompt engineers
- trained an agent to make tools more effective
- said agent rewrote the descriptions for the tools so subagents could understand them better
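A hedged sketch of the "agent improves tools" idea: a meta-agent is shown failure logs and asked to rewrite a tool's description. The `call_llm` stub returns a canned answer so the sketch runs; a real system would send the prompt to a model:

```python
def call_llm(prompt):
    # Stub meta-agent response; a real call would return a rewrite
    # grounded in the actual failure logs.
    return "Search the web. Always include the year for time-sensitive queries."

def improve_description(tool, failure_logs):
    # Ask the meta-agent to rewrite the description so future subagents
    # avoid the observed mistakes.
    prompt = (
        f"Tool: {tool['name']}\nDescription: {tool['description']}\n"
        f"Failed calls:\n{failure_logs}\n"
        "Rewrite the description so subagents avoid these mistakes."
    )
    return {**tool, "description": call_llm(prompt)}
```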
Evals
- Test multi agent systems early; apparently, very often small tweaks to the lead agent resulted in huge cascading changes
- if they didn’t do this it would have been really hard to diagnose issues later on
- don’t delay testing until you can build more thorough evals
LLM as a judge
- LLMs are natural fits for grading outputs when given eval heuristics
- LLM as a judge scales to hundreds of outputs and a range of use cases
- (My personal opinion on this is that their testing methods were incredibly shaky at best)
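LLM-as-judge can be sketched as one judge call per output, graded against an explicit heuristics rubric. Here the judge is stubbed with a citation-marker check rather than a real model call, so the scaling shape is visible without an API:

```python
def judge(output, heuristics):
    # Stub: a real judge would prompt a model with the heuristics rubric
    # and parse a numeric score from its reply.
    return 1.0 if "[source]" in output else 0.0

def grade_all(outputs, heuristics):
    # Scales to hundreds of outputs: one independent judge call each,
    # so grading can be batched or parallelised.
    return [judge(o, heuristics) for o in outputs]
```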
Don’t discount human evals
- People testing agents find things that evals miss
Costs
- Very expensive
- 15x more tokens
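Back-of-envelope arithmetic for the 15x figure, with a placeholder token count and price (not actual Anthropic pricing):

```python
# Hypothetical numbers: only the 15x multiplier comes from the talk.
single_agent_tokens = 20_000
multi_agent_tokens = single_agent_tokens * 15    # per the talk's estimate
price_per_million = 3.0                          # placeholder $/1M tokens
cost = multi_agent_tokens / 1_000_000 * price_per_million
```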