Why We Chose Go for Our AI Agent Platform (When Everyone Else Picked Python)

Every AI agent framework you’ve heard of is written in Python — LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel (partially). When we started building an enterprise AI agent platform at StackGen, the obvious choice was Python.

We chose Go instead. Six months and 76,000 lines of production code later, here’s why — and the trade-offs we had to accept.

Update (July 3, 2026): Trade-off 5 (Dynamic LLM JSON parsing) is softer than when this post was first published. We now route shared LLM JSON decode paths through github.com/kaptinlin/jsonrepair, a Go port of Jos de Jong’s jsonrepair, tuned for malformed model output (fences, single quotes, missing commas, Python literals). See the revised section below for how we use it in production.

The Problem We Were Solving

We weren’t building a chatbot. We were building an agent runtime — software that lets AI models invoke shell commands, call APIs, manage infrastructure, and delegate work to sub-agents. In production. On your servers. With your credentials.

The requirements:

Run 50+ agents concurrently per deployment, each with its own tools, memory, and policies
Embed as a library inside a larger platform (Aiden), not run as a separate service
Deploy as a single binary to air-gapped environments, edge nodes, and developer laptops
Sub-second tool routing — agents call 10-20 tools per task; overhead per call matters
Handle untrusted input — agents process user prompts that might contain injection attacks

Let’s walk through how Go handles each of these.

1. Concurrency: Goroutines Are the Perfect Model for Agents

An AI agent in production does many things at once:

Calls an LLM API and waits for streaming tokens
Executes a shell command on a remote server
Queries a vector database for relevant context
Listens for human approval on a pending tool call
Runs a cron job that triggers another agent

In Python, you’d use asyncio — which means every function in the call chain must be async, every library must support it, and debugging involves staring at coroutine tracebacks. Or you use threading, and then you’re fighting the GIL.

“But Python has No-GIL now!” Fair: Python 3.13 introduced an experimental free-threaded build (PEP 703), and asyncio.TaskGroup (3.11+) acts like Go’s errgroup. But even with No-GIL, Python still suffers from “colored function” syndrome: an async function that calls a blocking sync function stalls the event loop unless it hands the call to an executor, and a sync function can’t call an async one without an event loop. Your entire dependency tree must agree on a color. In Go, concurrency is colorless: any function can be run as a goroutine, period.

In Go, we just… use goroutines:

g, gctx := errgroup.WithContext(ctx)

// Fetch context from vector store
g.Go(func() error {
    docs, err := vectorStore.Search(gctx, query)
    // ...
    return err
})

// Check for pending HITL approvals
g.Go(func() error {
    approvals, err := hitlStore.GetPending(gctx, agentID)
    // ...
    return err
})

// Both run concurrently; Wait returns the first non-nil error
if err := g.Wait(); err != nil {
    return fmt.Errorf("context setup failed: %w", err)
}

No async/await coloring. No event loop. No GIL. Just functions that run concurrently, with structured error handling via errgroup.

Real example from our codebase: When an agent delegates to 3 sub-agents in parallel (BuildParallel control flow), each sub-agent runs as a goroutine with its own LLM session, tool set, and memory context. The parent waits for all three via errgroup.Wait(). If one fails, the context is cancelled and the others wind down cleanly.
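
To make the cancel-on-first-failure behaviour concrete, here is a minimal sketch that uses only the standard library. It is not our BuildParallel code and it is not errgroup itself: runParallel, errSubAgent, and demo are hypothetical names, and the sub-agents are stand-in functions. The shape is the same, though: one goroutine per sub-agent, a shared context, and the first error cancelling the siblings.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errSubAgent is a hypothetical failure used only for this illustration.
var errSubAgent = errors.New("sub-agent failed")

// runParallel runs every task in its own goroutine. The first task to return
// an error cancels the shared context so its siblings can stop early, and
// that first error is what the caller gets back.
func runParallel(parent context.Context, tasks ...func(context.Context) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, task := range tasks {
		wg.Add(1)
		go func(task func(context.Context) error) {
			defer wg.Done()
			if err := task(ctx); err != nil {
				// Only the first failure is recorded; it also triggers cancellation.
				once.Do(func() {
					firstErr = err
					cancel()
				})
			}
		}(task)
	}
	wg.Wait()
	return firstErr
}

// demo starts one failing sub-agent and two that block until cancelled.
// It returns how many sub-agents observed the cancellation.
func demo() (cancelled int32, err error) {
	failing := func(context.Context) error { return errSubAgent }
	waiting := func(ctx context.Context) error {
		<-ctx.Done() // stands in for a long LLM or tool call
		atomic.AddInt32(&cancelled, 1)
		return ctx.Err()
	}
	err = runParallel(context.Background(), failing, waiting, waiting)
	return cancelled, err
}

func main() {
	cancelled, err := demo()
	fmt.Println(cancelled, err)
}
```

Running it prints "2 sub-agent failed": both waiting sub-agents saw the cancellation, and the caller received the original error and not a context.Canceled from a sibling. In real code, errgroup.WithContext gives you this behaviour without the bookkeeping.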

What about Python’s asyncio?

It works. But it infects your entire codebase. If your ORM isn’t async-compatible, you’re blocking the event loop. If a library uses requests instead of httpx, you need a thread pool executor. In Go, blocking calls are safe by default: net/http, database drivers, and file I/O park only the calling goroutine while the scheduler keeps running the others, because goroutines are how the language works, not an opt-in mode.

2. Single Binary: Ship One File, Not a Dependency Tree

Our agent runs on:

Developer laptops (macOS, Linux, Windows)
Kubernetes clusters (Alpine containers)
Air-gapped enterprise servers
CI/CD pipelines as a CLI tool

With Go:

$ GOOS=linux GOARCH=amd64 go build -o genie ./cmd/genie
$ ls -la genie
-rwxr-xr-x 1 sks staff 42M Jul 1 12:00 genie
$ scp genie prod-server:/usr/local/bin/
# Done.

One binary. 42MB. No Python interpreter, no pip, no virtualenv, no requirements.txt, no “but it works on my machine.” We use goreleaser + cosign for signed release binaries with SBOM — the whole release pipeline is a GitHub Action.

“Just use Docker / PyInstaller / uv!” Python bundlers don’t remove the runtime. PyInstaller packs the interpreter and your dependencies into a self-extracting archive; Nuitka compiles Python to C, but the result still depends on the CPython runtime (libpython). A minimal Python Docker container with LangChain and its dependencies easily exceeds 1GB. Our Go binary is 42MB, statically linked, no runtime needed.

Compare with Python agent frameworks:

$ pip install langchain langchain-openai langchain-community
# ... 127 packages installed ...
$ du -sh venv/
892M    venv/

For enterprise customers running in regulated environments, “install Python 3.11, pip, and 127 packages” is a non-starter. “Copy this binary” is a conversation.

3. Embed as a Library, Not a Service

This is the decision that sealed it.

Our agent runtime needs to run inside our orchestration platform (Aiden), which uses Temporal for durable workflow orchestration. Temporal is itself built in Go, with a first-class Go SDK. Each agent session is a Temporal workflow; the agent runtime runs as a workflow activity — not a sidecar, not a subprocess, but a Go package import:

import "github.com/stackgenhq/agentruntime/pkg/app"

// Inside Aiden's Temporal workflow activity:
func (a *Activity) RunAgent(ctx context.Context, req RunAgentRequest) error {
    agent, err := app.NewApplication(ctx, app.Config{
        AgentName: req.AgentName,
        Tools:     req.Tools,
        // ...
    })
    if err != nil {
        return err
    }
    return agent.Run(ctx, req.Prompt)
}

Aiden imports the agent runtime as a Go module. Same process, same memory space, shared types. No serialization overhead, no network hops, no container orchestration for the agent itself. If the agent runtime were Python, we’d need gRPC bridges, subprocess management, or a sidecar container — each adding latency, deployment complexity, and an entire class of serialization bugs.

In Python, the options would be:

Run the agent as a subprocess (serialization overhead, process management complexity)
Import it as a package (possible, but Python’s package management makes version conflicts a nightmare across large projects)
Run it as a separate service (network overhead, deployment complexity)

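Go modules give you versioned, reproducible dependencies with a single go.mod file (plus go.sum for checksums). Minimal version selection resolves each module to exactly one version per major version, deterministically, so there is no resolver standoff of the “this package needs numpy 1.24 but that one needs 1.26” kind.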

4. Compile-Time Safety: Catch Breaking Changes Before Production

We have 52 Go packages with interfaces like:

//counterfeiter:generate . IToolProvider
type IToolProvider interface {
    GetTools(ctx context.Context, req GetToolsRequest) ([]Tool, error)
    ExecuteTool(ctx context.Context, req ExecuteToolRequest) (*ToolResult, error)
}

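When we change this interface (say, adding a method or changing a parameter type in ExecuteTool), the compiler immediately tells us every implementation and call site that needs updating. Every. Single. One. Adding a field to GetToolsRequest, by contrast, compiles silently wherever keyed struct literals are used, so new required fields still need tests.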

“But we use Pydantic and mypy!” Yes, Python has type-checking now, and Pydantic is excellent. But Pydantic is runtime validation: it adds CPU overhead on every model instantiation, slows startup, and doesn’t catch mismatches until the code actually executes. mypy checks statically, but it’s optional, doesn’t cover all libraries, and many teams don’t enforce it in CI. Go gives you type safety natively at compile time with zero runtime cost. If it compiles, the declared types line up (values typed as any and type assertions are still checked at runtime).

We generate test doubles from interfaces using counterfeiter:

$ go generate ./...
# Auto-generates fake implementations for every interface

These fakes are type-safe. If the interface changes, the fake won’t compile, and every test using it fails with a clear compiler error — not a runtime AttributeError.
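
As a rough illustration of that guarantee, here is a hand-written fake standing in for what counterfeiter would generate. The request and result types are trimmed-down, hypothetical versions of ours, and fakeToolProvider, runFirstTool, and demo are made-up names. The line to notice is the blank-identifier assignment: if the fake drifts from IToolProvider, the build fails there.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical, trimmed-down request and result types for illustration only.
type GetToolsRequest struct{ AgentName string }
type Tool struct{ Name string }
type ExecuteToolRequest struct{ ToolName string }
type ToolResult struct{ Output string }

type IToolProvider interface {
	GetTools(ctx context.Context, req GetToolsRequest) ([]Tool, error)
	ExecuteTool(ctx context.Context, req ExecuteToolRequest) (*ToolResult, error)
}

// fakeToolProvider is a hand-written test double that records ExecuteTool calls.
type fakeToolProvider struct {
	tools        []Tool
	executeCalls []ExecuteToolRequest
}

// Compile-time assertion: this line stops compiling as soon as the fake no
// longer satisfies IToolProvider (new method, changed signature, and so on).
var _ IToolProvider = (*fakeToolProvider)(nil)

func (f *fakeToolProvider) GetTools(_ context.Context, _ GetToolsRequest) ([]Tool, error) {
	return f.tools, nil
}

func (f *fakeToolProvider) ExecuteTool(_ context.Context, req ExecuteToolRequest) (*ToolResult, error) {
	f.executeCalls = append(f.executeCalls, req)
	return &ToolResult{Output: "ran " + req.ToolName}, nil
}

// runFirstTool is hypothetical code under test; it depends only on the interface.
func runFirstTool(ctx context.Context, p IToolProvider) (string, error) {
	tools, err := p.GetTools(ctx, GetToolsRequest{AgentName: "demo"})
	if err != nil {
		return "", err
	}
	if len(tools) == 0 {
		return "", fmt.Errorf("no tools available")
	}
	res, err := p.ExecuteTool(ctx, ExecuteToolRequest{ToolName: tools[0].Name})
	if err != nil {
		return "", err
	}
	return res.Output, nil
}

// demo wires the fake into the code under test and reports what happened.
func demo() (output string, calls int, err error) {
	fake := &fakeToolProvider{tools: []Tool{{Name: "kubectl"}}}
	output, err = runFirstTool(context.Background(), fake)
	return output, len(fake.executeCalls), err
}

func main() {
	fmt.Println(demo())
}
```

Try adding a third method to IToolProvider in this sketch: the program stops building at the assertion line, before any test runs.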

5. Performance: When Every Tool Call Counts

Agents are chatty. A single task might involve 15-25 tool calls. Each call goes through our middleware stack:

Tool call → Panic recovery → Logger → Audit → Loop detection
→ Failure limits → HITL approval → PII redaction → Context enrichment
→ Timeout → Rate limit → Circuit breaker → Actual tool execution

That’s 12 stages per tool call: 11 middleware layers plus the tool execution itself. In Go, each layer is a function closure that costs one stack frame and one function call. Total middleware overhead: < 1ms.
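
For readers who haven’t built a closure-based middleware stack, here is a minimal sketch of the pattern. The names (ToolFunc, Middleware, Chain, Recover, Trace) are hypothetical and far simpler than our real layers; the point is only how closures wrap each other and how a panic in the tool becomes an ordinary error at the outermost layer.

```go
package main

import (
	"context"
	"fmt"
)

// ToolFunc and Middleware are hypothetical names for this sketch.
type ToolFunc func(ctx context.Context, args string) (string, error)
type Middleware func(next ToolFunc) ToolFunc

// Chain wraps tool so that the first middleware listed is the outermost layer.
func Chain(tool ToolFunc, mws ...Middleware) ToolFunc {
	for i := len(mws) - 1; i >= 0; i-- {
		tool = mws[i](tool)
	}
	return tool
}

// Recover converts a panic in any inner layer into an ordinary error.
func Recover() Middleware {
	return func(next ToolFunc) ToolFunc {
		return func(ctx context.Context, args string) (out string, err error) {
			defer func() {
				if r := recover(); r != nil {
					err = fmt.Errorf("tool panicked: %v", r)
				}
			}()
			return next(ctx, args)
		}
	}
}

// Trace appends its name to log on the way in, which exposes execution order.
func Trace(name string, log *[]string) Middleware {
	return func(next ToolFunc) ToolFunc {
		return func(ctx context.Context, args string) (string, error) {
			*log = append(*log, name)
			return next(ctx, args)
		}
	}
}

// demo runs a panicking tool through three layers and returns the order in
// which the tracing layers ran plus the error the caller sees.
func demo() ([]string, error) {
	var log []string
	tool := func(context.Context, string) (string, error) { panic("boom") }
	wrapped := Chain(tool, Recover(), Trace("logger", &log), Trace("audit", &log))
	_, err := wrapped(context.Background(), "{}")
	return log, err
}

func main() {
	fmt.Println(demo())
}
```

Each layer adds one closure call on the way in and one return on the way out, which is why the per-call cost stays small even with many layers.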

“But LLM API calls take 3-10 seconds — who cares about 1ms?” Fair point. For a single agent, middleware latency is noise. The real performance win is memory footprint at scale.

A goroutine starts with a 2KB stack (grows as needed). Running 50 concurrent agents in Go consumes roughly 50-100MB of RAM. The same workload in Python — loading LangChain, Pydantic models, tokenizers, and the async event loop per agent — easily consumes 2-4GB. On the edge nodes and developer laptops we deploy to, that’s the difference between “it runs” and “it thrashes.”

Here’s a back-of-envelope comparison for 50 concurrent agents:

Resource	Go	Python (LangChain)
Per-agent base RAM	~2MB	~40-80MB
50 agents total	~100MB	~2-4GB
Goroutine/thread stack	2KB initial (grows)	8MB default reservation (pthread on Linux)
Startup time (cold)	<100ms	2-5s (imports)

When your agents run on customer infrastructure with memory limits, this matters.
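
The per-agent figures in the table are our estimates and include session state, not just stacks. If you want to check the goroutine part yourself, a sketch like the following parks a batch of goroutines and asks the runtime how much stack memory they added. stackBytesPerGoroutine is a hypothetical helper, and the result varies by Go version and platform, so treat it as a measurement aid, not a benchmark.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine parks n goroutines and reports the average growth in
// in-use stack memory per goroutine, as seen by the Go runtime.
func stackBytesPerGoroutine(n int) uint64 {
	if n <= 0 {
		return 0
	}
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var ready, done sync.WaitGroup
	ready.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			ready.Done() // signal that this goroutine is running
			<-stop       // park, like an agent waiting on an LLM response
		}()
	}
	ready.Wait()
	runtime.ReadMemStats(&after)

	// Release the parked goroutines before returning.
	close(stop)
	done.Wait()

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	fmt.Printf("~%d bytes of stack per parked goroutine\n", stackBytesPerGoroutine(1000))
}
```

This only measures stacks. Whatever each agent holds on the heap (conversation history, tool definitions, buffers) comes on top, which is where most of the per-agent estimate in the table goes.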

The Trade-offs We Accepted

It’s not all sunshine. Here’s what we gave up:

1. Smaller AI/ML ecosystem

Python has HuggingFace, PyTorch, scikit-learn, and thousands of AI libraries. Go doesn’t.

How we handled it: We don’t run ML models. We call LLM APIs (OpenAI, Anthropic, Gemini, Bedrock) via HTTP. The heavy ML work happens on the provider’s infrastructure. Our Go code handles orchestration, governance, and tool execution — none of which need PyTorch.

It’s worth noting that while Go lacks ML modeling libraries, AI infrastructure is already heavily Go-based: Ollama (local model serving), Kubernetes (container orchestration), Docker, and Temporal (workflow engine) are all written in Go. We’re in good company — models run in Python/C++/Rust, but the infrastructure and orchestration layer is Go’s sweet spot.

2. Fewer agent framework examples

Every “build an agent” tutorial is in Python. Our team had to translate concepts, not copy code.

How we handled it: This was actually a feature. Translating forced us to understand the algorithms deeply rather than cargo-culting. When we implemented ReAcTree, we found 6 production bugs that paper-to-Python implementations would have hidden behind duck typing, like silently passing a raw string instead of a structured Message into the context window, which runs without complaint in Python but was caught immediately by Go’s type system (more on this in a dedicated post).

3. Prototyping speed

Python is faster for throwaway experiments. Go requires more upfront structure.

How we handled it: We accepted slower early iterations in exchange for dramatically faster late-stage development. By month 3, our type-safe interfaces and auto-generated fakes meant new features had fewer bugs on first commit than our Python prototypes did after a week.

4. Hiring

More ML engineers know Python than Go.

How we handled it: We’re building infrastructure, not ML models. Systems engineers who know Go are exactly the profile we need. And Go’s simplicity means Python developers ramp up in 2-3 weeks.

5. Dynamic LLM JSON parsing

This used to be Go’s genuine pain point for AI work. LLMs hallucinate schemas, change types mid-response, and return subtly malformed JSON. Python’s json.loads() accepts any valid JSON shape effortlessly: values can be strings, ints, nested objects, whatever. Idiomatic Go unmarshals into typed structs with encoding/json. If the LLM returns an integer where you expected a string, json.Unmarshal returns a hard error (*json.UnmarshalTypeError).
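
Here is a small sketch of that failure mode using only encoding/json. The toolArgs schema and both helper functions are hypothetical. It also shows the closest Go equivalent of json.loads(), decoding into map[string]any, which accepts any valid object but pushes the type checks onto the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// toolArgs is a hypothetical schema in which the model is expected to send
// the port as a string.
type toolArgs struct {
	Host string `json:"host"`
	Port string `json:"port"`
}

// isTypeMismatch reports whether decoding raw into toolArgs failed because a
// JSON value had the wrong type for its destination field.
func isTypeMismatch(raw string) bool {
	var args toolArgs
	err := json.Unmarshal([]byte(raw), &args)
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

// decodeLoose mirrors json.loads: any valid JSON object is accepted, and the
// caller inspects the dynamic types afterwards (JSON numbers become float64).
func decodeLoose(raw string) (map[string]any, error) {
	var out map[string]any
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

func main() {
	fmt.Println(isTypeMismatch(`{"host": "db-1", "port": "5432"}`)) // false
	fmt.Println(isTypeMismatch(`{"host": "db-1", "port": 5432}`))   // true

	loose, err := decodeLoose(`{"host": "db-1", "port": 5432}`)
	fmt.Println(loose["port"], err)
}
```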

How we handled it (2026 update): We still use strict structs for typed boundaries — that’s a feature, not a bug. But for the syntax problems models create (markdown fences, single-quoted keys, trailing commas, None/True/False, truncated objects), we no longer hand-roll strip-and-retry logic everywhere.

We centralized repair in github.com/kaptinlin/jsonrepair — a high-performance Go library ported from the widely used JavaScript jsonrepair project. Our shared decode helpers try strict encoding/json first; when that fails, they call jsonrepair.Repair() and unmarshal the result:

import "github.com/kaptinlin/jsonrepair"

repaired, err := jsonrepair.Repair("```json\n{name: 'John'}\n```")
// → `{"name": "John"}`

That one dependency covers dozens of call sites — navigation gate responses, assist-me form prefill, skill-card enrichment, notification parsing, and more — without each package reimplementing fence stripping.
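
The strict-first, repair-on-failure flow can be sketched roughly like this. It is not our shared helper: decodeLenient, RepairFunc, and person are hypothetical names, and stripFences is a deliberately tiny standard-library stand-in that only removes markdown fences. In production the repair hook would be the jsonrepair call shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// RepairFunc is a hypothetical hook for whatever repair library is plugged in.
type RepairFunc func(string) (string, error)

// decodeLenient tries strict decoding first and falls back to repair only when
// that fails. The bool reports whether the repair path was used.
func decodeLenient(raw string, out any, repair RepairFunc) (bool, error) {
	strictErr := json.Unmarshal([]byte(raw), out)
	if strictErr == nil {
		return false, nil
	}
	fixed, err := repair(raw)
	if err != nil {
		return false, fmt.Errorf("repair failed: %w (strict error: %v)", err, strictErr)
	}
	if err := json.Unmarshal([]byte(fixed), out); err != nil {
		return true, fmt.Errorf("decode after repair: %w", err)
	}
	return true, nil
}

// stripFences is a toy stand-in for a real repair library: it only removes a
// surrounding markdown code fence and leaves every other defect alone.
func stripFences(s string) (string, error) {
	fence := strings.Repeat("`", 3)
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(s, fence+"json")
	s = strings.TrimPrefix(s, fence)
	s = strings.TrimSuffix(s, fence)
	return strings.TrimSpace(s), nil
}

// person is a hypothetical typed boundary.
type person struct {
	Name string `json:"name"`
}

// demo decodes a fenced model response and reports whether repair was needed.
func demo() (name string, repaired bool, err error) {
	fence := strings.Repeat("`", 3)
	raw := fence + "json\n{\"name\": \"John\"}\n" + fence
	var p person
	repaired, err = decodeLenient(raw, &p, stripFences)
	return p.Name, repaired, err
}

func main() {
	fmt.Println(demo())
}
```

Keeping the strict attempt first means well-formed responses never pay for repair, and the returned flag gives you something to count if you want to track how often a given model needs it.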

We still layer other tactics on top where semantics matter:

Lenient parsing via json.RawMessage for tool call arguments — parse the outer structure strictly but defer argument parsing until the tool handler knows its schema
tidwall/gjson for path-based extraction when we need specific values without defining full structs
Retry-with-format — when JSON is structurally valid but fails schema validation, re-prompt the LLM with the error (repair can’t fix wrong shapes)
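
The first tactic, deferring argument parsing with json.RawMessage, looks roughly like the sketch below. The envelope shape, the shell tool, and its argument schema are all hypothetical; only json.RawMessage and json.Unmarshal are real API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is the strictly parsed envelope. Arguments stays as raw bytes until
// a handler that knows the schema decodes it.
type toolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

// shellArgs is a hypothetical schema owned by the "shell" tool handler.
type shellArgs struct {
	Command string `json:"command"`
	Timeout int    `json:"timeout_seconds"`
}

// handlers maps a tool name to a function that decodes its own arguments.
var handlers = map[string]func(json.RawMessage) (string, error){
	"shell": func(raw json.RawMessage) (string, error) {
		var args shellArgs
		if err := json.Unmarshal(raw, &args); err != nil {
			return "", fmt.Errorf("shell arguments: %w", err)
		}
		return fmt.Sprintf("run %q with a %ds timeout", args.Command, args.Timeout), nil
	},
}

// dispatch parses the envelope, then hands the untouched argument bytes to the
// matching handler.
func dispatch(payload string) (string, error) {
	var call toolCall
	if err := json.Unmarshal([]byte(payload), &call); err != nil {
		return "", fmt.Errorf("tool call envelope: %w", err)
	}
	handler, ok := handlers[call.Name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", call.Name)
	}
	return handler(call.Arguments)
}

func main() {
	out, err := dispatch(`{"name": "shell", "arguments": {"command": "uptime", "timeout_seconds": 5}}`)
	fmt.Println(out, err)
}
```

The envelope fails fast if the model gets the outer structure wrong, while a bad argument type surfaces inside the one handler that can describe the expected schema in a retry prompt.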

Python’s json-repair library solves a similar problem on the other side of the fence. In Go, kaptinlin/jsonrepair closes most of the gap for syntax repair; you still choose your own types at the end.

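It’s more ceremony than json.loads() in a REPL. The upside is unchanged: when JSON parses into your struct, every field that is present has the type you declared, so type mismatches can’t hide until production. Missing and unknown fields are a separate check: encoding/json leaves the former at their zero values and ignores the latter unless you opt into Decoder.DisallowUnknownFields.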

When You Should NOT Use Go for Agents

Be honest about when Python is the right choice:

You’re prototyping and need to test ideas in hours, not days
You’re running local models with HuggingFace/PyTorch and need direct GPU access
Your team is ML-first and everyone thinks in NumPy
You’re building on LangChain/LangGraph and the ecosystem matters more than performance
You’re a solo developer and can’t invest in the upfront structure Go requires

Go is for when your agent system is infrastructure — long-running, multi-tenant, governed, deployed across environments. If that’s not your situation, Python is genuinely the better choice.

The Scorecard

Criterion	Go	Python
Concurrency model	✅ Goroutines + errgroup (native, colorless)	⚠️ asyncio (opt-in, colored functions)
Deployment	✅ Single 42MB binary, cross-compile	❌ Interpreter + venv + 127 packages
Library embedding	✅ Module import, same process	⚠️ Possible but fragile at scale
Type safety	✅ Compile-time interfaces, zero-cost	⚠️ Pydantic (runtime cost) + mypy (optional)
Memory footprint (50 agents)	✅ ~100MB	❌ ~2-4GB
Middleware performance	✅ < 1ms for 12 layers	⚠️ Higher overhead at scale
Dynamic JSON parsing	⚠️ Strict structs; use `jsonrepair` for LLM syntax	✅ `json.loads()` accepts any valid JSON shape
AI/ML ecosystem	❌ Minimal (but strong AI infra)	✅ Dominant
Prototyping speed	⚠️ Slower start	✅ Rapid iteration
Hiring pool (ML)	⚠️ Smaller	✅ Larger

What’s Next

In the next post, I’ll cover why we chose TOML over YAML and PKL for agent configuration — and why the config format you pick matters more than you think.

If you’re building agents in Go, I’d love to hear about your experience. Find me on GitHub or LinkedIn.
