The agent-first CLI: a design pattern
Six months ago I was on my third attempt at getting Drata's MCP server to do what I needed. Drata is the GRC platform I use at work, and I'd spent hundreds of hours in their UI before the agentic wave. By the time I was prepping for our last audit, I'd made myself a promise: I was never opening another evidence spreadsheet by hand again.
The Drata team and I had talked about their MCP server multiple times. Their first attempt was thin wrappers over their public API. Their second was better but still missed most of the workflows that GRC actually runs on day to day: uploading evidence, chasing failing tests, understanding which controls are at risk, and pulling people in security and IT into Jira tickets to close out their obligations. The people I work with were already living in Claude and Cursor, gluing tools together through MCP servers for Jira and a dozen other systems. The Drata MCP server didn't fit into any of that.
Then Google shipped their Workspace CLI. I looked at what they'd done, looked at the Drata API spec, and realized I'd been solving the wrong problem. I didn't need a better MCP server. I needed a CLI. I built one in two days. The moment I wired it up, everything clicked. I could create Jira tickets, dispatch them, push updates back, force retests inside Drata, and prioritize the whole queue from one prompt. I haven't logged into the Drata UI since. Cloaked, where I lead the team that built it, is willing to open-source it. And once the CLI existed, my agent started building things on top of it I hadn't imagined.
That experience is what this essay is about. My thesis, in one sentence: for most agent tool-use cases, a well-designed CLI is a better integration surface than an MCP server. I'll back it up with three examples (Google Workspace, the Drata CLI, and protoncli, my Proton Mail CLI), and I'll lay out the contract that turns a normal CLI into something an agent can actually drive.
I think in 18 months, the default way to give an agent access to a system is going to look more like a CLI with a schema manifest than an MCP server.
What MCP gets right
Before I argue against it, let me be clear about what MCP is good at, because the protocol gets real things right.
The standardization is genuine. A host that speaks MCP can swap servers without code changes, and a server can serve any host. That's a real win for an ecosystem where two years ago every tool integration was a bespoke shim. Streaming and structured content work. Tool discovery is clean. For hosted products that want to expose a surface to any model, MCP is probably the right call. And with Anthropic and OpenAI both behind it, there's enough gravity that ignoring it would be silly.
If you're shipping a SaaS product and you want every agent in the world to be able to drive it tomorrow, build the MCP server. I'm not arguing against that.
I'm arguing that for the much larger set of cases where one engineer wants their agent to drive a tool, MCP is the wrong abstraction.
Where MCP breaks down
The failures aren't theoretical. They're what I hit, repeatedly, before I gave up.
Protocol overhead. If you're one person building a tool for your own agent, running a separate server process and shipping a new protocol is a tax on every change. A CLI is a binary. ./tool --help is the protocol.
Ecosystem lock-in. An MCP server works with MCP hosts. A CLI works with any agent, any shell, any CI pipeline, cron, systemd, anything that can exec a process. The moment you want to pipe tool output into something that isn't an LLM host, MCP stops helping.
Debuggability. Typing tool --flag value in a terminal is the fastest debug loop in computing. MCP servers require a host to exercise. When something breaks at 11pm before an audit, I want to be able to run the failing command directly.
Composability. Unix solved "compose small tools into big workflows" 50 years ago. protoncli fetch-and-parse INBOX | jq 'select(.from | contains("github"))' | protoncli apply-labels is a real thing I run. The MCP equivalent is a custom orchestration layer, every time.
Secrets handling. A CLI can read from the OS keyring directly and never expose credentials to the agent's context. An MCP server typically holds credentials inside a host process the agent talks to. For privacy-sensitive domains, that's a strictly worse posture.
The abstraction gradient. MCP treats "tool call" as the unit. Shells treat "process" as the unit. Processes compose. Tool calls don't.
Protocols that try to replace shells have a long history of not winning. There's a reason for that.
What makes a CLI agent-first
A normal CLI is not agent-first. git, for example, is excellent for humans and frustrating for agents: inconsistent output formats, error messages designed for terminals, exit codes that mostly mean "0 or not 0." Turning a CLI into something an agent can actually drive takes deliberate design. Here's the contract I've converged on across three implementations.
- Self-describing schema. A `schema` subcommand emits a JSON manifest of every subcommand: flags, arguments, stdout format, exit codes, examples. The agent loads it once. No prompt-engineered syntax, no copy-pasted help text. The CLI is the source of truth and the tool definitions get generated from it.
- Structured output by default. JSON on stdout, human-readable on stderr. Streaming commands emit NDJSON and end with a `{"type":"summary"}` terminator so callers know the stream is done without heuristics.
- Typed error envelopes. Failures emit `{"error": {"kind", "code", "reason", "message", "hint"}}`. Agents branch on `.error.kind`, not on parsed English. Hint fields are for the agent, not the human.
- Exit codes as an enum. Stable mapping from exit code to error kind. Retry on 5 (transient network or IMAP), prompt the user on 2 (auth), surface to caller on 4 (config). Deterministic enough that retry logic can live in shell scripts without an LLM in the loop.
- Sanitized stderr. Strip ANSI, bidi, and zero-width characters before anything hits stderr. Stderr gets fed back into the agent's context. Unsanitized stderr is a prompt-injection vector. I'd put this in bold if I could: stderr is a prompt-injection vector.
- Checkpointed state. SQLite or similar, with backfill and state clear subcommands so runs are resumable across crashes. Agents crash. The tool should survive it and pick up where it left off.
- Secrets out-of-band. OS keyring by default, encrypted file fallback. Credentials never flow through the agent's context. The agent calls the CLI, the CLI reads from the keychain, the secret never touches a prompt.
That's the contract. Everything else (the specific workload, the API the CLI wraps, the domain) is downstream of these seven decisions.
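To make the output and error halves of the contract concrete, here's a minimal caller-side sketch in Python. The envelope fields and the summary terminator follow the shapes described above; the exit-code policy mirrors the retry/prompt/surface mapping. Function names and the `EXIT_POLICY` table are illustrative, not any particular CLI's actual API.

```python
import json

# Hypothetical policy table mirroring the exit-code enum above:
# 2 = auth -> prompt the user, 4 = config -> surface to caller,
# 5 = transient network/IMAP -> retry.
EXIT_POLICY = {0: "ok", 2: "prompt", 4: "surface", 5: "retry"}

def read_ndjson_stream(lines):
    """Consume NDJSON records until the {"type": "summary"} terminator.

    Returns (records, summary). Raising when the terminator is missing means
    callers never have to guess whether a stream was truncated.
    """
    records = []
    for line in lines:
        obj = json.loads(line)
        if obj.get("type") == "summary":
            return records, obj
        records.append(obj)
    raise RuntimeError("stream ended without a summary terminator")

def decide(exit_code, stderr_text):
    """Branch on the typed error envelope and exit code, never on prose."""
    if exit_code == 0:
        return ("ok", None, None)
    envelope = json.loads(stderr_text)["error"]
    action = EXIT_POLICY.get(exit_code, "surface")
    # .error.kind is the stable machine field; .hint is advisory, for the agent.
    return (action, envelope["kind"], envelope.get("hint"))
```

Because `decide` keys off `exit_code` and `.error.kind` alone, the same logic works as a twenty-line shell script with `jq`, with no LLM in the loop.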
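The stderr-sanitization rule is mechanical enough to show directly. A sketch of the stripping pass, under the threat model above (ANSI escapes, bidi overrides, and zero-width characters flowing back into an agent's context); the exact character ranges worth stripping are a judgment call, and these are mine:

```python
import re

# CSI sequences (colors, cursor movement) and OSC sequences (titles, hyperlinks).
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]|\x1b\][^\x07]*(\x07|\x1b\\)")

# Bidi embedding/override/isolate controls, zero-width characters, and BOM --
# the characters that let attacker-controlled text hide or reorder content.
INVISIBLE_RE = re.compile(
    r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]"
)

def sanitize_stderr(text: str) -> str:
    """Strip content that could smuggle instructions into an agent's context."""
    text = ANSI_RE.sub("", text)
    return INVISIBLE_RE.sub("", text)
```

Run this on every line before it hits stderr, not after the fact; the whole point is that the agent never sees the raw bytes.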
Three examples
Google Workspace CLI. Google shipped a CLI that lets an agent drive Gmail, Drive, Calendar, Docs, and Meet. This is notable because Google could have mandated MCP internally and called it a day. They didn't. They built a CLI surface, and that's an existence proof that infrastructure teams operating at serious scale are arriving at the same conclusion.
Drata CLI. The story I opened with. I built it at Cloaked because I wanted my GRC workflows driveable by an agent and the existing MCP server didn't get me there. Two days of work replaced a UI I'd spent hundreds of hours in.
protoncli. My instance of the same pattern, applied to email. I've been a Proton user and a privacy advocate for years, and after Google's CLI shipped I got jealous. I get somewhere between 300 and 700 emails a week across personal and work accounts, and triaging that with an LLM is dramatically more efficient than any GUI. The catch: Proton's privacy properties are the entire reason I use Proton. Sending my mail through a hosted MCP server would defeat the point.
So protoncli runs locally. It's written in Go and connects to Proton Mail Bridge over IMAP, with classification handled by a local Ollama model. Credentials live in the OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service), with an AES-256-GCM encrypted-file fallback using Argon2id-derived keys when the keyring isn't available. The classifier is constrained to an 11-label taxonomy, with 612 aliases that normalize legacy or model-generated names back to the canonical set. The email never leaves my hardware. It's at github.com/akeemjenkins/protoncli and runs on macOS, Linux, and Windows.
Here's the moment it clicked for me. My kids go to Colorado Japanese School, and the school sends mail from five different addresses across general announcements, my kid's class, events, billing, and the parent association. I asked my agent to help me keep track of it. What it did, on its own, was build a small piece of infrastructure on top of protoncli:
- A Python script at `~/bin/cjs-autolabel.py` that uses the CLI's IMAP wrapper and keyring credentials, walks INBOX and Folders/Accounts, tracks the last-processed UID per mailbox in a state file so it's idempotent, and copies matches into a Labels/CJS folder.
- A LaunchAgent that runs it every 10 minutes and at login, with logs to a known path.
- State seeded at the current high-water UIDs so it doesn't re-touch the 343 messages already labeled.
- A teardown command: `launchctl bootout gui/$(id -u)/com.akeem.cjs-autolabel`.
I didn't write any of that. The agent did, because the CLI gave it primitives to compose with. An MCP server is an endpoint. A CLI is a building block. That difference shows up the first time you ask an agent to do something the original tool author didn't anticipate.
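The high-water-mark checkpoint is the piece that makes that script safe to re-run every 10 minutes. The idea is small enough to sketch; this assumes a JSON state file keyed by mailbox, which is my illustration, not the script's actual format:

```python
import json
import os
import tempfile

def load_state(path):
    """Last-processed IMAP UID per mailbox; empty on first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def new_uids(state, mailbox, uids):
    """Only UIDs above the mailbox's high-water mark are new work."""
    high = state.get(mailbox, 0)
    return [u for u in uids if u > high]

def commit(state, mailbox, uids, path):
    """Advance the mark and write atomically so a crash can't corrupt state."""
    if uids:
        state[mailbox] = max(state.get(mailbox, 0), max(uids))
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)
```

Seeding the state at the current high-water UIDs on first run is what kept the 343 already-labeled messages untouched.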
Three examples is the minimum that makes a pattern credible. This is a pattern.
Objections
"MCP and CLIs aren't mutually exclusive. You could ship both." Agreed. But if you're only doing one, do the CLI first. It's easier, more debuggable, and composes with things MCP can't. When you do need both surfaces, the MCP server can live inside the CLI binary itself, keeping the CLI as the source of truth. An MCP wrapper over a good CLI is trivial. The inverse is not.
"This only works for local tools. Cloud services need MCP." Partially fair. For a hosted product exposing an API to any model, MCP is reasonable. But ship a CLI alongside it. Your users will thank you the first time something breaks and they need to debug, and you'll thank yourself the first time you need to script a backfill, write an integration test, or run something from CI.
"The schema manifest is just rebuilt tool definitions." Yes, deliberately. The point is that the CLI is the source of truth and tool definitions get generated from it. That's the Unix "the program is the spec" instinct applied to agent tooling.
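As a sketch of what "generated from it" can mean in practice: take the manifest the schema subcommand emits and mechanically produce tool definitions from it. The manifest shape here is an assumption of mine, not a standard, and the output mimics the common name/description/input-schema tool-definition form:

```python
def manifest_to_tools(manifest):
    """Turn a CLI schema manifest into agent tool definitions.

    Assumed manifest shape (illustrative): {"name": ..., "subcommands":
    [{"name", "summary", "flags": [{"name", "type", "required", "help"}]}]}.
    """
    tools = []
    for sub in manifest["subcommands"]:
        props = {
            f["name"]: {"type": f.get("type", "string"),
                        "description": f.get("help", "")}
            for f in sub.get("flags", [])
        }
        tools.append({
            # e.g. "protoncli apply-labels" -> "protoncli_apply_labels"
            "name": f'{manifest["name"]}_{sub["name"]}'.replace("-", "_"),
            "description": sub.get("summary", ""),
            "input_schema": {
                "type": "object",
                "properties": props,
                "required": [f["name"] for f in sub.get("flags", [])
                             if f.get("required")],
            },
        })
    return tools
```

The generator runs at agent startup, so a new subcommand in the CLI shows up as a new tool with zero prompt changes.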
What's next
The interesting open question is what the schema manifest should actually look like. There's no standard yet. There should be one, and I think the shape of it is going to matter more than the next iteration of MCP. I'm working on this; more soon.
If you're writing an MCP server for a tool that lives on your own hardware, try writing the CLI first. See how it feels. The reference implementation is at github.com/akeemjenkins/protoncli.
I'm writing more about agent infrastructure from a DevSecOps lens at akeemjenkins.com.