Most companies have a harness problem
Most companies do not really have an AI model problem. What they have is an AI harness problem. They hand employees a powerful chat interface, usage grows, experimentation spreads, and then the bill shows up. That is usually the moment a company realizes access and architecture are not the same thing.
Signing up for ChatGPT, Claude, Cursor, Gemini, or any other AI lab's own product can be a great way to start. It is fast, it is useful, and it gets people moving. But if every user and every workflow hits premium inference by default, the company has not built an AI strategy. It has built a spending pattern.
The missing piece is an internal harness: a company-controlled chat and workflow interface that routes each task by job type, risk, benchmark performance, and cost per accepted task.
End users should not manage model economics
Most employees will never know which model should handle which task. They are not going to think about input-token cost, output-token cost, tokenizer differences, context length, tool-call overhead, prompt caching, reasoning effort, hidden system prompts, retry rates, schema adherence, model regressions, latency ceilings, review burden, or escalation thresholds.
And they should not have to. An employee should just be able to ask for help. The system should be the thing that decides whether the task belongs with a small model, a mid-size one, an open-weight model, a frontier model, a deterministic tool, or a human reviewer.
That is the whole case for an internal AI harness. The user picks the job. The system picks the model.
Usage scales faster than governance
AI usage can get expensive fast once employees have powerful tools in hand, especially when agentic workflows and coding assistants enter the picture. The bill can move even when the pricing page does not.
Plenty of things move it. The default model changes. Users delegate more work. The model starts writing longer answers. The tokenizer changes. Tools pile on context. Agent loops become routine. The harness itself changes. Any one of these can push the number up, and they tend to stack.
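To make the stacking concrete, here is a toy Python calculation in which the per-token prices never change. Every price, token count, and multiplier in it is an invented assumption chosen for illustration, not any vendor's actual rate or any measured drift. The agent loop is also simplified to three calls of equal size.

```python
# All prices and multipliers below are invented for illustration.
PRICE_IN_PER_1K = 0.003   # assumed USD per 1,000 input tokens, held constant throughout
PRICE_OUT_PER_1K = 0.015  # assumed USD per 1,000 output tokens, held constant throughout

def call_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost of one model call at the fixed list prices above."""
    return (input_tokens * PRICE_IN_PER_1K + output_tokens * PRICE_OUT_PER_1K) / 1000

baseline = call_cost(input_tokens=4_000, output_tokens=800)

tokenizer_inflation = 1.2  # the same text now encodes to 20% more tokens
longer_answers = 1.5       # the default model writes 50% longer outputs
extra_context = 1.4        # tools and retrieval add 40% more input
calls_per_task = 3         # an agent loop becomes routine: three calls instead of one
                           # (simplification: every call is treated as the same size)

drifted = calls_per_task * call_cost(
    input_tokens=4_000 * tokenizer_inflation * extra_context,
    output_tokens=800 * tokenizer_inflation * longer_answers,
)
print(f"baseline ${baseline:.4f} per task, drifted ${drifted:.4f}, {drifted / baseline:.1f}x")
```

With these made-up numbers, a task that cost 2.4 cents now costs about 12.5 cents, roughly five times as much, and the pricing page never moved. No single factor is dramatic; the multiplication is what hurts.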
Governance has to move upstream. The organization needs to understand the economics of its tasks before premium inference quietly becomes the default path for every request.
Cost per accepted task is the right unit
A company does not really buy tokens. It buys completed work. So cost per accepted task is the right economic unit: model cost plus tool cost plus infrastructure cost plus retry cost plus review cost plus failure cost, divided by the number of outputs you accept.
A model can be cheap per token and expensive per task if it needs extra retries, runs long, botches formatting, misses intent, or has to be corrected by a person. Another can be pricey per token and still worth it if it solves a high-value task correctly on the first pass and cuts the review burden.
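The formula is simple enough to write down directly. The sketch below assumes costs are tallied per batch of attempts at one task type; the `TaskLedger` name, its fields, and every dollar figure are hypothetical, picked only to show how a cheaper model can lose on this metric.

```python
from dataclasses import dataclass

@dataclass
class TaskLedger:
    """Everything spent on one batch of attempts at a task type (USD, hypothetical)."""
    model_cost: float
    tool_cost: float
    infra_cost: float
    retry_cost: float
    review_cost: float
    failure_cost: float
    accepted_outputs: int

    def cost_per_accepted_task(self) -> float:
        """Total spend divided by the number of outputs that were accepted."""
        total = (self.model_cost + self.tool_cost + self.infra_cost
                 + self.retry_cost + self.review_cost + self.failure_cost)
        if self.accepted_outputs == 0:
            return float("inf")  # money was spent but no work was accepted
        return total / self.accepted_outputs

# 100 attempts each. The cheap model has low inference cost but needs heavy review
# and fails more often; the premium model costs more per call but is accepted more often.
cheap = TaskLedger(model_cost=2.0, tool_cost=1.0, infra_cost=1.0, retry_cost=5.0,
                   review_cost=55.0, failure_cost=20.0, accepted_outputs=70)
premium = TaskLedger(model_cost=20.0, tool_cost=1.0, infra_cost=1.0, retry_cost=0.5,
                     review_cost=12.5, failure_cost=5.0, accepted_outputs=80)
print(f"cheap: ${cheap.cost_per_accepted_task():.2f}, "
      f"premium: ${premium.cost_per_accepted_task():.2f}")
```

Here the model that is ten times cheaper on inference ends up at $1.20 per accepted task, against $0.50 for the pricier one, because review and failure costs dominate the ledger.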
Tool use adds another layer. Tool definitions, retrieval results, prior context, retries, and verbose outputs can all turn a simple-looking agent workflow into a much bigger economic event. A mature harness measures the whole task, not just the model call.
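One reason agent workflows balloon is that each call typically resends the system prompt, the tool definitions, and the whole conversation so far. The sketch below counts tokens under that simplifying assumption, with no prompt caching and fixed per-step sizes; the function name and all token counts are hypothetical.

```python
from typing import Tuple

def agent_loop_tokens(steps: int, system_tokens: int, tool_def_tokens: int,
                      user_tokens: int, output_per_step: int,
                      tool_result_per_step: int) -> Tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for a simple agent loop.

    Assumes every call resends the system prompt, the tool definitions, and the
    full history, and that no prompt caching applies.
    """
    history = user_tokens
    total_in = total_out = 0
    for _ in range(steps):
        total_in += system_tokens + tool_def_tokens + history
        total_out += output_per_step
        # The model's output and the tool's result both become context for the next call.
        history += output_per_step + tool_result_per_step
    return total_in, total_out

sizes = dict(system_tokens=1_000, tool_def_tokens=2_000, user_tokens=500,
             output_per_step=300, tool_result_per_step=1_500)
print(agent_loop_tokens(steps=1, **sizes))
print(agent_loop_tokens(steps=10, **sizes))
```

With these numbers, a single call reads 3,500 input tokens, while a ten-step loop reads 116,000, because every earlier tool result is paid for again on each later step. Measuring only the first call would miss almost all of it.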
Classify the job before choosing the model
The first job of an internal harness is task classification. A prompt like "help with this" is hopelessly ambiguous. The harness has to infer the actual job: drafting, rewriting, summarizing, extracting, classifying, calculating, coding, researching, analyzing, planning, running tools, or high-risk decision support.
Then it should classify the risk: low-risk internal productivity, sensitive internal data, customer-facing output, financial analysis, legal or compliance-adjacent work, regulated data, external publication, code execution, production-system access, or high-impact decision support.
Only then should it route. A routine email rewrite can go to a small model. Invoice extraction can go to a small model with schema validation bolted on. A financial calculation check needs a benchmark-cleared model plus deterministic validation. A board recommendation needs frontier capability and a human reviewer. Agentic research needs tool budgets and token budgets.
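A routing layer can start as plain rules that mirror these examples. The sketch below is a minimal illustration under several assumptions: the `Job` and `Risk` categories are a small subset of those listed, the keyword classifier is a placeholder (a production harness would more likely use a small model or a trained classifier), the rule that customer-facing output gets a reviewer is an assumed policy, and the tier names and budget numbers are hypothetical labels rather than real model names or recommended limits.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Job(Enum):
    REWRITE = "rewrite"
    EXTRACT = "extract"
    CALCULATE = "calculate"
    RESEARCH = "research"
    DECISION_SUPPORT = "decision_support"

class Risk(Enum):
    LOW = "low"
    CUSTOMER_FACING = "customer_facing"
    HIGH_IMPACT = "high_impact"

@dataclass
class Route:
    model_tier: str                                  # hypothetical tier label, not a vendor model
    checks: List[str] = field(default_factory=list)  # validations to run on the output
    human_review: bool = False
    token_budget: Optional[int] = None               # caps for agentic work; None means unset
    tool_call_budget: Optional[int] = None

def classify_job(prompt: str) -> Job:
    """Placeholder keyword classifier; unmatched prompts fall back to REWRITE, a crude default."""
    text = prompt.lower()
    if "invoice" in text or "extract" in text:
        return Job.EXTRACT
    if "calculat" in text:
        return Job.CALCULATE
    if "board" in text or "recommend" in text:
        return Job.DECISION_SUPPORT
    if "research" in text:
        return Job.RESEARCH
    return Job.REWRITE

def route(job: Job, risk: Risk) -> Route:
    """Route by job type first, then tighten the route according to risk."""
    if job is Job.DECISION_SUPPORT or risk is Risk.HIGH_IMPACT:
        return Route("frontier", human_review=True)
    if job is Job.CALCULATE:
        r = Route("benchmark_cleared", checks=["deterministic_recalculation"])
    elif job is Job.EXTRACT:
        r = Route("small", checks=["schema_validation"])
    elif job is Job.RESEARCH:
        r = Route("mid", token_budget=200_000, tool_call_budget=20)
    else:
        r = Route("small")
    if risk is Risk.CUSTOMER_FACING:
        r.human_review = True  # assumed policy: anything customer-facing gets a reviewer
    return r

job = classify_job("Extract the line items from this invoice")
print(job, route(job, Risk.LOW))
```

Keeping the rules as data-like code makes them easy to review and to change when a benchmark clears a cheaper tier for a given job.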
The harness is the economic control plane
A good internal harness needs task classification, risk classification, model routing, token budgeting, validation, escalation, observability, continuous evals, and governance. None of that is an engineering nice-to-have. These are the mechanisms that keep enterprise AI economically sane.
Without a harness, every user interaction is a direct line to vendor defaults. With one, the organization decides which model handles which job, which tasks may use frontier models, which require review, which tools are available, how much context is allowed, when a workflow should stop, and what counts as an accepted output.
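Validation, escalation, and stopping can be expressed as one small loop. The sketch below assumes each model is wrapped as a plain Python function that returns its output text and a token count; the stub models, the invoice schema, the `run_with_escalation` name, and the budget figures are all hypothetical stand-ins, not a real client API.

```python
import json
from typing import Callable, List, Tuple

ModelCall = Callable[[str], Tuple[str, int]]  # hypothetical: prompt -> (output text, tokens used)

def valid_invoice_json(text: str) -> bool:
    """Schema check for a hypothetical invoice format: a JSON object with these fields."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"invoice_id", "total"}.issubset(data)

def run_with_escalation(prompt: str, tiers: List[Tuple[str, ModelCall]],
                        validate: Callable[[str], bool], token_budget: int) -> dict:
    """Try tiers cheapest first, escalate on failed validation, stop once the budget is spent."""
    spent = 0
    for name, call in tiers:
        if spent >= token_budget:
            break  # budget exhausted: stop rather than call a more expensive tier
        output, tokens = call(prompt)
        spent += tokens
        if validate(output):
            return {"status": "accepted", "tier": name, "output": output, "tokens": spent}
    # Nothing passed validation within budget: hand the task to a human reviewer.
    return {"status": "needs_human_review", "tier": None, "output": None, "tokens": spent}

# Stub "models" standing in for real clients.
def small_model(prompt: str) -> Tuple[str, int]:
    return "Invoice 123, total 40.00", 300  # unstructured, so it fails the schema

def frontier_model(prompt: str) -> Tuple[str, int]:
    return '{"invoice_id": "123", "total": 40.0}', 1200

tiers = [("small", small_model), ("frontier", frontier_model)]
print(run_with_escalation("Extract the invoice fields", tiers, valid_invoice_json, token_budget=5000))
print(run_with_escalation("Extract the invoice fields", tiers, valid_invoice_json, token_budget=300))
```

In the first run, the small tier's unstructured answer fails validation, so the request escalates and the frontier tier's JSON is accepted. In the second, the budget is spent before the second tier is reached, so the loop stops and routes the task to a person instead of spending further.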
The companies that win this will not be the ones that blindly standardize on the newest model. They will be the ones that understand the work, measure the outcomes, and route each task to the right model at the right cost. Access is not architecture.