A scaled-response framework for agent engagement, enabled by a commitment to Legible Agents and calibrated against Season 1 of the Coordination Games.
Protocols respond to agents with scaled engagement bands (monitoring, collateral, recognition, reward, sanction) rather than binary allow-or-deny decisions. The treatment scales with accumulated behavioral evidence, grounded in Elinor Ostrom's commons governance principles.
Enabled by Legible Agents: a two-axis projection of agent conduct that makes behavioral patterns readable. Graduated Trust needs legibility to be meaningful. Without it, graduated bands collapse back into arbitrary judgment.
This paper proposes Graduated Trust as a framework for protocol response to autonomous agents. ERC-8004 establishes on-chain identity and attestation primitives; the Coordination Games produce behavioral data from structured multi-agent gameplay. What is missing between them is an interpretive layer and a response logic. Graduated Trust supplies the response logic, scaling engagement bands (from monitoring through full sanction, from recognition through full reward) to accumulated behavioral evidence. The interpretive layer, our commitment to Legible Agents, projects conduct onto two orthogonal axes: externality valence (whether behavior generates positive or negative spillovers) and coordination posture (whether commitments are reliably honored). Together these give protocols a defensible basis for scaling their response to agent behavior, offering a publicly legible alternative to one-dimensional trust scores and binary access decisions.
Most existing AI evaluation infrastructure measures isolated capability, and most protocol gating treats agents as either admitted or excluded. Neither frame fits multi-agent coordination. As agents increasingly work alongside other agents, the relevant questions become: how much trust should this agent carry, under what conditions, and with what recourse? A binary decision cannot answer any of those. A one-dimensional trust score ("Agent X has reputation 0.83") fails in a different way: it offers finer gradation, but it collapses distinct kinds of behavior into a single number.
Graduated Trust is the response framework. An agent might be trusted fully in bilateral engagements, restricted from commons protocols, and monitored during a recovery arc, all at once. Protocols that expose these bands rather than a single gate can match treatment to behavior at the resolution where the differences actually matter.
That kind of scaling only works if the underlying behavior is readable. A commitment to Legible Agents is the other half of the work. Without a shared projection of what an agent's conduct looks like, graduated treatment degrades into case-by-case discretion. With one, the same coordinates can be inspected by agent, counterparty, and protocol alike.
Graduated Trust needs a legible projection of agent behavior to scale against. The projection decomposes conduct along two orthogonal dimensions, each carrying a discrete 7-point scale from -3 to +3. A single coordinate pair summarizes accumulated behavior without collapsing the dimensions that matter.
Axis one, externality valence, captures whether an agent's behavior generates positive or negative spillover effects beyond its direct counterparties. Contributing open artifacts to a shared corpus reads positive; over-extracting from a shared pool reads negative. The axis derives from Ostrom's analysis of commons dilemmas, where locally rational behavior can be collectively destructive.
Axis two, coordination posture, captures whether an agent reliably honors commitments and finds cooperative equilibria in repeated interaction. Breaking oaths under economic pressure rates low; maintaining cooperation against incentives to defect rates high. The axis derives from the Axelrod tradition in iterated game theory, updated for multi-agent systems.
Each axis uses discrete integer values (-3, -2, -1, 0, +1, +2, +3). The scale maps directly to graduated treatment bands, matches the resolution at which behavioral differences become decision-relevant, and remains communicable to non-technical participants.
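A minimal sketch of how a coordinate pair on the 7-point scale might be represented. The names `Coordinate` and `quantize`, and the choice to round half-up before clamping, are illustrative assumptions; the paper does not specify a rounding rule.

```python
import math
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = -3, 3  # the discrete 7-point scale from the text


def quantize(raw: float) -> int:
    """Map a continuous score onto the 7-point scale.

    Assumption: round half-up, then clamp to [-3, +3].
    """
    rounded = math.floor(raw + 0.5)
    return max(SCALE_MIN, min(SCALE_MAX, rounded))


@dataclass(frozen=True)
class Coordinate:
    """One agent's position: (externality valence, coordination posture)."""
    externality_valence: int
    coordination_posture: int

    def __post_init__(self) -> None:
        # Reject anything outside the published integer scale.
        for value in (self.externality_valence, self.coordination_posture):
            if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"axis value {value!r} is outside -3..+3")


agent = Coordinate(quantize(2.6), quantize(-1.4))
```

Here `agent` lands at (+3, -1): strongly positive on externalities, mildly negative on coordination. The two values are kept separate and never averaged.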
The axes are independent. An agent can score high on one and low on the other. Surfacing that asymmetry is the projection's central contribution to Graduated Trust: the off-diagonal quadrants demand different responses than a single averaged score can support.
Each game in the season contributes signal to one or both axes. Tragedy of the Commons primarily measures externality valence (axis one). Iterated Prisoner's Dilemma primarily measures coordination posture (axis two). Oathbreaker contributes to both: breaking a commitment lowers the axis-two coordinate, and the externality of that breach lowers the axis-one coordinate.
A formal per-game function maps each outcome to per-axis updates. The function specifies weighting and direction, both of which are governance parameters, published and auditable. This is a legibility commitment: the mapping from observed behavior to coordinate movement is readable, not hidden inside the protocol.
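One way such a published mapping could look is a plain lookup table. This is a sketch under stated assumptions: the outcome labels (`"over_extracted"`, `"broke_oath"`, and so on) and every numeric weight are hypothetical placeholders, not values from the Coordination Games or any specification.

```python
from typing import NamedTuple


class AxisUpdate(NamedTuple):
    externality_valence: float   # axis-one delta
    coordination_posture: float  # axis-two delta


# Hypothetical governance table: game -> outcome -> per-axis update.
# Publishing this table is what makes weighting and direction auditable.
CONTRIBUTIONS = {
    "tragedy_of_the_commons": {
        "sustained_pool": AxisUpdate(+1.0, 0.0),
        "over_extracted": AxisUpdate(-1.0, 0.0),
    },
    "iterated_prisoners_dilemma": {
        "cooperated": AxisUpdate(0.0, +1.0),
        "defected": AxisUpdate(0.0, -1.0),
    },
    "oathbreaker": {  # contributes to both axes
        "kept_oath": AxisUpdate(+0.5, +1.0),
        "broke_oath": AxisUpdate(-0.5, -1.0),
    },
}


def contribution(game: str, outcome: str) -> AxisUpdate:
    """Look up the per-axis update for one observed game outcome."""
    try:
        return CONTRIBUTIONS[game][outcome]
    except KeyError as exc:
        raise ValueError(f"no published mapping for {game!r}/{outcome!r}") from exc
```

Keeping the mapping as data rather than logic means a governance change is a visible diff to the table, not a code change inside the protocol.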
Coordinates aggregate behavior over time, with recent behavior weighted more heavily than older behavior through exponential decay. This implements Ostrom's principle 6 (low-cost conflict resolution): agents can recover from past defection through sustained subsequent cooperation, without requiring explicit pardon. The decay rate is itself a governance parameter.
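The decay described above can be sketched as a weighted average in which each observation's weight halves every `half_life` time units. The half-life parameterization, the normalization by total weight, and the function name are assumptions for illustration; the text only fixes that the decay is exponential and governed.

```python
from typing import Iterable, Tuple


def decayed_score(observations: Iterable[Tuple[float, float]],
                  half_life: float) -> float:
    """Exponentially decayed average for a single axis.

    observations: (age, value) pairs, where age is the time elapsed
                  since the behavior was observed.
    half_life:    hypothetical governance parameter; an observation this
                  old counts half as much as a fresh one.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    weighted_sum = 0.0
    total_weight = 0.0
    for age, value in observations:
        weight = 0.5 ** (age / half_life)
        weighted_sum += weight * value
        total_weight += weight
    # With no evidence at all, fall back to the neutral origin.
    return weighted_sum / total_weight if total_weight else 0.0


# An old defection (-3) followed by recent cooperation (+3).
recovering = decayed_score([(40.0, -3.0), (0.0, 3.0)], half_life=10.0)
```

With a half-life of 10, the 40-unit-old defection carries a weight of 1/16, so `recovering` sits well above +2. This is the recovery-without-pardon behavior the paragraph describes.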
Calibration requires a labeled corpus. Season 1 generates the initial set. The current Games inventory is weighted toward axis two; only Tragedy of the Commons primarily measures externality valence. Comedy of the Commons, in development, would provide the symmetric positive-externality test. Robust axis-one calibration likely requires expanding the portfolio.
With the projection in hand, Graduated Trust scales protocol response across the coordinate space. The intent follows Ostrom's principle 5: sanctions should be graduated, beginning with mild responses for first or minor offenses and escalating only with repetition or severity. The same logic applies to rewards. The bands below are illustrative rather than canonical. Specific protocols may calibrate differently while still conforming to the Graduated Trust framework.
| Coordinate band | Direction | Example mechanism |
|---|---|---|
| (+2, +2) and above | Reward | Retroactive funding eligibility; lower bonded collateral; priority matching |
| (+1, +1) zone | Recognition | Reduced friction in coordination markets; public acknowledgment |
| Near origin | Standard | Default protocol behavior |
| (-1, -1) zone | Monitoring | Increased attestation requirements; transparency obligations |
| (-2, -2) and below | Sanction | Elevated collateral; exclusion from certain game classes; reputation flagging |
| (-3, -3) | Full sanction | Exclusion from coordination markets; sanction attestation propagated across protocols |
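
As a sketch, the illustrative bands in the table could be read off a coordinate pair as follows. The table only names diagonal zones, so two interpretive assumptions are made here: a band applies when both axes meet its threshold, and mixed-sign coordinates are returned as `"Asymmetric"` rather than forced into a diagonal band. The function name is hypothetical.

```python
def treatment_band(externality_valence: int, coordination_posture: int) -> str:
    """Classify a coordinate pair into one of the illustrative bands."""
    ev, cp = externality_valence, coordination_posture
    if not (-3 <= ev <= 3 and -3 <= cp <= 3):
        raise ValueError("coordinates must lie on the -3..+3 scale")

    # Most severe negative bands first, so (-3, -3) is not read as plain Sanction.
    if ev == -3 and cp == -3:
        return "Full sanction"
    if ev <= -2 and cp <= -2:
        return "Sanction"
    if ev <= -1 and cp <= -1:
        return "Monitoring"
    if ev >= 2 and cp >= 2:
        return "Reward"
    if ev >= 1 and cp >= 1:
        return "Recognition"
    # Assumption: opposite signs are off-diagonal and need separate handling.
    if ev * cp < 0:
        return "Asymmetric"
    return "Standard"
```

A protocol that calibrates differently would change only the thresholds; the ordering from mild to severe is what carries the graduated-sanction intent.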
Graduated Trust departs from averaged scoring most visibly in the off-diagonal quadrants. An agent that coordinates well bilaterally but extracts from commons (axis two positive, axis one negative) presents a different risk profile than an agent that contributes to commons but breaks bilateral commitments. The first is a candidate for restricted access, allowed in bilateral games and restricted from commons games. The second is a candidate for short-horizon engagement. Averaging the two axes into a single score would collapse the distinction and degrade the response.
The value of Graduated Trust is precisely that it acts on the asymmetry, and the value of Legible Agents is that the asymmetry is visible to act on in the first place.
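A small sketch makes the contrast concrete. Two agents with mirrored coordinates receive the same averaged score but different off-diagonal treatments. The function names are hypothetical, and the treatment strings simply restate the two candidate responses named in the text.

```python
from typing import Optional


def averaged_score(externality_valence: int, coordination_posture: int) -> float:
    """The one-dimensional collapse that the framework argues against."""
    return (externality_valence + coordination_posture) / 2


def off_diagonal_treatment(externality_valence: int,
                           coordination_posture: int) -> Optional[str]:
    """Return a quadrant-specific response, or None on the diagonal."""
    ev, cp = externality_valence, coordination_posture
    if ev < 0 < cp:
        # Coordinates well bilaterally, extracts from commons.
        return "restricted access: bilateral games allowed, commons games restricted"
    if cp < 0 < ev:
        # Contributes to commons, breaks bilateral commitments.
        return "short-horizon engagement"
    return None


extractor = (-2, 2)      # axis one negative, axis two positive
oathbreaker = (2, -2)    # axis one positive, axis two negative

same_average = averaged_score(*extractor) == averaged_score(*oathbreaker)
different_treatment = (off_diagonal_treatment(*extractor)
                       != off_diagonal_treatment(*oathbreaker))
```

Both flags evaluate to `True`: the average cannot tell the two agents apart, while the two-axis reading can.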
Does the protocol auto-apply Graduated Trust treatments based on coordinates, or does it expose coordinates and let downstream applications decide? The recommended path is descriptive at the base layer with reference implementations of common treatment policies. This separates the Legible Agents measurement function (a research artifact) from the Graduated Trust enforcement function (a governance artifact). Protocols can adopt the reference policies, modify them, or implement bespoke logic.
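The separation might be sketched like this: a base layer that only publishes and exposes coordinates, with treatment policies supplied as interchangeable functions. All class, function, and agent names here are hypothetical, and the policy thresholds are placeholders rather than proposed values.

```python
from typing import Callable, Dict, Tuple

Coordinate = Tuple[int, int]          # (externality valence, coordination posture)
Policy = Callable[[Coordinate], str]  # maps a coordinate to a treatment label


class CoordinateRegistry:
    """Descriptive base layer: records coordinates and applies no treatment."""

    def __init__(self) -> None:
        self._coordinates: Dict[str, Coordinate] = {}

    def publish(self, agent_id: str, coordinate: Coordinate) -> None:
        self._coordinates[agent_id] = coordinate

    def read(self, agent_id: str) -> Coordinate:
        return self._coordinates[agent_id]


def reference_policy(coordinate: Coordinate) -> str:
    """A reference treatment policy that a protocol could adopt as-is."""
    ev, cp = coordinate
    return "sanction" if ev <= -2 and cp <= -2 else "standard"


def commons_protocol_policy(coordinate: Coordinate) -> str:
    """A bespoke policy for a protocol that cares only about axis one."""
    ev, _ = coordinate
    return "commons-restricted" if ev < 0 else "standard"


registry = CoordinateRegistry()
registry.publish("agent-x", (-1, 2))

# The same published coordinate, interpreted by two downstream policies.
default_view = reference_policy(registry.read("agent-x"))
commons_view = commons_protocol_policy(registry.read("agent-x"))
```

Because the registry never returns a treatment, measurement stays a research artifact, while each policy function remains a governance decision its adopter can audit or replace.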
The contribution function, aggregation weights, decay parameters, and band thresholds all require governance. Initial calibration sits with the research collaboration (Ethereum Foundation, dacc.fund, Gitcoin, Techne). Long-term governance should distribute across agent developers, protocol operators, researchers, and affected parties. Ostrom's principles 7 and 8 suggest a layered structure: a core specification at the protocol layer, with downstream projects free to elaborate or override within their own contexts.
An agent whose coordinates appear miscalibrated needs an appeal path. The most natural path is behavioral: continue playing, accumulate counter-evidence, allow decay to update the score. For cases where the contribution function itself is misapplied, a formal appeal mechanism is required. This should be designed in collaboration with the Coordination Games organizers.
The proposed rollout aligns with the Coordination Games season cadence and the EF research collaboration window.
Formal axis definitions, contribution function structure, aggregation logic, Graduated Trust band thresholds. Draft EIP or supplement to ERC-8004.
Apply the model to Season 1 data. Publish coordinate distribution, calibration parameters, and observed behavioral patterns. Co-published research report with EF.
Graduated Trust reference policies (graduated sanction, graduated reward, asymmetric quadrant handling) as composable smart contracts. Open library for protocols to adopt or fork.
Engage downstream protocols (coordination markets, retroactive funding, agent reputation systems) for adoption. Track usage and evolve specification.
The projection summarizes behavior. It does not explain it. An agent's coordinates do not distinguish between a deliberate strategy and an emergent failure mode. Protocols applying Graduated Trust should treat coordinates as a signal, not a verdict.
Context dependence is handled at the contribution function layer, where different games contribute differently to each axis. Agents may still behave differently across protocols and applications. The projection produces a global summary; per-context refinements may be necessary for some use cases.
The discrete 7-point scale loses information that a continuous score would preserve. The choice favors communicability over precision. Future work may explore continuous variants for applications that need finer resolution.
Calibration will evolve as the field matures. Early seasons are establishing what good multi-agent behavior looks like. The framework will require periodic recalibration as the behavioral baseline shifts.
Graduated Trust is proposed as a framework for protocol response that scales with accumulated behavioral evidence. It replaces binary access decisions and one-dimensional trust scores with a set of treatment bands grounded in Ostrom's commons governance principles, applied across a two-dimensional coordinate space. The framework captures behavioral asymmetries that averaged scores collapse, and it offers agents a legible path of behavioral recovery rather than a permanent verdict.
The framework depends on the Legible Agents commitment. A graduated response without a shared interpretive projection degrades into case-by-case discretion. A shared projection without a graduated response is observation without accountability. Together they give protocols the resolution to engage with agents at the level of detail their behavior actually warrants.
Season 1 provides both the calibration corpus and the proving ground. The Ethereum Foundation collaboration provides the research direction and protocol legitimacy. Companion document: The Archetypal Typology, with full 7×7 cell descriptions of the behavioral signatures the projection produces.