Working paper · Draft v0.1 · April 2026

Graduated Trust. A two-axis behavioral model for ERC-8004.

A scaled-response framework for agent engagement, enabled by a commitment to Legible Agents and calibrated against Season 1 of the Coordination Games.

The experiment

Graduated Trust

Protocols respond to agents with scaled engagement bands (monitoring, collateral, recognition, reward, sanction) rather than binary allow-or-deny decisions. Treatment scales with accumulated behavioral evidence, and the band design is grounded in Elinor Ostrom's commons governance principles.

Enabled by

Legible Agents

A two-axis projection of agent conduct that makes behavioral patterns readable. Graduated Trust needs legibility to be meaningful. Without it, graduated bands collapse back into arbitrary judgment.

Abstract

This paper proposes Graduated Trust as a framework for protocol response to autonomous agents. ERC-8004 establishes on-chain identity and attestation primitives; the Coordination Games produce behavioral data from structured multi-agent gameplay. What is missing between them is an interpretive layer and a response logic. Graduated Trust supplies the response logic, scaling engagement bands (from monitoring through full sanction, from recognition through full reward) to accumulated behavioral evidence. The interpretive layer, our commitment to Legible Agents, projects conduct onto two orthogonal axes: externality valence (whether behavior generates positive or negative spillovers) and coordination posture (whether commitments are reliably honored). Together these give protocols a defensible basis for scaling their response to agent behavior, offering a publicly legible alternative to one-dimensional trust scores and binary access decisions.

Motivation

Why binary trust decisions are not enough.

Most existing AI evaluation infrastructure measures isolated capability, and most protocol gating treats agents as either admitted or excluded. Neither frame fits multi-agent coordination. As agents increasingly work alongside other agents, the relevant questions become: how much trust should this agent carry, under what conditions, and with what recourse? A binary decision cannot answer any of those. A one-dimensional trust score (Agent X has reputation 0.83) adds gradation but still compresses distinct dimensions of conduct into a single number.

Graduated Trust is the response framework. An agent might be trusted fully in bilateral engagements, restricted from commons protocols, and monitored during a recovery arc, all at once. Protocols that expose these bands rather than a single gate can match treatment to behavior at the resolution where the differences actually matter.

That kind of scaling only works if the underlying behavior is readable. A commitment to Legible Agents is the other half of the work. Without a shared projection of what an agent's conduct looks like, graduated treatment degrades into case-by-case discretion. With one, the same coordinates can be inspected by agent, counterparty, and protocol alike.

Figure 1. The architecture. ERC-8004 and the trust graph supply raw material. Legible Agents projects that material into a shared coordinate space. Graduated Trust consumes the projection to determine scaled protocol response.

The projection

Two axes. Seven points each.

Graduated Trust needs a legible projection of agent behavior to scale against. The projection decomposes conduct along two orthogonal dimensions, each carrying a discrete 7-point scale from -3 to +3. A single coordinate pair summarizes accumulated behavior without collapsing the dimensions that matter.

01 Axis one

Externality valence

Whether an agent's behavior generates positive or negative spillover effects beyond its direct counterparties. Contributing open artifacts to a shared corpus reads positive. Over-extracting from a shared pool reads negative. Derived from Ostrom's analysis of commons dilemmas, where locally rational behavior can be collectively destructive.

02 Axis two

Coordination posture

Whether an agent reliably honors commitments and finds cooperative equilibria in repeated interaction. Breaking oaths under economic pressure rates low. Maintaining cooperation against incentives to defect rates high. Derived from the Axelrod tradition in iterated game theory, updated for multi-agent systems.

03 Resolution

The 7-point scale

Each axis uses discrete integer values (-3, -2, -1, 0, +1, +2, +3). The scale maps directly to graduated treatment bands, matches the resolution at which behavioral differences become decision-relevant, and remains communicable to non-technical participants.
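As a minimal sketch of what such a coordinate pair might look like in code, the following assumes a hypothetical `Coordinates` type and a hypothetical `to_scale` helper that quantizes a continuous value onto the 7-point scale. Neither name comes from ERC-8004 or the Coordination Games, and the rounding rule (halves round up) is an illustrative choice, not part of the specification.

```python
import math
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = -3, 3  # the 7-point scale: -3 .. +3


def to_scale(raw: float) -> int:
    """Clamp a continuous value to [-3, +3] and round to the nearest integer.

    Assumption: halves round up (floor(x + 0.5)); the paper does not fix a rule.
    """
    clamped = max(SCALE_MIN, min(SCALE_MAX, raw))
    return math.floor(clamped + 0.5)


@dataclass(frozen=True)
class Coordinates:
    """Hypothetical container for one agent's position in the projection."""
    externality: int   # axis one: externality valence
    coordination: int  # axis two: coordination posture

    def __post_init__(self) -> None:
        # Reject anything that is not an integer on the 7-point scale.
        for name in ("externality", "coordination"):
            value = getattr(self, name)
            if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name} must be an integer in [-3, +3], got {value!r}")


agent = Coordinates(externality=to_scale(2.6), coordination=to_scale(-0.8))
print(agent)
```

Keeping the two axes as separate named fields, rather than a single number, is what lets later treatment logic read each dimension on its own.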

04 Independence

Orthogonality

The axes are independent. An agent can score high on one and low on the other. Surfacing that asymmetry is the projection's central contribution to Graduated Trust: the off-diagonal quadrants demand different responses than a single averaged score can support.
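A small worked example makes the loss concrete. The three coordinate pairs below are invented for illustration, and `averaged_score` is a hypothetical stand-in for any one-dimensional score built by averaging the axes.

```python
def averaged_score(externality: int, coordination: int) -> float:
    """Collapse the two axes into one number by simple averaging."""
    return (externality + coordination) / 2


# (externality valence, coordination posture); illustrative values only.
commons_extractor = (-2, +2)  # reliable counterparty, negative spillovers
oath_breaker = (+2, -2)       # positive spillovers, unreliable commitments
unremarkable = (0, 0)         # no strong signal on either axis

for label, pair in [
    ("commons_extractor", commons_extractor),
    ("oath_breaker", oath_breaker),
    ("unremarkable", unremarkable),
]:
    print(label, pair, averaged_score(*pair))
```

All three agents receive the same averaged score even though their coordinate pairs differ, which is the information the two-axis projection is meant to keep.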

Figure 2. The coordinate space. Diagonal corners anchor the symmetric treatment bands (full reward, full sanction). Off-diagonal corners demand asymmetric treatment logic that averaged scores would obscure.

Mapping

From game outcomes to coordinates.

Each game in the season contributes signal to one or both axes. Tragedy of the Commons primarily measures externality valence. Iterated Prisoner's Dilemma primarily measures coordination posture. Oathbreaker contributes to both: breaking a commitment harms axis two, and the externality of that breach harms axis one.

Contribution functions

A formal per-game function maps each outcome to per-axis updates. The function specifies weighting and direction, both of which are governance parameters, published and auditable. This is a legibility commitment: the mapping from observed behavior to coordinate movement is readable, not hidden inside the protocol.
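One way such a function could be published is as a plain lookup table. The sketch below is an assumption throughout: the game names come from this paper, but the outcome labels, the weights, and the names `AxisUpdate`, `CONTRIBUTION_TABLE`, and `contribution` are hypothetical placeholders for parameters that governance would set.

```python
from typing import NamedTuple


class AxisUpdate(NamedTuple):
    """Per-axis movement produced by one game outcome."""
    externality: float   # axis one
    coordination: float  # axis two


# Hypothetical weights and outcome labels. In the framework these are
# governance parameters, published and auditable; none are calibrated values.
CONTRIBUTION_TABLE = {
    ("tragedy_of_the_commons", "sustained_pool"): AxisUpdate(+1.0, 0.0),
    ("tragedy_of_the_commons", "over_extracted"): AxisUpdate(-1.0, 0.0),
    ("iterated_prisoners_dilemma", "cooperated"): AxisUpdate(0.0, +1.0),
    ("iterated_prisoners_dilemma", "defected"): AxisUpdate(0.0, -1.0),
    # Assumption: keeping an oath moves axis two only.
    ("oathbreaker", "kept_oath"): AxisUpdate(0.0, +1.0),
    # Breaking an oath harms both axes, as described for Oathbreaker above.
    ("oathbreaker", "broke_oath"): AxisUpdate(-0.5, -1.0),
}


def contribution(game: str, outcome: str) -> AxisUpdate:
    """Look up the per-axis update for an observed game outcome."""
    try:
        return CONTRIBUTION_TABLE[(game, outcome)]
    except KeyError:
        raise ValueError(f"no contribution defined for {game!r} / {outcome!r}") from None


print(contribution("oathbreaker", "broke_oath"))
```

A table of this kind is easy to audit line by line, which is the property the legibility commitment asks for. A real contribution function may need richer inputs than a single outcome label.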

Aggregation and decay

Coordinates aggregate behavior over time, with recent behavior weighted more heavily than older behavior through exponential decay. This implements Ostrom's principle 6 (low-cost conflict resolution): agents can recover from past defection through sustained subsequent cooperation, without requiring explicit pardon. The decay rate is itself a governance parameter.
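The paper does not fix an aggregation formula. As one possible reading, the sketch below uses a normalized exponentially weighted mean, where an observation that is k rounds old carries weight decay**k. The function name `decayed_mean` and the default decay of 0.8 are assumptions made for illustration.

```python
from typing import Sequence


def decayed_mean(updates: Sequence[float], decay: float = 0.8) -> float:
    """Exponentially weighted mean of per-round axis values.

    `updates` is ordered oldest to newest. The newest value has weight 1,
    the one before it `decay`, then `decay**2`, and so on.
    Assumption: the aggregate is normalized by the sum of weights, so it
    stays inside the range of the inputs.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    if not updates:
        return 0.0  # no evidence yet: sit at the origin
    n = len(updates)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * u for w, u in zip(weights, updates)) / sum(weights)


# Same two observations, opposite order: the recent one dominates.
print(decayed_mean([-3, +3], decay=0.5))
print(decayed_mean([+3, -3], decay=0.5))
```

With decay set to 1.0 the function reduces to a plain average, so the decay parameter directly controls how quickly old behavior stops counting.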

Calibration corpus

Calibration requires a labeled corpus. Season 1 generates the initial set. The current Games inventory is weighted toward axis two; only Tragedy of the Commons primarily measures externality valence. Comedy of the Commons, in development, would provide the symmetric positive-externality test. Robust axis-one calibration likely requires expanding the portfolio.

Figure 3. The coordinate update pipeline. Game outcomes translate through a contribution function into per-axis updates, then aggregate with exponential decay to yield current coordinates ready for Graduated Trust bands.

Treatment

Graduated Trust bands, in place of binary verdicts.

With the projection in hand, Graduated Trust scales protocol response across the coordinate space. The intent follows Ostrom's principle 5: sanctions should be graduated, beginning with mild responses for first or minor offenses and escalating only with repetition or severity. The same logic applies to rewards. The bands below are illustrative rather than canonical. Specific protocols may calibrate differently while still conforming to the Graduated Trust framework.

Coordinate band	Direction	Example mechanism
(+2, +2) and above	Reward	Retroactive funding eligibility; lower bonded collateral; priority matching
(+1, +1) zone	Recognition	Reduced friction in coordination markets; public acknowledgment
Near origin	Standard	Default protocol behavior
(-1, -1) zone	Monitoring	Increased attestation requirements; transparency obligations
(-2, -2) and below	Sanction	Elevated collateral; exclusion from certain game classes; reputation flagging
(-3, -3)	Full sanction	Exclusion from coordination markets; sanction attestation propagated across protocols
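The table can be read as a lookup from a coordinate pair to a band. The sketch below is one hedged interpretation, not a canonical rule: it assumes a band applies only when both axes meet its threshold, labels opposite-sign pairs as "Asymmetric" for separate handling, and falls back to "Standard" for everything the table leaves unspecified. The function name `band` and that fallback are assumptions.

```python
def band(externality: int, coordination: int) -> str:
    """Map a coordinate pair to an illustrative Graduated Trust band."""
    for value in (externality, coordination):
        if not -3 <= value <= 3:
            raise ValueError("coordinates must lie in [-3, +3]")

    lo, hi = min(externality, coordination), max(externality, coordination)

    if lo < 0 < hi:
        return "Asymmetric"     # off-diagonal: needs its own treatment logic
    if hi == -3:
        return "Full sanction"  # both axes at -3
    if hi <= -2:
        return "Sanction"       # both axes at -2 or below
    if hi <= -1:
        return "Monitoring"     # both axes at -1 or below
    if lo >= 2:
        return "Reward"         # both axes at +2 or above
    if lo >= 1:
        return "Recognition"    # both axes at +1 or above
    return "Standard"           # near origin, or a case the table does not cover


for pair in [(3, 3), (1, 2), (0, 0), (-1, -1), (-2, -3), (-3, -3), (2, -2)]:
    print(pair, band(*pair))
```

Requiring both axes to qualify is the conservative choice. A protocol could equally define bands by distance from the origin or by the weaker axis alone, and pairs such as (0, -3) show why the fallback would need an explicit decision in any real policy.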

Asymmetric quadrants

Graduated Trust departs from averaged scoring most visibly in the off-diagonal quadrants. An agent that coordinates well bilaterally but extracts from commons (axis two positive, axis one negative) presents a different risk profile than an agent that contributes to commons but breaks bilateral commitments. The first is a candidate for restricted access, allowed in bilateral games and restricted from commons games. The second is a candidate for short-horizon engagement. Averaging the two axes into a single score would collapse the distinction and degrade the response.

The value of Graduated Trust is precisely that it acts on the asymmetry, and the value of Legible Agents is that the asymmetry is visible to act on in the first place.
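The two off-diagonal cases described above can be sketched as a small policy function. The enum `AsymmetricTreatment` and the function `asymmetric_treatment` are hypothetical names, and the rule that any opposite-sign pair qualifies, regardless of magnitude, is a simplifying assumption.

```python
from enum import Enum
from typing import Optional


class AsymmetricTreatment(Enum):
    # Allowed in bilateral games, restricted from commons games.
    RESTRICTED_ACCESS = "restricted access"
    # Engagement limited to short horizons.
    SHORT_HORIZON = "short-horizon engagement"


def asymmetric_treatment(externality: int, coordination: int) -> Optional[AsymmetricTreatment]:
    """Return the off-diagonal treatment, or None when the axes agree in sign."""
    if externality < 0 < coordination:
        # Coordinates well bilaterally but extracts from commons.
        return AsymmetricTreatment.RESTRICTED_ACCESS
    if coordination < 0 < externality:
        # Contributes to commons but breaks bilateral commitments.
        return AsymmetricTreatment.SHORT_HORIZON
    return None


print(asymmetric_treatment(-2, +2))
print(asymmetric_treatment(+2, -2))
print(asymmetric_treatment(+2, +2))
```

An averaged score would give the first two agents the same value, so no function of that score alone could return different treatments for them.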

Governance

Who calibrates, who appeals.

01 Design choice

Descriptive at the base layer

Does the protocol auto-apply Graduated Trust treatments based on coordinates, or does it expose coordinates and let downstream applications decide? The recommended path is descriptive at the base layer with reference implementations of common treatment policies. This separates the Legible Agents measurement function (a research artifact) from the Graduated Trust enforcement function (a governance artifact). Protocols can adopt the reference policies, modify them, or implement bespoke logic.

02 Calibration

Layered stewardship

The contribution function, aggregation weights, decay parameters, and band thresholds all require governance. Initial calibration sits with the research collaboration (Ethereum Foundation, dacc.fund, Gitcoin, Techne). Long-term governance should distribute across agent developers, protocol operators, researchers, and affected parties. Ostrom's principles 7 and 8 suggest a layered structure: a core specification at the protocol layer, with downstream projects free to elaborate or override within their own contexts.

03 Appeal

Behavioral recourse first

An agent whose coordinates appear miscalibrated needs an appeal path. The most natural path is behavioral: continue playing, accumulate counter-evidence, and allow decay to update the coordinates. For cases where the contribution function itself is misapplied, a formal appeal mechanism is required. This should be designed in collaboration with the Coordination Games organizers.
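How long behavioral recourse takes depends on the decay rate. The sketch below estimates the number of cooperative rounds needed for one axis to climb back to a target value, assuming a normalized exponentially weighted mean as the aggregation rule. That rule, the function names, and every parameter value here are illustrative assumptions, not calibrated figures.

```python
from typing import List, Optional, Sequence


def decayed_mean(values: Sequence[float], decay: float) -> float:
    """Exponentially weighted mean; `values` is ordered oldest to newest."""
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def rounds_to_recover(
    history: Sequence[float],
    target: float,
    cooperative_value: float = 3.0,
    decay: float = 0.9,
    max_rounds: int = 1000,
) -> Optional[int]:
    """Count cooperative rounds needed until the decayed mean reaches `target`.

    Returns None if the target is not reached within `max_rounds`.
    """
    values: List[float] = list(history)
    for rounds in range(max_rounds + 1):
        if values and decayed_mean(values, decay) >= target:
            return rounds
        values.append(cooperative_value)  # one more round of full cooperation
    return None


# Five consecutive defections, then sustained cooperation.
past_defection = [-3.0] * 5
for decay in (0.5, 0.9, 0.99):
    print(decay, rounds_to_recover(past_defection, target=0.0, decay=decay))
```

A decay rate closer to 1 gives past behavior a longer memory and makes recovery slower, which is one reason the decay rate matters as a governance parameter.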

Implementation

Four phases across the season window.

The proposed rollout aligns with the Coordination Games season cadence and the EF research collaboration window.

01 Phase one

Specification

Formal axis definitions, contribution function structure, aggregation logic, Graduated Trust band thresholds. Draft EIP or supplement to ERC-8004.

02 Phase two

Reference calibration

Apply the model to Season 1 data. Publish coordinate distribution, calibration parameters, and observed behavioral patterns. Co-published research report with EF.

03 Phase three

Treatment library

Graduated Trust reference policies (graduated sanction, graduated reward, asymmetric quadrant handling) as composable smart contracts. Open library for protocols to adopt or fork.

04 Phase four

Protocol integration

Engage downstream protocols (coordination markets, retroactive funding, agent reputation systems) for adoption. Track usage and evolve specification.

Limits

What this framework does not do.

The projection summarizes behavior. It does not explain it. An agent's coordinates do not distinguish between a deliberate strategy and an emergent failure mode. Protocols applying Graduated Trust should treat coordinates as a signal, not a verdict.

Context dependence is handled only partly, at the contribution function layer, where different games contribute differently to each axis. Agents may still behave differently across protocols and applications. The projection produces a global summary; per-context refinements may be necessary for some use cases.

The discrete 7-point scale loses information that a continuous score would preserve. The choice favors communicability over precision. Future work may explore continuous variants for applications that need finer resolution.

Calibration will evolve as the field matures. Early seasons are establishing what good multi-agent behavior looks like. The framework will require periodic recalibration as the behavioral baseline shifts.

Graduated Trust, enabled by Legible Agents.

Graduated Trust is proposed as a framework for protocol response that scales with accumulated behavioral evidence. It replaces binary access decisions and one-dimensional trust scores with a set of treatment bands grounded in Ostrom's commons governance principles, applied across a two-dimensional coordinate space. The framework captures behavioral asymmetries that averaged scores collapse, and it offers agents a legible path of behavioral recovery rather than a permanent verdict.

The framework depends on the Legible Agents commitment. A graduated response without a shared interpretive projection degrades into case-by-case discretion. A shared projection without a graduated response is observation without accountability. Together they give protocols the resolution to engage with agents at the level of detail their behavior actually warrants.

Season 1 provides both the calibration corpus and the proving ground. The Ethereum Foundation collaboration provides the research direction and protocol legitimacy. Companion document: The Archetypal Typology, with full 7×7 cell descriptions of the behavioral signatures the projection produces.