
Benchmarks

Two independent evaluations support GCF's advantage over TOON and JSON: a comprehension accuracy test (do LLMs understand it?) and a token efficiency test (how many tokens does it cost?).

LLM Comprehension Accuracy

Setup: 500 symbols, 200 edges. Same payload encoded in GCF, TOON, and JSON. Six structured extraction questions sent to an LLM. Deterministic ground truth, no LLM judge.

| Format | Accuracy | Tokens | vs JSON |
|--------|----------|--------|---------|
| GCF | 100% (6/6) | 11,090 | 79% fewer |
| TOON | 100% (6/6) | 16,378 | 69% fewer |
| JSON | 66.7% (4/6) | 53,341 | baseline |
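
The "vs JSON" column follows from the token counts. A minimal sketch of that arithmetic, assuming savings are computed as one minus the ratio to the JSON baseline and rounded to the nearest percent. The helper name `savings_pct` is illustrative and not part of the eval.

```python
# Token counts copied from the comprehension table above.
TOKENS = {"GCF": 11_090, "TOON": 16_378, "JSON": 53_341}


def savings_pct(tokens: int, baseline: int) -> float:
    """Percentage of tokens saved relative to a baseline count."""
    return (1 - tokens / baseline) * 100


for fmt in ("GCF", "TOON"):
    pct = savings_pct(TOKENS[fmt], TOKENS["JSON"])
    print(f"{fmt}: {pct:.1f}% fewer tokens than JSON")
```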

What JSON got wrong

| Question | Expected | JSON answered |
|----------|----------|---------------|
| "How many symbols?" | 500 | 320 |
| "How many targets (distance 0)?" | 166 | 240 |

At 500 records, JSON's repeated field names appear to add enough noise that the model loses count. The errors look less like hallucination than like the model drowning in structural tokens that carry no new information.

What GCF got right

All six questions answered correctly:

  1. Symbol count: 500 ✓
  2. Edge count: 200 ✓
  3. Highest-scored symbol name ✓
  4. Kind of highest-scored symbol ✓
  5. Target count (distance 0): 166 ✓
  6. All unique edge types (alphabetical) ✓
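
Because every question has a single computable answer, scoring needs no LLM judge. The sketch below shows one way the six expected answers could be derived from the payload and compared by exact match. The `Symbol` and `Edge` field names and the sample data are hypothetical, chosen for illustration; the actual eval lives in gcf-go/eval and may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    kind: str
    score: float
    distance: int  # 0 marks a target (assumed field layout)


@dataclass
class Edge:
    source: str
    target: str
    type: str


def ground_truth(symbols, edges):
    """Compute the six expected answers directly from the payload."""
    top = max(symbols, key=lambda s: s.score)
    return {
        "symbol_count": len(symbols),
        "edge_count": len(edges),
        "top_symbol_name": top.name,
        "top_symbol_kind": top.kind,
        "target_count": sum(1 for s in symbols if s.distance == 0),
        "edge_types": sorted({e.type for e in edges}),
    }


def score(answers, expected):
    """Exact-match scoring: one point per question, no judge model."""
    correct = sum(1 for key, want in expected.items() if answers.get(key) == want)
    return correct, len(expected)


# Tiny hypothetical payload (the real eval uses 500 symbols and 200 edges).
symbols = [
    Symbol("ParseConfig", "function", 0.91, 0),
    Symbol("Config", "struct", 0.75, 1),
    Symbol("LoadFile", "function", 0.40, 0),
]
edges = [
    Edge("ParseConfig", "Config", "references"),
    Edge("LoadFile", "ParseConfig", "calls"),
]

expected = ground_truth(symbols, edges)
answers = dict(expected, symbol_count=2)  # simulate a model that miscounts
print(score(answers, expected))  # (5, 6)
```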

GCF achieves this with 32% fewer tokens than TOON (11,090 vs 16,378), which also scored 100%.

Reproduce

```bash
git clone https://github.com/blackwell-systems/gcf-go
cd gcf-go/eval && GOWORK=off go test -run TestComprehension -v -timeout 15m
```

Eval source: gcf-go/eval


Token Efficiency (TOON's Own Benchmark)

We inserted GCF into TOON's token efficiency benchmark. Their datasets, their tokenizer (gpt-tokenizer, o200k_base), their methodology. The only change: one additional formatter.
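
The "one additional formatter" approach can be pictured as a registry of formatters run over fixed datasets with a fixed tokenizer. The following is a rough sketch of that shape only, not TOON's benchmark code. The names `FORMATTERS`, `count_tokens`, and `run_benchmark` are made up, JSON and CSV stand in for the real formatters, and the regex tokenizer is a crude placeholder for o200k_base.

```python
import csv
import io
import json
import re


def to_json(rows):
    return json.dumps(rows, indent=2)


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Adding a format to the comparison means adding one entry here;
# datasets and tokenizer stay untouched.
FORMATTERS = {"json": to_json, "csv": to_csv}


def count_tokens(text):
    """Placeholder tokenizer: words and single punctuation marks.

    This is NOT o200k_base; swap in a real tokenizer for meaningful numbers.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def run_benchmark(rows, formatters=FORMATTERS, tokenizer=count_tokens):
    """Encode the same rows with every formatter and count tokens."""
    return {name: tokenizer(fmt(rows)) for name, fmt in formatters.items()}


# Synthetic rows for illustration, not one of the benchmark datasets.
rows = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(100)]
for name, tokens in sorted(run_benchmark(rows).items(), key=lambda kv: kv[1]):
    print(f"{name:5} {tokens}")
```

Keeping the tokenizer and datasets as fixed inputs is what makes the comparison fair: any difference in the counts comes from the encoding alone.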

Mixed-Structure Track

Datasets with nested or semi-uniform structures. This is where most real-world data lives.

Semi-uniform event logs (2000 records):
  TOON   ████████████████████████████████████████████████████  154,032
  GCF    ████████████████████████████████████░░░░░░░░░░░░░░░  107,269  ◀ 30% smaller

E-commerce orders (500 orders, nested items):
  TOON   ████████████████████████████████████████████████████   73,246
  GCF    ████████████████████████████████████████████░░░░░░░   61,592  ◀ 16% smaller

Mixed-structure total (includes the nested config dataset from the per-dataset breakdown):
  TOON   ████████████████████████████████████████████████████  227,896
  GCF    █████████████████████████████████████░░░░░░░░░░░░░░  169,554  ◀ 26% smaller

Flat-Only Track

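Pure tabular data, TOON's claimed sweet spot. GCF still comes out smaller than TOON, though by a narrow margin. CSV remains the most compact option for flat data (see the per-dataset breakdown).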

Employee records (2000 rows):
  TOON   ████████████████████████████████████████████████████   49,966
  GCF    ██████████████████████████████████████████████████░░   49,054  ◀ 2% smaller

Analytics time-series (365 days):
  TOON   ████████████████████████████████████████████████████    9,127
  GCF    ████████████████████████████████████████████████░░░░    8,397  ◀ 8% smaller

Flat-only total (includes the GitHub repos dataset from the per-dataset breakdown):
  TOON   ████████████████████████████████████████████████████   67,837
  GCF    ██████████████████████████████████████████████████░░   66,026  ◀ 3% smaller

Per-Dataset Breakdown

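| Dataset | Structure | GCF | TOON | CSV | JSON |
|---------|-----------|-----|------|-----|------|
| Event logs | Semi-uniform | 107,269 | 154,032 | n/a | 181,141 |
| E-commerce | Nested | 61,592 | 73,246 | n/a | 109,574 |
| Nested config | Deep | 693 | 618 | n/a | 905 |
| Employees | Flat | 49,054 | 49,966 | 47,137 | 127,050 |
| Analytics | Flat | 8,397 | 9,127 | 8,395 | 22,257 |
| GitHub repos | Flat | 8,575 | 8,744 | 8,512 | 15,144 |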

TOON's only win: deeply nested configuration. A 75-token difference on a 618-token payload.
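
The track totals can be re-derived from the per-dataset counts. The sketch below sums each track and reports the gap in both directions, since "X% smaller" and "X% larger" use different baselines and give different numbers for the same pair. Assigning nested config to the mixed track is an inference: it is the grouping under which the sums match the published totals. The helper names are illustrative.

```python
# Token counts copied from the per-dataset table above.
COUNTS = {
    "Event logs":    {"track": "mixed", "GCF": 107_269, "TOON": 154_032, "JSON": 181_141},
    "E-commerce":    {"track": "mixed", "GCF": 61_592,  "TOON": 73_246,  "JSON": 109_574},
    "Nested config": {"track": "mixed", "GCF": 693,     "TOON": 618,     "JSON": 905},
    "Employees":     {"track": "flat",  "GCF": 49_054,  "TOON": 49_966,  "JSON": 127_050},
    "Analytics":     {"track": "flat",  "GCF": 8_397,   "TOON": 9_127,   "JSON": 22_257},
    "GitHub repos":  {"track": "flat",  "GCF": 8_575,   "TOON": 8_744,   "JSON": 15_144},
}


def track_total(track, fmt):
    """Sum one format's token counts over every dataset in a track."""
    return sum(row[fmt] for row in COUNTS.values() if row["track"] == track)


def pct_smaller(a, b):
    """How much smaller a is than b, as a percentage of b."""
    return (1 - a / b) * 100


def pct_larger(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


for track in ("mixed", "flat"):
    gcf, toon = track_total(track, "GCF"), track_total(track, "TOON")
    print(f"{track}: GCF {gcf:,} vs TOON {toon:,} -> "
          f"GCF {pct_smaller(gcf, toon):.1f}% smaller, "
          f"TOON {pct_larger(toon, gcf):.1f}% larger")
```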

Why GCF wins on semi-uniform data

TOON's tabular format requires all rows to have identical fields. When data is semi-uniform (event logs where 50% have nested error objects), TOON falls back to its less efficient nested encoding for the entire array.
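
The constraint can be pictured as an all-or-nothing check over the array. This is a sketch of that kind of check, not TOON's implementation. The function name `is_tabular` is made up, and the requirement that values be primitives is an added assumption beyond the identical-fields rule stated above.

```python
def is_tabular(rows):
    """True when every row has the same fields and only primitive values."""
    if not rows:
        return False
    fields = rows[0].keys()
    primitives = (str, int, float, bool, type(None))
    return all(
        row.keys() == fields
        and all(isinstance(value, primitives) for value in row.values())
        for row in rows
    )


# Hypothetical event log: every other record carries a nested error object.
events = [
    {"ts": i, "level": "error", "error": {"code": 500}} if i % 2
    else {"ts": i, "level": "info"}
    for i in range(4)
]
uniform = [{"ts": i, "level": "info"} for i in range(4)]

print(is_tabular(uniform))  # True
print(is_tabular(events))   # False: one irregular row disqualifies the array
```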

GCF handles semi-uniformity natively: primitive fields encode positionally, nested fields attach inline only when present. No format-level "mode switch" is required.
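
To make the idea concrete, here is a toy encoder that writes field names once, emits primitive values by position, and appends nested fields only on the rows that have them. It illustrates the principle only: the delimiters and layout are invented for this example and are not GCF's actual syntax, and the function name `encode_rows` is hypothetical.

```python
import json


def encode_rows(rows, primitive_fields):
    """Toy encoding: names once, values by position, extras inline.

    Illustrative only. This is NOT GCF's wire format, and it does no
    escaping of commas or spaces inside values.
    """
    lines = [",".join(primitive_fields)]  # header: field names appear once
    for row in rows:
        line = ",".join(str(row[field]) for field in primitive_fields)
        # Attach remaining (nested) fields only on rows that have them.
        extras = {k: v for k, v in row.items() if k not in primitive_fields}
        if extras:
            line += " " + json.dumps(extras, separators=(",", ":"))
        lines.append(line)
    return "\n".join(lines)


events = [
    {"ts": 1, "level": "info", "msg": "started"},
    {"ts": 2, "level": "error", "msg": "failed", "error": {"code": 500}},
]
encoded = encode_rows(events, ["ts", "level", "msg"])
print(encoded)
# ts,level,msg
# 1,info,started
# 2,error,failed {"error":{"code":500}}
```

Rows without the nested field pay nothing for it, and rows with it do not force the other rows into a different layout.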

Reproduce

```bash
git clone https://github.com/blackwell-systems/toon.git
cd toon && git checkout gcf-comparison
cd benchmarks && pnpm install && pnpm benchmark:tokens
```

Fork: blackwell-systems/toon@gcf-comparison


Summary

| Metric | GCF | TOON | JSON |
|--------|-----|------|------|
| Comprehension accuracy (500 symbols) | 100% | 100% | 66.7% |
| Input tokens (500 symbols) | 11,090 | 16,378 | 53,341 |
| Output tokens (100 symbols) | 5,619 | 11,650 | 22,180 |
| Generation validity | 5/5 | 5/5 | N/A |
| vs JSON input savings | 79% | 69% | baseline |
| vs TOON input savings | 32% | baseline | n/a |
| vs JSON output savings | 75% | 47% | baseline |
| vs TOON output savings | 52% | baseline | n/a |
| Mixed-structure efficiency | best | 34% larger | 72% larger |
| Flat-data efficiency | best | 3% larger | 149% larger |
| Session dedup (5th call) | 92.7% | unavailable | unavailable |
| Delta encoding | 81.2% | unavailable | unavailable |

GCF wins on both input and output efficiency, and it offers session dedup and delta encoding, which neither TOON nor JSON provides. See the generation eval for output token methodology.