Skip to content

Lossless Verification

GCF guarantees decode(encode(value)) == value for every structured value. This page documents how that guarantee is verified.

The numbers

FormatRound-tripsFailures
JSON11,250,000,000+0
YAML21,000,000,0000
MessagePack585,000,0000
CSV335,000,0000
TOML100,000,0000
Total33,270,000,000+0

33 billion+ random structured values generated, serialized through each format, parsed back, encoded as GCF, decoded from GCF, and compared to the original. Zero mismatches.

What this proves

GCF's grammar expresses any structured data regardless of source format. The encoding operates on structured values (objects, arrays, strings, numbers, booleans, null), not on JSON syntax. Whether your data originates as JSON, YAML, TOML, CSV, MessagePack, or any other format that deserializes to these primitives, GCF encodes it losslessly.

The 5 formats tested are proof points. The capability is universal: any format that produces objects and arrays round-trips through GCF.

Per-language verification

Six language implementations, all passing the same 157 conformance fixtures:

LanguagePackageConformanceRound-trip fuzz
Rustgcf156/15633B+ multi-format (definitive suite)
Gogcf-go156/1561B+ (native Go fuzzing)
TypeScript@blackwell-systems/gcf156/156Conformance-based
Pythongcf-python156/156Conformance-based
Swiftgcf-swift156/156Conformance-based
Kotlingcf-kotlin156/156Conformance-based

Cross-language matrix

Every implementation can encode data that every other implementation decodes correctly. The 6x6 encode/decode matrix is fully verified: encode in Go, decode in Python. Encode in Rust, decode in Swift. Every combination passes.

How the fuzz testing works

Each test generates random structured values using a seeded PRNG, ensuring reproducibility. The round-trip is:

1. Generate random value (objects, arrays, scalars, nested to depth 3-4)
2. Serialize to format X (JSON, YAML, TOML, CSV, or MessagePack)
3. Parse back from format X (normalize types)
4. Encode as GCF via encode_generic()
5. Decode from GCF via decode_generic()
6. Deep-compare decoded value to step 3 output
7. Any mismatch = test failure (zero tolerance)

The value generator produces:

  • Null, booleans, integers, floats, strings
  • Objects with 0-6 keys (alphanumeric keys with underscores, hyphens, dots)
  • Arrays with 0-8 elements
  • Nesting to depth 3-4
  • Strings with spaces, punctuation, unicode-safe characters

Each seed produces a unique value. Seeds are sequential (0, 1, 2, ..., N). The tests are fully deterministic and reproducible.

Reproduce it

bash
git clone https://github.com/blackwell-systems/gcf-rust
cd gcf-rust

# JSON 10B (parallel, ~56 min on 14-core machine)
cargo test --release json_10b -- --nocapture

# YAML 10B (parallel, ~3 hours)
cargo test --release yaml_10b -- --nocapture

# MessagePack + YAML + CSV (334M each)
cargo test --release multiformat -- --nocapture

# TOML 100M
cargo test --release toml_100m -- --nocapture

All tests use Rayon for parallel execution across all available cores. Progress logs to stderr with rate and ETA. The Rust implementation is the definitive fuzz suite: 2.94M round-trips/second on JSON, 307K/s on YAML, at release optimization.

Conformance fixtures

157 fixtures in tests/conformance/ covering:

CategoryCountWhat it tests
arraysVariousEmpty, primitive, mixed-type arrays
attachmentsVariousInline schemas, shared schemas, nested objects
containersVariousObject/array container selection
decodeVariousDecoder edge cases
errors-v2VariousInvalid input rejection
graph-decodeVariousGraph profile decoding
graph-deltaVariousDelta encoding/decoding
graph-encodeVariousGraph profile encoding
graph-pack-rootVariousContent-addressed identity
graph-sessionVariousSession deduplication
inline-schemaVariousInline attachment schemas
keysVariousBare keys, quoted keys, special characters
numbersVariousInteger, float, negative, scientific notation
rootsVariousRoot-level values
scalarVariousAll scalar types and edge cases
streaming-v2VariousDeferred counts, summary trailers
whitespaceVariousIndentation, trailing whitespace

Every implementation must pass every fixture. New fixtures are added when edge cases are discovered (e.g., special-character field names added during multi-format fuzz testing).

Log files

All fuzz run logs are archived in eval/results/multiformat/. Every claimed number has a corresponding log file.

JSON (11.25B total):

  • rust-json10B-parallel-2026-06-14.log (10B, 2.94M/s, parallel)
  • rust-json1B-extra-2026-06-14.log (1B, 2.52M/s, parallel)
  • rust-rerun-json250M-msgpack250M-2026-06-14.log (250M JSON + 250M MessagePack)

YAML (21B total):

  • rust-yaml10B-parallel-2026-06-14.log (10B, 258K/s, parallel, 10.7 hours)
  • rust-yaml10B-parallel-2026-06-15.log (10B, 258K/s, parallel, 10.7 hours)
  • rust-yaml665M-2026-06-14.log (665M, seed offset 334M-999M)
  • rust-msgpack334M-yaml334M-csv334M-2026-06-13.log (334M YAML portion)

MessagePack (584M total):

  • rust-msgpack334M-yaml334M-csv334M-2026-06-13.log (334M MessagePack portion)
  • rust-rerun-json250M-msgpack250M-2026-06-14.log (250M MessagePack portion)

CSV (335M total):

  • rust-msgpack334M-yaml334M-csv334M-2026-06-13.log (334M CSV portion)
  • rust-rerun-csv1M-2026-06-14.log (1M, seed offset 334M)

TOML (100M total):

  • rust-toml100M-2026-06-14.log (100M)

Edge cases verified

The fuzz testing has specifically verified:

  • Null values in all positions (root, object field, array element, tabular cell)
  • Empty strings vs null vs absent (distinct encodings: "", -, ~)
  • Boolean strings ("true", "false" as string values, not parsed as booleans)
  • Numeric strings ("42" as string, not parsed as number)
  • Special characters in keys (%, !, @, #, quoted in field declarations)
  • Deeply nested objects (depth 3-4, objects within arrays within objects)
  • Empty containers (empty arrays, empty objects)
  • Float precision (round-trip preserves values within JSON's double-precision range)
  • Large integers (up to 2^53 - 1, JavaScript's MAX_SAFE_INTEGER)
  • Unicode (multi-byte characters in keys and values)

100% comprehension. 71% fewer tokens. 1,700+ LLM evaluations.