Prompts as Lossy Compression

The encoding

Two ways to encode the same project

The same project, written two ways: a human-readable prompt and a compressed binary archive (xz of the source). Each project below shows both, followed by every encoding drawn to scale by byte count.

Roman numerals. The rules are universal convention: the model already knows them, so almost nothing has to be said.

Prompt · T1 · one line 581 B · 96% passed

Human-readable — you can read and audit every rule.

Prompt · T2 · paragraph 824 B · 97% passed


Prompt · T3 · full spec 1,076 B · 99% passed


Prompt · T4 · exhaustive 1,545 B · 100% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `roman_numerals` (importable as `import roman_numerals`) exposing exactly these functions at the package top level:

- `to_roman(n: int) -> str`
- `from_roman(s: str) -> int`
- `is_valid_roman(s: str) -> bool`

Use only the Python standard library. Functions are called positionally (e.g. `to_roman(1984)`, `from_roman("MCMLXXXIV")`, `is_valid_roman("IV")`).

# Roman numerals

Implement standard Roman numerals over the range **1 to 3999 inclusive**.

**Symbols:** I=1, V=5, X=10, L=50, C=100, D=500, M=1000.

**`to_roman(n)`** — greedy conversion using this value/symbol table, largest first: (1000,M), (900,CM), (500,D), (400,CD), (100,C), (90,XC), (50,L), (40,XL), (10,X), (9,IX), (5,V), (4,IV), (1,I). Raises `ValueError` if `n` is not an int or is outside 1..3999. (Treat `bool` as not a valid int.)

**`is_valid_roman(s)`** — returns True iff `s` is a *canonical* numeral for some value in 1..3999. Canonical means it matches: `^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$`. The empty string returns False; any non-string or lowercase input returns False. So "IIII", "VV", "IC", "MMMM" are invalid; "IV", "MCMXCIV", "MMMCMXCIX" are valid.

**`from_roman(s)`** — first checks `is_valid_roman(s)`; if invalid, raises `ValueError`. Otherwise parses by scanning symbols: add each symbol's value, but subtract when a smaller symbol precedes a larger one.

Invariant: `from_roman(to_roman(n)) == n` for all n in 1..3999.
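For orientation, here is a minimal Python sketch of the kind of reconstruction the T4 prompt above asks for. It follows the rules as the prompt states them and is not the reference source that was compressed; the private names `_TABLE`, `_VALUES`, and `_CANONICAL` are assumptions made for illustration.

```python
import re

# Value/symbol table from the prompt, largest first.
_TABLE = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Canonical-form pattern quoted in the prompt.
_CANONICAL = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")


def to_roman(n):
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 3999:
        raise ValueError(f"expected an int in 1..3999, got {n!r}")
    parts = []
    for value, symbol in _TABLE:
        count, n = divmod(n, value)  # greedy: take as many of each symbol as fit
        parts.append(symbol * count)
    return "".join(parts)


def is_valid_roman(s):
    # The pattern alone would accept "", so the empty string is excluded here.
    # fullmatch is used so a trailing newline cannot slip past the "$" anchor.
    return isinstance(s, str) and s != "" and _CANONICAL.fullmatch(s) is not None


def from_roman(s):
    if not is_valid_roman(s):
        raise ValueError(f"not a canonical Roman numeral: {s!r}")
    total = 0
    for i, ch in enumerate(s):
        value = _VALUES[ch]
        # Subtract when a smaller symbol precedes a larger one.
        if i + 1 < len(s) and value < _VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total


print(to_roman(1984), from_roman("MCMXCIV"), is_valid_roman("IIII"))
```

The invariant from the prompt can be checked directly by looping `from_roman(to_roman(n)) == n` over 1..3999.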


Archive · xz of the source 1,160 B

fd 37 7a 58 5a 00 00 04 e6 d6 b4 46 02 00 21 01 16 00 00 00 74 2f e5 a3 e0 05 cb 02 9b 5d 00 33 1c 8a 22 6f a9 32 25 33 7f 62 20 9d d8 19 b0 c2 b7 13 c7 56 ce 0f 8a 93 a9 dd 35 59 bd 6b 02 ba 4b 92 e7 b3 52 1a 77 47 3c 9a 3c ff 31 6c ca a8 6d ee 1a 6b 19 c9 d9 d6 64 74 50 55 88 22 53 2e 04 6f f1 2c be a2 f1 5e 44 eb 81 50 64 60 9d a7 64 94 24 d5 93 42 f8 07 3c 52 79 11 96 e4 04 73 33 3b a7 8b 60 df 5c d9 51 08 2b 52 f8 fd d6 9b ac ef 23 9c fd 07 56 25 d8 51 c1 54 d5 47 a6 b6

Denser, but opaque binary — only a decompressor can read it.
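As a rough illustration of how such an archive can be produced and measured, the sketch below uses Python's standard `lzma` module. The `source` bytes are a stand-in (the real project sources are not reproduced here), and the compression preset is an assumption, since the settings behind the figures above are not stated.

```python
import lzma

# Stand-in for a project's source file; not the article's actual source.
source = ("def to_roman(n):\n    return 'I' * n\n" * 20).encode("utf-8")

# Assumption: maximum preset. The settings used for the figures are not stated.
archive = lzma.compress(source, format=lzma.FORMAT_XZ, preset=9 | lzma.PRESET_EXTREME)

print(len(source), "B source ->", len(archive), "B xz")
# Every .xz stream starts with the same six magic bytes seen in the hex preview.
print(archive[:6].hex(" "))  # fd 37 7a 58 5a 00

# Lossless: decompression returns the source byte for byte.
assert lzma.decompress(archive) == source
```

The round trip at the end is what "100% · lossless" means for the archive rows: nothing depends on a model filling in missing rules.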

581 B (0.50× the 1,160-B xz archive, smaller than it) but only 96% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

824 B (0.71× the 1,160-B xz archive, smaller than it) but only 97% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

1,076 B (0.93× the 1,160-B xz archive, smaller than it) but only 99% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

100% faithful — but 1,545 B is 1.33× the 1,160-B xz archive. When the content is arbitrary, prose can’t out-pack the byte compressor.
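The size comparisons above are plain ratios of prompt bytes to archive bytes. A small sketch that reproduces them from the figures quoted for this project (the function name `describe` is hypothetical):

```python
def describe(prompt_bytes: int, archive_bytes: int, passed_pct: int) -> str:
    """Summarize one prompt tier relative to the xz archive."""
    ratio = prompt_bytes / archive_bytes
    size = "smaller" if ratio < 1 else "larger"
    return (f"{prompt_bytes:,} B is {ratio:.2f}x the {archive_bytes:,}-B archive "
            f"({size}), {passed_pct}% passed")


ARCHIVE_BYTES = 1160  # xz of the source, from the archive card above
TIERS = [("T1", 581, 96), ("T2", 824, 97), ("T3", 1076, 99), ("T4", 1545, 100)]

for name, size, passed in TIERS:
    print(name, describe(size, ARCHIVE_BYTES, passed))
```

Only the last tier crosses a ratio of 1.0, which is the point at which the prompt stops being cheaper than the archive.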

Every encoding of this project, drawn to scale (bar length = bytes · label = Haiku fidelity, mean of 5)

Source · uncompressed 1,484 B · the original

T1 · one line 581 B · 96% passed

T2 · paragraph 824 B · 97% passed

T3 · full spec 1,076 B · 99% passed

T4 · exhaustive 1,545 B · 100% passed

ZIP 1,214 B · 100% · lossless

(unlabeled) 1,200 B · 100% · lossless

xz 1,160 B · 100% · lossless

Grade book. A conventional shape, but with semi-arbitrary cutoffs and one specific rounding rule the model has to be told.

Prompt · T1 · one line 694 B · 86% passed


Prompt · T2 · paragraph 1,002 B · 88% passed


Prompt · T3 · full spec 1,075 B · 93% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `grade_book` (importable as `import grade_book`) exposing exactly these functions at the package top level:

- `letter_grade(score) -> str` — a numeric score to a letter grade
- `grade_points(letter: str) -> float` — a letter grade to GPA grade points
- `gpa(scores: list) -> float` — mean grade points across a list of scores

Use only the Python standard library. Functions are called positionally (e.g. `letter_grade(91.5)`, `grade_points("B+")`, `gpa([95, 85, 75])`).

# Grade book

- `letter_grade(score)`: round the score to the nearest integer, then bucket by these cutoffs (minimum score for each letter): A+ 97, A 93, A- 90, B+ 87, B 83, B- 80, C+ 77, C 73, C- 70, D+ 67, D 63, D- 60, F below 60.
- `grade_points(letter)`: A+ 4.0, A 4.0, A- 3.7, B+ 3.3, B 3.0, B- 2.7, C+ 2.3, C 2.0, C- 1.7, D+ 1.3, D 1.0, D- 0.7, F 0.0.
- `gpa(scores)`: convert each score to a letter, then to grade points, average them, and round to 2 decimal places.


Prompt · T4 · exhaustive 1,467 B · 100% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `grade_book` (importable as `import grade_book`) exposing exactly these functions at the package top level:

- `letter_grade(score) -> str` — a numeric score to a letter grade
- `grade_points(letter: str) -> float` — a letter grade to GPA grade points
- `gpa(scores: list) -> float` — mean grade points across a list of scores

Use only the Python standard library. Functions are called positionally (e.g. `letter_grade(91.5)`, `grade_points("B+")`, `gpa([95, 85, 75])`).

# Grade book

**Rounding rule:** wherever a value is rounded below, round half *up* (`decimal.ROUND_HALF_UP`), not banker's rounding.

- `letter_grade(score)`: first round `score` to the nearest integer (ties up), then bucket by minimum-score cutoffs, highest first: A+ ≥97, A ≥93, A- ≥90, B+ ≥87, B ≥83, B- ≥80, C+ ≥77, C ≥73, C- ≥70, D+ ≥67, D ≥63, D- ≥60, otherwise F. Raise `ValueError` if score is outside 0..100. (So 96.5 → 97 → "A+"; 89.5 → 90 → "A-"; 72.5 → 73 → "C".)
- `grade_points(letter)`: A+ 4.0, A 4.0, A- 3.7, B+ 3.3, B 3.0, B- 2.7, C+ 2.3, C 2.0, C- 1.7, D+ 1.3, D 1.0, D- 0.7, F 0.0. Note **A+ caps at 4.0** (not 4.3). Raise `ValueError` for an unknown letter.
- `gpa(scores)`: map each score → letter → grade points, take the arithmetic mean, and round half-up to 2 decimals. An empty list returns 0.0.
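The rounding rule is the part of this prompt a model cannot guess: Python's built-in `round` rounds ties to the nearest even integer, so `round(72.5)` is 72, while the T4 prompt requires 73. A minimal sketch of a reconstruction that follows the T4 rules, assuming scores arrive as ints or floats (the helper `_half_up` and the table names are hypothetical, and this is not the reference source):

```python
from decimal import Decimal, ROUND_HALF_UP

_CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
            (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-")]
_POINTS = {"A+": 4.0, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
           "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "D-": 0.7, "F": 0.0}


def _half_up(value, exponent):
    # Go through str() so 72.5 becomes Decimal("72.5") rather than a float expansion.
    return Decimal(str(value)).quantize(Decimal(exponent), rounding=ROUND_HALF_UP)


def letter_grade(score):
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score!r}")
    rounded = _half_up(score, "1")  # nearest integer, ties up
    for cutoff, letter in _CUTOFFS:  # highest cutoff first
        if rounded >= cutoff:
            return letter
    return "F"


def grade_points(letter):
    if letter not in _POINTS:
        raise ValueError(f"unknown letter grade: {letter!r}")
    return _POINTS[letter]


def gpa(scores):
    if not scores:
        return 0.0
    points = [Decimal(str(grade_points(letter_grade(s)))) for s in scores]
    return float(_half_up(sum(points) / len(points), "0.01"))


# Built-in round() uses banker's rounding; the prompt's rule does not.
print(round(72.5), letter_grade(72.5))  # 72 C
```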


Archive · xz of the source 1,272 B

fd 37 7a 58 5a 00 00 04 e6 d6 b4 46 02 00 21 01 16 00 00 00 74 2f e5 a3 e0 06 9b 03 15 5d 00 33 1c 8a 22 6f a9 32 25 33 7f 62 20 9d d8 19 b0 c2 b7 13 58 a9 1c e8 c7 cb 6e d5 89 29 75 a8 f2 ab 0b 3c 37 a8 fc 9d 38 58 4f e1 49 2f 48 af b2 c0 d2 9f c3 87 1f 93 49 15 f5 08 6b ae e1 a5 ff 69 52 cb 27 bc 7b 07 20 a8 ca 14 bc 7f f5 c0 6f 44 8a 08 43 e3 ea 85 e1 86 2e 89 00 a5 97 b5 c0 1f a3 62 9c e9 a4 a4 72 bc 8e e8 16 8d 82 c8 45 36 17 e7 65 96 ef 90 85 4d dd cf 62 54 ca ba e2 37


694 B (0.55× the 1,272-B xz archive, smaller than it) but only 86% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

1,002 B (0.79× the 1,272-B xz archive, smaller than it) but only 88% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

1,075 B (0.85× the 1,272-B xz archive, smaller than it) but only 93% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

100% faithful — but 1,467 B is 1.15× the 1,272-B xz archive. When the content is arbitrary, prose can’t out-pack the byte compressor.

Every encoding of this project, drawn to scale (bar length = bytes · label = Haiku fidelity, mean of 5)

Source · uncompressed 1,692 B · the original

T1 · one line 694 B · 86% passed

T2 · paragraph 1,002 B · 88% passed

T3 · full spec 1,075 B · 93% passed

T4 · exhaustive 1,467 B · 100% passed

ZIP 1,326 B · 100% · lossless

(unlabeled) 1,338 B · 100% · lossless

xz 1,272 B · 100% · lossless

Pricing engine. Arbitrary discount and tax tables, a fixed order of operations, banker's rounding. This is pure surprise, and every output depends on it, so it touches almost every check.

Prompt · T1 · one line 1,060 B · 15% passed


Prompt · T2 · paragraph 1,742 B · 17% passed


Prompt · T3 · full spec 1,986 B · 94% passed


Prompt · T4 · exhaustive 2,843 B · 100% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `pricing_engine` (importable as `import pricing_engine`) exposing exactly these names at the package top level:

- `LineItem(sku: str, qty: int, unit_price: Decimal)`
- `Customer(id: str, tier: str)` — tier is one of "bronze", "silver", "gold"
- `Order(customer: Customer, items: list[LineItem], region: str, coupon_code: str | None = None)` — region is one of "US-CA", "US-NY", "EU", "NONE"
- `PricedOrder` with these fields, each a `Decimal`: `subtotal, volume_discount, tier_discount, coupon_discount, taxable_base, tax, total`, plus `lines`.
- `PricingEngine` with a method `price_order(order: Order) -> PricedOrder`.

All monetary values are `decimal.Decimal`. Constructors accept keyword arguments (e.g. `LineItem(sku="a", qty=1, unit_price=Decimal("9.99"))`).

# Build a pricing engine

`price_order` runs this fixed pipeline, in order.

**Rounding rule (applies to every monetary quantity below):** round to exactly 2 decimal places using banker's rounding (`decimal.ROUND_HALF_EVEN`, i.e. `quantize(Decimal("0.01"), ROUND_HALF_EVEN)`). Each discount is rounded *before* it is subtracted, so each step consumes an already-rounded number.

1. **Subtotal** = `sum(unit_price * qty)` over all line items, then rounded.
2. **Volume discount** = `round(subtotal * rate)`, where `rate` is chosen by the highest matching threshold (check in this order, first match wins):
   - subtotal >= 10000 → 0.18
   - subtotal >= 5000 → 0.12
   - subtotal >= 1000 → 0.05
   - else → 0.00

   `after_volume = subtotal - volume_discount`
3. **Tier discount** = `round(after_volume * rate)`:
   - bronze → 0.00, silver → 0.03, gold → 0.07

   `after_tier = after_volume - tier_discount`
4. **Coupon discount** on `after_tier` (rounded):
   - `SAVE10` → `round(after_tier * 0.10)`
   - `FLAT50` → `round(min(after_tier, 50.00))` (never discount more than the base)
   - `BOGO` → pick the line item with the lowest `unit_price` (if tie, any is fine); `free_units = that line's qty // 2` (integer floor division); discount = `round(that unit_price * free_units)`
   - missing coupon, or any unrecognized code → 0.00 (silently ignored)

   `taxable_base = after_tier - coupon_discount`; if this is negative, clamp it to 0.00.
5. **Tax** = `round(taxable_base * rate)` by region:
   - US-CA → 0.0825, US-NY → 0.08875, EU → 0.20, NONE → 0.00

   `total = taxable_base + tax`.

**Validation:** `LineItem` rejects `qty <= 0` and negative `unit_price` (raise `ValueError`); `Customer` rejects a tier not in {bronze, silver, gold}.

Populate every `PricedOrder` field: `subtotal`, `volume_discount`, `tier_discount`, `coupon_discount`, `taxable_base`, `tax`, `total`, and `lines` (the list of input line items).
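To make the order of operations concrete, here is a compact sketch of the five-step pipeline from the T4 prompt. It is deliberately simplified: it takes plain `(unit_price, qty)` pairs and returns a dict instead of implementing the `LineItem`/`Order`/`PricedOrder` classes and their validation, and the names `money` and `price` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")
VOLUME = [(Decimal("10000"), Decimal("0.18")),
          (Decimal("5000"), Decimal("0.12")),
          (Decimal("1000"), Decimal("0.05"))]
TIER = {"bronze": Decimal("0.00"), "silver": Decimal("0.03"), "gold": Decimal("0.07")}
TAX = {"US-CA": Decimal("0.0825"), "US-NY": Decimal("0.08875"),
       "EU": Decimal("0.20"), "NONE": Decimal("0.00")}


def money(x: Decimal) -> Decimal:
    # Banker's rounding to 2 places, applied at every step.
    return x.quantize(CENT, rounding=ROUND_HALF_EVEN)


def price(items, tier, region, coupon=None):
    """items: list of (unit_price: Decimal, qty: int) pairs."""
    subtotal = money(sum((p * q for p, q in items), Decimal("0")))
    # First matching threshold wins; thresholds are listed highest first.
    rate = next((r for threshold, r in VOLUME if subtotal >= threshold), Decimal("0.00"))
    volume_discount = money(subtotal * rate)
    after_volume = subtotal - volume_discount
    tier_discount = money(after_volume * TIER[tier])
    after_tier = after_volume - tier_discount
    if coupon == "SAVE10":
        coupon_discount = money(after_tier * Decimal("0.10"))
    elif coupon == "FLAT50":
        coupon_discount = money(min(after_tier, Decimal("50.00")))
    elif coupon == "BOGO":
        unit_price, qty = min(items, key=lambda item: item[0])  # cheapest line
        coupon_discount = money(unit_price * (qty // 2))
    else:
        coupon_discount = Decimal("0.00")  # missing or unrecognized code
    taxable_base = max(after_tier - coupon_discount, Decimal("0.00"))
    tax = money(taxable_base * TAX[region])
    return {"subtotal": subtotal, "volume_discount": volume_discount,
            "tier_discount": tier_discount, "coupon_discount": coupon_discount,
            "taxable_base": taxable_base, "tax": tax, "total": taxable_base + tax}


order = [(Decimal("19.99"), 3), (Decimal("1200.00"), 1)]
print(price(order, tier="gold", region="US-CA", coupon="SAVE10")["total"])  # 1084.51
```

For this order the intermediate values are 1259.97 (subtotal), 63.00 (volume), 83.79 (tier), 111.32 (coupon), 1001.86 (taxable base), and 82.65 (tax). Because each step consumes an already-rounded number, changing the rounding mode or the step order can shift the final cents.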


Archive · xz of the source 2,548 B

fd 37 7a 58 5a 00 00 04 e6 d6 b4 46 02 00 21 01 16 00 00 00 74 2f e5 a3 e0 17 02 07 97 5d 00 33 1c 8a 22 6f a9 32 41 af a2 0b f3 56 d5 39 60 bd c5 11 6f e7 e3 33 3c 05 a2 27 81 e7 0e cb ea 5f b4 d6 53 e3 a3 62 02 34 5f b0 18 7f 83 4c 0f f9 b3 a4 d2 06 f3 2b 03 18 c6 1e 0b 8d f3 28 d7 b3 54 ce 35 70 70 a0 39 87 69 32 a7 c6 5a b3 2b 35 3c 32 97 68 d3 fd d4 f8 00 e3 b7 23 78 ec 32 f7 83 67 ef 98 88 49 55 20 33 d2 8f ca 4f 9d 64 a3 ea 28 7c 97 5b f0 04 db 2a fc f0 ff cb 53 56 91


1,060 B (0.42× the 2,548-B xz archive, smaller than it) but only 15% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

1,742 B (0.68× the 2,548-B xz archive, smaller than it) but only 17% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

1,986 B (0.78× the 2,548-B xz archive, smaller than it) but only 94% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

100% faithful — but 2,843 B is 1.12× the 2,548-B xz archive. When the content is arbitrary, prose can’t out-pack the byte compressor.

Every encoding of this project, drawn to scale (bar length = bytes · label = Haiku fidelity, mean of 5)

Source · uncompressed 5,891 B · the original

T1 · one line 1,060 B · 15% passed

T2 · paragraph 1,742 B · 17% passed

T3 · full spec 1,986 B · 94% passed

T4 · exhaustive 2,843 B · 100% passed

ZIP 3,146 B · 100% · lossless

(unlabeled) 2,684 B · 100% · lossless

xz 2,548 B · 100% · lossless

Authorization engine. A realistic RBAC/ABAC engine: role inheritance, wildcard resource patterns, conditions. Most of it is conventional; the novelty hides in a few arbitrary precedence rules, so it touches only a minority of checks.

Prompt · T1 · one line 1,324 B · 83% passed


Prompt · T2 · paragraph 2,058 B · 94% passed


Prompt · T3 · full spec 3,039 B · 98% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `rbac` (importable as `import rbac`) exposing exactly these names at the package top level:

- `Permission(action: str, resource: str, effect: str = "allow", conditions: dict | None = None)` — `action` is an action name (e.g. "read") or "*"; `resource` is a `:`-delimited pattern; `effect` is "allow" or "deny"; `conditions` maps context-key to required value.
- `Role(name: str, permissions: list[Permission] = [], inherits: list[str] = [])`
- `Principal(id: str, roles: list[str] = [], attributes: dict | None = None)`
- `Decision` with fields: `allowed: bool`, `effect: str`, `matched: str`, `reason: str`.
- `AuthorizationEngine(roles: list[Role])` with a method `check(principal: Principal, action: str, resource: str, context: dict | None = None) -> Decision`.

All constructors accept keyword arguments (e.g. `Permission(action="read", resource="doc:*", effect="allow")`). Resources and patterns are `:`-delimited token strings such as `"doc:reports:q4"`.

# Build an authorization engine

`check(principal, action, resource, context=None)` decides a request by collecting every permission that applies to the principal and resolving conflicts. Implement it as follows.

**1. Effective permissions.** Collect the permissions of every role the principal holds, plus the permissions of roles reached transitively through `inherits`. Inheritance cycles must not loop forever (visit each role once). Role names that are not defined are ignored.

**2. Which permissions apply.** A permission applies to the request when all of:

- **Action matches:** `permission.action == action`, or `permission.action == "*"`.
- **Resource matches:** patterns and resources are `:`-delimited token lists. A `*` token matches exactly one token. If the *last* pattern token is `*`, it is a prefix wildcard: the tokens before it must match and any remaining resource tokens are accepted (so `doc:*` matches `doc:reports:q4`). Otherwise the pattern and resource must have the same number of tokens.
- **Conditions hold:** build a context mapping from the principal's `attributes` combined with the `context` argument, then every entry in `permission.conditions` must equal the matching value in that mapping. Empty conditions always hold.

**3. Resolve conflicts.** Among the applying permissions, the **most specific** resource pattern wins — a pattern with more literal (non-`*`) tokens is more specific, and an exact pattern (no wildcard at all) beats one containing a wildcard. When the winning specificity is tied, **deny beats allow**. So a more specific *allow* overrides a less specific *deny* — this is most-specific-wins, not deny-overrides-everything.

**4. Default.** If no permission applies, the request is denied. Return a `Decision` with `allowed` (true iff the winning effect is "allow"), the winning `effect`, a `matched` string identifying the deciding permission (or "<default>"), and a short `reason`.


Prompt · T4 · exhaustive 3,888 B · 86% passed

## Required public API (must match exactly so automated tests can call it)

Build a Python package named `rbac` (importable as `import rbac`) exposing exactly these names at the package top level:

- `Permission(action: str, resource: str, effect: str = "allow", conditions: dict | None = None)` — `action` is an action name (e.g. "read") or "*"; `resource` is a `:`-delimited pattern; `effect` is "allow" or "deny"; `conditions` maps context-key to required value.
- `Role(name: str, permissions: list[Permission] = [], inherits: list[str] = [])`
- `Principal(id: str, roles: list[str] = [], attributes: dict | None = None)`
- `Decision` with fields: `allowed: bool`, `effect: str`, `matched: str`, `reason: str`.
- `AuthorizationEngine(roles: list[Role])` with a method `check(principal: Principal, action: str, resource: str, context: dict | None = None) -> Decision`.

All constructors accept keyword arguments (e.g. `Permission(action="read", resource="doc:*", effect="allow")`). Resources and patterns are `:`-delimited token strings such as `"doc:reports:q4"`.

# Build an authorization engine

`check(principal, action, resource, context=None)` runs this exact procedure.

**1. Effective permissions.** Starting from `principal.roles`, do a depth-first walk following each role's `inherits`. Visit each role at most once (so cycles terminate). Skip any role name not defined in the engine. Collect each visited role's permissions, remembering which role each came from.

**2. Filter to applying permissions.** A permission applies iff all three hold:

- **Action:** `permission.action == action` OR `permission.action == "*"`.
- **Resource:** split both pattern and resource on `":"`.
  - If the pattern's **last** token is `"*"` (trailing/prefix wildcard): let `prefix = pattern_tokens[:-1]`. It matches iff `len(resource_tokens) >= len(prefix)` and, for each `i < len(prefix)`, `prefix[i] == "*"` or `prefix[i] == resource_tokens[i]`. (So `doc:*` matches `doc`, `doc:reports`, and `doc:reports:q4`; `*` matches everything.)
  - Otherwise: it matches iff the token counts are equal and each pattern token is `"*"` or equals the resource token at that position (so `doc:*:q4` matches `doc:reports:q4` but not `doc:reports:q3` or `doc:reports`).
- **Conditions:** build `ctx = {**principal.attributes, **(context or {})}` — i.e. the principal's attributes overlaid by the `context` argument, with `context` winning on conflicts. Every key/value in `permission.conditions` must equal `ctx.get(key)`. Empty conditions always pass.

**3. Rank and pick a winner.** For each applying permission compute the tuple

```
key = (literal_token_count, exact_flag, action_exact_flag, deny_flag)
```

where `literal_token_count` = number of pattern tokens that are not `"*"`; `exact_flag` = 1 if the pattern contains no `"*"` at all else 0; `action_exact_flag` = 1 if `permission.action != "*"` else 0; `deny_flag` = 1 if `effect == "deny"` else 0. The winner is the permission with the **largest** key (compare tuples left to right). This means: most literal tokens wins; then exact-over-wildcard; then exact-action-over-`*`; and finally, only as the last tie-break, **deny over allow**.

**4. Decision.**

- If at least one permission applies: `effect` = the winner's effect; `allowed = (effect == "allow")`; `matched` = a string identifying the winning permission (include its role, effect, action, and resource).
- If none apply: `allowed = False`, `effect = "deny"`, `matched = "<default>"`.

Always populate `Decision.reason` with a short human-readable explanation.

**Validation.** `Permission` raises `ValueError` if `effect` is not "allow" or "deny". `Role` raises `ValueError` on an empty name. A `Permission` with `conditions=None` behaves as having no conditions; a `Principal` with `attributes=None` behaves as having empty attributes.
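The two rules in this prompt that carry the novelty are resource matching and the ranking key. The sketch below isolates just those, with permissions reduced to `(action, pattern, effect)` tuples; role inheritance, conditions, and the `Decision` type are left out, and the names `resource_matches`, `rank_key`, and `pick_winner` are hypothetical. Reading `exact_flag` at the token level is an interpretation of the prompt's wording.

```python
def resource_matches(pattern: str, resource: str) -> bool:
    p, r = pattern.split(":"), resource.split(":")
    if p[-1] == "*":
        # Trailing wildcard: only the prefix has to match; extra tokens are accepted.
        prefix = p[:-1]
        return len(r) >= len(prefix) and all(
            tok == "*" or tok == r[i] for i, tok in enumerate(prefix)
        )
    # Otherwise token counts must be equal and "*" matches exactly one token.
    return len(p) == len(r) and all(pt == "*" or pt == rt for pt, rt in zip(p, r))


def rank_key(action: str, pattern: str, effect: str) -> tuple:
    tokens = pattern.split(":")
    literal_token_count = sum(tok != "*" for tok in tokens)
    exact_flag = int("*" not in tokens)
    action_exact_flag = int(action != "*")
    deny_flag = int(effect == "deny")
    return (literal_token_count, exact_flag, action_exact_flag, deny_flag)


def pick_winner(applying):
    """applying: non-empty list of (action, pattern, effect) tuples that already matched."""
    return max(applying, key=lambda perm: rank_key(*perm))


# A specific allow overrides a broad deny; deny wins only when everything else ties.
print(pick_winner([("*", "doc:*", "deny"), ("read", "doc:reports:q4", "allow")]))
print(pick_winner([("read", "doc:*", "allow"), ("read", "doc:*", "deny")]))
```

Tuple comparison in Python proceeds left to right, which is what lets `deny_flag` act purely as the final tie-break.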


Archive · xz of the source 3,340 B

fd 37 7a 58 5a 00 00 04 e6 d6 b4 46 02 00 21 01 16 00 00 00 74 2f e5 a3 e0 1d 6f 0a 59 5d 00 11 68 0c 44 07 39 ce 3c b6 de c4 54 3d 85 e8 ac bb 0e 87 96 14 14 55 81 13 09 32 37 5b b3 8b c2 42 b0 ec a3 5f e9 51 e9 16 29 5f 87 ab 43 09 a9 2c d1 3e b6 c1 4a 5f 6b 67 a2 9f 5e 54 a7 90 91 af 8d f2 34 24 8f 23 92 b0 a2 b0 d5 34 63 14 09 5e 91 fa c4 fe 52 31 15 ea 31 dd b7 47 b8 0a 1e c6 9b ab 6b 0c fc 7c bc b9 15 47 2a 2d 36 f4 ad 9a e1 11 c3 a7 ad 97 05 66 60 e4 9c d8 0e 1d db db


1,324 B (0.40× the 3,340-B xz archive, smaller than it) but only 83% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

2,058 B (0.62× the 3,340-B xz archive, smaller than it) but only 94% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

3,039 B (0.91× the 3,340-B xz archive, smaller than it) but only 98% faithful. Cheaper than the archive, but lossy — the archive is always 100%.

3,888 B (1.16× the 3,340-B xz archive, larger than it) but only 86% faithful. Both larger and lossy — the archive is always 100%.

Every encoding of this project, drawn to scale (bar length = bytes · label = Haiku fidelity, mean of 5)

Source · uncompressed 7,536 B · the original

T1 · one line 1,324 B · 83% passed

T2 · paragraph 2,058 B · 94% passed

T3 · full spec 3,039 B · 98% passed

T4 · exhaustive 3,888 B · 86% passed

ZIP 3,831 B · 100% · lossless

(unlabeled) 3,500 B · 100% · lossless

xz 3,340 B · 100% · lossless

Four projects across a novelty spectrum


How a reconstruction is graded

Four findings, and where each comes from

The curves

When the collapse doesn’t happen