Caveman: compressed LLM communication via telegraphic prose
What it proposes
Caveman is a Claude Code skill that instructs the LLM to respond in compressed, filler-free language by dropping articles, hedging, pleasantries, and connective fluff. It claims roughly 65% output token reduction (the README headline says 75%; the project’s own honest three-arm eval, which controls for a simple “be terse” baseline, shows the skill-specific contribution is narrower). A companion sub-tool, caveman-compress, rewrites instruction files (CLAUDE.md, memory files, preferences) into compressed prose so the LLM reads fewer input tokens on every session load, with measured savings around 35-60% on prose-heavy files while leaving code blocks, URLs, and technical terms untouched.
The underlying principle is more durable than the tool itself: natural-language instructions and LLM output both contain enormous amounts of filler that consumes tokens without improving accuracy or comprehension. The project cites a March 2026 arxiv paper (2604.00025) showing that brevity constraints can actually improve model accuracy on certain benchmarks, not just reduce cost. The insight is that verbosity is not a proxy for quality in LLM communication in either direction, and that systematically compressing the prose layer of human-to-model and model-to-human text is a legitimate optimization axis.
A second, quieter insight lives in the compress sub-tool: instruction files that load on every session represent a recurring token tax. Compressing them is a form of amortized optimization — you pay the compression cost once and save on every subsequent session.
Best used when
- Workflows involve long coding sessions where output verbosity slows reading and inflates cost, and the content is primarily technical (debugging, code review, architecture decisions).
- Instruction files (project rules, preferences, memory) have grown large and load on every session, making input token reduction worthwhile.
- The LLM’s audience is an experienced practitioner who does not need pedagogical scaffolding, hedging, or step-by-step narration of obvious actions.
- Output is consumed ephemerally (chat, terminal) rather than published or stored as prose artifacts.
Poor fit when
- The output is itself a prose deliverable: creative writing, documentation, blog posts, reader-facing copy. Compressed telegraphic style would degrade the artifact, not just the medium.
- The workflow depends on nuanced tone, voice, or register. Caveman’s compression rules are blunt instruments that strip connective tissue indiscriminately; they cannot distinguish between filler and intentional rhetorical structure.
- Instruction files contain carefully worded conditional rules where articles and conjunctions carry semantic load. The compress tool’s “preserve technical terms” heuristic does not cover logical nuance in rule prose.
- Multiple contributors maintain shared instruction files. The backup-file workflow (FILE.original.md as human-readable, FILE.md as compressed) adds drift risk if the original is edited without re-compressing.
Verdict
The tool itself is not the right fit for workflows that include creative writing or prose-quality output, because its compression rules are context-blind and conflict with any style or voice requirements. The principle it demonstrates is directly applicable, however: instruction files that load on every session should be written in dense, imperative, filler-free prose from the start, as a matter of authoring discipline rather than automated post-processing. This avoids the drift risk of maintaining two copies while capturing most of the input token savings. For output, a lighter version of the same idea — instructing the model to be concise for technical exchanges and explicit about when to switch to full prose — achieves the core benefit without an external dependency. The adapt verdict reflects this: the lesson is “write your instruction files tight and tell the model to skip filler in technical exchanges,” not “install this skill.”