Reference Library Methodology · the catalogue
v1.0 · May 2026 · MIT

Build a queryable, AI-readable reference library.

A system for turning your books, papers, and notes into long-term semantic memory for agentic AI sessions. Bring your own corpus; the methodology + tools handle the rest. Extract once, consume many times.

— start here

Read the philosophy first.

"A 26,000-line book → a ~280-line distillation. Roughly 100× compression, keeping the load-bearing material."

Read METHODOLOGY.md to understand the three-layer extraction-and-synthesis pipeline, then library-structure.md for the directory layout. Finally, run load_context.py against a small test library to see the session quick-start output.

— I

Documentation — the methodology in six focused docs

Each doc explains one piece of the system. Read in order for adoption; skim by topic for reference.

1.1
Library Structure
Directory layout · naming conventions · YAML schema
The filesystem layout the methodology and tools assume. Categories, slugs, frontmatter, where the library lives relative to this repo.
1.2
INSIGHTS Extraction
The prompt · costs · refinement
What extraction does, what it costs (roughly $0.02 to $0.40 per book), how to tune the prompt for your portfolio, and when to refine manually.
1.3
Synthesis Pattern
Cross-book pattern docs
When to write a synthesis (≥3 books on a topic). What it contains: consensus, disagreements, "best of," failure modes, open questions.
1.4
Project Maps
Tier-ranked reading lists
Per-project book maps in tiers (load-bearing / consult / context). The routing layer that connects your library to active work.
1.5
MCP Server
TF-IDF search · Claude Code integration
Expose the library as a queryable MCP server — search_library, get_book_insights, get_project_context. No API key, no vector DB, fully local.
1.6
Copyright Tiers
Tier A · B · C acquisition framework
Be explicit about acquisition status. Tier A (citations + summary) is the floor; Tier B (public domain) and Tier C (legitimately purchased) layer on top. Tier D (gray-channel) is forbidden.
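
To make the TF-IDF search behind entry 1.5 more concrete, here is a minimal, standard-library-only sketch of ranking documents by TF-IDF cosine similarity. It is not the code in mcp_server.py, whose scoring details are not shown here; the function names (`tokenize`, `build_index`, `search`) and the smoothed idf formula are assumptions chosen for illustration, and no MCP wiring is included.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs: {doc_id: text}. Returns (tf-idf vectors, idf table)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed idf so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: (n / sum(c.values())) * idf[t] for t, n in c.items()}
        for doc_id, c in counts.items()
        if c  # skip empty documents
    }
    return vectors, idf


def search(query, vectors, idf, top_k=3):
    """Rank documents by cosine similarity between query and document vectors."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: n * idf[t] for t, n in q_counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    results = []
    for doc_id, vec in vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
        if dot > 0:  # dot > 0 implies both norms are non-zero
            d_norm = math.sqrt(sum(w * w for w in vec.values()))
            results.append((doc_id, dot / (q_norm * d_norm)))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
```

In this sketch the index is rebuilt in memory from plain text, which is one way to get "no API key, no vector DB, fully local" behavior; a real server would likely index INSIGHTS files and cache the result.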
— II

Python Tools — eight scripts for building and querying

All tools are driven by the REFERENCE_LIBRARY_ROOT environment variable or the --library flag. The tools live here; the library lives wherever you keep it.
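
As a rough sketch of how a tool might locate the library from those two inputs: the helper name `resolve_library_root` is hypothetical, and the precedence (flag over environment variable) is an assumption, since the precedence is not stated here.

```python
import argparse
import os
from pathlib import Path


def resolve_library_root(argv=None, environ=None):
    """Return the library root from --library, else REFERENCE_LIBRARY_ROOT.

    Assumption: the flag wins when both are set; the real tools may differ.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Locate the reference library.")
    parser.add_argument("--library", help="path to the library root")
    # parse_known_args lets each tool define its own extra flags.
    args, _unknown = parser.parse_known_args(argv)

    raw = args.library or environ.get("REFERENCE_LIBRARY_ROOT")
    if not raw:
        raise SystemExit("Set REFERENCE_LIBRARY_ROOT or pass --library.")
    return Path(raw).expanduser()
```

Keeping the lookup in one function means every script can share the same behavior and the same error message when neither input is set.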

2.1
batch_extract_insights.py
INSIGHTS extraction via Anthropic API
Scans the library for books without INSIGHTS, runs extraction (Haiku or Sonnet), and writes the results. Rate-limited, resumable, cost-tracked.
2.2
load_context.py
Session quick-start generator
Run at session start: produces a context block with the right books at the right priority for a given project. Pipe to clipboard with --clip.
2.3
mcp_server.py
MCP server with TF-IDF search
Register with Claude Code; agents can search the library, fetch book INSIGHTS, get project context — all via standard MCP tool calls.
2.4
regenerate_inventory.py
INVENTORY.md generator
Scans every content.md, reads frontmatter, produces a top-level inventory. Run after adding books.
2.5
tag_library.py
Config-driven YAML frontmatter tagger
Generic tagger driven by a JSON config. Maps categories + book slugs + chapter keywords to project tags.
2.6
epub_to_md.py
EPUB → markdown
Token-efficient EPUB extraction with image preservation and metadata.
2.7
extract_all_bundles.py
Bulk EPUB ingestion
Inbox → library batch extraction with optional skip list.
2.8
fix_image_paths.py
Path normalization utility
Normalizes image references in content.md after import. Run when image paths drift.
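
Several of the tools above start by scanning the library for content.md files. The sketch below shows one plausible shape for the "books without INSIGHTS" scan from 2.1. It assumes each book directory holds content.md with the INSIGHTS file beside it under the name `INSIGHTS.md`; that filename and the function name `books_missing_insights` are assumptions, and the actual layout is defined in library-structure.md.

```python
from pathlib import Path


def books_missing_insights(library_root, insights_name="INSIGHTS.md"):
    """Yield book directories that have content.md but no INSIGHTS file.

    Assumption: INSIGHTS sits next to content.md under insights_name.
    """
    root = Path(library_root)
    # sorted() keeps the order stable across runs.
    for content in sorted(root.rglob("content.md")):
        book_dir = content.parent
        if not (book_dir / insights_name).exists():
            yield book_dir
```

Because books that already have an INSIGHTS file are skipped, re-running a scan like this after an interruption only picks up the remaining work, which is one simple way a batch job can be resumable.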
— III

Templates — schema starting points

Copy these into your library and fill in your content. The schemas are the load-bearing convention; the body is yours.

3.1
Book Frontmatter
YAML schema for content.md
Title, authors, publisher, ISBN, category, acquisition tier, projects, tags. The minimum viable schema for a book entry.
3.2
INSIGHTS Template
What an extracted INSIGHTS file looks like
Frontmatter + sections by use case (not chapter order) + project relevance summary at the end. This is the pattern the model follows.
3.3
Synthesis Doc
Cross-book pattern document
Consensus, disagreements, "best of," failure modes, open questions, project application notes.
3.4
Project Map
Per-project tier-ranked reading list
Tier 1 (load-bearing), Tier 2 (consult on demand), Tier 3 (context). Drives the session quick-start tool.
3.5
Inventory
Top-level catalog format
Auto-generated by regenerate_inventory.py; this template shows the format.
3.6
Tagging Config
JSON config for tag_library.py
Maps library categories + specific book slugs + chapter keywords to project tags. Customize for your portfolio.
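
Since the schemas are the load-bearing convention, a small check on each content.md can catch missing fields early. The sketch below is illustrative only: the key names (`title`, `authors`, `category`, `acquisition_tier`) are guesses based on the field list in 3.1, not the template's actual keys, and the parser handles only flat `key: value` and `key: [a, b]` lines. Real YAML frontmatter should be read with a proper YAML parser.

```python
REQUIRED_KEYS = ("title", "authors", "category", "acquisition_tier")  # hypothetical names
ALLOWED_TIERS = {"A", "B", "C"}  # Tier D is forbidden by the copyright framework


def split_frontmatter(text):
    """Split a markdown file into (frontmatter_lines, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # no closing delimiter: treat as no frontmatter


def parse_flat_frontmatter(fm_lines):
    """Parse `key: value` and `key: [a, b]` lines only (not full YAML)."""
    data = {}
    for line in fm_lines:
        if not line.strip() or line.lstrip().startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = [v.strip().strip("'\"") for v in value[1:-1].split(",")]
            data[key.strip()] = [v for v in items if v]
        else:
            data[key.strip()] = value.strip("'\"")
    return data


def missing_keys(data, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [k for k in required if not data.get(k)]


def tier_is_allowed(data):
    """True when the acquisition tier is one of A, B, or C."""
    return data.get("acquisition_tier") in ALLOWED_TIERS
```

A check like this could run before inventory generation or tagging, so that a malformed entry is reported once rather than surfacing as a confusing failure in a later tool.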
— IV

Examples — see what the artifacts actually look like

Three fully anonymized examples using real foundational books (Pragmatic Programmer, Clean Code, Effective Java) so you can see the format in action.

4.1
INSIGHTS Example
The Pragmatic Programmer (20th Anniversary Edition)
A fully-worked INSIGHTS file. 10 patterns extracted, organized by use case, with project relevance summary.
4.2
Synthesis Example
Error handling across 3 books
A cross-book synthesis on error handling, drawing from Pragmatic Programmer + Clean Code + Effective Java.
4.3
Project Map Example
Backend API Refresh
A realistically shaped project map with tier-ranked books, synthesis doc references, key-pattern extraction, and coverage notes.
— for non-GitHub readers

Send a quick note.

Adopting this methodology yourself? Hit a problem with the tools? Have a war story? This form goes straight to the maintainer.

If you have a GitHub account, opening an issue is preferred. This form is the path for everyone else.