Build a queryable, AI-readable reference library.

A system for turning your books, papers, and notes into long-term semantic memory for agentic AI sessions. Bring your own corpus; the methodology + tools handle the rest. Extract once, consume many times.

— start here

Read the philosophy first.

"A 26,000-line book → ~280-line distillation. 100× compression at the load-bearing material."

Read METHODOLOGY.md to understand the three-layer extraction-and-synthesis pipeline. Then library-structure.md for the directory layout. Then run load_context.py against a small test library to see the session-quick-start output.

— I

Documentation — the methodology in six focused docs

Each doc explains one piece of the system. Read in order for adoption; skim by topic for reference.

1.1

Library Structure

Directory layout · naming conventions · YAML schema

The filesystem layout the methodology and tools assume. Categories, slugs, frontmatter, where the library lives relative to this repo.

Foundation

1.2

INSIGHTS Extraction

The prompt · costs · refinement

What extraction does, what it costs (roughly $0.02 to $0.40 per book), how to tune the prompt for your portfolio, and when to refine manually.

Pipeline
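A per-book cost figure like the one above comes down to token counts multiplied by per-token rates. The sketch below shows only that arithmetic. The function name and every number in the usage line are illustrative placeholders, not actual API pricing; substitute the current rates for whichever model you run.

```python
def estimate_cost_usd(input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
    """Estimate the cost of one extraction call.

    Rates are expressed in USD per million tokens and must be supplied
    by the caller; nothing here encodes real pricing.
    """
    total = input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out
    return total / 1_000_000


# Placeholder numbers for illustration only (hypothetical book size and rates).
example = estimate_cost_usd(
    input_tokens=200_000,
    output_tokens=4_000,
    usd_per_mtok_in=1.0,
    usd_per_mtok_out=5.0,
)
print(f"estimated cost: ${example:.2f}")
```

Input tokens dominate for long books, so the choice of model and any pre-trimming of the source text are the main levers on cost.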

1.3

Synthesis Pattern

Cross-book pattern docs

When to write a synthesis (≥3 books on a topic). What it contains: consensus, disagreements, "best of," failure modes, open questions.

Pattern

1.4

Project Maps

Tier-ranked reading lists

Per-project book maps in tiers (load-bearing / consult / context). The routing layer that connects your library to active work.

Pattern

1.5

MCP Server

TF-IDF search · Claude Code integration

Expose the library as a queryable MCP server — search_library, get_book_insights, get_project_context. No API key, no vector DB, fully local.

Integration
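To make "TF-IDF search, no vector DB" concrete, here is a minimal pure-Python sketch of term-frequency times inverse-document-frequency ranking. It is not the code in mcp_server.py and does not touch the MCP protocol; the class name, the smoothing formula, and the sample texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    def __init__(self, docs):
        # docs: mapping of document id -> full text
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        df = Counter()
        for counts in self.tf.values():
            df.update(counts.keys())  # each document counts once per term
        n = len(docs)
        # Smoothed IDF so a term found in every document keeps a positive weight.
        self.idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

    def search(self, query, limit=5):
        """Return up to `limit` (doc_id, score) pairs, best match first."""
        terms = tokenize(query)
        scores = {}
        for doc_id, counts in self.tf.items():
            total = sum(counts.values()) or 1
            score = sum((counts[t] / total) * self.idf.get(t, 0.0) for t in terms)
            if score > 0:
                scores[doc_id] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


index = TfidfIndex({
    "pragmatic-programmer": "tracer bullets and error handling with assertions",
    "clean-code": "naming functions and error handling with exceptions",
    "effective-java": "generics enums and immutable classes",
})
print(index.search("generics"))
```

Because the index is just word counts held in memory, it can be rebuilt from the INSIGHTS files at startup with no external service.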

1.6

Tier A · B · C acquisition framework

Be explicit about acquisition status. Tier A (citations + summary) is the floor; B (public domain) and C (legitimately purchased) layer on. Tier D (gray-channel) is forbidden.

Discipline

— II

Python Tools — eight scripts for building and querying

All driven by REFERENCE_LIBRARY_ROOT env var or --library flag. Tools live here; library lives wherever you keep it.
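One way a script could honor both settings is sketched below. Only the names REFERENCE_LIBRARY_ROOT and --library come from this page; the function name and the precedence (explicit flag beats environment variable) are assumptions, so check the scripts themselves for the actual behavior.

```python
import argparse
import os
from pathlib import Path


def resolve_library_root(argv=None, env=None):
    """Return the library root from --library, falling back to the env var."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Locate the reference library")
    parser.add_argument("--library", help="path to the library root")
    # parse_known_args lets each tool define its own extra flags elsewhere.
    args, _unknown = parser.parse_known_args(argv)

    # Assumption: an explicit flag wins over the environment variable.
    raw = args.library or env.get("REFERENCE_LIBRARY_ROOT")
    if not raw:
        raise SystemExit("Set REFERENCE_LIBRARY_ROOT or pass --library")
    return Path(raw).expanduser()
```

Called as `resolve_library_root(["--library", "/path/to/library"])`, it returns that path; called with no flag, it reads the environment instead.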

2.1

batch_extract_insights.py

INSIGHTS extraction via Anthropic API

Scans library for books without INSIGHTS, runs extraction (haiku or sonnet), writes results. Rate-limited, resumable, cost-tracked.

Core
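The "scan for books without INSIGHTS" step is also what makes a batch run resumable: a rerun simply skips books that already have output. A minimal sketch of that scan follows, assuming each book directory holds a content.md and that the extraction result sits beside it in a file named INSIGHTS.md. That file name and the function name are hypothetical.

```python
from pathlib import Path


def books_missing_insights(root, insights_name="INSIGHTS.md"):
    """List book directories that have content.md but no insights file yet.

    `insights_name` is an assumed convention; pass your library's actual name.
    """
    root = Path(root)
    return sorted(
        content.parent
        for content in root.rglob("content.md")
        if not (content.parent / insights_name).exists()
    )
```

Feeding this list to the extraction loop means an interrupted run can be restarted without paying for the same book twice.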

2.2

load_context.py

Session quick-start generator

Run at session start: produces a context block with the right books at the right priority for a given project. Pipe to clipboard with --clip.

Core
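As a rough picture of "the right books at the right priority," the sketch below renders a tier-ordered context block from (tier, title) pairs. The tier labels follow the project-map tiers described on this page; the function name, the output format, and the tier assignments in the usage example are assumptions, not the actual output of load_context.py.

```python
TIER_LABELS = {1: "load-bearing", 2: "consult on demand", 3: "context"}


def render_context(project, books):
    """Build a plain-text context block, listing tier 1 books first.

    books: iterable of (tier, title) pairs.
    """
    books = list(books)  # allow generators; we iterate once per tier
    lines = [f"# Session context: {project}"]
    for tier in sorted(TIER_LABELS):
        titles = sorted(title for t, title in books if t == tier)
        if not titles:
            continue  # skip empty tiers instead of printing bare headings
        lines.append(f"## Tier {tier} ({TIER_LABELS[tier]})")
        lines.extend(f"- {title}" for title in titles)
    return "\n".join(lines)


# Hypothetical tier assignments, for illustration only.
block = render_context(
    "Backend API Refresh",
    [(2, "Clean Code"), (1, "The Pragmatic Programmer"), (3, "Effective Java")],
)
print(block)
```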

2.3

mcp_server.py

MCP server with TF-IDF search

Register with Claude Code; agents can search the library, fetch book INSIGHTS, get project context — all via standard MCP tool calls.

Integration

2.4

regenerate_inventory.py

INVENTORY.md generator

Scans every content.md, reads frontmatter, produces a top-level inventory. Run after adding books.

Maintenance
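The scan-and-summarize loop can be sketched in a few lines. This version reads only a `title:` key from each content.md frontmatter block and emits a bullet list. The key name, the output layout, and both function names are assumptions; the real INVENTORY.md format is defined by the Inventory template.

```python
import re
from pathlib import Path

TITLE_RE = re.compile(r"^title:\s*(.+)$", re.MULTILINE)


def read_title(content_path):
    """Return the frontmatter title, or the directory name if none is found."""
    text = content_path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return content_path.parent.name
    end = text.find("\n---", 3)  # closing delimiter of the frontmatter block
    block = text[3:end] if end != -1 else ""
    match = TITLE_RE.search(block)
    return match.group(1).strip().strip("\"'") if match else content_path.parent.name


def build_inventory(root):
    """Return inventory text with one bullet per book found under root."""
    root = Path(root)
    rows = []
    for content in sorted(root.rglob("content.md")):
        rel = content.parent.relative_to(root).as_posix()
        rows.append(f"- {read_title(content)} ({rel})")
    return "# INVENTORY\n\n" + "\n".join(rows) + "\n"
```

Sorting the paths keeps the generated file stable between runs, so diffs show only books that were actually added or removed.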

2.5

tag_library.py

Config-driven YAML frontmatter tagger

Generic tagger driven by a JSON config. Maps categories + book slugs + chapter keywords to project tags.

Maintenance
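A config-driven tagger of this kind could look roughly like the sketch below. The three config sections mirror the mapping described above (categories, book slugs, chapter keywords), but the JSON key names, the function name, and all sample values are hypothetical; the Tagging Config template defines the real schema.

```python
import json

# Hypothetical config; key names and values are illustrative only.
CONFIG_JSON = """
{
  "categories": {"software-engineering": ["backend-api-refresh"]},
  "slugs": {"effective-java": ["jvm-services"]},
  "keywords": {"exception": ["error-handling-review"]}
}
"""


def project_tags(config, category, slug, chapter_titles):
    """Collect project tags from category, slug, and chapter-keyword rules."""
    tags = set(config.get("categories", {}).get(category, []))
    tags.update(config.get("slugs", {}).get(slug, []))
    for keyword, keyword_tags in config.get("keywords", {}).items():
        # Case-insensitive substring match against every chapter title.
        if any(keyword.lower() in title.lower() for title in chapter_titles):
            tags.update(keyword_tags)
    return sorted(tags)


config = json.loads(CONFIG_JSON)
print(project_tags(config, "software-engineering", "effective-java", ["Exceptions", "Generics"]))
```

Keeping the rules in JSON rather than in code means a new project can be added to the portfolio without editing the script.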

2.6

epub_to_md.py

EPUB → markdown

Token-efficient EPUB extraction with image preservation and metadata.

Acquisition

2.7

extract_all_bundles.py

Bulk EPUB ingestion

Inbox → library batch extraction with optional skip list.

Acquisition

2.8

fix_image_paths.py

Path normalization utility

Normalizes image references in content.md after import. Run when image paths drift.

Utility
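Path drift typically means image links still point at the original EPUB layout. The sketch below rewrites every local image reference to a single folder next to content.md. The target folder name `images` and the function name are assumptions, and this is not the logic of fix_image_paths.py itself.

```python
import posixpath
import re

# Matches ![alt](target) where the target has no spaces or title text.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def normalize_image_paths(markdown, image_dir="images"):
    """Point local image links at image_dir/<filename>; leave remote URLs alone."""

    def fix(match):
        alt, target = match.group(1), match.group(2)
        if URL_RE.match(target):
            return match.group(0)  # remote image, keep as written
        filename = posixpath.basename(target.replace("\\", "/"))
        return f"![{alt}]({image_dir}/{filename})"

    return IMAGE_RE.sub(fix, markdown)
```

For example, `![fig](../OEBPS/Images/fig1.png)` becomes `![fig](images/fig1.png)`. Links that carry a title string after the path are left untouched by this pattern.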

— III

Templates — schema starting points

Copy these into your library and fill in your content. The schemas are the load-bearing convention; the body is yours.

3.1

Book Frontmatter

YAML schema for content.md

Title, authors, publisher, ISBN, category, acquisition tier, projects, tags. The minimum viable schema for a book entry.

Schema
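For orientation, here is what a frontmatter block covering those fields might look like, along with a small stdlib-only reader. The key spellings (for example `acquisition_tier`) and every value are placeholders, so treat the template file as the authority. The parser handles only flat `key: value` pairs and inline `[a, b]` lists; a real library would more likely use a full YAML parser.

```python
# Placeholder frontmatter; key names and values are illustrative assumptions.
SAMPLE = """---
title: Example Book
authors: [Jane Doe, John Roe]
publisher: Example Press
isbn: "0000000000000"
category: software-engineering
acquisition_tier: A
projects: [backend-api-refresh]
tags: [error-handling, testing]
---
Body text starts here.
"""


def split_frontmatter(text):
    """Return (metadata dict, body) for text that starts with a --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key: value line
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip().strip("\"'") for v in items if v.strip()]
        else:
            meta[key.strip()] = value.strip("\"'")
    return meta, "\n".join(lines[end + 1:])


meta, body = split_frontmatter(SAMPLE)
print(meta["title"], meta["acquisition_tier"], meta["tags"])
```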

3.2

INSIGHTS Template

What an extracted INSIGHTS file looks like

Frontmatter + sections by use case (not chapter order) + project relevance summary at the end. This is the pattern the model follows.

Schema

3.3

Synthesis Doc

Cross-book pattern document

Consensus, disagreements, "best of," failure modes, open questions, project application notes.

Schema

3.4

Project Map

Per-project tier-ranked reading list

Tier 1 (load-bearing), Tier 2 (consult on demand), Tier 3 (context). Drives the session quick-start tool.

Schema

3.5

Inventory

Top-level catalog format

Auto-generated by regenerate_inventory.py; this template shows the format.

Reference

3.6

Tagging Config

JSON config for tag_library.py

Maps library categories + specific book slugs + chapter keywords to project tags. Customize for your portfolio.

Config

— IV

Examples — see what the artifacts actually look like

Three fully anonymized examples using real foundational books (Pragmatic Programmer, Clean Code, Effective Java) so you can see the format in action.

4.1

INSIGHTS Example

The Pragmatic Programmer (20th Anniversary Edition)

A fully worked INSIGHTS file. 10 patterns extracted, organized by use case, with project relevance summary.

Highest signal

4.2

Synthesis Example

Error handling across 3 books

A cross-book synthesis on error handling, drawing from Pragmatic Programmer + Clean Code + Effective Java.

Pattern

4.3

Project Map Example

Backend API Refresh

A realistically shaped project map with tier-ranked books, synthesis doc references, extracted key patterns, and coverage notes.

Pattern

— for non-GitHub readers

Send a quick note.

Adopting this methodology yourself? Hit a problem with the tools? Have a war story? This form goes straight to the maintainer.

If you have a GitHub account, opening an issue is preferred. This form is the path for everyone else.