Building a Second Brain with LLMs
I didn't plan to build a second brain tonight. I stumbled on Andrej Karpathy's LLM Wiki gist, published 6 days ago and already at 5000+ stars. Two hours later I had a working knowledge graph of my own work running in Obsidian.
Here's exactly what I built and how.
The problem with RAG
Most LLM + personal knowledge setups work like this:
source → chunks → vectors → retrieve at query time → LLM reasons from scratch
Every query starts cold. The LLM re-derives answers from raw chunks every single time. Nothing accumulates. Ask the same question 100 times; it does 100x the work.
Karpathy's insight is a single distinction:
RAG retrieves. LLM Wiki compiles.
Instead of storing chunks, the LLM reads your source once, understands it, and writes structured markdown pages: summaries, entity pages, cross-references, and flagged contradictions. The wiki becomes a persistent, compounding artifact. Every new source makes it richer. Every good answer gets filed back in.
The human curates sources and asks questions. The LLM does all the bookkeeping.
Architecture
Three layers:
| Layer | What it is | Who writes it |
|---|---|---|
| raw/ | Immutable source documents | You |
| wiki/ | Structured markdown pages | LLM |
| CLAUDE.md | Schema + conventions + workflows | You + LLM co-evolve |
The wiki is just a git repo of markdown files. Obsidian is the renderer. Claude Code is the architect: it decides what pages exist, what they contain, and how they connect.
Directory structure
second-brain/
├── CLAUDE.md
├── raw/
│   └── assets/
└── wiki/
    ├── index.md          # master catalog; LLM reads this first on every query
    ├── log.md            # append-only ingest history
    ├── projects/         # one page per active repo
    ├── data-engineering/ # Spark, Iceberg, Delta, DuckDB
    ├── infra/            # Databricks, Terraform, AWS
    ├── ai/               # LLMs, RAG, agents, MCP
    └── synthesis/        # cross-domain analysis
CLAUDE.md: the schema file
This is the key piece. It tells Claude Code how to structure pages, what frontmatter to use, and how to maintain the wiki. Without it the LLM is a generic chatbot. With it, it becomes a disciplined wiki maintainer.
# Wiki Instructions
You are the maintainer of this wiki. When given a source to ingest,
follow the schema below. Always update index.md and append to log.md.
## Frontmatter
### projects/<slug>.md
---
title: ""
status: active | paused | archived
repo: ""
stack: []
domain: data-engineering | infra | ai
summary: ""
---
### domain pages
---
title: ""
tags: []
related_projects: []
last_updated: YYYY-MM-DD
---
## Log Format
## [YYYY-MM-DD] ingest | <title>
### What
### Why / decision
### Links
## Conflict Resolution
If a new source contradicts an existing wiki page:
1. Do not overwrite the old information immediately.
2. Create a `## Contradictions / Evolution` section at the bottom of the page.
3. Note the discrepancy: "Source [A] claims X, but Source [B] claims Y."
4. If one is clearly an update (e.g. a version change), move old info to an
`## Archive` foldout and update the main summary.
## File Verification
Always use `find wiki/ -name "*.md"` to verify file existence rather than
relying solely on index.md, because the index may lag on large ingest sessions.
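To make the frontmatter schema concrete, here is a minimal sketch of a check you could run over project pages. It is my own illustration, not part of Karpathy's gist or of Claude Code: `parse_frontmatter` and `check_project_page` are hypothetical helpers, and the parser only handles flat `key: value` lines rather than full YAML.

```python
# Sketch only: a tiny frontmatter checker for projects/<slug>.md pages.
# Assumption: frontmatter is flat "key: value" lines between '---' fences.

PROJECT_KEYS = {"title", "status", "repo", "stack", "domain", "summary"}
ALLOWED = {
    "status": {"active", "paused", "archived"},
    "domain": {"data-engineering", "infra", "ai"},
}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return the key/value pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return {}  # closing fence never found


def check_project_page(text: str) -> list[str]:
    """List schema problems; an empty list means the page passes."""
    fields = parse_frontmatter(text)
    problems = [f"missing key: {k}" for k in sorted(PROJECT_KEYS - set(fields))]
    for key, allowed in ALLOWED.items():
        if key in fields and fields[key] not in allowed:
            problems.append(f"invalid {key}: {fields[key]!r}")
    return problems
```

A check like this is cheap to run after a large ingest, when schema drift is most likely to creep in.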
The conflict resolution rule matters more than it looks. Without it the LLM silently overwrites your past thinking. With it, your wiki tracks how your understanding evolved over time:
2024: "Use Spark for everything."
2026: "Spark is overkill; use DuckDB for local analytics."
Instead of losing the Spark context, the wiki writes: "Evolved from Spark-heavy infra to local-first DuckDB for cost-efficiency." That's the synthesis layer.
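The append-instead-of-overwrite rule can be sketched in a few lines. This is an illustration of the behaviour the schema asks the LLM for, not code that Claude Code runs. `record_contradiction` is a hypothetical helper, and it assumes the `## Contradictions / Evolution` section, once created, is always the last section on the page.

```python
SECTION = "## Contradictions / Evolution"


def record_contradiction(page: str, old_source: str, old_claim: str,
                         new_source: str, new_claim: str) -> str:
    """Append a discrepancy note instead of overwriting the existing text."""
    note = (f'- Source [{old_source}] claims "{old_claim}", '
            f'but Source [{new_source}] claims "{new_claim}".')
    body = page.rstrip("\n")
    # Create the section on first use; afterwards notes just accumulate.
    if SECTION not in body:
        body += f"\n\n{SECTION}"
    return f"{body}\n{note}\n"
```

The original page text is never touched, which is what keeps the older Spark-era reasoning readable next to the newer DuckDB one.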
Setup
mkdir -p ~/second-brain/{raw/assets,wiki/{projects,data-engineering,infra,ai,synthesis}}
touch ~/second-brain/wiki/index.md
touch ~/second-brain/wiki/log.md
touch ~/second-brain/CLAUDE.md
# paste schema into CLAUDE.md
Open ~/second-brain as a vault in Obsidian (File → Open Vault → Open folder as vault). Then:
cd ~/second-brain
claude
First ingest
Once Claude Code is running, give it a natural language instruction. Note that ingest is not a CLI subcommand; it's a prompt you type to the agent:
ingest this source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Claude Code fetches the gist, extracts the architecture, use cases, and key distinctions, and writes wiki/ai/llm-wiki-pattern.md. It updates index.md and appends to log.md. One source; multiple pages touched.
The log entry looks like:
## [2026-04-10] ingest | LLM Wiki Pattern (Karpathy)
### What
Created wiki/ai/llm-wiki-pattern.md. Initialized directory structure.
### Why / decision
First source. Placed in ai/ β primarily about LLM-driven knowledge management.
### Links
- Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- Wiki page: ai/llm-wiki-pattern.md
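If you ever want to write log entries from a script instead of through the agent, the format is simple enough to generate. The sketch below is an assumption on my part about how one might do it. `format_log_entry` and `append_log_entry` are hypothetical names, and the only property taken from the schema is that the log is append-only.

```python
from datetime import date
from pathlib import Path


def format_log_entry(day: date, title: str, what: str, why: str,
                     links: list[str]) -> str:
    """Render one entry in the '## [YYYY-MM-DD] ingest | <title>' format."""
    link_lines = "\n".join(f"- {link}" for link in links)
    return (f"## [{day.isoformat()}] ingest | {title}\n"
            f"### What\n{what}\n"
            f"### Why / decision\n{why}\n"
            f"### Links\n{link_lines}\n")


def append_log_entry(log_path: Path, entry: str) -> None:
    """Open in append mode so earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
```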
Ingesting your own work
The real value comes when you ingest your own projects and history.
ingest this source: https://github.com/chanukyapekala/clipai
ingest this source: https://github.com/delta-skills
Note on large repos: A large repository can exceed Claude's context window in a single pass. For big sources, guide the agent explicitly: "process this repo directory by directory, starting with the README and core source folders." This helps avoid a truncated synthesis.
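To see what "directory by directory" could look like in practice, here is a rough sketch that groups a checked-out repo into batches. It is not how Claude Code chunks anything internally. `batch_by_directory` and its `max_chars` budget are hypothetical, and file size in bytes is only a crude stand-in for how much context a file uses.

```python
from collections import defaultdict
from pathlib import Path


def batch_by_directory(repo: Path, max_chars: int = 200_000) -> list[list[Path]]:
    """Group files per directory, splitting any directory over the budget."""
    by_dir: dict[Path, list[Path]] = defaultdict(list)
    for path in sorted(repo.rglob("*")):
        # Skip git internals; everything else is grouped under its parent.
        if path.is_file() and ".git" not in path.relative_to(repo).parts:
            by_dir[path.parent].append(path)

    batches: list[list[Path]] = []
    # The repo root sorts first, so a top-level README lands in the first batch.
    for directory in sorted(by_dir):
        current: list[Path] = []
        used = 0
        for path in by_dir[directory]:
            size = path.stat().st_size  # bytes, used as a character estimate
            if current and used + size > max_chars:
                batches.append(current)
                current, used = [], 0
            current.append(path)
            used += size
        batches.append(current)
    return batches
```

Each batch can then be handed to the agent as its own ingest instruction.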
Ingesting your Claude Code conversation history is a killer move, but it comes with a caveat:
ingest my claude history from ~/.claude
Privacy warning: Your Claude history likely contains API keys, temporary tokens, or internal details. Before the LLM commits anything to a git-versioned wiki, scan the output:
grep -rE "sk-|Bearer|password" wiki/
Remove any sensitive lines it flags. Keep your wiki repo private if in doubt.
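The same scan can be scripted so it reports file and line number, which helps when the ingest has touched many pages. This is a sketch using the same three markers as the grep above. `find_suspect_lines` is a hypothetical helper, and real secrets come in many more shapes than these patterns catch.

```python
import re
from pathlib import Path

# Same three markers as the grep; deliberately broad, so expect false positives.
SECRET_PATTERN = re.compile(r"sk-|Bearer|password")


def find_suspect_lines(wiki_dir: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every suspicious line."""
    hits = []
    for page in sorted(wiki_dir.rglob("*.md")):
        text = page.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if SECRET_PATTERN.search(line):
                hits.append((page.relative_to(wiki_dir).as_posix(), number, line))
    return hits
```

An empty result does not prove the wiki is clean; it only means none of these three markers appeared.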
That said, once sanitized, this single ingest compiles everything you've built, every decision you've made, and every technology you've explored into structured wiki pages automatically.
The graph
After ingesting my Claude history and a few repos, Obsidian's graph view showed:

delta-lake → databricks-terraform → iceberg → duckdb-dbt → ashre-spec → clipai-cli → slides-agent
All connected. Obsidian renders the graph from [[wikilinks]], but Claude Code is the architect that decided those connections exist. The LLM wrote the topology; Obsidian draws it.
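The topology really is just text. As a minimal sketch, the function below recovers the edges from page contents by pulling out wikilink targets. `link_graph` is a hypothetical name, and this is not Obsidian's own parser; it assumes the common `[[target]]`, `[[target|alias]]` and `[[target#heading]]` forms.

```python
import re

# Capture the target; skip over an optional #heading and an optional |alias.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def link_graph(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page name to the set of page names it links to."""
    return {
        name: {target.strip() for target in WIKILINK.findall(text)}
        for name, text in pages.items()
    }
```

Feeding it a dict of page name to page text gives an adjacency map you can inspect without opening Obsidian.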
The moment it clicked
I asked Claude Code to synthesize patterns across my projects. It surfaced 8 patterns without any prompting, including:
- Poetry as my default Python stack
- Local-first, offline-capable tooling preference (Ollama over API keys)
- Learning-through-building as a consistent method
- Personal itch → publishable tool as a recurring arc
- Deliberate adoption of emerging tools before mainstream (Scala 3, DuckDB, Iceberg, Astro)
Things I knew implicitly but had never articulated.
A RAG system would have returned chunks about Poetry and DuckDB. The wiki returned insight about how I build.
Operations
Ingest: give Claude Code a source URL or file path. It reads, extracts, writes pages, and updates the index and log. One source can touch 10 to 15 pages.
Query: ask against the wiki, for example:
what do I know about MCP patterns from my own work?
what are the tradeoffs between Genie Code and CLI for my pipelines?
Claude Code reads index.md, finds relevant pages, synthesizes with citations. File good answers back:
save this analysis as wiki/synthesis/genie-vs-cli.md
Lint: a periodic health check, for example:
audit my wiki for orphan pages, broken links, and contradictions
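Two of those three checks are mechanical, so you can also run them without the agent. The sketch below is a hypothetical `lint_wiki` helper, not something Claude Code ships. It assumes page file stems are unique and that links from index.md and log.md should not count when deciding whether a page is an orphan. Contradictions still need the LLM to read the pages.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def lint_wiki(
    wiki_dir: Path, ignore: frozenset[str] = frozenset({"index", "log"})
) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (orphan pages, broken links as (page, missing target))."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}
    inbound: set[str] = set()
    broken: list[tuple[str, str]] = []
    for name, text in sorted(pages.items()):
        for raw in WIKILINK.findall(text):
            target = raw.strip().split("/")[-1]  # accept path-style links too
            if target not in pages:
                broken.append((name, target))
            elif name not in ignore and target != name:
                inbound.add(target)
    # A page is an orphan when no regular page links to it.
    orphans = sorted(set(pages) - inbound - ignore)
    return orphans, broken
```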
What about search at scale?
Right now index.md is enough: Claude Code reads the full catalog on every query. This works reliably up to ~100 pages.
Before you even hit 100 pages, instruct Claude Code to verify file existence with find rather than trusting the index alone, because the index can lag during large ingest sessions. This is already in the CLAUDE.md above.
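The lag is easy to measure yourself. The sketch below compares what index.md mentions with what is actually on disk. `index_drift` is a hypothetical helper, and it assumes the index refers to pages by their path relative to wiki/ (for example `ai/llm-wiki-pattern.md`), which may not match how your index is written.

```python
import re
from pathlib import Path

MD_PATH = re.compile(r"[\w./-]+\.md")
SKIP = {"index.md", "log.md"}  # bookkeeping files, not content pages


def index_drift(wiki_dir: Path) -> tuple[list[str], list[str]]:
    """Return (pages on disk but not in index.md, index entries with no file)."""
    index_text = (wiki_dir / "index.md").read_text(encoding="utf-8")
    listed = set(MD_PATH.findall(index_text)) - SKIP
    # Same set of files that `find wiki/ -name "*.md"` would report.
    on_disk = {
        p.relative_to(wiki_dir).as_posix() for p in wiki_dir.rglob("*.md")
    } - SKIP
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

Two empty lists mean the index and the file system agree.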
When the wiki grows beyond ~100 pages, qmd adds a local search layer: a BM25 + vector hybrid with an MCP server. Claude Code calls it as a native tool instead of scanning index.md. Add it when the wiki earns it.
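For intuition about the keyword half of that hybrid, here is a small Okapi BM25 scorer in plain Python. It is emphatically not qmd's implementation. `bm25_scores` is a hypothetical function, tokenization is a naive lowercase split on whitespace, and `k1` and `b` are set to commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return {}
    tokens = {name: text.lower().split() for name, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for name, terms in tokens.items():
        counts = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            freq = counts[term]
            if freq == 0:
                continue
            n = doc_freq[term]
            idf = math.log((len(tokens) - n + 0.5) / (n + 0.5) + 1)
            # Longer-than-average documents are penalized through b.
            norm = freq + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores[name] = score
    return scores
```

Rare terms get a higher weight than common ones, which is why a query word that appears on only one page pulls that page to the top.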
Version it
cd ~/second-brain
printf 'raw/\n.manifest.json\n.env\n' > .gitignore
git init
git add wiki/ CLAUDE.md .gitignore
git commit -m "initial wiki"
git remote add origin https://github.com/<you>/second-brain.git
git branch -M main   # make sure the local branch is named main before pushing
git push -u origin main
Keep it private if you've ingested Claude history. Every ingest is a commit. You can watch your knowledge base grow over time.
Why this works
The tedious part of maintaining a knowledge base is not the reading or the thinking; it's the bookkeeping: updating cross-references, keeping summaries current, flagging contradictions. Humans abandon wikis because the maintenance burden grows faster than the value.
LLMs don't get bored. They don't forget to update a cross-reference. They can touch 15 files in one pass.
The wiki stays maintained because the cost of maintenance is near zero.
Karpathy traces the idea back to Vannevar Bush's 1945 Memex, a personal knowledge store with associative trails between documents. Bush's unsolved problem was who does the maintenance. Turns out: the LLM.
Original gist by Andrej Karpathy: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f