
Why AI Forgets Everything — 3 Open-Source Solutions to the Memory Crisis

karpathy-skills, claude-mem, Cognee — comparing 3 approaches to solving the AI memory problem.


Between an AI that needs constant reminders to "use formal Korean" and one that remembers forever after being told once — which would you choose?

The AI Memory Problem

I spent 10 hours working on a project with Claude Code: fixing bugs, making architectural decisions, establishing rules like "we're doing it this way." Then the session ended, I came back the next day, and the AI had forgotten everything.

This isn't just an inconvenience. It's a structural loss of productivity.

Every session, the same things repeat:

  • "Write Korean posts in formal style"
  • "Follow this pattern for Sanity upload scripts"
  • "This project's slug convention is no suffix for Korean, -en for English"
  • "I started building that feature two days ago... where did I leave off?"

LLMs have the world's best contextual understanding, but memory like a goldfish. When a session ends, everything evaporates.

Three open-source projects tackling this problem are exploding on GitHub right now. Each takes a different approach, and comparing them reveals the true nature of the "AI memory" problem.

1. andrej-karpathy-skills — One CLAUDE.md File to Rule Them All

GitHub: forrestchang/andrej-karpathy-skills

Stars: 48,900+ (7,900 in one day)

What is it

In January 2026, Andrej Karpathy observed and documented recurring failure patterns in LLM coding agents, distilled into a single CLAUDE.md file. Place this file in your project root and Claude Code automatically reads and follows these rules.

Karpathy's 4 LLM Failure Patterns

| Pattern | Description |
|---|---|
| Silent Assumptions | Makes assumptions and runs with them without verification |
| Overengineering | Turns 100-line solutions into 1,000 lines |
| Scope Creep | Touches code you never asked for |
| Lack of Judgment | Syntax is correct but judgment is absent |

The 4 Rules

Rule 1: Think Before Coding

Explicitly state assumptions before coding. If uncertain, ask. When multiple interpretations are possible, present options — don't silently pick one.

Rule 2: Simplicity First

Implement only what's requested. No future flexibility, no configurability, no error handling for impossible scenarios. If a 200-line solution can be 50 lines, rewrite it.

Rule 3: Surgical Changes

When editing existing code, don't "improve" adjacent code. Follow existing style even if you don't like it. If you find unrelated dead code, mention it but don't delete it.

Rule 4: Goal-Driven Execution

Convert every task into verifiable goals:

  • "Add validation" → "Write tests for invalid inputs and make them pass"
  • "Fix bug" → "Write a test that reproduces the bug and make it pass"
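The conversion above can be made concrete with a test-first sketch. Everything here is illustrative — `normalize_slug` is a hypothetical stand-in for any project function, using the slug convention mentioned earlier as the example goal:

```python
# Hypothetical example: "Fix bug" restated as tests that must pass.
# `normalize_slug` is an illustrative stand-in, not code from any of these repos.

def normalize_slug(title: str, lang: str) -> str:
    """Build a slug: no suffix for Korean, '-en' appended for English."""
    slug = title.lower().replace(" ", "-")
    return slug if lang == "ko" else f"{slug}-en"

def test_korean_slug_has_no_suffix():
    # The verifiable goal: this assertion passing is what "done" means.
    assert normalize_slug("메모리 문제", "ko") == "메모리-문제"

def test_english_slug_gets_en_suffix():
    assert normalize_slug("Memory Problem", "en") == "memory-problem-en"
```

The point of the rule is that "add validation" is unfalsifiable, while "make these two tests pass" is not.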

Relationship to Memory Problem

karpathy-skills is the most primitive solution to the memory problem: "Let's write down the rules AI keeps forgetting." Since CLAUDE.md auto-loads at session start, these become rules the AI "cannot forget."
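A minimal CLAUDE.md encoding these "cannot forget" rules might look like this — the contents are an illustrative condensation of the four rules above, not the actual file from the repository:

```markdown
# CLAUDE.md — project rules (auto-loaded at session start)

## Before coding
- State assumptions explicitly; if uncertain, ask.
- When multiple interpretations exist, present options.

## Scope
- Implement only what is requested; no speculative flexibility.
- Do not "improve" adjacent code; follow the existing style.

## Verification
- Convert every task into a test that must pass before it counts as done.
```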

The limitations are clear:

  • Manual. Humans must write the rules
  • Static. Can't capture context that accumulates as projects evolve
  • Size-limited. Not recommended beyond 200 lines

Yet it earned 48K stars because this alone raised coding accuracy from 65-70% to 91-94%, according to community reports. Simple but powerful.

2. claude-mem — Automatic Cross-Session Context Memory

GitHub: thedotmack/claude-mem

Stars: 59,300+ (peaked at 62.7K)

What is it

claude-mem is a persistent memory plugin that automatically captures all Claude Code activity, compresses it with AI, and auto-injects it into the next session. If karpathy-skills is "write down the rules," claude-mem is "record everything automatically."

How it works

Session Start → Auto-inject recent work history (800~3,000 tokens)
  ↓
During Work → Background AI compression of all tool call results
  ↓
File Read → Auto-inject past memories related to that file
  ↓
Session End → Generate and save session summary

The core is 5 lifecycle hooks:

| Hook | Trigger | Action |
|---|---|---|
| SessionStart | Session start | Inject 50 recent observations + 10 session summaries |
| UserPromptSubmit | Prompt input | Session logging |
| PostToolUse | After tool execution | Send results to background worker for AI compression |
| Stop | Interrupt/idle | Generate session-level summary |
| SessionEnd | Session end | Finalize metadata |

Most Innovative Feature: PreToolUse:Read

When Claude reads a file, it auto-injects past observations about that file. In other words, memory follows the agent's gaze. Even the official docs call this feature "genuinely novel."
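The idea can be modeled in a few lines of Python. This is a simplified illustration of file-scoped memory injection, not claude-mem's actual implementation — the class and data structures are invented for this sketch:

```python
# Illustrative model of "memory follows the agent's gaze" — NOT claude-mem's
# real code. Observations are indexed by file path; when the agent reads a
# file, past observations about that path are returned for context injection.

from collections import defaultdict

class FileScopedMemory:
    def __init__(self):
        self._by_path = defaultdict(list)  # file path -> list of past notes

    def observe(self, path: str, note: str) -> None:
        # Called as work happens, e.g. from a post-tool-use hook.
        self._by_path[path].append(note)

    def on_read(self, path: str) -> list[str]:
        # Called before the agent's Read tool runs; result is injected.
        return list(self._by_path.get(path, []))

mem = FileScopedMemory()
mem.observe("upload.py", "Slug convention: no suffix for Korean, -en for English")
print(mem.on_read("upload.py"))  # past notes about this file resurface
```

The hook-driven version differs mainly in plumbing (stdin/stdout JSON, a database instead of a dict), but the lookup-by-gaze shape is the same.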

3-Layer Progressive Disclosure

For token efficiency, it doesn't load all memories at once:

  1. Search (Layer 1): Return only observation IDs (~50-100 tokens)
  2. Timeline (Layer 2): Provide chronological context
  3. Detail (Layer 3): Full load only selected observations (~500-1,000 tokens)

Result: 5,250 tokens used where 25,000 would be needed (80% savings).
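The three layers amount to "return IDs cheaply, pay for full text only on demand." A minimal sketch of that pattern (store, matching logic, and token counts are illustrative, not claude-mem's internals):

```python
# Sketch of 3-layer progressive disclosure — illustrative only.
# Layer 1 returns IDs; Layer 2 orders them; Layer 3 loads full text.

OBSERVATIONS = {
    "obs-1": {"ts": 1, "text": "Fixed slug bug in upload script"},
    "obs-2": {"ts": 2, "text": "Decided: formal style for Korean posts"},
    "obs-3": {"ts": 3, "text": "Refactored Sanity client wrapper"},
}

def search(query: str) -> list[str]:
    # Layer 1: cheap — matching IDs only (tens of tokens, not full text).
    return [oid for oid, o in OBSERVATIONS.items() if query in o["text"].lower()]

def timeline(ids: list[str]) -> list[str]:
    # Layer 2: chronological ordering for context.
    return sorted(ids, key=lambda oid: OBSERVATIONS[oid]["ts"])

def detail(ids: list[str]) -> list[str]:
    # Layer 3: expensive — full text, only for the observations selected.
    return [OBSERVATIONS[oid]["text"] for oid in ids]

hits = timeline(search("slug"))
print(detail(hits))
```

Because most queries stop at Layer 1 or 2, the expensive full-text load happens only for the handful of observations that actually matter — which is where the claimed ~80% token savings comes from.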

Installation

```bash
npx claude-mem install
```

This single line completes SQLite + ChromaDB setup, hook registration, and worker service startup.

Difference from karpathy-skills

| | karpathy-skills | claude-mem |
|---|---|---|
| Method | Manual rule writing | Auto-capture + AI compression |
| Scope | Immutable rules | All work history |
| Size | ~200 lines | Unlimited (DB) |
| Updates | Manual | Automatic |
| Search | None (full load) | Semantic + keyword |
| Cost | Free | $0.002-0.01 per compression |

Limitations

Weaknesses identified in source code reviews:

  • Zero knowledge integrity verification
  • Zero quality/trust scoring
  • Zero append-only protection

Powerful, but insufficient for enterprise-grade security environments.

3. cognee — Learning Memory Engine for AI Agents

GitHub: topoteretes/cognee

Stars: 16,400+

What is it

cognee is an AI memory engine that transforms unstructured data into a learning, evolving knowledge system. If claude-mem is "personal memory," cognee is closer to "organizational knowledge."

AI Memory in 6 Lines

```python
import cognee, asyncio

async def main():
    await cognee.remember("Cognee transforms documents into AI memory.")
    results = await cognee.recall("What does Cognee do?")
    for result in results:
        print(result)

asyncio.run(main())
```

Behind this simplicity lies a sophisticated pipeline:

  • remember: Data → embeddings + graph nodes → persistent storage
  • recall: Auto-route queries to optimal search strategy (vector/graph hybrid)
  • forget: Selective deletion including relationship cleanup
  • improve: Update knowledge structure via feedback-based learning

RAG vs. cognee

| | Traditional RAG | cognee |
|---|---|---|
| Memory Type | Static documents | Learning, evolving knowledge graph |
| Search | Vector similarity only | Vector + graph traversal hybrid |
| Context | Single session | Cross-session, cross-agent |
| Learning | None | Continuous improvement via improve() |
| Relationships | None | Explicit concept connections + ontology |
| Multi-agent | Isolated | Tenant isolation + shared knowledge |

Core Differentiator: Learning Memory

RAG retrieves. cognee learns and reasons. It doesn't just find similar documents — it understands how concepts connect and adapts based on outcomes.
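The difference is easiest to see in a toy: similarity search finds one node, then graph traversal pulls in explicitly connected concepts that pure vector search would miss. This is a minimal sketch of the hybrid idea, not cognee's implementation — the nodes and edges are invented for illustration:

```python
# Toy hybrid retrieval: a vector hit (stand-in) expanded via explicit edges.
# Illustrative only — not cognee's actual data model or API.

EDGES = {
    ("cognee", "knowledge graph"),
    ("knowledge graph", "embeddings"),
}

def neighbors(node: str) -> set[str]:
    # Edges are treated as undirected for this sketch.
    return {b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node}

def hybrid_recall(seed: str, hops: int = 1) -> set[str]:
    # Step 1 (vector stand-in): assume similarity search returned `seed`.
    # Step 2 (graph): expand along explicit relationships, hop by hop.
    found = {seed}
    for _ in range(hops):
        found |= {n for node in found for n in neighbors(node)}
    return found

print(sorted(hybrid_recall("cognee", hops=2)))
```

A vector-only search for "cognee" would return only the closest documents; the graph hop also surfaces "embeddings," which is related through an explicit chain of relationships rather than surface similarity.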

Use Cases

  • Customer Support Agents: Connect past consultation histories in graphs, auto-retrieve validated solutions from similar cases
  • Expert Knowledge Transfer: Capture senior analyst SQL patterns, auto-provide to juniors
  • Multi-agent Research: Share knowledge between agents, prevent duplicate research

Comparing the 3 Tools

These three tools solve different layers of the same problem:

karpathy-skills (Rules)
  └── "Do it this way" — Immutable principles
        ↓
claude-mem (Memory)
  └── "Yesterday I did this" — Personal history
        ↓
cognee (Knowledge)
  └── "This connects to that like this" — Organizational knowledge

| Dimension | karpathy-skills | claude-mem | cognee |
|---|---|---|---|
| Metaphor | Post-it note | Diary | Encyclopedia |
| Target | Individual developers | Individual/team | Team/organization |
| Automation | Manual | Fully automatic | API-based |
| Memory Structure | Flat text | Time-series observations | Knowledge graph |
| Learning | None | Pattern recognition | Feedback-driven evolution |
| Token Efficiency | Full load | Progressive Disclosure | Auto-routing |
| Stars | 48.9K | 59.3K | 16.4K |
| Setup Difficulty | Copy file | One npx line | pip install |

Which Should You Use?

  • Want to improve Claude Code quality right now? → karpathy-skills. One file copy and done
  • Work on the same project daily and hate losing context? → claude-mem. 5-minute install, positive ROI after 3-5 sessions
  • Running AI agents team-wide and need accumulating knowledge? → cognee. Learning memory is the next step beyond RAG

Real-World Combination Recommendation

The most powerful approach is using all three together:

  1. Put karpathy-skills' 4 rules in CLAUDE.md to secure baseline quality
  2. Use claude-mem to automatically maintain cross-session history
  3. Use cognee to build team/org-level knowledge and inject into agents

CLAUDE.md (Rules layer)
  + claude-mem (Memory layer)
    + cognee (Knowledge layer)
= AI that doesn't forget

Closing: Toward AI that Remembers

If 2024's question was "Can AI write code?", 2026's question is "Can AI remember what it did yesterday?"

The 48K stars on karpathy-skills and 59K on claude-mem show how urgent this problem is. Developers no longer ask "Does AI write code well?" They ask: "Does AI maintain context?"

None of these tools are perfect. karpathy-skills is manual, claude-mem lacks enterprise-grade security, and cognee has setup complexity. But they all point in the same direction: The next leap for LLMs isn't bigger models — it's better memory.
