What if your agent had an evening — a daily reflection loop instead of summaries | Blog

The morning the agent didn't recognize itself

Every morning I open Claude Code and sit down across from a colleague who doesn't remember a single day we've worked together.

Yesterday we spent three hours debugging why that retry belongs in the service layer and not in the controller. We looked at it from every angle, reached a decision, jotted down a few preferences for how to structure error handling. This morning the agent asks me whether I'd rather move that retry into the controller.

It's not a bug, it's by design. Every Claude Code session starts with a clean context window. Yesterday's decisions, the preferences we dialed in, the "and why this way and not another" — gone. No magic holds continuity across sessions; two files do: CLAUDE.md, which I write, and auto-memory, which the agent writes for itself. And yet every morning I feel like I'm starting from zero.

I have plenty of memory, what I'm missing is reflection

Auto-memory exists, and in practice it suits me better than I expected. It's a directory of markdown files in ~/.claude/projects/<project>/memory/ — an index, MEMORY.md, plus topic-specific files. At the start of a session only the index loads — on the order of the first two hundred lines, around 25 KB — and the agent pulls in the details when it actually needs them. Plain text: I can read it, rewrite it, grep it. No black box.

On top of that, the agent doesn't store everything. It decides for itself what's worth remembering. That's exactly what I want.

And yet something is missing. When I think it through, the problem isn't a lack of memory. A note that says "today we did this and that" isn't the same as the distillation "and here's what that means for tomorrow." I have a log of events. I want conclusions.

Every summary strips away signal

When a session nears the context limit, auto-compaction kicks in: first the old tool-outputs get dropped — which, according to community analyses of how Claude Code works, can easily make up 60 to 80 percent of the context — and then the rest of the conversation gets summarized. It sounds reasonable until you realize exactly what disappears: specific file paths, decisions we hammered out together, the state of work in progress.

And it's not just about a single compaction. Imagine today's session has already gone through compaction during the day, so it's a summary in its own right. When I pull a daily note out of it in the evening, I'm summarizing a summary. And tomorrow that note gets summarized again — by day two I'm already working with a summary of a summary. It's a game of telephone: every round of lossy summarization strips away a piece of the signal, until after a few days all that's left is a blurry memory of something that used to be precise. Chroma's study on context rot confirms it from the other direction — output quality drops as context length grows, and that holds across all 18 frontier models they tested.

That kills the intuition that it's enough to just write more down. A summary is compression of everything. Reflection is selection: what mattered, what kept coming up, what's worth carrying into the next day. More notes don't make you a smarter agent — sometimes it's exactly the opposite.

Someone has already thought this through

The evening reflection loop isn't sci-fi; academics described it back in 2023.

Take that exact morning problem — the agent asking about a retry we solved yesterday. Reflexion (Shinn et al., NeurIPS 2023) addresses precisely this: the agent verbally reflects on feedback and stores it in an episodic memory buffer, so next time it reads what tripped it up last time. No fine-tuning, no retraining of weights — all through text.

Generative Agents (Park et al., 2023) went further. A memory stream, retrieval by recency, importance, and relevance, and above all recursive synthesis of observations into reflections — the so-called reflection tree. The agent assembles higher-level conclusions out of small observations — exactly that step from "what happened" to "what holds because of it."

Sleep-time compute from the folks around Letta/MemGPT goes in yet another direction: a separate agent, off to the side and offline, consolidates memory. It deduplicates, rewrites, prunes. It turns "raw context" into "learned context." Exactly the step I'm missing in my day-to-day practice.

What if the agent had an evening of its own

What comes next I haven't tried — I'm treating it as a thought experiment, not a how-to. Thinking out loud: what if I gave the agent that evening?

Imagine that at the end of the day the agent looks back. Not to summarize everything — to select. It goes through the conversation, the decisions, the notes it gathered during the day. It asks itself: which decision did I second-guess and then come back to? What will I want to know here tomorrow without having to ask?

And it writes that handful of conclusions, structured, into a layer that gets sharper every day. Not a "what happened" log, but a "what holds because of it" distillate. Instead of adding to memory, it would rewrite it. In the morning it then wouldn't start by reading everything — it would start with what yesterday's self deemed essential.

Here's how I picture it: a Stop or SessionEnd hook at the end of the work fires off a separate reflection session. That session gets the day's conversation and the brief "don't write a log, pick three to five conclusions and merge them with what's already in memory." The output rewrites a topic file in auto-memory rather than setting it from scratch. Instead of a hook, a cloud cron via /schedule or /loop could handle it just as well — but the mechanism is beside the point. What's missing is the intent: to separate reflection from summarization.

The trap of an archive of junk

And now the honest objection, so this doesn't come off too tidy. A naive "write everything down" would end up worse than nothing.

You know it from Zettelkasten as the collector's fallacy: having notes isn't knowledge. Knowledge only emerges through processing — rewriting, connecting, distilling. A pile of notes is just a pile. And with an agent there's one extra catch: contradictory rules. When you have both "put the retry in the service" and "put the retry in the controller" in memory, the choice becomes arbitrary and the agent loses its footing. CLAUDE.md has its reason for staying under two hundred lines, as the docs recommend — longer files, by their account, lower adherence. The more you dump into the context, the worse the agent steers by it.

So that evening step must not be "add today to the pile." It has to be consolidation. Forgetting smartly. Merging duplicates, throwing out what didn't pan out, rewriting old conclusions with new ones. A smarter agent doesn't come from a bigger archive, but from better pruning.

For now, just an idea

For now it's just an idea I'm turning over. But it won't leave me alone.

A morning where neither I nor the agent starts from zero. Where the colleague who sits down across from me is one who, in the evening, picked out what from yesterday is worth bringing into today, and starts where we left off.

A question for you: if your agent had an evening of its own, would your morning workflow be better?