Summary
AI agents (ChatGPT, Claude, Perplexity) consume your website as raw HTML — including navbars, footers, cookie banners, and ad scripts. That is 2–5x more tokens than the content itself requires, and it degrades the quality of AI-generated answers that cite your content. The solution is serving clean Markdown to AI agents via the llms.txt standard. I built next-markdown-mirror — a free, open-source Next.js library that does this automatically and takes about three minutes to set up.
The Problem: AI Agents Are Drowning in Your HTML
When an AI agent visits your page, it gets the same thing a browser does — full HTML. Navigation bars, footers, cookie banners, ad scripts, SVG icons. Everything that looks like "page content" to a human is actually buried under tons of boilerplate.
The result? 2–5x more tokens than necessary. And AI tools citing your site produce lower-quality responses because they're trying to extract signal from noise.
This isn't just a performance issue — it directly affects whether AI tools recommend your content or your competitor's.
The Solution: llms.txt and Markdown Mirrors
The llms.txt standard emerged as a way to tell AI agents where to find clean content on your site. Think robots.txt, but for LLMs.
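The format itself is plain Markdown. Per the proposal, a minimal llms.txt is an H1 with the site name, a blockquote summary, and sections of links (the URLs and descriptions below are illustrative):

```md
# My Site

> Short summary of what this site covers and who it is for.

## Pages

- [Home](https://example.com/): Landing page overview
- [Blog](https://example.com/blog): Articles on AI SEO and Next.js
```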
The challenge is implementation. How do you ensure every page has a Markdown equivalent? Manually? That doesn't scale.
next-markdown-mirror
That's why I built next-markdown-mirror — a Next.js library that automatically:
- Detects AI agents via User-Agent headers
- Converts your pages to Markdown on the fly
- Generates llms.txt automatically
- Extracts JSON-LD metadata
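The detection step is simple to reason about. Here is a minimal sketch of User-Agent matching — the pattern list and function name are illustrative, not the library's actual internals:

```typescript
// Illustrative only: the real bot list and matching logic live inside the library.
const AI_AGENT_PATTERNS: RegExp[] = [
  /GPTBot/i,        // OpenAI's crawler
  /ClaudeBot/i,     // Anthropic's crawler
  /PerplexityBot/i, // Perplexity's crawler
];

function isAiAgent(userAgent: string): boolean {
  return AI_AGENT_PATTERNS.some((pattern) => pattern.test(userAgent));
}
```

A real implementation would keep this list up to date, since new agents appear regularly.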
How It Works
The library works as middleware. When a request comes from an AI agent, instead of the standard HTML response, it:
- Internally fetches your page
- Converts HTML to clean Markdown
- Filters out navigation, footers, and other noise
- Returns clean, token-efficient content
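To make the convert-and-filter steps concrete, here is a deliberately tiny sketch of the idea — strip "chrome" elements wholesale, then map a couple of tags to Markdown. This is not the library's converter; a production implementation handles nesting, entities, links, tables, and much more:

```typescript
// Toy sketch: remove noise elements, then translate a few tags to Markdown.
function htmlToMarkdownSketch(html: string): string {
  return html
    // 1. Drop whole elements that are noise for an LLM.
    .replace(/<(nav|footer|script|style)\b[\s\S]*?<\/\1>/gi, '')
    // 2. Map a few structural tags to their Markdown equivalents.
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, '# $1\n')
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, '$1\n')
    // 3. Strip any tags that remain.
    .replace(/<[^>]+>/g, '')
    .trim();
}
```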
3-Minute Setup
```bash
pnpm add next-markdown-mirror
```
Then just three files:
```ts
// proxy.ts
import { withMarkdownMirror } from 'next-markdown-mirror/nextjs';

export const proxy = withMarkdownMirror();
```
```ts
// app/md-mirror/[...path]/route.ts
import { createMarkdownHandler } from 'next-markdown-mirror/nextjs';

export const GET = createMarkdownHandler({
  baseUrl: process.env.NEXT_PUBLIC_SITE_URL!,
});
```
```ts
// app/llms.txt/route.ts
import { createLlmsTxtHandler } from 'next-markdown-mirror/nextjs';

export const GET = createLlmsTxtHandler({
  siteName: 'My Site',
  baseUrl: process.env.NEXT_PUBLIC_SITE_URL!,
  pages: [
    { url: '/', title: 'Home' },
    { url: '/blog', title: 'Blog' },
  ],
});
```
That's it. No Cloudflare configuration, no monthly fee.
Why Not Cloudflare?
Cloudflare offers automatic Markdown conversion — but it requires the Pro plan at $20/month per domain ($240/year). For 5 domains, that's $1,200/year.
next-markdown-mirror is free and open source. You self-host it, and you keep full control over filtering and formatting.
5 Most Common Mistakes When Optimizing for AI Agents
1. Ignoring AI traffic altogether. Many developers don't realize that AI agents already crawl their sites. Check your server logs — you'll likely find requests from GPTBot, ClaudeBot, or PerplexityBot.
2. Blocking AI crawlers in robots.txt. Some site owners block all AI bots out of reflex. This means your content never appears in AI-generated answers — and your competitors' content does instead.
3. Serving the same HTML to everyone. AI agents don't need your JavaScript bundles, navigation menus, or cookie consent dialogs. Serving them the same response as browsers wastes tokens and degrades answer quality.
4. Creating llms.txt manually and forgetting to update it. A stale llms.txt that links to pages that no longer exist is worse than no llms.txt at all. Automate its generation.
5. Over-filtering content. When converting to Markdown, some solutions strip too aggressively — removing tables, code blocks, or structured data that AI agents actually need. Keep the content, remove the chrome.
Why This Matters
AI is becoming the primary way people consume information from the web. If your site isn't optimized for AI agents, you're losing visibility in responses from ChatGPT, Claude, Perplexity, and other tools.
AI SEO isn't a buzzword — it's a real shift in how content gets distributed. And preparing for it costs three files and zero dollars.
FAQ
What is llms.txt?
llms.txt is a proposed standard (similar to robots.txt) that tells AI agents where to find clean, machine-readable versions of your pages. It acts as a directory of your content in a format optimized for large language models.
Does next-markdown-mirror work with any Next.js version?
The library is designed for Next.js 13+ with the App Router. It uses middleware and route handlers; route handlers in particular require the App Router.
Will this affect my regular website visitors?
No. The middleware only activates for requests identified as coming from AI agents (via User-Agent detection). Regular browser visitors see your normal HTML pages with no changes.
How does this differ from Cloudflare's AI-ready feature?
Cloudflare's solution requires a Pro plan ($20/month per domain) and runs on their infrastructure. next-markdown-mirror is free, open-source, and runs directly in your Next.js app — giving you full control over content filtering and formatting.
Do I need to update anything when I add new pages?
The Markdown conversion is automatic for all pages — no updates needed. For llms.txt, you define which pages to list. You can either maintain the list manually or generate it dynamically from your sitemap or CMS.
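One hypothetical way to build that list dynamically — a helper you would write yourself, not part of the library — is to derive titles from route paths pulled out of your sitemap or CMS:

```typescript
// Hypothetical helper: turn route paths into the `pages` entries
// that an llms.txt handler expects.
function pagesFromPaths(paths: string[]): { url: string; title: string }[] {
  return paths.map((path) => {
    if (path === '/') return { url: '/', title: 'Home' };
    const slug = path.split('/').filter(Boolean).pop() ?? path;
    // Naive title-casing: "ai-seo-guide" → "Ai Seo Guide"
    const title = slug.replace(/-/g, ' ').replace(/\b\w/g, (c) => c.toUpperCase());
    return { url: path, title };
  });
}
```

In practice you would likely map slugs to the real page titles from your CMS instead of title-casing them.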
Can AI agents still access my site without llms.txt?
Yes, AI agents will crawl your HTML regardless. But without llms.txt and Markdown mirrors, they waste tokens on boilerplate and may produce lower-quality citations of your content.
Find the project on GitHub. Stars, issues, and PRs are all welcome.