Auto-Generating a Table of Contents (TOC)
Intro
Setting up
A Table of Contents (TOC) is crucial for articles with a lot of content because it makes navigation much easier for the reader.
I used to think people manually defined the headings for their TOC, maybe passing a hardcoded array of titles to a component. While that would work, it’s inefficient and incredibly tedious to maintain every time you change a heading. Fortunately, there are great tools for generating a TOC automatically, such as mdast-util-toc.
Prepping Your Headings
Before we can generate a TOC, the tools need to know which headings to use and, more importantly, they need a way to link to them. This means every heading in your content needs a unique ID.
Some modern frameworks (like Astro, which I use) might add these IDs automatically, but many don’t. We can ensure our headings are ready by using two handy rehype plugins:
- rehype-slug: This plugin converts each heading’s text into a unique, URL-safe ID and attaches it to the heading element (e.g., <h2>My Title</h2> becomes <h2 id="my-title">My Title</h2>).
- rehype-autolink-headings: This is optional, but useful. It adds an anchor link to each heading that points to the heading’s ID, so readers can grab a direct link to that section. Depending on the behavior option, the link either wraps the heading text or is inserted as a separate anchor (for example, an icon) next to it. I set behavior to "append" to add the link icon at the end of the heading.
Here’s how I set up these plugins in my Astro configuration file.
// @ts-check
import { defineConfig } from "astro/config";
import rehypeSlug from "rehype-slug";
import rehypeAutolinkHeadings from "rehype-autolink-headings";

// https://astro.build/config
export default defineConfig({
  markdown: {
    rehypePlugins: [
      rehypeSlug,
      [rehypeAutolinkHeadings, { behavior: "append" }],
    ],
  },
});
Note that this snippet is specific to Astro’s configuration file. If you’re on Next.js, Gatsby, or a different static site generator, check its documentation for how to register rehype plugins.
Once this is set up, every heading in your content will have a unique ID, making them ready to be indexed.
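If you’re curious what “unique, URL-safe ID” means in practice, here is a minimal sketch of the idea. It is not rehype-slug’s actual implementation: `createSlugger` is a hypothetical helper, it only handles ASCII letters and digits, and the `-1`, `-2` suffix for repeated headings is an assumption modeled on GitHub-style slugs.

```typescript
// Hypothetical helper (not part of rehype-slug): returns a function that
// turns heading text into an ID and remembers which IDs it already handed out.
function createSlugger(): (text: string) => string {
  // Object.create(null) avoids clashes with built-in keys like "constructor".
  const seen: Record<string, number> = Object.create(null);

  return (text: string): string => {
    const base = text
      .toLowerCase()
      .trim()
      .replace(/[^a-z0-9\s_-]/g, "") // drop punctuation such as "." or "!"
      .replace(/\s/g, "-"); // whitespace becomes hyphens

    const count = seen[base] ?? 0;
    seen[base] = count + 1;

    // First occurrence keeps the plain slug; repeats get a numeric suffix.
    return count === 0 ? base : `${base}-${count}`;
  };
}

const slug = createSlugger();
const first = slug("My Title");
const dotted = slug("Sub-Heading 1.1");
const repeated = slug("My Title");
```

The slugger is stateful on purpose: uniqueness only makes sense per page, so you would create one slugger per document.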
Indexing Your Headings
Now that our headings have IDs, the next step is to process the raw page content, find those headings, and structure them into a nested list (which is the actual Table of Contents data structure).
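To make “structure them into a nested list” concrete, here is a small sketch that nests a flat, document-ordered list of headings using a stack. This is only an illustration of the data structure, not how mdast-util-toc works internally; `FlatHeading`, `TocItem`, and `nestHeadings` are names I made up for the example.

```typescript
interface TocItem {
  title: string;
  url: string;
  items?: TocItem[];
}

interface FlatHeading {
  depth: number; // 1 for "#", 2 for "##", and so on
  title: string;
  url: string;
}

// Hypothetical helper: turn headings in document order into a nested tree.
function nestHeadings(headings: FlatHeading[]): { items: TocItem[] } {
  const root: { items: TocItem[] } = { items: [] };
  // Stack of "open" ancestors; the root sits at depth 0 and is never popped.
  const stack: { depth: number; node: { items?: TocItem[] } }[] = [
    { depth: 0, node: root },
  ];

  for (const h of headings) {
    // Close every open heading that is at the same depth or deeper.
    while (stack[stack.length - 1].depth >= h.depth) stack.pop();

    const parent = stack[stack.length - 1].node;
    const item: TocItem = { title: h.title, url: h.url };
    if (!parent.items) parent.items = [];
    parent.items.push(item);
    stack.push({ depth: h.depth, node: item });
  }
  return root;
}

const nested = nestHeadings([
  { depth: 1, title: "Heading One", url: "#heading-one" },
  { depth: 2, title: "Sub-Heading 1.1", url: "#sub-heading-11" },
  { depth: 3, title: "Sub-Sub-Heading 1.1.1", url: "#sub-sub-heading-111" },
  { depth: 2, title: "Sub-Heading 1.2", url: "#sub-heading-12" },
  { depth: 1, title: "Heading Two", url: "#heading-two" },
]);
```

Each heading becomes a child of the nearest preceding heading with a smaller depth, which is the nesting a reader expects from a TOC.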
This is where the slightly more confusing code comes in. The main goal of this entire script is to take the raw Markdown content and convert it into a clean, easy-to-use JavaScript object structure.
I copied this function from a popular repository (see footnote 1), so I’m not entirely sure what every part of it does. It works, though.
Here’s the function:
// @ts-nocheck
import { toc } from "mdast-util-toc";
import { remark } from "remark";
import { visit } from "unist-util-visit";

// These are the node types that contain actual readable text
const textTypes = ["text", "emphasis", "strong", "inlineCode"];

// 1. Helper to grab text from a node
function flattenNode(node) {
  const p = [];
  // visit helps us look inside a node structure
  visit(node, (node) => {
    // Only grab text-like content (ignoring links, lists, etc.)
    if (!textTypes.includes(node.type)) return;
    p.push(node.value);
  });
  // Stitch the text parts back together
  return p.join(``);
}

// These interfaces define the final, clean output structure
interface Item {
  title: string;
  url: string;
  items?: Item[];
}

interface Items {
  items?: Item[];
}

// 2. Recursive function to convert the MDAST output into our clean JS object
function getItems(node, current): Items {
  if (!node) return {};

  if (node.type === "paragraph") {
    // If the node is a paragraph, we're likely dealing with the heading text itself.
    // We check for links to grab the URL (the #heading-id)
    visit(node, (item) => {
      if (item.type === "link") {
        current.url = item.url;
        current.title = flattenNode(node);
      }
      if (item.type === "text") {
        current.title = flattenNode(node);
      }
    });
    return current;
  }

  // If it's a list, it means we have nested sub-headings. We recursively process its children.
  if (node.type === "list") {
    current.items = node.children.map((i) => getItems(i, {}));
    return current;
  } else if (node.type === "listItem") {
    // A single heading entry in the list
    const heading = getItems(node.children[0], {});
    if (node.children.length > 1) {
      // Handle deeply nested lists (e.g., h2 under an h1)
      getItems(node.children[1], heading);
    }
    return heading;
  }

  return {};
}

// 3. The `remark` plugin that ties it all together
const getToc = () => (node, file) => {
  // 3a. Use the external library to find headings and structure them into an MDAST list
  const table = toc(node);
  // 3b. Convert that complex MDAST list into our simple JavaScript object
  file.data = getItems(table.map, {});
};

export type TableOfContents = Items;

// 4. The main entry point you actually call
export default async function getTableOfContents(
  content: string
): Promise<TableOfContents> {
  // Process the content using the custom plugin
  const result = await remark().use(getToc).process(content);
  // Return the data we stored inside the file object
  return result.data;
}
I hope the comments (from Gemini, by the way) help make the function easier to understand.
This function basically takes this:
# Heading One
Content here.
## Sub-Heading 1.1
More content.
### Sub-Sub-Heading 1.1.1
Even more content.
## Sub-Heading 1.2
Another section.
# Heading Two
Final content.
and returns:
// The structure is an object with an 'items' array
{
  items: [
    {
      title: "Heading One",
      url: "#heading-one",
      items: [
        {
          title: "Sub-Heading 1.1",
          url: "#sub-heading-11",
          items: [
            {
              title: "Sub-Sub-Heading 1.1.1",
              url: "#sub-sub-heading-111",
            },
          ],
        },
        {
          title: "Sub-Heading 1.2",
          url: "#sub-heading-12",
        },
      ],
    },
    {
      title: "Heading Two",
      url: "#heading-two",
    },
  ],
}
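To see where this shape comes from, it helps to look at what the conversion step is walking over. As far as I can tell, the list that mdast-util-toc returns is a list of list items, where each item holds a paragraph (containing a link to the heading) and, optionally, a nested list for sub-headings. The sketch below walks a hand-written miniature of such a tree. `MiniNode`, `TocEntry`, `textOf`, and `listToEntries` are names invented for this example, and `MiniNode` is a stand-in, not the real mdast types.

```typescript
// Simplified stand-in for the mdast nodes involved (an assumption, not the real types).
interface MiniNode {
  type: string;
  value?: string;
  url?: string;
  children?: MiniNode[];
}

interface TocEntry {
  title: string;
  url: string;
  items?: TocEntry[];
}

// Collect all text found beneath a node.
function textOf(node: MiniNode): string {
  if (node.type === "text" || node.type === "inlineCode") return node.value ?? "";
  return (node.children ?? []).map(textOf).join("");
}

// Convert a "list" node into plain entries, recursing into nested lists.
function listToEntries(list: MiniNode): TocEntry[] {
  return (list.children ?? []).map((listItem) => {
    const kids = listItem.children ?? [];
    const paragraph = kids[0]; // holds the link to the heading
    const nestedList = kids[1]; // present only when there are sub-headings
    const link = paragraph
      ? (paragraph.children ?? []).filter((c) => c.type === "link")[0]
      : undefined;

    const entry: TocEntry = {
      title: paragraph ? textOf(paragraph) : "",
      url: link && link.url ? link.url : "",
    };
    if (nestedList && nestedList.type === "list") {
      entry.items = listToEntries(nestedList);
    }
    return entry;
  });
}

// Small builder so the sample tree stays readable.
const linkItem = (title: string, url: string, nested?: MiniNode): MiniNode => ({
  type: "listItem",
  children: [
    {
      type: "paragraph",
      children: [{ type: "link", url, children: [{ type: "text", value: title }] }],
    },
    ...(nested ? [nested] : []),
  ],
});

const sample: MiniNode = {
  type: "list",
  children: [
    linkItem("Heading One", "#heading-one", {
      type: "list",
      children: [linkItem("Sub-Heading 1.1", "#sub-heading-11")],
    }),
  ],
};

const entries = listToEntries(sample);
```

The real function does the same job with unist-util-visit instead of indexing children directly, which makes it a little more tolerant of extra wrapping nodes.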
Creating the Component
Now that we have our clean, structured list of headings and their URLs, it’s time to build the reusable TOC component itself. We’ll pass the generated list as a prop.
The key technique here is recursion. Since a TOC can have nested levels (H2s under H1s, H3s under H2s, etc.), the component needs to be able to call itself to render the sub-lists.
// components/toc.tsx
import type { TableOfContents } from "@/lib/toc";
import { cn } from "@/lib/utils";
import React from "react";

interface TocProps {
  toc: TableOfContents;
}

// Levels at or beyond this depth are not rendered
const MAX_LEVELS = 5;

export default function TableOfContents({ toc }: TocProps) {
  return <TocTree tree={toc} />;
}

function TocTree({
  tree,
  level = 1,
  activeItem,
}: {
  tree: TableOfContents;
  level?: number;
  activeItem?: string;
}) {
  return tree.items?.length && level < MAX_LEVELS ? (
    <div
      className={cn("m-0 list-none", {
        "pl-4 border-l border-[var(--border-color)]": level !== 1,
      })}
    >
      {tree.items.map((item, index) => (
        <div key={index} className="mt-0 pt-1">
          <a
            href={item.url}
            onClick={(e) => {
              e.preventDefault();
              // item.url is "#some-id", so drop the leading "#"
              const element = document.getElementById(item.url.slice(1));
              if (element) {
                const yOffset = -100; // Adjust based on your header height
                const y =
                  element.getBoundingClientRect().top +
                  window.scrollY +
                  yOffset;
                window.scrollTo({ top: y, behavior: "smooth" });
                // Update URL without reload
                window.history.pushState(null, "", item.url);
              }
            }}
            className={cn(
              "inline-block text-[13px] text-[var(--text-secondary)] hover:text-[var(--text-primary)]"
            )}
          >
            {item.title}
          </a>
          {/* Use a ternary so an empty array never renders a stray "0" */}
          {item.items?.length ? (
            <TocTree tree={item} level={level + 1} activeItem={activeItem} />
          ) : null}
        </div>
      ))}
    </div>
  ) : null;
}
And with that, we have a working TOC component.
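If the recursion is hard to follow with all the JSX around it, here is the same idea stripped down to a function that renders the tree as an indented plain-text list. It is only a sketch: `TocNode`, `renderToc`, and `MAX_DEPTH` are names made up for this example, and the depth cap mirrors the `level < MAX_LEVELS` check in the component.

```typescript
interface TocNode {
  title: string;
  url: string;
  items?: TocNode[];
}

// Same cap as the component: levels 1 to 4 are rendered, deeper ones are skipped.
const MAX_DEPTH = 5;

// Render a list of TOC nodes as indented text, calling itself for sub-lists.
function renderToc(items: TocNode[] | undefined, level: number = 1): string {
  // Base case: nothing to render, or we are too deep.
  if (!items || items.length === 0 || level >= MAX_DEPTH) return "";

  // new Array(n).join(s) yields n - 1 copies of s, i.e. two spaces per extra level.
  const indent = new Array(level).join("  ");

  return items
    .map(
      (item) =>
        `${indent}- ${item.title} (${item.url})\n` +
        renderToc(item.items, level + 1) // recurse into the children
    )
    .join("");
}

const rendered = renderToc([
  { title: "A", url: "#a", items: [{ title: "B", url: "#b" }] },
]);

// Build a chain nested six levels deep to exercise the depth cap.
let chain: TocNode = { title: "L6", url: "#l6" };
for (let i = 5; i >= 1; i--) {
  chain = { title: `L${i}`, url: `#l${i}`, items: [chain] };
}
const capped = renderToc([chain]);
```

The component follows the same pattern: render the current level, then hand each item’s children to another copy of itself with `level + 1`.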
Footnotes
1. Stole it from shadcn-ui/taxonomy