Auto-Generating a Table of Contents (TOC)

Intro

Setting up

A Table of Contents (TOC) is crucial for articles with a lot of content because it makes navigation much easier for the reader.

I used to think people manually defined the headings for their TOC, maybe passing a hardcoded array of titles to a component. While that would work, it’s inefficient and incredibly tedious to maintain every time you change a heading. Fortunately, there are great tools for automatically generating a TOC, such as mdast-util-toc.

Prepping Your Headings

Before we can generate a TOC, the tools need to know which headings to use and, more importantly, they need a way to link to them. This means every heading in your content needs a unique ID.

Some modern frameworks (like Astro, which I use) might add these IDs automatically, but many don’t. We can ensure our headings are ready by using two handy rehype plugins:

  1. rehype-slug: This plugin automatically converts your heading text into a unique, URL-safe ID and attaches it to the heading element (e.g., <h2>My Title</h2> becomes <h2 id="my-title">My Title</h2>).
  2. rehype-autolink-headings: This is optional, but useful. It adds a link to each heading that points to the heading’s own ID. Depending on the behavior option, the link either wraps the heading text, so readers can click the heading itself to get a link to that section, or sits before or after the text, which is what you want for an anchor icon next to the heading. I set the behavior to "append" to add the link icon at the end of the heading.
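To make the ID step concrete, here’s a tiny slugger sketch. It’s my own simplified approximation (ASCII only, and createSlugger is a made-up name), not the code rehype-slug actually runs, but it shows the idea: lowercase the text, drop punctuation, turn spaces into hyphens, and add a numeric suffix when a slug repeats.

```typescript
// Simplified, hypothetical slugger. NOT rehype-slug's real implementation.
// Assumption: ASCII-only headings; anything else is simply dropped.
function createSlugger(): (text: string) => string {
  // Tracks how many times each base slug has been handed out
  const seen: Record<string, number> = Object.create(null);

  return (text: string): string => {
    const base = text
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9\s_-]/g, "") // drop punctuation such as "." or "!"
      .replace(/\s/g, "-"); // each whitespace character becomes a hyphen

    const count = seen[base] || 0;
    seen[base] = count + 1;
    // First use keeps the plain slug; repeats get "-1", "-2", ...
    return count === 0 ? base : base + "-" + count;
  };
}

const slug = createSlugger();
console.log(slug("My Title")); // "my-title"
console.log(slug("Sub-Heading 1.1")); // "sub-heading-11"
console.log(slug("My Title")); // "my-title-1"
```

The suffix on repeats is what keeps IDs unique when two sections share the same title.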

Here’s how I set up these plugins in my Astro configuration file.

// @ts-check
import { defineConfig } from "astro/config";

import rehypeSlug from "rehype-slug";
import rehypeAutolinkHeadings from "rehype-autolink-headings";

// https://astro.build/config
export default defineConfig({
  markdown: {
    rehypePlugins: [
      rehypeSlug,
      [rehypeAutolinkHeadings, { behavior: "append" }],
    ],
  },
});

Note that this snippet is specific to Astro. If you’re on Next.js, Gatsby, or a different static site generator, check its documentation for how to integrate rehype plugins.

Once this is set up, every heading in your content will have a unique ID, making them ready to be indexed.

Indexing Your Headings

Now that our headings have IDs, the next step is to process the raw page content, find those headings, and structure them into a nested list (which is the actual Table of Contents data structure).
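Before the real thing, here’s a small sketch of what "structure them into a nested list" means. It assumes you already have a flat list of headings with their depths (FlatHeading and nestHeadings are names I made up for illustration), and it is not how mdast-util-toc works internally. It only shows the target shape.

```typescript
interface Item {
  title: string;
  url: string;
  items?: Item[];
}

// Hypothetical input shape: one entry per heading, in document order
interface FlatHeading {
  depth: number; // 1 for "#", 2 for "##", ...
  title: string;
  url: string;
}

function nestHeadings(headings: FlatHeading[]): { items?: Item[] } {
  const root: { items?: Item[] } = {};
  // Stack of currently "open" ancestors; the root sits at depth 0
  const stack: { depth: number; node: { items?: Item[] } }[] = [
    { depth: 0, node: root },
  ];

  for (const h of headings) {
    // Close every open heading at the same level or deeper
    while (stack[stack.length - 1].depth >= h.depth) stack.pop();

    const parent = stack[stack.length - 1].node;
    const item: Item = { title: h.title, url: h.url };
    if (!parent.items) parent.items = [];
    parent.items.push(item);
    stack.push({ depth: h.depth, node: item });
  }
  return root;
}

const nested = nestHeadings([
  { depth: 1, title: "Heading One", url: "#heading-one" },
  { depth: 2, title: "Sub-Heading 1.1", url: "#sub-heading-11" },
  { depth: 1, title: "Heading Two", url: "#heading-two" },
]);
console.log(JSON.stringify(nested, null, 2));
```

The real version works on the Markdown itself rather than a pre-extracted list.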

This is where the slightly more confusing code comes in. The main goal of this entire script is to take the raw Markdown content and convert it into a clean, easy-to-use JavaScript object structure.

I copied this function from a popular repository [1], so I’m not entirely sure what is going on in every line. It works, though.

Here’s the function:

// @ts-nocheck
import { toc } from "mdast-util-toc";
import { remark } from "remark";
import { visit } from "unist-util-visit";

// These are the node types that contain actual readable text
const textTypes = ["text", "emphasis", "strong", "inlineCode"];

// 1. Helper to grab text from a node
function flattenNode(node) {
  const p = [];
  // visit helps us look inside a node structure
  visit(node, (node) => {
    // Only grab text-like content (ignoring links, lists, etc.)
    if (!textTypes.includes(node.type)) return;
    p.push(node.value);
  });
  // Stitch the text parts back together
  return p.join(``);
}

// These interfaces define the final, clean output structure
interface Item {
  title: string;
  url: string;
  items?: Item[];
}

interface Items {
  items?: Item[];
}

// 2. Recursive function to convert the MDAST output into our clean JS object
function getItems(node, current): Items {
  if (!node) return {};

  if (node.type === "paragraph") {
    // If the node is a paragraph, we're likely dealing with the heading text itself.
    // We check for links to grab the URL (the #heading-id)
    visit(node, (item) => {
      if (item.type === "link") {
        current.url = item.url;
        current.title = flattenNode(node);
      }
      if (item.type === "text") {
        current.title = flattenNode(node);
      }
    });
    return current;
  }

  // If it's a list, it means we have nested sub-headings. We recursively process its children.
  if (node.type === "list") {
    current.items = node.children.map((i) => getItems(i, {}));
    return current;
  } else if (node.type === "listItem") {
    // A single heading entry in the list
    const heading = getItems(node.children[0], {});
    if (node.children.length > 1) {
      // Handle deeply nested lists (e.g., h2 under an h1)
      getItems(node.children[1], heading);
    }
    return heading;
  }

  return {};
}

// 3. The `remark` plugin that ties it all together
const getToc = () => (node, file) => {
  // 3a. Use the external library to find headings and structure them into an MDAST list
  const table = toc(node);
  // 3b. Convert that complex MDAST list into our simple JavaScript object
  file.data = getItems(table.map, {});
};

export type TableOfContents = Items;

// 4. The main entry point you actually call
export default async function getTableOfContents(
  content: string
): Promise<TableOfContents> {
  // Process the content using the custom plugin
  const result = await remark().use(getToc).process(content);
  // Return the data we stored inside the file object
  return result.data;
}

I hope the comments (from Gemini, by the way) make the function easier to understand.

This function basically takes this:

# Heading One

Content here.

## Sub-Heading 1.1

More content.

### Sub-Sub-Heading 1.1.1

Even more content.

## Sub-Heading 1.2

Another section.

# Heading Two

Final content.

and returns:

// The structure is an object with an 'items' array
{
  items: [
    {
      title: "Heading One",
      url: "#heading-one", 
      items: [
        {
          title: "Sub-Heading 1.1",
          url: "#sub-heading-11",
          items: [
            {
              title: "Sub-Sub-Heading 1.1.1",
              url: "#sub-sub-heading-111",
            },
          ],
        },
        {
          title: "Sub-Heading 1.2",
          url: "#sub-heading-12",
        },
      ],
    },
    {
      title: "Heading Two",
      url: "#heading-two",
    },
  ],
}
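If you’re curious what happens in between, here’s a dependency-free sketch of the conversion step. The input is a hand-built stand-in for the list that mdast-util-toc returns (a list of list items, each holding a paragraph with a link, plus an optional nested list). MdNode, entry, textOf, and toItems are names I invented for this sketch, and the text handling is simpler than in the real function, so treat it as an illustration rather than a drop-in replacement.

```typescript
// Minimal mdast-like node shape, just enough for this sketch
interface MdNode {
  type: string;
  value?: string;
  url?: string;
  children?: MdNode[];
}

interface TocItem {
  title?: string;
  url?: string;
  items?: TocItem[];
}

// Builds one list entry: a paragraph holding a link, plus an optional nested list
function entry(title: string, url: string, nested: MdNode[] = []): MdNode {
  const paragraph: MdNode = {
    type: "paragraph",
    children: [{ type: "link", url, children: [{ type: "text", value: title }] }],
  };
  const children: MdNode[] = nested.length
    ? [paragraph, { type: "list", children: nested }]
    : [paragraph];
  return { type: "listItem", children };
}

// Concatenates the text found anywhere below a node
function textOf(node: MdNode): string {
  if (node.value !== undefined) return node.value;
  return (node.children || []).map(textOf).join("");
}

// Same three cases as getItems: paragraph, list, listItem
function toItems(node: MdNode | undefined, current: TocItem): TocItem {
  if (!node) return {};
  if (node.type === "paragraph") {
    const link = (node.children || []).filter((c) => c.type === "link")[0];
    if (link) current.url = link.url;
    current.title = textOf(node);
    return current;
  }
  if (node.type === "list") {
    current.items = (node.children || []).map((child) => toItems(child, {}));
    return current;
  }
  if (node.type === "listItem") {
    const children = node.children || [];
    const heading = toItems(children[0], {}); // the entry's own title and url
    if (children.length > 1) toItems(children[1], heading); // nested sub-headings
    return heading;
  }
  return {};
}

const map: MdNode = {
  type: "list",
  children: [
    entry("Heading One", "#heading-one", [entry("Sub-Heading 1.1", "#sub-heading-11")]),
    entry("Heading Two", "#heading-two"),
  ],
};
console.log(JSON.stringify(toItems(map, {}), null, 2));
```

The part that tripped me up at first is the listItem case: the first child is the heading itself, and a second child, when present, is the nested list of its sub-headings.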

Creating the Component

Now that we have our clean, structured list of headings and their URLs, it’s time to build the reusable TOC component itself. We’ll pass the generated list as a prop.

The key technique here is recursion. Since a TOC can have nested levels (H2s under H1s, H3s under H2s, etc.), the component needs to be able to call itself to render the sub-lists.

// components/toc.tsx
import type { TableOfContents } from "@/lib/toc";
import { cn } from "@/lib/utils";
import React from "react";

interface TocProps {
  toc: TableOfContents;
}

const MAX_LEVELS = 5;

export default function TableOfContents({ toc }: TocProps) {
  return <TocTree tree={toc} />;
}

function TocTree({
  tree,
  level = 1,
  activeItem,
}: {
  tree: TableOfContents;
  level?: number;
  activeItem?: string;
}) {
  return tree.items?.length && level < MAX_LEVELS ? (
    <div
      className={cn("m-0 list-none", {
        "pl-4 border-l border-[var(--border-color)]": level !== 1,
      })}
    >
      {tree.items.map((item, index) => (
        <div key={index} className="mt-0 pt-1">
          <a
            href={item.url}
            onClick={(e) => {
              e.preventDefault();
              const element = document.getElementById(item.url.slice(1));
              if (element) {
                const yOffset = -100; // Adjust based on your header height
                const y =
                  element.getBoundingClientRect().top +
                  window.scrollY +
                  yOffset;
                window.scrollTo({ top: y, behavior: "smooth" });
                // Update URL without reload
                window.history.pushState(null, "", item.url);
              }
            }}
            className={cn(
              "inline-block text-[13px] text-[var(--text-secondary)] hover:text-[var(--text-primary)]"
            )}
          >
            {item.title}
          </a>
          {item.items?.length ? (
            <TocTree tree={item} level={level + 1} activeItem={activeItem} />
          ) : null}
        </div>
      ))}
    </div>
  ) : null;
}

And with that, we have a working <TOC> component.
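If the recursion is hard to see through the JSX, here’s the same walk without React: a sketch that turns the TOC object into an indented text outline. TocNode and outline are hypothetical names for this example, and I’m assuming the same MAX_LEVELS cap as in the component.

```typescript
interface TocNode {
  title?: string;
  url?: string;
  items?: TocNode[];
}

const MAX_LEVELS = 5; // same cap as in the component

function outline(tree: TocNode, level = 1): string[] {
  // Same guard as TocTree: nothing to render, or nested too deep
  if (!tree.items || tree.items.length === 0 || level >= MAX_LEVELS) return [];

  const indent = new Array(level).join("  "); // (level - 1) copies of two spaces
  const lines: string[] = [];
  for (const item of tree.items) {
    lines.push(`${indent}- ${item.title} (${item.url})`);
    // Recurse: each item is itself a tree whose children sit one level deeper
    lines.push(...outline(item, level + 1));
  }
  return lines;
}

const example: TocNode = {
  items: [
    {
      title: "Heading One",
      url: "#heading-one",
      items: [{ title: "Sub-Heading 1.1", url: "#sub-heading-11" }],
    },
    { title: "Heading Two", url: "#heading-two" },
  ],
};
console.log(outline(example).join("\n"));
// - Heading One (#heading-one)
//   - Sub-Heading 1.1 (#sub-heading-11)
// - Heading Two (#heading-two)
```

The component does exactly this, except each level returns a nested div instead of a list of strings.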

Footnotes

  1. Stole it from shadcn-ui/taxonomy

© 2023. All rights reserved.