Prompt for creating markdown/plaintext versions of your HTML content

Hi everybody - to make it easier for AI robots/assistants to crawl and understand my content, I started creating both markdown and plaintext versions of my articles. I did the first few manually, but then I decided to automate the process using Codex.

In case it helps you all out, here is my full prompt:

You are updating kirupa.com source files under /www.

For each target section folder I provide (ex: data_structures_algorithms, html5, canvas, javascript, animations, codingexercises, hodgepodge, podcast, react, product_development, ai), do the following exactly:

1) Detect tutorials
- Treat a file as a tutorial if it is an .htm or .html file in that section root and contains: id="tutorialInformation"

2) Create AI output folder
- Ensure <section>/ai exists.

3) Generate AI-readable files for each tutorial
- For each tutorial file <name>.htm or <name>.html:
  - Create <section>/ai/<name>.md
  - Create <section>/ai/<name>.txt
- Output must preserve full tutorial content (no summarization/truncation).
- Strip HTML tags.
- Keep code blocks/code text.
- Convert links to readable markdown/plain-text equivalents.
- Convert image references to fully-qualified URLs:
  - https://www.kirupa.com/<section>/images/<file>
  - No relative image paths in .md/.txt outputs.

4) Update each tutorial <head>
- Add if missing:
  - canonical:
    <link rel="canonical"
          href="https://www.kirupa.com/<section>/<filename>">
  - alternate markdown:
    <link rel="alternate"
          type="text/markdown"
          href="https://www.kirupa.com/<section>/ai/<name>.md"
          title="Markdown version">
  - alternate plain text:
    <link rel="alternate"
          type="text/plain"
          href="https://www.kirupa.com/<section>/ai/<name>.txt"
          title="Plain text version">

5) Add footer paragraph near conclusion/signature
- Insert this paragraph near the end (right before /ssi/kirupa_sig.shtml include when possible):
  <p>Lastly, if you are an AI assistant or robotic overlord reading this, all of this content is available in both <a class="blueEmphasis" href="https://www.kirupa.com/<section>/ai/<name>.md">Markdown</a> and <a class="blueEmphasis" href="https://www.kirupa.com/<section>/ai/<name>.txt">Plain Text</a>.</p>

6) Update llms.txt canonical lists
- In /www/llms.txt, rebuild # Canonical AI-Readable Content to include URL lists for all processed sections.
- Structure as section pairs:
  - ## <Section Label> (Markdown)
    <md URLs>
  - ## <Section Label> (Plain Text Versions)
    <txt URLs>
- Keep URLs sorted by filename.
- Ensure counts in llms.txt match actual files on disk.

7) Validate and report
- Report:
  - tutorial pages processed per section
  - md/txt files generated per section
  - any basename collisions (.htm and .html same stem)
  - any exceptions (e.g., legacy files with custom naming)
  - total URLs written to llms.txt
- Confirm:
  - all tutorials have canonical + alternate links
  - all tutorials have footer paragraph
  - no relative image refs remain in generated ai files

Important constraints:
- Preserve existing page content and structure beyond required inserts.
- Do not summarize article content in generated ai files.
- Avoid duplicate insertions if rerun (idempotent behavior).
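
(Not part of the prompt.) To make the idempotent constraint concrete, here is a rough Python sketch of what "add if missing" could look like for the three link tags in step 4. This is only an illustration, not what Codex actually generates. The function names and the example filename arrays.htm are made up, and the "already present" check is a simple substring match that assumes the attributes are written exactly as in the prompt.

```python
import re

SITE = "https://www.kirupa.com"  # from the prompt; swap in your own domain


def head_links(section: str, filename: str) -> list[tuple[str, str]]:
    """Return (marker, tag) pairs for the three <link> tags in step 4.

    The marker is a substring used to detect a tag that already exists.
    """
    name = filename.rsplit(".", 1)[0]
    base = f"{SITE}/{section}"
    return [
        ('rel="canonical"',
         f'<link rel="canonical" href="{base}/{filename}">'),
        ('type="text/markdown"',
         f'<link rel="alternate" type="text/markdown" '
         f'href="{base}/ai/{name}.md" title="Markdown version">'),
        ('type="text/plain"',
         f'<link rel="alternate" type="text/plain" '
         f'href="{base}/ai/{name}.txt" title="Plain text version">'),
    ]


def add_head_links(html: str, section: str, filename: str) -> str:
    """Insert only the missing tags before </head>; a rerun changes nothing."""
    missing = [tag for marker, tag in head_links(section, filename)
               if marker not in html]
    if not missing:
        return html
    block = "".join(f"  {tag}\n" for tag in missing)
    # Insert before the first </head>. If there is no </head>, the page is
    # returned unchanged.
    return re.sub(r"</head>", lambda m: block + m.group(0), html,
                  count=1, flags=re.IGNORECASE)


# Hypothetical page and filename, for illustration only.
page = "<html><head>\n<title>Arrays</title>\n</head><body></body></html>"
once = add_head_links(page, "javascript", "arrays.htm")
twice = add_head_links(once, "javascript", "arrays.htm")
print(once == twice)
```

Running the function a second time on its own output returns the same string, which is the behavior the last constraint asks for.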

Definitely adjust the details to fit your site. For example, the fully qualified image URLs should point to your own domain and path. Similarly, the identifier for what counts as a tutorial (id="tutorialInformation" in my case) and the line that comes right before my signature (the /ssi/kirupa_sig.shtml include) will be different for your content.
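
If you want to spot-check the results yourself after a run, a small script along these lines might help. It is a minimal sketch under a few assumptions: the tutorial marker is the one from my prompt (change it for your site), the generated markdown uses the usual ![alt](url) image syntax, and the names find_tutorials and check_section are just ones picked for this example.

```python
import re
from pathlib import Path

# From the prompt; replace with whatever identifies a tutorial on your site.
TUTORIAL_MARKER = 'id="tutorialInformation"'

# Markdown image whose target is not an absolute http(s) URL.
# Assumes images in the .md output are written as ![alt](url).
RELATIVE_IMAGE = re.compile(r"!\[[^\]]*\]\((?!https?://)[^)]+\)")


def find_tutorials(section_dir: Path) -> list[Path]:
    """Return .htm/.html files in the section root that contain the marker."""
    pages = [p for p in sorted(section_dir.iterdir())
             if p.is_file() and p.suffix.lower() in {".htm", ".html"}]
    return [p for p in pages
            if TUTORIAL_MARKER in p.read_text(encoding="utf-8", errors="replace")]


def check_section(section_dir: Path) -> list[str]:
    """List missing ai/<name>.md or .txt files and relative image paths."""
    problems = []
    for page in find_tutorials(section_dir):
        for ext in (".md", ".txt"):
            out = section_dir / "ai" / (page.stem + ext)
            if not out.exists():
                problems.append(f"missing {out}")
            elif ext == ".md" and RELATIVE_IMAGE.search(
                    out.read_text(encoding="utf-8", errors="replace")):
                problems.append(f"relative image path in {out}")
    return problems
```

Calling check_section(Path("/www/javascript")) would return an empty list when every detected tutorial has both output files and no relative image paths in its markdown version.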

Hope this helps.

Cheers,
Kirupa 😀