Turn Markdown into Typed JSON with a Single Schema

Define a schema, parse any markdown, and get strongly typed JSON back. Built for RAG pipelines, PDF-to-MD validation, AI Skills, and structured content ingestion.

document() section() match() block()

The Problem

Markdown Is Everywhere, but Parsing It Is a Mess

You get markdown from converters, authors, and imports. But turning it into usable, structured data always ends in fragile custom code.

PDF-to-MD output is unpredictable

Converter tools produce markdown with inconsistent headings, broken tables, and missing structure. Without validation, bad output silently enters your pipeline.

RAG ingestion breaks on unstructured markdown

Chunking raw markdown for vector databases loses context. Without typed extraction, your retrieval quality degrades and you can't trust what's stored.

Every team writes its own parser glue

Remark plugins, regex extraction, Zod schemas stitched together — each project reinvents markdown parsing with a fragile, untested custom layer.

Built and Documented

Production-Grade, Not a Prototype

27 type builders

129 API type pages

102 mapped methods

10 interaction guides

6 E2E examples

9 runtime tests

Core Capabilities

From Raw Markdown to Structured Data

mdshape handles parsing, validation, and typed extraction in a single runtime — so you stop writing custom glue for every project.

# RUNBOOK: Payment Risk Incident

## 1. OWNER
- Name: Alex Turner
- Email: alex@zayra.com

## 2. SEVERITY
- Level: P1
- Escalation: immediate

import { md } from 'mdshape'

const schema = md.document({
  title: md.heading(1),
  owner: md.section('1. OWNER').fields({
    Name: md.string(),
    Email: md.email(),
  }),
  severity: md.section('2. SEVERITY').fields({
    Level: md.string(),
    Escalation: md.string(),
  }),
})

{
  "success": true,
  "data": {
    "title": "RUNBOOK: Payment Risk Incident",
    "owner": {
      "Name": "Alex Turner",
      "Email": "alex@zayra.com"
    },
    "severity": {
      "Level": "P1",
      "Escalation": "immediate"
    }
  }
}

How It Works

Markdown In, Typed JSON Out

Three steps: your markdown, your schema, your structured data. Works with any source — PDF converters, authored docs, imported files.

Your Markdown

From a PDF converter, a content author, an import tool, or any other source.

# RUNBOOK: Payment Risk Incident

## 1. OWNER
- Name: Alex Turner
- Email: alex@zayra.com

## 2. SEVERITY
- Level: P1
- Escalation: immediate

Your Schema

Define the structure you expect. mdshape validates and extracts in one pass.

import { md } from 'mdshape'

const schema = md.document({
  title: md.heading(1),
  owner: md.section('1. OWNER').fields({
    Name: md.string(),
    Email: md.email(),
  }),
  severity: md.section('2. SEVERITY').fields({
    Level: md.string(),
    Escalation: md.string(),
  }),
})

Typed JSON

Get structured, strongly typed output ready for your database, RAG pipeline, or API.

{
  "success": true,
  "data": {
    "title": "RUNBOOK: Payment Risk Incident",
    "owner": {
      "Name": "Alex Turner",
      "Email": "alex@zayra.com"
    },
    "severity": {
      "Level": "P1",
      "Escalation": "immediate"
    }
  }
}
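
To make the three steps concrete, here is a minimal standalone sketch of the extraction that the schema above describes: read the level-1 heading as the title, treat each level-2 heading as a section, and collect "- Key: Value" list items as fields. It is not mdshape's implementation and does not use its API; the names extractSections and ExtractedDocument are hypothetical, and it does no validation (for example, it does not check that Email is an email address).

```typescript
// Standalone sketch: NOT mdshape's implementation. It only illustrates the
// heading + "- Key: Value" extraction that the schema above describes.

type Fields = Record<string, string>;

interface ExtractedDocument {
  title: string | null;
  sections: Record<string, Fields>;
}

function extractSections(markdown: string): ExtractedDocument {
  const result: ExtractedDocument = { title: null, sections: {} };
  let current: Fields | null = null;

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();

    // Level-1 heading, e.g. "# RUNBOOK: Payment Risk Incident"
    const h1 = /^#\s+(.+)$/.exec(line);
    if (h1) {
      const [, headingText = ""] = h1;
      result.title = headingText;
      current = null;
      continue;
    }

    // Level-2 heading opens a new section, e.g. "## 1. OWNER"
    const h2 = /^##\s+(.+)$/.exec(line);
    if (h2) {
      const [, sectionName = ""] = h2;
      current = {};
      result.sections[sectionName] = current;
      continue;
    }

    // List item of the form "- Key: Value" inside the current section
    const field = /^-\s+([^:]+):\s*(.*)$/.exec(line);
    if (field && current) {
      const [, key = "", value = ""] = field;
      current[key.trim()] = value.trim();
    }
  }
  return result;
}

const runbook = [
  "# RUNBOOK: Payment Risk Incident",
  "",
  "## 1. OWNER",
  "- Name: Alex Turner",
  "- Email: alex@zayra.com",
  "",
  "## 2. SEVERITY",
  "- Level: P1",
  "- Escalation: immediate",
].join("\n");

const extracted = extractSections(runbook);
console.log(JSON.stringify(extracted, null, 2));
```

Even this small version needs regular expressions, state tracking, and decisions about edge cases. That is the glue layer a schema is meant to replace.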

Comparison

Why Not Just Use Remark + Zod?

You can — but you'll write the glue yourself. Here's what you get out of the box with mdshape vs. assembling your own stack.

Capability	mdshape	Zod + remark	Markdoc	Contentlayer	Valibot + custom
Markdown → typed JSON in one call	Native	Custom required	Custom required	Partial	Custom required
Structure validation (heading order, section sequence, field presence)	Native	Custom required	Custom required	Custom required	Custom required
Rich block extraction (tables, mermaid, math, footnotes)	Native	Custom required	Partial	Partial	Custom required
Typed diagnostics with code, path, and line number	Native	Partial	Partial	Partial	Partial
Ready for production without custom integration layer	Native	Custom required	Custom required	Partial	Custom required

Native: Works out of the box, no custom code needed.
Partial: Possible but requires extra work or has gaps.
Custom required: You need to build this yourself.

Based on documented default capabilities as of each tool's latest stable release.

Common Questions

Before You Decide

Can I use it to validate PDF-to-Markdown converter output?

Yes. Define a schema with the structure you expect, run safeParse on the converter output, and get typed diagnostics for every deviation — missing headings, wrong field order, broken tables.
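
As a rough illustration of what a structural check on converter output involves, the sketch below looks for expected level-2 sections and reports any that are missing or out of order, with a line number. It is a standalone example and does not use mdshape. The Issue shape, the issue codes, the checkSectionOrder function, and the "3. TIMELINE" section are all hypothetical.

```typescript
// Standalone sketch: the issue shape and codes are hypothetical,
// not mdshape's documented diagnostics.

interface Issue {
  code: "missing_section" | "section_out_of_order";
  path: string;        // the expected section the issue refers to
  line: number | null; // 1-based line in the source, when known
  message: string;
}

function checkSectionOrder(markdown: string, expected: string[]): Issue[] {
  // Record the first line on which each level-2 heading appears.
  const found = new Map<string, number>();
  markdown.split(/\r?\n/).forEach((raw, index) => {
    const match = /^##\s+(.+?)\s*$/.exec(raw);
    if (match) {
      const [, heading = ""] = match;
      if (!found.has(heading)) found.set(heading, index + 1);
    }
  });

  const sectionIssues: Issue[] = [];
  let lastLine = 0;
  for (const section of expected) {
    const line = found.get(section);
    if (line === undefined) {
      sectionIssues.push({
        code: "missing_section",
        path: section,
        line: null,
        message: `Expected section "${section}" was not found`,
      });
    } else if (line < lastLine) {
      sectionIssues.push({
        code: "section_out_of_order",
        path: section,
        line,
        message: `Section "${section}" appears before a section that should precede it`,
      });
    } else {
      lastLine = line;
    }
  }
  return sectionIssues;
}

// Simulated converter output with swapped sections and one section missing.
const converterOutput = [
  "# RUNBOOK: Payment Risk Incident",
  "",
  "## 2. SEVERITY",
  "- Level: P1",
  "",
  "## 1. OWNER",
  "- Name: Alex Turner",
].join("\n");

const orderIssues = checkSectionOrder(converterOutput, [
  "1. OWNER",
  "2. SEVERITY",
  "3. TIMELINE", // hypothetical section, used only to trigger a "missing" issue
]);

for (const issue of orderIssues) {
  console.log(`${issue.code}: ${issue.path} (line ${issue.line ?? "n/a"})`);
}
```

Returning a list of issues, not just the first failure, lets a pipeline reject a bad conversion and still report every deviation in one pass.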

How does it help with RAG pipelines?

Instead of chunking raw markdown and losing context, mdshape extracts structured, typed JSON from your documents. Each field, section, and block becomes a typed entry you can store in your vector database with full context preserved.
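
One possible way to keep context when storing extracted fields is to flatten the typed JSON into one record per field, each carrying its document title and field path. The sketch below is standalone and makes several assumptions: the ChunkRecord shape, the toChunkRecords helper, and the "runbook-payment-risk" document id are hypothetical, and the record format is not something mdshape or any particular vector database requires.

```typescript
// Standalone sketch: the record shape is an assumption for illustration,
// not a format required by mdshape or by any vector database.

type Extracted = { [key: string]: string | Extracted };

interface ChunkRecord {
  id: string;     // stable identifier: document id + field path
  path: string[]; // where the value sits in the extracted JSON
  text: string;   // text to embed, prefixed with its context
}

function toChunkRecords(
  docId: string,
  docTitle: string,
  data: Extracted,
  path: string[] = [],
): ChunkRecord[] {
  const out: ChunkRecord[] = [];
  for (const [key, value] of Object.entries(data)) {
    const nextPath = [...path, key];
    if (typeof value === "string") {
      out.push({
        id: `${docId}#${nextPath.join(".")}`,
        path: nextPath,
        // Keep the title and field path next to the value so the chunk
        // still makes sense when it is retrieved on its own.
        text: `${docTitle} > ${nextPath.join(" > ")}: ${value}`,
      });
    } else {
      // Nested object: recurse and extend the path.
      out.push(...toChunkRecords(docId, docTitle, value, nextPath));
    }
  }
  return out;
}

// The "data" object from the example output shown earlier on this page.
const parsedData = {
  title: "RUNBOOK: Payment Risk Incident",
  owner: { Name: "Alex Turner", Email: "alex@zayra.com" },
  severity: { Level: "P1", Escalation: "immediate" },
};

const { title: runbookTitle, ...runbookSections } = parsedData;
const chunkRecords = toChunkRecords("runbook-payment-risk", runbookTitle, runbookSections);
console.log(chunkRecords.map((record) => record.text).join("\n"));
```

Each record can then be embedded and stored with its id and path as metadata, so a retrieved value can be traced back to the exact field it came from.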

I write AI Skills with .md files. Does this help?

Yes, this is one of the use cases it was built for. Define a schema for your skill format (required sections, field order, metadata) and validate every .md file before it enters your agent pipeline. Catch formatting issues at authoring time, not at runtime.

I already use Zod + remark. Why would I switch?

You don't have to drop Zod. mdshape replaces the glue layer — the remark plugins, AST walkers, and custom extraction — with a single call. Zod validates values; mdshape validates and extracts markdown structure natively.

How much code does it take to get started?

A basic schema is under 10 lines. The Playground lets you iterate without installing anything. Most teams go from zero to first validation in under 15 minutes.

What exactly does mdshape do that a markdown parser doesn't?

A parser turns markdown into an AST. That's it — you still need to walk the tree, extract fields, validate structure, and shape the output yourself. mdshape does all of that in one call: you define a schema, it returns typed JSON or typed errors. No AST manipulation.

What happens when the markdown doesn't match the schema?

You get a typed error object with every issue: which field is missing, which section is out of order, the exact line and column number. No generic "parse failed" — every failure is actionable.
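
The sketch below shows one way a caller might consume such a result. The success and data fields mirror the example output shown earlier on this page. Everything on the failure branch is an assumption for illustration: the issues array, the Issue fields, the issue codes, and the formatIssue and report helpers are hypothetical and are not taken from mdshape's documentation.

```typescript
// Standalone sketch: "success" and "data" mirror the example output above;
// the failure branch and the Issue fields are assumptions.

interface Issue {
  code: string;
  path: string[];
  line: number;
  column: number;
  message: string;
}

type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; issues: Issue[] };

function formatIssue(issue: Issue): string {
  const where = issue.path.length > 0 ? issue.path.join(".") : "(document)";
  return `${issue.line}:${issue.column} [${issue.code}] ${where}: ${issue.message}`;
}

function report<T>(result: ParseResult<T>): string[] {
  // Checking `success` narrows the union, so `issues` is only
  // accessible on the failure branch.
  if (result.success) return [];
  return [...result.issues]
    .sort((a, b) => a.line - b.line || a.column - b.column)
    .map(formatIssue);
}

// A hand-written failure result, used only to exercise the helpers.
const failedResult: ParseResult<{ owner: { Email: string } }> = {
  success: false,
  issues: [
    {
      code: "invalid_email",
      path: ["owner", "Email"],
      line: 5,
      column: 10,
      message: "Expected an email address",
    },
    {
      code: "missing_section",
      path: ["severity"],
      line: 1,
      column: 1,
      message: 'Section "2. SEVERITY" not found',
    },
  ],
};

const reportLines = report(failedResult);
console.log(reportLines.join("\n"));
```

Modeling the result as a discriminated union means the compiler forces callers to handle the failure case before they can touch the data.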

Does it modify my markdown?

No. mdshape is read-only. It parses and extracts — it never changes the source markdown. Your files stay portable and untouched.

Is the documentation LLM-friendly?

Yes. We serve an llms.txt file at docs.markschema.com/llms.txt with the full documentation index (pages, API types, guides, and examples), so LLMs and AI agents can discover and reference the docs directly.
