Latest Release: mdshape Core v0.1

Turn Markdown into Typed JSON with a Single Schema

Define a schema, parse any markdown, get strongly-typed JSON back. Built for RAG pipelines, PDF-to-MD validation, AI Skills, and structured content ingestion.

document(), section(), match(), block()

The Problem

Markdown Is Everywhere, but Parsing It Is a Mess

You get markdown from converters, authors, and imports. But turning it into usable, structured data always ends in fragile custom code.

PDF-to-MD output is unpredictable

Converter tools produce markdown with inconsistent headings, broken tables, and missing structure. Without validation, bad output silently enters your pipeline.

RAG ingestion breaks on unstructured markdown

Chunking raw markdown for vector databases loses context. Without typed extraction, your retrieval quality degrades and you can't trust what's stored.

Every team writes its own parser glue

Remark plugins, regex extraction, Zod schemas stitched together — each project reinvents markdown parsing with a fragile, untested custom layer.

Built and Documented

Production-Grade, Not a Prototype

27 Type builders
129 API type pages
102 Mapped methods
10 Interaction guides
6 E2E examples
9 Runtime tests
1 llms.txt ready

Core Capabilities

From Raw Markdown to Structured Data

mdshape handles parsing, validation, and typed extraction in a single runtime — so you stop writing custom glue for every project.

RUNBOOK: Payment Risk Incident

1. OWNER

Name: Alex Turner
Email: alex@zayra.com

2. SEVERITY

Level: P1
Escalation: immediate

How It Works

Markdown In, Typed JSON Out

Three steps: your markdown, your schema, your structured data. Works with any source — PDF converters, authored docs, imported files.

Your Markdown

From a PDF converter, a content author, an import tool, or any other source.

runbook.md
# RUNBOOK: Payment Risk Incident

## 1. OWNER
- Name: Alex Turner
- Email: alex@zayra.com

## 2. SEVERITY
- Level: P1
- Escalation: immediate

Your Schema

Define the structure you expect. mdshape validates and extracts in one pass.

schema.ts
const schema = md.document({
  title: md.heading(1),
  owner: md.section('1. OWNER').fields({
    Name: md.string(),
    Email: md.email(),
  }),
  severity: md.section('2. SEVERITY').fields({
    Level: md.string(),
    Escalation: md.string(),
  }),
})

Typed JSON

Get structured, strongly-typed output ready for your database, RAG pipeline, or API.

output.json
{
  "success": true,
  "data": {
    "title": "RUNBOOK: Payment Risk Incident",
    "owner": {
      "Name": "Alex Turner",
      "Email": "alex@zayra.com"
    },
    "severity": {
      "Level": "P1",
      "Escalation": "immediate"
    }
  }
}
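
To make the three steps concrete, here is a minimal hand-rolled sketch that turns the runbook above into the same data shape. It is not mdshape's implementation and does not use its API: `readSection` and `parseRunbook` are hypothetical helpers written for this example only, and they handle just the `- Key: Value` bullets shown here. It mainly illustrates the kind of per-document glue a schema is meant to replace.

```typescript
// Hypothetical hand-rolled extractor for illustration; NOT mdshape's implementation.
type Fields = Record<string, string>;

interface Runbook {
  title: string;
  owner: Fields;
  severity: Fields;
}

// Collect "- Key: Value" bullets that follow "## <heading>" until the next heading.
function readSection(lines: string[], heading: string): Fields {
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) throw new Error(`missing section: ${heading}`);
  const fields: Fields = {};
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("#")) break; // the next heading ends this section
    const text = line.trim();
    if (!text.startsWith("- ")) continue;
    const colon = text.indexOf(":");
    if (colon === -1) continue;
    fields[text.slice(2, colon).trim()] = text.slice(colon + 1).trim();
  }
  return fields;
}

function parseRunbook(markdown: string): Runbook {
  const lines = markdown.split(/\r?\n/);
  const titleLine = lines.find((l) => l.startsWith("# "));
  if (titleLine === undefined) throw new Error("missing level-1 heading");
  return {
    title: titleLine.slice(2).trim(),
    owner: readSection(lines, "1. OWNER"),
    severity: readSection(lines, "2. SEVERITY"),
  };
}

const runbookMd = [
  "# RUNBOOK: Payment Risk Incident",
  "",
  "## 1. OWNER",
  "- Name: Alex Turner",
  "- Email: alex@zayra.com",
  "",
  "## 2. SEVERITY",
  "- Level: P1",
  "- Escalation: immediate",
].join("\n");

const runbook = parseRunbook(runbookMd);
console.log(JSON.stringify({ success: true, data: runbook }, null, 2));
```

Every new document type needs another function like `parseRunbook`, and nothing here validates that `Email` is actually an email address. That is the gap the schema approach addresses.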

Comparison

Why Not Just Use Remark + Zod?

You can — but you'll write the glue yourself. Here's what you get out of the box with mdshape vs. assembling your own stack.

| Capability | mdshape | Zod + remark | Markdoc | Contentlayer | Valibot + custom |
| --- | --- | --- | --- | --- | --- |
| Markdown → typed JSON in one call | Native | Custom required | Custom required | Partial | Custom required |
| Structure validation (heading order, section sequence, field presence) | Native | Custom required | Custom required | Custom required | Custom required |
| Rich block extraction (tables, mermaid, math, footnotes) | Native | Custom required | Partial | Partial | Custom required |
| Typed diagnostics with code, path, and line number | Native | Partial | Partial | Partial | Partial |
| Ready for production without custom integration layer | Native | Custom required | Custom required | Partial | Custom required |

  • Native: Works out of the box, no custom code needed.
  • Partial: Possible but requires extra work or has gaps.
  • Custom required: You need to build this yourself.

Based on documented default capabilities as of each tool's latest stable release.

Common Questions

Before You Decide

Can mdshape validate PDF-to-MD converter output?

Yes. Define a schema with the structure you expect, run safeParse on the converter output, and get typed diagnostics for every deviation: missing headings, wrong field order, broken tables.
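
As a rough illustration of what a structural check on converter output involves, the sketch below verifies that expected section headings are present and appear in order. It is written from scratch rather than with mdshape: `checkHeadingOrder`, the `HeadingIssue` shape, and the `3. TIMELINE` heading are all assumptions made for this example.

```typescript
// Hypothetical diagnostic shape; field names are assumptions, not mdshape's API.
interface HeadingIssue {
  code: "missing_heading" | "heading_out_of_order";
  heading: string;
  line: number | null; // 1-based line of the heading, or null when it is absent
}

function checkHeadingOrder(markdown: string, expected: string[]): HeadingIssue[] {
  // Record the first line on which each level-2 heading appears.
  const found = new Map<string, number>();
  markdown.split(/\r?\n/).forEach((line, index) => {
    if (line.startsWith("## ")) {
      const text = line.slice(3).trim();
      if (!found.has(text)) found.set(text, index + 1);
    }
  });

  const issues: HeadingIssue[] = [];
  let lastLine = 0;
  for (const heading of expected) {
    const line = found.get(heading);
    if (line === undefined) {
      issues.push({ code: "missing_heading", heading, line: null });
    } else if (line < lastLine) {
      // This heading appears before one that should precede it.
      issues.push({ code: "heading_out_of_order", heading, line });
    } else {
      lastLine = line;
    }
  }
  return issues;
}

// Converter output with the two sections swapped and a third one missing.
const convertedMd = [
  "# RUNBOOK: Payment Risk Incident",
  "",
  "## 2. SEVERITY",
  "- Level: P1",
  "",
  "## 1. OWNER",
  "- Name: Alex Turner",
].join("\n");

const orderIssues = checkHeadingOrder(convertedMd, ["1. OWNER", "2. SEVERITY", "3. TIMELINE"]);
console.log(orderIssues);
```

For this input the check reports `2. SEVERITY` as out of order (found on line 3) and `3. TIMELINE` as missing.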

How does mdshape help with RAG ingestion?

Instead of chunking raw markdown and losing context, mdshape extracts structured, typed JSON from your documents. Each field, section, and block becomes a typed entry you can store in your vector database with full context preserved.
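
One way to picture "a typed entry with full context" is to flatten the extracted JSON into one record per field, each carrying its document title and section. The sketch below assumes a generic record shape: `Entry` and `toEntries` are hypothetical names, and the actual insert call depends on your vector database.

```typescript
// Hypothetical record shape for a vector store; adapt it to your database's API.
interface Entry {
  id: string; // stable path such as "owner.Email"
  text: string; // text to embed, prefixed with document and section for context
  metadata: { document: string; section: string; field: string };
}

type Sections = Record<string, Record<string, string>>;

// Flatten extracted sections into one entry per field.
function toEntries(title: string, sections: Sections): Entry[] {
  const entries: Entry[] = [];
  for (const [section, fields] of Object.entries(sections)) {
    for (const [field, value] of Object.entries(fields)) {
      entries.push({
        id: `${section}.${field}`,
        text: `${title} > ${section} > ${field}: ${value}`,
        metadata: { document: title, section, field },
      });
    }
  }
  return entries;
}

const ragEntries = toEntries("RUNBOOK: Payment Risk Incident", {
  owner: { Name: "Alex Turner", Email: "alex@zayra.com" },
  severity: { Level: "P1", Escalation: "immediate" },
});
console.log(ragEntries.map((e) => e.text));
```

Because each entry keeps its section and field in `metadata`, a retrieval hit on "P1" can be traced back to the severity level of a specific runbook rather than to an anonymous text chunk.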

Can I use mdshape to validate AI Skills files?

This is exactly the use case. Define a schema for your skill format (required sections, field order, metadata) and validate every .md file before it enters your agent pipeline. Catch formatting issues at authoring time, not at runtime.

Do I have to replace Zod?

You don't have to drop Zod. mdshape replaces the glue layer (the remark plugins, AST walkers, and custom extraction) with a single call. Zod validates values; mdshape validates and extracts markdown structure natively.

How long does it take to get started?

A basic schema is under 10 lines. The Playground lets you iterate without installing anything. Most teams go from zero to first validation in under 15 minutes.

How is mdshape different from a markdown parser?

A parser turns markdown into an AST. That's it. You still need to walk the tree, extract fields, validate structure, and shape the output yourself. mdshape does all of that in one call: you define a schema, it returns typed JSON or typed errors. No AST manipulation.

What happens when validation fails?

You get a typed error object with every issue: which field is missing, which section is out of order, the exact line and column number. There is no generic "parse failed"; every failure is actionable.
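
A result of this kind is commonly modeled in TypeScript as a discriminated union on `success`. The sketch below shows how calling code might narrow and report such a result. The `Issue` and `ParseResult` shapes, the `report` helper, and the `missing_field` code are assumptions for illustration; mdshape's real types may differ.

```typescript
// Hypothetical shapes; mdshape's real result and issue types may differ.
interface Issue {
  code: string; // machine-readable code, e.g. "missing_field"
  path: string[]; // location in the schema, e.g. ["owner", "Email"]
  line: number;
  column: number;
  message: string;
}

type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; issues: Issue[] };

// Turn a result into printable lines; a successful result yields no lines.
function report<T>(result: ParseResult<T>): string[] {
  if (result.success) return [];
  return result.issues.map(
    (i) => `${i.line}:${i.column} [${i.code}] ${i.path.join(".")}: ${i.message}`,
  );
}

const failedResult: ParseResult<{ owner: { Email: string } }> = {
  success: false,
  issues: [
    {
      code: "missing_field",
      path: ["owner", "Email"],
      line: 5,
      column: 1,
      message: "Email is required",
    },
  ],
};

console.log(report(failedResult));
```

Checking `result.success` first lets the compiler narrow the type, so `data` is only reachable on success and `issues` only on failure.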

Does mdshape modify my markdown files?

No. mdshape is read-only. It parses and extracts; it never changes the source markdown. Your files stay portable and untouched.

Is the documentation available to LLMs and AI agents?

Yes. We serve an llms.txt file at docs.markschema.com/llms.txt with the full documentation index (pages, API types, guides, and examples) so LLMs and AI agents can discover and reference our docs natively.