The Structured Mishnah Dataset (JSON)

Name: The Structured Mishnah Dataset
Creator: Moshe Kline
License: https://creativecommons.org/licenses/by/4.0/

המשנה כדרכה — Machine-Readable Dataset · 524 Chapters · 4,442 Cells · 2,276 Structural Markers · Free

The complete Hebrew text of the Mishnah as a structured JSON dataset. All six orders, sixty-three tractates, and 524 chapters are encoded as two-dimensional matrices with cell-level position labels, mishnah verse numbers, and structural markers that identify parallels, chiastic patterns, and envelope closures. The encoding is the product of forty years of compositional analysis.

Download mishnah_db.json

Format	JSON (UTF-8)
File size	~6 MB
Records	524 chapters across 63 tractates
Cells	4,442 with structural position labels
Subdivisions	3,737 within cells
Structural markers	2,276 across 7 marker types
Language	Hebrew text; bilingual metadata
License	CC BY 4.0
Price	Free
Registration	None required

What's in the Dataset

Every chapter of the Mishnah is encoded as a two-dimensional matrix. The text is laid out as rows and columns, with every cell carrying a structural label that records its position in the chapter's compositional architecture. This is the same data that drives the Structured Mishnah's online presentation and the printed PDF — exposed in a form that machines can read directly.

The dataset preserves three kinds of information that no other digital Mishnah edition publishes:

First, the row and column shape of every chapter. A chapter's structural fingerprint is its row-by-row column distribution — for example, Taanit chapter 1 has the shape [[4], [1,2,1], [2,2], [1,2,1], [4]], which is a palindrome around the central row. These shapes make compositional patterns queryable: chapters with envelope structures, chapters with perfect symmetry, chapters with chiastic centers can be located computationally.
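
As a minimal sketch of how a shape can be tested for these properties, the helpers below operate on a plain list of rows. The function names are illustrative and not part of the dataset; only the Taanit chapter 1 shape is taken from the text above.

```python
from typing import List

Shape = List[List[int]]


def is_palindrome(shape: Shape) -> bool:
    """True when the list of rows reads the same forward and backward."""
    return shape == shape[::-1]


def rows_are_symmetric(shape: Shape) -> bool:
    """True when every individual row is also left-right symmetric."""
    return all(row == row[::-1] for row in shape)


def row_widths(shape: Shape) -> List[int]:
    """Total colspan of each row (the sum of that row's entries)."""
    return [sum(row) for row in shape]


# Shape of Taanit chapter 1 as quoted above.
taanit_1 = [[4], [1, 2, 1], [2, 2], [1, 2, 1], [4]]

print(is_palindrome(taanit_1))       # True
print(rows_are_symmetric(taanit_1))  # True
print(row_widths(taanit_1))          # [4, 4, 4, 4, 4]
```

The row-level check and the within-row check are kept separate because a shape can satisfy one without the other.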

Second, the structural markers within cells — horizontal parallels, vertical threads, internal parallels, and envelope closures. These are the literary-architectural features that link cells to one another within and across rows. The dataset includes 2,276 such markers, each tagged by type and carrying its source text.

Third, the cell subdivisions — internal A/B/C divisions within cells where a single column position contains multiple discrete units of content. The dataset captures 3,737 subdivisions across 1,569 cells.

Schema

The file is a single JSON object. The first key is _meta, which contains dataset-level metadata. Every subsequent key identifies a chapter using the format tractate_chapter (lowercase tractate name with underscores; chapter as integer). For example: berakhot_1, shabbat_6, avot_3.
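
A chapter key can be split back into its tractate and chapter number by cutting at the last underscore. The sketch below assumes that multi-word tractate names contain underscores of their own; the key bava_kamma_3 is a hypothetical spelling used only to illustrate that case, and parse_chapter_key is an illustrative name.

```python
from typing import Tuple


def parse_chapter_key(key: str) -> Tuple[str, int]:
    """Split 'tractate_chapter' at the LAST underscore.

    Raises ValueError for keys that do not end in an integer,
    which includes the metadata key '_meta'.
    """
    tractate, sep, chapter = key.rpartition("_")
    if not sep or not tractate or not chapter.isdecimal():
        raise ValueError(f"not a chapter key: {key!r}")
    return tractate, int(chapter)


print(parse_chapter_key("berakhot_1"))    # ('berakhot', 1)
print(parse_chapter_key("bava_kamma_3"))  # ('bava_kamma', 3)  (hypothetical key)
```

Cutting at the last underscore rather than the first keeps any underscores inside the tractate name intact.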

Each chapter object contains the following fields:

Field	Type	Meaning
`tractate_he`	string	Hebrew tractate name (e.g., `תענית`)
`tractate_en`	string	English transliteration (e.g., `Taanit`)
`seder_he`	string	Hebrew order name (e.g., `מועד`)
`seder_en`	string	English order name (e.g., `Moed`)
`chapter_num`	integer	Chapter number
`chapter_he`	string	Hebrew chapter letter (e.g., `א`)
`shape`	array	Row-by-row colspan distribution (the chapter's structural fingerprint)
`column_headers`	array (optional)	Conceptual labels for the columns when the author named them
`rows`	array	The chapter's rows; each row contains a `row_num` and an array of `cells`
`source_url`	string	Canonical URL of the chapter on chaver.com
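
The field table can be turned into a light sanity check before further processing. This is a sketch rather than an official validator: it assumes every field except `column_headers` is present in every chapter, and the sample record uses a placeholder URL and an empty placeholder row instead of real dataset content.

```python
from typing import Dict, List

# Field names and types as listed in the table above.
REQUIRED_FIELDS: Dict[str, type] = {
    "tractate_he": str,
    "tractate_en": str,
    "seder_he": str,
    "seder_en": str,
    "chapter_num": int,
    "chapter_he": str,
    "shape": list,
    "rows": list,
    "source_url": str,
}


def check_chapter(chapter: dict) -> List[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in chapter:
            problems.append(f"missing field: {name}")
        elif not isinstance(chapter[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # column_headers is optional, but must be an array when present.
    if "column_headers" in chapter and not isinstance(chapter["column_headers"], list):
        problems.append("wrong type for column_headers: expected list")
    return problems


sample = {
    "tractate_he": "תענית",
    "tractate_en": "Taanit",
    "seder_he": "מועד",
    "seder_en": "Moed",
    "chapter_num": 1,
    "chapter_he": "א",
    "shape": [[4], [1, 2, 1], [2, 2], [1, 2, 1], [4]],
    "rows": [{"row_num": 1, "cells": []}],           # placeholder row, not real content
    "source_url": "https://example.invalid/taanit-1",  # placeholder, not the real URL
}

print(check_chapter(sample))  # []
```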

Each cell contains its structural label, a position object recording row and column index plus colspan, the Hebrew mishnah verse number where applicable, the cell text, the original HTML, and an array of structural markers. Cells with internal subdivisions contain a subdivisions array instead of flat text, with each subdivision carrying its own label, mishnah verse number, text, and markers.
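
The description above does not spell out the exact key names inside a cell, so the following sketch rests on assumptions: it supposes the keys are called `label`, `text`, `markers`, and `subdivisions`, and that each marker has a `type` key. Only `rows`, `row_num`, and `cells` come from the field table, and EnvelopeClosure is a made-up marker type name. Check the real file and adjust the names before relying on this.

```python
from collections import Counter
from typing import Iterator, List


def iter_marker_lists(chapter: dict) -> Iterator[List[dict]]:
    """Yield the marker list of every cell and of every subdivision.

    ASSUMED key names: 'markers' and 'subdivisions'.
    """
    for row in chapter["rows"]:
        for cell in row["cells"]:
            yield cell.get("markers", [])
            for sub in cell.get("subdivisions", []):
                yield sub.get("markers", [])


def count_marker_types(chapter: dict) -> Counter:
    """Tally markers by their (assumed) 'type' key."""
    counts: Counter = Counter()
    for markers in iter_marker_lists(chapter):
        for marker in markers:
            counts[marker["type"]] += 1
    return counts


# Synthetic chapter with placeholder text; not taken from the dataset.
sample = {
    "rows": [
        {"row_num": 1, "cells": [
            {"label": "1א", "text": "(text)",
             "markers": [{"type": "InternalParallel"}]},
        ]},
        {"row_num": 2, "cells": [
            {"label": "2א", "markers": [], "subdivisions": [
                {"label": "A", "text": "(text)",
                 "markers": [{"type": "InternalParallel"}]},
                {"label": "B", "text": "(text)",
                 "markers": [{"type": "EnvelopeClosure"}]},  # hypothetical type name
            ]},
        ]},
    ]
}

print(sorted(count_marker_types(sample).items()))
# [('EnvelopeClosure', 1), ('InternalParallel', 2)]
```

Reading markers from both the cell and its subdivisions avoids missing any if a subdivided cell also carries markers of its own.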

Example: Reading the Dataset

The following Python snippet loads the dataset and prints every chapter whose shape is a palindrome, meaning its list of rows reads the same forward and backward:

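import json

# Load the whole dataset into a dict keyed by chapter (plus '_meta').
with open('mishnah_db.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

for key, chapter in data.items():
    # Skip the dataset-level metadata entry; it is not a chapter.
    if key == '_meta':
        continue
    shape = chapter['shape']
    # A shape is a palindrome when its rows match their own reversal.
    # Note that a single-row chapter passes this test trivially.
    if shape == shape[::-1]:
        print(f"{key}: {shape}")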

The same approach works for any structural query: chapters with column headers, chapters with envelope closures, chapters whose first and last rows match, chapters with InternalParallel patterns. The structural markers are the queryable surface; forty years of analysis is encoded as data.
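
Two of the queries named above need only documented fields (`shape` and `column_headers`), so they can be sketched as small predicates passed to a generic filter. The helper names are illustrative, and the toy entries are synthetic stand-ins for the loaded file; only the taanit_1 shape is taken from this page.

```python
from typing import Callable, Dict, List


def chapters_where(data: Dict[str, dict], predicate: Callable[[dict], bool]) -> List[str]:
    """Return the keys of all chapters (skipping '_meta') that satisfy the predicate."""
    return [key for key, chapter in data.items()
            if key != "_meta" and predicate(chapter)]


def has_column_headers(chapter: dict) -> bool:
    """True when the optional column_headers array is present and non-empty."""
    return bool(chapter.get("column_headers"))


def first_and_last_rows_match(chapter: dict) -> bool:
    """True when there are at least two rows and the outer rows share a distribution."""
    shape = chapter["shape"]
    return len(shape) >= 2 and shape[0] == shape[-1]


# Synthetic stand-in for json.load(...) output; toy_* entries are invented.
toy_data = {
    "_meta": {},
    "taanit_1": {"shape": [[4], [1, 2, 1], [2, 2], [1, 2, 1], [4]]},
    "toy_1": {"shape": [[3], [1, 1, 1]]},
    "toy_2": {"shape": [[1, 1, 1]], "column_headers": ["x", "y", "z"]},
}

print(chapters_where(toy_data, has_column_headers))         # ['toy_2']
print(chapters_where(toy_data, first_and_last_rows_match))  # ['taanit_1']
```

Queries over structural markers would follow the same pattern once the marker key names in the actual file are confirmed.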

What Makes This Dataset Unique

Plain Mishnah text is widely available — Sefaria, the Bar-Ilan Responsa Project, Wikisource, and several other digital editions provide it. None of them publish the structural information that distinguishes this dataset.

The position labels (e.g., 2א, 3בA) record where each cell sits in the chapter's architectural plan — what the cell's row is, what its column is, what subdivision of the column it belongs to. These positions are the result of analyzing every chapter for its internal compositional logic. They are not derivable from the text; they are the analysis of the text.

The structural markers — horizontal parallels, vertical threads, chiastic patterns, internal parallels, envelope closures — record where the author of the Mishnah linked cells to one another through repeated language, parallel construction, or framing devices. These are the literary mechanisms that hold each chapter together as a composition rather than a list.

The column headers — present in chapters where the author named the conceptual axes of the chapter explicitly — record the categories the chapter is organized around. The Sefer Yetzirah triad of nefesh, olam, shanah appears as the columns of Gittin chapter 3. The triad of woman, self-testimony, testimony appears as the columns of Ketubot chapter 2. These are the chapter's own self-description; they have not been published anywhere else.

Citation

If you use this dataset in research, teaching, or downstream tools, please cite:

Kline, Moshe. The Structured Mishnah Dataset. Chaver.com.
<https://chaver.com/Mishnah-New/Hebrew/Text/mishnah-data>

The dataset is released under a Creative Commons Attribution 4.0 license (CC BY 4.0). It is free for personal use, research, teaching, and integration into other tools, including AI and computational systems. Attribution to Moshe Kline and chaver.com is required.

Free Download: The Structured Mishnah (המשנה כדרכה) Dataset

The complete Hebrew text of the Mishnah as a structured dataset (JSON). All six orders, sixty-three tractates, and 524 chapters are encoded as two-dimensional matrices with cell-level position labels, mishnah numbers, and structural markers that identify parallels, chiastic patterns, and envelope closures, drawn from forty years of compositional analysis.

The data is suited to computational research tools, artificial intelligence systems, and any application that needs the text of the Mishnah in its literary structure. License: CC-BY. No registration required.

Download mishnah_db.json

About the Research

The Structured Mishnah is based on forty years of systematic analysis by Moshe Kline. The methodology was first presented to Professor David Weiss-Halivni in the early 1980s, who indicated that findings would be significant only if they applied to the whole of the Mishnah — a challenge that drove the complete analysis. The Structured Mishnah was accepted for publication by Ben-Gurion University Press following peer review by Professor Daniel Boyarin (UC Berkeley) and Professor Shamma Friedman.

The same structural methodology was subsequently applied to the Torah, producing the Woven Torah hypothesis. Research on the Torah has appeared in the Journal of Biblical Literature (2025), with SBL Press (2015), and in the Journal of Hebrew Scriptures (2008).

Other Formats

Three ways to access the Structured Mishnah:

Read online — Browse the Mishnah Portal, pick a tractate and chapter, and read the structured text in your browser.
Download PDF — The complete Mishnah as a printable, searchable PDF.
Download JSON dataset — This page. Machine-readable for computational research and AI systems.

Introduction to the Structured Mishnah
Torah Units Dataset (JSON & CSV) — companion structural dataset for the Torah
Download the Structured Torah PDF
The Woven Torah Method