JSON Format Reference
Complete technical documentation for understanding exported JSON data structures
Introduction
This document describes the JSON output format generated by the transcription system. The JSON format provides structured data extracted from historical documents, including transcriptions, translations, and parsed record information.
This section will be extended to cover additional details of the system's data formats, processing pipelines, and integration specifications.
Output Formats
The system generates JSON output in two primary formats:
Single Record JSON
The download format for an individual transcription result. Each JSON object represents a single processed image or document.
Batch JSON
An array containing multiple records from one upload. Used when downloading all transcription results for an entire upload at once.
Top-Level Structure
The base JSON structure for a single transcription result:
{
"image_file": "string (filename of the source image)",
"transcription_original": "string (original language transcription)",
"translation_en": "string (English translation)",
"detected_language": "string (e.g., 'Polish', 'Latin', 'German', 'Russian', 'other')",
"parsed_records": <JsonElement> | null,
"metadata": {
"processed_at": "DateTime (ISO format)"
}
}
Field Descriptions
- image_file: The filename of the source image that was transcribed
- transcription_original: Full transcription of the document in its original language
- translation_en: Complete English translation of the transcribed text
- detected_language: Language detected in the source document
- parsed_records: Array of parsed record objects (see Common Fields and Record Type-Specific Fields below). Null or omitted if parsing was not performed
- metadata.processed_at: Timestamp when the transcription was completed
Batch Format
When downloading multiple records, the output is an array of objects, each with the structure above:
[
{
"image_file": "...",
"transcription_original": "...",
"translation_en": "...",
"detected_language": "...",
"parsed_records": [...],
"metadata": { "processed_at": "..." }
},
...
]
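Because the two export formats differ only in their top-level type, a consumer can normalize both to a list before processing. The following is a minimal sketch, not part of the system itself; the helper name normalize_export and the sample filenames are made up for illustration.

```python
import json
from typing import Any


def normalize_export(payload: Any) -> list:
    """Return a list of transcription results for either export format."""
    if isinstance(payload, dict):
        # Single Record JSON: one object per processed image.
        return [payload]
    if isinstance(payload, list):
        # Batch JSON: an array of the same objects.
        return payload
    raise ValueError("expected a JSON object or array at the top level")


# Hypothetical sample payloads, trimmed to one field each.
single = json.loads('{"image_file": "page_001.jpg"}')
batch = json.loads('[{"image_file": "page_001.jpg"}, {"image_file": "page_002.jpg"}]')

print([r["image_file"] for r in normalize_export(single)])
print([r["image_file"] for r in normalize_export(batch)])
```

Downstream code can then loop over the returned list without checking which download option produced the file.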
Common Fields (All Record Types)
All parsed records include these common fields:
{
"record_type": "birth | marriage | death | notary | land_record | court_record | index | other",
"record_number": "string (omit if null)",
"language_detected": "Polish | Latin | German | Russian | other",
"script": "Latin",
"jurisdiction": {
"parish_or_office": "string (omit if null)",
"village_or_town": "string (omit if null)",
"gmina": "string (omit if null)",
"powiat": "string (omit if null)",
"gubernia_or_wojewodztwo": "string (omit if null)"
},
"dates": {
"record_date": "YYYY-MM-DD | YYYY-MM | YYYY (omit if null)",
"event_date": "YYYY-MM-DD | YYYY-MM | YYYY (omit if null)",
"date_precision": "day | month | year | unknown"
},
"event_place": {
"place_name": "string",
"house_number": "string",
"parish_church": "string"
},
"religion": "Roman Catholic | Greek Catholic | Jewish | Lutheran | other | unknown",
"signatures_or_marks": "string (omit if null)",
"source_excerpt_diplomatic": "string (omit if null)",
"summary_for_indexing": "string (3-6 sentences in English)",
"translation_en_modern": "string (full translation)",
"quality": {
"confidence": 0.0-1.0,
"issues": ["string array"]
},
"notes": {
"missing": ["string array"],
"inference": [
{
"field": "string",
"reason": "string"
}
]
}
}
Common Field Notes
- record_type: Determines which type-specific fields are included
- record_number: Omitted if not present in the source document
- dates: Uses ISO 8601 format with varying precision levels
- quality.confidence: Float value between 0.0 and 1.0 indicating extraction confidence
- notes.inference: Array documenting fields that were inferred rather than explicitly stated
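Since record_date and event_date may carry day, month, or year precision, a consumer cannot hand them straight to a full-date parser. The sketch below splits such a string into its parts and derives a precision label from its shape. The function name parse_partial_date is an illustrative assumption, and the derived label is only a fallback: the date_precision field in the record remains the authoritative value.

```python
import re
from typing import Optional

# Matches YYYY, YYYY-MM, or YYYY-MM-DD.
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(value: Optional[str]) -> tuple:
    """Split a date string into (year, month, day, precision)."""
    match = _PARTIAL_DATE.fullmatch(value or "")
    if match is None:
        # Missing or unrecognized values are reported as "unknown".
        return (None, None, None, "unknown")
    year, month, day = (int(part) if part is not None else None for part in match.groups())
    if day is not None:
        precision = "day"
    elif month is not None:
        precision = "month"
    else:
        precision = "year"
    return (year, month, day, precision)


print(parse_partial_date("1847-10-23"))  # (1847, 10, 23, 'day')
print(parse_partial_date("1847-10"))     # (1847, 10, None, 'month')
print(parse_partial_date("1847"))        # (1847, None, None, 'year')
```

The sketch checks only the shape of the string, not calendar validity, so a value such as "1847-13" would still be accepted.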
Record Type-Specific Fields
Each record type includes additional fields specific to that type. The structure for each type is shown below.
Birth Records (record_type: "birth")
{
"participants": {
"child": {
"given_names": "string (original from record)",
"given_names_local": "string (Polish translation)",
"surname": "string",
"sex": "male | female | unknown",
"legitimacy": "legitimate | illegitimate | unknown",
"birth_order": "integer (omit if null)"
},
"parents_of_child": {
"father": {
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"age_years": "integer",
"occupation_or_status": "string",
"residence": "string"
},
"mother": {
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"maiden_name": "string",
"age_years": "integer",
"residence": "string"
}
},
"godparents": [
{
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"residence": "string"
}
],
"witnesses": [
{
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"age_years": "integer",
"occupation_or_status": "string",
"residence": "string"
}
]
}
}
Note: Birth records never include groom, bride, deceased, marriage_specific, notary, property, or financial sections.
Marriage Records (record_type: "marriage")
{
"participants": {
"groom": {
"given_names": "string (original from record)",
"given_names_local": "string (Polish translation)",
"surname": "string",
"age": { "years": "integer", "approximate": false },
"marital_status": "bachelor | widower | unknown",
"occupation_or_status": "string",
"residence": "string",
"parents": {
"father": {
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"status": "alive | deceased | unknown"
},
"mother": {
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"maiden_name": "string",
"status": "alive | deceased | unknown"
}
}
},
"bride": {
"given_names": "string (original from record)",
"given_names_local": "string (Polish translation)",
"surname": "string",
"maiden_name": "string",
"age": { "years": "integer", "approximate": false },
"marital_status": "single | widow | unknown",
"occupation_or_status": "string",
"residence": "string",
"parents": { /* same structure as groom.parents */ }
},
"witnesses": [ /* same structure as birth witnesses */ ]
},
"marriage_specific": {
"banns_dates": ["YYYY-MM-DD"],
"consents": "string",
"previous_spouses": "string",
"church_or_civil": "church | civil | unknown"
}
}
Note: Marriage records never include child, parents_of_child, godparents, deceased, notary, property, or financial sections.
Death Records (record_type: "death")
{
"participants": {
"deceased": {
"given_names": "string (original from record)",
"given_names_local": "string (Polish translation)",
"surname": "string",
"sex": "male | female | unknown",
"age": { "years": "integer", "approximate": false },
"occupation_or_status": "string",
"residence": "string",
"birthplace": "string",
"parents_or_spouse": "string"
},
"witnesses": [ /* same structure as birth witnesses */ ]
}
}
Note: Death records never include child, parents_of_child, godparents, groom, bride, marriage_specific, notary, property, or financial sections.
Notary Records (record_type: "notary")
Notary records use a different structure for legal documents (contracts, sales, deeds, testaments, powers of attorney):
{
"record_type": "notary",
"document_number": "string (omit if null)",
"document_type": "sale | contract | deed | testament | power_of_attorney | lease | mortgage | other",
"notary": {
"given_names": "string",
"surname": "string",
"title": "string",
"office_location": "string"
},
"parties": [
{
"role": "seller | buyer | grantor | grantee | testator | beneficiary | lessor | lessee | mortgagor | mortgagee | other",
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"residence": "string",
"occupation_or_status": "string"
}
],
"property": {
"description": "string",
"location": "string",
"boundaries": "string",
"area_or_size": "string",
"parcel_number": "string"
},
"financial": {
"transaction_value": "string",
"currency": "string",
"payment_terms": "string",
"fees_or_taxes": "string"
}
}
Note: Notary records never include vital record sections (participants.child, participants.groom, participants.bride, participants.deceased, marriage_specific).
Land Records (record_type: "land_record")
Land records document property transactions (transfers, surveys, mortgages, leases):
{
"record_type": "land_record",
"transaction_type": "sale | inheritance | mortgage | lease | survey | partition | exchange | donation | other",
"parties": [ /* same structure as notary parties */ ],
"property": {
"description": "string",
"location": "string",
"parcel_number": "string",
"boundaries": "string",
"area": "string",
"improvements": "string",
"land_use": "string"
},
"financial": {
"value": "string",
"currency": "string",
"payment_terms": "string",
"encumbrances": "string"
}
}
Note: Land records never include vital record sections or the notary object.
Court Records (record_type: "court_record")
Court records document legal proceedings (judgments, petitions, guardianship):
{
"record_type": "court_record",
"case_number": "string (omit if null)",
"court_name": "string",
"case_type": "civil | criminal | inheritance | guardianship | bankruptcy | appeal | other",
"parties": [
{
"role": "plaintiff | defendant | petitioner | respondent | appellant | judge | witness | guardian | ward | executor | heir | creditor | debtor | other",
"given_names": "string",
"given_names_local": "string",
"surname": "string",
"residence": "string",
"occupation_or_status": "string"
}
],
"case_details": {
"subject_matter": "string",
"claims": "string",
"evidence": "string"
},
"decision": {
"outcome": "string",
"terms": "string",
"costs": "string"
},
"property_involved": { "description": "string", "location": "string", "value": "string" },
"financial_amounts": {
"amount_claimed": "string",
"amount_awarded": "string",
"currency": "string"
}
}
Note: Court records never include vital record sections, the notary object, or the standard property/financial sections (they use property_involved and financial_amounts instead).
Index Records (record_type: "index")
Index records represent index pages listing names and record references:
{
"record_type": "index",
"index_record_type": "birth | marriage | death | mixed | unknown",
"index_entries": [
{
"name": "string (full name as listed in index)",
"record_number": "string | null",
"page": "string | null"
}
],
"summary_for_indexing": "string"
}
Note: Index records use a minimal schema and never include participant, property, or financial sections.
Other Content (record_type: "other")
Other content uses a minimal schema for text that doesn't fit other categories:
{
"record_type": "other",
"transcription_original": "string (full transcription of the original text)",
"translation_en": "string (full translation in modern English)",
"summary_for_indexing": "string (brief summary, 2-4 sentences)"
}
Note: Other records never include structured participant, property, or financial sections.
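Because each record type stores its people in a different place, code that builds a name index has to branch on record_type. The following sketch shows one way to do that under the schemas above. The helper names and the sample record are hypothetical, and treating every entry in parties as a principal is a simplification.

```python
def full_name(person: dict) -> str:
    """Join given_names and surname, skipping whichever is omitted."""
    parts = [person.get("given_names"), person.get("surname")]
    return " ".join(p for p in parts if p)


# Keys under "participants" that hold the principal people of vital records.
PRINCIPALS = {
    "birth": ["child"],
    "marriage": ["groom", "bride"],
    "death": ["deceased"],
}


def principal_names(record: dict) -> list:
    """Return the names of the main people in one parsed record."""
    rtype = record.get("record_type")
    if rtype in PRINCIPALS:
        participants = record.get("participants") or {}
        people = [participants[k] for k in PRINCIPALS[rtype] if k in participants]
    elif rtype in ("notary", "land_record", "court_record"):
        people = record.get("parties") or []
    elif rtype == "index":
        entries = record.get("index_entries") or []
        return [e["name"] for e in entries if e.get("name")]
    else:
        people = []
    return [name for name in map(full_name, people) if name]


# Made-up sample record for illustration.
marriage = {
    "record_type": "marriage",
    "participants": {
        "groom": {"given_names": "Jan", "surname": "Kowalski"},
        "bride": {"given_names": "Maria", "surname": "Nowak"},
    },
}
print(principal_names(marriage))  # ['Jan Kowalski', 'Maria Nowak']
```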
Entity Coordinates (Conditional)
Entity coordinates are only present when entity tagging is enabled during transcription. The entities field contains an array of detected entities with bounding box coordinates:
{
"entities": [
{
"id": "string (unique identifier, e.g., 'E1', 'E2')",
"text": "string (entity text as it appears in the document)",
"type": "name | surname | place | parish | date | occupation | witness | godparent | relationship | entity",
"box": {
"x_min": 0-1000,
"y_min": 0-1000,
"x_max": 0-1000,
"y_max": 0-1000
}
}
]
}
Important Notes
- Coordinates use a normalized 0-1000 scale (not pixel values) for resolution independence
- The id field matches entity identifiers embedded in the transcription text
- Entity types correspond to semantic categories detected in the document
- This field is omitted entirely if entity tagging was not enabled
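To draw a box on the source image, the normalized coordinates have to be scaled to the image's pixel dimensions. The sketch below assumes that x values are relative to the image width and y values to the image height; the document does not state this explicitly, and the function name box_to_pixels is made up for illustration.

```python
def box_to_pixels(box: dict, width: int, height: int) -> tuple:
    """Scale a 0-1000 normalized box to pixel values (x_min, y_min, x_max, y_max)."""
    # Assumption: x is normalized against width and y against height.
    return (
        round(box["x_min"] * width / 1000),
        round(box["y_min"] * height / 1000),
        round(box["x_max"] * width / 1000),
        round(box["y_max"] * height / 1000),
    )


# Hypothetical entity box on a 2000 x 3000 pixel scan.
sample_box = {"x_min": 100, "y_min": 250, "x_max": 400, "y_max": 300}
print(box_to_pixels(sample_box, width=2000, height=3000))  # (200, 750, 800, 900)
```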
Field Optionality Rules
Always Present Fields
These fields are always included in the output (may be null):
- image_file, transcription_original, translation_en, detected_language
- metadata.processed_at
- parsed_records[].record_type, parsed_records[].language_detected
- parsed_records[].summary_for_indexing, parsed_records[].translation_en_modern
- parsed_records[].quality
Omitted When Null/Empty
These fields are completely omitted from the JSON output if they are null or empty:
- parsed_records (entire field if parsing was not performed)
- record_number, signatures_or_marks, source_excerpt_diplomatic (if not present)
- Any jurisdiction sub-fields or date fields (if null)
- birth_order, maiden_name, residence, occupation_or_status (if not present)
- Array fields that are empty (e.g., godparents: [] is omitted)
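Because these fields can be absent rather than null, direct indexing such as record["jurisdiction"]["gmina"] may raise a KeyError. One defensive pattern is a small lookup helper that treats a missing key and a null value the same way. The helper name dig and the sample record below are illustrative assumptions.

```python
def dig(data, *keys, default=None):
    """Follow a chain of keys through nested dicts; return default if any step is missing or null."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


# Made-up record in which gmina and godparents were omitted.
record = {
    "record_type": "birth",
    "jurisdiction": {"village_or_town": "Example Village"},
    "participants": {"child": {"surname": "Nowak"}},
}

print(dig(record, "jurisdiction", "village_or_town"))
print(dig(record, "jurisdiction", "gmina"))
print(dig(record, "participants", "godparents", default=[]))
```

Passing default=[] for array fields lets calling code iterate without a separate presence check.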
Conditional Based on Record Type
These sections are only included for their respective record types:
- Birth: participants.child, participants.parents_of_child, participants.godparents
- Marriage: participants.groom, participants.bride, marriage_specific
- Death: participants.deceased
- Notary: notary, property, financial
- Land: property, financial
- Court: case_details, decision, property_involved, financial_amounts
- Index: index_entries
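The mapping above can double as a consistency check on received data: a record should not carry sections that belong only to other record types. The following sketch encodes the list as a dictionary and reports any such sections. It is a validation idea layered on top of the format, not something the system is documented to perform, and the names used are hypothetical.

```python
# Type-specific sections, transcribed from the list above.
SECTIONS_BY_TYPE = {
    "birth": ["participants.child", "participants.parents_of_child", "participants.godparents"],
    "marriage": ["participants.groom", "participants.bride", "marriage_specific"],
    "death": ["participants.deceased"],
    "notary": ["notary", "property", "financial"],
    "land_record": ["property", "financial"],
    "court_record": ["case_details", "decision", "property_involved", "financial_amounts"],
    "index": ["index_entries"],
}


def has_path(record: dict, dotted: str) -> bool:
    """Return True if every key of a dotted path exists in the nested record."""
    current = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


def unexpected_sections(record: dict) -> list:
    """List sections present in the record that its record_type does not allow."""
    allowed = set(SECTIONS_BY_TYPE.get(record.get("record_type"), []))
    known = {s for sections in SECTIONS_BY_TYPE.values() for s in sections}
    return sorted(s for s in known - allowed if has_path(record, s))


# Made-up birth record that wrongly carries a marriage section.
suspect = {
    "record_type": "birth",
    "participants": {"child": {"surname": "Nowak"}},
    "marriage_specific": {"church_or_civil": "church"},
}
print(unexpected_sections(suspect))  # ['marriage_specific']
```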
Data Type Conventions
Dates
- Format: ISO 8601 (YYYY-MM-DD, YYYY-MM, or YYYY)
- Precision: Indicated by the date_precision field
- Examples: "1847-10-23", "1847-10", "1847"
Names
- Dual field system:
  • given_names: Original name as it appears in the record
  • given_names_local: Polish equivalent or localized version
- Surnames: Single field, may include gender-specific endings
Confidence Scores
- Type: Float
- Range: 0.0 to 1.0
- Interpretation: Higher values indicate higher confidence in extraction accuracy
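A typical use of the confidence score is to route uncertain records to manual review. The sketch below flags a record when its confidence falls under a cutoff or when quality.issues is non-empty. The 0.8 threshold is an arbitrary assumption chosen for illustration, not a value recommended by the system, and the function name is hypothetical.

```python
def needs_review(record: dict, threshold: float = 0.8) -> bool:
    """Flag a parsed record for manual review based on its quality block."""
    quality = record.get("quality") or {}
    confidence = quality.get("confidence")
    if confidence is None:
        # No score available: treat as uncertain.
        return True
    return confidence < threshold or bool(quality.get("issues"))


# Made-up records for illustration.
records = [
    {"record_type": "birth", "quality": {"confidence": 0.95, "issues": []}},
    {"record_type": "death", "quality": {"confidence": 0.55, "issues": ["faded ink"]}},
]
print([r["record_type"] for r in records if needs_review(r)])  # ['death']
```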
Coordinates
- Type: Integer
- Range: 0-1000 (normalized scale)
- Purpose: Resolution-independent bounding box coordinates
- Note: Not pixel values; normalized for different image resolutions
Enumerated Values
Many fields use specific enumerated values. Common enumerations:
- record_type: birth, marriage, death, notary, land_record, court_record, index, other
- language_detected: Polish, Latin, German, Russian, other
- sex: male, female, unknown
- religion: Roman Catholic, Greek Catholic, Jewish, Lutheran, other, unknown
- marital_status: bachelor, widower, single, widow, unknown
- date_precision: day, month, year, unknown
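Enumerated values lend themselves to a simple membership check when ingesting records. The following sketch covers only the enumerations that sit directly on a parsed record or in its dates block; nested fields such as sex and marital_status are left out for brevity. The function name and the sample record are illustrative assumptions.

```python
# Allowed values, transcribed from the enumerations above.
ENUMS = {
    "record_type": {"birth", "marriage", "death", "notary", "land_record",
                    "court_record", "index", "other"},
    "language_detected": {"Polish", "Latin", "German", "Russian", "other"},
    "religion": {"Roman Catholic", "Greek Catholic", "Jewish", "Lutheran",
                 "other", "unknown"},
    "date_precision": {"day", "month", "year", "unknown"},
}


def enum_violations(record: dict) -> list:
    """Return 'path: value' strings for fields whose value is outside its enumeration."""
    candidates = {
        "record_type": record.get("record_type"),
        "language_detected": record.get("language_detected"),
        "religion": record.get("religion"),
        "dates.date_precision": (record.get("dates") or {}).get("date_precision"),
    }
    problems = []
    for path, value in candidates.items():
        allowed = ENUMS[path.split(".")[-1]]
        # Omitted fields are skipped, since absence is permitted by the format.
        if value is not None and value not in allowed:
            problems.append(f"{path}: {value!r}")
    return problems


# Made-up record with two out-of-range values.
sample = {"record_type": "baptism", "language_detected": "Latin",
          "dates": {"date_precision": "decade"}}
print(enum_violations(sample))
```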
Array Structure
When present, the parsed_records field is always an array, even for single-record documents:
- Single record: [{...}]
- Multiple records: [{...}, {...}, {...}]
This ensures consistent parsing regardless of document content.
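In practice this means one loop handles every document, provided the code also tolerates a missing or null parsed_records field. A minimal sketch that tallies record types across a set of results follows; the function name and sample data are hypothetical.

```python
from collections import Counter


def count_record_types(results: list) -> Counter:
    """Tally record_type values across all parsed records in a list of results."""
    counts = Counter()
    for result in results:
        # A missing or null parsed_records field is treated as no records.
        for record in result.get("parsed_records") or []:
            counts[record.get("record_type")] += 1
    return counts


# Made-up results: one page with two records, one page without parsing.
results = [
    {"image_file": "page_001.jpg",
     "parsed_records": [{"record_type": "birth"}, {"record_type": "death"}]},
    {"image_file": "page_002.jpg"},
]
print(count_record_types(results))
```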