Allen Yang

Wednesday, 17 June 2026 01:26

How to Convert Word to JSON in Python (DOCX to JSON)

Converting Word documents to JSON is a common requirement when building automated document processing pipelines, feeding content into AI models, or migrating structured data from DOCX files into databases and APIs. Unlike flat formats such as CSV, JSON is hierarchical, so it can represent paragraphs, tables, and nested document structures in a single output.

However, Word files do not have a native JSON export format. A .docx file is a rich-text document composed of sections, paragraphs, styles, and tables—not a structured data source. Converting it to JSON requires deciding how to map that content into a meaningful schema.

This tutorial demonstrates how to convert Word to JSON in Python using Spire.Doc for Python. You will learn three progressively advanced methods: extracting plain paragraph text, converting Word tables to JSON arrays, and preserving the full document structure—including headings, paragraphs, and tables—in a hierarchical JSON output. The examples in this tutorial work with both DOCX and legacy DOC files supported by Spire.Doc.

Quick Navigation

How Is Word Converted into JSON?
Install the Required Library
Method 1 – Convert Word Text to JSON
Method 2 – Convert Word Tables to JSON
Method 3 – Preserve Document Structure in JSON
When to Use Word to JSON Conversion
Limitations and Best Practices
FAQ
Conclusion

1. How Is Word Converted into JSON?

A Word document is a rich-text format organized into sections, paragraphs, and tables—not a structured data format. When you convert Word to JSON, there is no single standard for how the content should be represented. The right schema depends on how the JSON will be used:

Goal	Recommended Schema	Key Characteristics
AI embedding / semantic search	Paragraph array	Flat list of text strings, one per paragraph
Full-text search indexing	Text blocks with metadata	Paragraphs with section index and style info
Database import from tables	Table row objects	Header-keyed dictionaries, one per row
RAG pipeline / knowledge base	Hierarchical structure	Nested sections with headings, paragraphs, and tables
Document archival / interchange	Full document model	Sections, styles, metadata, and all content types

For example, a Word document containing a heading and a paragraph could be represented in JSON as:

{
  "document": [
    {"type": "heading", "level": 1, "text": "Project Overview"},
    {"type": "paragraph", "text": "This report summarizes the quarterly results."}
  ]
}

The three methods in this tutorial cover three of these schema choices:

Method 1 produces a paragraph array (AI embedding, search indexing)
Method 2 produces table row objects (database import, data extraction)
Method 3 produces a hierarchical structure (RAG, knowledge base, document understanding)

Choose the method that matches your goal, or combine elements from multiple methods to build a custom schema.
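
The "text blocks with metadata" schema from the table is not covered by any of the three methods directly. As a rough sketch of what such a custom schema could look like, assume you have already collected each paragraph's section index, style name, and text while iterating the document. The `raw_paragraphs` tuples, the `to_text_blocks` helper, and the field names below are illustrative choices, not a Spire.Doc structure:

```python
import json

# Hypothetical input: (section_index, style_name, text) tuples collected
# while iterating a document. The values are made up for illustration.
raw_paragraphs = [
    (0, "Heading1", "Project Overview"),
    (0, "Normal", "This report summarizes the quarterly results."),
    (0, "Normal", "   "),
]

def to_text_blocks(items):
    """Turn (section, style, text) tuples into text blocks that carry metadata."""
    blocks = []
    for section_index, style_name, text in items:
        text = text.strip()
        if not text:
            continue  # skip whitespace-only paragraphs
        blocks.append({
            "section_index": section_index,
            "style": style_name,
            "text": text,
        })
    return blocks

text_blocks = to_text_blocks(raw_paragraphs)
print(json.dumps(text_blocks, indent=2, ensure_ascii=False))
```

Keeping the section index and style next to the text lets a search index filter or boost results by where the text came from, at the cost of a slightly larger payload than a plain string array.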

2. Install the Required Library

This tutorial uses Spire.Doc for Python to read and parse DOC/DOCX files. Install it via pip:

pip install spire.doc

Alternatively, you can download Spire.Doc for Python and integrate it manually.

After installation, import the library in your Python script:

from spire.doc import Document, FileFormat
from spire.doc.common import *

Spire.Doc provides APIs to load Word documents, iterate through sections, paragraphs, and tables, and extract text content—everything needed to build a Word-to-JSON pipeline.

3. Method 1 – Convert Word Text to JSON

The simplest way to convert Word to JSON is to extract all paragraph text from the document and store it in a JSON array. This approach works well when you need the full text content without structural metadata—such as for full-text search, AI text embedding, or simple content export.

3.1 Read Paragraphs from a Word Document

Spire.Doc represents a Word document as a collection of Sections, each containing Paragraphs. To extract all text, you iterate through every section and every paragraph within it.

from spire.doc import Document
from spire.doc.common import *

input_file = "ProjectReport.docx"

document = Document()
document.LoadFromFile(input_file)

paragraphs = []
for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    for j in range(section.Paragraphs.Count):
        paragraph = section.Paragraphs.get_Item(j)
        text = paragraph.Text
        if text.strip():
            paragraphs.append(text)

document.Close()

Each paragraph's .Text property returns the plain text content, stripping away formatting. The if text.strip() check filters out empty paragraphs that exist as spacing or layout elements in Word.

3.2 Serialize the Extracted Text to JSON

Assuming the paragraph data extracted in the previous step is stored in the paragraphs list, you can serialize it to JSON and save it to a file as follows:

import json

output_file = "paragraphs.json"

result = {
    "source": input_file,
    "paragraph_count": len(paragraphs),
    "paragraphs": paragraphs
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows the structure of the generated output file. It is abridged, so fewer entries are listed than paragraph_count reports:

{
  "source": "ProjectReport.docx",
  "paragraph_count": 3,
  "paragraphs": [
    "Quarterly Sales Report",
    "This document provides an overview of sales performance across all regions."
  ]
}

Conversion Result

The image below shows the source Word document and the JSON file generated after extracting paragraph text.

Word to JSON conversion result - paragraph extraction

3.3 Explanation

Why iterate through Sections and Paragraphs instead of extracting all text at once? Because Word documents are organized hierarchically. A document contains one or more sections (each with its own page layout), and each section contains paragraphs. Iterating at this level gives you control over which content to include or skip—such as filtering empty paragraphs or limiting extraction to specific sections.
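
To illustrate the point about limiting extraction to specific sections, the loop from section 3.1 can be wrapped in a small function. This is a sketch rather than a library feature: `extract_paragraphs` and its `section_indices` parameter are hypothetical names, and the function only relies on the `Sections`, `Paragraphs`, `Count`, `get_Item`, and `Text` members already used above.

```python
def extract_paragraphs(document, section_indices=None):
    """Collect non-empty paragraph text, optionally from selected sections only.

    `document` is expected to be an already loaded Spire.Doc Document.
    When `section_indices` is None, every section is read.
    """
    texts = []
    for i in range(document.Sections.Count):
        if section_indices is not None and i not in section_indices:
            continue  # skip sections the caller did not ask for
        section = document.Sections.get_Item(i)
        for j in range(section.Paragraphs.Count):
            text = section.Paragraphs.get_Item(j).Text
            if text.strip():
                texts.append(text)
    return texts
```

For example, `extract_paragraphs(document, section_indices={0})` would return only the paragraphs of the first section, while `extract_paragraphs(document)` behaves like the original loop.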

Storing paragraphs as a JSON array is the most straightforward structure. Each element is a string, making the output easy to consume in downstream systems. This approach is well-suited for:

Full-text indexing – feed paragraph text into search engines like Elasticsearch
AI text embedding – convert paragraphs into vector representations for semantic search
Simple content export – extract readable text from Word files without formatting

However, this method loses structural information. Headings, body text, and list items are all treated the same way. If you need to distinguish between them, see Method 3.
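
For the embedding use case listed above, paragraphs are often grouped into larger chunks before they are vectorized. The following is a minimal sketch that assumes a simple character budget is good enough; the `chunk_paragraphs` helper and its `max_chars` limit are illustrative and not part of Spire.Doc:

```python
def chunk_paragraphs(paragraphs, max_chars=500):
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks = []
    current = []
    current_len = 0
    for text in paragraphs:
        # +1 accounts for the newline that joins paragraphs inside a chunk
        extra = len(text) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append("\n".join(current))
            current, current_len = [], 0
            extra = len(text)
        current.append(text)
        current_len += extra
    if current:
        chunks.append("\n".join(current))
    return chunks

sample = ["Quarterly Sales Report", "This document provides an overview."]
print(chunk_paragraphs(sample, max_chars=30))
```

The function can be applied directly to the `paragraphs` list built in section 3.1. A token-based budget may suit a specific embedding model better than a character count.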

If your goal is simply to extract text content from Word documents without converting it to JSON, you may also be interested in our guide on extracting text from Word documents in Python.

4. Method 2 – Convert Word Tables to JSON

In many Word documents—reports, invoices, product lists, configuration tables—the most valuable content lives inside tables, not in paragraphs. Converting Word tables to JSON allows you to extract structured row-and-column data that can be directly loaded into databases, APIs, or data analysis tools.

Why Tables Need Special Handling

Tables in Word are stored as a grid of rows and cells, where each cell contains its own paragraphs. Unlike paragraph text, table data has an inherent two-dimensional structure that maps naturally to JSON objects. The first row often contains column headers, and subsequent rows contain data records.

Extracting Tables from a Word Document

The following code reads all tables from a Word document, uses the first row as column headers, and converts each subsequent row into a JSON object:

import json
from spire.doc import Document
from spire.doc.common import *

input_file = "SalesData.docx"
output_file = "tables.json"

document = Document()
document.LoadFromFile(input_file)

all_tables = []

for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    for t in range(section.Tables.Count):
        table = section.Tables.get_Item(t)
        rows_data = []

        if table.Rows.Count < 2:
            continue

        header_row = table.Rows[0]
        headers = []
        for c in range(header_row.Cells.Count):
            cell_text = header_row.Cells[c].Paragraphs[0].Text.strip()
            headers.append(cell_text)

        for r in range(1, table.Rows.Count):
            row = table.Rows[r]
            row_dict = {}
            for c in range(row.Cells.Count):
                cell_text = row.Cells[c].Paragraphs[0].Text.strip()
                row_dict[headers[c] if c < len(headers) else f"Column_{c}"] = cell_text
            rows_data.append(row_dict)

        all_tables.append({
            "table_index": t,
            "headers": headers,
            "row_count": len(rows_data),
            "rows": rows_data
        })

document.Close()

result = {
    "source": input_file,
    "table_count": len(all_tables),
    "tables": all_tables
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows the structure of the generated output file, with each table row mapped to a JSON object using the header row as keys. It is abridged, so fewer rows are listed than row_count reports:

{
  "source": "SalesData.docx",
  "table_count": 1,
  "tables": [
    {
      "table_index": 0,
      "headers": ["Region", "Product", "Units Sold", "Revenue"],
      "row_count": 3,
      "rows": [
        {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"},
        {"Region": "South", "Product": "Laptop", "Units Sold": "80", "Revenue": "76000"}
      ]
    }
  ]
}

Conversion Result

The image below demonstrates how table data from a Word document is converted into structured JSON records.

Word to JSON conversion result - table extraction

Explanation

The code treats the first row as a header row and maps each cell in subsequent rows to the corresponding header key. This produces a JSON array of objects, a widely used representation of tabular data.

Key considerations:

table.Rows.Count < 2 skips tables that have only a header row or are empty
row.Cells[c].Paragraphs[0].Text extracts text from the first paragraph in each cell. For simplicity, the example reads only the first paragraph. If a cell contains multiple paragraphs, iterate through the entire Paragraphs collection and concatenate the results:

cell_text = "\n".join(
    row.Cells[c].Paragraphs[p].Text.strip()
    for p in range(row.Cells[c].Paragraphs.Count)
    if row.Cells[c].Paragraphs[p].Text.strip()
)

headers[c] if c < len(headers) else f"Column_{c}" handles cases where a data row has more cells than the header row
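
The header-mapping rule can be tried out without a Word file. The sketch below applies the same logic to a plain list of rows, where `grid` is a hypothetical stand-in for the cell text read from a table, and shows what happens when a data row is longer or shorter than the header row:

```python
def rows_to_records(grid):
    """Map each data row to a dict keyed by the first row's cell text."""
    if len(grid) < 2:
        return []  # header only, or empty: nothing to convert
    headers = grid[0]
    records = []
    for row in grid[1:]:
        record = {}
        for c, cell_text in enumerate(row):
            # Fall back to a positional key when the header row is too short
            key = headers[c] if c < len(headers) else f"Column_{c}"
            record[key] = cell_text
        records.append(record)
    return records

grid = [
    ["Region", "Product"],
    ["North", "Laptop", "extra"],  # one more cell than the header row
    ["South"],                     # one fewer cell than the header row
]
records = rows_to_records(grid)
print(records)
```

A row with fewer cells than the header simply omits the missing keys. If downstream code expects every key to be present, fill the gaps with empty strings before serializing.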

This method is ideal for extracting structured data from reports, invoices, product catalogs, and configuration tables stored in Word documents. The resulting JSON can be directly loaded into databases, used in web APIs, or processed by data analysis tools.
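
Because cell text is always extracted as strings, values such as "120" stay quoted in the JSON output. If the target database expects numbers, a conversion pass can follow the extraction. A minimal sketch, assuming that any value that parses cleanly as an integer or float should be converted (the `coerce_value` and `coerce_row` helpers are illustrative):

```python
def coerce_value(text):
    """Return an int or float when the text parses as one, else the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text

def coerce_row(row):
    """Apply coerce_value to every cell of a header-keyed row dictionary."""
    return {key: coerce_value(value) for key, value in row.items()}

row = {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"}
print(coerce_row(row))
```

Formatted values such as "1,200" or "$5" remain strings under this rule. Identifier columns with leading zeros would lose them when parsed, and text such as "nan" or "inf" would be read as a float, so it may be safer to convert only the columns you know to be numeric.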

If you need to generate Word documents from structured JSON data, see our tutorial on converting JSON to Word in Python, which covers creating Word content and tables directly from JSON objects and arrays.

5. Method 3 – Preserve Document Structure in JSON

Methods 1 and 2 treat paragraphs and tables as separate, isolated elements. In practice, Word documents have a meaningful hierarchy: headings introduce sections, paragraphs provide detail, and tables present structured data within a specific context.

Preserving this hierarchy in JSON produces output that is far more useful for knowledge base construction, RAG (Retrieval-Augmented Generation) pipelines, and document understanding systems. Instead of a flat list of text, you get a structured representation that maintains the logical flow of the original document.

How to Preserve Headings, Paragraphs, and Tables in a Hierarchical JSON Structure

The approach is to iterate through all child objects in each section's body, determine the type of each object (paragraph or table), and build a structured JSON representation accordingly. For paragraphs, you can detect headings by checking the StyleName property.

import json
from spire.doc import Document, Paragraph, Table  # Paragraph and Table are needed for the isinstance checks below
from spire.doc.common import *

input_file = "ProjectReport.docx"
output_file = "structured_output.json"

HEADING_STYLES = {
    "Heading1": 1,
    "Heading2": 2,
    "Heading3": 3,
    "Heading4": 4,
}

def get_heading_level(style_name):
    return HEADING_STYLES.get(style_name, None)

def extract_table_data(table):
    rows_data = []
    if table.Rows.Count < 1:
        return {"headers": [], "rows": []}

    header_row = table.Rows[0]
    headers = []
    for c in range(header_row.Cells.Count):
        headers.append(header_row.Cells[c].Paragraphs[0].Text.strip())

    for r in range(1, table.Rows.Count):
        row = table.Rows[r]
        row_dict = {}
        for c in range(row.Cells.Count):
            cell_text = row.Cells[c].Paragraphs[0].Text.strip()
            row_dict[headers[c] if c < len(headers) else f"Column_{c}"] = cell_text
        rows_data.append(row_dict)

    return {"headers": headers, "rows": rows_data}

document = Document()
document.LoadFromFile(input_file)

sections_data = []

for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    content_items = []

    for j in range(section.Body.ChildObjects.Count):
        obj = section.Body.ChildObjects.get_Item(j)

        if isinstance(obj, Paragraph):
            text = obj.Text.strip()
            if not text:
                continue

            heading_level = get_heading_level(obj.StyleName)
            if heading_level:
                content_items.append({
                    "type": "heading",
                    "level": heading_level,
                    "text": text
                })
            else:
                content_items.append({
                    "type": "paragraph",
                    "text": text
                })

        elif isinstance(obj, Table):
            table_data = extract_table_data(obj)
            content_items.append({
                "type": "table",
                "row_count": len(table_data["rows"]),
                "data": table_data
            })

    sections_data.append({
        "section_index": i,
        "content": content_items
    })

document.Close()

result = {
    "source": input_file,
    "section_count": len(sections_data),
    "sections": sections_data
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

Output Example

The following JSON snippet shows how headings, paragraphs, and tables are represented in the hierarchical output structure. It is abridged, so fewer table rows are listed than row_count reports:

{
  "source": "ProjectReport.docx",
  "section_count": 1,
  "sections": [
    {
      "section_index": 0,
      "content": [
        {
          "type": "heading",
          "level": 1,
          "text": "Quarterly Sales Report"
        },
        {
          "type": "paragraph",
          "text": "This report provides an overview of sales performance across all regions."
        },
        {
          "type": "heading",
          "level": 2,
          "text": "Regional Breakdown"
        },
        {
          "type": "table",
          "row_count": 3,
          "data": {
            "headers": ["Region", "Product", "Units Sold", "Revenue"],
            "rows": [
              {"Region": "North", "Product": "Laptop", "Units Sold": "120", "Revenue": "114000"}
            ]
          }
        }
      ]
    }
  ]
}

Conversion Result

The image below illustrates how headings, paragraphs, and tables are preserved in a hierarchical JSON structure.

Word to JSON conversion result - hierarchical structure

Explanation

This method differs from the previous two in a fundamental way: it uses section.Body.ChildObjects to iterate through all content elements in document order, rather than separately iterating paragraphs and tables. This preserves the original sequence and interleaving of headings, paragraphs, and tables.

Key design decisions:

Heading detection via StyleName – Word headings are paragraphs styled with "Heading1", "Heading2", etc. Checking the style name allows you to distinguish headings from body text and record the heading level. Note that the exact heading style names may vary depending on the Word template or language settings (e.g., "Heading 1" with a space, or localized names like "标题 1" in Chinese). Normalizing the style name before lookup handles differences in spacing and capitalization; localized names are not covered by this normalization and need their own entries in the map:

def get_heading_level(style_name):
    normalized = style_name.lower().replace(" ", "")
    heading_map = {"heading1": 1, "heading2": 2, "heading3": 3, "heading4": 4}
    return heading_map.get(normalized, None)

ChildObjects iteration – Unlike section.Paragraphs (which only returns paragraphs) or section.Tables (which only returns tables), ChildObjects returns all elements in their original order. This is essential for preserving the document's logical structure.
Structured JSON output – Each content item includes a type field (heading, paragraph, or table), making it easy for downstream systems to process different content types appropriately.
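
Heading detection can be made somewhat more tolerant with a pattern match instead of a fixed dictionary. The sketch below is one possible variant, not the library's own mechanism: it accepts any level from 1 to 9, ignores case and spaces, and treats the prefixes in `LOCALIZED_HEADING_PREFIXES` as heading names. That tuple is an assumption and should be extended with the style names your templates actually use.

```python
import re

# Assumed base names for heading styles; extend for your own templates.
LOCALIZED_HEADING_PREFIXES = ("heading", "标题")

def get_heading_level(style_name, max_level=9):
    """Return the heading level for a style name, or None if it is not a heading."""
    if not style_name:
        return None  # paragraphs without a style name are treated as body text
    normalized = style_name.strip().lower().replace(" ", "")
    for prefix in LOCALIZED_HEADING_PREFIXES:
        match = re.fullmatch(re.escape(prefix) + r"(\d+)", normalized)
        if match:
            level = int(match.group(1))
            return level if 1 <= level <= max_level else None
    return None

for name in ["Heading1", "Heading 2", "标题 1", "Normal", None]:
    print(repr(name), "->", get_heading_level(name))
```

Using `re.fullmatch` means that a style such as "Heading1Char" is not mistaken for a heading, because the digits must end the name.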

This approach is particularly valuable for:

RAG and AI pipelines – the heading structure enables chunking documents by section, improving retrieval accuracy
Knowledge base construction – hierarchical JSON maps directly to tree-structured knowledge graphs
Document understanding – preserving the relationship between headings and their associated content allows semantic analysis of document sections
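
Section-level chunking can be built on top of the `content` list that Method 3 produces. The following sketch works on plain dictionaries, so it runs without Spire.Doc; `chunk_by_heading` and the chunk field names are hypothetical, and the sample `content` is a shortened version of the output shown earlier:

```python
def chunk_by_heading(content_items):
    """Group Method 3 content items into chunks, starting a new chunk at each heading."""
    chunks = []
    current = {"heading": None, "level": None, "items": []}
    for item in content_items:
        if item["type"] == "heading":
            # Close the previous chunk unless it is the empty initial one
            if current["heading"] is not None or current["items"]:
                chunks.append(current)
            current = {"heading": item["text"], "level": item["level"], "items": []}
        else:
            current["items"].append(item)
    if current["heading"] is not None or current["items"]:
        chunks.append(current)
    return chunks

content = [
    {"type": "heading", "level": 1, "text": "Quarterly Sales Report"},
    {"type": "paragraph", "text": "This report provides an overview of sales performance across all regions."},
    {"type": "heading", "level": 2, "text": "Regional Breakdown"},
    {"type": "table", "row_count": 0, "data": {"headers": [], "rows": []}},
]
chunks = chunk_by_heading(content)
print([chunk["heading"] for chunk in chunks])
```

Content that appears before the first heading ends up in a chunk whose `heading` is `None`. This version starts a new chunk at every heading regardless of level; tracking a stack of parent headings would be a natural extension if nested context is needed.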

If you need to extract specific content types from Word documents, such as headings, paragraphs, or tables, see our tutorial on reading Word documents in Python, which covers content extraction techniques in more detail.

6. When to Use Word to JSON Conversion

Word to JSON conversion is useful in any scenario where structured data needs to be extracted from Word documents at scale. Common use cases include:

AI and RAG document processing – Convert Word documents into JSON chunks for embedding and retrieval in LLM-based applications. The hierarchical structure from Method 3 enables section-level chunking, which produces better retrieval results than flat text splitting.
Knowledge base construction – Build structured knowledge bases from technical documentation, policy documents, or manuals stored as .docx files.
Batch data extraction – Extract data from hundreds of Word reports, invoices, or forms and load the results into a database or data warehouse.
Contract and resume parsing – Convert legal contracts, HR documents, or resumes into structured JSON for automated analysis and comparison.
API and web application data exchange – Serve Word document content through REST APIs as JSON, enabling web and mobile applications to consume document data without handling .docx files directly.
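
For the batch extraction scenario, the conversion logic can be driven by a small loop over a folder. This is a sketch under stated assumptions: `batch_convert` is a hypothetical helper, and `convert` is a function you supply (for example, one of the three methods above wrapped so that it takes an input path and an output path).

```python
from pathlib import Path

def batch_convert(input_dir, output_dir, convert):
    """Run convert(docx_path, json_path) for every .docx file in input_dir.

    Returns a (converted, failed) pair of file-name lists, so that one bad
    document does not stop the rest of the batch.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for docx_path in sorted(input_dir.glob("*.docx")):
        json_path = output_dir / (docx_path.stem + ".json")
        try:
            convert(docx_path, json_path)
            converted.append(docx_path.name)
        except Exception:
            # Broad catch on purpose: record the failure and keep going
            failed.append(docx_path.name)
    return converted, failed
```

Returning the list of failed files makes it easy to log or retry them after the run instead of losing the whole batch to a single corrupt document.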

7. Limitations and Best Practices

Limitations

No standard JSON schema for Word – No universally accepted schema exists for representing Word content in JSON, so the structure you choose must be designed for your specific use case.
Complex formatting is not captured – The methods in this tutorial extract text content and basic structural metadata (heading levels, table data). They do not capture fonts, colors, images, page layout, headers/footers, or footnotes. If your application requires these elements, additional extraction logic is needed.
Merged table cells require special handling – Word tables can contain merged cells (both horizontal and vertical). The simple row-by-row extraction in Method 2 assumes a regular grid. Documents with merged cells may produce unexpected results.
Large documents may need chunked processing – For documents with hundreds of pages or dozens of tables, consider processing sections or tables individually to manage memory usage.

Best Practices

Design your JSON schema before writing code – Decide what you need (text only? headings? tables? full structure?) and choose the appropriate extraction method.
Validate output against sample documents – Word documents vary widely in structure and formatting. Test your conversion logic against representative samples from your actual document set.
Handle encoding explicitly – Always specify encoding="utf-8" when writing JSON files to avoid character encoding issues with non-ASCII text.
Use ensure_ascii=False in json.dump – This preserves Unicode characters in the output rather than escaping them, which is important for documents containing non-English text.
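
The effect of the last two practices is easy to see with the standard json module alone. The short example below serializes the same record with and without `ensure_ascii=False`; the sample text is arbitrary:

```python
import json

record = {"text": "标题 1"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the characters as-is

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # non-ASCII characters are written literally

# Both forms decode to the same Python object
same = json.loads(escaped) == json.loads(readable) == record
print(same)
```

Both outputs are valid JSON and carry the same data. The difference is readability and file size, which is why the literal form is usually preferred when the file is also opened with `encoding="utf-8"`.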

8. FAQ

Can I convert DOCX to JSON in Python?

Yes. Using Spire.Doc for Python, you can load any .docx file, iterate through its sections, paragraphs, and tables, and serialize the extracted content to JSON using Python's built-in json module. This tutorial demonstrates three methods for doing so, from simple text extraction to full structural preservation.

What is the best Word to JSON converter for developers?

For developers who need batch processing, automation, or custom JSON schemas, a Python-based approach using Spire.Doc is more flexible than online converters. Online tools work for one-off conversions but are generally not suited to large-scale processing, custom output formats, or integration into automated pipelines.

Can I convert Word tables to JSON?

Yes. By iterating through the tables in a Word document and extracting cell text row by row, you can convert table data into a JSON array of objects. Method 2 in this tutorial demonstrates this with header-based key mapping.

Does Word have a native JSON export option?

No. Microsoft Word does not provide a built-in JSON export format. Word files can be saved as DOCX, PDF, HTML, RTF, and plain text, but converting to JSON requires a programmatic approach that reads the document structure and maps it to a JSON schema.

Can I preserve headings and structure when converting Word to JSON?

Yes. By iterating through all child objects in each section's body and checking paragraph style names, you can detect headings, body paragraphs, and tables, then build a hierarchical JSON structure that preserves the document's logical organization. Method 3 in this tutorial provides a complete implementation.

Can I convert Word to JSON online?

Yes, there are online Word to JSON converters that can handle one-off conversions. However, online tools typically process one file at a time and offer little control over the JSON schema. For batch processing, automated pipelines, or custom output structures, a Python-based approach using Spire.Doc is more practical and scalable.

9. Conclusion

In this article, we demonstrated how to convert Word documents to JSON in Python using Spire.Doc for Python. We covered three methods of increasing complexity: extracting paragraph text as a flat JSON array, converting Word tables to structured JSON objects, and preserving the full document hierarchy—including headings, paragraphs, and tables—in a single JSON output.

Each method serves a different purpose. Plain text extraction works for indexing and embedding. Table extraction is ideal for data migration and report parsing. Full structural preservation enables knowledge base construction and RAG pipelines. Choose the approach that matches your requirements, and extend the JSON schema as needed for your specific use case.

Spire.Doc for Python provides comprehensive Word document processing capabilities beyond JSON conversion, including document creation, formatting, mail merge, and format conversion. You can apply for a 30-day free license to evaluate all features.

Friday, 12 June 2026 08:48

How to Convert JSON to Word in Python (JSON to DOCX)

JSON is one of the most common formats for exchanging structured data between applications, APIs, and databases. In many business scenarios, however, JSON data needs to be transformed into human-readable Word documents such as reports, invoices, summaries, contracts, or exported records.

Converting JSON to Word is not a simple file format conversion. JSON has no inherent Word structure, so the process requires parsing the JSON data and mapping its elements to appropriate Word document components such as paragraphs, tables, and headings.

This article demonstrates how to convert JSON data into Word documents in Python using Spire.Doc for Python. We'll cover multiple approaches, including exporting JSON as formatted text, creating Word tables from JSON arrays, and generating structured reports from nested JSON data.

Content Overview

Understanding JSON-to-Word Conversion
Install Spire.Doc for Python
Method 1: Convert JSON to Word as Formatted Text
Method 2: Convert JSON Arrays to Word Tables
Method 3: Generate Structured Word Reports from JSON
Handle Nested JSON Objects
Handle Missing or Optional Fields
Convert JSON Files to Word Documents
Why Use Spire.Doc for JSON-to-Word Conversion
FAQ
Conclusion

1. Understanding JSON-to-Word Conversion

JSON and Word documents serve fundamentally different purposes. JSON is a structured data format designed for data exchange and machine processing, while Word documents are intended for human consumption with rich formatting, visual hierarchy, and page layout.

As a result, converting JSON to Word is not a direct format transformation. The JSON data must first be parsed and mapped to appropriate document elements before a Word document can be generated.

The conversion process typically follows this workflow:

JSON Data
      ↓
Parse JSON (json.loads)
      ↓
Map Data Structure
      ↓
Spire.Doc for Python
      ↓
Paragraphs / Tables / Headings
      ↓
DOCX Document

In Python, the built-in json module is commonly used to parse JSON data, while Spire.Doc for Python handles document generation. After the JSON structure is analyzed and mapped, Spire.Doc can create paragraphs, tables, headings, images, and other Word elements programmatically, producing a fully formatted DOCX document.

The table below shows common mappings between JSON structures and Word elements:

JSON Structure	Word Element	Example
Key-Value Pair	Paragraph	`"Name": "John"` → `Name: John`
Array	Table	`[{...}, {...}]` → rows and columns
Object	Section	Nested object → grouped content
Title Field	Heading	`"title": "Report"` → Heading 1
URL/Image Path	Image	`"logo": "img.png"` → embedded image

Understanding these mappings is important because the same JSON data can be presented in different ways depending on the document's purpose. For example, simple key-value data may be exported as paragraphs, while collections of records are usually easier to read when rendered as tables. With Spire.Doc for Python, these mappings can be implemented programmatically to generate professional Word documents from structured JSON data.
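
The mapping table can be expressed as a small decision function that inspects each top-level JSON entry and suggests a Word element. This is only a sketch of the idea: `classify_json_value` and its rules are illustrative, the "title" key check is an assumption about how your data names its title field, and the image mapping is left out because it depends on how paths and URLs are stored.

```python
import json

def classify_json_value(key, value):
    """Suggest a Word element for one top-level JSON entry (illustrative rules)."""
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        return "table"      # array of objects -> rows and columns
    if isinstance(value, dict):
        return "section"    # nested object -> grouped content
    if key.lower() == "title":
        return "heading"    # title field -> Heading 1
    return "paragraph"      # plain key-value pair -> "key: value" text

data = json.loads(
    '{"title": "Report", "Name": "John", '
    '"sales": [{"Region": "North"}], "contact": {"email": "a@example.com"}}'
)
plan = {key: classify_json_value(key, value) for key, value in data.items()}
print(plan)
```

Running a pass like this before generating the document makes the mapping explicit and easy to adjust, instead of scattering type checks through the code that builds the Word content.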

2. Install Spire.Doc for Python

Before converting JSON to Word, you need to install Spire.Doc for Python in your development environment.

Install via pip (Recommended)

pip install spire.doc

Alternatively, you can download Spire.Doc for Python and integrate it manually.

After installation, import the library in your project:

from spire.doc import *
from spire.doc.common import *

3. Method 1: Convert JSON to Word as Formatted Text

This method is the simplest approach for converting JSON to Word. It works well for API responses, configuration files, and simple JSON exports where each key-value pair maps to a paragraph.

Sample JSON

{
  "Name": "John Smith",
  "Department": "Sales",
  "Country": "USA"
}

Python Code

import json
from spire.doc import Document, FileFormat

json_data = '{"Name": "John Smith", "Department": "Sales", "Country": "USA"}'
data = json.loads(json_data)

document = Document()
section = document.AddSection()

for key, value in data.items():
    paragraph = section.AddParagraph()
    text_range = paragraph.AppendText(f"{key}: {value}")
    text_range.CharacterFormat.FontSize = 12
    paragraph.Format.AfterSpacing = 6

document.SaveToFile("json_to_text.docx", FileFormat.Docx)
document.Close()

Output

The following Word document shows how JSON key-value pairs can be converted into formatted paragraphs.

JSON key-value pairs converted to Word paragraphs

When to Use This Approach

This method is best suited for:

Simple key-value JSON objects
API response exports
Configuration file documentation
Quick data snapshots

It is not ideal for large datasets or tabular data, where Method 2 (tables) provides better readability.

If your goal is to analyze, filter, or manipulate structured JSON data in a spreadsheet, you may also be interested in our guide on converting JSON to Excel in Python.

4. Method 2: Convert JSON Arrays to Word Tables

When JSON data contains arrays of objects, tables provide the most effective way to present the data in a Word document. This is the most common scenario for converting JSON to Word, as many APIs and databases return data as JSON arrays.

Sample JSON

[
  {"Product": "Laptop", "Price": 1200, "Stock": 45},
  {"Product": "Mouse", "Price": 30, "Stock": 200},
  {"Product": "Keyboard", "Price": 85, "Stock": 120}
]

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color
)

json_data = '''[
  {"Product": "Laptop", "Price": 1200, "Stock": 45},
  {"Product": "Mouse", "Price": 30, "Stock": 200},
  {"Product": "Keyboard", "Price": 85, "Stock": 120}
]'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

if data:
    headers = list(data[0].keys())
    table = section.AddTable(True)
    table.ResetCells(len(data) + 1, len(headers))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(headers):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True
        text_range.CharacterFormat.FontSize = 12

    for row_index, record in enumerate(data):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(headers):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            text_range = paragraph.AppendText(str(record.get(key, "")))
            text_range.CharacterFormat.FontSize = 11

document.SaveToFile("json_to_table.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the generated Word table created from the JSON array.

JSON array converted to Word table

Why Use Tables for JSON Arrays

Tables are the natural fit for JSON array data because:

Each JSON object maps to a table row
Each key maps to a column header
Data is aligned for easy scanning and comparison
Tables are the standard format for reports, inventory lists, and exported database records
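
The Python code earlier in this section takes its column headers from the first record only (`data[0].keys()`), so a key that first appears in a later record would not get a column. When records are not guaranteed to share the same keys, the headers can be collected across all records first. A minimal sketch, with `collect_headers` as a hypothetical helper and made-up sample records:

```python
def collect_headers(records):
    """Return every key that appears in any record, in first-seen order."""
    headers = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                headers.append(key)
    return headers

records = [
    {"Product": "Laptop", "Price": 1200},
    {"Product": "Mouse", "Price": 30, "Stock": 200},  # "Stock" appears only here
]
print(collect_headers(records))
```

The result can replace `list(data[0].keys())` in the table code. Cells for missing keys are already handled there, since `record.get(key, "")` returns an empty string.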

Enhancing JSON Tables with Formatting

Unlike plain text exports, Spire.Doc allows JSON data to be rendered as professionally formatted Word tables. Beyond basic table creation, you can apply:

Table styles – Use DefaultTableStyle or ApplyStyle for consistent, polished table appearances
Borders and shading – Control cell borders, background colors, and alternating row colors
Alignment – Set horizontal and vertical alignment at the cell, row, or table level
Custom formatting – Apply font size, bold, and color to individual cells or ranges
Auto-fit behavior – Use AutoFit to adjust column widths to content or window size

These formatting capabilities transform raw JSON data into professional report layouts suitable for business documents, client deliverables, and automated reporting pipelines.

If you need to create more sophisticated Word tables, such as merged cells, custom table layouts, or advanced formatting, see our guide on creating and formatting tables in Word documents using Python.

5. Method 3: Generate Structured Word Reports from JSON

Real-world JSON data often contains a mix of metadata, summary text, and tabular data. This method combines headings, paragraphs, and tables to generate a complete structured Word report from JSON.

Sample JSON

{
  "title": "Monthly Sales Report",
  "period": "June 2026",
  "summary": "Total revenue reached $580,000 this month, representing a 12% increase over the previous period. All regions showed positive growth.",
  "sales": [
    {"Region": "North", "Revenue": 150000, "Units": 320},
    {"Region": "South", "Revenue": 120000, "Units": 280},
    {"Region": "East", "Revenue": 180000, "Units": 410},
    {"Region": "West", "Revenue": 130000, "Units": 290}
  ]
}

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color,
    BuiltinStyle
)

json_data = '''{
  "title": "Monthly Sales Report",
  "period": "June 2026",
  "summary": "Total revenue reached $580,000 this month, representing a 12% increase over the previous period. All regions showed positive growth.",
  "sales": [
    {"Region": "North", "Revenue": 150000, "Units": 320},
    {"Region": "South", "Revenue": 120000, "Units": 280},
    {"Region": "East", "Revenue": 180000, "Units": 410},
    {"Region": "West", "Revenue": 130000, "Units": 290}
  ]
}'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

heading_style = document.AddStyle(BuiltinStyle.Heading1)
subheading_style = document.AddStyle(BuiltinStyle.Heading2)

title_para = section.AddParagraph()
title_para.ApplyStyle(heading_style.Name)
title_para.AppendText(data.get("title", "Report"))

period_para = section.AddParagraph()
period_para.AppendText(f"Period: {data.get('period', 'N/A')}")
period_para.Format.AfterSpacing = 12

summary_heading = section.AddParagraph()
summary_heading.ApplyStyle(subheading_style.Name)
summary_heading.AppendText("Executive Summary")

summary_para = section.AddParagraph()
summary_para.AppendText(data.get("summary", ""))
summary_para.Format.AfterSpacing = 12

sales_heading = section.AddParagraph()
sales_heading.ApplyStyle(subheading_style.Name)
sales_heading.AppendText("Sales Data")

sales = data.get("sales", [])
if sales:
    headers = list(sales[0].keys())
    table = section.AddTable(True)
    table.ResetCells(len(sales) + 1, len(headers))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(headers):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True

    for row_index, record in enumerate(sales):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(headers):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            paragraph.AppendText(str(record.get(key, "")))

document.SaveToFile("json_report.docx", FileFormat.Docx)
document.Close()

Output

The generated Word document combines headings, descriptive text, and tabular data into a structured report, making the JSON data easier to read and share.

Structured Word report generated from JSON data

Key Techniques

This example demonstrates several important techniques for generating Word reports from JSON:

Headings – Use BuiltinStyle.Heading1 and Heading2 for document structure and table-of-contents compatibility
Paragraphs – Add summary and descriptive text between headings
Tables – Render JSON arrays as tabular data within the report
Combinations – Mix multiple Word element types in a single document
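Because the report is generated from data, figures quoted in the summary can be derived from the sales array rather than typed by hand. The following is a minimal sketch using the sample values from this section; `summarize_sales` is an illustrative name and the wording of the summary line is an assumption.

```python
def summarize_sales(sales):
    """Aggregate revenue and units across all regions."""
    total_revenue = sum(item.get("Revenue", 0) for item in sales)
    total_units = sum(item.get("Units", 0) for item in sales)
    return total_revenue, total_units

# Same records as the "sales" array in the sample JSON.
sales = [
    {"Region": "North", "Revenue": 150000, "Units": 320},
    {"Region": "South", "Revenue": 120000, "Units": 280},
    {"Region": "East", "Revenue": 180000, "Units": 410},
    {"Region": "West", "Revenue": 130000, "Units": 290},
]

total_revenue, total_units = summarize_sales(sales)
summary_line = f"Total revenue: ${total_revenue:,} across {total_units:,} units"
print(summary_line)
```

The computed total matches the $580,000 quoted in the sample summary. The resulting string could be passed to `AppendText` in place of a hard-coded sentence.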

Why Structured Reports Matter

In business environments, JSON data rarely exists in isolation. It typically comes from APIs, databases, or reporting systems and needs to be transformed into documents that decision-makers can read, share, and archive. Common scenarios include:

Sales reports – Revenue, units, and regional breakdowns from CRM or ERP systems
Inventory reports – Stock levels, reorder alerts, and warehouse summaries
Customer summaries – Contact details, order history, and account status
Compliance reports – Audit logs, access records, and policy status
Automated reporting systems – Scheduled jobs that generate documents from JSON data and distribute them via email or document management systems

Spire.Doc makes it possible to transform structured JSON data into polished business documents automatically, combining headings, paragraphs, and tables in a single output.

If you need to build more sophisticated document layouts, such as multi-section reports, cover pages, tables of contents, headers, footers, or custom document templates, see our guide on creating structured Word documents in Python.

6. Handle Nested JSON Objects

Many real-world JSON responses contain nested objects. For example, a customer record may include an address object with its own fields. Handling these nested structures is essential for complete JSON-to-Word conversion.

Example JSON

{
  "customer": {
    "name": "Tom Wilson",
    "email": "tom@example.com",
    "address": {
      "street": "123 Main St",
      "city": "Springfield",
      "state": "IL"
    }
  }
}

Python Code

import json
from spire.doc import Document, FileFormat

def add_nested_object(section, obj, indent_level=0):
    for key, value in obj.items():
        if isinstance(value, dict):
            heading_para = section.AddParagraph()
            heading_text = "  " * indent_level + key.capitalize()
            text_range = heading_para.AppendText(heading_text)
            text_range.CharacterFormat.Bold = True
            text_range.CharacterFormat.FontSize = max(8, 12 - indent_level)  # keep deep levels readable
            heading_para.Format.AfterSpacing = 4
            add_nested_object(section, value, indent_level + 1)
        else:
            paragraph = section.AddParagraph()
            label = "  " * indent_level + f"{key}: {value}"
            text_range = paragraph.AppendText(label)
            text_range.CharacterFormat.FontSize = 11
            paragraph.Format.AfterSpacing = 2

json_data = '''{
  "customer": {
    "name": "Tom Wilson",
    "email": "tom@example.com",
    "address": {
      "street": "123 Main St",
      "city": "Springfield",
      "state": "IL"
    }
  }
}'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

add_nested_object(section, data)

document.SaveToFile("json_nested.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the hierarchical Word document generated from the nested JSON structure.

Nested JSON converted to a hierarchical Word document

Nested JSON objects can be represented as hierarchical sections in a Word document, making complex data structures easier to read and navigate.

How It Works

The add_nested_object function recursively traverses the JSON structure:

When it encounters a dict value, it creates a bold heading for the key and recurses into the nested object
When it encounters a scalar value, it creates a paragraph with the key-value pair
The indent_level parameter controls indentation and font size to create a visual hierarchy

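This recursive approach handles nested objects of any practical depth and produces a readable hierarchical layout in the Word document. Note that arrays are not traversed: a list value falls into the scalar branch and is written as its Python string representation.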
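To see the traversal order independently of Spire.Doc, the sketch below flattens a nested object into `(level, kind, text)` tuples, the same sequence of headings and key-value lines that `add_nested_object` writes. The `walk` function is a hypothetical helper. Its handling of lists (one indexed entry per item) is an assumed extension that the original function does not include, and the `tags` field is added to the sample only to exercise it.

```python
def walk(obj, level=0):
    """Yield (level, kind, text) for each heading or key-value line."""
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object: emit a heading, then recurse one level deeper.
            yield (level, "heading", key.capitalize())
            yield from walk(value, level + 1)
        elif isinstance(value, list):
            # Assumed extension: list items become indexed entries.
            yield (level, "heading", key.capitalize())
            for index, item in enumerate(value, start=1):
                if isinstance(item, dict):
                    yield (level + 1, "heading", f"Item {index}")
                    yield from walk(item, level + 2)
                else:
                    yield (level + 1, "line", f"{index}: {item}")
        else:
            # Scalar value: emit a single key-value line.
            yield (level, "line", f"{key}: {value}")

data = {
    "customer": {
        "name": "Tom Wilson",
        "address": {"city": "Springfield", "state": "IL"},
        "tags": ["vip", "retail"],  # hypothetical field, not in the article's sample
    }
}

lines = list(walk(data))
for level, kind, text in lines:
    print("  " * level + text)
```

Each tuple could then be turned into a paragraph, with `kind` deciding whether bold formatting is applied and `level` driving the indentation.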

7. Handle Missing or Optional JSON Fields

In real-world applications, JSON data from APIs and databases often contains missing or optional fields. Records may have inconsistent keys, and some fields may be absent entirely. Handling these cases gracefully prevents errors and ensures the generated Word document remains complete.

Example JSON with Missing Fields

[
  {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
  {"Name": "Jane Doe", "Email": "jane@example.com"},
  {"Name": "Bob Brown", "Phone": "555-0300"}
]

Python Code

import json
from spire.doc import (
    Document, FileFormat, HorizontalAlignment,
    VerticalAlignment, TableRowHeightType, Color
)

json_data = '''[
  {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
  {"Name": "Jane Doe", "Email": "jane@example.com"},
  {"Name": "Bob Brown", "Phone": "555-0300"}
]'''
data = json.loads(json_data)

document = Document()
section = document.AddSection()

if data:
    all_keys = []
    for record in data:
        for key in record.keys():
            if key not in all_keys:
                all_keys.append(key)

    table = section.AddTable(True)
    table.ResetCells(len(data) + 1, len(all_keys))

    header_row = table.Rows[0]
    header_row.IsHeader = True
    header_row.Height = 20
    header_row.HeightType = TableRowHeightType.Exactly

    for col_index, header in enumerate(all_keys):
        header_row.Cells[col_index].CellFormat.Shading.BackgroundPatternColor = Color.get_Gray()
        header_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
        paragraph = header_row.Cells[col_index].AddParagraph()
        paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        text_range = paragraph.AppendText(header)
        text_range.CharacterFormat.Bold = True

    for row_index, record in enumerate(data):
        data_row = table.Rows[row_index + 1]
        data_row.Height = 20
        data_row.HeightType = TableRowHeightType.Exactly
        for col_index, key in enumerate(all_keys):
            data_row.Cells[col_index].CellFormat.VerticalAlignment = VerticalAlignment.Middle
            paragraph = data_row.Cells[col_index].AddParagraph()
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            paragraph.AppendText(str(record.get(key, "N/A")))

document.SaveToFile("json_missing_fields.docx", FileFormat.Docx)
document.Close()

Output

The following screenshot shows the generated Word table, where missing fields are automatically filled with placeholder values to maintain a consistent document structure.

Word table generated from JSON data with missing fields

Key Techniques

dict.get(key, "N/A") – Returns a default value when a key is missing, preventing KeyError exceptions
Dynamic column collection – Iterates all records to build a complete set of column headers, ensuring no field is missed even when it appears in only some records
Consistent table structure – All rows have the same number of columns regardless of which fields are present in each record

This approach is essential for production use cases where API responses may vary in structure across different records or over time.
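The column-collection and placeholder logic can be isolated into two small functions and checked without generating a document. This is a sketch under stated assumptions: `collect_columns` and `to_rows` are hypothetical names, and treating an explicit JSON `null` as missing is an added behavior, since `record.get(key, "N/A")` alone would render `null` as the text "None". The third sample record is modified to include a `null` to show this.

```python
def collect_columns(records):
    """Return every key seen in any record, in first-seen order."""
    columns = {}
    for record in records:
        for key in record:
            columns.setdefault(key, None)  # dicts preserve insertion order
    return list(columns)

def to_rows(records, placeholder="N/A"):
    """Build equal-length rows, filling absent or null values."""
    columns = collect_columns(records)
    rows = []
    for record in records:
        row = []
        for column in columns:
            value = record.get(column)
            # Assumption: JSON null is treated the same as a missing key.
            row.append(placeholder if value is None else str(value))
        rows.append(row)
    return columns, rows

records = [
    {"Name": "Tom Wilson", "Email": "tom@example.com", "Phone": "555-0100"},
    {"Name": "Jane Doe", "Email": "jane@example.com"},
    {"Name": "Bob Brown", "Phone": None},  # modified sample: explicit null
]

columns, rows = to_rows(records)
print(columns)
for row in rows:
    print(row)
```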

8. Convert JSON Files to Word Documents

In practice, JSON data often originates from files rather than inline strings. API export results, configuration files, database dumps, data exchange files, and log data are all commonly stored as .json files that need to be converted to Word documents.

The conversion process for JSON files follows this workflow:

JSON File (.json)
        ↓
Load JSON (json.load)
        ↓
Generate Word Document (Spire.Doc)
        ↓
DOCX Document

Python Code

import json
from spire.doc import Document, FileFormat

with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

document = Document()
section = document.AddSection()

# Process the loaded JSON data
# using any of the techniques shown in Methods 1–3
# (formatted text, tables, or structured reports)

document.SaveToFile("data_report.docx", FileFormat.Docx)
document.Close()

Key Points

json.load() reads and parses a JSON file directly, unlike json.loads() which parses a string
encoding="utf-8" ensures proper handling of non-ASCII characters in JSON files
Once the JSON file is loaded into a Python dictionary or list, Spire.Doc for Python can generate paragraphs, tables, or structured reports from the parsed data using any of the methods described earlier in this article

For complete examples of processing the loaded data, refer to Method 1 for formatted text, Method 2 for tables, or Method 3 for structured reports.
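Files from external sources can be missing or malformed, so it may help to guard the loading step and decide on a layout before building the document. The sketch below is one possible approach, not part of Spire.Doc: `load_json` and `pick_layout` are hypothetical helpers, and the layout labels simply mirror the three methods in this article.

```python
import json
import os
import tempfile

def load_json(path):
    """Load a JSON file; return None if it is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: line {err.lineno}, column {err.colno}")
    return None

def pick_layout(data):
    """Suggest a layout based on the shape of the parsed JSON."""
    if isinstance(data, list):
        return "table"        # arrays of objects map to tables
    if isinstance(data, dict):
        nested = any(isinstance(v, (dict, list)) for v in data.values())
        return "report" if nested else "paragraphs"
    return "paragraphs"

# Demonstration with a temporary file instead of a real data.json.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"Name": "Tom Wilson"}], f)
    loaded = load_json(path)
    missing = load_json(os.path.join(tmp, "missing.json"))

print(pick_layout(loaded))
```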

9. Why Use Spire.Doc for JSON-to-Word Conversion

Converting JSON to Word involves several practical challenges that go beyond simple data parsing. Generating properly formatted tables, applying consistent styles, creating structured reports with headings and paragraphs, and handling nested or incomplete data all require a capable document generation API.

Challenges of JSON-to-Word Conversion

Table generation – JSON arrays must be mapped to Word tables with headers, rows, and cell formatting
Document formatting – Raw data exports lack the visual hierarchy that makes Word documents readable
Structured reports – Combining headings, paragraphs, and tables in a single document requires coordinating multiple element types
Nested data – Deeply nested JSON objects need recursive traversal and hierarchical layout
Large documents – Generating multi-page reports from large JSON datasets demands efficient resource management

Benefits of Spire.Doc for Python

Spire.Doc for Python addresses these challenges with a straightforward API:

Create Word documents without Microsoft Word – No Office installation or Interop dependencies required
Generate paragraphs, tables, images, headers, and footers – Full coverage of Word document elements
Apply built-in and custom styles – Consistent formatting across documents using BuiltinStyle and ParagraphStyle
Automate report generation – Programmatically build structured reports from any JSON data source
Export to DOCX and other formats – Save to DOCX, PDF, HTML, RTF, and more using FileFormat

With Spire.Doc, the JSON-to-Word conversion process becomes a structured mapping from parsed data to Word elements, rather than manual string formatting or template manipulation.

10. FAQ

How do I convert JSON to Word in Python?

Parse the JSON data using Python's built-in json module, then use Spire.Doc for Python to create a Word document. Map JSON key-value pairs to paragraphs, JSON arrays to tables, and use headings for structure. See Method 1 for a basic example and Method 3 for a complete report.

Can JSON arrays be converted into Word tables?

Yes. JSON arrays of objects map naturally to Word tables, where each object becomes a row and each key becomes a column. See Method 2 for a complete code example that creates a formatted table from a JSON array.

How do I create a DOCX report from API JSON responses?

Fetch the API response as JSON, parse it, and use Spire.Doc for Python to generate the report. Combine headings for titles, paragraphs for summaries, and tables for data arrays. See Method 3 for a structured report example.

Can nested JSON objects be exported to Word?

Yes. Use a recursive function to traverse nested JSON objects, creating headings for object keys and paragraphs for scalar values. See Section 6 for a detailed example of handling nested structures with visual hierarchy.

How do I convert a JSON file to a Word document?

Use Python's json.load() to read the JSON file, then process the parsed data with Spire.Doc for Python. See Section 8 for a code example.

What is the best way to generate Word documents from JSON data?

The best approach depends on the JSON structure. For simple key-value data, use formatted paragraphs. For arrays, use tables. For complex nested data with mixed content, combine headings, paragraphs, and tables as shown in Method 3.

11. Conclusion

Generating Word documents from JSON data is a common requirement in reporting, document automation, and data export workflows. With Spire.Doc for Python, you can create paragraphs, tables, and structured document layouts directly from JSON, making it easier to produce professional DOCX files from application data.

The same approach can be extended to API responses, database records, configuration files, and other structured data sources, helping automate document generation in both small projects and enterprise systems.

For scenarios involving large documents or document conversion requirements, a licensed version is required.


Thursday, 11 June 2026 01:23

How to Convert Word Tables to CSV (DOC/DOCX to CSV)

Why Word Cannot Be Saved Directly as CSV
Method 1 – Convert Word Tables to CSV Using Spreadsheet Software
Can You Use an Online Word to CSV Converter?
Method 2 – Convert Word Tables to CSV Automatically with Python
FAQ


Method	Ease of Use	Batch Processing	Privacy	Best For
Spreadsheet Software	High	No	High	Occasional conversions, manual review
Python (Spire.Doc)	Medium	Yes	High	Automation, batch processing, recurring tasks

1. Why Word Cannot Be Saved Directly as CSV

Microsoft Word does not offer a "Save as CSV" option. This is not an oversight — it reflects a fundamental format mismatch:

Word documents contain mixed content: paragraphs, images, headers, footers, styled text, and tables. A single document can have multiple sections, columns, and nested elements.
CSV files contain only flat tabular data: rows and columns of plain text separated by commas.

Word cannot automatically determine how to flatten a rich-text document into a tabular layout. A document with three paragraphs, an image, and a table does not map cleanly to rows and columns. The only part of a Word document that has a natural CSV representation is structured table data.

This is why every practical approach to convert Word to CSV focuses on extracting tables from the document — whether through spreadsheet software, online tools, or programmatic methods.

2. Method 1 – Convert Word Tables to CSV Using Spreadsheet Software

The most straightforward way to convert Word tables to CSV is to copy the table into a spreadsheet application and export it. Both Microsoft Excel and Google Sheets support this workflow.

The Workflow

Copy the Word table into a spreadsheet — Select the table in Word, copy it, and paste it into a new spreadsheet
Verify the imported data — Check that rows, columns, and cell values are correctly separated. Watch for merged cells, which may cause misalignment
Export as CSV — Save or download the spreadsheet in CSV format

Option A – Microsoft Office

Open the Word document and copy the table you want to export.
Paste the table into an Excel worksheet and verify that rows and columns are imported correctly.
Review merged cells, line breaks, or other formatting issues that could affect the CSV structure.
Choose File > Save As and save the worksheet as a CSV file.

Convert Word table to CSV using Microsoft Office

Excel preserves Word table structure well — rows and columns map correctly in most cases. If your document contains multiple tables, you can paste each one onto a separate worksheet and save each as an individual CSV file.

Considerations:

Merged cells in the Word table may cause misalignment after pasting
Excel runs locally, so your data stays on your machine
The process is manual and not practical for frequent or large-scale conversions

Option B – Google Sheets

Copy the table from the Word document (opened in Google Docs or another document viewer).
Paste it into a new Google Sheets spreadsheet.
Verify the imported table structure and adjust any misaligned data.
Download the spreadsheet as a CSV file using File > Download > Comma Separated Values (.csv).

Convert Word to CSV using Google Sheets

Google Sheets is free and requires only a Google account. It also makes it easy to share and review data with collaborators before exporting to CSV.

Considerations:

Data is stored on Google's servers during editing — consider this for sensitive information
No software installation required
Like Excel, this is a manual process with no automation support

When to Use This Method

Spreadsheet-based conversion works well when you occasionally need to export Word table data to CSV and want to review the data before saving. For recurring conversions, multiple documents, or automated workflows, the Python method below is more efficient.

If you also need to convert Word documents (DOCX) to XLSX, refer to our DOCX to XLSX conversion guide for a structured spreadsheet workflow.

3. Can You Use an Online Word to CSV Converter?

Yes. Several websites offer Word to CSV converter tools that let you upload a DOC or DOCX file and download a CSV file. These are suitable for quick, one-time conversions when you don't want to install any software.

However, online converters have notable limitations:

Privacy — Your document is uploaded to a third-party server, which may not be acceptable for sensitive or proprietary data
File size limits — Most free tools restrict uploads to 5–10 MB
Table recognition — Some converters extract only the first table; others may misinterpret document structure
No batch processing — You can convert only one file at a time

For sensitive data, recurring conversions, or batch processing, local methods (spreadsheet software or Python) are preferable.

4. Method 2 – Convert Word Tables to CSV Automatically with Python

If you need to convert Word files to CSV regularly, automate document processing, or handle large numbers of files, Python provides a more efficient solution. With Spire.Doc for Python, you can read Word documents, extract table data, and export it directly to CSV format — all without Microsoft Word installed.

Install Spire.Doc for Python

Install the library via pip:

pip install spire.doc

Import the required classes in your Python script:

from spire.doc import *
from spire.doc.common import *

Alternatively, you can download Spire.Doc for Python and integrate it manually.

Convert a Word Table to CSV

The following example loads a Word document, reads every table in its first section, and writes each table to a separate CSV file (table_1.csv, table_2.csv, and so on).

import csv
from spire.doc import *
from spire.doc.common import *

# Load the Word document
document = Document()
document.LoadFromFile("Sample.docx")

# This example reads tables from the first section only
section = document.Sections.get_Item(0)

# Export each table in the section to its own CSV file
for t in range(section.Tables.Count):
    table = section.Tables.get_Item(t)
    csv_data = []

    for r in range(table.Rows.Count):
        row = table.Rows.get_Item(r)
        row_data = []

        for c in range(row.Cells.Count):
            cell = row.Cells.get_Item(c)

            # A cell can hold several paragraphs; keep the non-empty ones
            paragraphs = []
            for p in range(cell.Paragraphs.Count):
                text = cell.Paragraphs.get_Item(p).Text.strip()
                if text:
                    paragraphs.append(text)

            # Join the paragraphs into a single CSV cell value
            row_data.append(" ".join(paragraphs))

        csv_data.append(row_data)

    # utf-8-sig writes a BOM so that Excel detects the encoding
    csv_path = f"table_{t + 1}.csv"
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(csv_data)

document.Close()

How It Works

Document.LoadFromFile() loads the Word document into memory.
section.Tables.get_Item(t) selects each table in the first section in turn.
The script loops through every row and cell in the table using the Rows and Cells collections.
Each table cell may contain one or more paragraphs. The script reads all paragraphs using cell.Paragraphs and extracts their text content.
The extracted paragraph text is cleaned with .strip() and combined into a single string for the CSV cell value.
csv.writer() exports the collected table data to a standard CSV file that can be opened in Excel, Google Sheets, databases, or other data-processing tools.
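The last two steps can be tried without a Word file. The sketch below stands in for a table with nested lists of paragraph strings (hypothetical sample data) and shows that `csv.writer` quotes values containing commas, so cell text such as "Smith, John" stays in one column. The `join_paragraphs` name is illustrative.

```python
import csv
import io

def join_paragraphs(paragraph_texts):
    """Mimic the script: strip each paragraph, drop empties, join with spaces."""
    cleaned = [text.strip() for text in paragraph_texts]
    return " ".join(text for text in cleaned if text)

# Each cell is a list of paragraph strings, as a Word cell may hold several.
table = [
    [["Name"], ["Notes"]],
    [["Smith, John"], ["First line ", "", " second line"]],
]

rows = [[join_paragraphs(cell) for cell in row] for row in table]

# Write to an in-memory buffer instead of a file for demonstration.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Joining paragraphs with a space keeps each table row on a single CSV line; joining with a newline instead would also be valid CSV, since the writer quotes such values, but some consumers handle multi-line fields poorly.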

Output Result

Below is a preview of the Word table and the generated CSV file:

Convert a Word table to CSV using Python

The output is a properly formatted .csv file containing the Word table data, ready for import into Excel, databases, or any system that accepts CSV input.

Extract Multiple Tables from a Word Document

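If your Word document contains multiple tables, iterate through section.Tables and save each one as a separate CSV file. The loop below assumes the extraction code has been wrapped in a word_table_to_csv(word_path, csv_path, table_index) helper: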

for t in range(section.Tables.Count):
    word_table_to_csv(
        word_path,
        f"table_{t + 1}.csv",
        table_index=t
    )
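The `word_table_to_csv` helper called in these snippets is not defined in the article. One possible shape for it, built only from the Spire.Doc calls used in the extraction example above, is sketched here. The signature, the default `table_index=0`, the `cell_text` helper, and the index check are all assumptions rather than the author's implementation.

```python
import csv

def cell_text(paragraph_texts):
    """Strip each paragraph, drop empty ones, and join with spaces."""
    cleaned = [text.strip() for text in paragraph_texts]
    return " ".join(text for text in cleaned if text)

def word_table_to_csv(word_path, csv_path, table_index=0):
    """Export one table from the first section of a Word file to CSV."""
    # Imported here so this module still loads where Spire.Doc is not installed.
    from spire.doc import Document

    document = Document()
    document.LoadFromFile(word_path)
    try:
        section = document.Sections.get_Item(0)
        if table_index >= section.Tables.Count:
            raise IndexError(f"{word_path} has no table at index {table_index}")
        table = section.Tables.get_Item(table_index)

        rows = []
        for r in range(table.Rows.Count):
            row = table.Rows.get_Item(r)
            cells = []
            for c in range(row.Cells.Count):
                cell = row.Cells.get_Item(c)
                texts = [
                    cell.Paragraphs.get_Item(p).Text
                    for p in range(cell.Paragraphs.Count)
                ]
                cells.append(cell_text(texts))
            rows.append(cells)
    finally:
        # Release the document even if extraction fails.
        document.Close()

    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
```

Because the helper reopens the document on every call, exporting many tables from one large file this way is slower than the single-pass loop shown earlier; it trades speed for a simpler interface in batch scripts.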

Batch Convert Multiple Word Files

To process an entire folder of Word documents, loop through the files and extract the first table from each:

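import os

# Placeholder folder names; replace with your own paths
input_folder = "input"
output_folder = "output"
os.makedirs(output_folder, exist_ok=True)

# word_table_to_csv is assumed to wrap the extraction code shown earlier
for filename in os.listdir(input_folder):
    if filename.lower().endswith((".doc", ".docx")):
        word_table_to_csv(
            os.path.join(input_folder, filename),
            os.path.join(
                output_folder,
                os.path.splitext(filename)[0] + ".csv"
            )
        )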

Why Use Python for Word to CSV Conversion?

Python automation with Spire.Doc for Python offers clear advantages when you need to convert Word tables to CSV at scale:

Advantage	Details
Batch conversion	Process dozens or hundreds of Word files in a single script
Automation	Schedule conversions to run automatically — daily, weekly, or on demand
Large datasets	Handle Word documents with large tables that are impractical to convert manually
Workflow integration	Integrate Word-to-CSV conversion into data pipelines, ETL processes, or CI/CD workflows
No Microsoft Word dependency	Spire.Doc for Python works without Microsoft Word installed
Data accuracy	Programmatic extraction eliminates copy-paste errors and ensures consistent results

For more advanced usage, you can also check our guide on extracting tables from Word documents using Python.

5. FAQ

Can I convert Word to CSV directly?

No. Microsoft Word does not have a built-in option to save or export documents as CSV. Word's "Save As" dialog supports formats like DOCX, PDF, RTF, HTML, and plain text — but not CSV. To convert Word to CSV, you need to extract table data from the document and write it to a CSV file using spreadsheet software or Python automation.

Why can't Word save directly as CSV?

Word is a rich-text document format that supports paragraphs, images, headers, styles, and mixed content. CSV is a flat tabular format that stores only rows and columns of text separated by commas. Word cannot automatically determine how to flatten a complex document structure into a tabular layout, so it does not offer CSV as an export option. Only structured data — typically data in Word tables — can be meaningfully converted to CSV.

How do I convert a Word table to CSV?

You have two main options: (1) Spreadsheet software — Copy the Word table into Excel or Google Sheets, verify the data, and save or download as CSV. This is the most common approach for occasional use. (2) Python — Use Spire.Doc for Python to read the Word document, access the table programmatically, extract cell values, and write them to a CSV file. This is ideal for automation, batch processing, and recurring conversions.

Can I convert DOCX to CSV without Excel?

Yes. You can convert DOCX to CSV without Excel using: (1) Google Sheets — Paste the Word table data into a Google Sheets spreadsheet and download as CSV. (2) Online tools — Upload your DOCX file to a Word-to-CSV converter website and download the result. (3) Python — Use Spire.Doc for Python to read the DOCX file, extract table data, and write it to CSV. This works without any Microsoft Office software installed.

Is there a free Word to CSV converter?

Yes. There are free options in two categories: (1) Online converters — Many websites offer free Word-to-CSV conversion, though they typically have file size limits and raise privacy concerns since your data is uploaded to a third-party server. (2) Python scripts — You can write a free, local conversion script using Spire.Doc for Python (which offers a free version) and Python's built-in csv module. This keeps your data private and has no file size restrictions.

How do I extract data from a Word document to CSV in Python?

Use Spire.Doc for Python to load the Word document, access the table through the Sections and Tables collections, iterate through rows and cells to read each cell's text, and write the data to a CSV file using Python's standard csv.writer. The complete code example is provided in Method 2 above.

Does Spire.Doc for Python require Microsoft Word to be installed?

No. Spire.Doc for Python is a standalone library that creates, reads, and manipulates Word documents independently. It does not require Microsoft Word or any Office component to be installed on your system. This makes it suitable for server environments, automated workflows, and machines where Office is not available.

6. Conclusion

Converting Word to CSV means extracting structured table data from DOC or DOCX documents and saving it in a tabular format. Spreadsheet software (Excel or Google Sheets) provides a simple manual approach — copy the Word table, verify the data, and export as CSV. This works well for occasional conversions but does not scale to batch processing or recurring workflows.

Python automation with Spire.Doc for Python converts Word tables to CSV programmatically. It reads DOC and DOCX files, extracts table data, and writes CSV output, all without requiring Microsoft Word. For developers and organizations that convert DOC or DOCX files to CSV regularly, it automates the entire process and removes manual copy-and-paste steps.

You can apply for a 30-day free license to evaluate all features of Spire.Doc for Python.

1. Why Convert CSV to Word?

You might wonder: why not just use Excel? After all, CSV files open natively in spreadsheet applications. While Excel is great for data analysis and calculations, Word documents serve different purposes. Word provides superior formatting for narrative reports, client deliverables, and print-ready documents where data needs to appear alongside explanatory text, headers, and styled layouts.

Common Use Cases

Use Case	Why Word Over Excel
Business reports	Combine data tables with narrative analysis and executive summaries
Project documentation	Embed data within structured documents that include instructions and context
Client deliverables	Present data in branded, professionally formatted documents
Academic papers	Follow specific formatting guidelines (APA, MLA) with data integrated into the text
Mail merge preparation	Use CSV data as the source for personalized letters and labels in Word

When you need to convert a CSV file to a Word document, the right method depends on how often you do it and how much formatting control you need.

2. Method 1 – Copy and Paste CSV Data into Word

The simplest way to bring CSV data into Word is to copy it from a spreadsheet and paste it directly. This method works well for small datasets and one-time tasks.

Copy and paste CSV data from Excel into Word

Step 1: Open the CSV File in Excel

Double-click your .csv file, or open Excel and use File > Open to load the CSV. Excel will automatically parse the comma-separated values into columns.

Step 2: Select the Data

Highlight the cells you want to include in your Word document. You can select the entire sheet by pressing Ctrl + A, or select a specific range.

Step 3: Paste into Word

Open Microsoft Word, place your cursor where you want the data, and press Ctrl + V. Word will automatically convert the tabular data into a Word table.

Step 4: Apply Table Formatting

Use Word's Table Design tab to apply a style, adjust column widths, and format headers.

Pros and Cons

Aspect	Evaluation
Ease of use	Very easy — no special tools required
Speed	Fast for small datasets
Formatting control	Limited — formatting may break with large data
Scalability	Not suitable for files with hundreds or thousands of rows
Reproducibility	Manual process — hard to repeat consistently

If you're also working with spreadsheet workflows, you may find our guide on converting CSV files to Excel helpful.

3. Method 2 – Convert CSV to a Word Table Using Text-to-Table

Word has a built-in feature that converts delimited text directly into a table, with no Excel required. It relies on Word's native Convert Text to Table command, so the whole process stays inside Word.

Convert CSV to Word table using the Text-to-Table feature

Step 1: Open the CSV File in a Text Editor

Open your .csv file in Notepad, Notepad++, or any plain text editor. You'll see the raw comma-separated values.

Step 2: Copy the CSV Content

Select all the text (Ctrl + A) and copy it (Ctrl + C).

Step 3: Paste into Word as Plain Text

In Word, paste the content. It will appear as plain text with commas separating the values.

Step 4: Use Text-to-Table Conversion

Select the pasted text, then go to Insert > Table > Convert Text to Table. In the dialog box:

Set Separate text at to Commas
Adjust the number of columns if needed
Click OK

Word will convert the comma-separated text into a properly structured table.

Step 5: Format the Table

Apply a table style from the Table Design tab, format the header row, and adjust column widths as needed.

Pros and Cons

Aspect	Evaluation
Ease of use	Easy — no Excel needed, works entirely within Word
Formatting control	Medium — Word handles the table structure automatically
Scalability	Works for moderate-sized files; very large files may be slow
Accuracy	Good — Word correctly parses comma delimiters in most cases
Limitation	May misinterpret commas inside quoted fields (e.g., "Smith, John")
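
The quoted-field limitation in the last row is where a real CSV parser differs from a plain comma split. As a minimal sketch (the sample line is invented for illustration), Python's built-in csv module keeps "Smith, John" as one value, while a naive split breaks it in two:

```python
import csv
import io

# Hypothetical sample data with a comma inside a quoted field
raw = 'ID,Name,City\n1,"Smith, John",Boston\n'

# Naive splitting treats every comma as a delimiter
naive = raw.splitlines()[1].split(",")

# csv.reader follows the quoting rules and keeps the field intact
parsed = list(csv.reader(io.StringIO(raw)))[1]

print(len(naive))  # 4 pieces: the name was split in two
print(parsed)      # ['1', 'Smith, John', 'Boston']
```

If your CSV contains quoted fields like this, checking the column count after conversion is a quick way to spot rows that were split incorrectly.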

If your data is already stored in Excel workbooks rather than CSV files, see our guide on converting Excel sheets to Word documents.

4. Method 3 – Use an Online CSV to Word Converter

If you don't have Excel or Word installed, or you just need a quick one-off conversion, an online CSV to Word converter can get the job done in seconds. Several free tools allow you to upload a CSV file and download a Word document.

How It Works

Search for "CSV to Word converter online" in your browser
Upload your .csv file to the converter website
Wait for the conversion to complete
Download the generated .docx file

What to Look for in an Online Converter

When choosing an online CSV-to-Word converter, consider:

File size limits
Supported output formats (DOC vs DOCX)
Data privacy policies
Table formatting quality
Batch conversion support

Pros and Cons

Aspect	Evaluation
Ease of use	Very easy — no software installation required
Speed	Fast for small to medium files
Formatting control	Low — you get what the tool produces
Privacy	Concern — your data is uploaded to a third-party server
File size limits	Most tools impose upload size restrictions
Batch processing	Not supported — one file at a time

When to Use an Online Converter

Online converters are a reasonable choice when you have a single, non-sensitive CSV file and just need a quick conversion. However, if your data contains personal information, financial records, or business-critical content, uploading it to a third-party service may not be appropriate.

If you need repeatable or large-scale conversions, automation is usually a better long-term solution.

5. Limitations of Manual and Online CSV-to-Word Conversion

Manual methods and online tools work for occasional use, but they break down when you need to process CSV files regularly or at scale. Here are the common challenges:

Common Challenges

Repetitive work — If you convert CSV to Word every week or every day, manual copy-paste becomes tedious and error-prone.
Large datasets — Word struggles to handle tables with thousands of rows pasted from Excel. Performance degrades and formatting breaks.
Batch processing — When you need to convert multiple CSV files to Word documents, doing them one by one is impractical.
Formatting consistency — Manual formatting varies each time. Headers, fonts, and table styles may look different across documents.
Privacy concerns — Online converters require uploading your data to external servers, which may not be acceptable for sensitive information.
Automated report generation — If reports need to be generated on a schedule (daily, weekly), manual conversion cannot keep up.

For these situations, Python automation provides a practical path forward — and the next section shows exactly how to implement it.

6. Method 4 – Convert CSV to Word Automatically with Python

Python is a natural choice for automating CSV-to-Word conversion. It has a built-in csv module for reading data, and with Spire.Doc for Python, you can create and format Word documents without requiring Microsoft Word to be installed.

This section walks through the complete implementation: installing the library, reading CSV data, building a Word table, and saving the result as DOCX.

Install Spire.Doc for Python

Install the library via pip:

pip install spire.doc

Import the required classes in your Python script:

from spire.doc import *
from spire.doc.common import *

Step 1: Read CSV Data

Python's built-in csv module reads CSV files into a list of rows:

import csv

csv_data = []
with open("sales_data.csv", "r", newline="", encoding="utf-8-sig") as file:
    reader = csv.reader(file)
    for row in reader:
        csv_data.append(row)

The first row typically contains column headers, and subsequent rows contain the data.
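
The table-building code in the following steps indexes every row by the header's column count, so a CSV row with missing trailing values would raise an IndexError. Below is a minimal sketch of one way to guard against that. Note that normalize_rows is a hypothetical helper written for this illustration, not part of the csv module or Spire.Doc:

```python
def normalize_rows(rows):
    """Pad short rows with empty strings so every row has the same width."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    return [list(row) + [""] * (width - len(row)) for row in rows]

# Example: the second data row is missing its last value
sample = [["ID", "Name", "Region"], ["1", "Alice", "East"], ["2", "Bob"]]
padded = normalize_rows(sample)
print(padded[2])  # ['2', 'Bob', '']
```

Calling csv_data = normalize_rows(csv_data) right after reading the file would be one place to apply it.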

Step 2: Create a Word Document and Table

Create a new Word document, add a section, and initialize a table with the dimensions of your CSV data:

document = Document()
section = document.AddSection()

num_rows = len(csv_data)
num_cols = len(csv_data[0]) if csv_data else 0

table = section.AddTable(True)
table.ResetCells(num_rows, num_cols)
table.PreferredWidth = PreferredWidth(WidthType.Percentage, 100)

Step 3: Populate the Table with CSV Data

Iterate through the CSV rows and write each value into the corresponding cell. Format the header row with a distinct style:

for r in range(num_rows):
    row = table.Rows[r]
    row.Height = 22
    row.HeightType = TableRowHeightType.Exactly

    for c in range(num_cols):
        cell = row.Cells[c]
        paragraph = cell.AddParagraph()
        text_range = paragraph.AppendText(csv_data[r][c])
        cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

        if r == 0:
            row.IsHeader = True
            cell.CellFormat.Shading.BackgroundPatternColor = Color.get_DarkBlue()
            text_range.CharacterFormat.Bold = True
            text_range.CharacterFormat.TextColor = Color.get_White()
            text_range.CharacterFormat.FontSize = 11
            paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
        else:
            text_range.CharacterFormat.FontSize = 10
            if r % 2 == 0:
                cell.CellFormat.Shading.BackgroundPatternColor = Color.get_LightGray()
            else:
                cell.CellFormat.Shading.BackgroundPatternColor = Color.Empty()

This code formats the first row as a header with a dark blue background and white bold text, and applies alternating row colors for readability.

Step 4: Save as DOCX

Save the generated Word document:

document.SaveToFile("SalesReport.docx", FileFormat.Docx)
document.Close()

Below is a preview of the CSV data and the generated Word document:

CSV data converted to a formatted Word table using Python

The output is a properly formatted .docx file containing your CSV data in a Word table.

For more advanced table customization options, check out our guide on creating and formatting Word tables with Python.

7. Complete CSV to Word Python Example

Here is the complete, runnable script that reads a CSV file and converts it to a Word document with a title, formatted table, alternating row colors, and table borders.

import csv
from spire.doc import *
from spire.doc.common import *

def csv_to_word(csv_path, output_path, title="Data Report"):
    csv_data = []
    with open(csv_path, "r", newline="", encoding="utf-8-sig") as file:
        reader = csv.reader(file)
        for row in reader:
            csv_data.append(row)

    if not csv_data:
        print("CSV file is empty.")
        return

    num_rows = len(csv_data)
    num_cols = len(csv_data[0])

    document = Document()
    section = document.AddSection()

    title_para = section.AddParagraph()
    title_range = title_para.AppendText(title)
    title_range.CharacterFormat.FontSize = 18
    title_range.CharacterFormat.Bold = True
    title_para.Format.HorizontalAlignment = HorizontalAlignment.Center
    title_para.Format.AfterSpacing = 12

    table = section.AddTable(True)
    table.ResetCells(num_rows, num_cols)
    table.PreferredWidth = PreferredWidth(WidthType.Percentage, 100)

    for r in range(num_rows):
        row = table.Rows[r]
        row.Height = 22
        row.HeightType = TableRowHeightType.Exactly

        for c in range(num_cols):
            cell = row.Cells[c]
            paragraph = cell.AddParagraph()
            text_range = paragraph.AppendText(csv_data[r][c])
            cell.CellFormat.VerticalAlignment = VerticalAlignment.Middle

            if r == 0:
                row.IsHeader = True
                cell.CellFormat.Shading.BackgroundPatternColor = Color.get_DarkBlue()
                text_range.CharacterFormat.Bold = True
                text_range.CharacterFormat.TextColor = Color.get_White()
                text_range.CharacterFormat.FontSize = 11
                paragraph.Format.HorizontalAlignment = HorizontalAlignment.Center
            else:
                text_range.CharacterFormat.FontSize = 10
                if r % 2 == 0:
                    cell.CellFormat.Shading.BackgroundPatternColor = Color.get_LightGray()
                else:
                    cell.CellFormat.Shading.BackgroundPatternColor = Color.Empty()

    table.Format.Borders.Vertical.BorderType = BorderStyle.Single
    table.Format.Borders.Vertical.LineWidth = 0.5
    table.Format.Borders.Horizontal.BorderType = BorderStyle.Single
    table.Format.Borders.Horizontal.LineWidth = 0.5

    document.SaveToFile(output_path, FileFormat.Docx)
    document.Close()
    print(f"Word document saved to: {output_path}")

csv_to_word("sales_data.csv", "SalesReport.docx", "Q4 Sales Report")

How It Works

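csv.reader reads the CSV file row by row. Opening the file with the utf-8-sig encoding decodes UTF-8 and strips a leading byte order mark (BOM) if one is present, so the first header does not pick up a stray character.
Document() creates a blank Word document. AddSection() adds a section, the container that holds the body content and page setup; a single section can span many pages.
AddTable(True) creates a new table; the True argument turns on visible borders. ResetCells() sets the exact number of rows and columns.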
AppendText() writes each CSV value into the corresponding cell as a text range.
Header formatting applies a dark blue background, white bold text, and center alignment to the first row.
Alternating row colors use light gray for even rows and no fill for odd rows, improving readability.
SaveToFile() exports the document as a .docx file.
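
To make the encoding point concrete, here is a small sketch of what utf-8-sig changes. It writes a temporary file that starts with a byte order mark (as Excel does when saving "CSV UTF-8") and reads it back both ways. The file contents and the first_header helper are invented for the demonstration:

```python
import csv
import os
import tempfile

# Write a small CSV that starts with a UTF-8 byte order mark (BOM)
path = os.path.join(tempfile.mkdtemp(), "bom_sample.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfID,Name\r\n1,Alice\r\n")

def first_header(encoding):
    # newline="" is what the csv module documentation recommends
    with open(path, "r", newline="", encoding=encoding) as f:
        return next(csv.reader(f))[0]

plain = first_header("utf-8")      # BOM stays attached to the header
clean = first_header("utf-8-sig")  # BOM is stripped

print(repr(plain))  # '\ufeffID'
print(repr(clean))  # 'ID'
```

Without utf-8-sig, that invisible character would end up in the first header cell of the Word table.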

8. Why Use Spire.Doc for CSV-to-Word Conversion?

Spire.Doc for Python offers several technical advantages for developers who need to generate Word documents from CSV data programmatically.

Advantages

Advantage	Details
No Microsoft Word dependency	Create and manipulate DOCX files without installing Microsoft Word on the server or machine
Comprehensive table formatting	Control cell shading, borders, alignment, row heights, column widths, and table styles
Automated report generation	Build scripts that convert CSV to Word on a schedule, integrating with data pipelines
Batch document processing	Process multiple CSV files in a loop, generating separate Word documents for each
Python integration	Works seamlessly with Python's standard csv module and other data processing libraries
Full DOCX support	Generate documents compatible with Microsoft Word, LibreOffice, and Google Docs
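
The batch-processing row can be sketched as a small driver loop. This is an illustration under assumptions: batch_convert is a hypothetical helper, and convert stands for any function with the same (csv_path, output_path, title) parameters as the csv_to_word function from the complete example. The demo uses a stand-in converter so it runs without Spire.Doc installed:

```python
import tempfile
from pathlib import Path

def batch_convert(input_dir, output_dir, convert):
    """Call convert(csv_path, docx_path, title) for every .csv file in input_dir."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for csv_file in sorted(Path(input_dir).glob("*.csv")):
        docx_file = out_dir / (csv_file.stem + ".docx")
        # Derive a readable title from the file name, e.g. "q4_sales" -> "Q4 Sales"
        title = csv_file.stem.replace("_", " ").title()
        convert(str(csv_file), str(docx_file), title)
        written.append(docx_file.name)
    return written

# Demo with a stand-in converter that only records the titles it receives
src = Path(tempfile.mkdtemp())
(src / "q4_sales.csv").write_text("ID,Total\n1,100\n", encoding="utf-8")
calls = []
written = batch_convert(src, src / "out", lambda c, d, t: calls.append(t))
print(written, calls)  # ['q4_sales.docx'] ['Q4 Sales']
```

In a real script you would pass csv_to_word as the third argument, for example batch_convert("csv_reports", "word_reports", csv_to_word), where both folder names are placeholders.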

Key API Classes

Document — Represents a Word document. Use it to create new documents or load existing ones.
Section — Represents a section within a document, which can span multiple pages. Contains paragraphs, tables, and other content.
Table — Represents a table in a Word document. Supports row/column manipulation, styling, and borders.
TableRow / TableCell — Provide access to individual rows and cells for formatting and content insertion.
Paragraph / TextRange — Handle text content within cells, including font, size, color, and alignment.

9. CSV to Word Conversion Methods Compared

Method	Ease of Use	Batch Processing	Formatting Control	Privacy	Best For
Copy & Paste	★★★★★	✗	Low	✓	One-time, small datasets
Text-to-Table	★★★★☆	✗	Medium	✓	No-Excel workflows, moderate data
Online Converter	★★★★★	✗	Low	✗	Quick one-off conversions
Python + Spire.Doc	★★★☆☆	✓	High	✓	Recurring tasks, batch processing, automation

Summary: Manual methods and online tools are quick and accessible but don't scale. Python automation with Spire.Doc requires a small setup investment but pays off when you need consistent, repeatable, or batch CSV-to-Word conversion.

10. FAQ

How do I convert a CSV file to a Word document?

You can convert a CSV file to a Word document using several methods: (1) Open the CSV in Excel, copy the data, and paste it into Word; (2) Use Word's Text-to-Table feature to convert comma-separated text directly into a table; (3) Use an online CSV to Word converter for a quick one-off conversion; (4) Use Python with Spire.Doc for Python to automate the conversion programmatically. The Python approach is best for recurring tasks or batch processing.

Can I convert CSV to DOCX automatically?

Yes. You can automate CSV-to-DOCX conversion using Python. Read the CSV data with Python's built-in csv module, then use Spire.Doc for Python to create a Word document, populate a table with the CSV data, and save it as a .docx file. This approach works without Microsoft Word installed and can be scheduled to run automatically.

How do I insert CSV data into a Word table?

To insert CSV data into a Word table manually, you can use Word's Insert > Table > Convert Text to Table feature — paste the CSV text, then convert it using commas as the delimiter. For programmatic insertion, use Python: read the CSV with the csv module, create a table in a Word document using Spire.Doc for Python, and iterate through the CSV rows to populate each cell.

Is there a free CSV to Word converter online?

Yes, several websites offer free CSV-to-Word conversion. However, online converters have limitations: file size restrictions, limited formatting control, and privacy concerns since your data is uploaded to a third-party server. For sensitive data or recurring conversions, a local Python solution with Spire.Doc for Python is a more reliable and private alternative.

Can Python convert CSV files to Word documents?

Yes, Python can convert CSV files to Word documents. Using Spire.Doc for Python, you can read CSV data with the standard csv module, create a Word document, add a formatted table, populate it with the CSV content, and save the result as a DOCX file. This works without Microsoft Word and supports batch processing of multiple CSV files.

Does Spire.Doc for Python require Microsoft Word to be installed?

No. Spire.Doc for Python is a standalone library that creates and manipulates Word documents independently. It does not require Microsoft Word or any Office component to be installed on your system. This makes it suitable for server environments and automated workflows.

Conclusion

Converting CSV to Word is a common task with multiple approaches. Manual methods — copy-and-paste and Word's Text-to-Table feature — work well for occasional use with small datasets. Online converters offer convenience for quick, one-off tasks but raise privacy concerns and lack formatting control. None of these options scale to batch processing, scheduled report generation, or scenarios requiring consistent formatting across many documents.

Python automation with Spire.Doc for Python provides a reliable solution for converting CSV to DOCX programmatically. It reads CSV data, creates formatted Word tables, and generates professional documents without requiring Microsoft Word — making it ideal for automated workflows, batch processing, and server-side document generation.

You can apply for a 30-day free license to evaluate all features of Spire.Doc for Python.

Why Export Excel to JSON?

Excel is the most widely used tool for storing structured data, but modern applications communicate in JSON. Converting between these formats is essential whenever spreadsheet data needs to move into a web context.

Common use cases include:

Sending spreadsheet data to web applications
Importing data into REST APIs
Working with JavaScript frameworks like React, Vue, or Angular
Migrating data into NoSQL databases like MongoDB
Exchanging data between systems in integration pipelines

Excel has no native "Save as JSON" option, so you need an external tool or library to bridge this gap.

What Does Excel Data Look Like in JSON?

Excel rows are typically converted into JSON objects, while column headers become object keys.

Excel Data:

Excel Data Example

JSON Output:

[
  {"ID": 1, "Name": "Alice", "Department": "HR"},
  {"ID": 2, "Name": "Bob", "Department": "Engineering"}
]

Each row becomes a JSON object, each column header becomes a key, and the entire worksheet becomes an array. Both XLS and XLSX files follow the same mapping pattern.
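
The mapping can be sketched in a few lines of plain Python, independent of any Excel library. The rows list below is a hypothetical stand-in for worksheet contents that have already been read into memory:

```python
import json

# Hypothetical worksheet contents: the first row is the header
rows = [
    ["ID", "Name", "Department"],
    [1, "Alice", "HR"],
    [2, "Bob", "Engineering"],
]

headers, *data_rows = rows
# Pair each header with the value in the same column position
records = [dict(zip(headers, row)) for row in data_rows]

print(json.dumps(records, indent=2))
```

Every method in this article is a variation on this step; the tools differ mainly in how they read the rows out of the workbook.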

Method 1: Export Excel to JSON Online

Online Excel-to-JSON converters provide the fastest solution for one-time conversions without requiring software installation or programming knowledge.

Steps to Convert Excel to JSON Online

Upload the Excel file: Select your .xlsx or .xls file from local storage. Most platforms support drag-and-drop.
Configure options: Specify whether to include headers, select specific worksheets, or customize output formatting.
Convert and download: The server processes your file and generates JSON output. Retrieve the converted file or copy the result.

Recommended Online Excel to JSON Converters

Different tools excel at different scenarios:

Tool	Best For	File Size Limit	Special Features
TableConvert	Table-based JSON structures	10MB	Custom JSON formatting, nested objects
Data Formatter Pro	Quick conversion in the browser	5MB	Browser-side conversion, no upload required
JSON Editor Online	Visual editing after conversion	5MB	Built-in JSON validator and formatter

Advantages and Limitations

Advantages:

No installation required — access from any browser
Fast for small files under 5MB
Beginner-friendly with graphical interfaces

Limitations:

File size limits: Most free converters restrict uploads to 5-10MB
Privacy concerns: Uploading business data to external servers introduces compliance risks
Formula handling: Online converters export formula results as static values
Multiple worksheets: Many tools export only the active worksheet or lose sheet structure

Online converters work well for quick, non-sensitive conversions. For anything involving large files, confidential data, or complex workbooks, you need a programmatic solution.

Method 2: Export Excel to JSON in Python with Pandas

Pandas is Python's most popular data analysis library, offering straightforward Excel-to-JSON conversion through its DataFrame API. This method suits data scientists and analysts who already use Pandas for data manipulation.

Install Pandas and Dependencies

pip install pandas openpyxl

For legacy .xls files, also install xlrd:

pip install xlrd

Read Excel and Export JSON

import pandas as pd

# Load Excel file into DataFrame
df = pd.read_excel("sales_report.xlsx")

# Export DataFrame to JSON
df.to_json(
    "sales_report.json",
    orient="records",
    indent=4
)

print("Excel data exported to JSON successfully")

Below is an example of the Excel worksheet and JSON output:

Convert Excel to JSON with Pandas

Key Parameters:

orient="records": Structures output as an array of objects (most common format)
indent=4: Pretty-prints JSON with 4-space indentation

Understanding JSON Output Options

Pandas provides multiple output orientations through the orient parameter:

orient="records" (Recommended for APIs):

[
  {"ID": 1, "Name": "Alice", "Department": "HR"},
  {"ID": 2, "Name": "Bob", "Department": "Engineering"}
]

orient="index":

{
  "0": {"ID": 1, "Name": "Alice", "Department": "HR"},
  "1": {"ID": 2, "Name": "Bob", "Department": "Engineering"}
}

orient="split":

{
  "columns": ["ID", "Name", "Department"],
  "index": [0, 1],
  "data": [[1, "Alice", "HR"], [2, "Bob", "Engineering"]]
}

The records orientation is the most widely compatible format for REST APIs and JavaScript applications.
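
To compare the orientations without creating an Excel file, the sketch below builds the same two-row table in memory and parses each result back into Python objects. The sample data mirrors the example above; nothing here depends on a specific workbook:

```python
import json
import pandas as pd

df = pd.DataFrame(
    {"ID": [1, 2], "Name": ["Alice", "Bob"], "Department": ["HR", "Engineering"]}
)

# With no path argument, to_json returns the JSON text as a string
records = json.loads(df.to_json(orient="records"))
by_index = json.loads(df.to_json(orient="index"))
split = json.loads(df.to_json(orient="split"))

print(records[0])        # {'ID': 1, 'Name': 'Alice', 'Department': 'HR'}
print(list(by_index))    # ['0', '1']
print(split["columns"])  # ['ID', 'Name', 'Department']
```

Note that the index orientation turns row labels into string keys, which is one reason records is usually easier to consume from JavaScript.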

Handling Specific Worksheets

import pandas as pd

# Read specific worksheet by name
df = pd.read_excel("workbook.xlsx", sheet_name="Q4_Sales")

# Read specific worksheet by index (0-based)
df = pd.read_excel("workbook.xlsx", sheet_name=0)

df.to_json("q4_sales.json", orient="records", indent=4)

Pandas excels for data analysis where you need to filter, aggregate, or transform data before export. However, it loads entire files into memory and cannot preserve formula logic, making it less suitable for large files or enterprise scenarios.

Excel-to-JSON conversion is often only one step in a data workflow. If you need to import JSON data back into spreadsheets, see our tutorial on converting JSON to Excel for a complete two-way data exchange solution.

Method 3: Export Excel to JSON in Python with Spire.XLS

Spire.XLS for Python provides a professional Excel processing library designed for scenarios where Pandas falls short. It handles complex workbook structures, preserves formula calculations, and processes large files efficiently without loading entire datasets into memory.

Install Spire.XLS for Python

pip install Spire.XLS

Export Excel Data to JSON

from spire.xls import Workbook
import json

# Create workbook instance
workbook = Workbook()
workbook.LoadFromFile("sales_data.xlsx")

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Extract data into structured format
data = []
headers = []

# Read headers from first row
for col in range(sheet.AllocatedRange.Columns.Count):
    cell = sheet.AllocatedRange.Rows[0].Cells[col]
    headers.append(cell.Value)

# Read data rows
for row_idx in range(1, sheet.AllocatedRange.Rows.Count):
    row_data = {}
    row = sheet.AllocatedRange.Rows[row_idx]

    for col_idx in range(len(headers)):
        cell = row.Cells[col_idx]
        row_data[headers[col_idx]] = cell.Value

    data.append(row_data)

# Export to JSON file
with open("sales_data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4, ensure_ascii=False)

print(f"Exported {len(data)} records to JSON")
workbook.Dispose()

The conversion result is shown below:

Convert Excel to JSON with Spire.XLS

Key Points

Load Workbook: Use Workbook.LoadFromFile() to load the Excel file into memory. This method supports both XLS and XLSX formats.
Access Worksheet: Retrieve a specific worksheet using workbook.Worksheets[index], where index 0 refers to the first sheet.
Extract Headers: Iterate through the first row of the allocated range (sheet.AllocatedRange.Rows[0]) to collect column headers, which will serve as JSON object keys.
Read Data Rows: Loop through remaining rows (starting from index 1) and extract cell values. For each row, create a dictionary mapping headers to cell values.
Export to JSON: Use Python's built-in json.dump() function to write the data structure to a JSON file with proper formatting (indent=4) and Unicode support (ensure_ascii=False).
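
One detail the steps above leave open is value typing. If the cell values reach your code as text (an assumption worth checking against your own workbook), numbers end up quoted in the JSON, for example "ID": "1". Below is a minimal sketch of a converter you could apply before json.dump. Note that coerce_value is a hypothetical helper, not a Spire.XLS API:

```python
import math

def coerce_value(text):
    """Turn numeric-looking strings into int or float; leave other values unchanged."""
    if not isinstance(text, str):
        return text
    for cast in (int, float):
        try:
            number = cast(text.strip())
        except ValueError:
            continue
        if isinstance(number, float) and not math.isfinite(number):
            return text  # keep words like "nan" or "inf" as text
        return number
    return text

row = {"ID": "1", "Name": "Alice", "Score": "91.5"}
typed = {key: coerce_value(value) for key, value in row.items()}
print(typed)  # {'ID': 1, 'Name': 'Alice', 'Score': 91.5}
```

A blanket conversion like this also changes identifiers with leading zeros (for example "007" becomes 7), so in practice you may prefer to apply it only to columns you know are numeric.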

JSON is not the only format used for data exchange. If you need a simpler, tabular format for reporting or system integration, see our guide on converting Excel to CSV in Python.

Export Multiple Worksheets to JSON

One of Spire.XLS's key advantages is handling multi-sheet workbooks while preserving structure:

from spire.xls import Workbook
import json

workbook = Workbook()
workbook.LoadFromFile("quarterly_reports.xlsx")

workbook_data = {}

for sheet_index in range(workbook.Worksheets.Count):
    sheet = workbook.Worksheets[sheet_index]
    sheet_name = sheet.Name

    sheet_data = []
    headers = []

    last_row = sheet.LastRow
    last_col = sheet.LastColumn

    if last_row > 0 and last_col > 0:
        # Read headers
        for col in range(1, last_col + 1):
            cell_value = sheet.Range[1, col].Value
            headers.append(cell_value if cell_value else f"Column{col}")

        # Read data rows
        for row in range(2, last_row + 1):
            row_data = {}
            has_data = False

            for col in range(1, last_col + 1):
                cell = sheet.Range[row, col]
                value = cell.Value

                # Handle formula cells - export calculated results
                if cell.HasFormula:
                    value = cell.FormulaValue

                row_data[headers[col - 1]] = value
                if value is not None and str(value).strip():
                    has_data = True

            if has_data:
                sheet_data.append(row_data)

    workbook_data[sheet_name] = sheet_data
    print(f"Processed: {sheet_name} ({len(sheet_data)} rows)")

with open("quarterly_reports.json", "w", encoding="utf-8") as f:
    json.dump(workbook_data, f, indent=4, ensure_ascii=False)

print(f"Exported {workbook.Worksheets.Count} worksheets to JSON")
workbook.Dispose()

Output Structure:

{
  "Q1_Sales": [
    {"Product": "Widget A", "Revenue": 15000, "Units": 500},
    {"Product": "Widget B", "Revenue": 22000, "Units": 730}
  ],
  "Q2_Sales": [
    {"Product": "Widget A", "Revenue": 18000, "Units": 600},
    {"Product": "Widget B", "Revenue": 25000, "Units": 830}
  ]
}

Benefits of Using Spire.XLS

Preserve workbook structure: Maintain worksheet organization in the JSON output
Handle formulas correctly: Export calculated values from formula cells
Memory-efficient processing: Handle large workbooks without loading entire files into memory
No Excel dependency: Process files without requiring Microsoft Excel installation
Cross-platform: Run on Windows, Linux, and macOS

Pandas vs Spire.XLS Comparison

Feature	Pandas	Spire.XLS
Open Source	✓	✗
Data Analysis	✓	✓
Formula Results	Limited	✓
Multiple Worksheets	Basic	✓
Enterprise Automation	Limited	✓
Memory Efficiency	Moderate	✓
Large File Support	Limited	✓

For systems that require hierarchical or schema-based data exchange, you can also learn how to convert Excel to XML in Python.

Common Challenges When Converting Excel to JSON

Multiple Worksheets

Workbooks often contain multiple related worksheets. Exporting all sheets as a single flat array loses organizational structure. Use a library like Spire.XLS to preserve worksheet names as top-level keys in your JSON output.

Formula Cells

Excel formulas calculate values dynamically. When exporting to JSON, you typically want the calculated result, not the formula string. Spire.XLS provides the FormulaValue property to export computed values, while Pandas reads the last calculated result cached in the file, so a workbook that was never recalculated and saved by Excel can yield empty values.

Date Formatting

Excel stores dates as numeric serial numbers. Without explicit handling, dates may export as opaque numbers like 46143 instead of "2026-05-01". Convert date columns to ISO 8601 strings for JSON compatibility.
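
As a worked example of the conversion, the sketch below turns a serial number into an ISO 8601 date string using only the standard library. It assumes the workbook uses Excel's default 1900 date system (not the 1904 system some older Mac workbooks use), and serial_to_iso is a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Excel's 1900 date system: serial 1 is 1900-01-01. Using 1899-12-30 as the
# base absorbs Excel's fictitious 1900-02-29 for dates from March 1900 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)

def serial_to_iso(serial):
    """Convert an Excel serial date (1900 date system) to an ISO 8601 date string."""
    return (EXCEL_EPOCH + timedelta(days=serial)).date().isoformat()

print(serial_to_iso(46143))  # 2026-05-01
```

Libraries that return real datetime objects for date cells make this step unnecessary; it matters mainly when a cell arrives as a bare number.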

Empty Cells and Null Values

Empty cells can be represented as null, omitted entirely, or exported as empty strings. Use null for missing values and empty strings for explicitly empty cells to preserve data intent.
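
The three representations can be compared side by side with a short sketch. Here apply_empty_policy and its policy names are hypothetical, chosen only for this illustration:

```python
import json

def apply_empty_policy(record, policy="null"):
    """Represent blank values as null ("null"), drop them ("omit"), or keep "" ("empty")."""
    if policy not in ("null", "omit", "empty"):
        raise ValueError(f"unknown policy: {policy}")
    result = {}
    for key, value in record.items():
        is_blank = value is None or (isinstance(value, str) and value.strip() == "")
        if not is_blank:
            result[key] = value
        elif policy == "null":
            result[key] = None
        elif policy == "empty":
            result[key] = ""
        # policy == "omit": leave the key out entirely
    return result

row = {"ID": 3, "Name": "Cara", "Department": ""}
as_null = json.dumps(apply_empty_policy(row, "null"))
as_omit = json.dumps(apply_empty_policy(row, "omit"))
print(as_null)  # {"ID": 3, "Name": "Cara", "Department": null}
print(as_omit)  # {"ID": 3, "Name": "Cara"}
```

Whichever policy you pick, applying it consistently across all rows keeps the consuming code simple.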

Which Method Should You Choose?

Scenario	Recommended Method	Rationale
Quick one-time conversion	Online converter	No setup, fastest for occasional use
Data analysis workflows	Pandas	Integrates with analysis pipelines
Complex workbooks with multiple sheets	Spire.XLS	Preserves structure, handles formulas
Large files (>100MB)	Spire.XLS	Memory-efficient processing
Sensitive/confidential data	Spire.XLS (local)	No external server transmission

FAQ

Can Excel save directly as JSON?

No. Excel's Save As dialog supports XLSX, XLS, CSV, PDF, and XML, but not JSON. You need an online converter, a Python library, or a custom script to export Excel data to JSON.

How do I export Excel data to a JSON file?

Choose your tool, load the Excel file, extract the worksheet data, transform rows to JSON objects with column headers as keys, and write the output to a .json file.

With Pandas:

import pandas as pd
df = pd.read_excel("data.xlsx")
df.to_json("data.json", orient="records", indent=4)

What is the best Python library for converting Excel to JSON?

Pandas: Best for data analysis workflows with powerful transformations, but loads entire files into memory and cannot preserve formulas.
Spire.XLS: Best for enterprise scenarios with large files, multiple worksheets, and formula handling.

How can I export multiple worksheets to JSON?

Use Spire.XLS to iterate through worksheets and organize them in a dictionary with sheet names as keys:

from spire.xls import Workbook
import json

workbook = Workbook()
workbook.LoadFromFile("multi_sheet.xlsx")

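result = {}
# Index-based iteration, matching the multi-worksheet example above
for sheet_index in range(workbook.Worksheets.Count):
    sheet = workbook.Worksheets[sheet_index]
    sheet_data = []  # Extract sheet data
    # ... extraction logic ...
    result[sheet.Name] = sheet_data

with open("output.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=4, ensure_ascii=False)

workbook.Dispose()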

Can formulas be preserved during Excel-to-JSON conversion?

Formulas themselves cannot be preserved in JSON since JSON is a static data format. However, you can export the calculated results of formulas. Use Spire.XLS's FormulaValue property to get computed values instead of formula strings.

How do I handle large Excel files when exporting to JSON?

Avoid Pandas for large files — it loads everything into memory. Use Spire.XLS for memory-efficient cell-by-cell access. For very large datasets, consider line-delimited JSON (JSONL) format, where each line is a separate JSON object, enabling streaming processing.
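
The JSONL idea can be sketched with the standard library alone. The helper names write_jsonl and read_jsonl are hypothetical, and the generated rows stand in for records you would produce while walking a worksheet:

```python
import json
import os
import tempfile

def write_jsonl(records, path):
    """Write one JSON object per line; records can be any iterable, including a generator."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count

def read_jsonl(path):
    """Yield records one at a time without loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Demo with generated rows standing in for worksheet data
path = os.path.join(tempfile.mkdtemp(), "rows.jsonl")
written = write_jsonl(({"ID": i, "Name": f"User {i}"} for i in range(1, 4)), path)
loaded = list(read_jsonl(path))
print(written, loaded[0])  # 3 {'ID': 1, 'Name': 'User 1'}
```

Because the writer accepts a generator, rows can be written as they are read, so the full dataset never has to sit in a Python list.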

Conclusion

Exporting Excel to JSON bridges the gap between spreadsheet data and modern applications. For quick conversions, online tools get the job done without any setup. When you need data analysis capabilities, Pandas provides powerful transformations. For enterprise scenarios with large files, multiple worksheets, or formula handling, Spire.XLS delivers the control and precision you need. Choose based on your file size, complexity, and workflow requirements.

Thursday, 28 May 2026 02:21

How to Download / Export Excel Files in JavaScript & React

Excel File Export in JavaScript and React

Modern web applications often need to generate downloadable Excel reports directly in the browser without relying on backend services. Whether you're building dashboards, reporting tools, or data-heavy business applications, browser-based spreadsheet export has become a common frontend requirement.

The challenge lies in creating Excel files that work across different browsers while maintaining formatting, supporting multiple output formats, and ensuring fast downloads—all without sending sensitive data to a server. Traditional approaches often require complex server-side processing or rely on limited client-side libraries.

Spire.XLS for JavaScript lets developers generate, export, and download Excel files entirely in the browser using WebAssembly. This approach provides true client-side Excel generation with support for multiple formats including XLS, XLSX, XLSB, ODS, PDF, XML, and XPS.

This article demonstrates how to generate and download Excel files in modern JavaScript and React applications using browser-side processing with Spire.XLS for JavaScript. We'll cover basic file generation, stream-based exports, React integration, and HTML table conversion with practical code examples.

Quick Navigation

Why Export Excel in Browser
Install Spire.XLS for JavaScript
Download Excel File in JavaScript
Export HTML Table to Excel
React JS Export Excel Example
About Client-Side Excel Export
Troubleshooting
Conclusion
FAQ

Why Export Excel in Browser

Browser-side Excel export provides significant advantages over traditional server-side approaches:

Enhanced Privacy – Sensitive data never leaves the client device, reducing security risks and compliance concerns
Faster Downloads – Eliminating server round-trips reduces latency and improves user experience
No Server-Side Processing – Reduces backend infrastructure costs and eliminates server bottlenecks
Works Offline – Once the WebAssembly resources have been loaded, client-side generation functions even without network connectivity
Scalable Architecture – Each user's browser handles their own export, distributing computational load
Framework Agnostic – Works seamlessly with React, Vue, Angular, and vanilla JavaScript applications

By implementing Excel export functionality in the browser, developers can create responsive, secure, and cost-effective solutions that scale naturally with user demand.

Install Spire.XLS for JavaScript

Before generating and downloading Excel files in JavaScript, you need to install Spire.XLS for JavaScript and configure it in your development environment.

Installation via npm

Spire.XLS for JavaScript can be installed via npm:

npm i spire.xls

After installation, include the library in your project:

import { Workbook } from '@e-iceblue/spire.xls';

Note: The current WebAssembly runtime is provided through the spire.office package structure internally, even when installing spire.xls from npm. This is why initialization imports reference /node_modules/spire.office/.

Manual Installation

Alternatively, you can download the package from the e-iceblue website and copy the dependencies to your project directory.

For detailed setup instructions, refer to the Getting Started with Spire.XLS for JavaScript.

Initialize the WASM Module

Before using Spire.XLS, you must initialize the WebAssembly module. The initialization process loads required resources and sets up the runtime:

// Import and initialize the common module first
import('/node_modules/spire.office/spire.common.js').then(async (commonModule) => {
    // Initialize the WASM runtime
    await commonModule.initializeWasm();
    
    // Load the XLS module
    await import('/node_modules/spire.office/spire.xls.js');
    
    console.log('Spire.XLS ready');
});

Important Notes:

Initialization is required before accessing window.spirexls or window.xlswasm
The browser downloads required WebAssembly resources during first load
Always verify the module exists before performing Excel operations
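
The last note above can be wrapped in a small helper so that every export function performs the same check. The following is a minimal sketch, not part of Spire.XLS: the name getSpireModule is hypothetical, and it relies only on the two global names mentioned in this article (spirexls and xlswasm).

```javascript
// Hypothetical helper (not part of Spire.XLS): returns the initialized
// module, or throws a descriptive error if initialization has not finished.
// In a browser, globalThis is the same object as window.
function getSpireModule(globalObj = globalThis) {
    const wasmModule = globalObj.spirexls || globalObj.xlswasm;

    if (!wasmModule) {
        throw new Error(
            "Spire.XLS is not initialized. Run initializeWasm() and load spire.xls.js first."
        );
    }

    return wasmModule;
}
```

Calling getSpireModule() at the top of an export function replaces the repeated window.spirexls || window.xlswasm checks and turns a missing module into an explicit error.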

Version Note: This article uses spire.office v11.4.1+. The module is accessed via window.spirexls or window.xlswasm. Older examples using window.wasmModule.spirexls may require updates.

Spire.XLS for JavaScript integrates seamlessly with all major frontend frameworks and build tools:

React – Use with hooks (useState, useEffect) for state-driven Excel export components
Vue.js – Integrate with Vue's reactive data system and lifecycle methods
Angular – Compatible with Angular services and dependency injection patterns
Next.js – Works in client-side components for server-rendered React applications

The WebAssembly module loads once at application initialization and can be shared across components, making it efficient for multi-page applications regardless of the framework choice.

Download Excel File in JavaScript

The following example demonstrates how to generate an Excel file with Spire.XLS for JavaScript and download it directly in the browser.

Create and Download an XLSX File

// Ensure the WASM module has been initialized
// (throw instead of return so the check also works outside a function)
if (!window.spirexls && !window.xlswasm) {
    throw new Error("Spire.XLS is not initialized.");
}

// Get the initialized WebAssembly module
const wasmModule = window.spirexls || window.xlswasm;

// Create a new workbook
const workbook = new wasmModule.Workbook();
const worksheet = workbook.Worksheets.get(0);

// Create sample data
const products = [
    ["Product", "Quantity", "Price"],
    ["Laptop", 10, 999.99],
    ["Mouse", 50, 24.99]
];

// Insert data into the worksheet
for (let i = 0; i < products.length; i++) {
    for (let j = 0; j < products[i].length; j++) {
        if (typeof products[i][j] === "string") {
            worksheet.Range.get({ row: i + 1, column: j + 1 }).Text = products[i][j];
        }
        else {
            worksheet.Range.get({ row: i + 1, column: j + 1 }).NumberValue = products[i][j];
        }
    }
}

// Add a total column
worksheet.Range.get({ row: 1, column: products[0].length + 1 }).Text = "Total";
worksheet.Range.get({ row: 2, column: products[0].length + 1 }).Formula = "=B2*C2";
worksheet.Range.get({ row: 3, column: products[0].length + 1 }).Formula = "=B3*C3";

// Save the workbook to the virtual file system (VFS)
const outputFileName = "Report.xlsx";

workbook.SaveToFile({
    fileName: outputFileName,
    version: wasmModule.ExcelVersion.Version2016
});

// Release workbook resources
workbook.Dispose();

// Read the generated file from VFS
const fileArray =
    window.dotnetRuntime.Module.FS.readFile(outputFileName);

// Create a Blob object
const excelBlob = new Blob(
    [fileArray],
    {
        type: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    }
);

// Trigger browser download
const url = URL.createObjectURL(excelBlob);
const a = document.createElement("a");
a.href = url;
a.download = outputFileName;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);

Below is a preview of the generated XLSX file:

Generate and Download an Excel File in JavaScript
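
The example above hard-codes the total formulas for rows 2 and 3. When the number of data rows varies, the formula strings can be generated instead. The sketch below is plain JavaScript with no Spire.XLS calls; the function names columnLetter and buildTotalFormulas are hypothetical helpers introduced only for illustration.

```javascript
// Convert a 1-based column index to its Excel letter (1 -> "A", 27 -> "AA").
function columnLetter(index) {
    let letters = "";
    while (index > 0) {
        const remainder = (index - 1) % 26;
        letters = String.fromCharCode(65 + remainder) + letters;
        index = Math.floor((index - 1) / 26);
    }
    return letters;
}

// Build one "=B2*C2"-style formula per data row.
// Row 1 is assumed to hold the header, so data starts at row 2.
function buildTotalFormulas(dataRowCount, quantityCol, priceCol) {
    const formulas = [];
    for (let row = 2; row <= dataRowCount + 1; row++) {
        const quantityCell = columnLetter(quantityCol) + row;
        const priceCell = columnLetter(priceCol) + row;
        formulas.push(`=${quantityCell}*${priceCell}`);
    }
    return formulas;
}
```

For the sample data, buildTotalFormulas(products.length - 1, 2, 3) yields the same two formulas that the example assigns by hand; each string can then be assigned to the Formula property of the matching cell in the total column.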

How the Export Process Works

Create a workbook and populate worksheet data
Save the workbook into the WebAssembly virtual file system (VFS)
Read the generated XLSX file from VFS
Convert the file data into a Blob object
Trigger the browser download using a temporary URL

About the Virtual File System (VFS)

The file generated by SaveToFile() is stored in the WebAssembly virtual file system rather than the user's physical disk. This in-memory file system allows Spire.XLS to perform standard file operations securely inside the browser environment. The downloaded XLSX file is created after reading the generated file data from VFS and converting it into a browser Blob object.
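
Because the Blob is built from raw bytes read out of the VFS, the browser knows the file type only from the MIME string passed to the Blob constructor. The sketch below shows one way to pick that string from the output file name. The helper name mimeTypeFor is hypothetical, the MIME values are the standard registered ones, and how to save in formats other than XLSX is not covered here.

```javascript
// Standard MIME types for a few spreadsheet-related output formats.
const MIME_TYPES = new Map([
    ["xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"],
    ["xls", "application/vnd.ms-excel"],
    ["ods", "application/vnd.oasis.opendocument.spreadsheet"],
    ["csv", "text/csv"],
    ["pdf", "application/pdf"],
    ["xml", "application/xml"]
]);

// Hypothetical helper: choose the Blob MIME type from the file extension,
// falling back to a generic binary type for unknown extensions.
function mimeTypeFor(fileName) {
    const dot = fileName.lastIndexOf(".");
    const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
    return MIME_TYPES.get(extension) || "application/octet-stream";
}
```

With this in place, the Blob in the example could be created as new Blob([fileArray], { type: mimeTypeFor(outputFileName) }).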

Advantages of This Approach

Works entirely in the browser
No server-side processing required
Uses standard browser Blob download APIs
Supports direct XLSX file generation with Spire.XLS

If you also need to work with lightweight data exchange formats, you can further explore how to convert Excel files to CSV and import CSV data into Excel using JavaScript.

Export HTML Tables to Excel in JavaScript

In dashboard and reporting applications, business data is often displayed as HTML tables. Instead of rebuilding spreadsheet structures manually, you can directly convert existing frontend tables into Excel workbooks using Spire.XLS for JavaScript.

The following example demonstrates a complete browser-side workflow that:

Reads an existing HTML table from the page
Converts the HTML table into an Excel workbook
Applies Excel-native formatting
Downloads the generated XLSX file directly in the browser

HTML Table Export Example

async function exportTableToExcel() {

    if (!window.spirexls && !window.xlswasm) {
        alert("Spire.XLS module not loaded yet.");
        return;
    }

    const button = document.getElementById("exportBtn");

    button.disabled = true;
    button.innerText = "Exporting...";

    const wasmModule = window.spirexls || window.xlswasm;

    try {

        // Get HTML table
        const tableHtml =
            document.getElementById("salesTable").outerHTML;

        // Remove inline styles
        const safeTableHtml =
            tableHtml.replace(/style="[^"]*"/g, '');

        const htmlContent = `
            <!DOCTYPE html>
            <html>
            <head>
                <meta charset="UTF-8">
            </head>
            <body>
                ${safeTableHtml}
            </body>
            </html>
        `;

        const htmlFileName = "Table.html";

        window.dotnetRuntime.Module.FS.writeFile(
            htmlFileName,
            htmlContent
        );

        const workbook = new wasmModule.Workbook();

        workbook.LoadFromHtml(htmlFileName);

        const sheet = workbook.Worksheets.get(0);

        const lastRow = Number(sheet.LastRow);
        const lastCol = Number(sheet.LastColumn);

        const headerRow =
            sheet.Range.get_Item(1, 1, 1, lastCol);

        headerRow.BuiltInStyle =
            wasmModule.BuiltInStyles.Heading3;

        for (let i = 2; i <= lastRow; i++) {

            const row =
                sheet.Range.get_Item(i, 1, i, lastCol);

            row.BuiltInStyle =
                i % 2 === 0
                    ? wasmModule.BuiltInStyles.Accent3_20
                    : wasmModule.BuiltInStyles.Accent3_60;
        }

        for (let j = 1; j <= lastCol; j++) {
            sheet.AutoFitColumn(j);
        }

        const outputFileName = "SalesReport.xlsx";

        workbook.SaveToFile({
            fileName: outputFileName,
            version: wasmModule.ExcelVersion.Version2016
        });

        workbook.Dispose();

        const fileData =
            window.dotnetRuntime.Module.FS.readFile(outputFileName);

        const blob = new Blob([fileData], {
            type:
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
        });

        const url = URL.createObjectURL(blob);

        const a = document.createElement("a");

        a.href = url;
        a.download = outputFileName;

        document.body.appendChild(a);
        a.click();

        document.body.removeChild(a);

        URL.revokeObjectURL(url);

    } catch (error) {

        alert("Export failed: " + error.message);

    } finally {

        button.disabled = false;
        button.innerText = "Export Excel";
    }
}

The following screenshot shows the HTML-based sales report table example displayed in the browser before export.

HTML-based Sales Report Table

After exporting, the generated Excel workbook preserves the tabular structure and applies additional Excel-native formatting.

Export HTML Table to Excel in JavaScript
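
The regular expression in the example removes only double-quoted style attributes. If the table markup may also contain single-quoted attributes, a slightly broader pattern can be used. This is a heuristic sketch with a hypothetical helper name (stripInlineStyles); like any regex applied to HTML, it would also alter matching text that happens to appear inside cell content.

```javascript
// Hypothetical helper: remove inline style attributes from an HTML string.
// The leading \s+ ensures attributes such as data-style are left untouched.
function stripInlineStyles(html) {
    return html.replace(/\s+style\s*=\s*("[^"]*"|'[^']*')/gi, "");
}
```

In the export function, stripInlineStyles(tableHtml) would take the place of the tableHtml.replace(...) call.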

Why Use HTML-based Excel Export

Using HTML-based export provides several advantages for modern web applications:

Reuse existing frontend tables without rebuilding spreadsheet layouts
Reduce duplicate data formatting and export logic
Apply Excel-native styles after importing HTML tables
Export business reports directly from dashboard pages

With Spire.XLS for JavaScript, you can quickly convert browser-rendered HTML tables into downloadable Excel files while keeping the entire export workflow on the client side.

For scenarios that require rendering Excel spreadsheets as browser-based HTML tables, you can also refer to our article about converting Excel to HTML in JavaScript.

Export Excel in React with JavaScript

Integrating Excel export into React applications is straightforward. The key is initializing the WebAssembly runtime before rendering React components and properly releasing workbook resources after export operations.

Initialize Spire.XLS in React

Before creating export components, initialize the WebAssembly module in your app entry file (main.jsx or index.js):

import { StrictMode } from 'react';
import { createRoot } from 'react-dom/client';
import App from './App.jsx';

// Initialize Spire.XLS before mounting React
const initializeSpire = async () => {

    // Load the common runtime
    const commonModule = await import(
        '/node_modules/spire.office/spire.common.js'
    );

    // Initialize WebAssembly runtime
    await commonModule.initializeWasm();

    // Load Spire.XLS module
    await import(
        '/node_modules/spire.office/spire.xls.js'
    );

    // Optional: preload fonts if needed
    // await window.spire.FetchFileToVFS(
    //     'ARIAL.TTF',
    //     '/Library/Fonts/',
    //     '/'
    // );
};

// Start React app after initialization
initializeSpire().then(() => {

    createRoot(document.getElementById('root')).render(
        <StrictMode>
            <App />
        </StrictMode>
    );

});

Then use the React export component below in your application.

Simplified React Excel Export Component

Here's a minimal React component that demonstrates the core export pattern:

import { useState } from 'react';

// Exported by name so it can be imported from other files
export const ExcelExportButton = () => {
    const [isProcessing, setIsProcessing] = useState(false);

    const handleExport = async () => {
        if ((!window.spirexls && !window.xlswasm) || isProcessing) return;

        setIsProcessing(true);
        const wasmModule = window.spirexls || window.xlswasm;

        try {
            // Create a new workbook and get the first default worksheet
            const workbook = new wasmModule.Workbook();
            const worksheet = workbook.Worksheets.get(0);

            // Insert data into the worksheet
            worksheet.Range.get("A1").Text = "Product";
            worksheet.Range.get("B1").Text = "Revenue";
            worksheet.Range.get("A2").Text = "Laptop";
            worksheet.Range.get("B2").NumberValue = 9999.90;
            worksheet.Range.get("A3").Text = "Smartphone";
            worksheet.Range.get("B3").NumberValue = 4999.99;

            const outputFileName = "Report.xlsx";

            // Save the workbook to a file in the VFS
            workbook.SaveToFile({
                fileName: outputFileName,
                version: wasmModule.ExcelVersion.Version2016
            });

            workbook.Dispose();

            const fileArray = window.dotnetRuntime.Module.FS.readFile(outputFileName);

            const excelBlob = new Blob([fileArray], {
                type: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
            });

            const url = URL.createObjectURL(excelBlob);

            const a = document.createElement('a');
            a.href = url;
            a.download = outputFileName;
            document.body.appendChild(a);
            a.click();
            document.body.removeChild(a);

            URL.revokeObjectURL(url);

        } catch (error) {
            console.error("Excel export failed:", error);
        } finally {
            setIsProcessing(false);
        }
    };

    return (
        <button onClick={handleExport} disabled={isProcessing}>
            {isProcessing ? "Generating..." : "Export to Excel"}
        </button>
    );
}

export default function App() {
    return (
        <div>
            <h1>Spire.XLS Demo</h1>
            <ExcelExportButton />
        </div>
    );
}

Key Implementation Details:

Minimal state – Only track isProcessing to disable the button during export
Direct download – Trigger download immediately without storing URLs in state
Resource cleanup – Always call Dispose() on workbook objects to prevent memory leaks
Error handling – Wrap export logic in try-catch blocks for robust error management
Loading states – Disable buttons during processing to prevent duplicate exports
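
The duplicate-export protection described in the last item does not depend on React state. As a rough, framework-neutral sketch, the same idea can be expressed as a wrapper around any async export function. The name createExportGuard is hypothetical and is not part of React or Spire.XLS.

```javascript
// Hypothetical helper: wraps an async export function so that a second
// call made while the first is still running is ignored.
function createExportGuard(exportFn) {
    let running = false;

    return async function guardedExport(...args) {
        if (running) {
            return false; // an export is already in progress
        }

        running = true;
        try {
            await exportFn(...args);
            return true; // export finished
        } finally {
            running = false; // always allow the next export
        }
    };
}
```

In a vanilla JavaScript page, a button's click handler could be set to createExportGuard(exportTableToExcel) so that rapid repeated clicks start only one export.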

Usage in Your App:

import { ExcelExportButton } from './ExcelExportButton';

function App() {
    return (
        <div>
            <h1>Sales Dashboard</h1>
            <ExcelExportButton />
        </div>
    );
}

This simplified approach focuses on the essential export flow without unnecessary complexity. For more advanced scenarios like loading external files or fonts, refer to the complete documentation.

If you also need browser-side document distribution workflows, you can further explore how to convert Excel files to PDF in JavaScript and React applications.

Client-Side Excel Generation in JavaScript Without Backend

Modern web applications increasingly generate Excel files directly in the browser instead of relying on backend services. With Spire.XLS for JavaScript, spreadsheet creation, formatting, and export operations run entirely on the client side using WebAssembly.

Why No Backend Server Is Needed

Traditional Excel export workflows usually require a server to:

Receive frontend data
Generate spreadsheet files
Return downloadable files to the browser

With WebAssembly-based processing, these steps happen entirely inside the browser runtime instead.

Benefits of Browser-side Excel Export

Compared with traditional server-side export workflows, client-side Excel generation provides several advantages:

Feature	Browser-side Export	Server-side Export
Data Processing	Runs locally in browser	Requires backend server
Privacy	Data stays on client device	Data sent over network
Response Speed	Local processing with no network round-trip	Depends on network latency
Infrastructure Cost	No export server required	Requires backend resources
Offline Support	Supported	Usually unavailable
Scalability	Handled by client devices	Limited by server capacity

How Browser-side Export Works

When using Spire.XLS for JavaScript:

The WebAssembly runtime loads in the browser
Spreadsheet processing runs locally in memory
Files are temporarily stored in the browser virtual file system (VFS)
JavaScript converts the generated file into a downloadable Blob
The browser triggers the download directly

This architecture makes browser-based Excel export especially suitable for dashboards, reporting systems, internal business tools, and privacy-sensitive applications.

Troubleshooting and Best Practices

When using Spire.XLS for JavaScript in browser environments, the following issues are commonly encountered.

WASM Module Not Initialized

If window.spirexls or window.xlswasm is undefined, ensure the WebAssembly runtime is fully initialized before using the API:

const commonModule = await import('/node_modules/spire.office/spire.common.js');
await commonModule.initializeWasm();
await import('/node_modules/spire.office/spire.xls.js');

Missing Resource or ZIP Loading Errors

If the browser console shows 404 errors or WebAssembly loading failures:

Ensure ZIP and WASM resources are placed in the correct static directory
Vite projects should place assets in the public/ folder
Verify the browser can successfully load .zip and .wasm files

Font-related Warnings

Some environments may display warnings such as:

"Arial font is not installed"

You can preload fonts before creating workbooks:

await window.spire.FetchFileToVFS(
    'ARIAL.TTF',
    '/Library/Fonts/',
    '/'
);

Invalid or Corrupted XLSX Files

If Excel opens with repair warnings, explicitly specify the Excel version during export:

workbook.SaveToFile({
    fileName: outputFileName,
    version: wasmModule.ExcelVersion.Version2016
});

Memory Management

Always release workbook resources after export to avoid memory leaks in long-running applications:

const workbook = new wasmModule.Workbook();

try {
    // Excel operations
} finally {
    workbook.Dispose();
}
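
The try/finally pattern above can be captured once in a helper so that no export path forgets the cleanup, for instance when an error is thrown before a Dispose() call placed inside the try block is reached. The following is a minimal sketch; withWorkbook is a hypothetical name, and it assumes only the Workbook constructor and Dispose() method already used in this article.

```javascript
// Hypothetical helper: creates a workbook, passes it to the callback,
// and always disposes it afterwards, even if the callback throws.
// Intended for synchronous callbacks: if `work` returns a Promise,
// Dispose() runs before that Promise settles.
function withWorkbook(wasmModule, work) {
    const workbook = new wasmModule.Workbook();

    try {
        return work(workbook);
    } finally {
        workbook.Dispose();
    }
}
```

A typical call would populate the worksheet and call SaveToFile() inside the callback, then read the generated file from the VFS after withWorkbook returns.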

Browser-side Performance Considerations

For very large datasets, browser-side processing may become slow or memory-intensive. In such scenarios:

Show loading indicators during export
Avoid exporting extremely large datasets in a single operation
Consider server-side processing for enterprise-scale reports
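
One way to follow the second recommendation is to split the data into fixed-size batches and export each batch separately, for example to its own worksheet or file. The sketch below covers only the splitting step in plain JavaScript; chunkRows is a hypothetical helper, and a suitable batch size depends on the data and the target devices.

```javascript
// Hypothetical helper: split an array of rows into batches of at most
// chunkSize rows each, preserving the original order.
function chunkRows(rows, chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize < 1) {
        throw new RangeError("chunkSize must be a positive integer");
    }

    const chunks = [];
    for (let start = 0; start < rows.length; start += chunkSize) {
        chunks.push(rows.slice(start, start + chunkSize));
    }
    return chunks;
}
```

Each batch can then be written with the same cell-by-cell loop shown earlier, with a loading indicator updated between batches.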

Conclusion

Spire.XLS for JavaScript provides a practical way to generate and export Excel files directly in modern web applications using JavaScript and WebAssembly. Its browser-based architecture makes it suitable for dashboards, reporting systems, and frontend applications that require downloadable spreadsheet generation without relying on backend services.

The examples in this article demonstrate how to build browser-based Excel export workflows using JavaScript, React, and WebAssembly while keeping spreadsheet processing entirely on the client side. You can apply for a 30-day free license to evaluate all features before purchasing.

FAQ

Q1: Can I download Excel files in JavaScript without a backend server?

A1: Yes. Spire.XLS for JavaScript uses WebAssembly technology to generate and download Excel files entirely in the browser. The workbook is created in browser memory and downloaded directly without requiring any backend API or server-side processing.

Q2: How do I export HTML tables to Excel in JavaScript?

A2: You can extract an existing HTML table from the DOM, write the HTML into the WebAssembly virtual file system, and load it into a workbook using LoadFromHtml(). This approach allows you to reuse browser-rendered tables without rebuilding spreadsheet layouts manually.

Q3: Can I use Spire.XLS for JavaScript in React applications?

A3: Yes. Spire.XLS for JavaScript works with React, Vite, and other modern frontend frameworks. You only need to initialize the WebAssembly module before rendering components and then perform Excel operations directly inside React components or utility functions.

Q4: Why does Excel show a repair warning when opening exported files?

A4: This usually happens when the Excel version is not explicitly specified during export. To avoid compatibility issues, specify the output version when calling SaveToFile():

workbook.SaveToFile({
    fileName: outputFileName,
    version: wasmModule.ExcelVersion.Version2016
});

Published in Document Operation

Tagged under

XLS React Document Operation

Wednesday, 20 May 2026 07:15

Inserting Equations into Word in Python (LaTeX & MathML)

Tutorial on How to Insert Math Equations into Word in Python

Inserting mathematical equations into Word documents programmatically is essential for developers building scientific document generators, academic reporting systems, educational platforms, or engineering automation tools. Whether you're generating research papers, technical documentation, or mathematics worksheets, automating equation insertion greatly improves efficiency and consistency.

However, manually formatting equations in Microsoft Word is time-consuming, and building a mathematical rendering engine from scratch can be extremely complex. Developers often need a reliable way to add equations in Word while supporting standard mathematical formats such as LaTeX and MathML.

With Spire.Doc for Python, developers can insert mathematical equations into Word documents directly from LaTeX and MathML code using a straightforward API. This article demonstrates how to create Word equations in Python, including how to insert formulas, convert equations between LaTeX, MathML, and Office MathML (OMML), and export Word equations into different mathematical formats.

Quick Navigation

Understanding Mathematical Equations in Word Documents
Install Spire.Doc for Python
Insert Equations into Word from LaTeX in Python
Add MathML Equations to Word Documents in Python
Convert Word Equations to LaTeX or MathML
Render Equation as Image
Complete Example: Multi-Format Equation Processing
Common Pitfalls
FAQ

1. Understanding Mathematical Equations in Word Documents

Microsoft Word uses Office Math Markup Language (OMML) as its internal format for mathematical equations. OMML is an XML-based structure that controls equation layout, symbols, fractions, matrices, and other mathematical elements in Word documents. However, directly creating or editing OMML is cumbersome for most developers.

In real-world applications, mathematical content is more commonly written in LaTeX or MathML:

LaTeX is widely used in academia and scientific publishing because of its concise syntax and powerful mathematical typesetting capabilities.
MathML is an XML-based standard designed for mathematical content on the web and in educational systems.

To generate editable Word equations programmatically, developers often need to convert between these formats and Word's native equation objects.

Why Choose Spire.Doc for Python?

Spire.Doc for Python provides native support for Word equation processing through the OfficeMath class. Instead of manually generating OMML or relying on image-based workarounds, developers can directly create editable Word equations from LaTeX or MathML code.

Key capabilities include:

Capability	Supported
Insert equations from LaTeX	✓
Insert equations from MathML	✓
Export Word equations to LaTeX	✓
Export Word equations to MathML	✓
Access native OMML content	✓
Render equations as images	✓

These capabilities are particularly useful for academic report generation, educational platforms, MathML-to-Word conversion workflows, LaTeX publishing pipelines, and other automated document generation scenarios involving mathematical content.

2. Install Spire.Doc for Python

Install Spire.Doc for Python via pip:

pip install spire.doc

Import the required classes in your Python script:

from spire.doc import *

Alternatively, you can manually install the library from the Spire.Doc for Python download page.

3. Insert Equations into Word from LaTeX in Python

LaTeX is the most widely used format for writing mathematical equations in academic and scientific documents. With Spire.Doc for Python, you can convert LaTeX expressions into native Word equation objects and insert these equations directly into DOCX files.

The following example demonstrates how to insert multiple LaTeX equations into a Word document using the OfficeMath class.

from spire.doc import *

def insert_latex_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add a title paragraph
    title_para = section.AddParagraph()
    title_para.AppendText("Mathematical Equations from LaTeX")
    title_para.Format.HorizontalAlignment = HorizontalAlignment.Left
    
    # Define LaTeX equations to insert
    latex_equations = [
        r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",  # Quadratic formula
        r"e^{i\pi} + 1 = 0",  # Euler's identity
        r"\int_0^\infty e^{-x} \, dx = 1",  # Definite integral
        r"\sum_{i=1}^{n} i = \frac{n(n+1)}{2}",  # Summation formula
        r"A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}",  # Matrix
        r"P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}",  # Probability formula
        r"\sin^2\theta + \cos^2\theta = 1",  # Trigonometric identity
    ]
    
    # Insert each LaTeX equation as a separate paragraph
    for latex_code in latex_equations:
        # Create an OfficeMath object from LaTeX code
        office_math = OfficeMath(doc)
        office_math.FromLatexMathCode(latex_code)
        
        # Add the equation to a new paragraph
        para = section.AddParagraph()
        para.Items.Add(office_math)
    
    # Save the document
    doc.SaveToFile("latex_equations.docx", FileFormat.Docx2019)
    doc.Close()
    print("LaTeX equations inserted successfully!")

if __name__ == "__main__":
    insert_latex_equations()

The following screenshot shows the generated Word document with equations converted from LaTeX code.

LaTeX equations inserted into Word document using Python

Key API Methods

Document – Represents the Word document container used to create sections and paragraphs
OfficeMath – Represents a mathematical equation object in Word documents
FromLatexMathCode() – Converts LaTeX mathematical code into an Office Math object that Word can render natively
Items.Add() – Adds the OfficeMath object to a paragraph's content collection
SaveToFile() – Saves the document to disk in DOCX format using FileFormat.Docx2019

This approach supports complex LaTeX constructs such as fractions, integrals, matrices, Greek letters, and other mathematical operators while preserving native Word equation formatting.

Adding Inline Equations

In addition to standalone equations, you can insert inline equations within text paragraphs. This is useful for embedding mathematical expressions within sentences or explanations.

from spire.doc import *

def insert_inline_equation():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add introductory text
    para = section.AddParagraph()
    para.AppendText("The quadratic formula is ")
    
    # Insert inline equation
    office_math = OfficeMath(doc)
    office_math.FromLatexMathCode(r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}")
    para.Items.Add(office_math)
    
    para.AppendText(", where a ≠ 0.")
    
    # Save the document
    doc.SaveToFile("inline_equation.docx", FileFormat.Docx2019)
    doc.Close()

if __name__ == "__main__":
    insert_inline_equation()

The inserted equation appears inline within the text:

Inline equation inserted into Word document using Python

This approach makes it easy to embed mathematical expressions directly within regular text content, which is useful for educational materials, research papers, and technical documentation.

If you need to combine equations with formatted text, headings, tables, and other structured document elements, you can also refer to our tutorial on creating structured Word documents in Python.

4. Add MathML Equations to Word Documents in Python

MathML (Mathematical Markup Language) is an XML-based standard for representing mathematical expressions on the web and in digital documents. It's commonly used in online education platforms, scientific databases, and content management systems. The following example shows how to convert MathML to Word equations using Spire.Doc for Python.

from spire.doc import *

def insert_mathml_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Add a title paragraph
    title_para = section.AddParagraph()
    title_para.AppendText("Mathematical Equations from MathML")
    
    # Define MathML equations to insert
    mathml_equations = [
        # Euler's identity
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup>'
        r'<mo>+</mo><mn>1</mn><mo>=</mo><mn>0</mn>'
        r'</math>',
        # Pythagorean theorem
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msup><mi>a</mi><mn>2</mn></msup>'
        r'<mo>+</mo>'
        r'<msup><mi>b</mi><mn>2</mn></msup>'
        r'<mo>=</mo>'
        r'<msup><mi>c</mi><mn>2</mn></msup>'
        r'</math>',
        # Fraction expression
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<mfrac>'
        r'<mrow><mi>x</mi><mo>+</mo><mi>y</mi></mrow>'
        r'<mrow><mi>z</mi><mo>−</mo><mn>1</mn></mrow>'
        r'</mfrac>'
        r'</math>',
        # Integral equation
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<msubsup><mo>∫</mo><mn>0</mn><mn>1</mn></msubsup>'
        r'<msup><mi>x</mi><mn>2</mn></msup>'
        r'<mi>d</mi><mi>x</mi>'
        r'<mo>=</mo>'
        r'<mfrac><mn>1</mn><mn>3</mn></mfrac>'
        r'</math>',
    ]
    
    # Insert each MathML equation as a separate paragraph
    for mathml_code in mathml_equations:
        # Create an OfficeMath object from MathML code
        office_math = OfficeMath(doc)
        office_math.FromMathMLCode(mathml_code)
        
        # Add the equation to a new paragraph
        para = section.AddParagraph()
        para.Items.Add(office_math)
    
    # Save the document
    doc.SaveToFile("mathml_equations.docx", FileFormat.Docx2019)
    doc.Close()
    print("MathML equations inserted successfully!")

if __name__ == "__main__":
    insert_mathml_equations()

The following screenshot shows the generated Word document with equations converted from MathML code.

MathML equations converted to Word format using Python

Key API Method

FromMathMLCode() – Parses MathML markup and converts it into a native Word equation object.

MathML support is especially useful when working with XML-based educational content, web-based equation systems, and STEM learning platforms that store mathematical expressions in MathML format.

Combining LaTeX and MathML in One Document

You can mix both LaTeX and MathML equations within the same document, allowing flexibility in content sources:

from spire.doc import *

def insert_mixed_equations():
    # Create a new Word document
    doc = Document()
    section = doc.AddSection()
    
    # Insert LaTeX equation
    latex_para = section.AddParagraph()
    latex_math = OfficeMath(doc)
    latex_math.FromLatexMathCode(r"E = mc^2")
    latex_para.Items.Add(latex_math)
    
    # Insert MathML equation
    mathml_para = section.AddParagraph()
    mathml_math = OfficeMath(doc)
    mathml_math.FromMathMLCode(
        r'<math xmlns="http://www.w3.org/1998/Math/MathML">'
        r'<mi>F</mi><mo>=</mo><mi>m</mi><mi>a</mi>'
        r'</math>'
    )
    mathml_para.Items.Add(mathml_math)
    
    # Save the document
    doc.SaveToFile("mixed_equations.docx", FileFormat.Docx2019)
    doc.Close()

if __name__ == "__main__":
    insert_mixed_equations()

This approach is useful when mathematical content comes from different sources, such as LaTeX-based publishing systems and MathML-based web applications.
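When equation strings arrive from mixed sources, it can help to decide which parser to call before creating the OfficeMath object. The following is a minimal sketch of such a check; detect_equation_format is a hypothetical helper (not part of Spire.Doc), and the rule it applies, treating anything that starts with a <math element as MathML, is an assumption that may need adjusting for your data.

```python
def detect_equation_format(code: str) -> str:
    """Guess whether an equation string is MathML or LaTeX (simple heuristic)."""
    stripped = code.lstrip()
    # Skip an optional XML declaration such as <?xml version="1.0"?>
    if stripped.startswith("<?xml"):
        stripped = stripped.split("?>", 1)[-1].lstrip()
    if stripped.startswith("<math"):
        return "mathml"
    return "latex"


samples = [
    r"E = mc^2",
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mi>F</mi><mo>=</mo><mi>m</mi><mi>a</mi></math>',
]

for sample in samples:
    print(detect_equation_format(sample))
```

Based on the result, the calling code could choose between FromMathMLCode() and FromLatexMathCode() as shown in the example above.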

If your mathematical content originates from web pages or HTML-based systems, you can also refer to our tutorial on converting HTML content to Word documents in Python.

5. Convert Word Equations to LaTeX, MathML, and OMML

Besides inserting equations into Word documents, Spire.Doc for Python also supports exporting Word equations to multiple mathematical markup formats. This is useful for interoperability between Word, LaTeX publishing systems, web-based MathML platforms, and custom XML workflows.

The following example demonstrates how to extract equations from a Word document and export them as LaTeX, MathML, and Office MathML (OMML).

from spire.doc import *

def export_equation_formats():
    # Load a Word document containing equations
    doc = Document()
    doc.LoadFromFile("equations.docx")

    # Access the first paragraph
    section = doc.Sections[0]
    para = section.Paragraphs[0]

    # Find OfficeMath objects
    for item in para.ChildObjects:
        if isinstance(item, OfficeMath):

            # Export to LaTeX
            latex_code = item.ToLaTexMathCode()
            print("LaTeX:")
            print(latex_code)
            print()

            # Export to MathML
            mathml_code = item.ToMathMLCode()
            print("MathML:")
            print(mathml_code)
            print()

            # Export to Office MathML (OMML)
            omml_code = item.ToOfficeMathMLCode()
            print("OMML:")
            print(omml_code)

            # Save outputs to files
            with open("equation.tex", "w", encoding="utf-8") as f:
                f.write(latex_code)

            with open("equation.xml", "w", encoding="utf-8") as f:
                f.write(mathml_code)

            with open("equation.omml", "w", encoding="utf-8") as f:
                f.write(omml_code)

            break

    doc.Close()

if __name__ == "__main__":
    export_equation_formats()

The following screenshot shows the exported equation formats printed in the Python console.

Export Word equations to LaTeX, MathML, and OMML using Python

Supported Export Formats

Format	Primary Use Case	Characteristics
LaTeX	Academic publishing and scientific papers	Compact syntax widely used in academia
MathML	Web-based mathematical content	XML-based format designed for browsers and educational systems
OMML	Microsoft Word integration	Native Office equation format with full Word compatibility

These export capabilities make it easier to:

Convert Word equations into LaTeX publishing workflows
Publish equations on websites using MathML
Integrate Word documents with XML-based systems
Inspect and debug Word equation structures using OMML

6. Render Office Math Equations to Images

In some scenarios, you may need to export equations as image files for use in presentations, web pages, or other non-editable contexts. Spire.Doc for Python allows you to render Office Math equations into image streams that can be saved as image files.

from spire.doc import *

def render_equation_as_image():
    # Create a new Word document with an equation
    doc = Document()
    section = doc.AddSection()
    para = section.AddParagraph()

    # Insert an equation
    office_math = OfficeMath(doc)
    office_math.FromLatexMathCode(
        r"\int_0^\infty e^{-x^2} dx = \frac{\sqrt{\pi}}{2}"
    )
    para.Items.Add(office_math)

    # Render the equation as an image stream
    image_stream = office_math.SaveImageToStream(ImageType.Bitmap)

    # Save the image to a file in the current directory
    with open("equation.png", "wb") as f:
        f.write(image_stream.ToArray())

    # Release unmanaged resources
    image_stream.Dispose()
    doc.Close()

    print("Equation rendered as image successfully!")

if __name__ == "__main__":
    render_equation_as_image()

The following screenshot shows the equation rendered as an image file.

Mathematical equation rendered as image from Word

This feature is particularly useful for:

Embedding equations in presentations
Displaying formulas on web pages
Generating static previews for document systems

If you want to render complete Word documents as images rather than exporting individual equations, check out our tutorial on converting Word documents to images in Python.

7. Complete Example: Multi-Format Equation Processing

The following example combines the operations covered above into a single workflow: inserting equations from LaTeX and MathML, exporting an equation to LaTeX and MathML, and rendering it as an image.

from spire.doc import *

def complete_equation_workflow():
    """
    Demonstrates a complete workflow for equation processing:
    - Create equations from LaTeX and MathML
    - Export equations to LaTeX and MathML
    - Render equations as images
    """

    # Create a new Word document
    doc = Document()
    section = doc.AddSection()

    # Add document title
    title_para = section.AddParagraph()
    title_text = title_para.AppendText("Complete Equation Processing Workflow")
    title_text.CharacterFormat.FontSize = 16
    title_text.CharacterFormat.Bold = True
    title_para.Format.HorizontalAlignment = HorizontalAlignment.Center

    # Insert equations from LaTeX
    latex_section_title = section.AddParagraph()
    latex_title_text = latex_section_title.AppendText("\nEquations from LaTeX:")
    latex_title_text.CharacterFormat.Bold = True

    latex_examples = [
        (r"E = mc^2", "Einstein's Mass-Energy Equivalence"),
        (r"\sum_{i=1}^{n} i = \frac{n(n+1)}{2}", "Sum of First n Integers"),
        (r"\frac{d}{dx}\left(\int_a^x f(t)dt\right) = f(x)", "Fundamental Theorem of Calculus")
    ]

    first_equation = None

    for latex_code, description in latex_examples:
        # Add description
        desc_para = section.AddParagraph()
        desc_para.AppendText(f"{description}:")

        # Insert equation
        office_math = OfficeMath(doc)
        office_math.FromLatexMathCode(latex_code)

        eq_para = section.AddParagraph()
        eq_para.Items.Add(office_math)

        if first_equation is None:
            first_equation = office_math

    # Insert equations from MathML
    mathml_section_title = section.AddParagraph()
    mathml_title_text = mathml_section_title.AppendText("\nEquations from MathML:")
    mathml_title_text.CharacterFormat.Bold = True

    mathml_examples = [
        (
            r'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi><mo>+</mo><mi>b</mi><mo>=</mo><mi>c</mi></math>',
            "Simple Addition"
        ),
        (
            r'<math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>e</mi><mrow><mi>i</mi><mi>π</mi></mrow></msup><mo>+</mo><mn>1</mn><mo>=</mo><mn>0</mn></math>',
            "Euler's Identity"
        )
    ]

    for mathml_code, description in mathml_examples:
        # Add description
        desc_para = section.AddParagraph()
        desc_para.AppendText(f"{description}:")

        # Insert equation
        office_math = OfficeMath(doc)
        office_math.FromMathMLCode(mathml_code)

        eq_para = section.AddParagraph()
        eq_para.Items.Add(office_math)

    # Save the Word document
    output_docx = "complete_equations.docx"
    doc.SaveToFile(output_docx, FileFormat.Docx2019)
    print(f"Word document saved: {output_docx}")

    # Export the first equation to LaTeX
    latex_export = first_equation.ToLaTexMathCode()

    with open("exported_equation.tex", "w", encoding="utf-8") as f:
        f.write(latex_export)

    print(f"Exported to LaTeX: {latex_export}")

    # Export the first equation to MathML
    mathml_export = first_equation.ToMathMLCode()

    with open("exported_equation.xml", "w", encoding="utf-8") as f:
        f.write(mathml_export)

    print("Exported to MathML")

    # Render the first equation as an image
    image_stream = first_equation.SaveImageToStream(ImageType.Bitmap)

    with open("equation_render.png", "wb") as f:
        f.write(image_stream.ToArray())

    # Release unmanaged resources
    image_stream.Dispose()

    print("Equation rendered as image successfully!")

    # Clean up
    doc.Close()

    print("\nWorkflow completed successfully!")

if __name__ == "__main__":
    complete_equation_workflow()

The generated Word document will look like this:

Complete Equation Processing Workflow

This complete example demonstrates:

Multi-source equation insertion – Combining LaTeX and MathML inputs
Descriptive labeling – Adding context to each equation
Format conversion – Exporting to LaTeX and MathML
Image rendering – Creating visual representations
Resource management – Proper cleanup of document objects

The resulting Word document contains well-formatted equations with descriptions, while the exported files provide alternative formats for different use cases.

8. Common Pitfalls

Raw String Literals for LaTeX

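When writing LaTeX code in Python strings, always use raw strings (prefix with r). Otherwise, Python interprets sequences such as \f (in \frac), \t (in \theta), or \n (in \nabla) as control characters, and other backslash sequences trigger invalid-escape warnings in recent Python versions: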

# Correct: Use raw string
latex_code = r"\int_0^\infty e^{-x} dx"

# Incorrect: Backslashes will be interpreted as escape sequences
latex_code = "\int_0^\infty e^{-x} dx"
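To see why the r prefix matters, the short check below compares raw and non-raw versions of the same LaTeX. It uses \frac and \theta because \f and \t are valid Python escapes (form feed and tab), so the damage happens silently. This is only an illustration of Python string behavior and does not call Spire.Doc.

```python
# Compare raw and non-raw string literals that contain LaTeX commands.
raw_frac = r"\frac{a}{b}"
plain_frac = "\frac{a}{b}"      # "\f" is parsed as a form feed character

raw_theta = r"\theta"
plain_theta = "\theta"          # "\t" is parsed as a tab character

print(repr(raw_frac))           # the backslash is preserved
print(repr(plain_frac))         # begins with a form feed instead of a backslash
print(raw_theta == plain_theta) # the two strings differ
```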

Unsupported LaTeX Commands

Not all LaTeX commands are supported by Word's equation engine. Some advanced LaTeX constructs may not render correctly. Stick to standard mathematical notation whenever possible:

# Supported: Standard mathematical notation
office_math.FromLatexMathCode(r"\alpha + \beta = \gamma")

# Some advanced LaTeX constructs may not be supported
# office_math.FromLatexMathCode(r"\begin{align} ... \end{align}")

MathML Namespace Requirements

MathML code must include the proper namespace declaration to parse correctly:

# Correct: Include namespace
mathml = r'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'

# Incorrect: Missing namespace may fail
mathml = r'<math><mi>x</mi></math>'
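If MathML comes from external sources, one option is to check the namespace before handing the string to FromMathMLCode(). The sketch below uses Python's standard xml.etree.ElementTree for that; has_mathml_namespace is a hypothetical helper name, and the check only covers the root element.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def has_mathml_namespace(mathml: str) -> bool:
    """Return True if the root element is <math> in the MathML namespace."""
    try:
        root = ET.fromstring(mathml)
    except ET.ParseError:
        return False  # not well-formed XML at all
    # ElementTree reports namespaced tags as "{namespace}localname"
    return root.tag == f"{{{MATHML_NS}}}math"


good = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
bad = '<math><mi>x</mi></math>'

print(has_mathml_namespace(good))
print(has_mathml_namespace(bad))
```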

Memory Management

Always close documents after processing to release resources, especially in batch operations:

doc = Document()

try:
    # Process equations
    doc.SaveToFile("output.docx", FileFormat.Docx2019)

finally:
    doc.Close()  # Ensure cleanup even if errors occur
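The same try/finally pattern can be wrapped in a small context manager so that cleanup is not forgotten in batch code. The sketch below is one possible way to do this; closing_with is a hypothetical helper, and FakeDocument is a stand-in class used only so the example runs without Spire.Doc installed.

```python
from contextlib import contextmanager


@contextmanager
def closing_with(resource, method_name="Close"):
    """Yield the resource, then call its cleanup method even if an error occurs."""
    try:
        yield resource
    finally:
        getattr(resource, method_name)()


class FakeDocument:
    """Hypothetical stand-in for a document object, used only for this demo."""

    def __init__(self):
        self.closed = False

    def Close(self):
        self.closed = True


doc = FakeDocument()
try:
    with closing_with(doc):
        raise ValueError("simulated failure while processing equations")
except ValueError:
    pass

print(doc.closed)  # cleanup ran despite the exception
```

With the real library, the same helper could presumably wrap a Document (using the default "Close") or an image stream (passing "Dispose" as method_name).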

Character Encoding

When saving exported LaTeX or MathML to files, ensure proper UTF-8 encoding for special characters:

with open("equation.tex", "w", encoding="utf-8") as f:
    f.write(latex_code)

Image Stream Disposal

Always dispose of image streams after use to properly release resources:

image_stream = office_math.SaveImageToStream(ImageType.Bitmap)

try:
    with open("equation.png", "wb") as f:
        f.write(image_stream.ToArray())

finally:
    image_stream.Dispose()

Conclusion

In this article, we demonstrated how to insert mathematical equations into Word documents in Python using Spire.Doc for Python. With the OfficeMath class, developers can create Word equations from LaTeX and MathML code, export them back to LaTeX, MathML, and Word’s native OMML format, and render them as images. These capabilities support automated scientific document generation, educational content creation, and mathematical publishing workflows.

The library simplifies complex mathematical typesetting while maintaining compatibility with Microsoft Word’s native equation engine.

If you want to evaluate the full capabilities of Spire.Doc for Python, you can apply for a 30-day free license.

9. FAQ

How do I insert equations into Word using Python?

Use the OfficeMath class from Spire.Doc for Python. Create an OfficeMath object, call FromLatexMathCode() or FromMathMLCode() with your equation code, then add it to a paragraph using para.Items.Add(office_math). Finally, save the document using doc.SaveToFile().

Can I add LaTeX equations to Word documents in Python?

Yes. Spire.Doc for Python supports inserting equations from LaTeX code using the FromLatexMathCode() method. Standard mathematical notation such as fractions, integrals, superscripts, subscripts, and Greek letters can be converted into Word-compatible equations.

Does Spire.Doc support MathML equations?

Yes. You can create Word equations from MathML using the FromMathMLCode() method. Make sure the MathML content includes the correct namespace declaration:

<math xmlns="http://www.w3.org/1998/Math/MathML">

Can I export Word equations back to LaTeX or MathML?

Yes. Spire.Doc for Python provides methods such as ToLaTexMathCode() and ToMathMLCode() to export Office Math equations into LaTeX or MathML formats. This is useful for content migration, storage, or integration with other mathematical systems.

How can I render equations as images?

Use the SaveImageToStream() method on an OfficeMath object to render the equation as an image stream. You can then save the stream as an image file and use it in presentations, web pages, or preview systems.

Published in Document Operation

Friday, 15 May 2026 09:23

Convert JavaScript to Word with Python Automation

JavaScript code displayed in a formatted Word document with syntax highlighting

Modern development teams often need to share JavaScript or JSX source code with project managers, clients, auditors, or educators who don't use code editors. However, raw .js and .jsx files are difficult to review outside tools like VS Code or WebStorm, while manually copying code into Word documents frequently breaks indentation, formatting, and readability.

Using Spire.Doc for Python together with Pygments, developers can convert JavaScript to Word in Python with syntax highlighting and customizable document formatting. This automated approach is useful for technical documentation, compliance archiving, educational materials, code reviews, and client deliverables.

In this article, you'll learn how to convert JavaScript and JSX files to Word documents in Python using Spire.Doc for Python, including basic conversion, advanced formatting techniques, batch processing, and PDF export.

Quick Navigation

Understanding the Conversion Workflow
Prerequisites
Basic Implementation of JavaScript to Word Conversion
Advanced Scenarios
Common Pitfalls
Conclusion
FAQ

1. Understanding the Conversion Workflow

The conversion process uses Pygments to generate syntax-highlighted HTML, then imports this HTML into a Word document using Spire.Doc's HTML import functionality:

Read source code from .js or .jsx files
Generate syntax-highlighted HTML using Pygments' highlight() function
Import the HTML into Word using AppendHTML()

This approach provides syntax coloring through Pygments' built-in styles, while Spire.Doc handles document structure, including margins, headers, footers, and multi-format export. Together, they keep the conversion simple to automate.

2. Prerequisites

Before converting JavaScript files to Word documents in Python, you need to install Spire.Doc for Python and Pygments:

pip install spire.doc
pip install pygments

Verify the packages are available:

import spire.doc
from pygments import highlight
from pygments.formatters import HtmlFormatter

Alternatively, you can download Spire.Doc for Python and add it to your project.

3. Basic Implementation

The following example converts a JavaScript file to a Word document with syntax highlighting:

from spire.doc import *
from pygments import highlight
from pygments.lexers import JavascriptLexer
from pygments.formatters import HtmlFormatter

def convert_js_to_word(input_file: str, output_file: str) -> None:
    """Convert JavaScript file to Word document with syntax highlighting."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    title_paragraph = section.AddParagraph()
    title_text = title_paragraph.AppendText(f"Source Code: {input_file}")
    title_text.CharacterFormat.FontName = "Arial"
    title_text.CharacterFormat.FontSize = 14
    title_text.CharacterFormat.Bold = True
    title_paragraph.Format.AfterSpacing = 10
    
    html_formatter = HtmlFormatter(
        nowrap=True,
        style='colorful',
        noclasses=True
    )
    
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    code_paragraph = section.AddParagraph()
    code_paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()
    
    print(f"Converted {input_file} to {output_file}")

convert_js_to_word("app.js", "JavaScriptCode.docx")

Word document showing JavaScript code with blue keywords, green strings, and gray comments

Key Components

Document – Word document container for sections, paragraphs, and content
Section – Document section with page setup properties (margins, orientation)
Paragraph – Text container with formatting options
AppendHTML() – Imports HTML content into the paragraph, including inline styles for colors and fonts
highlight() – Pygments function that generates syntax-highlighted output
HtmlFormatter – Pygments formatter producing HTML with inline styles (use noclasses=True)
JavascriptLexer – Pygments lexer that identifies JavaScript syntax elements

Spire.Doc can import syntax-highlighted HTML generated by Pygments, allowing JavaScript code formatting and colors to be preserved in Word documents.

4. Advanced Scenarios

Convert JSX Files

For JSX files, it's recommended to use JsxLexer instead of JavascriptLexer to achieve more accurate syntax highlighting for component tags and embedded JSX expressions.

Example JSX input (App.jsx):

import React, { useState } from 'react';

const TodoList = () => {
    const [todos, setTodos] = useState([]);

    return (
        <div className="todo-container">
            <h1>My Tasks</h1>
        </div>
    );
};

export default TodoList;

Use JsxLexer when generating syntax-highlighted HTML:

from pygments.lexers import JsxLexer

highlighted_html = highlight(
    jsx_code,
    JsxLexer(),
    html_formatter
)

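Then import the highlighted JSX into Word using the same AppendHTML() workflow. Note that convert_js_to_word() as defined in the basic example uses JavascriptLexer, so replace it with JsxLexer in that function before calling it on .jsx files: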

convert_js_to_word("App.jsx", "ReactComponent.docx")

The conversion result looks like this:

Word document showing JSX code with blue keywords, green strings, and gray comments

JsxLexer provides improved recognition for JSX tags, attributes, and embedded expressions compared to the standard JavaScript lexer, resulting in more accurate syntax coloring in the generated Word document.

Batch Convert Multiple Files

If you need to convert large numbers of JavaScript or JSX files, you can automate the process by scanning a folder and generating Word documents in batches.

import os
from pathlib import Path

def batch_convert_js_files(source_folder: str, output_folder: str) -> None:
    """Convert all JavaScript files in a folder to Word documents."""
    
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    
    js_extensions = ('.js', '.jsx', '.mjs')
    
    converted_count = 0
    error_count = 0
    
    for filename in os.listdir(source_folder):
        if filename.lower().endswith(js_extensions):
            input_path = os.path.join(source_folder, filename)
            
            base_name = os.path.splitext(filename)[0]
            output_path = os.path.join(output_folder, f"{base_name}.docx")
            
            try:
                convert_js_to_word(input_path, output_path)
                converted_count += 1
            except Exception as e:
                print(f"Error converting {filename}: {str(e)}")
                error_count += 1
    
    print(f"\nBatch conversion complete:")
    print(f"  Converted: {converted_count} files")
    print(f"  Errors: {error_count} files")

batch_convert_js_files("src/scripts", "output/docs")
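One detail to watch in the loop above: the output name is derived from the base name only, so app.js and app.jsx in the same folder would both map to app.docx. The sketch below shows one way to plan unique output paths before converting. plan_output_paths is a hypothetical helper, and the naming scheme (appending the source extension on a clash) is just one possible convention.

```python
import tempfile
from pathlib import Path

JS_EXTENSIONS = (".js", ".jsx", ".mjs")


def plan_output_paths(source_folder: str, output_folder: str) -> dict:
    """Map each JavaScript source file to a unique .docx output path."""
    planned = {}
    used_names = set()
    for source in sorted(Path(source_folder).iterdir()):
        if not source.is_file() or source.suffix.lower() not in JS_EXTENSIONS:
            continue
        name = f"{source.stem}.docx"
        if name in used_names:
            # Keep the source extension in the name to avoid overwriting,
            # e.g. app.jsx becomes app_jsx.docx when app.docx is taken
            name = f"{source.stem}_{source.suffix.lstrip('.').lower()}.docx"
        used_names.add(name)
        planned[source] = Path(output_folder) / name
    return planned


# Demo with temporary files so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("app.js", "app.jsx", "notes.txt"):
        (Path(tmp) / fname).write_text("// demo", encoding="utf-8")
    for src, dst in plan_output_paths(tmp, "output/docs").items():
        print(src.name, "->", dst.name)
```

Each planned pair could then be passed to convert_js_to_word() inside the same try/except block used above.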

Add Line Numbers

Line numbers can improve readability during code reviews, audits, or technical documentation. Since Word HTML rendering may not fully support Pygments' built-in line number layouts, a practical approach is to prepend custom line numbers after syntax highlighting.

html_formatter = HtmlFormatter(
    nowrap=True,
    noclasses=True,
    style="colorful"
)

highlighted_html = highlight(
    js_code,
    JavascriptLexer(),
    html_formatter
)

highlighted_lines = highlighted_html.splitlines()

numbered_lines = []

for index, line in enumerate(highlighted_lines, start=1):

    numbered_line = (
        f'<span style="color: gray; font-weight: bold;">'
        f'{index:4d}  '
        f'</span>{line}'
    )

    numbered_lines.append(numbered_line)

combined_html = (
    '<pre style="font-family: Consolas; '
    'font-size: 10pt; line-height: 1.4;">'
    + '\n'.join(numbered_lines) +
    '</pre>'
)

paragraph.AppendHTML(combined_html)

The generated Word document with line numbers looks like this:

Word document showing JavaScript code with blue keywords, green strings, and gray comments with line numbers
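The snippet above pads line numbers to a fixed width of four characters. As a variation, the width can be derived from the number of lines so that short files do not get extra padding and very long files stay aligned. The following is a minimal sketch of that idea; number_lines is a hypothetical helper that works on any already-highlighted HTML string, and the sample input is made up for the demo.

```python
def number_lines(highlighted_html: str) -> str:
    """Prefix each line of highlighted HTML with a right-aligned line number."""
    lines = highlighted_html.splitlines()
    width = len(str(len(lines)))  # digits needed for the largest line number
    numbered = [
        f'<span style="color: gray; font-weight: bold;">{index:>{width}}  </span>{line}'
        for index, line in enumerate(lines, start=1)
    ]
    return "\n".join(numbered)


# Made-up stand-in for Pygments output: twelve highlighted lines
sample = "\n".join(f"<span>line {n}</span>" for n in range(1, 13))
print(number_lines(sample).splitlines()[0])
```

The returned string can be placed inside the same <pre> wrapper before calling AppendHTML().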

Add Headers and Footers

Headers and footers help organize generated Word documents by adding titles, page numbers, and document metadata. This is especially useful for formal reports or exported technical documentation.

def add_document_metadata(section: Section, document_title: str) -> None:
    """Add header and footer to document section."""
    
    header = section.HeadersFooters.Header.AddParagraph()
    header_text = header.AppendText(document_title)
    header_text.CharacterFormat.FontName = "Arial"
    header_text.CharacterFormat.FontSize = 10
    header_text.CharacterFormat.TextColor = Color.get_Black()
    header.Format.HorizontalAlignment = HorizontalAlignment.Left
    header.Format.TextAlignment = TextAlignment.Top
    
    header.Format.Borders.Bottom.BorderType = BorderStyle.Single
    header.Format.Borders.Bottom.Color = Color.get_Black()
    
    footer = section.HeadersFooters.Footer.AddParagraph()
    footer.Format.HorizontalAlignment = HorizontalAlignment.Center
    footer.Format.TextAlignment = TextAlignment.Bottom
    
    page_field = footer.AppendField("page", FieldType.FieldPage)
    page_field.CharacterFormat.FontName = "Arial"
    page_field.CharacterFormat.FontSize = 9
    
    footer.AppendText(" of ")
    total_pages_field = footer.AppendField("numPages", FieldType.FieldNumPages)
    total_pages_field.CharacterFormat.FontName = "Arial"
    total_pages_field.CharacterFormat.FontSize = 9

document = Document()
document.LoadFromFile("CodeWithLines.docx")
section = document.Sections[0]
add_document_metadata(section, "JavaScript Source Code Documentation")
document.SaveToFile("CodeWithHeadersFooters.docx", FileFormat.Docx)
document.Close()

The generated Word document with headers and footers looks like this:

Word document showing JavaScript code with blue keywords, green strings, and gray comments with line numbers and headers and footers

For more advanced customization options, refer to our guide on how to add headers and footers to Word documents in Python.

Export to PDF Format

In addition to DOCX output, Spire.Doc can export syntax-highlighted JavaScript code directly to PDF format. This is useful when distributing read-only documentation or sharing code outside Microsoft Word environments.

def convert_js_to_pdf(input_file: str, output_file: str) -> None:
    """Convert JavaScript file directly to PDF."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    # nowrap=True omits Pygments' own wrapper, since the code is wrapped in <pre> below
    html_formatter = HtmlFormatter(nowrap=True, noclasses=True, style='colorful')
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    paragraph = section.AddParagraph()
    paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.PDF)
    document.Close()

convert_js_to_pdf("app.js", "JavaScriptCode.pdf")

For more advanced PDF conversion techniques, including layout control and document formatting, see our detailed guide on converting Word documents to PDF in Python.

Customize Syntax Highlighting Style

Pygments provides multiple built-in color schemes:

def convert_with_custom_style(input_file: str, output_file: str, style_name: str = 'monokai') -> None:
    """Convert JavaScript to Word with custom highlighting style."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        js_code = file.read()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    html_formatter = HtmlFormatter(
        noclasses=True,
        style=style_name,
        nowrap=True
    )
    
    highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
    
    paragraph = section.AddParagraph()
    paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()

convert_with_custom_style("app.js", "CodeMonokai.docx", style_name='monokai')

Available styles include: 'monokai', 'colorful', 'vim', 'vs', 'tango', 'friendly', 'default'

5. Common Pitfalls

Missing HtmlFormatter Configuration

Problem: Default HtmlFormatter generates CSS classes instead of inline styles, which Word cannot process without external stylesheets.

Solution: Always use noclasses=True:

html_formatter = HtmlFormatter(noclasses=True, style='colorful')
highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)

Encoding Errors with Special Characters

Problem: Reading files without an explicit encoding falls back to the platform default (often a legacy code page on Windows), which can corrupt non-ASCII characters.

Solution: Explicitly specify UTF-8 encoding:

with open(input_file, "r", encoding="utf-8") as file:
    js_code = file.read()

For files with BOM (Byte Order Mark), use utf-8-sig:

with open(input_file, "r", encoding="utf-8-sig") as file:
    js_code = file.read()
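The difference between the two encodings is easy to check with a temporary file. The sketch below writes a small file that starts with a UTF-8 BOM and reads it back both ways; the file content is made up for the demo.

```python
import codecs
import os
import tempfile

# Write a small JavaScript file that starts with a UTF-8 byte order mark
with tempfile.NamedTemporaryFile("wb", suffix=".js", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + "const greeting = 'héllo';\n".encode("utf-8"))
    path = tmp.name

with open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()        # the BOM survives as "\ufeff"

with open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()     # the BOM is stripped

os.remove(path)

print(repr(with_bom[0]))
print(without_bom.startswith("const"))
```

Because utf-8-sig also reads files that have no BOM, it can serve as a reasonable default when the origin of the source files is unknown.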

Indentation Loss

Problem: Not wrapping highlighted code in <pre> tags causes indentation to disappear.

Solution: Wrap syntax-highlighted HTML in <pre> tags:

highlighted_html = highlight(js_code, JavascriptLexer(), html_formatter)
paragraph.AppendHTML(f'<pre style="font-family: Consolas;">{highlighted_html}</pre>')

ModuleNotFoundError

Problem: Package not installed in current Python environment.

Solution:

pip install spire.doc pygments

For virtual environments, ensure activation before installation:

source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
pip install spire.doc pygments
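When the error persists after installation, the script is often running under a different interpreter than the one pip installed into. The sketch below prints the active interpreter and reports which modules it cannot find; missing_modules is a hypothetical helper built on the standard importlib module.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "spire" for "spire.doc") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


print("Interpreter:", sys.executable)
print("Missing:", missing_modules(["spire.doc", "pygments"]))
```

If the printed interpreter path is not inside your virtual environment, activate the environment first and run the script again.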

Performance with Large Files

Problem: Very large JavaScript files (10,000+ lines) may cause slow conversion.

Solution: Process files in chunks:

def convert_large_file(input_file: str, output_file: str, chunk_size: int = 500) -> None:
    """Convert large JavaScript file in chunks."""
    
    with open(input_file, "r", encoding="utf-8") as file:
        lines = file.readlines()
    
    document = Document()
    section = document.AddSection()
    section.PageSetup.Margins.All = 50
    
    # nowrap=True omits Pygments' own wrapper, since each chunk is wrapped in <pre> below
    html_formatter = HtmlFormatter(nowrap=True, noclasses=True, style='colorful')
    
    for i in range(0, len(lines), chunk_size):
        chunk = ''.join(lines[i:i + chunk_size])
        highlighted_html = highlight(chunk, JavascriptLexer(), html_formatter)
        
        paragraph = section.AddParagraph()
        paragraph.AppendHTML(f'<pre style="font-family: Consolas; font-size: 10pt;">{highlighted_html}</pre>')
    
    document.SaveToFile(output_file, FileFormat.Docx)
    document.Close()
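A fixed chunk boundary can fall in the middle of a multi-line comment or template literal, which may affect highlighting at the start of the next chunk. One way to reduce that risk is to cut at a blank line once the chunk is large enough. The sketch below illustrates the idea; iter_line_chunks is a hypothetical helper, and the heuristic does not guarantee safe boundaries, since blank lines can also appear inside multi-line constructs.

```python
def iter_line_chunks(lines, chunk_size=500):
    """Yield text chunks of roughly chunk_size lines, cutting at blank lines when possible."""
    chunk = []
    for line in lines:
        chunk.append(line)
        # Once the chunk is full, wait for a blank line before cutting,
        # but never let a chunk grow past twice the target size
        if len(chunk) >= chunk_size and (not line.strip() or len(chunk) >= 2 * chunk_size):
            yield "".join(chunk)
            chunk = []
    if chunk:
        yield "".join(chunk)


# Made-up source: 12 lines with a blank line in the sixth position
source_lines = [f"const v{n} = {n};\n" for n in range(12)]
source_lines[5] = "\n"

chunks = list(iter_line_chunks(source_lines, chunk_size=4))
print([c.count("\n") for c in chunks])
```

Each yielded chunk could be highlighted and appended in place of the fixed slices used in convert_large_file().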

6. Conclusion

This article demonstrated how to convert JavaScript and JSX files to Word documents in Python using Spire.Doc for Python and Pygments. By leveraging the highlight() function with HtmlFormatter and Spire.Doc's AppendHTML() method, developers can automate code documentation workflows with syntax highlighting.

Spire.Doc for Python provides document generation capabilities including table creation, image insertion, header/footer management, and multi-format export.

You can apply for a 30-day free license to evaluate all features.

7. FAQ

Can Spire.Doc convert JSX files to Word documents?

Yes. Use Pygments' JsxLexer, as described in the Convert JSX Files section, for accurate highlighting of component tags, props, and embedded expressions. The standard JavaScript lexer also handles many JSX constructs, but JSX-specific syntax may not receive dedicated highlighting categories.

Does this solution require Microsoft Word installation?

No. Spire.Doc for Python operates independently without requiring Microsoft Word. The library generates DOCX files directly, making it suitable for server environments and CI/CD pipelines.

Can I convert JavaScript to formats other than DOCX?

Yes. Spire.Doc supports multiple export formats:

document.SaveToFile("output.pdf", FileFormat.PDF)
document.SaveToFile("output.html", FileFormat.Html)
document.SaveToFile("output.rtf", FileFormat.Rtf)

How do I handle TypeScript files (.ts, .tsx)?

Use TypeScriptLexer (note the capital S in the class name):

from pygments.lexers import TypeScriptLexer

highlighted_html = highlight(ts_code, TypeScriptLexer(), html_formatter)

Is this approach suitable for enterprise-scale projects?

Yes. Python automation integrates with CI/CD pipelines and batch processing workflows. Local execution avoids security risks from uploading source code to online converters. Consider implementing logging, progress reporting, and error tracking for large deployments.
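As a rough illustration of the logging and error tracking mentioned above, the sketch below wraps any converter function in a batch runner that logs progress and collects failures. run_batch and fake_convert are hypothetical names; in a real project, the converter would be a function such as convert_js_to_word() bound to an output path.

```python
import logging

logger = logging.getLogger("js_to_word")


def run_batch(files, convert):
    """Call convert(path) for each file, logging progress and collecting failures."""
    failures = {}
    total = len(files)
    for position, path in enumerate(files, start=1):
        try:
            convert(path)
            logger.info("[%d/%d] converted %s", position, total, path)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[path] = str(exc)
            logger.error("[%d/%d] failed %s: %s", position, total, path, exc)
    return failures


def fake_convert(path):
    """Hypothetical converter used only for this demo; fails on one file."""
    if path.endswith("broken.js"):
        raise ValueError("simulated conversion error")


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
result = run_batch(["app.js", "broken.js", "utils.mjs"], fake_convert)
print(result)
```

The returned dictionary can be written to a report or used to set the exit code of a CI job.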

Can I customize syntax highlighting colors?

Yes. Pygments offers numerous built-in styles:

html_formatter = HtmlFormatter(noclasses=True, style='monokai')

Available styles: 'monokai', 'colorful', 'vim', 'vs', 'tango', 'friendly', 'default'

Published in Conversion

Friday, 08 May 2026 06:12

How to Embed an Office Document Editor in an HTML Page

Tutorial on How to Embed a Web-Based Office Document Editor into an HTML Page

Modern web applications increasingly require built-in document capabilities for viewing and editing Word, Excel, and PowerPoint files directly in the browser. Instead of redirecting users to external applications, developers often need to embed an Office editor in a web page as part of their existing interface.

Building a fully functional online document editor from scratch can be complex, involving document rendering, format compatibility, editing workflows, and responsive UI integration. With Spire.OfficeJS from e-iceblue, developers can quickly integrate a browser-based Office editor into HTML pages using JavaScript without requiring Microsoft Office installations on client devices.

This article demonstrates how to embed a document editor in HTML, including page layout design, editor initialization, and dynamic document loading with practical examples.

Table of Contents

Why Embed an Office Editor into a Web Page?
Prerequisites
Basic Page Layout for Integration
Embed the Office Editor into a Container
Load and Switch Documents Dynamically
Customize Editor Behavior
Integrating the Editor into Existing Business Systems
Framework Integration (React, Vue, Angular)
Common Integration Issues
Conclusion
FAQ

Why Embed an Office Editor into a Web Page?

Embedding a document editor as part of your page layout enables seamless workflows and better user experience. Common use cases include:

Document management systems (DMS) where users view and edit files without leaving the interface
CRM or ERP platforms with integrated file editing capabilities
Online collaboration tools requiring real-time document editing
Internal business dashboards with document preview functionality

Instead of opening documents in a separate application or dedicated page, users can work with documents directly inside the current web interface.

Embedded vs Full-Page Editors

There are two common integration approaches:

Approach	Description
Full-page editor	The entire page is dedicated to document editing
Embedded editor	The editor is integrated as part of a larger UI

This tutorial focuses on the embedded approach, where the document editor works alongside sidebars, file lists, navigation menus, and other application components.

Prerequisites

Before integrating the editor, ensure you have:

Server Setup

Download and Extract Spire.OfficeJS

Download the Spire.OfficeJS package and extract it to a local directory.
Initialize Fonts
```
cd Spire.OfficeJS.Windows_11.5.7
run_genallfonts.bat
```
Run run_genallfonts.bat before deployment to initialize the fonts. When it finishes, a "fontsweb" folder containing the basic fonts appears in the web folder. To add other fonts, see: How to Add Custom Fonts in Spire.OfficeJS for Frontend Editors
Start Spire.OfficeJS Backend Service
```
run_servers.bat
```
This starts the editor service on http://localhost:3000
Start Example Server (provides sample documents)

The example server makes the sample documents available at http://localhost:3000/public/samples/

If you need a complete setup guide for installing and deploying Spire.OfficeJS in JavaScript applications, see: How to Deploy Spire.OfficeJS in JavaScript

Requirements

Document files accessible from the browser
Modern browser with WebAssembly support

Note: The code examples below use localhost addresses for local development and testing. In production environments, replace them with your actual server URLs or domain names.

Basic Page Layout for Integration

The first step is to design a layout where the editor occupies only part of the page. Here's a common structure with a sidebar and editor area:

<!DOCTYPE html>
<html>
<head>
  <title>Document Editor Integration</title>
  <style>
    .app-container {
      display: flex;
      height: 100vh;
    }

    .sidebar {
      width: 250px;
      border-right: 1px solid #ddd;
      padding: 10px;
      background: #f5f5f5;
    }

    .editor-container {
      flex: 1;
      position: relative;
    }
  </style>
</head>
<body>
  <div class="app-container">
    <div class="sidebar">
      <h3>Documents</h3>
      <ul>
        <li onclick="openDocument('http://localhost:3000/public/samples/sample.docx', 'docx')">Sample Document.docx</li>
        <li onclick="openDocument('http://localhost:3000/public/samples/sample.xlsx', 'xlsx')">Sample Spreadsheet.xlsx</li>
        <li onclick="openDocument('http://localhost:3000/public/samples/sample.pptx', 'pptx')">Sample Presentation.pptx</li>
      </ul>
    </div>

    <div class="editor-container" id="editor"></div>
  </div>
</body>
</html>

A simple embedded document management interface may look like this before a document is opened:

Document Management Interface

Layout Explanation

The sidebar displays a file list with clickable document names
The editor-container is a flex item that will host the document editor
The editor fills the remaining space using flex: 1

This structure reflects a real-world application layout rather than a simple demo page.

Embed the Office Editor into a Container

Load the Spire.OfficeJS script and initialize the editor inside your designated container:

<script src="http://localhost:3000/web/editors/spireapi/SpireCloudEditor.js"></script>

<script>
function initEditor() {
  const config = {
    user: {
      id: 'user1',
      name: 'Demo User'
    },
    fileAttrs: {
      sourceUrl: "http://localhost:3000/public/samples/sample.docx",
      fileInfo: {
        ext: "docx",
        name: "sample.docx"
      }
    },
    editorAttrs: {
      editorType: "document",
      editorMode: "edit",
      editorWidth: "100%",
      editorHeight: "100%",
      platform: "desktop",
      viewLanguage: "en",
      canEdit: true,
      canDownload: true,
      canForcesave: true,
      useWebAssemblyDoc: true,
      useWebAssemblyExcel: true,
      useWebAssemblyPpt: true,
      useWebAssemblyPdf: true,
      serverless: {
        useServerless: true,
        baseUrl: "http://localhost:3000",
        coAuthorUrl: "http://localhost:8000" //Collaborative editing service address
      },
      embedded: {
        saveUrl: "",
        toolbarDocked: 'top'
      },
      events: {
        onDocumentReady: function() {
          console.log('Document is ready');
        },
        onError: function(event) {
          console.error('Editor error:', event);
        },
        onSave: function(data) {
          console.log('Document saved', data);
          if (data && data.data && data.data.length >= 2) {
            downloadFile(data.data[1], data.data[0]);
          }
        }
      }
    }
  };

  new SpireCloudEditor.OpenApi("editor", config);
}

function downloadFile(file, fileName) {
  const a = document.createElement('a');
  const url = URL.createObjectURL(file);
  a.href = url;
  a.download = fileName;
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  URL.revokeObjectURL(url);
}

initEditor();
</script>

After initialization, the embedded Office editor loads directly inside the target container:

Embedded Editor

To help you get started quickly, you can download the complete runnable HTML example used in this article:

Download Embedded Editor Example

Note: Start the Spire.OfficeJS service before opening the sample editor. The downloadable demo dynamically detects the current host using window.location.hostname, so it should be opened via an HTTP server. For direct browser file preview, replace it with a fixed host address.

Configuration Breakdown

user: Required. Identifies the current user by id and display name
fileAttrs: Document source URL and file metadata
editorAttrs: Editor behavior including mode, dimensions, and language

The editor renders inside the specified container element with ID "editor", allowing it to function as a UI component rather than taking over the entire page.
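Because the same configuration shape is needed for every document, it can help to build it in one place. The following is a minimal sketch, not part of the Spire.OfficeJS API: buildEditorConfig is a hypothetical helper name, and it only reuses options that already appear in the example above. The overrides argument is shallow-merged into editorAttrs.

```javascript
// Hypothetical helper (not part of Spire.OfficeJS): builds the configuration
// object used in this article from a document URL and a file extension.
function buildEditorConfig(sourceUrl, ext, overrides = {}) {
  const name = sourceUrl.split('/').pop();
  return {
    user: { id: 'user1', name: 'Demo User' },
    fileAttrs: {
      sourceUrl,
      fileInfo: { ext, name }
    },
    editorAttrs: {
      editorType: 'document',
      editorMode: 'edit',
      editorWidth: '100%',
      editorHeight: '100%',
      platform: 'desktop',
      viewLanguage: 'en',
      canEdit: true,
      canDownload: true,
      canForcesave: true,
      useWebAssemblyDoc: true,
      useWebAssemblyExcel: true,
      useWebAssemblyPpt: true,
      useWebAssemblyPdf: true,
      serverless: {
        useServerless: true,
        baseUrl: 'http://localhost:3000',
        coAuthorUrl: 'http://localhost:8000'
      },
      embedded: { saveUrl: '', toolbarDocked: 'top' },
      // Caller-supplied values win over the defaults above (shallow merge)
      ...overrides
    }
  };
}

const config = buildEditorConfig(
  'http://localhost:3000/public/samples/sample.docx',
  'docx',
  { viewLanguage: 'zh' }
);
console.log(config.fileAttrs.fileInfo.name); // "sample.docx"
```

Event handlers such as onSave can be passed through overrides in the same way, so each call site only states what differs from the defaults.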

Load and Switch Documents Dynamically

In real applications, users need to open different files dynamically. You can achieve this by reinitializing the editor with new configurations:

let editorInstance = null;

function openDocument(sourceUrl, ext) {
  const fileName = sourceUrl.split('/').pop();
  
  if (editorInstance) {
    editorInstance.destroy();
  }
  
  const container = document.getElementById("editor");
  container.innerHTML = "";
  
  const config = {
    user: {
      id: 'user1',
      name: 'Demo User'
    },
    fileAttrs: {
      sourceUrl: sourceUrl,
      fileInfo: {
        ext: ext,
        name: fileName
      }
    },
    editorAttrs: {
      editorType: getEditorType(ext),
      editorMode: "edit",
      editorWidth: "100%",
      editorHeight: "100%",
      platform: "desktop",
      viewLanguage: "en",
      canEdit: true,
      canDownload: true,
      canForcesave: true,
      useWebAssemblyDoc: true,
      useWebAssemblyExcel: true,
      useWebAssemblyPpt: true,
      useWebAssemblyPdf: true,
      serverless: {
        useServerless: true,
        baseUrl: "http://localhost:3000",
        coAuthorUrl:"http://localhost:8000" //Collaborative Editing Service Address
      },
      embedded: {
        saveUrl: "",
        toolbarDocked: 'top'
      },
      events: {
        onSave: function(data) {
          if (data && data.data && data.data.length >= 2) {
            downloadFile(data.data[1], data.data[0]);
          }
        }
      }
    }
  };

  editorInstance = new SpireCloudEditor.OpenApi("editor", config);
}

function getEditorType(ext) {
  const extLower = ext.toLowerCase();
  switch (extLower) {
    case 'docx':
    case 'doc':
    case 'rtf':
    case 'txt':
    case 'odt':
      return 'document';
    case 'xlsx':
    case 'xls':
    case 'csv':
    case 'ods':
      return 'spreadsheet';
    case 'pptx':
    case 'ppt':
    case 'odp':
      return 'presentation';
    default:
      return 'document';
  }
}

How It Works

Clicking a file in the sidebar triggers openDocument with the file URL and extension
The previous editor instance is destroyed and the container is cleared
The editor reloads with the selected document
No page refresh is required, maintaining application state

This pattern is essential for building interactive document management systems.
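One detail worth checking in your own system: sourceUrl.split('/').pop() keeps any query string, so a URL such as sample.docx?version=2 would produce a file name that includes "?version=2". A minimal sketch of a more tolerant parser, using the standard URL class, is shown below. fileInfoFromUrl is a hypothetical helper name, not part of Spire.OfficeJS.

```javascript
// Hypothetical helper: derives the file name and extension from a document URL,
// ignoring any query string or fragment.
function fileInfoFromUrl(sourceUrl) {
  const { pathname } = new URL(sourceUrl);
  let name = pathname.split('/').pop();
  try {
    // Turn percent-encoded characters (for example %20) back into text
    name = decodeURIComponent(name);
  } catch {
    // Keep the raw segment if it is not valid percent-encoding
  }
  const dot = name.lastIndexOf('.');
  const ext = dot > 0 ? name.slice(dot + 1).toLowerCase() : '';
  return { name, ext };
}

const info = fileInfoFromUrl('http://localhost:3000/public/samples/sample.docx?version=2');
console.log(info.name, info.ext); // "sample.docx docx"
```

With a helper like this, the sidebar could pass only the URL and let the code derive the extension, instead of repeating it in every onclick attribute.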

Best Practices for Document Switching

When switching between documents dynamically, proper cleanup prevents overlapping editors and stale UI state.

Error Handling and Loading States

Always use try-catch for error handling and consider adding loading indicators:

let editorInstance = null;

async function openDocument(sourceUrl, ext) {
  try {
    if (editorInstance) {
      editorInstance.destroy();
    }
    
    const container = document.getElementById("editor");
    container.innerHTML = "";
    
    const config = { /* ... configuration ... */ };
    editorInstance = new SpireCloudEditor.OpenApi("editor", config);
  } catch (error) {
    console.error('Failed to load document:', error);
  }
}

Key points:

Always destroy old instances before creating new ones
Clear the container element to prevent UI conflicts
Use try-catch for robust error handling
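The loading indicator mentioned above can be sketched as a small state tracker. This is an illustration under assumptions: createLoadingState and its render callback are hypothetical names, and the sketch assumes that onDocumentReady (shown in the initialization example) fires once the document has finished loading.

```javascript
// Hypothetical helper: tracks a loading flag and reports every change
// through the `render` callback supplied by the caller.
function createLoadingState(render) {
  let loading = false;
  const update = (value, message) => {
    loading = value;
    render({ loading, message });
  };
  return {
    // Call this right before creating the editor instance
    begin: () => update(true, 'Loading document...'),
    isLoading: () => loading,
    // Merge these handlers into editorAttrs.events
    events: {
      onDocumentReady: () => update(false, ''),
      onError: (event) => {
        console.error('Editor error:', event);
        update(false, 'Failed to load document');
      }
    }
  };
}

// Usage sketch: in a browser, `render` would show or hide a spinner element.
const states = [];
const loadingState = createLoadingState((state) => states.push(state));
loadingState.begin();
loadingState.events.onDocumentReady();
console.log(states.length, loadingState.isLoading()); // 2 false
```

Keeping the state outside the editor configuration means the same tracker can be reused each time openDocument replaces the editor instance.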

Customize Editor Behavior

You can fine-tune the editor's behavior using configuration options in editorAttrs.

Read-Only Mode

Set the editor to view-only mode:

editorAttrs: {
  editorMode: "view",
  isReadOnly: true
}

Control User Permissions

Restrict specific actions:

editorAttrs: {
  canEdit: false,
  canDownload: false,
  canComment: true,
  canPrint: true
}
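In applications with several user types, these flags are often derived from the user's role. The following is a minimal sketch: the role names (viewer, reviewer, editor) and the permissionsForRole helper are hypothetical, and only the four flags shown above are used.

```javascript
// Hypothetical role table: the role names are illustrative, the flag names
// are the ones used in the snippet above.
const ROLE_PERMISSIONS = {
  viewer:   { canEdit: false, canDownload: false, canComment: false, canPrint: true },
  reviewer: { canEdit: false, canDownload: false, canComment: true,  canPrint: true },
  editor:   { canEdit: true,  canDownload: true,  canComment: true,  canPrint: true }
};

function permissionsForRole(role) {
  // Unknown roles fall back to the most restrictive set
  const known = Object.prototype.hasOwnProperty.call(ROLE_PERMISSIONS, role);
  return { ...ROLE_PERMISSIONS[known ? role : 'viewer'] };
}

const reviewerAttrs = { viewLanguage: 'en', ...permissionsForRole('reviewer') };
console.log(reviewerAttrs.canEdit, reviewerAttrs.canComment); // false true
```

These flags are set in the browser, so they should be treated as UI settings; access to the underlying files still needs to be enforced wherever the documents are served.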

Change UI Language

Support internationalization by setting the interface language:

editorAttrs: {
  viewLanguage: "zh"
}

Supported languages include English ("en") and Chinese ("zh").
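To follow the user's browser setting, the language code can be derived from the browser locale with a fallback. This is a minimal sketch: pickViewLanguage is a hypothetical helper, and the supported list contains only the two codes named in this article.

```javascript
// The two language codes named in this article
const SUPPORTED_LANGUAGES = ['en', 'zh'];

// Hypothetical helper: maps a browser locale such as "zh-CN" to a supported
// viewLanguage value, falling back to English.
function pickViewLanguage(browserLanguage, fallback = 'en') {
  const primary = String(browserLanguage || '').toLowerCase().split('-')[0];
  return SUPPORTED_LANGUAGES.includes(primary) ? primary : fallback;
}

console.log(pickViewLanguage('zh-CN')); // "zh"
console.log(pickViewLanguage('fr-FR')); // "en"
```

In a browser, this could be used as viewLanguage: pickViewLanguage(navigator.language).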

Configure Save Functionality

In serverless mode, saving is handled through the onSave event callback:

editorAttrs: {
  embedded: {
    saveUrl: "",  // Keep empty in serverless mode
    toolbarDocked: 'top'
  },
  events: {
    onSave: function(data) {
      console.log('Document saved', data);
      if (data && data.data && data.data.length >= 2) {
        // data.data[0] = filename, data.data[1] = file blob
        downloadFile(data.data[1], data.data[0]);
      }
    }
  }
}

function downloadFile(file, fileName) {
  const a = document.createElement('a');
  const url = URL.createObjectURL(file);
  a.href = url;
  a.download = fileName;
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  URL.revokeObjectURL(url);
}

When users click save, the document is automatically downloaded to their local machine.

Dynamic Protocol Configuration

To support both HTTP and HTTPS environments, use dynamic protocol detection:

const currentHost = window.location.hostname;
const currentProtocol = window.location.protocol;

const baseUrl = `${currentProtocol}//${currentHost}:3000`;
const exampleBaseUrl = `${currentProtocol}//${currentHost}:3000`;
const coAuthorUrl = `${currentProtocol}//${currentHost}:8000`;

This prevents mixed content errors when the page is served over HTTPS.
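The same idea can be wrapped in a function so that it is easy to test outside a browser. This is a minimal sketch: buildServiceUrls is a hypothetical helper, and the default ports are the ones used in this article's examples.

```javascript
// Hypothetical helper: derives the service URLs from a location-like object
// ({ protocol, hostname }), so it can run in tests as well as in the browser.
function buildServiceUrls(location, ports = { editor: 3000, coAuthor: 8000 }) {
  const { protocol, hostname } = location;
  return {
    baseUrl: `${protocol}//${hostname}:${ports.editor}`,
    coAuthorUrl: `${protocol}//${hostname}:${ports.coAuthor}`
  };
}

const urls = buildServiceUrls({ protocol: 'https:', hostname: 'docs.example.com' });
console.log(urls.baseUrl);     // "https://docs.example.com:3000"
console.log(urls.coAuthorUrl); // "https://docs.example.com:8000"
```

In a page, pass window.location as the first argument. Matching the protocol only avoids mixed content warnings if the services on those ports are actually reachable over HTTPS, for example behind a TLS-terminating proxy.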

Upload Local Files

Users can upload local documents for editing:

<input type="file" id="fileInput" accept=".docx,.xlsx,.pptx,.doc,.xls,.ppt" 
       onchange="handleFileUpload(event)">

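async function handleFileUpload(event) {
  const file = event.target.files[0];
  if (!file) return; // The user cancelled the file dialog
  const fileName = file.name;
  const ext = fileName.split('.').pop().toLowerCase();

  // Read the file into an ArrayBuffer; reject so read failures are not silently dropped
  const fileData = await new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = (e) => resolve(e.target.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });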
  
  const config = {
    user: {
      id: 'user1',
      name: 'Demo User'
    },
    fileAttrs: {
      sourceUrl: 'upload://' + fileName,
      fileInfo: { ext, name: fileName }
    },
    editorAttrs: {
      editorType: getEditorType(ext),
      serverless: {
        useServerless: true,
        baseUrl: baseUrl,
        coAuthorUrl: coAuthorUrl, //Collaborative Editing Service Address
        fileData: fileData  // Pass file data directly
      }
    }
  };
  
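  // Destroy any previous editor before mounting the uploaded file
  if (editorInstance) {
    editorInstance.destroy();
  }
  editorInstance = new SpireCloudEditor.OpenApi("editor", config);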
}
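The accept attribute only filters what the file picker offers; users can usually still choose another file type. A small check before building the configuration avoids handing an unsupported file to the editor. This is a minimal sketch: getUploadExtension is a hypothetical helper, and the list mirrors the accept attribute above.

```javascript
// Mirrors the accept attribute of the file input above
const ACCEPTED_EXTENSIONS = ['docx', 'xlsx', 'pptx', 'doc', 'xls', 'ppt'];

// Hypothetical helper: returns the lowercased extension if the file is one of
// the accepted types, or null otherwise.
function getUploadExtension(fileName) {
  const dot = fileName.lastIndexOf('.');
  // No dot, a leading dot only, or a trailing dot means no usable extension
  if (dot <= 0 || dot === fileName.length - 1) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return ACCEPTED_EXTENSIONS.includes(ext) ? ext : null;
}

console.log(getUploadExtension('Report.DOCX')); // "docx"
console.log(getUploadExtension('notes.txt'));   // null
```

Inside handleFileUpload, a null result would be a natural place to show a message and return early.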

Integrating the Editor into Existing Business Systems

In most real-world scenarios, the online document editor is not the entire application. Instead, it functions as one module within a larger business system.

Typical integration patterns include:

CRM systems with contract editing and proposal generation
ERP systems with invoice review and report modification
Document management systems (DMS) with embedded preview and editing workflows
Customer portals with downloadable and editable forms
Internal collaboration platforms combining document editing with chat, comments, and version control

Because the browser-based office editor is mounted into a standard DOM container, it can coexist seamlessly with:

Sidebars and navigation menus
File trees and folder structures
Tab systems for multi-document editing
Chat panels and comment threads
Dashboards and analytics widgets

This modular architecture allows developers to build rich document-centric applications without sacrificing existing UI patterns or user workflows.

Framework Integration (React, Vue, Angular)

Although the example uses plain JavaScript, the same concept applies to modern frameworks. The key principle remains the same: initialize the editor after the component is mounted and render it into a DOM container.

React

useEffect(() => {
  new SpireCloudEditor.OpenApi("editor-container", config);
}, []);

Vue

mounted() {
  new SpireCloudEditor.OpenApi("editor-container", config);
}

Angular

ngAfterViewInit(): void {
  new SpireCloudEditor.OpenApi("editor-container", config);
}
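The snippets above only create the editor. In a component-based application the instance also needs to be released when the component is removed. The following framework-agnostic sketch returns a cleanup function. mountEditor and FakeEditor are hypothetical names; the editor constructor is passed in as a parameter so the sketch can run outside a browser, and destroy() is the cleanup call used earlier in this article.

```javascript
// Hypothetical helper: mounts an editor and returns a cleanup function.
// In a real page, EditorApi would be SpireCloudEditor.OpenApi.
function mountEditor(EditorApi, containerId, config) {
  const instance = new EditorApi(containerId, config);
  return function unmount() {
    // destroy() is the cleanup call used in the document-switching example
    if (instance && typeof instance.destroy === 'function') {
      instance.destroy();
    }
  };
}

// Stand-in class so the sketch runs without the real editor script
class FakeEditor {
  constructor(containerId, config) {
    this.containerId = containerId;
    this.config = config;
    this.destroyed = false;
    FakeEditor.last = this;
  }
  destroy() {
    this.destroyed = true;
  }
}

const unmount = mountEditor(FakeEditor, 'editor-container', {});
unmount();
console.log(FakeEditor.last.destroyed); // true
```

In React, the returned function can be returned from the useEffect callback; in Vue and Angular it can be called from the component's unmount or destroy hook (for example ngOnDestroy in Angular).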

For complete framework-specific setup and deployment instructions, see the dedicated Spire.OfficeJS integration guides.

Common Integration Issues

Here are common problems developers encounter and their solutions:

Editor Does Not Load

Cause: Backend service is not running or script URL is incorrect
Solution: Verify the service is running on port 3000 and use the correct script path: http://localhost:3000/web/editors/spireapi/SpireCloudEditor.js

Script Loading Failed (CORS Error)

Cause: Opening HTML file directly using file:// protocol
Solution: Start a local HTTP server (python -m http.server 8080 or npx http-server -p 8080) and access via http://localhost:8080/your-file.html

File Fails to Load

Cause: Document URL is inaccessible or blocked by CORS
Solution: Ensure sourceUrl is publicly accessible via HTTP. Replace placeholder URLs like https://example.com/ with real accessible document URLs

404 Errors for /doc/*/c/info Endpoints

Cause: Missing serverless configuration in editorAttrs
Solution: Add serverless and useWebAssembly* settings to your configuration

Multiple Editors Overlapping

Cause: Old editor instance not properly destroyed before creating new one
Solution: Always call editorInstance.destroy() before creating a new instance

Blank Editor Container

Cause: Browser cache issues or missing dependencies
Solution: Clear browser cache, try incognito mode, or check browser console for errors

Service Connection Refused

Cause: Required ports are blocked or service is not started
Solution: Make sure port 3000 is open and the Spire.OfficeJS service is running

Editor Overflows Container

Cause: Incorrect width/height settings
Solution: Set editorWidth and editorHeight to "100%" and ensure the container has defined dimensions

Conclusion

In this article, we demonstrated how to embed a web-based Office document editor into an existing HTML page using Spire.OfficeJS. By treating the editor as a modular component, developers can integrate document editing capabilities directly into their web applications without redirecting users to separate pages.

The approach enables building rich document management interfaces where editors coexist with navigation, file lists, and other UI components. With proper configuration, the embedded editor provides the same powerful features as a full-page solution while maintaining a seamless user experience.

Spire.OfficeJS supports multiple document formats including Word (DOCX), Excel (XLSX), and PowerPoint (PPTX), making it a comprehensive solution for web-based document processing needs.

If you'd like to test Spire.OfficeJS in a real project environment, you can request a free temporary license here: Apply for a Temporary License

FAQ

How do I embed a document editor in a web page?

You can embed a document editor by initializing SpireCloudEditor.OpenApi inside a specific HTML container element with proper configuration for the document source and editor settings.

Does embedding require Microsoft Office installation?

No. Spire.OfficeJS uses WebAssembly for browser-side document processing while relying on the backend service to provide the editor interface and related resources. No Microsoft Office installation is required on client machines.

Can I integrate the editor into React or Vue applications?

Yes. The editor can be integrated into any JavaScript framework by mounting it into a DOM element during the component's lifecycle, such as useEffect in React or mounted in Vue.

What document formats are supported?

Spire.OfficeJS supports Word documents (DOCX, DOC), Excel spreadsheets (XLSX, XLS), and PowerPoint presentations (PPTX, PPT), as well as PDF viewing.

How do I handle document save operations?

In serverless mode, configure the onSave event callback in editorAttrs.events. When users save, the callback receives the file data which can be automatically downloaded or processed further.

Published in Operation

Tagged under

officejs opertion

Thursday, 30 April 2026 02:26

How to Convert PowerPoint to Video in C# (MP4 & WMV)

Tutorial on How to Convert PowerPoint to Video in C#

PowerPoint presentations are widely used for training materials, product demos, online courses, and business reporting. However, sharing raw PPT or PPTX files can be problematic—recipients may not have PowerPoint installed, animations may not play correctly, and manual exporting becomes inefficient for bulk processing.

Converting PowerPoint to video formats like MP4 or WMV solves these challenges by creating universally playable content that preserves formatting and animations. With Spire.Presentation from e-iceblue, developers can automate PowerPoint-to-video conversion programmatically without requiring Microsoft PowerPoint installation.

This article demonstrates how to convert PowerPoint presentations to MP4 and WMV video in C# using Spire.Presentation for .NET, including configuration options for frame rate, slide duration, and transition preservation.

1. Why Convert PowerPoint to Video Programmatically?

Developers often need to convert PowerPoint presentations to video as part of larger business workflows. Compared with manually exporting files in Microsoft PowerPoint, programmatic conversion offers more flexibility and scalability.

Common scenarios include:

Automatically converting uploaded PPT/PPTX files into MP4 videos in web applications
Batch-processing training presentations for LMS platforms
Generating product demo videos from presentation templates
Converting presentations on servers where Microsoft PowerPoint is not installed
Standardizing presentation delivery across different devices

Programmatic conversion is especially useful when you need repeatable workflows, server-side processing, or integration with existing document automation systems.

2. Set Up the Environment

Before converting PowerPoint presentations to video, you need to prepare two components:

Spire.Presentation for .NET – used to load and process PPT/PPTX files
FFmpeg – used to encode slide frames into MP4 or WMV video files

Spire handles presentation rendering, while FFmpeg generates the final video output. Both are required for successful conversion.

Install Spire.Presentation for .NET

Install the library from NuGet:

Install-Package Spire.Presentation

You can also download the Spire.Presentation for .NET package and install it manually.

This package allows your C# application to open PowerPoint presentations, access slides, and export them programmatically.

Install FFmpeg

Spire.Presentation relies on FFmpeg to combine rendered slide frames into a playable video file. If FFmpeg is not installed or the path is configured incorrectly, the export process will fail.

On Windows

Follow these steps to install FFmpeg:

Download the FFmpeg essentials build (FFmpeg Essentials Build for Windows)
Extract the package to your local machine
Locate the bin folder path

Example:

D:\tools\ffmpeg\bin

This path will be used later when configuring SaveToVideoOption.

On Linux (CentOS)

Install FFmpeg using the following commands:

sudo yum install epel-release
sudo yum localinstall --nogpgcheck https://download1.rpmfusion.org/free/el/rpmfusion-free-release-7.noarch.rpm
sudo yum install ffmpeg ffmpeg-devel

After installation, you can run the following command to locate the FFmpeg path:

which ffmpeg

Note: Older FFmpeg versions may not fully support certain slide transition effects.

3. Convert PowerPoint to MP4 in C#

Once the environment is configured, you can convert PowerPoint presentations to MP4 using just a few lines of code.

The basic workflow includes:

Load the PowerPoint file
Configure video export settings
Export the presentation as MP4

Basic Conversion Example

The following example converts a PPTX file into an MP4 video:

using Spire.Presentation;

namespace PowerPointToVideo
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = "ProductDemo.pptx";
            string outputFile = "ProductDemo.mp4";

            Presentation presentation = new Presentation();
            presentation.LoadFromFile(inputFile);

            presentation.SaveToVideoOption = new SaveToVideoOption(
                @"D:\tools\ffmpeg\bin"
            );

            presentation.SaveToVideoOption.Fps = 30;
            presentation.SaveToVideoOption.DurationForEachSlide = 2;

            presentation.SaveToFile(outputFile, FileFormat.MP4);

            presentation.Dispose();
        }
    }
}

After running the code:

The PPTX file is loaded into memory
Each slide is rendered as individual video frames
FFmpeg combines the frames into a final MP4 file
Supported animations, transitions, and embedded videos are preserved during export

Below is a sample PowerPoint presentation along with its converted video output.

Input: PowerPoint Presentation

PowerPoint Presentation for PPTX to MP4 Video Conversion

Output: Converted MP4 Video

Click the preview above to watch how PowerPoint slides are converted into an MP4 video while preserving transitions and animations.

How the Core API Works

This example uses several key API methods:

LoadFromFile() loads the PowerPoint presentation into memory
SaveToVideoOption configures the FFmpeg path and playback settings
Fps controls video smoothness
DurationForEachSlide controls how long each slide appears
SaveToFile() exports the final video file
Dispose() releases system resources after conversion

This basic workflow is enough for most standard PowerPoint-to-video conversion tasks. If you need additional formats or customization options, continue to the advanced scenarios below.

If you need a static sharing format, you can also convert PowerPoint presentations to images (JPG/PNG) in C# for easier distribution and web display.

4. More PowerPoint to Video Options in C#

The basic example works for most scenarios, but some applications may require different output formats, custom playback settings, or bulk conversion workflows.

Convert PowerPoint to WMV

While MP4 is the most widely used video format, some legacy enterprise systems and Windows-based environments may still require WMV output.

To export a PowerPoint file as WMV, use a .wmv output file name and pass FileFormat.WMV to SaveToFile():

using Spire.Presentation;

Presentation presentation = new Presentation();
presentation.LoadFromFile("TrainingSlides.pptx");

presentation.SaveToVideoOption = new SaveToVideoOption(
    @"D:\tools\ffmpeg\bin"
);

presentation.SaveToFile("TrainingVideo.wmv", FileFormat.WMV);

presentation.Dispose();

Customize Video Settings

If your presentation contains complex animations or requires specific playback timing, you can adjust frame rate and slide duration settings.

using Spire.Presentation;

Presentation presentation = new Presentation();
presentation.LoadFromFile("MarketingPitch.pptx");

presentation.SaveToVideoOption = new SaveToVideoOption(
    @"D:\tools\ffmpeg\bin"
);

// Higher FPS for smoother playback
presentation.SaveToVideoOption.Fps = 60;

// Longer display time per slide
presentation.SaveToVideoOption.DurationForEachSlide = 10;

presentation.SaveToFile("MarketingPitch_HD.mp4", FileFormat.MP4);

presentation.Dispose();

Video Settings Reference

Setting	Default	Maximum	Purpose
Fps	30	60	Controls playback smoothness
DurationForEachSlide	5 seconds	5 minutes	Controls slide display duration

Higher values may increase processing time and temporary storage usage.

Batch Convert Multiple PPTX Files

Batch conversion is useful for LMS platforms, enterprise reporting systems, and document automation workflows that need to process multiple presentations automatically.

using Spire.Presentation;
using System.IO;

string ffmpegPath = @"D:\tools\ffmpeg\bin";
string inputFolder = @"C:\Presentations\";
string outputFolder = @"C:\Videos\";

// Create the output folder if it does not exist yet
Directory.CreateDirectory(outputFolder);

string[] pptxFiles = Directory.GetFiles(inputFolder, "*.pptx");

foreach (string inputFile in pptxFiles)
{
    string fileName = Path.GetFileNameWithoutExtension(inputFile);
    string outputFile = Path.Combine(outputFolder, fileName + ".mp4");

    Presentation presentation = new Presentation();
    try
    {
        presentation.LoadFromFile(inputFile);

        presentation.SaveToVideoOption = new SaveToVideoOption(ffmpegPath);
        presentation.SaveToVideoOption.Fps = 30;
        presentation.SaveToVideoOption.DurationForEachSlide = 3;

        presentation.SaveToFile(outputFile, FileFormat.MP4);
    }
    finally
    {
        // Release resources even if loading or export fails
        presentation.Dispose();
    }
}

This approach helps automate large-scale PowerPoint-to-video conversion workflows without requiring manual exports in Microsoft PowerPoint.

You can edit the PowerPoint presentation in C# before conversion to ensure the resulting video has better layout and animation effects.

5. Supported Transitions and Animations

During PowerPoint-to-video conversion, Spire.Presentation preserves key visual effects to ensure the output video closely matches the original presentation experience.

Slide Transitions

PowerPoint slide transitions are rendered during video generation to maintain smooth visual flow between slides.

The following transitions are supported:

Fade
Push
Wipe (up, down, left, right)
Reveal
Cover
Split
Dissolve
Clockwise Clock

These transitions are applied during frame rendering to simulate natural slide progression in the final video.

Animation Effects

Animations are processed and rendered during video generation to simulate PowerPoint playback behavior.

Entrance Animations:

Fly In
Float In
Appear
Fade
Split
Wipe

Exit Animations:

Fly Out
Float Out
Disappear
Fade
Split
Wipe

Animation sequences are processed as a single playback unit to ensure consistent rendering in the final video.

Additional Features

Embedded Videos

Embedded media inside PowerPoint slides is included in the exported video, making it suitable for presentations with multimedia content.

Automatic Duration Handling

Slide timing and animation durations are automatically interpreted during conversion to ensure accurate playback in the final video output.

Cross-Platform Support

The conversion process can run on both Windows and Linux environments, making it suitable for server-side automation and enterprise workflows.

For more information on supported features, refer to the Spire.Presentation for .NET API documentation.

6. Common Pitfalls

When converting PowerPoint presentations to video, there are a few common issues that may affect output quality or runtime execution. Being aware of these helps ensure a smoother conversion process in production environments.

FFmpeg Path Not Found

The video export process depends on FFmpeg for encoding the final MP4 or WMV file.

Ensure that the FFmpeg path is correctly configured and points to the bin directory containing the FFmpeg executable.

On Windows, this typically looks like:

D:\tools\ffmpeg\bin

If the FFmpeg path is incorrect or not accessible, the video export process will fail at runtime.

Insufficient Disk Space

PowerPoint-to-video conversion involves rendering slides into intermediate frames before encoding them into a final video file.

As a result, disk usage may increase significantly depending on:

Number of slides
Slide duration
Frame rate (FPS)
Presentation resolution and content complexity

For high-quality or long-duration presentations, temporary disk usage can become substantial. It is recommended to ensure sufficient free disk space before processing large batch conversions.

Unsupported or Inconsistent Transitions

Most common PowerPoint transitions are supported during conversion. However, some complex or advanced transition effects may not be rendered exactly the same as in Microsoft PowerPoint.

In such cases, the final video will still preserve slide flow, but the visual effect may appear simplified compared to the original presentation.

It is recommended to test presentations with advanced transitions before using them in production workflows.

Font Rendering Differences

PowerPoint presentations rely on system-installed fonts. If a required font is missing on the environment where conversion is executed, the layout or text appearance in the final video may change.

To ensure consistent rendering:

Install required fonts on the system
Use widely available standard fonts when possible
Verify output on target deployment environments

This is especially important for multilingual presentations or server-side conversion scenarios.

Conclusion

In this article, we demonstrated how to convert PowerPoint presentations to MP4 and WMV video in C# using Spire.Presentation. By leveraging the Spire API, developers can automate video generation with customizable frame rates, slide durations, and transition preservation.

Beyond video conversion, Spire.Presentation can also be used for tasks such as slide editing, media extraction, and presentation generation, making it useful for broader document automation workflows.

If you would like to evaluate the full functionality without limitations, you can apply for a temporary license.

FAQ

Can I convert PowerPoint to MP4 without Microsoft PowerPoint?

Yes. Spire.Presentation performs conversion independently and does not require Microsoft PowerPoint installation.

Are animations preserved in the video?

Yes, many common slide transitions and entrance/exit animations are preserved during conversion.

What video formats are supported?

Currently, MP4 and WMV formats are supported for video export.

Is Spire.Presentation suitable for server-side applications?

Yes. Spire.Presentation supports server environments and is widely used in automated document processing workflows.

How much disk space does video conversion require?

Video generation creates temporary image frames. A presentation with 5 slides at 60 FPS and a 5-minute duration may require approximately 25 GB of temporary storage.
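
To see where a figure of that size can come from, the arithmetic can be sketched as follows. This is a rough back-of-the-envelope calculation, not a description of how Spire.Presentation stores frames: it assumes the 5 minutes is the total video length, that 25 GB means 25 × 10^9 bytes, and that every frame is written as one temporary image. The helper `frame_count` is hypothetical.

```python
def frame_count(fps: float, duration_seconds: float) -> int:
    """Number of frames in a video of the given length."""
    return round(fps * duration_seconds)


# Assumption: "5-minute duration" means the whole video lasts 5 minutes.
frames = frame_count(fps=60, duration_seconds=5 * 60)

# Assumption: 25 GB is decimal (25 * 10**9 bytes) and each frame is one image.
per_frame_mb = 25 * 10**9 / frames / 10**6

print(f"{frames} frames, roughly {per_frame_mb:.2f} MB per frame")
```

Under those assumptions the video has 18,000 frames, so the quoted total works out to about 1.39 MB per frame. Halving the frame rate halves the frame count and, with it, the estimate.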




