Spire.Office Knowledgebase Page 15 | E-iceblue

Visual guide for converting Text File to Excel through Python

Text files (.txt) are a common way to store data due to their simplicity, but they lack the structure and analytical power of Excel spreadsheets. Converting TXT files to Excel allows for better data organization, visualization, and manipulation.

While manually importing a text file into Excel works for small datasets, automating the process saves time and reduces errors. Python, with its powerful libraries, offers an efficient solution. In this guide, you’ll learn how to convert TXT to Excel in Python using Spire.XLS for Python, a robust API for Excel file manipulation.

Prerequisites

Install Python and Spire.XLS

  • Install Python on your machine from python.org.
  • Install the Spire.XLS for Python library via PyPI. Open your terminal and run the following command:
pip install Spire.XLS

Prepare a TXT File

Ensure your TXT file follows a consistent structure, typically with rows separated by newlines and columns separated by a delimiter (e.g., commas, tabs, or spaces). The example in this guide uses a tab-delimited file named Data.txt.
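If you do not have a suitable file at hand, the following minimal sketch writes a small tab-delimited file you can use to follow along. The column names, the values, and the helper name write_sample_txt are made up for illustration and are not part of Spire.XLS.

```python
from pathlib import Path

# Hypothetical sample rows; the header and values are invented for illustration.
SAMPLE_ROWS = [
    ["Name", "Department", "Salary"],
    ["Alice", "Sales", "5200"],
    ["Bob", "Engineering", "6100"],
]


def write_sample_txt(path="Data.txt", rows=SAMPLE_ROWS, delimiter="\t"):
    """Write rows to a delimited text file, one row per line, and return its path."""
    text = "\n".join(delimiter.join(row) for row in rows) + "\n"
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target
```

Calling write_sample_txt() creates Data.txt in the current working directory, which is the file name the steps below open.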

Step-by-Step Guide to Convert Text File to Excel

Step 1: Import Required Modules

In your Python script, import the necessary classes from Spire.XLS:

from spire.xls import *
from spire.xls.common import *

Step 2: Read and Parse the TXT File

Read the text file and split it into rows and columns using Python’s built-in functions. Define your delimiter (tab, in this case):

with open("Data.txt", "r") as file:
    lines = file.readlines()
# Strip only the line ending so that empty leading or trailing cells are kept
data = [line.rstrip("\r\n").split("\t") for line in lines]

Note: If your file uses a different delimiter, replace the "\t" argument of the split() method accordingly (e.g., split(" ") for single spaces).
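A plain split() breaks down when a field contains the delimiter inside quotes (for example "Smith, John" in a comma-delimited file). As an alternative sketch, Python's built-in csv module can do the parsing. The helper name read_delimited is hypothetical, and the default UTF-8 encoding is an assumption about your file.

```python
import csv


def read_delimited(path, delimiter="\t", encoding="utf-8"):
    """Parse a delimited text file into a list of rows, skipping empty lines."""
    with open(path, "r", newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter=delimiter)
        # csv.reader yields an empty list for an empty line; drop those
        return [row for row in reader if row]
```

The result has the same shape as the data list built above (a list of lists of strings), so the remaining steps work unchanged.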

Step 3: Create an Excel Workbook

Initialize a workbook object and access the first worksheet:

workbook = Workbook()
sheet = workbook.Worksheets[0]

Step 4: Write Data to the Worksheet

Loop through the parsed data and populate the Excel cells.

for row_num, row_data in enumerate(data):
    for col_num, cell_data in enumerate(row_data):
        sheet.Range[row_num + 1, col_num + 1].Value = cell_data
        # Bold the header cells once, while writing the first row
        if row_num == 0:
            sheet.Range[1, col_num + 1].Style.Font.IsBold = True
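Every value parsed from a text file is a string. If numeric columns should be usable in formulas or charts, it may help to decide per cell whether the text is a plain number before writing it. The helper below is a pure-Python sketch with the hypothetical name coerce_cell. It deliberately ignores thousands separators, currency symbols, and dates, and it would turn an identifier such as "007" into 7, so apply it only to columns you know are numeric.

```python
import re

# Plain integers and plain decimals only; anything else stays text
_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")


def coerce_cell(text):
    """Return an int or float for plain numeric text, otherwise the stripped string."""
    cleaned = text.strip()
    if _INT.fullmatch(cleaned):
        return int(cleaned)
    if _FLOAT.fullmatch(cleaned):
        return float(cleaned)
    return cleaned
```

Inside the loop you could then branch on isinstance(value, (int, float)) and write numbers through the cell's numeric property (NumberValue in Spire.XLS, to the best of our knowledge; verify the name against the library documentation) while keeping .Value for text.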

Step 5: Save the Excel File

Export the workbook as an XLSX file (you can also use .xls for older formats):

workbook.SaveToFile("TXTtoExcel.xlsx", ExcelVersion.Version2016)

TXT to Excel Full Code Example

from spire.xls import *
from spire.xls.common import *

# Read TXT data 
with open("Data.txt", "r") as file:
    lines = file.readlines()

# Split data by delimiter (strip only the line ending so empty cells are kept)
data = [line.rstrip("\r\n").split("\t") for line in lines]

# Create an Excel workbook
workbook = Workbook()

# Get the first worksheet
sheet = workbook.Worksheets[0]

# Iterate through each row and column in the list 
for row_num, row_data in enumerate(data):
    for col_num, cell_data in enumerate(row_data):

        # Write the data into the corresponding Excel cells
        sheet.Range[row_num + 1, col_num + 1].Value = cell_data

        # Set the header row to bold (only while writing the first row)
        if row_num == 0:
            sheet.Range[1, col_num + 1].Style.Font.IsBold = True

# Autofit column width
sheet.AllocatedRange.AutoFitColumns()

# Save as Excel (.xlsx or .xls) file
workbook.SaveToFile("TXTtoExcel.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

The Excel workbook converted from a text file:

Import a Txt file to an Excel file.

Conclusion

Converting TXT files to Excel in Python using Spire.XLS automates data workflows, saving time and reducing manual effort. Whether you’re processing logs, survey results, or financial records, this method ensures structured, formatted outputs ready for analysis.

Pro Tip: Explore Spire.XLS’s advanced features—such as charts, pivot tables, and encryption—to further enhance your Excel files.

FAQs

Q1: Can Spire.XLS convert large TXT files?

Yes, the Python Excel library is optimized for performance and can process large files efficiently. However, ensure your system has sufficient memory for very large datasets (e.g., millions of rows). For optimal results, process data in chunks or use batch operations.
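One way to process data in chunks is to read the text file lazily instead of calling readlines() on the whole file. The generator below is a minimal sketch; the name iter_row_chunks, the default chunk size, and the UTF-8 default are assumptions for illustration.

```python
from itertools import islice


def iter_row_chunks(path, chunk_size=10000, delimiter="\t", encoding="utf-8"):
    """Yield lists of up to chunk_size parsed rows without loading the whole file."""
    with open(path, "r", encoding=encoding) as f:
        while True:
            # islice pulls at most chunk_size lines from the open file
            chunk = [
                line.rstrip("\r\n").split(delimiter)
                for line in islice(f, chunk_size)
            ]
            if not chunk:
                break
            yield chunk
```

When writing each chunk to the worksheet, keep a running row offset so that a chunk continues where the previous one ended. Keep in mind that a single Excel worksheet holds at most 1,048,576 rows, so larger inputs need to be split across sheets or files.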

Q2: Can I convert Excel back to TXT using Spire.XLS?

Yes, Spire.XLS lets you read Excel cells and write their values to a text file. A comprehensive guide is available at: Convert Excel to TXT in Python

Q3: How do I handle encoding issues during conversion?

Specify the encoding explicitly (e.g., utf-8) when opening the text file if it contains non-ASCII characters:

with open("Data.txt", "r", encoding='utf-8') as file:
    lines = file.readlines()
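If you do not know the encoding in advance, one pragmatic approach is to try a short list of candidates in order. The sketch below uses the hypothetical helper name read_lines_with_fallback. The candidate list (UTF-8 with an optional byte order mark, then Windows-1252) is an assumption that suits many Western European files and should be adjusted to your data.

```python
def read_lines_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Try each encoding in order and return (lines, encoding_used)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.readlines(), encoding
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate
            last_error = exc
    raise last_error
```

Using "utf-8-sig" rather than "utf-8" also removes a leading byte order mark, which would otherwise end up in the first header cell.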

Get a Free License

To fully experience the capabilities of Spire.XLS for Python without any evaluation limitations, you can request a free 30-day trial license.

Want to count the frequency of words in a Word document? Whether you're analyzing content, generating reports, or building a document tool, Python makes it easy to find how often a specific word appears—across the entire document, within specific sections, or even in individual paragraphs. In this guide, you’ll learn how to use Python to count word occurrences accurately and efficiently, helping you extract meaningful insights from your Word files without manual effort.

Count Frequency of Words in Word with Python

In this tutorial, we’ll use Spire.Doc for Python, a powerful and easy-to-use library for Word document processing. It supports a wide range of features like reading, editing, and analyzing DOCX files programmatically—without requiring Microsoft Office.

You can install it via pip:

pip install spire.doc

Let’s see how it works in practice, starting with counting word frequency in an entire Word document.

How to Count Frequency of Words in an Entire Word Document

Let’s start by learning how to count how many times a specific word or phrase appears in an entire Word document. This is a common task—imagine you need to check how often the word "contract" appears in a 50-page file.
With the FindAllString() method from Spire.Doc for Python, you can quickly search through the entire document and get an exact count in just a few lines of code—saving you both time and effort.

Steps to count the frequency of a word in the entire Word document:

  • Create an object of Document class and read a source Word document.
  • Specify the keyword to find.
  • Find all occurrences of the keyword in the document using Document.FindAllString() method.
  • Count the number of matches and print it out.

The following code shows how to count the frequency of the keyword "AI-Generated Art" in the entire Word document:

from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word document
document.LoadFromFile("E:/Administrator/Python1/input/AI-Generated Art.docx")

# Customize the keyword to find
keyword = "AI-Generated Art"

# Find all matches (False: ignore case; True: match whole words only)
textSelections = document.FindAllString(keyword, False, True)

# Count the number of matches
count = len(textSelections)

# Print the result
print(f'"{keyword}" appears {count} times in the entire document.')

# Close the document
document.Close()

Count Frequency of Word in the Entire Document with Python
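The two boolean arguments passed to FindAllString() are understood here to control case sensitivity and whole-word matching, in that order. To make the effect of each switch concrete, the following pure-Python sketch reproduces the idea on a plain string with regular expressions. It is an illustration only: the function name count_matches is hypothetical, and Spire.Doc's own word-boundary rules may differ in edge cases.

```python
import re


def count_matches(text, keyword, case_sensitive=False, whole_word=True):
    """Count occurrences of keyword in text under the two matching options."""
    pattern = re.escape(keyword)
    if whole_word:
        # Reject matches glued to a neighbouring letter, digit, or underscore
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(pattern, text, flags))
```

For the string "AI-generated art is art. Artful AI-Generated Art!", the keyword "art" gives 3 with the defaults, 4 with whole_word=False (because "Artful" now counts), and 2 with case_sensitive=True.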

How to Count Word Frequency by Section in a Word Document Using Python

A Word document is typically divided into multiple sections, each containing its own paragraphs, tables, and other elements. Sometimes, instead of counting a word's frequency across the entire document, you may want to know how often it appears in each section. To achieve this, we’ll loop through all the document sections and search for the target word within each one. Let’s see how to count word frequency by section using Python.

Steps to count the frequency of a word by section in Word documents:

  • Create a Document object and load the Word file.
  • Define the target keyword to search.
  • Loop through all sections in the document. Within each section, loop through all paragraphs.
  • Use regular expressions to count keyword occurrences.
  • Accumulate and print the count for each section and the total count.

This code demonstrates how to count how many times "AI-Generated Art" appears in each section of a Word document:

import re
from spire.doc import *
from spire.doc.common import *

# Create a Document object and load a Word file
document = Document()
document.LoadFromFile("E:/Administrator/Python1/input/AI.docx")

# Specify the keyword
keyword = "AI-Generated Art"

# The total count of the keyword
total_count = 0

# Get all sections
sections = document.Sections

# Loop through each section
for i in range(sections.Count):
    section = sections.get_Item(i)
    paragraphs = section.Paragraphs

    section_count = 0  
    print(f"\n=== Section {i + 1} ===")

    # Loop through each paragraph in the section
    for j in range(paragraphs.Count):
        paragraph = paragraphs.get_Item(j)
        text = paragraph.Text

        # Find all matches using regular expressions
        count = len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))
        section_count += count
        total_count += count

    print(f'Total in Section {i + 1}: {section_count} time(s)')

print(f'\n=== Total occurrences in all sections: {total_count} ===')

# Close the document
document.Close()

How to Count Word Frequency by Sections in a Word File

How to Count Word Frequency by Paragraph in a Word Document

When it comes to tasks like sensitive word detection or content auditing, it's crucial to perform a more granular analysis of word frequency. In this section, you’ll learn how to count word frequency by paragraph in a Word document, which gives you deeper insight into how specific terms are distributed across your content. Let’s walk through the steps and see a code example in action.

Steps to count the frequency of words by paragraph in Word files:

  • Instantiate a Document object and load a Word document from a file.
  • Specify the keyword to search for.
  • Loop through each section and each paragraph in the document.
  • Find and count the occurrence of the keyword using regular expressions.
  • Print out the count for each paragraph where the keyword appears and the total number of occurrences.

Use the following code to calculate the frequency of "AI-Generated Art" by paragraphs in a Word document:

import re
from spire.doc import *
from spire.doc.common import *

# Create a Document object
document = Document()

# Load a Word document
document.LoadFromFile("E:/Administrator/Python1/input/AI.docx")

# Customize the keyword to find
keyword = "AI-Generated Art"

# Initialize variables
total_count = 0
paragraph_index = 1

# Loop through sections and paragraphs
sections = document.Sections
for i in range(sections.Count):
    section = sections.get_Item(i)
    paragraphs = section.Paragraphs
    for j in range(paragraphs.Count):
        paragraph = paragraphs.get_Item(j)
        text = paragraph.Text

        # Find all occurrences of the keyword while ignoring case
        count = len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))

        # Print the result
        if count > 0:
            print(f'Paragraph {paragraph_index}: "{keyword}" appears {count} time(s)')
            total_count += count
        paragraph_index += 1

# Print the total count
print(f'\nTotal occurrences in all paragraphs: {total_count}')
document.Close()

Count Word Frequency by Paragraphs Using Python
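The examples above count one keyword at a time. If you instead want a ranking of the most frequent words, you can feed the paragraph texts collected in the loop into collections.Counter. The sketch below assumes you have already gathered the paragraph.Text values into a list of strings. The function name top_words, the tokenization rule (letters with optional inner hyphens or apostrophes), and the minimum word length are illustrative choices.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by inner hyphens or apostrophes
_WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def top_words(paragraph_texts, n=10, min_length=3):
    """Return the n most common words as (word, count) pairs, ignoring case."""
    counter = Counter()
    for text in paragraph_texts:
        words = _WORD.findall(text.lower())
        counter.update(w for w in words if len(w) >= min_length)
    return counter.most_common(n)
```

A realistic analysis would usually also filter out common stop words such as "the" and "and", which this sketch does not attempt.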

To Wrap Up

This guide demonstrates how to count the frequency of specific words across an entire Word document, by section, and by paragraph using Python. Whether you're analyzing long reports, filtering sensitive terms, or building smart document tools, automating the task with Spire.Doc for Python can save time and improve accuracy. Give these methods a try in your own projects and take full control of your Word document content.

FAQs about Counting the Frequency of Words

Q1: How to count the number of times a word appears in Word?

A: You can count word frequency in Word manually using the “Find” feature, or automatically using Python and libraries like Spire.Doc. This lets you scan the entire document or target specific sections or paragraphs.

Q2: Can I analyze word frequency across multiple Word files?

A: Yes. By looping over multiple documents in Python, you can apply the same word-count logic to each file and aggregate the results, which is useful for batch processing or document audits.
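A batch run over several files can be sketched as follows. To keep the example self-contained, the text extraction is passed in as a function: extract_paragraphs is a hypothetical callable that you supply, which would load one document and return its paragraph texts (for Word files, by wrapping the Document loading and paragraph loop shown earlier). The name count_in_files is likewise made up for illustration.

```python
import re


def count_in_files(paths, keyword, extract_paragraphs):
    """Return ({path: count}, total) for a case-insensitive keyword search."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    per_file = {}
    for path in paths:
        # extract_paragraphs(path) must return an iterable of strings
        per_file[path] = sum(
            len(pattern.findall(text)) for text in extract_paragraphs(path)
        )
    return per_file, sum(per_file.values())
```

You could build the paths argument with glob.glob("input/*.docx") or a similar pattern that matches your folder layout.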

Comprehensive Guide for Converting PDF to CSV by Extracting Tables Using Python

Working with PDFs that contain tables, reports, or invoice data? Manually copying that information into spreadsheets is slow, error-prone, and just plain frustrating. Fortunately, there's a smarter way: you can convert PDF to CSV in Python automatically — making your data easy to analyze, import, or automate.

In this guide, you’ll learn how to use Python for PDF to CSV conversion by directly extracting tables with Spire.PDF for Python — a pure Python library that doesn’t require any external tools.

✅ No Adobe or third-party tools required

✅ High-accuracy table recognition

✅ Ideal for structured data workflows

Convert PDF to CSV in Python Using Table Extraction

The best way to convert PDF to CSV using Python is by extracting tables directly — no need for intermediate formats like Excel. This method is fast, clean, and highly effective for documents with structured data such as invoices, bank statements, or reports. It gives you usable CSV output with minimal code and high accuracy, making it ideal for automation and data analysis workflows.

Step 1: Install Spire.PDF for Python

Before writing code, make sure to install the required library. You can install Spire.PDF for Python via pip:

pip install spire.pdf

You can also install Free Spire.PDF for Python if you're working on smaller tasks:

pip install spire.pdf.free

Step 2: Python Code — Extract Table from PDF and Save as CSV

from spire.pdf import PdfDocument, PdfTableExtractor
import csv
import os

# Load the PDF document
pdf = PdfDocument()
pdf.LoadFromFile("Sample.pdf")

# Create a table extractor
extractor = PdfTableExtractor(pdf)

# Ensure output directory exists
os.makedirs("output/Tables", exist_ok=True)

# Loop through each page in the PDF
for page_index in range(pdf.Pages.Count):
    # Extract tables on the current page
    tables = extractor.ExtractTable(page_index)
    for table_index, table in enumerate(tables):
        table_data = []

        # Extract all rows and columns
        for row in range(table.GetRowCount()):
            row_data = []
            for col in range(table.GetColumnCount()):
                # Get cleaned cell text
                cell_text = table.GetText(row, col).replace("\n", "").strip()
                row_data.append(cell_text)
            table_data.append(row_data)

        # Write the table to a CSV file
        output_path = os.path.join("output", "Tables", f"Page{page_index + 1}-Table{table_index + 1}.csv")
        with open(output_path, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerows(table_data)

# Release PDF resources
pdf.Dispose()

The conversion result:

The Result of Converting PDF to CSV with Python Using Spire.PDF
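Extracted tables are not always clean: cells may contain line breaks, some rows may be empty, and rows may differ in length. The sketch below tidies a table (a list of rows, as built in table_data above) before it is written. The helper names normalize_table and write_csv are hypothetical. Note that, unlike the example above, which removes line breaks outright, this version replaces them with a single space so that words on separate lines are not glued together.

```python
import csv


def normalize_table(rows):
    """Collapse whitespace in cells, drop empty rows, and pad rows to equal width."""
    # " ".join(cell.split()) turns any run of whitespace, including newlines, into one space
    cleaned = [[" ".join(cell.split()) for cell in row] for row in rows]
    cleaned = [row for row in cleaned if any(row)]
    width = max((len(row) for row in cleaned), default=0)
    return [row + [""] * (width - len(row)) for row in cleaned]


def write_csv(rows, path):
    """Write the normalized rows to a UTF-8 CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(normalize_table(rows))
```

Padding every row to the same width keeps the columns aligned when the CSV is later loaded into a spreadsheet or a data-analysis tool.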

What is PdfTableExtractor?

PdfTableExtractor is a utility class provided by Spire.PDF for Python that detects and extracts table structures from PDF pages. Unlike plain text extraction, it maintains the row-column alignment of tabular data, making it ideal for converting PDF tables to CSV with clean structure.

Best for:

  • PDFs with structured tabular data
  • Automated Python PDF to CSV conversion
  • Fast Python-based data workflows

Related Article: How to Convert PDFs to Excel XLSX Files with Python

Related Use Cases

If your PDF doesn't contain traditional tables — such as when it's formatted as paragraphs, key-value pairs, or scanned as an image — the following approaches can help you convert such PDFs to CSV using Python effectively:

Useful when data is in paragraph or report form — format it into table-like CSV using Python logic.

Perfect for image-based PDFs — use OCR to detect and export tables to CSV.
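As an illustration of the first case, text laid out as key-value pairs can be turned into CSV rows with a little Python logic. The sketch below assumes the page text has already been extracted into a string (the extraction step itself is not shown) and that each pair sits on its own line in the form "Key: Value". The function names and the sample field names in the usage note are made up for illustration.

```python
import csv
import re

# Key is everything before the first colon; value is the rest of the line
_PAIR = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def key_value_lines_to_rows(text):
    """Return [key, value] rows for every 'Key: Value' line with a non-empty value."""
    rows = []
    for line in text.splitlines():
        match = _PAIR.match(line)
        if match and match.group(2):
            rows.append([match.group(1), match.group(2)])
    return rows


def rows_to_csv(rows, path, header=("Field", "Value")):
    """Write the rows to a two-column CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, the text "Invoice No: 1001" followed by "Date: 2024-01-05" on the next line yields the rows ["Invoice No", "1001"] and ["Date", "2024-01-05"]. Lines without a colon are skipped.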

Why Choose Spire.PDF for Python?

Spire.PDF for Python is a robust PDF SDK tailored for developers. Whether you're building automated reports, analytics tools, or ETL pipelines — it just works.

Key Benefits:

  • Accurate Table Recognition

Smartly extracts structured data from tables

  • Pure Python, No Adobe Needed

Lightweight and dependency-free

  • Multi-Format Support

Also supports conversion to text, images, Excel, and more

Frequently Asked Questions

Can I convert PDF to CSV using Python?

Yes, you can convert PDF to CSV in Python using Spire.PDF. It supports both direct table extraction to CSV and an optional workflow that converts PDFs to Excel first. No Adobe Acrobat or third-party tools are required.

What's the best way to extract tables from PDFs in Python?

The most efficient way is using Spire.PDF’s PdfTableExtractor class. It automatically detects tables on each page and lets you export structured data to CSV with just a few lines of Python code — ideal for invoices, reports, and automated processing.

Why would I convert PDF to Excel before CSV?

You might convert PDF to Excel first if the layout is complex or needs manual review. This gives you more control over formatting and cleanup before saving as CSV, but it's slower than direct extraction and not recommended for automation workflows.

Does Spire.PDF work without Adobe Acrobat?

Yes. Spire.PDF for Python is a 100% standalone library that doesn’t rely on Adobe Acrobat or any external software. It's a pure Python solution for converting, extracting, and manipulating PDF content programmatically.

Conclusion

Converting PDF to CSV in Python doesn’t have to be a hassle. With Spire.PDF for Python, you can:

  • Automatically extract structured tables to CSV
  • Build seamless, automated workflows in Python
  • Handle both native PDFs and scanned ones (with OCR)

Get a Free License

Spire.PDF for Python offers a free edition suitable for basic tasks. If you need access to more features, you can also apply for a free license for evaluation use. Simply submit a request, and a license key will be sent to your email after approval.
