Explore the easy solutions to convert HTML to plain text

In today's digital world, HTML (HyperText Markup Language) is the backbone of web content. However, there are countless scenarios where you need clean, unformatted plain text instead. Whether you're extracting data for analysis, simplifying content for an email, or preparing text for SEO analysis, knowing how to convert HTML to text is an essential skill.

This comprehensive guide will walk you through the most effective methods to convert HTML to plain text, from simple online tools for beginners to powerful code libraries for developers.


Key Benefits of HTML to Text Conversion

Stripping away the HTML markup to get clean, readable text serves several crucial purposes:

  • Data Processing and Analysis: For data scientists and analysts, plain text is the starting point for Natural Language Processing (NLP), sentiment analysis, and keyword extraction. HTML tags are just noise for these algorithms.
  • Search Engine Optimization (SEO): When search engines crawl your site, they primarily index the plain text content. Ensuring that your core message is easily extractable from HTML helps with on-page SEO and ranking.
  • Content Repurposing: Plain text is lightweight and versatile, perfect for email newsletters, app notifications, social media previews, or document snippets.
  • Web Scraping Efficiency: Web scrapers are designed to extract specific data. Converting the raw HTML response to text is often the first step in filtering out the essential information from the presentation layer.

Two Easy Methods to Convert HTML to Plain Text

For non-technical users, these user-friendly tools deliver fast results without manual tag removal.

1. Online HTML to Text Converters

For quick, one-off conversions, online tools automate markup stripping with minimal effort.

Top Tools:

CLOUDXDOCS, Convertio, CodeBeautify (all free, no sign-up).

General Steps:

  • Upload your HTML file.
  • Select “Convert” or “Extract Text.”
  • Download the plain-text output (usually as a .txt file).

Example of using CLOUDXDOCS:

HTML to text free online converter

Pros: Fast, require no technical skills, and often preserve basic formatting like line breaks.
Cons: Not suitable for batch processing; privacy concerns with sensitive data.

You may also like: 5 Best Free HTML to Word Converters (Tested & Recommended)

2. Word Processors (Microsoft Word, Google Docs)

Leverage familiar office tools to change HTML to text effectively—no extra software required.

Microsoft Word:

  • Open Word and go to “File > Open
  • Select your HTML file (choose “All Files” from the dropdown to see it).
  • Word will convert the HTML to an editable document.
  • Go to “File > Save As” and select “Plain Text (*.txt)” as the format.

Save an HTML file to a text file using MS Word.

Google Docs:

  • Upload the HTML file to Google Drive.
  • Right-click the file and select “Open with > Google Docs
  • Google Docs will render the HTML as text.
  • Download as “Plain text (.txt)” via “File > Download

Download an HTML file as a plain text file using Google Docs.

Best For: Users who are already comfortable with these applications and need to perform this task infrequently.


Advanced: Convert HTML to Text with Code (For Developers)

If you need to automate conversions (e.g., bulk processing, web scraping), using programming languages like Python or C# is the most powerful approach.

1. Convert HTML to Text in Python

The Spire.Doc for Python library provides the SaveToFile method for saving HTML files as TXT files directly.

  • Install via Pypi:
pip install Spire.Doc
  • Write the Python script:
from spire.doc import *
from spire.doc.common import *

# Load an HTML file
document = Document()
document.LoadFromFile("sample.html", FileFormat.Html, XHTMLValidationType.none)
# Save as a plain text file
document.SaveToFile("HtmlToText.txt", FileFormat.Txt)
document.Close()

If you need to process HTML strings, check this: How to Convert an HTML String to Text in Python

2. Convert HTML to Plain Text in C#

For .NET applications, Spire.Doc for .NET is a robust solution to extract text from HTML.

  • Install via Nuget:
Install-Package Spire.Doc
  • Sample C# code:
using Spire.Doc;
using Spire.Doc.Documents;

namespace HtmlToText
{
    class Program
    {
        static void Main()
        {
            // Create a Document object
            Document doc = new Document();

            // Load an HTML file
            doc.LoadFromFile("sample.html", FileFormat.Html, XHTMLValidationType.None);

            // Convert HTML to plain text
            doc.SaveToFile("HTMLtoText.txt", FileFormat.Txt);
            doc.Dispose();
        }
    }
}

The converted TXT file:

Convert an HTML file to a TXT file programmatically using Python/C#

Best For: Developers, data scientists, and anyone needing automated, scalable, or customized text extraction.


Conclusion

Knowing how to convert HTML to text is a fundamental skill that bridges the gap between web content and usable data. Beginners can use online tools or word processors for quick tasks, while developers can automate with code for bulk or custom workflows. No matter your skill level, the goal is to get clean, usable text that fits your use case.

By following these methods, you’ll save time, avoid formatting headaches, and unlock the full potential of plain text in your digital workflow.


Frequently Asked Questions (FAQs)

Q: Why can't I just copy and paste text from a website?

A: Copying and pasting directly from a webpage often pulls hidden formatting, extra spaces, or partial HTML tags. This leads to messy text that requires manual cleaning. HTML-to-text tools/ methods strip only the markup while preserving the core content, saving you time.

Q: Can I convert HTML to rich text (RTF) instead of plain text?

A: Yes, most online tools (e.g., Convertio) and word processors support RTF output. For coding, use Spire.Doc to save HTML as RTF while preserving formatting like bold, italics, and headings.

Q: What is the best method for converting multiple HTML files at once?

A: For bulk conversion, using a script is the most efficient method. You can write a simple Python or C# script to loop through all files in a directory and convert them one by one.

Q: Are online HTML-to-text converters safe to use?

A: You should avoid pasting sensitive, confidential, or proprietary HTML code into online tools. While most reputable sites are safe, there's a risk that your data could be intercepted or stored. For sensitive information, always use a local method like a script on your own computer.


See Also

Learn three easy ways to convert Excel to XML

If you’re working with data that needs to be shared across software, APIs, or websites, converting Excel to XML is a critical skill. XML (Extensible Markup Language) is a universal, machine-readable format ideal for data interchange, while Excel excels at organizing spreadsheets. Bridging the gap between them allows your data to flow seamlessly across platforms.

Whether you're a beginner looking for a quick solution, an analyst ensuring data integrity, or a developer building an automated pipeline, this guide will walk you through three effective and error-free Excel to XML conversion methods.


Why Convert Excel to XML?

The core value of converting Excel to XML lies in:

  • Cross-Platform Compatibility: XML works with APIs, CMS platforms (WordPress, Shopify), enterprise tools (SAP, Salesforce), and programming languages (Python, Java).​
  • Structured Data: XML preserves Excel’s rows/columns as tagged elements (e.g., <product><name>Laptop</name><price>999</price></product>), making it easy for software to parse.​
  • Data Integrity: XML avoids formatting issues that plague Excel files (e.g., broken formulas, merged cells) when shared across systems.​
  • Use Cases: E-commerce product feeds, API data exports, CMS content imports, and enterprise data integration.

Structuring Excel Data for Successful Conversion​

The first and most critical step happens in Excel. A well-structured source sheet guarantees a smooth conversion.

  • Use unique, short column headers (no spaces/special characters—e.g., "ProductName" instead of "Product Name").​
  • Remove empty rows/columns (they cause XML validation errors).​
  • Ensure consistent data types (e.g., all prices as numbers, dates in a uniform format).​
  • Avoid merged cells or complex formulas; use raw data instead.

Method 1: Manual Excel to XML Conversion (Built-in Developer Tab)​

Excel’s built-in Developer Tab provides native support for XML tasks, making it accessible to Excel users without programming knowledge.

Step-by-Step Guide:

Step 1: Prepare Your Excel Data

Structure your data in a clear, tabular format. The first row should contain your column headers, which will become your XML tags. For this example, let's use a simple product list:

Sample Excel spreadsheet showing product data.

Step 2: Create an XML Schema

An XML schema (.xsd file) defines the structure of your XML output. It specifies which tags to use and how they relate to each other.

Example Schema (save this as “products.xsd” on your computer):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ID" type="xs:integer"/>
              <xs:element name="ProductName" type="xs:string"/>
              <xs:element name="Category" type="xs:string"/>
              <xs:element name="Price" type="xs:decimal"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
  • Adjust the name attributes to match your Excel column headers (case-sensitive).
  • Use appropriate types (e.g., xsd:integer for numbers, xsd:string for text).

Step 3: Add the Schema to Excel

  1. Open your Excel file.
  2. Go to the Developer tab. (If you don’t see it, go to File > Options > Customize Ribbon, check "Developer," and click "OK.")
  3. In the Developer tab, click Source to open the XML Source pane.
  4. Click XML Maps... and then Add....
  5. Attach the .xsd schema file you created using the XML Maps dialog

Excel Developer Tab interface showing the XML Source pane and XML Maps dialog.

Step 4: Map XML Elements to Your Cells

  1. In the XML Source pane, you will see your elements (“Catalog”, “Product”, “ID”, etc.).
  2. Drag the “Product” element and drop it onto cell A1. Excel will create a nicely formatted table.
  3. The elements will now be mapped. Your data is now ready for export.

Excel worksheet with XML elements mapped to column headers.

Step 5: Export to XML

  1. With your mapped data selected, go back to the Developer tab.
  2. Click Export.
  3. Choose a location and name for your new XML file and click OK.

The exported XML file will match your schema, like this:

Resulting XML file displayed in a browser.

You may also like: Convert XLS/XLSX to HTML Easily — Keep Excel Formatting Intact


Method 2: Free Excel to XML Online Converters

If you need a quick, one-time conversion without the technical setup, online tools are your best friend.

Top tools to convert Excel into XML online (100% free, no sign-up):​

1. CloudxDocs (Most User-Friendly)​

  • Go to CloudxDocs EXCEL TO XML converter.
  • Upload your Excel file (supports XLS, XLSX, XLSB).​
  • Wait for the conversion to complete and download your XML file.

This converter defaults to producing an XML Spreadsheet file that preserves Excel’s formatting.

Convert Excel into XML online using CloudxDocs converter.

2. WizlyTools (Supports Custom XML)

  • Go to WizlyTools Excel to XML converter.​
  • Upload an Excel file (.xlsx, .xls).
  • Enter the root element name and row element name.
  • Select an indentation size for your XML output.
  • Click "Convert to XML" and then copy or download the resulting XML

This converter gives you more control over the XML structure, such as defining a specific root element, row element, and allows you to preview XML before downloading

WizlyTools converter page with options to define XML root and row elements.

Note: Avoid uploading sensitive data (e.g., financial records) to online tools. Use manual methods for confidential files.


Method 3: Automate Excel to XML Conversion in Python

For developers and IT professionals working in Python, Spire.XLS for Python provides a robust, code-driven solution for converting Excel spreadsheets to XML without requiring Microsoft Excel to be installed. This is ideal for server-side automation and application integration.

Step 1: Installation

First, you need to install the Spire.XLS for Python library. It's available via PyPI.

pip install Spire.XLS

Step 2: Write the Python Script

This script uses Spire.XLS to read Excel data, then generates XML with Python’s built-in xml.etree.ElementTree (for structure) and minidom (for pretty formatting).

from spire.xls import *
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Load the Excel file
workbook = Workbook()
workbook.LoadFromFile("Orders.xlsx")

# Get the first worksheet
worksheet = workbook.Worksheets[0]

# Get row and column counts (Spire.XLS uses 1-based indexing)
max_row = worksheet.LastRow
max_col = worksheet.LastColumn

# Check if data exists (headers in row 1, data in row 2+)
if max_row < 2:
    raise ValueError("No data found! Ensure headers are in row 1 and data starts in row 2.")

# Create XML root element
root = ET.Element("Info")

# Loop through data rows (skip row 1: headers)
for row in range(2, max_row + 1):
    # Create a row element (e.g., <Order>)
    row_element = ET.SubElement(root, "Order")

    # Loop through columns to get headers and values
    for col in range(1, max_col + 1):
        # Get header (XML tag)
        header = worksheet.Range[1, col].Value

        # Get cell value (data for the XML tag)
        cell_value = worksheet.Range[row, col].Value

        # Create sub-element and add value
        ET.SubElement(row_element, header).text = str(cell_value)

# Pretty-print XML (add indentation for readability)
rough_string = ET.tostring(root, 'utf-8')
reparsed = minidom.parseString(rough_string)
pretty_xml = reparsed.toprettyxml(indent="  ")

# Save to XML file
with open("Orders.xml", 'w', encoding='utf-8') as f:
    f.write(pretty_xml)

# Close the workbook
workbook.Dispose()

Code Explanation:

  • Load Excel File: Uses Spire.XLS to open an Excel file and access its first worksheet.
  • Define Data Range: Finds the last row and column with data (1-based indexing in Spire.XLS) to know the bounds of the data.
  • Build XML Structure:
    • Creates a root XML element <Info>.
    • Loops through each data row (starting from row 2), making a <Order> element for each.
    • For each column in the row: Use the header from row 1 as the XML tag, and the cell value as the tag’s content.
  • Save XML: Formats the XML with indentation (for readability) and saves it to an XML file.

Example Output XML

Convert Excel to a structured XML file using Python

Spire.XLS for Python also provides a direct SaveAsXml() method to convert Excel to Open XML format.


Final Thoughts

Converting Excel to XML is a fundamental skill that unlocks the true potential of your data, allowing it to flow seamlessly into web applications, configuration files, and other business systems. By understanding the methods available, from the controlled power of Excel's XML Maps to the rapid-fire ease of online tools and the industrial strength of Python scripting, you can choose the right tool for every job.


FAQs About Excel to XML Conversion

Q1: How do I batch-convert multiple Excel files to XML at once?

Batch conversion saves time for 10+ files—use these methods based on your skill level:

  • Online tools: Zamzar supports up to 50 files per batch (upload all Excel files, select “XML” as output, and get a zip of XMLs).
  • Spire.XLS for Python: Extend the provided script to loop through a folder
import os
from spire.xls import *
import xml.etree.ElementTree as ET
from xml.dom import minidom

input_folder = "C:/Excel_Batch"
output_folder = "C:/XML_Batch"
os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(input_folder):
    if filename.endswith((".xlsx", ".xls")):
        workbook = Workbook()
        workbook.LoadFromFile(os.path.join(input_folder, filename))
        # Add conversion logic (same as single file) here
        workbook.Dispose()

Q2: Can I customize the XML structure (e.g., change root/row tags)?

Yes, all methods support customization:

  • Manual: Edit the XML schema (.xsd file) to change root (e.g., <Catalog>) or row (e.g., <Product>) tags.
  • Online: Tools like WizlyTools let you set root/row element names in settings.
  • Python: Modify the script’s ET.Element("RootName") and ET.SubElement(root, "RowName") lines to match your needs.

Q3: My data has formulas. Will the XML output contain the formula or the calculated value?

In almost all cases, the XML export will contain the calculated value (the result of the formula), not the formula itself. XML is primarily a format for representing data, not the logic or calculations behind it. The conversion process reads the visible cell values.

Q4: My XML file looks messy when I open it. How can I format it properly?

The XML generated by these methods is often in a compressed format without line breaks or indentation. This is still valid XML. To view it in a readable format:

  • Open the file in a modern web browser (Chrome, Firefox, Edge), which will automatically format and display it neatly.
  • Use a code editor like VS Code, Notepad++, or Sublime Text, which have built-in XML formatting or plugins to re-indent the code.

See Also

A guide illustrating different methods to change the PDF page size

Have you ever downloaded a PDF only to find it’s an awkward, non-standard size that doesn’t fit your needs? Perhaps it's a poster-sized document you need to print on A4, or a legal document that must be converted to Letter size for a submission. Changing PDF page size can resolve issues like misaligned content, wasted paper, or poor readability.

In this guide, we’ll introduce how to resize PDF pages, including step-by-step methods for everyday users (no tech skills required) and a developer-friendly C# solution to automate bulk tasks.


Standard Paper Sizes You Should Know

When changing your PDF size, you'll likely use one of these common standards:

  • A4 (210 x 297 mm / 8.27" x 11.69"): The international standard, used everywhere except North America.
  • Letter (8.5" x 11" / 216 x 279 mm): The standard in the United States and Canada.
  • Legal (8.5" x 14" / 216 x 356 mm): Often used for contracts and official documents.
  • A3 (297 x 420 mm / 11.69" x 16.54"): Twice the size of A4, common for diagrams and small posters.
  • Tabloid/Ledger (11" x 17" / 279 x 432 mm): Often used for newspapers or large presentations.

Method 1: Using Free Online PDF Tools

For quick, one-off jobs without installing software, free online tools are excellent. They work on any browser and support basic to moderate resizing needs.

Popular Tools: Smallpdf, PDF2Go, and Soda PDF.

General process to change PDF page size online:

  • Go to the tool’s “Resize PDF” page (e.g., PDF2Go Resize PDF tool).
  • Upload your PDF file by dragging and dropping or selecting it from your computer.
  • Select a preset page size (A4, A3, Letter, etc.) from the dropdown menu, or enter custom dimensions.
  • Click the "Start" button and then download the resized PDF.

Change PDF page size using free online tool

Note: While convenient, online PDF resizing tools may pose privacy risks for sensitive documents, and free plans usually limit file size and number of tasks.

Also Read: How to Crop PDFs for Free (3 Easy Methods)


Method 2: Using Adobe Acrobat (Standard/Pro)

For precise control, especially with professional or complex documents, Adobe Acrobat (Pro or Standard) is the industry standard.

How to resize PDF pages with Adobe Acrobat:

  • Open your PDF in Acrobat and go to “File > Print” (or press “Ctrl+P”).
  • Choose "Microsoft Print to PDF" as your printer.
  • Go to "Page Setup" and select your desired paper size (e.g., Letter, A4).
  • In the “Page Size & Handling” section, choose “Fit” to scale the content to the new page size.
  • Click "Print" to apply the changes.

Resize PDF pages using the print dialog in Adobe Acrobat

Best for: Users who need high precision, batch processing, or are working with sensitive or professional documents.


Method 3: Using Windows Built-In Printer Feature

This is a clever workaround available to all Windows 10 and 11 users. It uses the printer settings to effectively create a new PDF with a different size.

Steps to resize PDF pages in Windows:

  • Open your file in any PDF viewer (like the built-in Microsoft Edge).
  • Press “Ctrl + P” or click the printer icon.
  • Choose “Microsoft Print to PDF” as your printer, and click "More settings."
  • Look for a "Paper Size" dropdown and select your desired size (e.g., A4, Letter).
  • Check "Fit to printable area” to scale the content appropriately to the new size.
  • Click “Print” to save the new, resized PDF under a different name or location.

Change PDF page size Using the Windows print dialog

When to Use This Method: You’re a Windows 10/11 user and want a free, no-installation solution to resize simple PDF documents.


Method 4: Using C# to Change PDF Page Size

For developers, automating PDF resizing (e.g., in apps, batch jobs, or workflows) saves time and reduces errors. Spire.PDF for .NET is a robust library that allows you to adjust the PDF page sizes via C#.

How to set PDF page size in C#:

The code below creates a new PDF with A3-sized pages, then scales and copies content from the original PDF to fit the new dimensions.

using Spire.Pdf;
using Spire.Pdf.Graphics;
using System.Drawing;

namespace ChangePDFSize
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a PdfDocument instance
            PdfDocument originPdf = new PdfDocument();
            // Load the original PDF document
            originPdf.LoadFromFile("Sample.pdf");

            // Create a new PDF document
            PdfDocument newPdf = new PdfDocument();

            // Loop through the pages in the original PDF
            foreach(PdfPageBase page in originPdf.Pages)
            {
                // Add an A3-sized page to the new PDF
                PdfPageBase newPage = newPdf.Pages.Add(PdfPageSize.A3, new PdfMargins(0));
                // Create a PdfTextLayout instance
                PdfTextLayout layout = new PdfTextLayout();
                // Set text layout as one page to ensure the content will scale to fit page size
                layout.Layout = PdfLayoutType.OnePage;
                // Create a template from the original page
                PdfTemplate template = page.CreateTemplate();
                // Draw the templates onto the pages in the new PDF
                template.Draw(newPage, new PointF(0, 0), layout);
            }

            // Save the resized document
            newPdf.SaveToFile("ChangePageSizeToA3.pdf");
        }
    }
}

Core Methods:

  • PdfDocument.Pages.Add(PdfPageSize size, PdfMargins margins): Adds a new page to a PdfDocument with a specified size and margins.
  • PdfPageBase.CreateTemplate(): Captures content (text, images, shapes, etc.) from the original page.
  • PdfTemplate.Draw(PdfPageBase targetPage, PointF location, PdfTextLayout layout): Draws the content of a PdfTemplate onto the new PDF page, with positioning and layout rules.

Here’s the result PDF file with a page size of A3 (11.69 x 16.54 inch):

Change PDF page size with C#

If you need to customize the PDF page size, refer to: Change PDF Page Size to a Custom Paper Size in C#


FAQ: Common PDF Resizing Questions

Q1: Will resizing a PDF page distort my text or images?

A: When done correctly, the content should scale proportionally. However, drastic size changes (e.g., A4 to A6) can make text very small or images pixelated. Always check the output file. Using the "Fit to page" or similar scaling option is crucial to maintain proportions.

Q2: Is there a free Mac alternative to Microsoft Print to PDF?

A: Yes, use Mac’s default Preview app:

  1. Open the PDF in Preview.
  2. Go to “File > Print”.
  3. Click the “Paper Size” dropdown and select your desired paper size from the list.
  4. Click “PDF” (bottom-left) and choose "Save as PDF" to export.

Q3: Which method is best for batch resizing PDF pages?

A: Free online tools often have limitations on batch processing. Adobe Acrobat Pro has built-in batch actions for this purpose. For large-scale, automated batch processing, the programmatic C# method using Spire.PDF is the most efficient and powerful solution.

Q4: As a developer, is Spire.PDF free to use?

A: Spire.PDF for .NET offers a free community edition with limitations, which is good for evaluation and small projects. If you want to test the full functionality, you can apply for a 30-day trial license here.


Final Thoughts

Adjusting PDF page size is straightforward with online tools or Adobe Acrobat for general users. If you’re on Windows and want a reliable, offline option for simple documents, the built-in “Microsoft Print to PDF” feature is perfect. For automation or app integration, Spire.PDF for .NET offers a powerful C# solution. No matter which tool you choose, always check the final PDF to ensure content and formatting stay correct. With this guide, you are now equipped to handle any PDF page size challenge efficiently.


See Also

Page 4 of 4