Convert HTML to Text: Easy Solutions for Beginners & Devs

In today's digital world, HTML (HyperText Markup Language) is the backbone of web content. However, there are countless scenarios where you need clean, unformatted plain text instead. Whether you're extracting data for analysis, simplifying content for an email, or preparing text for SEO analysis, knowing how to convert HTML to text is an essential skill.
This comprehensive guide will walk you through the most effective methods to convert HTML to plain text, from simple online tools for beginners to powerful code libraries for developers.
- Key Benefits of HTML to Text Conversion
- Two Easy Methods to Convert HTML to Plain Text
- Advanced: Convert HTML to Text with Code (For Developers)
- Frequently Asked Questions (FAQs)
Key Benefits of HTML to Text Conversion
Stripping away the HTML markup to get clean, readable text serves several crucial purposes:
- Data Processing and Analysis: For data scientists and analysts, plain text is the starting point for Natural Language Processing (NLP), sentiment analysis, and keyword extraction. HTML tags are just noise for these algorithms.
- Search Engine Optimization (SEO): When search engines crawl your site, they primarily index the plain text content. Ensuring that your core message is easily extractable from HTML helps with on-page SEO and ranking.
- Content Repurposing: Plain text is lightweight and versatile, perfect for email newsletters, app notifications, social media previews, or document snippets.
- Web Scraping Efficiency: Web scrapers are designed to extract specific data. Converting the raw HTML response to text is often the first step in filtering out the essential information from the presentation layer.
Two Easy Methods to Convert HTML to Plain Text
For non-technical users, these user-friendly tools deliver fast results without manual tag removal.
1. Online HTML to Text Converters
For quick, one-off conversions, online tools automate markup stripping with minimal effort.
Top Tools:
CLOUDXDOCS, Convertio, CodeBeautify (all free, no sign-up).
General Steps:
- Upload your HTML file.
- Select “Convert” or “Extract Text.”
- Download the plain-text output (usually as a .txt file).
Example of using CLOUDXDOCS:

Pros: Fast, require no technical skills, and often preserve basic formatting like line breaks.
Cons: Not suitable for batch processing; privacy concerns with sensitive data.
You may also like: 5 Best Free HTML to Word Converters (Tested & Recommended)
2. Word Processors (Microsoft Word, Google Docs)
Leverage familiar office tools to change HTML to text effectively—no extra software required.
Microsoft Word:
- Open Word and go to “File > Open”
- Select your HTML file (choose “All Files” from the dropdown to see it).
- Word will convert the HTML to an editable document.
- Go to “File > Save As” and select “Plain Text (*.txt)” as the format.

Google Docs:
- Upload the HTML file to Google Drive.
- Right-click the file and select “Open with > Google Docs”
- Google Docs will render the HTML as text.
- Download as “Plain text (.txt)” via “File > Download”

Best For: Users who are already comfortable with these applications and need to perform this task infrequently.
Advanced: Convert HTML to Text with Code (For Developers)
If you need to automate conversions (e.g., bulk processing, web scraping), using programming languages like Python or C# is the most powerful approach.
1. Convert HTML to Text in Python
The Spire.Doc for Python library provides the SaveToFile method for saving HTML files as TXT files directly.
- Install via Pypi:
pip install Spire.Doc
- Write the Python script:
from spire.doc import *
from spire.doc.common import *
# Load an HTML file
document = Document()
document.LoadFromFile("sample.html", FileFormat.Html, XHTMLValidationType.none)
# Save as a plain text file
document.SaveToFile("HtmlToText.txt", FileFormat.Txt)
document.Close()
If you need to process HTML strings, check this: How to Convert an HTML String to Text in Python
2. Convert HTML to Plain Text in C#
For .NET applications, Spire.Doc for .NET is a robust solution to extract text from HTML.
- Install via Nuget:
Install-Package Spire.Doc
- Sample C# code:
using Spire.Doc;
using Spire.Doc.Documents;
namespace HtmlToText
{
class Program
{
static void Main()
{
// Create a Document object
Document doc = new Document();
// Load an HTML file
doc.LoadFromFile("sample.html", FileFormat.Html, XHTMLValidationType.None);
// Convert HTML to plain text
doc.SaveToFile("HTMLtoText.txt", FileFormat.Txt);
doc.Dispose();
}
}
}
The converted TXT file:

Best For: Developers, data scientists, and anyone needing automated, scalable, or customized text extraction.
Conclusion
Knowing how to convert HTML to text is a fundamental skill that bridges the gap between web content and usable data. Beginners can use online tools or word processors for quick tasks, while developers can automate with code for bulk or custom workflows. No matter your skill level, the goal is to get clean, usable text that fits your use case.
By following these methods, you’ll save time, avoid formatting headaches, and unlock the full potential of plain text in your digital workflow.
Frequently Asked Questions (FAQs)
Q: Why can't I just copy and paste text from a website?
A: Copying and pasting directly from a webpage often pulls hidden formatting, extra spaces, or partial HTML tags. This leads to messy text that requires manual cleaning. HTML-to-text tools/ methods strip only the markup while preserving the core content, saving you time.
Q: Can I convert HTML to rich text (RTF) instead of plain text?
A: Yes, most online tools (e.g., Convertio) and word processors support RTF output. For coding, use Spire.Doc to save HTML as RTF while preserving formatting like bold, italics, and headings.
Q: What is the best method for converting multiple HTML files at once?
A: For bulk conversion, using a script is the most efficient method. You can write a simple Python or C# script to loop through all files in a directory and convert them one by one.
Q: Are online HTML-to-text converters safe to use?
A: You should avoid pasting sensitive, confidential, or proprietary HTML code into online tools. While most reputable sites are safe, there's a risk that your data could be intercepted or stored. For sensitive information, always use a local method like a script on your own computer.
See Also
Convert Excel to XML: Free Tools + Python Automation

If you’re working with data that needs to be shared across software, APIs, or websites, converting Excel to XML is a critical skill. XML (Extensible Markup Language) is a universal, machine-readable format ideal for data interchange, while Excel excels at organizing spreadsheets. Bridging the gap between them allows your data to flow seamlessly across platforms.
Whether you're a beginner looking for a quick solution, an analyst ensuring data integrity, or a developer building an automated pipeline, this guide will walk you through three effective and error-free Excel to XML conversion methods.
- Method 1: Manual Excel to XML Conversion (Built-in Developer Tab)
- Method 2: Free Excel to XML Online Converters
- Method 3: Automate Excel to XML Conversion in Python
Why Convert Excel to XML?
The core value of converting Excel to XML lies in:
- Cross-Platform Compatibility: XML works with APIs, CMS platforms (WordPress, Shopify), enterprise tools (SAP, Salesforce), and programming languages (Python, Java).
- Structured Data: XML preserves Excel’s rows/columns as tagged elements (e.g.,
<product><name>Laptop</name><price>999</price></product>), making it easy for software to parse. - Data Integrity: XML avoids formatting issues that plague Excel files (e.g., broken formulas, merged cells) when shared across systems.
- Use Cases: E-commerce product feeds, API data exports, CMS content imports, and enterprise data integration.
Structuring Excel Data for Successful Conversion
The first and most critical step happens in Excel. A well-structured source sheet guarantees a smooth conversion.
- Use unique, short column headers (no spaces/special characters—e.g., "ProductName" instead of "Product Name").
- Remove empty rows/columns (they cause XML validation errors).
- Ensure consistent data types (e.g., all prices as numbers, dates in a uniform format).
- Avoid merged cells or complex formulas; use raw data instead.
Method 1: Manual Excel to XML Conversion (Built-in Developer Tab)
Excel’s built-in Developer Tab provides native support for XML tasks, making it accessible to Excel users without programming knowledge.
Step-by-Step Guide:
Step 1: Prepare Your Excel Data
Structure your data in a clear, tabular format. The first row should contain your column headers, which will become your XML tags. For this example, let's use a simple product list:

Step 2: Create an XML Schema
An XML schema (.xsd file) defines the structure of your XML output. It specifies which tags to use and how they relate to each other.
Example Schema (save this as “products.xsd” on your computer):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Catalog">
<xs:complexType>
<xs:sequence>
<xs:element name="Product" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="ID" type="xs:integer"/>
<xs:element name="ProductName" type="xs:string"/>
<xs:element name="Category" type="xs:string"/>
<xs:element name="Price" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
- Adjust the name attributes to match your Excel column headers (case-sensitive).
- Use appropriate types (e.g., xsd:integer for numbers, xsd:string for text).
Step 3: Add the Schema to Excel
- Open your Excel file.
- Go to the Developer tab. (If you don’t see it, go to File > Options > Customize Ribbon, check "Developer," and click "OK.")
- In the Developer tab, click Source to open the XML Source pane.
- Click XML Maps... and then Add....
- Attach the .xsd schema file you created using the XML Maps dialog

Step 4: Map XML Elements to Your Cells
- In the XML Source pane, you will see your elements (“Catalog”, “Product”, “ID”, etc.).
- Drag the “Product” element and drop it onto cell A1. Excel will create a nicely formatted table.
- The elements will now be mapped. Your data is now ready for export.

Step 5: Export to XML
- With your mapped data selected, go back to the Developer tab.
- Click Export.
- Choose a location and name for your new XML file and click OK.
The exported XML file will match your schema, like this:

You may also like: Convert XLS/XLSX to HTML Easily — Keep Excel Formatting Intact
Method 2: Free Excel to XML Online Converters
If you need a quick, one-time conversion without the technical setup, online tools are your best friend.
Top tools to convert Excel into XML online (100% free, no sign-up):
1. CloudxDocs (Most User-Friendly)
- Go to CloudxDocs EXCEL TO XML converter.
- Upload your Excel file (supports XLS, XLSX, XLSB).
- Wait for the conversion to complete and download your XML file.
This converter defaults to producing an XML Spreadsheet file that preserves Excel’s formatting.

2. WizlyTools (Supports Custom XML)
- Go to WizlyTools Excel to XML converter.
- Upload an Excel file (.xlsx, .xls).
- Enter the root element name and row element name.
- Select an indentation size for your XML output.
- Click "Convert to XML" and then copy or download the resulting XML
This converter gives you more control over the XML structure, such as defining a specific root element, row element, and allows you to preview XML before downloading

Note: Avoid uploading sensitive data (e.g., financial records) to online tools. Use manual methods for confidential files.
Method 3: Automate Excel to XML Conversion in Python
For developers and IT professionals working in Python, Spire.XLS for Python provides a robust, code-driven solution for converting Excel spreadsheets to XML without requiring Microsoft Excel to be installed. This is ideal for server-side automation and application integration.
Step 1: Installation
First, you need to install the Spire.XLS for Python library. It's available via PyPI.
pip install Spire.XLS
Step 2: Write the Python Script
This script uses Spire.XLS to read Excel data, then generates XML with Python’s built-in xml.etree.ElementTree (for structure) and minidom (for pretty formatting).
from spire.xls import *
import xml.etree.ElementTree as ET
from xml.dom import minidom
# Load the Excel file
workbook = Workbook()
workbook.LoadFromFile("Orders.xlsx")
# Get the first worksheet
worksheet = workbook.Worksheets[0]
# Get row and column counts (Spire.XLS uses 1-based indexing)
max_row = worksheet.LastRow
max_col = worksheet.LastColumn
# Check if data exists (headers in row 1, data in row 2+)
if max_row < 2:
raise ValueError("No data found! Ensure headers are in row 1 and data starts in row 2.")
# Create XML root element
root = ET.Element("Info")
# Loop through data rows (skip row 1: headers)
for row in range(2, max_row + 1):
# Create a row element (e.g., <Order>)
row_element = ET.SubElement(root, "Order")
# Loop through columns to get headers and values
for col in range(1, max_col + 1):
# Get header (XML tag)
header = worksheet.Range[1, col].Value
# Get cell value (data for the XML tag)
cell_value = worksheet.Range[row, col].Value
# Create sub-element and add value
ET.SubElement(row_element, header).text = str(cell_value)
# Pretty-print XML (add indentation for readability)
rough_string = ET.tostring(root, 'utf-8')
reparsed = minidom.parseString(rough_string)
pretty_xml = reparsed.toprettyxml(indent=" ")
# Save to XML file
with open("Orders.xml", 'w', encoding='utf-8') as f:
f.write(pretty_xml)
# Close the workbook
workbook.Dispose()
Code Explanation:
- Load Excel File: Uses Spire.XLS to open an Excel file and access its first worksheet.
- Define Data Range: Finds the last row and column with data (1-based indexing in Spire.XLS) to know the bounds of the data.
- Build XML Structure:
- Creates a root XML element
<Info>. - Loops through each data row (starting from row 2), making a
<Order>element for each. - For each column in the row: Use the header from row 1 as the XML tag, and the cell value as the tag’s content.
- Creates a root XML element
- Save XML: Formats the XML with indentation (for readability) and saves it to an XML file.
Example Output XML

Spire.XLS for Python also provides a direct SaveAsXml() method to convert Excel to Open XML format.
Final Thoughts
Converting Excel to XML is a fundamental skill that unlocks the true potential of your data, allowing it to flow seamlessly into web applications, configuration files, and other business systems. By understanding the methods available, from the controlled power of Excel's XML Maps to the rapid-fire ease of online tools and the industrial strength of Python scripting, you can choose the right tool for every job.
FAQs About Excel to XML Conversion
Q1: How do I batch-convert multiple Excel files to XML at once?
Batch conversion saves time for 10+ files—use these methods based on your skill level:
- Online tools: Zamzar supports up to 50 files per batch (upload all Excel files, select “XML” as output, and get a zip of XMLs).
- Spire.XLS for Python: Extend the provided script to loop through a folder
import os
from spire.xls import *
import xml.etree.ElementTree as ET
from xml.dom import minidom
input_folder = "C:/Excel_Batch"
output_folder = "C:/XML_Batch"
os.makedirs(output_folder, exist_ok=True)
for filename in os.listdir(input_folder):
if filename.endswith((".xlsx", ".xls")):
workbook = Workbook()
workbook.LoadFromFile(os.path.join(input_folder, filename))
# Add conversion logic (same as single file) here
workbook.Dispose()
Q2: Can I customize the XML structure (e.g., change root/row tags)?
Yes, all methods support customization:
- Manual: Edit the XML schema (.xsd file) to change root (e.g.,
<Catalog>) or row (e.g.,<Product>) tags. - Online: Tools like WizlyTools let you set root/row element names in settings.
- Python: Modify the script’s ET.Element("RootName") and ET.SubElement(root, "RowName") lines to match your needs.
Q3: My data has formulas. Will the XML output contain the formula or the calculated value?
In almost all cases, the XML export will contain the calculated value (the result of the formula), not the formula itself. XML is primarily a format for representing data, not the logic or calculations behind it. The conversion process reads the visible cell values.
Q4: My XML file looks messy when I open it. How can I format it properly?
The XML generated by these methods is often in a compressed format without line breaks or indentation. This is still valid XML. To view it in a readable format:
- Open the file in a modern web browser (Chrome, Firefox, Edge), which will automatically format and display it neatly.
- Use a code editor like VS Code, Notepad++, or Sublime Text, which have built-in XML formatting or plugins to re-indent the code.
See Also
How to Change PDF Page Size: Online, Adobe & With Code
Table of Contents

Have you ever downloaded a PDF only to find it’s an awkward, non-standard size that doesn’t fit your needs? Perhaps it's a poster-sized document you need to print on A4, or a legal document that must be converted to Letter size for a submission. Changing PDF page size can resolve issues like misaligned content, wasted paper, or poor readability.
In this guide, we’ll introduce how to resize PDF pages, including step-by-step methods for everyday users (no tech skills required) and a developer-friendly C# solution to automate bulk tasks.
- Standard Paper Sizes You Should Know
- Method 1: Using Free Online PDF Tools
- Method 2: Using Adobe Acrobat (Standard/Pro)
- Method 3: Using Windows Built-In Printer Feature
- Method 4: Using C# to Change PDF Page Size
- FAQ: Common PDF Resizing Questions
- Final Thoughts
Standard Paper Sizes You Should Know
When changing your PDF size, you'll likely use one of these common standards:
- A4 (210 x 297 mm / 8.27" x 11.69"): The international standard, used everywhere except North America.
- Letter (8.5" x 11" / 216 x 279 mm): The standard in the United States and Canada.
- Legal (8.5" x 14" / 216 x 356 mm): Often used for contracts and official documents.
- A3 (297 x 420 mm / 11.69" x 16.54"): Twice the size of A4, common for diagrams and small posters.
- Tabloid/Ledger (11" x 17" / 279 x 432 mm): Often used for newspapers or large presentations.
Method 1: Using Free Online PDF Tools
For quick, one-off jobs without installing software, free online tools are excellent. They work on any browser and support basic to moderate resizing needs.
Popular Tools: Smallpdf, PDF2Go, and Soda PDF.
General process to change PDF page size online:
- Go to the tool’s “Resize PDF” page (e.g., PDF2Go Resize PDF tool).
- Upload your PDF file by dragging and dropping or selecting it from your computer.
- Select a preset page size (A4, A3, Letter, etc.) from the dropdown menu, or enter custom dimensions.
- Click the "Start" button and then download the resized PDF.

Note: While convenient, online PDF resizing tools may pose privacy risks for sensitive documents, and free plans usually limit file size and number of tasks.
Also Read: How to Crop PDFs for Free (3 Easy Methods)
Method 2: Using Adobe Acrobat (Standard/Pro)
For precise control, especially with professional or complex documents, Adobe Acrobat (Pro or Standard) is the industry standard.
How to resize PDF pages with Adobe Acrobat:
- Open your PDF in Acrobat and go to “File > Print” (or press “Ctrl+P”).
- Choose "Microsoft Print to PDF" as your printer.
- Go to "Page Setup" and select your desired paper size (e.g., Letter, A4).
- In the “Page Size & Handling” section, choose “Fit” to scale the content to the new page size.
- Click "Print" to apply the changes.

Best for: Users who need high precision, batch processing, or are working with sensitive or professional documents.
Method 3: Using Windows Built-In Printer Feature
This is a clever workaround available to all Windows 10 and 11 users. It uses the printer settings to effectively create a new PDF with a different size.
Steps to resize PDF pages in Windows:
- Open your file in any PDF viewer (like the built-in Microsoft Edge).
- Press “Ctrl + P” or click the printer icon.
- Choose “Microsoft Print to PDF” as your printer, and click "More settings."
- Look for a "Paper Size" dropdown and select your desired size (e.g., A4, Letter).
- Check "Fit to printable area” to scale the content appropriately to the new size.
- Click “Print” to save the new, resized PDF under a different name or location.

When to Use This Method: You’re a Windows 10/11 user and want a free, no-installation solution to resize simple PDF documents.
Method 4: Using C# to Change PDF Page Size
For developers, automating PDF resizing (e.g., in apps, batch jobs, or workflows) saves time and reduces errors. Spire.PDF for .NET is a robust library that allows you to adjust the PDF page sizes via C#.
How to set PDF page size in C#:
The code below creates a new PDF with A3-sized pages, then scales and copies content from the original PDF to fit the new dimensions.
using Spire.Pdf;
using Spire.Pdf.Graphics;
using System.Drawing;
namespace ChangePDFSize
{
class Program
{
static void Main(string[] args)
{
// Create a PdfDocument instance
PdfDocument originPdf = new PdfDocument();
// Load the original PDF document
originPdf.LoadFromFile("Sample.pdf");
// Create a new PDF document
PdfDocument newPdf = new PdfDocument();
// Loop through the pages in the original PDF
foreach(PdfPageBase page in originPdf.Pages)
{
// Add an A3-sized page to the new PDF
PdfPageBase newPage = newPdf.Pages.Add(PdfPageSize.A3, new PdfMargins(0));
// Create a PdfTextLayout instance
PdfTextLayout layout = new PdfTextLayout();
// Set text layout as one page to ensure the content will scale to fit page size
layout.Layout = PdfLayoutType.OnePage;
// Create a template from the original page
PdfTemplate template = page.CreateTemplate();
// Draw the templates onto the pages in the new PDF
template.Draw(newPage, new PointF(0, 0), layout);
}
// Save the resized document
newPdf.SaveToFile("ChangePageSizeToA3.pdf");
}
}
}
Core Methods:
- PdfDocument.Pages.Add(PdfPageSize size, PdfMargins margins): Adds a new page to a PdfDocument with a specified size and margins.
- PdfPageBase.CreateTemplate(): Captures content (text, images, shapes, etc.) from the original page.
- PdfTemplate.Draw(PdfPageBase targetPage, PointF location, PdfTextLayout layout): Draws the content of a PdfTemplate onto the new PDF page, with positioning and layout rules.
Here’s the result PDF file with a page size of A3 (11.69 x 16.54 inch):

If you need to customize the PDF page size, refer to: Change PDF Page Size to a Custom Paper Size in C#
FAQ: Common PDF Resizing Questions
Q1: Will resizing a PDF page distort my text or images?
A: When done correctly, the content should scale proportionally. However, drastic size changes (e.g., A4 to A6) can make text very small or images pixelated. Always check the output file. Using the "Fit to page" or similar scaling option is crucial to maintain proportions.
Q2: Is there a free Mac alternative to Microsoft Print to PDF?
A: Yes, use Mac’s default Preview app:
- Open the PDF in Preview.
- Go to “File > Print”.
- Click the “Paper Size” dropdown and select your desired paper size from the list.
- Click “PDF” (bottom-left) and choose "Save as PDF" to export.
Q3: Which method is best for batch resizing PDF pages?
A: Free online tools often have limitations on batch processing. Adobe Acrobat Pro has built-in batch actions for this purpose. For large-scale, automated batch processing, the programmatic C# method using Spire.PDF is the most efficient and powerful solution.
Q4: As a developer, is Spire.PDF free to use?
A: Spire.PDF for .NET offers a free community edition with limitations, which is good for evaluation and small projects. If you want to test the full functionality, you can apply for a 30-day trial license here.
Final Thoughts
Adjusting PDF page size is straightforward with online tools or Adobe Acrobat for general users. If you’re on Windows and want a reliable, offline option for simple documents, the built-in “Microsoft Print to PDF” feature is perfect. For automation or app integration, Spire.PDF for .NET offers a powerful C# solution. No matter which tool you choose, always check the final PDF to ensure content and formatting stay correct. With this guide, you are now equipped to handle any PDF page size challenge efficiently.