Comments in Word documents often hold valuable information, such as feedback, suggestions, and notes. Unfortunately, editors like Microsoft Word lack a built-in feature for batch-extracting comments, leaving users to rely on cumbersome methods like copying and pasting or using VBA macros. This article demonstrates how to extract comments from Word documents in Java with Spire.Doc for Java, retrieving all comment text and images in a single pass.
- Extract Comments Text from Word Documents in Java
- Extract Comment Images from Word Documents in Java
Install Spire.Doc for Java
First of all, you're required to add the Spire.Doc.jar file as a dependency in your Java program. The JAR file can be downloaded from this link. If you use Maven, you can easily import the JAR file in your application by adding the following code to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>14.4.9</version>
    </dependency>
</dependencies>
Extract Comments Text from Word Documents in Java
Extracting all comment text takes two nested loops. First, loop through the comments in the Word file and get each one with the Document.getComments().get() method offered by Spire.Doc for Java. Then iterate through the paragraphs in the comment body and read the text of each paragraph with the Paragraph.getText() method. The detailed steps follow.
Steps to extract comment text from Word files:
- Create an object of the Document class.
- Load a Word document using the Document.loadFromFile() method.
- Iterate through all comments in the Word file.
- Get the current comment with the Document.getComments().get() method.
- Loop through the paragraphs in the comment and access the current paragraph through the Comment.getBody().getParagraphs().get() method.
- Extract the text of each paragraph by calling the Paragraph.getText() method.
- Save the extracted comments.
The code example below demonstrates how to extract all comment text from a Word document:
- Java
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.*;
import java.io.*;

public class ExtractComments {
    public static void main(String[] args) throws IOException {
        // Create a new Document instance
        Document doc = new Document();
        // Load the document from the specified input file
        doc.loadFromFile("/comments.docx");
        // Iterate over each comment in the document
        for (int i = 0; i < doc.getComments().getCount(); i++) {
            // Get the comment at the current index
            Comment comment = doc.getComments().get(i);
            // Iterate over each paragraph in the comment's body
            for (int j = 0; j < comment.getBody().getParagraphs().getCount(); j++) {
                // Get the paragraph at the current index
                Paragraph para = comment.getBody().getParagraphs().get(j);
                // Get the text of the paragraph and append a line break
                String result = para.getText() + "\r\n";
                // Append the extracted text to a text file
                writeStringToTxt(result, "/commenttext.txt");
            }
        }
        // Dispose of the document resources
        doc.dispose();
    }

    // Custom method to append a string to a text file
    public static void writeStringToTxt(String content, String txtFileName) throws IOException {
        // try-with-resources flushes and closes the writer, even if write() fails
        try (FileWriter fWriter = new FileWriter(txtFileName, true)) {
            fWriter.write(content);
        }
    }
}
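The example above opens the output file in append mode once per paragraph, so running it twice adds the same comments to the file again. One possible alternative is to collect the text first and write it in a single call. The following is a minimal sketch using only the Java standard library; the class name CommentTextWriter and the methods joinParagraphs and writeOnce are hypothetical, and the hard-coded strings stand in for the values that Paragraph.getText() would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CommentTextWriter {
    // Join paragraph texts, ending each with the same "\r\n" line break used above
    public static String joinParagraphs(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String paragraph : paragraphs) {
            sb.append(paragraph).append("\r\n");
        }
        return sb.toString();
    }

    // Write the whole result in one call; an existing file is overwritten
    public static void writeOnce(String content, Path target) throws IOException {
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder strings standing in for extracted comment paragraphs
        List<String> paragraphs = Arrays.asList("First comment", "Second comment");
        // A temporary file is used here so the sketch runs anywhere
        Path out = Files.createTempFile("commenttext", ".txt");
        writeOnce(joinParagraphs(paragraphs), out);
        System.out.println("Wrote " + out);
    }
}
```

In the extraction loop, each para.getText() result would be added to the list instead of being written immediately, and writeOnce would be called after the loop finishes.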

Extract Comment Images from Word Documents in Java
Sometimes, comments in a document may contain not only text but also images. With the methods provided by Spire.Doc for Java, you can easily extract all images from comments in bulk. The process is similar to extracting text: you need to iterate through each comment, the paragraphs in the comment body, and the child objects of each paragraph. Then, check if the object is a DocPicture. If it is, use the DocPicture.getImageBytes() method to extract the image.
Steps to extract comment images from Word documents:
- Create an instance of the Document class.
- Load the source Word file using the Document.loadFromFile() method.
- Create a list to store the extracted image data.
- Loop through the comments in the Word file and get the current comment using the Document.getComments().get() method.
- Loop through all paragraphs in the comment and get the current paragraph with the Comment.getBody().getParagraphs().get() method.
- Iterate through the child objects of the paragraph and access each one through the Paragraph.getChildObjects().get() method.
- Check whether the child object is a DocPicture. If it is, get the image data using the DocPicture.getImageBytes() method.
- Add the image data to the list and save it as image files.
Here is the code example of extracting all comment images from a Word file:
- Java
import com.spire.doc.*;
import com.spire.doc.documents.*;
import com.spire.doc.fields.*;
import java.io.*;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;

public class ExtractCommentImages {
    public static void main(String[] args) {
        // Create an object of the Document class
        Document document = new Document();
        // Load a Word document with comments
        document.loadFromFile("/comments.docx");
        // Create a list to store the extracted image data
        List<byte[]> images = new ArrayList<>();
        // Loop through the comments in the document
        for (int i = 0; i < document.getComments().getCount(); i++) {
            Comment comment = document.getComments().get(i);
            // Iterate through the paragraphs in the comment body
            for (int j = 0; j < comment.getBody().getParagraphs().getCount(); j++) {
                Paragraph paragraph = comment.getBody().getParagraphs().get(j);
                // Loop through the child objects in the paragraph
                for (int k = 0; k < paragraph.getChildObjects().getCount(); k++) {
                    DocumentObject obj = paragraph.getChildObjects().get(k);
                    // Check if it is a picture
                    if (obj instanceof DocPicture) {
                        DocPicture picture = (DocPicture) obj;
                        // Get the image data and add it to the list
                        images.add(picture.getImageBytes());
                    }
                }
            }
        }
        // Dispose of the document resources
        document.dispose();
        // Specify the output directory and create it if it does not exist
        String outputDir = "/comment_images/";
        new File(outputDir).mkdirs();
        // Save the image data as image files
        for (int i = 0; i < images.size(); i++) {
            String fileName = String.format("comment-image-%d.png", i);
            Path filePath = Paths.get(outputDir, fileName);
            try (FileOutputStream fos = new FileOutputStream(filePath.toFile())) {
                fos.write(images.get(i));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
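The example above gives every output file a .png extension. If a comment holds a picture stored in another format, such as JPEG or GIF, the extension will not match the file content. One possible way to handle this is to inspect the leading signature bytes of the image data before choosing a file name. The following is a minimal sketch using only the Java standard library; the class ImageTypeSniffer and the method guessExtension are hypothetical names, and the "bin" fallback for unrecognized data is an assumption.

```java
public class ImageTypeSniffer {
    // Return true when data begins with the given signature bytes
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Guess a file extension from well-known image file signatures
    public static String guessExtension(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png"; // \x89PNG
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpg";       // JPEG start-of-image marker
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif"; // "GIF8"
        if (startsWith(data, 0x42, 0x4D)) return "bmp";             // "BM"
        return "bin"; // unknown format
    }

    public static void main(String[] args) {
        byte[] pngHeader = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(guessExtension(pngHeader));  // prints "png"
        System.out.println(guessExtension(jpegHeader)); // prints "jpg"
    }
}
```

In the save loop, the file name could then be built as String.format("comment-image-%d.%s", i, ImageTypeSniffer.guessExtension(images.get(i))).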

Apply for a Temporary License
To remove the evaluation message from the generated documents or lift the function limitations, request a 30-day trial license.
Applying styles is one of the simplest ways to enhance the professionalism and readability of Excel spreadsheets. Excel provides a wide range of built-in styles that allow users to quickly format cells, ranges, or worksheets. Additionally, users can create custom styles to specify fonts, colors, borders, number formats, and more, tailored to their individual preferences. Whether you're designing professional reports, sales presentations, or project management plans, knowing how to use styles effectively helps make data more visually appealing and easier to understand.
In this guide, you will learn how to apply styles to cells or worksheets in Excel in Python using Spire.XLS for Python.
- Apply a Built-in Style to Cells in Excel in Python
- Apply a Custom Style to Cells in Excel in Python
- Apply a Custom Style to a Worksheet in Excel in Python
Install Spire.XLS for Python
This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.
pip install Spire.XLS
If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows
Apply a Built-in Style to Cells in Excel in Python
Spire.XLS for Python offers the CellRange.BuiltInStyle property, which enables developers to apply built-in styles, such as Title, Heading 1, and Heading 2 to individual cells or ranges in Excel. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet by its index using the Workbook.Worksheets[index] property.
- Get the desired cell or range of cells using the Worksheet.Range[] property.
- Apply a built-in style to the cell or range of cells using the CellRange.BuiltInStyle property.
- Save the resulting file using the Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
# Create an object of the Workbook class
workbook = Workbook()
# Load the Excel file
workbook.LoadFromFile("Sample.xlsx")
# Get the first sheet
sheet = workbook.Worksheets[0]
# Get the desired cell range (named cell_range to avoid shadowing Python's built-in range)
cell_range = sheet.Range["A1:H1"]
# Apply a built-in style to the cell range
cell_range.BuiltInStyle = BuiltInStyles.Heading2
# Save the resulting file
workbook.SaveToFile("ApplyBuiltinStyle.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Apply a Custom Style to Cells in Excel in Python
Developers can use the Workbook.Styles.Add() method to create a custom style, which can then be applied to individual cells or ranges using the CellRange.Style property. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet by its index using the Workbook.Worksheets[index] property.
- Get the desired cell or range of cells using the Worksheet.Range[] property.
- Add a custom style to the workbook using the Workbook.Styles.Add() method.
- Define the formatting, such as the font size, font color, text alignment, cell borders and cell background color, using the properties of the CellStyle class.
- Apply the custom style to the cell or range of cells using the CellRange.Style property.
- Save the resulting file using the Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
# Create an object of the Workbook class
workbook = Workbook()
# Load the Excel file
workbook.LoadFromFile("Sample.xlsx")
# Get the first sheet
sheet = workbook.Worksheets[0]
# Get the desired cell range (named cell_range to avoid shadowing Python's built-in range)
cell_range = sheet.Range["A1:H1"]
# Add a custom style to the workbook
style = workbook.Styles.Add("CustomCellStyle")
# Set the font size
style.Font.Size = 13
# Set the font color
style.Font.Color = Color.get_White()
# Bold the text
style.Font.IsBold = True
# Set the vertical text alignment
style.VerticalAlignment = VerticalAlignType.Bottom
# Set the horizontal text alignment
style.HorizontalAlignment = HorizontalAlignType.Left
# Set the bottom border color
style.Borders[BordersLineType.EdgeBottom].Color = Color.get_GreenYellow()
# Set the bottom border type
style.Borders[BordersLineType.EdgeBottom].LineStyle = LineStyleType.Medium
# Set the background color
style.Color = Color.get_CornflowerBlue()
# Apply the custom style to the cell range
cell_range.Style = style
# Save the resulting file
workbook.SaveToFile("ApplyCustomStyle.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Apply a Custom Style to a Worksheet in Excel in Python
In certain cases, it may be necessary to apply a custom style to an entire worksheet rather than to specific cells or ranges. This can be accomplished using the Worksheet.ApplyStyle() method. The detailed steps are as follows.
- Create an object of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet by its index using the Workbook.Worksheets[index] property.
- Add a custom style to the workbook using the Workbook.Styles.Add() method.
- Define the formatting, such as the font size, font color, and cell background color, using the properties of the CellStyle class.
- Apply the custom style to the worksheet using the Worksheet.ApplyStyle() method.
- Save the resulting file using the Workbook.SaveToFile() method.
- Python
from spire.xls import *
from spire.xls.common import *
# Create an object of the Workbook class
workbook = Workbook()
# Load the Excel file
workbook.LoadFromFile("Sample.xlsx")
# Get the first sheet
sheet = workbook.Worksheets[0]
# Add a custom style to the workbook
style = workbook.Styles.Add("CustomSheetStyle")
# Set the font size
style.Font.Size = 12
# Set the font color
style.Font.Color = Color.FromRgb(91, 155, 213)
# Set the cell background color
style.Color = Color.FromRgb(242, 242, 242)
# Apply the custom style to the worksheet
sheet.ApplyStyle(style)
# Save the resulting file
workbook.SaveToFile("ApplyCustomStyleToSheet.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Converting PDF to HTML is important for improving accessibility and interactivity in web environments. While PDFs are widely used for their reliable layout and ease of sharing, they can be restrictive when it comes to online use. HTML provides greater flexibility, allowing content to be displayed more effectively on websites and mobile devices. By converting a PDF document into HTML, developers can enhance search engine visibility, enable easier editing, and create more user-friendly experiences. In this article, we will demonstrate how to convert PDF to HTML in React with JavaScript and the Spire.PDF for JavaScript library.
- Convert PDF to HTML in React
- Customize PDF to HTML Conversion Settings in React
- Convert PDF to HTML Stream in React
Install Spire.PDF for JavaScript
To get started with converting PDF to HTML with JavaScript in a React application, you can either download Spire.PDF for JavaScript from our website or install it via npm with the following command:
npm i spire.pdf
After that, copy the "Spire.Pdf.Base.js" and "Spire.Pdf.Base.wasm" files to the public folder of your project. Additionally, include the required font files to ensure accurate and consistent text rendering.
For more details, refer to the documentation: How to Integrate Spire.PDF for JavaScript in a React Project
Convert PDF to HTML in React
The PdfDocument.SaveToFile() method offered by Spire.PDF for JavaScript allows developers to effortlessly convert a PDF file into HTML format. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Save the PDF file to HTML format using the PdfDocument.SaveToFile() method.
- JavaScript
import React, { useState, useEffect } from 'react';
function App() {
// State to hold the loaded WASM module
const [wasmModule, setWasmModule] = useState(null);
// useEffect hook to load the WASM module when the component mounts
useEffect(() => {
const loadWasm = async () => {
try {
// Access the Module and spirepdf from the global window object
const { Module, spirepdf } = window;
// Set the wasmModule state when the runtime is initialized
Module.onRuntimeInitialized = () => {
setWasmModule(spirepdf);
};
} catch (err) {
// Log any errors that occur during loading
console.error('Failed to load WASM module:', err);
}
};
// Create a script element to load the WASM JavaScript file
const script = document.createElement('script');
script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
script.onload = loadWasm;
// Append the script to the document body
document.body.appendChild(script);
// Cleanup function to remove the script when the component unmounts
return () => {
document.body.removeChild(script);
};
}, []);
// Function to convert PDF to HTML
const ConvertPdfToHTML = async () => {
if (wasmModule) {
// Load the necessary font file into the virtual file system (VFS)
await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
// Load the input PDF file into the VFS
let inputFileName = 'Input.pdf';
await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
// Create a new document
const doc = wasmModule.PdfDocument.Create();
// Load the PDF file
doc.LoadFromFile(inputFileName);
// Define the output file name
const outputFileName = 'PdfToHtml.html';
// Save the document to an HTML file
doc.SaveToFile({fileName: outputFileName, fileFormat: wasmModule.FileFormat.HTML});
// Clean up resources
doc.Close();
doc.Dispose();
// Read the saved file and convert it to a Blob object
const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });
// Create a URL for the Blob and initiate the download
const url = URL.createObjectURL(modifiedFile);
const a = document.createElement('a');
a.href = url;
a.download = outputFileName;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
}
};
return (
<div style={{ textAlign: 'center', height: '300px' }}>
<h1>Convert PDF to HTML in React Using JavaScript</h1>
<button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
Convert
</button>
</div>
);
}
export default App;
Run the code to launch the React app at localhost:3000. Once it's running, click on the "Convert" button to convert the PDF file to HTML format:

Here is the screenshot of the input PDF file and the converted HTML file:

Customize PDF to HTML Conversion Settings in React
Developers can use the PdfDocument.ConvertOptions.SetPdfToHtmlOptions() method to customize settings during the PDF to HTML conversion process. For instance, they can choose whether to embed SVG or images in the resulting HTML and set the maximum number of pages included in each HTML file. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Customize the PDF to HTML conversion settings using the PdfDocument.ConvertOptions.SetPdfToHtmlOptions() method.
- Save the PDF document to HTML format using the PdfDocument.SaveToFile() method.
- JavaScript
import React, { useState, useEffect } from 'react';
function App() {
// State to hold the loaded WASM module
const [wasmModule, setWasmModule] = useState(null);
// useEffect hook to load the WASM module when the component mounts
useEffect(() => {
const loadWasm = async () => {
try {
// Access the Module and spirepdf from the global window object
const { Module, spirepdf } = window;
// Set the wasmModule state when the runtime is initialized
Module.onRuntimeInitialized = () => {
setWasmModule(spirepdf);
};
} catch (err) {
// Log any errors that occur during loading
console.error('Failed to load WASM module:', err);
}
};
// Create a script element to load the WASM JavaScript file
const script = document.createElement('script');
script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
script.onload = loadWasm;
// Append the script to the document body
document.body.appendChild(script);
// Cleanup function to remove the script when the component unmounts
return () => {
document.body.removeChild(script);
};
}, []);
// Function to convert PDF to HTML
const ConvertPdfToHTML = async () => {
if (wasmModule) {
// Load the necessary font file into the virtual file system (VFS)
await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
// Load the input PDF file into the VFS
let inputFileName = 'Input.pdf';
await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
// Create a new document
const doc = wasmModule.PdfDocument.Create();
// Load the PDF file
doc.LoadFromFile(inputFileName);
// Customize the conversion settings
// Parameters: useEmbeddedSvg: false, useEmbeddedImg: true, maxPageOneFile: 1
doc.ConvertOptions.SetPdfToHtmlOptions(false, true, 1);
// Define the output file name
const outputFileName = 'CustomizePdfToHtmlConversion.html';
// Save the document to an HTML file
doc.SaveToFile({fileName: outputFileName, fileFormat: wasmModule.FileFormat.HTML});
// Clean up resources
doc.Close();
doc.Dispose();
// Read the saved file and convert it to a Blob object
const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });
// Create a URL for the Blob and initiate the download
const url = URL.createObjectURL(modifiedFile);
const a = document.createElement('a');
a.href = url;
a.download = outputFileName;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
}
};
return (
<div style={{ textAlign: 'center', height: '300px' }}>
<h1>Convert PDF to HTML in React Using JavaScript</h1>
<button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
Convert
</button>
</div>
);
}
export default App;
Convert PDF to HTML Stream in React
Spire.PDF for JavaScript also supports converting a PDF to an HTML stream using the PdfDocument.SaveToStream() method. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Create a memory stream using the wasmModule.Stream.CreateByFile() method.
- Save the PDF document as an HTML stream using the PdfDocument.SaveToStream() method.
- Write the content of the stream to an HTML file using the wasmModule.FS.writeFile() method.
- JavaScript
import React, { useState, useEffect } from 'react';
function App() {
// State to hold the loaded WASM module
const [wasmModule, setWasmModule] = useState(null);
// useEffect hook to load the WASM module when the component mounts
useEffect(() => {
const loadWasm = async () => {
try {
// Access the Module and spirepdf from the global window object
const { Module, spirepdf } = window;
// Set the wasmModule state when the runtime is initialized
Module.onRuntimeInitialized = () => {
setWasmModule(spirepdf);
};
} catch (err) {
// Log any errors that occur during loading
console.error('Failed to load WASM module:', err);
}
};
// Create a script element to load the WASM JavaScript file
const script = document.createElement('script');
script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
script.onload = loadWasm;
// Append the script to the document body
document.body.appendChild(script);
// Cleanup function to remove the script when the component unmounts
return () => {
document.body.removeChild(script);
};
}, []);
// Function to convert PDF to HTML
const ConvertPdfToHTML = async () => {
if (wasmModule) {
// Load the necessary font file into the virtual file system (VFS)
await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
// Load the input PDF file into the VFS
let inputFileName = 'Input.pdf';
await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
// Create a new document
const doc = wasmModule.PdfDocument.Create();
// Load the PDF file
doc.LoadFromFile(inputFileName);
// Define the output file name
const outputFileName = 'PdfToHtmlStream.html';
// Create a new memory stream
let ms = wasmModule.Stream.CreateByFile(outputFileName);
// Save the PDF document to an HTML stream
doc.SaveToStream({stream: ms, fileformat: wasmModule.FileFormat.HTML});
// Write the content of the memory stream to an HTML file
wasmModule.FS.writeFile(outputFileName, ms.ToArray());
// Clean up resources
doc.Close();
doc.Dispose();
// Read the saved file and convert it to a Blob object
const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });
// Create a URL for the Blob and initiate the download
const url = URL.createObjectURL(modifiedFile);
const a = document.createElement('a');
a.href = url;
a.download = outputFileName;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
}
};
return (
<div style={{ textAlign: 'center', height: '300px' }}>
<h1>Convert PDF to HTML in React Using JavaScript</h1>
<button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
Convert
</button>
</div>
);
}
export default App;
Get a Free License
To fully experience the capabilities of Spire.PDF for JavaScript without any evaluation limitations, you can request a free 30-day trial license.