Spire.Office Knowledgebase Page 49

Knowledgebase (2329)

Children categories

Spire.OfficeJs (5)

View items...

Python: Highlight Text in PowerPoint Presentation

2024-05-22 01:21:13 Written by Koohji

Highlighting important text in your PowerPoint slides can be an effective way to draw your audience's attention and emphasize key points. Whether you are presenting complex information or delivering a persuasive pitch, using text highlighting can make your slides more visually engaging and help your message stand out. In this article, we will demonstrate how to highlight text in a PowerPoint presentation in Python using Spire.Presentation for Python.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python. It can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.Presentation

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Presentation for Python on Windows

Highlight Text in PowerPoint Presentation in Python

Spire.Presentation for Python provides a method called IAutoShape.TextFrame.HighLightText(text: str, color: Color, options: TextHighLightingOptions) to highlight specific text within the shapes of a PowerPoint presentation.

Follow the steps below to highlight specified text in your presentation using Spire.Presentation for Python:

Create an instance of the Presentation class.
Load a PowerPoint presentation using the Presentation.LoadFromFile() method.
Create an instance of the TextHighLightingOptions class, and set the text highlighting options such as whole words only and case sensitive through the TextHighLightingOptions.WholeWordsOnly and TextHighLightingOptions.CaseSensitive properties.
Loop through the slides in the presentation and the shapes on each slide.
Check if the current shape is of IAutoShape type.
If the result is true, typecast it to an IAutoShape object.
Highlight all matches of specific text in the shape using the IAutoShape.TextFrame.HighLightText(text: str, color: Color, options: TextHighLightingOptions) method.
Save the result presentation to a new file using the Presentation.SaveToFile() method.

Python

from spire.presentation.common import *
from spire.presentation import *

# Specify the input and output file paths
input_file = "Example.pptx"
output_file = "HighlightText.pptx"

# Create an instance of the Presentation class
ppt = Presentation()
# Load the PowerPoint presentation
ppt.LoadFromFile(input_file)

# Specify the text to highlight
text_to_highlight = "Spire.Presentation"
# Specify the highlight color
highlight_color = Color.get_Yellow()

# Create an instance of the TextHighLightingOptions class
options = TextHighLightingOptions()
# Set the highlight options (case sensitivity and whole word highlighting)
options.WholeWordsOnly = True
options.CaseSensitive = True

# Loop through the slides in the presentation
for slide in ppt.Slides:
    # Loop through the shapes on each slide
    for shape in slide.Shapes:
            # Check if the shape is of IAutoShape type
            if isinstance (shape, IAutoShape):
                # Typecast the shape to an IAutoShape object
                auto_shape = IAutoShape(shape)
                # Search and highlight specified text within the shape
                auto_shape.TextFrame.HighLightText(text_to_highlight, highlight_color, options)

# Save the result presentation to a new PPTX file
ppt.SaveToFile(output_file, FileFormat.Pptx2013)
ppt.Dispose()

Python: Highlight Text in PowerPoint Presentation

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Paragraph and Text

Tagged under

ppt Python Paragraph and Text

Python: Get Coordinates of the Specified Text or Image in PDF

2024-05-21 01:58:08 Written by Administrator

Retrieving the coordinates of text or images within a PDF document can quickly locate specific elements, which is valuable for extracting content from PDFs. This capability also enables adding annotations, marks, or stamps to the desired locations in a PDF, allowing for more advanced document processing and manipulation.

In this article, you will learn how to get coordinates of the specified text or image in a PDF document using Spire.PDF for Python.

Get Coordinates of the Specified Text in PDF in Python
Get Coordinates of the Specified Image in PDF in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Coordinate System in Spire.PDF

When using Spire.PDF to process an existing PDF document, the origin of the coordinate system is located at the top left corner of the page. The X-axis extends horizontally from the origin to the right, and the Y-axis extends vertically downward from the origin (shown as below).

Python: Get Coordinates of the Specified Text or Image in PDF

Get Coordinates of the Specified Text in PDF in Python

To find the coordinates of a specific piece of text within a PDF document, you must first use the PdfTextFinder.Find() method to locate all instances of the target text on a particular page. Once you have found these instances, you can then access the PdfTextFragment.Positions property to retrieve the precise (X, Y) coordinates for each instance of the text.

The steps to get coordinates of the specified text in PDF are as follows.

Create a PdfDocument object.
Load a PDF document from a specified path.
Get a specific page from the document.
Create a PdfTextFinder object.
Specify find options through PdfTextFinder.Options property.
Search for a string within the page using PdfTextFinder.Find() method.
Get a specific instance of the search results.
Get X and Y coordinates of the text through PdfTextFragment.Positions[0].X and PdfTextFragment.Positions[0].Y properties.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Privacy Policy.pdf")

# Get a specific page
page = doc.Pages.get_Item(0)

# Create a PdfTextFinder object
textFinder = PdfTextFinder(page)

# Specify find options
findOptions = PdfTextFindOptions()
findOptions.Parameter = TextFindParameter.IgnoreCase
findOptions.Parameter = TextFindParameter.WholeWord
textFinder.Options = findOptions
 
# Search for the string "PRIVACY POLICY" within the page
findResults = textFinder.Find("PRIVACY POLICY") 

# Get the first instance of the results
result = findResults[0]

# Get X/Y coordinates of the found text
x = int(result.Positions[0].X)
y = int(result.Positions[0].Y)
print("The coordinates of the first instance of the found text are:", (x, y))

# Dispose resources
doc.Dispose()

Python: Get Coordinates of the Specified Text or Image in PDF

Get Coordinates of the Specified Image in PDF in Python

Spire.PDF for Python provides the PdfImageHelper class, which allows users to extract image details from a specific page within a PDF file. By doing so, you can leverage the PdfImageInfo.Bounds property to retrieve the (X, Y) coordinates of an individual image.

The steps to get coordinates of the specified image in PDF are as follows.

Create a PdfDocument object.
Load a PDF document from a specified path.
Get a specific page from the document.
Create a PdfImageHelper object.
Get the image information from the page using PdfImageHelper.GetImagesInfo() method.
Get X and Y coordinates of a specific image through PdfImageInfo.Bounds property.

Python

from spire.pdf.common import *
from spire.pdf import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Privacy Policy.pdf")

# Get a specific page 
page = doc.Pages.get_Item(0)

# Create a PdfImageHelper object
imageHelper = PdfImageHelper()

# Get image information from the page
imageInformation = imageHelper.GetImagesInfo(page)

# Get X/Y coordinates of a specific image
x = int(imageInformation[0].Bounds.X)
y = int(imageInformation[0].Bounds.Y)
print("The coordinates of the specified image are:", (x, y))

# Dispose resources
doc.Dispose()

Python: Get Coordinates of the Specified Text or Image in PDF

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Published in Extract/Read

Tagged under

pdf Python Extract Read

Efficient PDF Compression in Python (A Practical Guide)

2024-05-20 01:17:35 Written by Koohji

Compress PDF in Python using Spire.PDF

Large PDF files can slow down email delivery, break upload limits, and consume unnecessary storage. This is especially common in PDFs that include high-resolution scans, images, or embedded fonts. If you're working with Python and need to automate PDF compression without compromising quality, this guide will help you get started.

In this tutorial, you’ll learn how to compress PDF files in Python using the Spire.PDF for Python library. We'll cover several effective techniques, including image recompression, font optimization, metadata removal, and batch compression—perfect for web, backend, or desktop applications.

Common Scenarios Requiring PDF Compression
Prerequisites
Practical PDF Compression Techniques in Python
Summary

Common Scenarios Requiring PDF Compression

Reducing the size of PDF documents is often essential in the following situations:

Use Case	Benefit
Email Attachments	Avoid size limits and improve deliverability
Web Uploads	Reduce upload time and server storage
Mobile Access	Faster loading and less data consumption
Cloud Archiving	Lower storage cost for backups
App Submissions	Meet strict file size limits

Prerequisites

Before you begin compressing PDFs with Python, make sure the following requirements are met:

Python 3.7 or above
Ensure that Python (version 3.7 or later) is installed on your system. You can download it from the official Python website.
Spire.PDF for Python
This is a powerful PDF library that allows you to programmatically create, manipulate, and compress PDF documents—without relying on external software like Adobe Acrobat.

To install Spire.PDF for Python, run the following command in your terminal or command prompt:

pip install spire.pdf

Need help with the installation? See our step-by-step guide: How to Install Spire.PDF for Python on Windows_

Practical PDF Compression Techniques in Python

In this section, you'll explore five practical techniques for reducing PDF file size:

Font compression and unembedding
Image compression
Full-document compression
Metadata and attachment removal
Batch compressing multiple PDFs

Font Compression and Unembedding

Fonts embedded in a PDF—especially those from large font libraries or multilingual character sets—can significantly increase the file size. Spire.PDF allows you to:

Compress embedded fonts to minimize space usage
Unembed fonts that are not essential for rendering

from spire.pdf import *

# Create a PdfCompressor object and load the PDF file
compressor = PdfCompressor("C:/Users/Administrator/Documents/Example.pdf")

# Get the OptimizationOptions object
compression_options = compressor.OptimizationOptions

# Enable font compression
compression_options.SetIsCompressFonts(True)

# Optional: unembed fonts to further reduce size
# compression_options.SetIsUnembedFonts(True)

# Compress the PDF and save the result
compressor.CompressToFile("CompressFonts.pdf")

Image Compression

Spire.PDF lets you reduce the size of all images in a PDF by creating a PdfCompressor instance, enabling the image resizing and compression options, and specifying the image quality level. This approach applies compression uniformly across the entire document.

from spire.pdf import *

# Create a PdfCompressor object and load the PDF file
compressor = PdfCompressor("C:/Users/Administrator/Documents/Example.pdf")

# Get the OptimizationOptions object
compression_options = compressor.OptimizationOptions

# Enable image resizing
compression_options.SetResizeImages(True)

# Enable image compression
compression_options.SetIsCompressImage(True)

# Set image quality (available options: Low, Medium, High)
compression_options.SetImageQuality(ImageQuality.Medium)

# Compress and save the PDF file
compressor.CompressToFile("Compressed.pdf")

Compress PDF Images in Python with Spire.PDF

Full Document Compression

Beyond optimizing individual elements, Spire.PDF also supports full-document compression. By adjusting the document's CompressionLevel and disabling incremental updates, you can apply comprehensive optimization to reduce overall file size.

from spire.pdf import *

# Create a PdfDocument object
pdf = PdfDocument()

# Load the PDF file
pdf.LoadFromFile("C:/Users/Administrator/Documents/Example.pdf")

# Disable incremental update
pdf.FileInfo.IncrementalUpdate = False

# Set the compression level to the highest
pdf.CompressionLevel = PdfCompressionLevel.Best

# Save the optimized PDF
pdf.SaveToFile("OptimizeDocumentContent.pdf")
pdf.Close()

Removing Metadata and Attachments

Cleaning up metadata and removing embedded attachments is a quick way to reduce PDF size. Spire.PDF lets you remove unnecessary information like author/title fields and attached files:

from spire.pdf import *

# Load the PDF
pdf = PdfDocument()
pdf.LoadFromFile("Example.pdf")

# Disable the incremental update
pdf.FileInfo.IncrementalUpdate = False

# Remove metadata
pdf.DocumentInformation.Author = ""
pdf.DocumentInformation.Title = ""

# Remove attachments
pdf.Attachments.Clear()

# Save the optimized PDF
pdf.SaveToFile("Cleaned.pdf")
pdf.Close()

Batch Compressing Multiple PDFs

You can compress multiple PDFs at once by looping through files in a folder and applying the same optimization settings:

import os
from spire.pdf import *

# Folder containing the PDF files to compress
input_folder = "C:/PDFs/" 

# Loop through all files in the input folder
for file in os.listdir(input_folder):
    # Process only PDF files
    if file.endswith(".pdf"):  
        # Create a PdfCompressor instance and load the file
        compressor = PdfCompressor(os.path.join(input_folder, file))
        
        # Access compression options
        opt = compressor.OptimizationOptions        
        # Enable image resizing
        opt.SetResizeImages(True)        
        # Enable image compression
        opt.SetIsCompressImage(True)        
        # Set image quality to medium (options: Low, Medium, High)
        opt.SetImageQuality(ImageQuality.Medium)
        
        # Define output file path with "compressed_" prefix
        output_path = os.path.join(input_folder, "compressed_" + file)        
        # Perform compression and save the result
        compressor.CompressToFile(output_path)

Summary

Reducing the size of PDF files is a practical step toward faster workflows, especially when dealing with email sharing, web uploads, and large-scale archiving. With Spire.PDF for Python, developers can implement smart compression techniques—ranging from optimizing images and fonts to stripping unnecessary elements like metadata and attachments.

Whether you're building automation scripts, integrating PDF handling into backend services, or preparing documents for long-term storage, these tools give you the flexibility to control file size without losing visual quality. By combining multiple strategies—like full-document compression and batch processing—you can keep your PDFs lightweight, efficient, and ready for distribution across platforms.

Want to explore more ways to work with PDFs in Python? Explore the full range of Spire.PDF for Python tutorials to learn how to merge/split PDFs, convert PDF to PDF/A, add password protection, and more.

Frequently Asked Questions

Q1: Can I use Spire.PDF for Python on Linux or macOS?

A1: Yes. Spire.PDF for Python is compatible with Windows, Linux, and macOS.

Q2: Is Spire.PDF for Python free?

A2: Spire.PDF for Python offers a free version suitable for small-scale and non-commercial use. For full functionality, including unrestricted use in commercial applications, a commercial version is available. You can request a free 30-day trial license to explore all its premium features.

Q3: Will compressing the PDF reduce the visual quality?

A3: Not necessarily. Spire.PDF’s compression methods are designed to preserve visual fidelity while optimizing file size. You can fine-tune image quality or leave it to the default settings.

Published in Document Operation

Tagged under

pdf Python Document Operation

News Category

Knowledgebase (2329)

Children categories

Install Spire.Presentation for Python

Highlight Text in PowerPoint Presentation in Python

Apply for a Temporary License

Install Spire.PDF for Python

Coordinate System in Spire.PDF

Get Coordinates of the Specified Text in PDF in Python

Get Coordinates of the Specified Image in PDF in Python

Apply for a Temporary License

Table of Contents

Common Scenarios Requiring PDF Compression

Prerequisites

Practical PDF Compression Techniques in Python

Font Compression and Unembedding

Image Compression

Full Document Compression

Removing Metadata and Attachments

Batch Compressing Multiple PDFs

Summary

Frequently Asked Questions

Q1: Can I use Spire.PDF for Python on Linux or macOS?

Q2: Is Spire.PDF for Python free?

Q3: Will compressing the PDF reduce the visual quality?

More...