
Python: Convert RTF to PDF, HTML

2023-12-25 01:23:16 Written by Koohji

RTF (Rich Text Format) is a versatile file format that can be opened and viewed by various word processing software. It supports a wide range of text formatting options, such as font style, size, color, tables, images, and more. When working with RTF files, you may sometimes need to convert them to PDF files for better sharing and printing, or to HTML format for publishing on the web. In this article, you will learn how to convert RTF to PDF or HTML with Python using Spire.Doc for Python.

Convert RTF to PDF in Python
Convert RTF to HTML in Python

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows
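Before running the examples, it can help to confirm that the packages are visible to the interpreter you are actually using. The following is a minimal sketch that relies only on the standard library. The helper name installed_version is hypothetical, and the distribution names are assumed to match the names used with pip above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumed to match the names used with pip.
    for name in ("Spire.Doc", "plum-dispatch"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If a package is reported as missing although pip succeeded, pip may have installed it into a different Python environment.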

Convert RTF to PDF in Python

To convert an RTF file to PDF, load a file with the .rtf extension and then save it as a PDF file using the Document.SaveToFile(fileName, FileFormat.PDF) method. The detailed steps are as follows.

Create a Document object.
Load an RTF file using the Document.LoadFromFile() method.
Save the RTF file as a PDF file using the Document.SaveToFile(fileName, FileFormat.PDF) method.

Python

from spire.doc import *
from spire.doc.common import *

inputFile = "input.rtf"
outputFile = "RtfToPDF.pdf"

# Create a Document object
doc = Document()

# Load an RTF file from disk
doc.LoadFromFile(inputFile)

# Save the RTF file as a PDF file
doc.SaveToFile(outputFile, FileFormat.PDF)
doc.Close()
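
When input files come from users or other systems, a file with an .rtf extension is not always real RTF. As a hedged sketch (not part of Spire.Doc), the hypothetical helper below checks for the "{\rtf" header that RTF documents begin with, so that obviously wrong inputs can be rejected before calling doc.LoadFromFile().

```python
def looks_like_rtf(path):
    """Return True if the file begins with the RTF header bytes."""
    # RTF documents open with the group "{\rtf" followed by a version number.
    with open(path, "rb") as f:
        return f.read(5) == b"{\\rtf"
```

This is only a quick signature check; it does not validate the rest of the document.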

Convert RTF to HTML in Python

Spire.Doc for Python also allows you to use the Document.SaveToFile(fileName, FileFormat.Html) method to convert the loaded RTF file to HTML format. The following are the detailed steps.

Create a Document object.
Load an RTF file using the Document.LoadFromFile() method.
Save the RTF file in HTML format using the Document.SaveToFile(fileName, FileFormat.Html) method.

Python

from spire.doc import *
from spire.doc.common import *

inputFile = "input.rtf"
outputFile = "RtfToHtml.html"

# Create a Document object
doc = Document()

# Load an RTF file from disk
doc.LoadFromFile(inputFile)

# Save the RTF file in HTML format
doc.SaveToFile(outputFile, FileFormat.Html)
doc.Close()
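
The same load-and-save calls can be applied to a whole folder of RTF files. The sketch below is one possible arrangement, not an official recipe. The function names plan_outputs and convert_folder are hypothetical, and it assumes that Document and FileFormat can be imported by name from spire.doc, as the wildcard imports in the examples suggest.

```python
from pathlib import Path


def plan_outputs(input_folder, output_folder, extension):
    """Map every .rtf file in input_folder to an output path with the given extension."""
    out_dir = Path(output_folder)
    return {
        rtf: out_dir / (rtf.stem + extension)
        for rtf in sorted(Path(input_folder).glob("*.rtf"))
    }


def convert_folder(input_folder, output_folder, extension=".pdf"):
    """Convert each RTF file with the same calls used in the examples above."""
    # Imported here so that plan_outputs() works without Spire.Doc installed.
    # Assumption: both names are importable directly from spire.doc.
    from spire.doc import Document, FileFormat

    # Only the two target formats covered in this article are handled.
    formats = {".pdf": FileFormat.PDF, ".html": FileFormat.Html}
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    for source, target in plan_outputs(input_folder, output_folder, extension).items():
        doc = Document()
        doc.LoadFromFile(str(source))
        doc.SaveToFile(str(target), formats[extension])
        doc.Close()
```

Keeping the path planning separate from the conversion makes it possible to preview which files would be written before any conversion runs.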

Get a Free License

To fully experience the capabilities of Spire.Doc for Python without any evaluation limitations, you can request a free 30-day trial license.

Python: Convert PDF to PDF/A and Vice Versa

2023-12-22 01:04:24 Written by Koohji

PDF/A is a specialized format designed specifically for long-term archiving and preservation of electronic documents. It guarantees that the content, structure, and visual appearance of the documents remain unchanged over time. By converting PDF files to PDF/A format, you ensure the long-term accessibility of the documents, regardless of software, operating systems, or future technological advancements. Conversely, converting PDF/A files to standard PDF format makes it easier to edit, share, and collaborate on the documents, ensuring better compatibility across different applications, devices, and platforms. In this article, we will explain how to convert PDF to PDF/A and vice versa in Python using Spire.PDF for Python.

Convert PDF to PDF/A in Python
Convert PDF/A to PDF in Python

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

Package Manager

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Convert PDF to PDF/A in Python

The PdfStandardsConverter class provided by Spire.PDF for Python supports converting PDF to various PDF/A formats, including PDF/A-1a, 2a, 3a, 1b, 2b and 3b. Moreover, it also supports converting PDF to PDF/X-1a:2001. The detailed steps are as follows.

Specify the input file path and output folder.
Create a PdfStandardsConverter object, passing the input file path to the class constructor.
Convert the input file to a PDF/A-1a file using the PdfStandardsConverter.ToPdfA1A() method.
Convert the input file to a PDF/A-1b file using the PdfStandardsConverter.ToPdfA1B() method.
Convert the input file to a PDF/A-2a file using the PdfStandardsConverter.ToPdfA2A() method.
Convert the input file to a PDF/A-2b file using the PdfStandardsConverter.ToPdfA2B() method.
Convert the input file to a PDF/A-3a file using the PdfStandardsConverter.ToPdfA3A() method.
Convert the input file to a PDF/A-3b file using the PdfStandardsConverter.ToPdfA3B() method.
Convert the input file to a PDF/X-1a:2001 file using the PdfStandardsConverter.ToPdfX1A2001() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Specify the input file path and output folder
inputFile = "Sample.pdf"
outputFolder = "Output/"

# Create an object of the PdfStandardsConverter class
converter = PdfStandardsConverter(inputFile)

# Convert the input file to PdfA1A
converter.ToPdfA1A(outputFolder + "ToPdfA1A.pdf")

# Convert the input file to PdfA1B
converter.ToPdfA1B(outputFolder + "ToPdfA1B.pdf")

# Convert the input file to PdfA2A
converter.ToPdfA2A(outputFolder + "ToPdfA2A.pdf")

# Convert the input file to PdfA2B
converter.ToPdfA2B(outputFolder + "ToPdfA2B.pdf")

# Convert the input file to PdfA3A
converter.ToPdfA3A(outputFolder + "ToPdfA3A.pdf")

# Convert the input file to PdfA3B
converter.ToPdfA3B(outputFolder + "ToPdfA3B.pdf")

# Convert the input file to PDF/X-1a:2001
converter.ToPdfX1A2001(outputFolder + "ToPdfX1a.pdf")
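
The seven calls above differ only in the method name and the output file name, so they can also be driven from a table. The following is a minimal sketch of that idea. The helper convert_to_standards and the CONVERSIONS table are hypothetical, the method names are the ones listed in the steps above, and the folder is created up front in case the library does not create it.

```python
import os

# Method names come from the steps above; file names mirror the example.
CONVERSIONS = {
    "ToPdfA1A": "ToPdfA1A.pdf",
    "ToPdfA1B": "ToPdfA1B.pdf",
    "ToPdfA2A": "ToPdfA2A.pdf",
    "ToPdfA2B": "ToPdfA2B.pdf",
    "ToPdfA3A": "ToPdfA3A.pdf",
    "ToPdfA3B": "ToPdfA3B.pdf",
    "ToPdfX1A2001": "ToPdfX1a.pdf",
}


def convert_to_standards(converter, output_folder, conversions=CONVERSIONS):
    """Call each named conversion method on converter and return the target paths."""
    os.makedirs(output_folder, exist_ok=True)  # create the folder if it is missing
    written = []
    for method_name, file_name in conversions.items():
        target = os.path.join(output_folder, file_name)
        # Look the method up by name and pass it the output path.
        getattr(converter, method_name)(target)
        written.append(target)
    return written
```

With Spire.PDF installed, it could be called as convert_to_standards(PdfStandardsConverter("Sample.pdf"), "Output"). Passing a smaller dictionary limits the run to the standards you actually need.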

Convert PDF/A to PDF in Python

To convert a PDF/A file back to a standard PDF format, you need to create a new standard PDF file, and then draw the page content of the PDF/A file to the newly created PDF file. The detailed steps are as follows.

Create a PdfDocument object.
Load a PDF/A file using the PdfDocument.LoadFromFile() method.
Create a PdfNewDocument object and set its compression level to none.
Loop through the pages in the original PDF/A file.
Add pages to the newly created PDF using the PdfDocumentBase.Pages.Add() method.
Draw the page content of the original PDF/A file onto the corresponding pages of the newly created PDF using the PdfPageBase.CreateTemplate().Draw() method.
Create a Stream object and then save the new PDF to the stream using the PdfNewDocument.Save() method.

Python

from spire.pdf.common import *
from spire.pdf import *

# Specify the input and output file paths
inputFile = "Output/ToPdfA1A.pdf"
outputFile = "PdfAToPdf.pdf"

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load a PDF file
doc.LoadFromFile(inputFile)

# Create a new standard PDF file
newDoc = PdfNewDocument()
newDoc.CompressionLevel = PdfCompressionLevel.none

# Add pages to the newly created PDF and draw the page content of the loaded PDF onto the corresponding pages of the newly created PDF
for i in range(doc.Pages.Count):
    page = doc.Pages.get_Item(i)
    size = page.Size
    p = newDoc.Pages.Add(size, PdfMargins(0.0))
    page.CreateTemplate().Draw(p, 0.0, 0.0)

# Save the new PDF to a PDF file
fileStream = Stream(outputFile)
newDoc.Save(fileStream)
fileStream.Close()
newDoc.Close(True)
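
To see whether a file still declares itself as PDF/A, you can look for the pdfaid:part and pdfaid:conformance entries that PDF/A files carry in their XMP metadata. The sketch below is a heuristic written for illustration, not a Spire.PDF feature: the helper pdfa_claim is hypothetical, it assumes the metadata is stored uncompressed, and it reports only what the file claims rather than validating conformance.

```python
import re

# XMP may store the values as elements or as attributes; both forms are matched.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def pdfa_claim(pdf_bytes):
    """Return (part, conformance) declared in XMP metadata, or None if absent."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    level = conf.group(1).decode().upper() if conf else None
    return (part.group(1).decode(), level)
```

For example, calling pdfa_claim(open("PdfAToPdf.pdf", "rb").read()) on the converted file and on the original shows whether the declaration differs between the two.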

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

How to Scan and Recognize Text from Images in Java Projects

2023-12-21 03:17:23 Written by Koohji

OCR (Optical Character Recognition) technology is the primary method to extract text from images. Spire.OCR for Java provides developers with a quick and efficient solution to scan and extract text from images in Java projects. This article will guide you on how to use Spire.OCR for Java to recognize and extract text from images in Java projects.

Obtaining Spire.OCR for Java

To scan and recognize text in images using Spire.OCR for Java, you need to first import the Spire.OCR.jar file along with other relevant dependencies into your Java project.

You can download Spire.OCR for Java from our website. If you are using Maven, you can add the following code to your project's pom.xml file to import the JAR file into your application.

Package Manager

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.ocr</artifactId>
        <version>2.1.5</version>
    </dependency>
</dependencies>

Please download the other dependencies based on your operating system:

Linux

Windows x64

Install Dependencies

Step 1: Create a Java project in IntelliJ IDEA.

Step 2: Go to File > Project Structure > Modules > Dependencies in the menu and add Spire.OCR.jar as a project dependency.

Step 3: Download and extract the other dependency files. Copy all the files from the extracted "dependencies" folder to your project directory.

Scanning and Recognizing Text from a Local Image

Java

import com.spire.ocr.OcrScanner;
import java.io.*;

public class ScanLocalImage {
    public static void main(String[] args) throws Exception {
        // Specify the path to the dependency files
        String dependencies = "dependencies/";
        // Specify the path to the image file to be scanned
        String imageFile = "data/Sample.png";
        // Specify the path to the output file
        String outputFile = "ScanLocalImage_out.txt";
        
        // Create an OcrScanner object
        OcrScanner scanner = new OcrScanner();
        // Set the dependency file path for the OcrScanner object
        scanner.setDependencies(dependencies);
        
        // Use the OcrScanner object to scan the specified image file
        scanner.scan(imageFile);
        
        // Get the scanned text content
        String scannedText = scanner.getText().toString();
        
        // Write the scanned text to the output file. FileWriter overwrites an
        // existing file, and try-with-resources closes the writer even if
        // writing fails.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
            writer.write(scannedText);
        }
    }
}

Specify the Language File to Scan and Recognize Text from an Image

Java

import com.spire.ocr.OcrScanner;
import java.io.*;

public class ScanImageWithLanguageSelection {
    public static void main(String[] args) throws Exception {
        // Specify the path to the dependency files
        String dependencies = "dependencies/";
        // Specify the path to the language file
        String languageFile = "data/japandata";
        // Specify the path to the image file to be scanned
        String imageFile = "data/JapaneseSample.png";
        // Specify the path to the output file
        String outputFile = "ScanImageWithLanguageSelection_out.txt";
        
        // Create an OcrScanner object
        OcrScanner scanner = new OcrScanner();
        // Set the dependency file path for the OcrScanner object
        scanner.setDependencies(dependencies);
        // Load the specified language file
        scanner.loadLanguageFile(languageFile);
        
        // Use the OcrScanner object to scan the specified image file
        scanner.scan(imageFile);
        // Get the scanned text content
        String scannedText = scanner.getText().toString();

        // Write the scanned text to the output file. FileWriter overwrites an
        // existing file, and try-with-resources closes the writer even if
        // writing fails.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
            writer.write(scannedText);
        }
    }
}

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
