Spire.Office Knowledgebase Page 10 | E-iceblue

Java Convert Byte Array to PDF

In modern Java applications, PDF data is not always stored as files on disk. Instead, it may be transmitted over a network, returned by a REST API, or stored as a byte array in a database. In such cases, you’ll often need to convert a byte array back into a PDF file or even generate a new PDF from plain text bytes.

This tutorial will walk you through both scenarios using Spire.PDF for Java, a powerful library for working with PDF documents.

Table of Contents:

Getting Started with Spire.PDF for Java

Spire.PDF is a powerful and feature-rich API that allows Java developers to create, read, edit, convert, and print PDF documents without any dependencies on Adobe Acrobat.

Key Features:

To get started, download Spire.PDF for Java from our website and add the JAR files to your project's build path. If you’re using Maven, include the following dependency in your pom.xml.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>12.4.4</version>
    </dependency>
</dependencies>

Once set up, you can now proceed to convert byte arrays to PDFs and perform other PDF-related operations.

Understanding PDF Bytes vs. Text Bytes

Before coding, it’s important to distinguish between two very different kinds of byte arrays :

  • PDF File Bytes : These represent the actual binary structure of a valid PDF document. They always start with %PDF-1.x and contain objects, cross-reference tables, and streams. Such byte arrays can be loaded directly into a PdfDocument.
  • Text Bytes : These are simply ASCII or UTF-8 encodings of characters. For example,
byte[] bytes = {84, 104, 105, 115};  
System.out.println(new String(bytes)); // Output: "This"

Such arrays are not valid PDFs, but you can create a new PDF and write the text into it.

Loading PDF from Byte Array in Java

Suppose you want to download a PDF from a URL and work with it in memory as a byte array. With Spire.PDF for Java, you can easily load and save it back as a PDF document.

import com.spire.pdf.PdfDocument;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LoadPdfFromByteArray throws Exception{

    public static void main(String[] args) {
        
        // The PDF URL
        String fileUrl = "https://www.e-iceblue.com/resource/sample.pdf";

        // Download PDF into a byte array
        byte[] pdfBytes = downloadPdfAsBytes(fileUrl);

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load PDF from byte array
        doc.loadFromStream(new ByteArrayInputStream(pdfBytes));

        // Save the document locally
        doc.saveToFile("downloaded.pdf");
        doc.close();
    }

    // Helper method: download file as byte[]
    private static byte[] downloadPdfAsBytes(String fileUrl) throws Exception {
        URL url = new URL(fileUrl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        InputStream inputStream = conn.getInputStream();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        byte[] data = new byte[4096];
        int nRead;
        while ((nRead = inputStream.read(data, 0, data.length)) != -1) {
            buffer.write(data, 0, nRead);
        }

        buffer.flush();
        inputStream.close();
        conn.disconnect();

        return buffer.toByteArray();
    }
}

How this works

  1. Make an HTTP request to fetch the PDF file.
  2. Convert the InputStream into a byte array using ByteArrayOutputStream .
  3. Pass the byte array into Spire.PDF via loadFromStream .
  4. Save or manipulate the document as needed.

Output:

Java Load PDF from Byte Array

Creating PDF from Text Bytes in Java

If you only have plain text bytes (e.g., This document is created from text bytes.), you can decode them into a string and then draw the text onto a new PDF document.

import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import java.awt.*;

public class TextFromBytesToPdf {
    public static void main(String[] args) {

        // Your text bytes
        byte[] byteArray = {
                84, 104, 105, 115, 32,
                100, 111, 99, 117, 109, 101, 110, 116, 32,
                105, 115, 32,
                99, 114, 101, 97, 116, 101, 100, 32,
                102, 114, 111, 109, 32,
                116, 101, 120, 116, 32,
                98, 121, 116, 101, 115, 46
        };
        String text = new String(byteArray);

        // Create a PDF document
        PdfDocument doc = new PdfDocument();

        // Configure the page settings
        doc.getPageSettings().setSize(PdfPageSize.A4);
        doc.getPageSettings().setMargins(40f);

        // Add a page
        PdfPageBase page = doc.getPages().add();

        // Draw the string onto PDF
        PdfFont font = new PdfFont(PdfFontFamily.Helvetica, 20f);
        PdfSolidBrush brush = new PdfSolidBrush(new PdfRGBColor(Color.black));
        page.getCanvas().drawString(text, font, brush, 20, 40);

        // Save the document to a PDF file
        doc.saveToFile("TextBytes.pdf");
        doc.close();
    }
}

This will produce a new PDF named TextBytes.pdf (shown below) containing the sentence represented by your byte array.

Java Create PDF from Text Bytes

You might be interested in: How to Generate PDF Documents in Java

Common Pitfalls to Avoid

When converting byte arrays to PDFs, watch out for these issues:

  • Confusing plain text with PDF bytes

Not every byte array is a valid PDF. Unless the array starts with %PDF-1.x and contains the proper structure, you can’t load it directly with PdfDocument.loadFromStream .

  • Incorrect encoding

If your text bytes are in UTF-16, ISO-8859-1, or another encoding, you need to specify the charset when creating a string:

String text = new String(byteArray, StandardCharsets.UTF_8);
  • Large byte arrays

When dealing with large PDFs, consider streaming instead of holding everything in memory to avoid OutOfMemoryError .

  • Forgetting to close documents

Always call doc.close() to release resources after saving or processing a PDF.

Frequently Asked Questions

Q1. Can I store a PDF as a byte array in a database?

Yes. You can store a PDF as a BLOB in a relational database. Later, you can retrieve it, load it into a PdfDocument , and save or manipulate it.

Q2. How do I check if a byte array is a valid PDF?

Check if the array begins with the %PDF- header. You can do:

String header = new String(Arrays.copyOfRange(bytes, 0, 5));
if (header.startsWith("%PDF-")) {
    // valid PDF
}

Q3. Can Spire.PDF load a PDF directly from an InputStream?

Yes. Instead of converting to a byte array, you can pass the InputStream directly to loadFromStream() .

Q4. Can I convert a PdfDocument back into a byte array?

You can save the document into a ByteArrayOutputStream instead of a file:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.saveToStream(baos);
byte[] pdfBytes = baos.toByteArray();

Q5. What if my byte array contains images instead of text or PDF?

In that case, you’ll need to create a new PDF and insert the image using Spire.PDF’s drawing APIs.

Conclusion

In this article, we explored how to efficiently convert byte arrays to PDF documents using Spire.PDF for Java. Whether you're loading existing PDF files from byte arrays retrieved via APIs or creating new PDFs from plain text bytes, Spire.PDF provides a robust solution to meet your needs.

We covered essential concepts, including the distinction between PDF file bytes and text bytes, and highlighted common pitfalls to avoid during the conversion process. With the right understanding and tools, you can seamlessly integrate PDF functionalities into your Java applications, enhancing your ability to manage and manipulate document data.

For further exploration, consider experimenting with additional features of Spire.PDF, such as editing, encrypting, and converting PDFs to other formats. The possibilities are extensive, and mastering these techniques will undoubtedly improve your development skills and project outcomes.

Scan QR codes and barcodes in ASP.NET Core using C# and Spire.Barcode

Many business applications today need the ability to scan barcodes and QR codes in ASP.NET environments. From ticket validation and payment processing to inventory management, an ASP.NET QR code scanner or barcode reading feature can greatly improve efficiency and accuracy for both web and enterprise systems.

This tutorial demonstrates how to build a complete solution to scan barcodes in ASP.NET with C# code using Spire.Barcode for .NET. We’ll create an ASP.NET Core web application that can read both QR codes and various barcode formats from uploaded images, delivering high recognition accuracy and easy integration into existing projects.

Guide Overview


1. Project Setup

Step 1: Create the Project

Create a new ASP.NET Core Razor Pages project, which will serve as the foundation for the scanning feature. Use the following command to create a new project or manually configure it in Visual Studio:

dotnet new webapp -n QrBarcodeScanner
cd QrBarcodeScanner

Step 2: Install Spire.Barcode for .NET

Install the Spire.Barcode for .NET NuGet package, which supports decoding a wide range of barcode types with a straightforward API. Search for the package in the NuGet Package Manager or use the command below to install it:

dotnet add package Spire.Barcode

Spire.Barcode for .NET offers built-in support for both QR codes and multiple barcode formats such as Code128, EAN-13, and Code39, making it suitable for ASP.NET Core integration without requiring additional image processing libraries. To find out all the supported barcode types, refer to the BarcodeType API reference.

You can also use Free Spire.Barcode for .NET for smaller projects.


2. Implementing QR Code and Barcode Scanning Feature with C# in ASP.NET

A reliable scanning feature involves two main parts:

  1. Backend logic that processes and decodes uploaded images.
  2. A simple web interface that lets users upload files for scanning.

We will first focus on the backend implementation to ensure the scanning process works correctly, then connect it to a minimal Razor Page frontend.

Backend: QR & Barcode Scanning Logic with Spire.Barcode

The backend code reads the uploaded file into memory and processes it with Spire.Barcode, using either a memory stream or a file path. The scanned result is then returned. This implementation supports QR codes and other barcode types without requiring format-specific logic.

Index.cshtml.cs

using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Spire.Barcode;

public class IndexModel : PageModel
{
    [BindProperty]
    public IFormFile Upload { get; set; }  // Uploaded file

    public string Result { get; set; }     // Scanning result
    public string UploadedImageBase64 { get; set; } // Base64 string for preview

    public void OnPost()
    {
        if (Upload != null && Upload.Length > 0)
        {
            using (var ms = new MemoryStream())
            {
                // Read the uploaded file into memory
                Upload.CopyTo(ms);

                // Convert the image to Base64 for displaying in HTML <img>
                UploadedImageBase64 = "data:" + Upload.ContentType + ";base64," +
                                      Convert.ToBase64String(ms.ToArray());

                // Reset the stream position before scanning
                ms.Position = 0;

                // Scan the barcode or QR code from the stream
                try
                {
                    string[] scanned = BarcodeScanner.Scan(ms);
                    // Return the scanned result
                    Result = scanned != null && scanned.Length > 0
                        ? string.Join(", ", scanned)
                        : "No code detected.";
                }
                catch (Exception ex)
                {
                    Result = "Error while scanning: " + ex.Message;
                }
            }
        }
    }
}

Explanation of Key Classes and Methods

  • BarcodeScanner: A static class in Spire.Barcode that decodes images containing QR codes or barcodes.
  • BarcodeScanner.Scan(Stream imageStream): Scans an uploaded image directly from a memory stream and returns an array of decoded strings. This method scans all barcodes in the given image.
  • Supplementary methods (optional):
    • BarcodeScanner.Scan(string imagePath): Scans an image from a file path.
    • BarcodeScanner.ScanInfo(string imagePath): Scans an image from a file path and returns additional barcode information such as type, location, and data.

These methods can be used in different ways, depending on the application requirements.

Frontend: QR & Barcode Upload & Scanning Result Interface

The following page design provides a simple upload form where users can submit an image containing a QR code or barcode. Once uploaded, the image is displayed along with the recognized result, which can be copied with a single click. The layout is intentionally kept minimal for fast testing, yet styled for a clear and polished presentation.

Index.cshtml

@page
@model IndexModel
@{
    ViewData["Title"] = "QR & Barcode Scanner";
}

<div style="max-width:420px;margin:40px auto;padding:20px;border:1px solid #ccc;border-radius:8px;background:#f9f9f9;">
    <h2>QR & Barcode Scanner</h2>
    <form method="post" enctype="multipart/form-data" id="uploadForm">
        <input type="file" name="upload" accept="image/*" required onchange="this.form.submit()" style="margin:10px 0;" />
    </form>

    @if (!string.IsNullOrEmpty(Model.UploadedImageBase64))
    {
        <div style="margin-top:15px;text-align:center;">
            <img src="/@Model.UploadedImageBase64" style="width:300px;height:300px;object-fit:contain;border:1px solid #ddd;background:#fff;" />
        </div>
    }

    @if (!string.IsNullOrEmpty(Model.Result))
    {
        <div style="margin-top:15px;padding:10px;background:#e8f5e9;border-radius:6px;">
            <b>Scan Result:</b>
            <p id="scanText">@Model.Result</p>
            <button type="button" onclick="navigator.clipboard.writeText(scanText.innerText)" style="background:#28a745;color:#fff;padding:6px 10px;border:none;border-radius:4px;">Copy</button>
        </div>
    }
</div>

Below is a screenshot showing the scan page after successfully recognizing both a QR code and a Code128 barcode, with the results displayed and a one-click copy button available.

ASP.NET Core QR code and Code128 barcode scan page with recognized results and copy button

This ASP.NET Core application can scan QR codes and other barcodes from uploaded images. If you're looking to generate QR codes or barcodes, check out How to Generate QR Codes in ASP.NET Core.


3. Testing and Troubleshooting

After running the application, test the scanning feature with:

  • A QR code image containing a URL or plain text.
  • A barcode image such as Code128 or EAN-13.

If recognition fails:

  • Ensure the image has good contrast and minimal distortion.
  • Use images of reasonable resolution (not excessively large or pixelated).
  • Test with different file formats such as JPG, PNG, or BMP.
  • Avoid images with reflections, glare, or low lighting.
  • When scanning multiple barcodes in one image, ensure each code is clearly separated to improve recognition accuracy.

A good practice is to maintain a small library of sample QR codes and barcodes to test regularly after making code changes.


4. Extending to Other .NET Applications

The barcode scanning logic in this tutorial works the same way across different .NET application types — only the way you supply the image file changes. This makes it easy to reuse the core decoding method, BarcodeScanner.Scan(), in various environments such as:

  • ASP.NET Core MVC controllers or Web API endpoints
  • Desktop applications like WinForms or WPF
  • Console utilities for batch processing

Example: Minimal ASP.NET Core Web API Endpoint — receives an image file via HTTP POST and returns decoded results as JSON:

[ApiController]
[Route("api/[controller]")]
public class ScanController : ControllerBase
{
    [HttpPost]
    public IActionResult Scan(IFormFile file)
    {
        if (file == null) return BadRequest("No file uploaded");
        using var ms = new MemoryStream();
        file.CopyTo(ms);
        ms.Position = 0;
        string[] results = BarcodeScanner.Scan(ms);
        return Ok(results);
    }
}

Example: Console application — scans a local image file and prints the decoded text:

string[] result = BarcodeScanner.Scan(@"C:\path\to\image.png");
Console.WriteLine(string.Join(", ", result));

This flexibility makes it simple for developers to quickly add QR code and barcode scanning to new projects or extend existing .NET applications.


5. Conclusion

This tutorial has shown how to implement a complete QR code and barcode scanning solution in ASP.NET Core using Spire.Barcode for .NET. From receiving uploaded images to decoding and displaying the results, the process is straightforward and adaptable to a variety of application types. With this approach, developers can quickly integrate reliable scanning functionality into e-commerce platforms, ticketing systems, document verification tools, and other business-critical web applications.

For more advanced scenarios, Spire.Barcode for .NET provides additional features such as customizing the recognition process, handling multiple image formats and barcode types, and more. Apply for a free trial license to unlock all the advanced features.

Download Spire.Barcode for .NET today and start building your own ASP.NET barcode scanning solution.

Perform OCR on Scanned PDFs in C#

Optical Character Recognition (OCR) technology has become essential for developers working with scanned documents and image-based PDFs. In this tutorial, you learn how to perform OCR on PDFs in C# to extract text from scanned documents or images within a PDF using the Spire.PDF for .NET and Spire.OCR for .NET libraries. By transferring scanned PDFs into editable and searchable formats, you can significantly improve your document management processes.

Table of Contents :

Why OCR is Needed for Scanned PDFs?

Scanned PDFs are essentially image files —they contain pictures of text rather than actual selectable and searchable text content. When you scan a paper document or receive an image-based PDF, the text exists only as pixels , making it impossible to edit, search, or extract. This creates significant limitations for businesses and individuals who need to work with these documents digitally.

OCR technology solves this problem by analyzing the shapes of letters and numbers in scanned images and converting them into machine-readable text. This process transforms static PDFs into usable, searchable, and editable documents—enabling text extraction, keyword searches, and seamless integration with databases and workflow automation tools.

In fields such as legal, healthcare, and education, where large volumes of scanned documents are common, OCR plays a crucial role in document digitization, making important data easily accessible and actionable.

Setting Up: Installing Required Libraries

Before we dive into the code, let's first set up our development environment with the necessary components: Spire.PDF and Spire.OCR . Spire.PDF handles PDF operations, while Spire.OCR performs the actual text recognition.

Step 1. Install Spire.PDF and Spire.OCR via NuGet

To begin, open the NuGet Package Manager in Visual Studio, and search for "Spire.PDF" and "Spire.OCR" to install them in your project. Alternatively, you can use the Package Manager Console :

Install-Package Spire.PDF
Install-Package Spire.OCR

Step 2. Download OCR Models:

Spire.OCR requires pre-trained language models for text recognition. Download the appropriate model files for your operating system (Windows, Linux, or MacOS) and extract them to a directory (e.g., D:\win-x64).

Important Note : Ensure your project targets x64 platform (Project Properties > Build > Platform target) as Spire.OCR only supports 64-bit systems.

Set platform target to x64.

Performing OCR on Scanned PDFs in C#

With the necessary libraries installed, we can now perform OCR on scanned PDFs. Below is a sample code snippet demonstrating this process.

using Spire.OCR;
using Spire.Pdf;
using Spire.Pdf.Graphics;
using System.Drawing;

namespace OCRPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the OcrScanner class
            OcrScanner scanner = new OcrScanner();

            // Configure the scanner
            ConfigureOptions configureOptions = new ConfigureOptions
            {
                ModelPath = @"D:\win-x64", // Set model path
                Language = "English"        // Set language
            };

            // Apply the configuration options
            scanner.ConfigureDependencies(configureOptions);

            // Load a PDF document
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Input5.pdf");

            // Iterate through all pages
            for (int i = 0; i < doc.Pages.Count; i++)
            {
                // Convert page to image
                Image image = doc.SaveAsImage(i, PdfImageType.Bitmap);

                // Convert the image to a MemoryStream
                using (MemoryStream stream = new MemoryStream())
                {
                    image.Save(stream, System.Drawing.Imaging.ImageFormat.Png);
                    stream.Position = 0; // Reset the stream position

                    // Perform OCR on the image stream
                    scanner.Scan(stream, OCRImageFormat.Png);
                    string pageText = scanner.Text.ToString();

                    // Save extracted text to a separate file
                    string outputTxtPath = Path.Combine(@"C:\Users\Administrator\Desktop\Output", $"Page-{i + 1}.txt");
                    File.WriteAllText(outputTxtPath, pageText);
                }
            }

            // Close the document
            doc.Close();
        }
    }
}

Key Components Explained :

  1. OcrScanner Class : This class is crucial for performing OCR. It provides methods to configure and execute the scanning operation.
  2. ConfigureOptions Class : This class is used to set up the OCR scanner's configurations. The ModelPath property specifies the path to the OCR model files, and the Language property allows you to specify the language for text recognition.
  3. PdfDocument Class : This class represents the PDF document. The LoadFromFile method loads the PDF file that you want to process.
  4. Image Conversion : Each PDF page is converted to an image using the SaveAsImage method. This is essential because OCR works on image files.
  5. MemoryStream : The image is saved into a MemoryStream , allowing us to perform OCR without saving the image to disk.
  6. OCR Processing : The Scan method performs OCR on the image stream. The recognized text can be accessed using the Text property of the OcrScanner instance.
  7. Output : The extracted text is saved to a text file for each page.

Output :

Perform OCR on a PDF document in C#

To extract text from searchable PDFs, refer to this guide: Automate PDF Text Extraction Using C#

Extracting Text from Images within PDFs in C#

In addition to processing entire PDF pages, you can also extract text from images embedded within PDFs. Here’s how:

using Spire.OCR;
using Spire.Pdf;
using Spire.Pdf.Graphics;
using System.Drawing;

namespace OCRPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of the OcrScanner class
            OcrScanner scanner = new OcrScanner();

            // Configure the scanner
            ConfigureOptions configureOptions = new ConfigureOptions
            {
                ModelPath = @"D:\win-x64", // Set model path
                Language = "English"        // Set language
            };

            // Apply the configuration options
            scanner.ConfigureDependencies(configureOptions);

            // Load a PDF document
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Input5.pdf");

            // Iterate through all pages
            for (int i = 0; i < doc.Pages.Count; i++)
            {
                // Convert page to image
                Image image = doc.SaveAsImage(i, PdfImageType.Bitmap);

                // Convert the image to a MemoryStream
                using (MemoryStream stream = new MemoryStream())
                {
                    image.Save(stream, System.Drawing.Imaging.ImageFormat.Png);
                    stream.Position = 0; // Reset the stream position

                    // Perform OCR on the image stream
                    scanner.Scan(stream, OCRImageFormat.Png);
                    string pageText = scanner.Text.ToString();

                    // Save extracted text to a separate file
                    string outputTxtPath = Path.Combine(@"C:\Users\Administrator\Desktop\Output", $"Page-{i + 1}.txt");
                    File.WriteAllText(outputTxtPath, pageText);
                }
            }

            // Close the document
            doc.Close();
        }
    }
}

Key Components Explained :

  1. PdfImageHelper Class : This class is essential for extracting images from a PDF page. It provides methods to retrieve image information such as GetImagesInfo , which returns an array of PdfImageInfo objects.
  2. PdfImageInfo Class : Each PdfImageInfo object contains properties related to an image, including the actual Image object that can be processed further.
  3. Image Processing : Similar to the previous example, each image is saved to a MemoryStream for OCR processing.
  4. Output : The extracted text from each image is saved to a separate text file.

Output:

Extract text from images in PDF in C#

Wrapping Up

By combining Spire.PDF with Spire.OCR , you can seamlessly transform scanned PDFs and image-based documents into fully searchable and editable text. Whether you need to process entire pages or extract text from specific embedded images, the approach is straightforward and flexible.

This OCR integration not only streamlines document digitization but also enhances productivity by enabling search, copy, and automated data extraction. In industries where large volumes of scanned documents are the norm, implementing OCR with C# can significantly improve accessibility, compliance, and information retrieval speed.

FAQs

Q1. Can I perform OCR on non-English PDFs?

Yes, Spire.OCR supports multiple languages. You can set the Language property in ConfigureOptions to the desired language.

Q2. What should I do if the output is garbled or incorrect?

Check the quality of the input PDF images. If the images are blurry or have low contrast, OCR may struggle to recognize text accurately. Consider enhancing the image quality before processing.

Q3. Can I extract text from images embedded within a PDF?

Yes, you can. Use a helper class to extract images from each page and then apply OCR to recognize text.

Q4. Can Spire.OCR handle handwritten text in PDFs?

Spire.OCR is primarily optimized for printed text. Handwriting recognition typically has lower accuracy.

Q5. Do I need to install additional language models for OCR?

Yes, Spire.OCR requires pre-trained language model files. Download and configure the appropriate models for your target language before performing OCR.

Get a Free License

To fully experience the capabilities of Spire.PDF for .NET and Spire.OCR for .NET without any evaluation limitations, you can request a free 30-day trial license.

Page 10 of 333
page 10