Java (483)
How to Convert PDF to CSV in Java (Easily Extract PDF Tables)
2025-08-28 05:43:27 Written by zaki zou
When working with reports, invoices, or datasets stored in PDF format, developers often need a way to reuse the tabular data in spreadsheets, databases, or analytical tools. A common solution is to convert PDF to CSV using Java, since CSV is lightweight, structured, and compatible with almost every platform.
Unlike text or image export, a PDF-to-CSV conversion is mainly about extracting tables from PDF and saving them as CSV. With the help of Spire.PDF for Java, you can detect table structures in PDFs and export them programmatically with just a few lines of code.
In this article, you’ll learn step by step how to perform a PDF to CSV conversion in Java—from setting up the environment, to extracting tables, and even handling more complex scenarios like multi-page documents or multiple tables per page.
Overview of This Tutorial
Environment Setup for PDF to CSV Conversion in Java
Before extracting tables and converting PDF to CSV using Java, you need to set up the development environment. This involves choosing a suitable library and adding it to your project.
Why Choose Spire.PDF for Java
Since PDF files do not provide a built-in export to CSV, extracting tables programmatically is the practical approach. Spire.PDF for Java offers APIs to detect table structures in PDF documents and save them directly as CSV files, making the conversion process simple and efficient.
Install Spire.PDF for Java
Add Spire.PDF for Java to your project using Maven:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>12.6.4</version>
</dependency>
</dependencies>
If you are not using Maven, you can download the Spire.PDF for Java package and add the JAR files to your project’s classpath.
Extract Tables from PDF and Save as CSV
The most practical way to perform PDF to CSV conversion is by extracting tables. With Spire.PDF for Java, this can be done with just a few steps:
- Load the PDF document.
- Use PdfTableExtractor to find tables on each page.
- Collect cell values row by row.
- Write the output into a CSV file.
Here is a Java example that shows the process from start to finish:
Java Code Example for PDF to CSV Conversion
import com.spire.pdf.*;
import com.spire.pdf.utilities.*;
import java.io.*;
public class PdfToCsvExample {
public static void main(String[] args) throws Exception {
// Load the PDF document
PdfDocument pdf = new PdfDocument();
pdf.loadFromFile("Sample.pdf");
// Create a StringBuilder to store extracted text
StringBuilder sb = new StringBuilder();
// Iterate through each page
for (int i = 0; i < pdf.getPages().getCount(); i++) {
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
PdfTable[] tableLists = extractor.extractTable(i);
if (tableLists != null) {
for (PdfTable table : tableLists) {
for (int row = 0; row < table.getRowCount(); row++) {
for (int col = 0; col < table.getColumnCount(); col++) {
// Escape the cell text safely
String cellText = escapeCsvField(table.getText(row, col));
sb.append(cellText);
if (col < table.getColumnCount() - 1) {
sb.append(",");
}
}
sb.append("\n");
}
}
}
}
// Write the output to a CSV file
FileWriter writer = new FileWriter("output/PDFTable.csv");
writer.write(sb.toString());
writer.close();
pdf.close();
System.out.println("PDF tables successfully exported to CSV.");
}
// Utility method to escape CSV fields
private static String escapeCsvField(String text) {
if (text == null) return "";
// Remove line breaks
text = text.replaceAll("[\\n\\r]", "");
// Escape if contains special characters
if (text.contains(",") || text.contains(";") || text.contains("\"") || text.contains("\n")) {
text = text.replace("\"", "\"\""); // Escape double quotes
text = "\"" + text + "\""; // Wrap with quotes
}
return text;
}
}
Code Walkthrough
- PdfDocument loads the PDF file into memory.
- PdfTableExtractor checks each page for tables.
- PdfTable provides access to rows and columns.
- escapeCsvField() removes line breaks and safely quotes/escapes text if needed.
- StringBuilder accumulates cell text, separated by commas.
- The result is written into Output.csv, which you can open in Excel or any editor.
CSV file generated from a PDF table after running the Java code.

Handling Complex PDF-to-CSV Conversion Cases
In practice, PDFs often contain multiple tables, span multiple pages, or have irregular structures. Let’s see how to extend the solution to handle these scenarios.
1. Multiple Tables per Page
The PdfTable[] returned by extractTable(i) contains all tables detected on a page. You can process each one separately. For example, to save each table as a different CSV file:
for (int i = 0; i < pdf.getPages().getCount(); i++) {
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
PdfTable[] tableLists = extractor.extractTable(i);
if (tableLists != null) {
for (int t = 0; t < tableLists.length; t++) {
PdfTable table = tableLists[t];
StringBuilder tableContent = new StringBuilder();
for (int row = 0; row < table.getRowCount(); row++) {
for (int col = 0; col < table.getColumnCount(); col++) {
tableContent.append(escapeCsvField(table.getText(row, col)));
if (col < table.getColumnCount() - 1) {
tableContent.append(",");
}
}
tableContent.append("\n");
}
FileWriter writer = new FileWriter("Table_Page" + i + "_Index" + t + ".csv");
writer.write(tableContent.toString());
writer.close();
}
}
}
Example of multiple tables in one PDF page exported into separate CSV files.

This way, every table is saved as an independent CSV file for better organization.
2. Multi-page or Large Tables
If a table spans across multiple pages, iterating page by page ensures that all data is collected. The key is to append data instead of overwriting:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < pdf.getPages().getCount(); i++) {
PdfTableExtractor extractor = new PdfTableExtractor(pdf);
PdfTable[] tables = extractor.extractTable(i);
if (tables != null) {
for (PdfTable table : tables) {
for (int row = 0; row < table.getRowCount(); row++) {
for (int col = 0; col < table.getColumnCount(); col++) {
sb.append(escapeCsvField(table.getText(row, col)));
if (col < table.getColumnCount() - 1) sb.append(",");
}
sb.append("\n");
}
}
}
}
FileWriter writer = new FileWriter("MergedTables.csv");
writer.write(sb.toString());
writer.close();
Example of a large table across multiple PDF pages merged into one CSV file.

Here, all tables across pages are merged into one CSV file, useful when dealing with continuous reports.
3. Limitations with Formatting
CSV only stores plain text values. Elements like merged cells, fonts, or images are discarded. If preserving styling is critical, exporting to Excel (.xlsx) is a better alternative, which the same library also supports. See How to Export PDF Table to Excel in Java for more details.
4. CSV Special Characters Handling
When writing tables to CSV, certain characters like commas, semicolons, double quotes, or line breaks can break the file structure if not handled properly.
In the Java examples above, the escapeCsvField method removes line breaks and safely quotes or escapes text when needed.
For more advanced scenarios, you can also use Spire.XLS for Java to write data into worksheets and then save as CSV, which automatically handles special characters and ensures correct CSV formatting without manual processing.
Alternatively, for open-source options, libraries like OpenCSV or Apache Commons CSV also automatically handle special characters and CSV formatting, reducing potential issues and simplifying code.
Conclusion
Converting PDF to CSV in Java essentially means extracting tables and saving them in a structured format. CSV is widely supported, lightweight, and ideal for storing and analyzing tabular data. By setting up Spire.PDF for Java and following the code example, you can automate this process, saving time and reducing manual effort.
If you want to explore more advanced features of Spire.PDF for Java, please apply for a free trial license. You can also use Free Spire.PDF for Java for small projects.
FAQ
Q: Can I turn a PDF into a CSV file? A: Yes. While images and styled text cannot be exported, you can extract tables and save them as CSV files using Java.
Q: How to extract data from a PDF file in Java? A: Use a PDF library like Spire.PDF for Java to parse the document, detect tables, and export them to CSV or Excel.
Q: What is the best PDF to CSV converter? A: For Java developers, programmatic solutions such as Spire.PDF for Java offer more flexibility and automation than manual converters.
Q: How to convert PDF to Excel using Java code? A: The process is similar to CSV export. Instead of writing data as comma-separated text, you can export tables into Excel format for richer features.

CSV (Comma-Separated Values) remains a universal format for data exchange due to its simplicity, readability, and wide compatibility across platforms. If you're looking for a robust and efficient method to read CSV in Java, the Spire.XLS for Java library offers a powerful and straightforward solution.
This guide will walk you through how to use Java to load and read CSV files, as well as convert them into structured DataTables for seamless data manipulation and analysis in your applications.
- Why Choose Spire.XLS for Java to Parse CSV Files?
- Step-by-Step: Read a CSV File in Java
- Advanced: Read CSV into DataTable in Java
- Frequently Asked Questions
- Conclusion
Why Choose Spire.XLS for Java to Parse CSV Files?
Compared with other CSV parser in Java, Spire.XLS offers several advantages for CSV processing:
- Simplified API for reading CSV files
- Support for custom delimiters (not just commas)
- Built-in range detection to avoid empty rows/columns
- Natively converts CSV data to DataTable
- Seamlessly switch between CSV, XLS, and XLSX formats
Step-by-Step: Read a CSV File in Java
Spire.XLS for Java provides the Workbook class to load CSV files and the Worksheet class to access data. Below are the steps to read CSV files line by line with automatic delimiter detection:
1. Setup and Dependencies
First, ensure you have Spire.XLS for Java included in your project. You can add it via Maven by including the following dependency:
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.xls</artifactId>
<version>16.4.1</version>
</dependency>
</dependencies>
2. Load the CSV File
Spire.XLS for Java loads CSV files into a Workbook object, where each CSV row becomes a worksheet row.
import com.spire.xls.*;
public class ReadCSV {
public static void main(String[] args) {
// Create Workbook instance
Workbook workbook = new Workbook();
// Load CSV file (specify delimiter)
workbook.loadFromFile("sample.csv", ",", 1, 1);
}
}
Parameters:
The loadFromFile() method accepts four parameters:
- "sample.csv": The input CSV file path.
- ", ": Custom delimiter (e.g."," ";" or "\t").
- 1: Start row index.
- 1: Start column index.
3. Access Worksheet & Read CSV Data
Spire.XLS treats CSV files as single-worksheet workbooks, so we access the first worksheet and then iterate through rows/columns:
// Get the first worksheet
Worksheet sheet = workbook.getWorksheets().get(0);
// Get the used range (avoids iterating over empty rows/columns)
CellRange dataRange = sheet.getAllocatedRange();
//Iterate through the rows
for (int i = 0; i < dataRange.getRowCount(); i++) {
//Iterate through the columns
for (int j = 0; j < dataRange.getColumnCount(); j++) {
// Get cell text
CellRange cell = dataRange.get(i+1,j+1);
System.out.print(cell.getText() + "\t"); // Use tab to separate columns
}
System.out.println(); // New line per row
Output: Read data from a CSV file and print out with tab separation for readability.

Advanced: Read CSV into DataTable in Java
For structured data manipulation, converting CSV to a DataTable is invaluable. A DataTable organizes data into rows and columns, making it easy to query, filter, or integrate with databases.
Java code to read a CSV file and export to a DataTable:
import com.spire.xls.*;
import com.spire.xls.data.table.DataTable;
public class CSVtoDataTable {
public static void main(String[] args) {
// Create a workbook and load a csv file
Workbook workbook = new Workbook();
workbook.loadFromFile("sample.csv", ",", 1, 1);
// Get the first worksheet
Worksheet sheet = workbook.getWorksheets().get(0);
// Export to DataTable
DataTable dataTable = sheet.exportDataTable();
// Get row and column count
System.out.println("Total columns: " + dataTable.getColumns().size());
System.out.println("Total rows: " + dataTable.getRows().size());
System.out.println();
// Print column names
for (int i = 0; i < dataTable.getColumns().size(); i++) {
System.out.print(dataTable.getColumns().get(i).getColumnName() + " | ");
}
System.out.println();
System.out.println("----------------------------------------------------------");
// Print rows
for (int i = 0; i < dataTable.getRows().size(); i++) {
for (int j = 0; j < dataTable.getColumns().size(); j++) {
System.out.print(dataTable.getRows().get(i).getString(j) + "\t");
}
System.out.println();
}
}
}
Key Explanations:
- exportDataTable(): convert CSV data into a DataTable directly, no manual row/column mapping required.
- DataTable Benefits: Easily access basic information such as column count, row count, column names, and row data etc.
Output:

You may also read: Convert CSV to Excel in Java
Frequently Asked Questions
Q1: How do I handle CSV files with different delimiters (semicolon, tab, etc.)?
A: Specify the delimiter in the loadFromFile() method:
// For semicolon-delimited files
workbook.loadFromFile("sample.csv", ";", 0, 0);
// For tab-delimited files
workbook.loadFromFile("sample.csv", "\t", 0, 0);
// For pipe-delimited files
workbook.loadFromFile("sample.csv", "|", 0, 0);
Q2: How do I skip header rows in a CSV file?
A: You can skip header rows by iterating from the second row. For example, if your CSV has 2 header rows (rows 1 and 2) and data starts at row 3:
// Start reading from the third row
for (int i = 2; i < dataRange.getRowCount(); i++) {
for (int j = 0; j < dataRange.getColumnCount(); j++) {
// Convert 0-based loop index to Spire.XLS's 1-based cell index
CellRange cell = dataRange.get(i + 1, j + 1);
System.out.print(cell.getText() + "\t");
Q3. Can I export a specific range of a CSV to a DataTable?
A: Yes. Spire.XLS lets you define a precise cell range and export it to a DataTable with the exportDataTable(CellRange range, boolean exportColumnNames) method.
Conclusion
Spire.XLS for Java simplifies CSV file reading in Java, offering a robust alternative to manual parsing or basic libraries. Whether you need to read a simple CSV, or convert it to a structured DataTable, this guide provides the corresponding examples to help you implement CSV parsing efficiently.
For more advanced features (e.g., exporting to PDF), check the Spire.XLS for Java Documentation.
Java Convert Byte Array to PDF: Load & Create with Spire.PDF
2025-08-21 06:31:19 Written by zaki zou
In modern Java applications, PDF data is not always stored as files on disk. Instead, it may be transmitted over a network, returned by a REST API, or stored as a byte array in a database. In such cases, you’ll often need to convert a byte array back into a PDF file or even generate a new PDF from plain text bytes.
This tutorial will walk you through both scenarios using Spire.PDF for Java, a powerful library for working with PDF documents.
Table of Contents:
- Getting Started with Spire.PDF for Java
- Understanding PDF Bytes vs. Text Bytes
- Loading PDF from Byte Array
- Creating PDF from Text Bytes
- Common Pitfalls to Avoid
- Frequently Asked Questions
- Conclusion
Getting Started with Spire.PDF for Java
Spire.PDF is a powerful and feature-rich API that allows Java developers to create, read, edit, convert, and print PDF documents without any dependencies on Adobe Acrobat.
Key Features:
- Create PDFs with text, images, tables, and shapes.
- Edit existing PDFs and extract text and images.
- Convert PDFs to formats like HTML, Word, Excel, and images.
- Encrypt PDFs with password protection.
- Add watermarks, annotations, and digital signatures.
To get started, download Spire.PDF for Java from our website and add the JAR files to your project's build path. If you’re using Maven, include the following dependency in your pom.xml.
<repositories>
<repository>
<id>com.e-iceblue</id>
<name>e-iceblue</name>
<url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>e-iceblue</groupId>
<artifactId>spire.pdf</artifactId>
<version>12.6.4</version>
</dependency>
</dependencies>
Once set up, you can now proceed to convert byte arrays to PDFs and perform other PDF-related operations.
Understanding PDF Bytes vs. Text Bytes
Before coding, it’s important to distinguish between two very different kinds of byte arrays :
- PDF File Bytes : These represent the actual binary structure of a valid PDF document. They always start with %PDF-1.x and contain objects, cross-reference tables, and streams. Such byte arrays can be loaded directly into a PdfDocument.
- Text Bytes : These are simply ASCII or UTF-8 encodings of characters. For example,
byte[] bytes = {84, 104, 105, 115};
System.out.println(new String(bytes)); // Output: "This"
Such arrays are not valid PDFs, but you can create a new PDF and write the text into it.
Loading PDF from Byte Array in Java
Suppose you want to download a PDF from a URL and work with it in memory as a byte array. With Spire.PDF for Java, you can easily load and save it back as a PDF document.
import com.spire.pdf.PdfDocument;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
public class LoadPdfFromByteArray throws Exception{
public static void main(String[] args) {
// The PDF URL
String fileUrl = "https://www.e-iceblue.com/resource/sample.pdf";
// Download PDF into a byte array
byte[] pdfBytes = downloadPdfAsBytes(fileUrl);
// Create a PdfDocument object
PdfDocument doc = new PdfDocument();
// Load PDF from byte array
doc.loadFromStream(new ByteArrayInputStream(pdfBytes));
// Save the document locally
doc.saveToFile("downloaded.pdf");
doc.close();
}
// Helper method: download file as byte[]
private static byte[] downloadPdfAsBytes(String fileUrl) throws Exception {
URL url = new URL(fileUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
InputStream inputStream = conn.getInputStream();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
byte[] data = new byte[4096];
int nRead;
while ((nRead = inputStream.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
buffer.flush();
inputStream.close();
conn.disconnect();
return buffer.toByteArray();
}
}
How this works
- Make an HTTP request to fetch the PDF file.
- Convert the InputStream into a byte array using ByteArrayOutputStream .
- Pass the byte array into Spire.PDF via loadFromStream .
- Save or manipulate the document as needed.
Output:

Creating PDF from Text Bytes in Java
If you only have plain text bytes (e.g., This document is created from text bytes.), you can decode them into a string and then draw the text onto a new PDF document.
import com.spire.pdf.*;
import com.spire.pdf.graphics.*;
import java.awt.*;
public class TextFromBytesToPdf {
public static void main(String[] args) {
// Your text bytes
byte[] byteArray = {
84, 104, 105, 115, 32,
100, 111, 99, 117, 109, 101, 110, 116, 32,
105, 115, 32,
99, 114, 101, 97, 116, 101, 100, 32,
102, 114, 111, 109, 32,
116, 101, 120, 116, 32,
98, 121, 116, 101, 115, 46
};
String text = new String(byteArray);
// Create a PDF document
PdfDocument doc = new PdfDocument();
// Configure the page settings
doc.getPageSettings().setSize(PdfPageSize.A4);
doc.getPageSettings().setMargins(40f);
// Add a page
PdfPageBase page = doc.getPages().add();
// Draw the string onto PDF
PdfFont font = new PdfFont(PdfFontFamily.Helvetica, 20f);
PdfSolidBrush brush = new PdfSolidBrush(new PdfRGBColor(Color.black));
page.getCanvas().drawString(text, font, brush, 20, 40);
// Save the document to a PDF file
doc.saveToFile("TextBytes.pdf");
doc.close();
}
}
This will produce a new PDF named TextBytes.pdf (shown below) containing the sentence represented by your byte array.

You might be interested in: How to Generate PDF Documents in Java
Common Pitfalls to Avoid
When converting byte arrays to PDFs, watch out for these issues:
- Confusing plain text with PDF bytes
Not every byte array is a valid PDF. Unless the array starts with %PDF-1.x and contains the proper structure, you can’t load it directly with PdfDocument.loadFromStream .
- Incorrect encoding
If your text bytes are in UTF-16, ISO-8859-1, or another encoding, you need to specify the charset when creating a string:
String text = new String(byteArray, StandardCharsets.UTF_8);
- Large byte arrays
When dealing with large PDFs, consider streaming instead of holding everything in memory to avoid OutOfMemoryError .
- Forgetting to close documents
Always call doc.close() to release resources after saving or processing a PDF.
Frequently Asked Questions
Q1. Can I store a PDF as a byte array in a database?
Yes. You can store a PDF as a BLOB in a relational database. Later, you can retrieve it, load it into a PdfDocument , and save or manipulate it.
Q2. How do I check if a byte array is a valid PDF?
Check if the array begins with the %PDF- header. You can do:
String header = new String(Arrays.copyOfRange(bytes, 0, 5));
if (header.startsWith("%PDF-")) {
// valid PDF
}
Q3. Can Spire.PDF load a PDF directly from an InputStream?
Yes. Instead of converting to a byte array, you can pass the InputStream directly to loadFromStream() .
Q4. Can I convert a PdfDocument back into a byte array?
You can save the document into a ByteArrayOutputStream instead of a file:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.saveToStream(baos);
byte[] pdfBytes = baos.toByteArray();
Q5. What if my byte array contains images instead of text or PDF?
In that case, you’ll need to create a new PDF and insert the image using Spire.PDF’s drawing APIs.
Conclusion
In this article, we explored how to efficiently convert byte arrays to PDF documents using Spire.PDF for Java. Whether you're loading existing PDF files from byte arrays retrieved via APIs or creating new PDFs from plain text bytes, Spire.PDF provides a robust solution to meet your needs.
We covered essential concepts, including the distinction between PDF file bytes and text bytes, and highlighted common pitfalls to avoid during the conversion process. With the right understanding and tools, you can seamlessly integrate PDF functionalities into your Java applications, enhancing your ability to manage and manipulate document data.
For further exploration, consider experimenting with additional features of Spire.PDF, such as editing, encrypting, and converting PDFs to other formats. The possibilities are extensive, and mastering these techniques will undoubtedly improve your development skills and project outcomes.