.NET (1323)
Children categories
Hyperlinks are an essential element in Excel that allows users to reference external data sources, navigate between worksheets, or provide additional information about specific cells. When working with an Excel file, you may need to manipulate hyperlinks for various reasons. For example, you may need to extract all the hyperlinks from the file to perform an analysis, modify an outdated hyperlink to ensure accuracy or remove a broken hyperlink to improve the document's usability. In this article, we will explain how to extract, modify and remove hyperlinks in Excel in C# and VB.NET using Spire.XLS for .NET.
- Extract Hyperlinks from Excel in C# and VB.NET
- Modify Hyperlinks in Excel in C# and VB.NET
- Remove Hyperlinks from Excel in C# and VB.NET
Install Spire.XLS for .NET
To begin with, you need to add the DLL files included in the Spire.XLS for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.
PM> Install-Package Spire.XLS
Extract Hyperlinks from Excel in C# and VB.NET
If you are migrating data from an Excel workbook to another system (such as a database) and need to preserve the hyperlinks associated with that data, extracting hyperlinks from the Excel file beforehand is necessary.
The following steps demonstrate how to extract hyperlinks from an Excel file in C# and VB.NET using Spire.XLS for .NET:
- Initialize an instance of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet using the Workbook.Worksheets[int index] property.
- Get the collection of all hyperlinks in the worksheet using the Worksheet.Hyperlinks property.
- Initialize an instance of the StringBuilder class to store the extracted hyperlink information.
- Iterate through the hyperlinks in the hyperlinks collection.
- Get the address and type of each hyperlink using the XlsHyperlink.Address and XlsHyperlink.Type properties.
- Append the address and type to the StringBuilder instance.
- Write the content of the StringBuilder instance into a text file using the File.WriteAllText() method.
- C#
- VB.NET
using Spire.Xls;
using Spire.Xls.Collections;
using Spire.Xls.Core.Spreadsheet;
using System.IO;
using System.Text;
namespace ExtractHyperlinks
{
internal class Program
{
static void Main(string[] args)
{
//Initialize an instance of the Workbook class
Workbook workbook = new Workbook();
//Load an Excel file
workbook.LoadFromFile("Hyperlinks1.xlsx");
//Get the first worksheet
Worksheet sheet = workbook.Worksheets[0];
//Get the collection of all hyperlinks in the worksheet
HyperLinksCollection hyperLinks = sheet.HyperLinks;
//Initialize an instance of the StringBuilder class
StringBuilder sb = new StringBuilder();
//Iterate through the hyperlinks in the collection
foreach (XlsHyperLink hyperlink in hyperLinks)
{
//Get the address of the hyperlink
string address = hyperlink.Address;
//Get the type of the hyperlink
HyperLinkType type = hyperlink.Type;
//Append the address and type of the hyperlink to the StringBuilder instance
sb.AppendLine("Link address: " + address);
sb.AppendLine("Link type: " + type.ToString());
sb.AppendLine();
}
//Write the content of the StringBuilder instance to a text file
File.WriteAllText("GetHyperlinks.txt", sb.ToString());
workbook.Dispose();
}
}
}

Modify Hyperlinks in Excel in C# and VB.NET
If you've accidentally linked to the wrong resource or entered an incorrect URL when creating a hyperlink, you may need to modify the hyperlink to correct the mistake.
The following steps demonstrate how to modify an existing hyperlink in an Excel file:
- Initialize an instance of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet using the Workbook.Worksheets[int index] property.
- Get the collection of all hyperlinks in the worksheet using the Worksheet.Hyperlinks property.
- Get the first hyperlink in the collection.
- Modify the display text and address of the hyperlink using the XlsHyperlink.TextToDisplay and XlsHyperlink.Address properties.
- Save the result file using the Workbook.SaveToFile() method.
- C#
- VB.NET
using Spire.Xls;
using Spire.Xls.Collections;
using Spire.Xls.Core.Spreadsheet;
namespace ModifyHyperlinks
{
internal class Program
{
static void Main(string[] args)
{
//Initialize an instance of the Workbook class
Workbook workbook = new Workbook();
//Load an Excel file
workbook.LoadFromFile("Hyperlinks2.xlsx");
//Get the first worksheet
Worksheet sheet = workbook.Worksheets[0];
//Get the collection of all hyperlinks in the worksheet
HyperLinksCollection links = sheet.HyperLinks;
//Get the first hyperlink in the collection
XlsHyperLink hyperLink = links[0];
//Modify the display text and the address of the hyperlink
hyperLink.TextToDisplay = "Spire.XLS for .NET";
hyperLink.Address = "http://www.e-iceblue.com/Introduce/excel-for-net-introduce.html";
//Save the result file
workbook.SaveToFile("ModifyHyperlink.xlsx", ExcelVersion.Version2013);
workbook.Dispose();
}
}
}

Remove Hyperlinks from Excel in C# and VB.NET
Removing the irrelevant hyperlinks can help make your worksheet neater and more professional-looking.
The following steps demonstrate how to remove a specific hyperlink from an Excel file:
- Initialize an instance of the Workbook class.
- Load an Excel file using the Workbook.LoadFromFile() method.
- Get a specific worksheet using the Workbook.Worksheets[int index] property.
- Remove a specific hyperlink from the worksheet using the Worksheet.Hyperlinks.RemoveAt(int index) method.
- Save the result file using the Workbook.SaveToFile() method.
- C#
- VB.NET
using Spire.Xls;
namespace RemoveHyperlinks
{
internal class Program
{
static void Main(string[] args)
{
//Initialize an instance of the Workbook class
Workbook workbook = new Workbook();
//Load an Excel file
workbook.LoadFromFile("Hyperlinks2.xlsx");
//Get the first worksheet
Worksheet sheet = workbook.Worksheets[0];
//Remove the first hyperlink and keep its display text
sheet.HyperLinks.RemoveAt(0);
//Remove all content from the cell
//sheet.Range["B2"].ClearAll();
//Save the result file
workbook.SaveToFile("RemoveHyperlink.xlsx", ExcelVersion.Version2013);
workbook.Dispose();
}
}
}

Apply for a Temporary License
If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.
The sample demonstrates how to Set PDF Text Format for Silverlight via Spire.PDF.

The sample demonstrates how to Create Table in Word for Silverlight via Spire.Doc.

The sample demonstrates how to Edit Excel in Silverlight via Spire.XLS.

Effortlessly Automate PDF Text Extraction Using C# .NET: A Complete Guide
2022-07-29 06:33:00 Written by Koohji
Manually extracting text from PDF files can be tedious, error-prone, and inefficient—especially when working with large volumes of documents or complex layouts. PDFs store content based on coordinates rather than linear text flow, making it difficult to retrieve structured or readable text without specialized tools.
For developers working in C# .NET, automating PDF text extraction is essential for streamlining workflows such as document processing, content indexing, data migration, and digital archiving.
This comprehensive guide shows you how to read text from PDF using C# and Spire.PDF for .NET, a powerful library for reading and processing PDF documents. You’ll learn how to:
- Extract full-text content from entire documents
- Retrieve text from individual pages
- Capture content within defined regions
- Obtain position and font metadata for advanced use cases
Whether you're building a PDF parser, developing an automated document management system, or migrating PDF data into structured formats, this article provides ready-to-use C# code examples and best practices to help you extract text from PDFs quickly, accurately, and at scale.
Table of Contents
- Why Use Spire.PDF for Text Extraction in .NET?
- Extract Text from PDF (Basic Example)
- Advanced Text Extraction Options
- Conclusion
- FAQs
Why Use Spire.PDF for Text Extraction in .NET?
Spire.PDF for .NET is a feature-rich and developer friendly library that supports seamless text extraction from PDFs in .NET applications. Here's why it stands out:
- Precise Layout Preservation: Maintains original layout, spacing, and reading order.
- Detailed Extraction: Retrieve text along with its metadata like position and size.
- No Adobe Dependency: Works independently of Adobe Acrobat or other third-party tools.
- Quick Integration: Clean API and extensive documentation for faster development.
Installation
Before getting started, install the library in your project via NuGet:
Install-Package Spire.PDF
Or download the DLL and manually reference it in your solution.
Extract Text from PDF (Basic Example)
Extract full text content from a PDF is crucial for capturing all information for analysis or processing.
This basic example extracts all text content from a PDF file uses the PdfTextExtractor class and saves it to a text file with the original spacing, line breaks and layout preserved.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.IO;
using System.Text;
namespace ExtractAllTextFromPDF
{
internal class Program
{
static void Main(string[] args)
{
// Create a PDF document instance
PdfDocument pdf = new PdfDocument();
// Load the PDF file
pdf.LoadFromFile("Sample.pdf");
// Initialize a StringBuilder to hold the extracted text
StringBuilder extractedText = new StringBuilder();
// Loop through each page in the PDF
foreach (PdfPageBase page in pdf.Pages)
{
// Create a PdfTextExtractor for the current page
PdfTextExtractor extractor = new PdfTextExtractor(page);
// Set extraction options
PdfTextExtractOptions option = new PdfTextExtractOptions
{
IsExtractAllText = true
};
// Extract text from the current page
string text = extractor.ExtractText(option);
// Append the extracted text to the StringBuilder
extractedText.AppendLine(text);
}
// Save the extracted text to a text file
File.WriteAllText("ExtractedText.txt", extractedText.ToString());
// Close the PDF document
pdf.Close();
}
}
}
Advanced Text Extraction Options
Spire.PDF offers more than basic full-document extraction. It supports advanced scenarios like retrieving text from specific pages, extracting content from defined areas, and accessing text layout details such as position and dimensions. This section explores these capabilities with practical examples.
Retrieve Text from Individual PDF Pages
Sometimes, you only need text from a particular page—for example, when processing a specific section of a multi-page document. You can access the desired page from the Pages collection of the document and then apply the extraction logic to it.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.IO;
namespace ExtractTextFromIndividualPages
{
internal class Program
{
static void Main(string[] args)
{
// Create a PDF document instance
PdfDocument pdf = new PdfDocument();
// Load the PDF file
pdf.LoadFromFile("Sample.pdf");
// Access the page to extract text from (e.g., index 1 = the second page)
PdfPageBase page = pdf.Pages[1];
// Create a PdfTextExtractor for the selected page
PdfTextExtractor extractor = new PdfTextExtractor(page);
// Set extraction options
PdfTextExtractOptions option = new PdfTextExtractOptions
{
IsExtractAllText = true
};
// Extract text from the specified page
string text = extractor.ExtractText(option);
// Save the extracted text to a text file
File.WriteAllText("IndividualPage.txt", text);
// Close the PDF document
pdf.Close();
}
}
}

Read Text within a Defined Area on a PDF Page
If you're interested in text within a specific rectangular area (e.g., header or footer), you can set a rectangular extraction region via PdfTextExtractOptions.ExtractArea to limit the extraction scope.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System.IO;
using System.Drawing;
namespace ExtractTextFromDefinedArea
{
internal class Program
{
static void Main(string[] args)
{
// Create a PDF document instance
PdfDocument doc = new PdfDocument();
// Load the PDF file
doc.LoadFromFile("Sample.pdf");
// Get the second page
PdfPageBase page = doc.Pages[1];
// Create a PdfTextExtractor for the selected page
PdfTextExtractor textExtractor = new PdfTextExtractor(page);
// Set extraction options with a defined rectangular area
PdfTextExtractOptions extractOptions = new PdfTextExtractOptions
{
ExtractArea = new RectangleF(0, 0, 890, 170)
};
// Extract text from the specified rectangular area
string text = textExtractor.ExtractText(extractOptions);
// Save the extracted text to a text file
File.WriteAllText("Extracted.txt", text);
// Close the PDF document
doc.Close();
}
}
}

Get Text Position and Size Information for Advanced Processing
For advanced tasks like annotation or content overlay, accessing the position and size of each text fragment is crucial. You can obtain this information using PdfTextFinder and PdfTextFragment.
using Spire.Pdf;
using Spire.Pdf.Texts;
using System;
using System.Collections.Generic;
using System.Drawing;
namespace ExtractTextWithPositionAndSize
{
class Program
{
static void Main(string[] args)
{
// Load the PDF document
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("Sample.pdf");
// Iterate through each page of the document
for (int i = 0; i < pdf.Pages.Count; i++)
{
PdfPageBase page = pdf.Pages[i];
// Create a PdfTextFinder object for the current page
PdfTextFinder finder = new PdfTextFinder(page);
// Find all text fragments on the page
List<PdfTextFragment> fragments = finder.FindAllText();
Console.WriteLine($"Page {i + 1}:");
// Iterate over each text fragment
foreach (PdfTextFragment fragment in fragments)
{
// Extract text content
string text = fragment.Text;
// Get bounding rectangles with position and size
RectangleF[] rects = fragment.Bounds;
Console.WriteLine($"Text: \"{text}\"");
// Iterate through each rectangle for this fragment
foreach (var rect in rects)
{
Console.WriteLine($"Position: ({rect.X}, {rect.Y}), Size: ({rect.Width} x {rect.Height})");
}
Console.WriteLine();
}
}
}
}
}
Conclusion
Whether you're performing simple extraction or building advanced document automation tools, Spire.PDF for .NET provides versatile and accurate methods to extract and manipulate PDF text:
- Full-text extraction for complete documents
- Page-level control to isolate relevant sections
- Area-based targeting for structured or repeated patterns
- Precise layout data for custom rendering and analysis
By combining these techniques, you can create powerful and flexible PDF processing workflows tailored to your application's needs.
FAQs
Q1: Can Spire.PDF extract text from password-protected PDFs?
A1: Yes, by providing the correct password when loading the documents, Spire.PDF can open and extract text from secured PDFs.
Q2: Does Spire.PDF support batch extraction?
A2: Absolutely. You can iterate over a directory of PDF files and apply the same extraction logic programmatically.
Q3: Can it extract font styles and sizes?
A3: Yes. Spire.PDF allows you to retrieve font-related details such as font name, size, style.
Q4: Can I extract images or tables as well?
A4: While text extraction is the focus of this guide, Spire.PDF can also extract images and supports table detection with additional logic.
Q5: Can Spire.PDF extract text from scanned (image-based) PDFs?
A5: Scanned PDFs require OCR (Optical Character Recognition). Spire.PDF doesn't provide built-in OCR, but you can combine it with an OCR library - Spire.OCR for image-to-text conversion.
Get a Free License
To fully experience the capabilities of Spire.PDF for .NET without any evaluation limitations, you can request a free 30-day trial license.
After searching so much information about PDF merge, it is easy to find that whether you merge PDF files online or use C#/VB.NET to realize this task, you never escape worrying some important points such as the safety of your PDF file, so much time it costs or whether the merged file supports to print page number and so on. However, as long as you come here, these troubles will not appear. This section will specifically introduce you a secure solution to merge PDF files into one with C#, VB.NET via a .NET PDF component Spire.PDF for .NET.
Spire.PDF for .NET, built from scratch in C#, enables programmers and developers to create, read, write and manipulate PDF documents in .NET applications without using Adobe Acrobat or any external libraries. Using Spire.PDF for .NET, you not only can quickly merge PDF files but also enables you to print PDF page with page number. Now please preview the effective screenshot below:

Before following below procedure, please download Spire.PDF for .NET and install it on system.
Step1: You can use the String array to save the names of the three PDF files which will be merged into one PDF and demonstrate Spire.Pdf.PdfDocument array. Then, load three PDF files and select the first PdfDocument for the purpose of merging the second and third PDF file to it. In order to import all pages from the second PDF file to the first PDF file, you need to call the method public void AppendPage(PdfDocument doc). Also by calling another method public PdfPageBase InsertPage(PdfDocument doc, int pageIndex),every page of the third PDF file can be imported to the first PDF file.
private void button1_Click(object sender, EventArgs e)
{
//pdf document list
String[] files = new String[]
{
@"..\PDFmerge0.pdf",
@"..\ PDFmerge1.pdf",
@"..\ PDFmerge2.pdf"
};
//open pdf documents
PdfDocument[] docs = new PdfDocument[files.Length];
for (int i = 0; i < files.Length; i++)
{
docs[i] = new PdfDocument(files[i]);
}
//append document
docs[0].AppendPage(docs[1]);
//import PDF pages
for (int i = 0; i < docs[2].Pages.Count; i = i + 2)
{
docs[0].InsertPage(docs[2], i);
}
Private Sub button1_Click(sender As Object, e As EventArgs)
'pdf document list
Dim files As [String]() = New [String]() {"..\PDFmerge0.pdf", "..\ PDFmerge1.pdf", "..\ PDFmerge2.pdf"}
'open pdf documents
Dim docs As PdfDocument() = New PdfDocument(files.Length - 1) {}
For i As Integer = 0 To files.Length - 1
docs(i) = New PdfDocument(files(i))
Next
'append document
docs(0).AppendPage(docs(1))
'import PDF pages
Dim j As Integer = 0
While j < docs(2).Pages.Count
docs(0).InsertPage(docs(2), j)
j = j + 2
End While"
Step2: Draw page number in the first PDF file. In this step, you can set PDF page number margin by invoking the class Spire.Pdf.Graphics. PdfMargins. Then, Call the custom method DrawPageNumber(PdfPageCollection pages, PdfMargins margin, int startNumber, int pageCount) to add page number in the bottom of every page in the first PDF. Please see the detail code below:
//set PDF margin
PdfUnitConvertor unitCvtr = new PdfUnitConvertor();
PdfMargins margin = new PdfMargins();
margin.Top = unitCvtr.ConvertUnits(2.54f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point);
margin.Bottom = margin.Top;
margin.Left = unitCvtr.ConvertUnits(3.17f, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point);
margin.Right = margin.Left;
this.DrawPageNumber(docs[0].Pages, margin, 1, docs[0].Pages.Count);
private void DrawPageNumber(PdfPageCollection pages, PdfMargins margin, int startNumber, int pageCount)
{
foreach (PdfPageBase page in pages)
{
page.Canvas.SetTransparency(0.5f);
PdfBrush brush = PdfBrushes.Black;
PdfPen pen = new PdfPen(brush, 0.75f);
PdfTrueTypeFont font = new PdfTrueTypeFont(new Font("Arial", 9f, System.Drawing.FontStyle.Italic), true);
PdfStringFormat format = new PdfStringFormat(PdfTextAlignment.Right);
format.MeasureTrailingSpaces = true;
float space = font.Height * 0.75f;
float x = margin.Left;
float width = page.Canvas.ClientSize.Width - margin.Left - margin.Right;
float y = page.Canvas.ClientSize.Height - margin.Bottom + space;
page.Canvas.DrawLine(pen, x, y, x + width, y);
y = y + 1;
String numberLabel
= String.Format("{0} of {1}", startNumber++, pageCount);
page.Canvas.DrawString(numberLabel, font, brush, x + width, y, format);
page.Canvas.SetTransparency(1);
}
}
'set PDF margin
Dim unitCvtr As New PdfUnitConvertor()
Dim margin As New PdfMargins()
margin.Top = unitCvtr.ConvertUnits(2.54F, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point)
margin.Bottom = margin.Top
margin.Left = unitCvtr.ConvertUnits(3.17F, PdfGraphicsUnit.Centimeter, PdfGraphicsUnit.Point)
margin.Right = margin.Left
Me.DrawPageNumber(docs(0).Pages, margin, 1, docs(0).Pages.Count)
Private Sub DrawPageNumber(pages As PdfPageCollection, margin As PdfMargins, startNumber As Integer, pageCount As Integer)
For Each page As PdfPageBase In pages
page.Canvas.SetTransparency(0.5F)
Dim brush As PdfBrush = PdfBrushes.Black
Dim pen As New PdfPen(brush, 0.75F)
Dim font As New PdfTrueTypeFont(New Font("Arial", 9F, System.Drawing.FontStyle.Italic), True)
Dim format As New PdfStringFormat(PdfTextAlignment.Right)
format.MeasureTrailingSpaces = True
Dim space As Single = font.Height * 0.75F
Dim x As Single = margin.Left
Dim width As Single = page.Canvas.ClientSize.Width - margin.Left - margin.Right
Dim y As Single = page.Canvas.ClientSize.Height - margin.Bottom + space
page.Canvas.DrawLine(pen, x, y, x + width, y)
y = y + 1
Dim numberLabel As [String] = [String].Format("{0} of {1}", System.Math.Max(System.Threading.Interlocked.Increment(startNumber),startNumber - 1), pageCount)
page.Canvas.DrawString(numberLabel, font, brush, x + width, y, format)
page.Canvas.SetTransparency(1)
Next
End Sub
The PDF merge code can be very long when you view it at first sight, actually, if you do not need to add page number in your merged PDF, steps two should be avoided. However, in many cases, page number brings great convenience for users to read PDF as well as print it. Spire.PDF for .NET can satisfy both your requirements of merging PDF files and adding page numbers in the merged PDF file.
The sample demonstrates how to set PDF properties for Silverlight via Spire.PDF.

The sample demonstrates how to Create Excel file for Silverlight via Spire.XLS.

The sample demonstrates how to add bookmark into Word for Silverlight via Spire.Doc.

Table in Microsoft Word is used to present data information which can assist to explain specified paragraph contents. In order to have a better appearance, people can set Word table style. This guide shows how to use Spire.Doc to set table style in Word with C#/VB.NET.
Download Spire.Doc (or Spire.Office) with .NET Framework 2.0 (or above) together. Once make sure Spire.Doc (or Spire.Office) are correctly installed on system, follow the steps below to set Word table style
In this example, a Word document with table has been prepared. It is a student transcript template from Office.com.
Step 1: Create a C#/VB.NET project in Visual Studio. Add Spire.Doc.dll as reference.
Document document = new Document(); document.LoadFromFile(@"E:\work\Documents\Student Transcript.docx");
Dim document As New Document()
document.LoadFromFile("E:\work\Documents\Student Transcript.docx")
Step 2: Set Table Style
Get table which you want to set style
Because table1 type is different from document.Sections[0].Tables[1] type, so use (Table) to transformed forcibly.
Table table1 = (Table)document.Sections[0].Tables[1];
Dim table1 As Table = CType(document.Sections(0).Tables(1), Table)
Set table row height.
table1.Rows[0].Height = 25;
table1.Rows(0).Height = 25
Set Table Style
In order to have distinction. Keep the first cell in first row as before and set style for the second cell. Firstly, set alignment and background color for the second cell. Secondly, declare a paragraph style, including font size, color and apply this style in cell.
table1.Rows[0].Cells[1].CellFormat.VerticalAlignment = VerticalAlignment.Middle; table1.Rows[0].Cells[1].CellFormat.BackColor = Color.LimeGreen; ParagraphStyle style = new ParagraphStyle(document); style.Name = "TableStyle"; style.CharacterFormat.FontSize = 14; style.CharacterFormat.TextColor = Color.GhostWhite; document.Styles.Add(style); table1.Rows[0].Cells[1].Paragraphs[0].ApplyStyle(style.Name);
table1.Rows(0).Cells(1).CellFormat.VerticalAlignment = VerticalAlignment.Middle table1.Rows(0).Cells(1).CellFormat.BackColor = Color.LimeGreen Dim style As New ParagraphStyle(document) style.Name = "TableStyle" style.CharacterFormat.FontSize = 14 style.CharacterFormat.TextColor = Color.GhostWhite document.Styles.Add(style) table1.Rows(0).Cells(1).Paragraphs(0).ApplyStyle(style.Name)
Step 3: Save and Launch
document.SaveToFile("WordTable.docx", FileFormat.Docx);
System.Diagnostics.Process.Start("WordTable.docx");
document.SaveToFile("WordTable.docx", FileFormat.Docx)
System.Diagnostics.Process.Start("WordTable.docx")
Effective Screenshot:

This guide shows how to set Word table style such as size and color via Spire.Doc. However, Spire.Doc can do a lot on operating Word document Click to learn more