Spire.Office Knowledgebase Page 59 | E-iceblue

OLE (Object Linking and Embedding) objects in Word are files or data from other applications that can be inserted into a document. These objects can be edited and updated within Word, allowing you to seamlessly integrate content from various programs, such as Excel spreadsheets, PowerPoint presentations, or even multimedia files like images, audio, or video. In this article, we will introduce how to insert and extract OLE objects in a Word document in Python using Spire.Doc for Python.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to this tutorial: How to Install Spire.Doc for Python on Windows

Insert OLE Objects in Word in Python

Spire.Doc for Python provides the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method to embed OLE objects in a Word document. The detailed steps are as follows.

  • Create an object of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Get a specific section using the Document.Sections.get_Item(index) method.
  • Add a paragraph to the section using the Section.AddParagraph() method.
  • Create an object of the DocPicture class.
  • Load an image that will be used as the icon of the OLE object using the DocPicture.LoadImage() method and then set image width and height.
  • Append an OLE object to the paragraph using the Paragraph.AppendOleObject(pathToFile:str, olePicture:DocPicture, type:OleObjectType) method.
  • Save the result file using the Document.SaveToFile() method.

The following code example shows how to embed an Excel spreadsheet, a PDF file, and a PowerPoint presentation in a Word document using Spire.Doc for Python:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("Example.docx")

# Get the first section
section = doc.Sections.get_Item(0)

# Add a paragraph to the section
para1 = section.AddParagraph()
para1.AppendText("Excel File: ")
# Load an image which will be used as the icon of the OLE object
picture1 = DocPicture(doc)
picture1.LoadImage("Excel-Icon.png")
picture1.Width = 50
picture1.Height = 50
# Append an OLE object (an Excel spreadsheet) to the paragraph 
para1.AppendOleObject("Budget.xlsx", picture1, OleObjectType.ExcelWorksheet)

# Add a paragraph to the section
para2 = section.AddParagraph()
para2.AppendText("PDF File: ")
# Load an image which will be used as the icon of the OLE object
picture2 = DocPicture(doc)
picture2.LoadImage("PDF-Icon.png")
picture2.Width = 50
picture2.Height = 50
# Append an OLE object (a PDF file) to the paragraph 
para2.AppendOleObject("Report.pdf", picture2, OleObjectType.AdobeAcrobatDocument)

# Add a paragraph to the section
para3 = section.AddParagraph()
para3.AppendText("PPT File: ")
# Load an image which will be used as the icon of the OLE object
picture3 = DocPicture(doc)
picture3.LoadImage("PPT-Icon.png")
picture3.Width = 50
picture3.Height = 50
# Append an OLE object (a PowerPoint presentation) to the paragraph 
para3.AppendOleObject("Plan.pptx", picture3, OleObjectType.PowerPointPresentation)

doc.SaveToFile("InsertOLE.docx", FileFormat.Docx2013)
doc.Close()

Python: Insert or Extract OLE Objects in Word

Extract OLE Objects from Word in Python

To extract OLE objects from a Word document, you first need to locate the OLE objects within the document. Once located, you can determine the file format of each OLE object. Finally, you can save the data of each OLE object to a file in its native file format. The detailed steps are as follows.

  • Create an instance of the Document class.
  • Load a Word document using the Document.LoadFromFile() method.
  • Iterate through all sections of the document.
  • Iterate through all child objects in the body of each section.
  • Identify the paragraphs within each section.
  • Iterate through the child objects in each paragraph.
  • Locate the OLE object within the paragraph.
  • Determine the file format of the OLE object.
  • Save the data of the OLE object to a file in its native file format.

The following code example shows how to extract the embedded Excel spreadsheet, PDF file, and PowerPoint presentation from a Word document using Spire.Doc for Python:

  • Python
from spire.doc import *
from spire.doc.common import *

# Create an object of the Document class
doc = Document()
# Load a Word document
doc.LoadFromFile("InsertOLE.docx")

i = 1 
# Iterate through all sections of the Word document
for k in range(doc.Sections.Count):
    sec = doc.Sections.get_Item(k)
    # Iterate through all child objects in the body of each section
    for j in range(sec.Body.ChildObjects.Count):
        obj = sec.Body.ChildObjects.get_Item(j)
        # Check if the child object is a paragraph
        if isinstance(obj, Paragraph):
            par = obj if isinstance(obj, Paragraph) else None
            # Iterate through the child objects in the paragraph
            for m in range(par.ChildObjects.Count):
                o = par.ChildObjects.get_Item(m)
                # Check if the child object is an OLE object
                if o.DocumentObjectType == DocumentObjectType.OleObject:
                    ole = o if isinstance(o, DocOleObject) else None
                    s = ole.ObjectType
                    # Check if the OLE object is a PDF file
                    if s.startswith("AcroExch.Document"):
                        ext = ".pdf"
                    # Check if the OLE object is an Excel spreadsheet
                    elif s.startswith("Excel.Sheet"):
                        ext = ".xlsx"
                    # Check if the OLE object is a PowerPoint presentation
                    elif s.startswith("PowerPoint.Show"):
                        ext = ".pptx"
                    else:
                        continue
                    # Write the data of OLE into a file in its native format
                    with open(f"Output/OLE{i}{ext}", "wb") as file:
                            file.write(ole.NativeData)                        
                    i += 1

doc.Close()

Python: Insert or Extract OLE Objects in Word

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

In addition to text and images, PDF files can also contain various types of attachments, such as documents, images, audio files, or other multimedia elements. Extracting attachments from PDF files allows users to retrieve and save the embedded content, enabling easy access and manipulation outside of the PDF environment. This process proves especially useful when dealing with PDFs that contain important supplementary materials, such as reports, spreadsheets, or legal documents.

In this article, you will learn how to extract attachments from a PDF document in Python using Spire.PDF for Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Prerequisite Knowledge

There are generally two categories of attachments in PDF files: document-level attachments and annotation attachments. Below, you can find a table outlining the disparities between these two types of attachments and how they are represented in Spire.PDF.

Attachment type Represented by Definition
Document level attachment PdfAttachment class A file attached to a PDF at the document level won't appear on a page, but can be viewed in the "Attachments" panel of a PDF reader.
Annotation attachment PdfAnnotationAttachment class A file attached as an annotation can be found on a page or in the "Attachments" panel. An annotation attachment is shown as a paper clip icon on the page; reviewers can double-click the icon to open the file.

Extract Document-Level Attachments from PDF in Python

To retrieve document-level attachments in a PDF document, you can use the PdfDocument.Attachments property. Each attachment has a PdfAttachment.FileName property, which provides the name of the specific attachment, including the file extension. Additionally, the PdfAttachment.Data property allows you to access the attachment's data. To save the attachment to a specific folder, you can utilize the PdfAttachment.Data.Save() method.

The steps to extract document-level attachments from a PDF using Python are as follows.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Get a collection of attachments using PdfDocument.Attachments property.
  • Iterate through the attachments in the collection.
  • Get a specific attachment from the collection, and get the file name and data of the attachment using PdfAttachment.FileName property and PdfAttachment.Data property.
  • Save the attachment to a specified folder using PdfAttachment.Data.Save() method.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\Attachments.pdf")

# Get the attachment collection from the document
collection = doc.Attachments

# Loop through the collection
if collection.Count > 0:
    for i in range(collection.Count):

        # Get a specific attachment
        attactment = collection.get_Item(i)

        # Get the file name and data of the attachment
        fileName= attactment.FileName
        data = attactment.Data

        # Save it to a specified folder
        data.Save("Output\\ExtractedFiles\\" + fileName)

doc.Close()

Python: Extract Attachments from a PDF Document

Extract Annotation Attachments from PDF in Python

The Annotations attachment is a page-based element. To retrieve annotations from a specific page, use the PdfPageBase.AnnotationsWidget property. You then need to determine if a particular annotation is an attachment. If it is, save it to the specified folder while retaining its original filename.

The following are the steps to extract annotation attachments from a PDF using Python.

  • Create a PdfDocument object.
  • Load a PDF file using PdfDocument.LoadFromFile() method.
  • Iterate though the pages in the document.
  • Get the annotations from a particular page using PdfPageBase.AnnotationsWidget property.
  • Iterate though the annotations, and determine if a specific annotation is an attachment annotation.
  • If it is, get the file name and data of the annotation using PdfAttachmentAnnotation.FileName property and PdfAttachmentAnnotation.Data property.
  • Save the annotated attachment to a specified folder.
  • Python
from spire.pdf import *
from spire.pdf.common import *

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF file
doc.LoadFromFile("C:\\Users\\Administrator\\Desktop\\AnnotationAttachment.pdf")

# Iterate through the pages in the document
for i in range(doc.Pages.Count):

    # Get a specific page
    page = doc.Pages.get_Item(i)

    # Get the annotation collection of the page
    annotationCollection = page.AnnotationsWidget

    # If the page has annotations
    if annotationCollection.Count > 0:

        # Iterate through the annotations
        for j in range(annotationCollection.Count):

            # Get a specific annotation
            annotation = annotationCollection.get_Item(j)

            # Determine if the annotation is an attachment annotation
            if isinstance(annotation, PdfAttachmentAnnotationWidget):
              
                # Get the file name and data of the attachment
                fileName = annotation.FileName
                byteData = annotation.Data
                streamMs = Stream(byteData)

                # Save the attachment into a specified folder 
                streamMs.Save("Output\\ExtractedFiles\\" + fileName)

Python: Extract Attachments from a PDF Document

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Python: Draw Shapes in PDF Documents

2024-02-18 07:06:31 Written by Koohji

Shapes play a vital role in PDF documents. By drawing graphics, defining outlines, filling colors, setting border styles, and applying geometric transformations, shapes provide rich visual effects and design options for documents. The properties of shapes such as color, line type, and fill effects can be customized according to requirements to meet personalized design needs. They can be used to create charts, decorations, logos, and other elements that enhance the readability and appeal of the document. This article will introduce how to use Spire.PDF for Python to draw shapes into PDF documents from Python.

Install Spire.PDF for Python

This scenario requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be easily installed in your Windows through the following pip command.

pip install Spire.PDF

If you are unsure how to install, please refer to this tutorial: How to Install Spire.PDF for Python on Windows

Draw Lines in PDF Documents in Python

Spire.PDF for Python provides the PdfPageBase.Canvas.DrawLine() method to draw lines by specifying the coordinates of the starting point and end point and a brush object. Here is a detailed step-by-step guide on how to draw lines:

  • Create a PdfDocument object.
  • Use the PdfDocument.Pages.Add() method to add a blank page to the PDF document.
  • Save the current drawing state using the PdfPageBase.Canvas.Save() method so it can be restored later.
  • Define the start point coordinate (x, y) and the length of a solid line segment.
  • Create a PdfPen object.
  • Draw a solid line segment using the PdfPageBase.Canvas.DrawLine() method with the previously created pen object.
  • Set the DashStyle property of the pen to PdfDashStyle.Dash to create a dashed line style.
  • Draw a dashed line segment using the pen with a dashed line style via the PdfPageBase.Canvas.DrawLine() method.
  • Restore the previous drawing state using the PdfPageBase.Canvas.Restore(state) method.
  • Save the document to a file using the PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create PDF Document Object
doc = PdfDocument()

# Add a Page
page = doc.Pages.Add()

# Save the current drawing state
state = page.Canvas.Save()

# The starting X coordinate of the line
x = 100.0

# The starting Y coordinate of the line
y = 50.0

# The length of the line
width = 300.0

# Create a pen object with deep sky blue color and a line width of 3.0
pen = PdfPen(PdfRGBColor(Color.get_DeepSkyBlue()), 3.0)  

# Draw a solid line
page.Canvas.DrawLine(pen, x, y, x + width, y)

# Set the pen style to dashed
pen.DashStyle = PdfDashStyle.Dash

# Set the dashed pattern to [1, 4, 1]
pen.DashPattern = [1, 4, 1]

# The Y coordinate for the start of the dashed line
y = 80.0

# Draw a dashed line
page.Canvas.DrawLine(pen, x, y, x + width, y)

# Restore the previously saved drawing state
page.Canvas.Restore(state)

# Save the document to a file
doc.SaveToFile("Drawing Lines.pdf")

# Close the document and release resources
doc.Close()
doc.Dispose()

Python: Draw Shapes in PDF Documents

Draw Pies in PDF Documents in Python

To draw pie charts with different positions, sizes, and angles on a specified page, call the PdfPageBase.Canvas.DrawPie() method and pass appropriate parameters. The detailed steps are as follows:

  • Create a PdfDocument object.
  • Add a blank page to the PDF document using the PdfDocument.Pages.Add() method.
  • Save the current drawing state using the PdfPageBase.Canvas.Save() method so it can be restored later.
  • Create a PdfPen object.
  • Call the PdfPageBase.Canvas.DrawPie() method and pass various position, size, and angle parameters to draw three pie charts.
  • Restore the previous drawing state using the PdfPageBase.Canvas.Restore(state) method.
  • Save the document to a file using the PdfDocument.SaveToFile() method.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create PDF Document Object
doc = PdfDocument()

# Add a Page
page = doc.Pages.Add()

# Save the current drawing state
state = page.Canvas.Save()

# Create a pen object with dark red color and a line width of 2.0
pen = PdfPen(PdfRGBColor(Color.get_DarkRed()), 2.0)

# Draw the first pie chart
page.Canvas.DrawPie(pen, 10.0, 30.0, 130.0, 130.0, 360.0, 300.0)

# Draw the second pie chart
page.Canvas.DrawPie(pen, 160.0, 30.0, 130.0, 130.0, 360.0, 330.0)

# Draw the third pie chart
page.Canvas.DrawPie(pen, 320.0, 30.0, 130.0, 130.0, 360.0, 360.0)

# Restore the previously saved drawing state
page.Canvas.Restore(state)

# Save the document to a file
doc.SaveToFile("Drawing Pie Charts.pdf")

# Close the document and release resources
doc.Close()
doc.Dispose()

Python: Draw Shapes in PDF Documents

Draw Rectangles in PDF Documents in Python

Spire.PDF for Python provides the PdfPageBase.Canvas.DrawRectangle() method to draw rectangular shapes. By passing position and size parameters, you can define the position and dimensions of the rectangle. Here are the detailed steps for drawing a rectangle:

  • Create a PdfDocument object.
  • Use the PdfDocument.Pages.Add() method to add a blank page to the PDF document.
  • Use the PdfPageBase.Canvas.Save() method to save the current drawing state for later restoration.
  • Create a PdfPen object.
  • Use the PdfPageBase.Canvas.DrawRectangle() method with the pen to draw the outline of a rectangle.
  • Create a PdfLinearGradientBrush object for linear gradient filling.
  • Use the PdfPageBase.Canvas.DrawRectangle() method with the linear gradient brush to draw a filled rectangle.
  • Create a PdfRadialGradientBrush object for radial gradient filling.
  • Use the PdfPageBase.Canvas.DrawRectangle() method with the radial gradient brush to draw a filled rectangle.
  • Use the PdfPageBase.Canvas.Restore(state) method to restore the previously saved drawing state.
  • Use the PdfDocument.SaveToFile() method to save the document to a file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create PDF Document Object
doc = PdfDocument()

# Add a Page
page = doc.Pages.Add()

# Save the current drawing state
state = page.Canvas.Save()

# Create a Pen object with chocolate color and line width of 1.5
pen = PdfPen(PdfRGBColor(Color.get_Chocolate()), 1.5)

# Draw the outline of a rectangle using the pen
page.Canvas.DrawRectangle(pen, RectangleF(PointF(20.0, 30.0), SizeF(150.0, 120.0)))

# Create a linear gradient brush
linearGradientBrush = PdfLinearGradientBrush(PointF(200.0, 30.0), PointF(350.0, 150.0), PdfRGBColor(Color.get_Green()), PdfRGBColor(Color.get_Red()))

# Draw a filled rectangle using the linear gradient brush
page.Canvas.DrawRectangle(linearGradientBrush, RectangleF(PointF(200.0, 30.0), SizeF(150.0, 120.0)))

# Create a radial gradient brush
radialGradientBrush = PdfRadialGradientBrush(PointF(380.0, 30.0), 150.0, PointF(530.0, 150.0), 150.0, PdfRGBColor(Color.get_Orange()) , PdfRGBColor(Color.get_Blue()))

# Draw a filled rectangle using the radial gradient brush
page.Canvas.DrawRectangle(radialGradientBrush, RectangleF(PointF(380.0, 30.0), SizeF(150.0, 120.0)))

# Restore the previously saved drawing state
page.Canvas.Restore(state)

# Save the document to a file
doc.SaveToFile("Drawing Rectangle Shapes.pdf")

# Close the document and release resources
doc.Close()
doc.Dispose()

Python: Draw Shapes in PDF Documents

Draw Ellipses in PDF Documents in Python

Spire.PDF for Python provides the PdfPageBase.Canvas.DrawEllipse() method to draw elliptical shapes. You can use either a pen or a fill brush to draw ellipses in different styles. Here are the detailed steps for drawing an ellipse:

  • Create a PdfDocument object.
  • Use the PdfDocument.Pages.Add() method to add a blank page to the PDF document.
  • Use the PdfPageBase.Canvas.Save() method to save the current drawing state for later restoration.
  • Create a PdfPen object.
  • Use the PdfPageBase.Canvas.DrawEllipse() method with the pen object to draw the outline of an ellipse, specifying the position and size of the ellipse.
  • Create a PdfSolidBrush object.
  • Use the PdfPageBase.Canvas.DrawEllipse() method with the fill brush object to draw a filled ellipse, specifying the position and size of the ellipse.
  • Use the PdfPageBase.Canvas.Restore(state) method to restore the previously saved drawing state.
  • Use the PdfDocument.SaveToFile() method to save the document to a file.
  • Python
from spire.pdf.common import *
from spire.pdf import *

# Create PDF Document Object
doc = PdfDocument()

# Add a Page
page = doc.Pages.Add()

# Save the current drawing state
state = page.Canvas.Save()

# Create a Pen object
pen = PdfPens.get_CadetBlue()

# Draw the outline of an ellipse shape
page.Canvas.DrawEllipse(pen, 50.0, 30.0, 120.0, 100.0)

# Create a Brush object for filling
brush = PdfSolidBrush(PdfRGBColor(Color.get_CadetBlue()))

# Draw the filled ellipse shape
page.Canvas.DrawEllipse(brush, 180.0, 30.0, 120.0, 100.0)

# Restore the previously saved drawing state
page.Canvas.Restore(state)

# Save the document to a file
doc.SaveToFile("Drawing Ellipse Shape.pdf")

# Close the document and release resources
doc.Close()
doc.Dispose()

Python: Draw Shapes in PDF Documents

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

page 59