Spire.Office Knowledgebase Page 56 | E-iceblue

Extracting text directly is a practical way to obtain the textual information in content-heavy PowerPoint presentations. With a Python program, users can quickly access the content of slides, collect information efficiently, and pass it on for further data processing. This article shows how to use Spire.Presentation for Python to extract text from PowerPoint presentations, including text in slides, speaker notes, and comments.

Install Spire.Presentation for Python

This scenario requires Spire.Presentation for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.Presentation

If you are unsure how to install, please refer to: How to Install Spire.Presentation for Python on Windows
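
To confirm that the installation worked, the installed versions can be queried with the standard library. This is a minimal sketch, not part of Spire.Presentation: `installed_version` is a hypothetical helper, and the distribution names are assumed to match the pip command and the requirement stated above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names assumed from the pip command and the stated requirement
for name in ("Spire.Presentation", "plum-dispatch"):
    print(name, "->", installed_version(name) or "not installed")
```

If either line reports "not installed", run the pip command again in the same Python environment that executes your script.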

Extract Text from Presentation Slides with Python

The text within PowerPoint presentation slides is placed within shapes. Therefore, developers can extract the text from the presentation by accessing all the shapes within each slide and extracting the text contained within them. The detailed steps are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through the slides in the presentation and then iterate through the shapes in each slide.
  • Check if a shape is an IAutoShape instance. If it is, get the paragraphs in the shape through the IAutoShape.TextFrame.Paragraphs property and then get the text of each paragraph through the Paragraph.Text property.
  • Write the slide text to a text file.
  • Python
from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

text = []
# Loop through each slide
for slide in pres.Slides:
    # Loop through each shape
    for shape in slide.Shapes:
        # Check if the shape is an IAutoShape instance
        if isinstance(shape, IAutoShape):
            # Extract the text from the shape
            for paragraph in shape.TextFrame.Paragraphs:
                text.append(paragraph.Text)

# Write the text to a text file
with open("output/SlideText.txt", "w", encoding="utf-8") as f:
    for s in text:
        f.write(s + "\n")
pres.Dispose()
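
The script above writes into an "output" folder, and Python's open() does not create missing folders. The following sketch shows one way to create the folder before writing. `write_lines` is a hypothetical helper introduced here for illustration, and the demo uses a temporary folder instead of a real presentation.

```python
import os
import tempfile


def write_lines(path, lines):
    """Create the parent folder if needed, then write one line per item as UTF-8."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Demo in a temporary folder so that nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output", "SlideText.txt")
    write_lines(target, ["Title", "First point"])
    with open(target, encoding="utf-8") as f:
        written = f.read()

print(written)
```

In the extraction script, the same helper could be called with the collected `text` list in place of the manual file handling.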

Python: Extract Text from PowerPoint Presentations

Extract Text from Speaker Notes with Python

Speaker notes are additional information that provides guidance to the presenter and are not visible to the audience. The text in speaker notes of each slide is stored in the notes slide and developers can extract the text through NotesSlide.NotesTextFrame.Text property. The detailed steps for extracting text in speaker notes are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through each slide.
  • Get the notes slide through the ISlide.NotesSlide property and retrieve the text through the NotesSlide.NotesTextFrame.Text property.
  • Write the speaker note text to a text file.
  • Python
from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

notes_list = []
# Iterate through each slide
for slide in pres.Slides:
    # Get the notes slide; it may be None when the slide has no speaker notes
    notesSlide = slide.NotesSlide
    # Get the notes text, or an empty string so that each slide keeps one entry
    notes = notesSlide.NotesTextFrame.Text if notesSlide is not None else ""
    notes_list.append(notes)

# Write the notes to a text file
with open("output/SpeakerNoteText.txt", "w", encoding="utf-8") as f:
    for note in notes_list:
        f.write(note + "\n")
pres.Dispose()
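
The output file lists the notes one after another, so it is not obvious which slide a note belongs to. A minimal sketch of labeling each note with its slide number follows. It assumes the notes were collected with one entry per slide, in slide order; `label_notes` is a hypothetical helper, not a Spire.Presentation function.

```python
def label_notes(notes):
    """Prefix each non-empty note with its 1-based slide number."""
    lines = []
    for number, note in enumerate(notes, start=1):
        # Skip slides whose note is missing, empty, or whitespace only
        if note and note.strip():
            lines.append(f"Slide {number}: {note.strip()}")
    return lines


# Sample data standing in for notes read from a presentation
sample_notes = ["Welcome the audience", "", None, "  Summarize and take questions  "]
for line in label_notes(sample_notes):
    print(line)
```

Only slides 1 and 4 appear in the result, because the second and third entries carry no text.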


Extract Text from Presentation Comments with Python

With Spire.Presentation for Python, developers can also extract the text from comments in PowerPoint presentations by getting comments from slides with ISlide.Comments property and retrieving text from comments with Comment.Text property. The detailed steps are as follows:

  • Create an object of the Presentation class and load a PowerPoint presentation using the Presentation.LoadFromFile() method.
  • Iterate through each slide and get the comments from each slide through the ISlide.Comments property.
  • Iterate through each comment and retrieve the text from each comment through the Comment.Text property.
  • Write the comment text to a text file.
  • Python
from spire.presentation import *
from spire.presentation.common import *

# Create an object of Presentation class
pres = Presentation()

# Load a PowerPoint presentation
pres.LoadFromFile("Sample.pptx")

comment_texts = []
# Iterate through all slides
for slide in pres.Slides:
    # Get all comments from the slide
    comments = slide.Comments
    # Iterate through the comments
    for comment in comments:
        # Get the comment text
        comment_texts.append(comment.Text)

# Write the comments to a text file
with open("output/CommentText.txt", "w", encoding="utf-8") as f:
    for comment_text in comment_texts:
        f.write(comment_text + "\n")
pres.Dispose()
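
To record which slide each comment came from, the same loop can keep the slide number next to the text. The sketch below runs on stand-in objects rather than a real presentation: it only assumes that each slide exposes `Comments` and each comment exposes `Text`, as described above. `collect_comments` is a hypothetical helper.

```python
from types import SimpleNamespace


def collect_comments(slides):
    """Return (slide_number, comment_text) pairs for objects exposing .Comments and .Text."""
    pairs = []
    for number, slide in enumerate(slides, start=1):
        for comment in slide.Comments:
            pairs.append((number, comment.Text))
    return pairs


# Stand-in objects so the sketch runs without a presentation file
fake_slides = [
    SimpleNamespace(Comments=[SimpleNamespace(Text="Check this figure")]),
    SimpleNamespace(Comments=[]),
    SimpleNamespace(Comments=[SimpleNamespace(Text="Add a source"),
                              SimpleNamespace(Text="Fix the typo")]),
]

for number, text in collect_comments(fake_slides):
    print(f"Slide {number}: {text}")
```

With a loaded presentation, `pres.Slides` would take the place of `fake_slides`.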


Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to get rid of the function limitations, please request a 30-day trial license for yourself.

Moving and deleting worksheets in Excel are essential operations that allow you to organize and manage your workbook efficiently. Moving worksheets lets you adjust their order to match your needs or bring related information together, while deleting worksheets removes unwanted or redundant sheets and leaves a cleaner, more organized workspace. In this article, we will demonstrate how to move and delete worksheets in Excel in Python using Spire.XLS for Python.

Install Spire.XLS for Python

This scenario requires Spire.XLS for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.XLS

If you are unsure how to install, please refer to this tutorial: How to Install Spire.XLS for Python on Windows

Move a Worksheet in Excel in Python

You can easily move a worksheet in an Excel file to another position by using the Worksheet.MoveWorksheet() method provided by Spire.XLS for Python. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using the Workbook.LoadFromFile() method.
  • Get a specific worksheet in the file using the Workbook.Worksheets[] property.
  • Move the worksheet to another position in the file using the Worksheet.MoveWorksheet() method.
  • Save the result file using the Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create an object of the Workbook class
workbook = Workbook()
# Load a sample Excel file
workbook.LoadFromFile("Sample.xlsx")

# Get a specific worksheet in the file by its index
sheet = workbook.Worksheets[0]
# Or get a specific worksheet in the file by its name
# sheet = workbook.Worksheets["Sheet1"]

# Move the worksheet to the 3rd position in the file
sheet.MoveWorksheet(2)

# Save the result file
workbook.SaveToFile("MoveWorksheet.xlsx", ExcelVersion.Version2016)
workbook.Dispose()
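
The argument 2 in the example is a zero-based index, which is why the comment speaks of the 3rd position. The effect on the sheet order can be pictured with a plain Python list. This is only an illustration of the index arithmetic, assuming the move behaves like removing the item and reinserting it at the target index; `move_item` is a hypothetical helper and does not call Spire.XLS.

```python
def move_item(items, old_index, new_index):
    """Return a copy of items with one element moved to new_index (zero-based)."""
    result = list(items)
    item = result.pop(old_index)
    result.insert(new_index, item)
    return result


sheets = ["Sheet1", "Sheet2", "Sheet3", "Sheet4"]
reordered = move_item(sheets, 0, 2)
print(reordered)
```

Here "Sheet1" ends up third, after "Sheet2" and "Sheet3", and the original list is left unchanged.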

Python: Move or Delete Worksheets in Excel

Delete a Worksheet in Excel in Python

You can delete a specific worksheet from an Excel file by using the Workbook.Worksheets.RemoveAt() or Workbook.Worksheets.Remove() method provided by Spire.XLS for Python. The detailed steps are as follows.

  • Create an object of the Workbook class.
  • Load an Excel file using the Workbook.LoadFromFile() method.
  • Remove a specific worksheet from the file using the Workbook.Worksheets.RemoveAt() or Workbook.Worksheets.Remove() method.
  • Save the result file using the Workbook.SaveToFile() method.
  • Python
from spire.xls import *
from spire.xls.common import *

# Create an object of the Workbook class
workbook = Workbook()
# Load a sample Excel file
workbook.LoadFromFile("Sample.xlsx")

# Remove a specific worksheet in the file by its index
workbook.Worksheets.RemoveAt(0)

# Or get a specific worksheet in the file by its name and then remove it
# worksheet = workbook.Worksheets["Sheet1"]
# workbook.Worksheets.Remove(worksheet)

# Save the result file
workbook.SaveToFile("DeleteWorksheet.xlsx", ExcelVersion.Version2016)
workbook.Dispose()
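
When several worksheets are removed by index, each removal shifts the positions of the sheets that follow. The sketch below shows the usual remedy on a plain Python list: remove the highest index first. It assumes the worksheet collection re-indexes after a removal the way a list does; `remove_indices` is a hypothetical helper and does not call Spire.XLS.

```python
def remove_indices(items, indices):
    """Return a copy of items without the elements at the given zero-based indices."""
    result = list(items)
    # Work from the highest index down so earlier removals do not shift later targets
    for index in sorted(set(indices), reverse=True):
        del result[index]
    return result


sheets = ["Sheet1", "Sheet2", "Sheet3", "Sheet4"]
remaining = remove_indices(sheets, [0, 2])
print(remaining)
```

Applied to a workbook, the same ordering would mean calling the removal for index 2 before index 0.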



Comparing two Word documents for differences is a crucial task when reviewing changes, ensuring accuracy, and collaborating on content. This process allows you to identify additions, deletions, and modifications made between different document iterations. By comparing versions, you can efficiently track alterations, verify updates, and maintain document integrity. In this article, you will learn how to compare two versions of a Word document in Python using the Spire.Doc for Python library.

Install Spire.Doc for Python

This scenario requires Spire.Doc for Python and plum-dispatch v1.7.4. Both can be installed on Windows with the following pip command.

pip install Spire.Doc

If you are unsure how to install, please refer to: How to Install Spire.Doc for Python on Windows

Compare Two Versions of a Word Document in Python

MS Word offers a "Compare" feature that allows you to directly compare two versions of a document. This feature generates a new document that highlights the differences between the two versions.

To achieve similar results using Spire.Doc for Python, load the original and revised versions into two separate Document objects. Then, use the Compare() method to compare the revised version against the original. Finally, save the comparative document, which highlights the alterations, using the SaveToFile() method.

The steps to compare two versions of a Word document using Python are as follows.

  • Load the first document (original version) while initializing the Document object.
  • Load the second document (revised version) while initializing the Document object.
  • Call the Compare() method of the first Document object to compare the revised version against the original version.
  • Save the comparison results in a new Word document.
  • Python
from spire.doc import *
from spire.doc.common import *

# Load the first document while initializing the Document object
firstDoc = Document("C:\\Users\\Administrator\\Desktop\\Original.docx")

# Load the second document while initializing the Document object
secondDoc = Document("C:\\Users\\Administrator\\Desktop\\Revised.docx")

# Compare two documents
firstDoc.Compare(secondDoc, "E-ICEBLUE")

# Save the comparison results in a new document
firstDoc.SaveToFile("Output/Differences.docx", FileFormat.Docx2016)

# Dispose resources
firstDoc.Dispose()
secondDoc.Dispose()
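
To get a feel for the kind of additions, deletions, and modifications a comparison reports, the idea can be tried on plain paragraph lists with Python's standard difflib module. This is a swapped-in illustration only: it is not the algorithm Spire.Doc uses and it does not produce a Word document. `summarize_changes` and the sample paragraphs are hypothetical.

```python
import difflib


def summarize_changes(old, new):
    """Return (tag, old_lines, new_lines) for every block that differs."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Sample paragraphs standing in for the original and revised documents
original = ["Introduction", "The quick brown fox.", "Conclusion"]
revised = ["Introduction", "The quick red fox.", "Appendix", "Conclusion"]

changes = summarize_changes(original, revised)
for tag, old_lines, new_lines in changes:
    print(tag, old_lines, "->", new_lines)
```

Each reported block carries a tag ("replace", "delete", or "insert") together with the affected paragraphs from both versions.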

Python: Compare Two Versions of a Word Document

Compare Two Versions of a Word Document While Ignoring Formatting in Python

Comparing two versions of a Word document while ignoring formatting can be useful when you want to focus solely on the textual changes and disregard any formatting modifications.

To customize the comparison options in Spire.Doc for Python, use the CompareOptions class. If you want to exclude formatting from the comparison process, you can set the IgnoreFormatting property of the CompareOptions object to True. When you call the Compare() method, simply pass the CompareOptions object as an argument to achieve the desired comparison behavior.

The following are the steps to compare two versions of a Word document while ignoring formatting using Python.

  • Load the first document (original version) while initializing the Document object.
  • Load the second document (revised version) while initializing the Document object.
  • Create a CompareOptions object and set its IgnoreFormatting property to True.
  • Call the Compare() method of the first Document object, passing the CompareOptions object as a parameter, to compare the revised version against the original while ignoring formatting.
  • Save the comparison results in a new Word document.
  • Python
from spire.doc import *
from spire.doc.common import *

# Load the first document while initializing the Document object
firstDoc = Document("C:\\Users\\Administrator\\Desktop\\Original.docx")

# Load the second document while initializing the Document object
secondDoc = Document("C:\\Users\\Administrator\\Desktop\\Revised.docx")

# Set compare option to ignore formatting changes
compareOptions = CompareOptions()
compareOptions.IgnoreFormatting = True

# Compare the two Word documents with options
firstDoc.Compare(secondDoc, "E-ICEBLUE", compareOptions)

# Save the comparison results in a new document
firstDoc.SaveToFile("Output/DifferencesWithoutFormattingChanges.docx", FileFormat.Docx2016)

# Dispose resources
firstDoc.Dispose()
secondDoc.Dispose()


