Extract Text from Powerpoint

2

So, I have found this nice Script here: http://www.pptfaq.com/FAQ00274_Export_Text_to_a_text_file-_extract_text_from_PowerPoint_-Mac_or_PC-.htm (I'm using the second one)

The import part is this:

 For Each oShp In oSld.Shapes                'Loop thru each shape on slide
      'Check to see if shape has a text frame and text
      If oShp.HasTextFrame And oShp.TextFrame.HasText Then
        If oShp.Type = msoPlaceholder Then
            Select Case oShp.PlaceholderFormat.Type
                Case Is = ppPlaceholderTitle, ppPlaceholderCenterTitle
                    Print #iFile, "Title:" & vbTab & oShp.TextFrame.TextRange
                Case Is = ppPlaceholderBody
                    Print #iFile, "Body:" & vbTab & oShp.TextFrame.TextRange
                Case Is = ppPlaceholderSubtitle
                    Print #iFile, "SubTitle:" & vbTab & oShp.TextFrame.TextRange
                Case Else
                    Print #iFile, "Other Placeholder:" & vbTab & oShp.TextFrame.TextRange
            End Select
        Else
            Print #iFile, vbTab & oShp.TextFrame.TextRange
        End If  ' msoPlaceholder
      Else  ' it doesn't have a textframe - it might be a group that contains text so:
        If oShp.Type = msoGroup Then
            sTempString = TextFromGroupShape(oShp)
            If Len(sTempString) > 0 Then
                Print #iFile, sTempString
            End If
        End If
      End If    ' Has text frame/Has text

    Next oShp

I have already modified it a bit, so the output file does not contain "Title", "Other Placeholder" and such text and neither inserts tabs ("vbTab"). However, it puts each line (or parapgraph) into a new line in the outfile.

The Question: How can I tell the script to dump all 'content' from a 'slide'/'body' into the same line/cell?

I noticed that this script (and neither this http://www.pptfaq.com/FAQ00332_Export_Slide_Number_and_Title_Text_to_a_text_file.htm) exhibits this behavior for titles, only for "body" or "ppPlaceholderBody".

I have no clue why that is or what the difference is. Can it simply not tell the difference between two lines or bulletins even in the same shape/box? My goal is to have a consistent line/cell numbering over multiple .ppts so that one added line in slide 2 does not result content from slide 5 to be shifted to the next line.

Thank you for your help!

user991558

Posted 2019-03-27T16:58:35.377

Reputation: 31

Answers

1

My PowerPoint installation is down at the moment, so this is untested. But...

You just need to create a string variable and add to it, then when you're finished with the slide, copy that string to an Excel cell.

Dim slideText As String
For Each oShp In oSld.Shapes                 'Loop thru each shape on slide
    If Len(slideText) > 0 Then
        '--- strip the unneeded trailing CRLF
        slideText = Left$(slideText, Len(slideText) - 2)
        '--- now copy the string to the appropriate cell in Excel
    Else
        '--- clear the string for the next slide
        slideText = vbNullString
    End If

    'Check to see if shape has a text frame and text
    If oShp.HasTextFrame Then
        If oShp.TextFrame.HasText Then
            If oShp.Type = msoPlaceholder Then
                Select Case oShp.PlaceholderFormat.Type
                    Case Is = ppPlaceholderTitle, ppPlaceholderCenterTitle
                        slideText = slideText & "Title:" & vbTab & _
                                    oShp.TextFrame.TextRange & vbCrLf
                    Case Is = ppPlaceholderBody
                        slideText = slideText & "Body:" & vbTab & _
                                    oShp.TextFrame.TextRange & vbCrLf
                    Case Is = ppPlaceholderSubtitle
                        slideText = slideText & "SubTitle:" & vbTab & _
                                    oShp.TextFrame.TextRange & vbCrLf
                    Case Else
                        slideText = slideText & "Other Placeholder:" & _
                                    vbTab & oShp.TextFrame.TextRange & vbCrLf
                End Select
            Else
                slideText = slideText & vbTab & oShp.TextFrame.TextRange
            End If                           ' msoPlaceholder
        End If
    Else
        ' it doesn't have a textframe - it might be a group that contains text so:
        If oShp.Type = msoGroup Then
            sTempString = TextFromGroupShape(oShp)
            If Len(sTempString) > 0 Then
                slideText = slideText & sTempString & vbCrLf
            End If
        End If
    End If                                   ' Has text frame/Has text
Next oShp

'--- catch the text on the last slide here
If Len(slideText) > 0 Then
    '--- strip the unneeded trailing CRLF
    slideText = Left$(slideText, Len(slideText) - 2)
    '--- now copy the string to the appropriate cell in Excel
End If

Of course, you're doing this loop for each slide.

PeterT

Posted 2019-03-27T16:58:35.377

Reputation: 247

Good one, @PeterT. It might be necessary to get a little trickier to get rid of line endings, eg by using Replace(oShp.TextFrame.TextRange,VBCRLF,"") and possibly the same for Chr$(11), which PPT uses for line endings in some placeholder types. – Steve Rindsberg – 2019-03-28T18:56:11.257

Hi Peter, Thank you so much for your solution. As a preface, everything I know about VB and Macros is from examples online.

  1. I don't really understand how
 If Len(slideText) > 0 Then
        '--- strip the unneeded trailing CRLF

removes CRLF... but okay. However your solutions gives me an empty output file.. I noticed how you removed the print part here

Case Is = ppPlaceholderTitle, ppPlaceholderCenterTitle
                    Print #iFile, "Title:" & vbTab & oShp.TextFrame.TextRange
 – user991558  – 2019-04-01T10:13:43.620

(accidentally hit Enter) compared to: ~~~ slideText = slideText & "Title:" & vbTab & oShp.TextFrame.TextRange & vbCrLf ~~~

But simply adding it back in does not seem to be the way to go, it gives an output file with "WRONG" cells. Thanks again. – user991558 – 2019-04-01T10:22:51.400

okay, hold on, I get it know. I understand how Len + Left -2 works now. But the " '--- now copy the string to the appropriate cell in Excel" / Print #iFile is still missing / not working, right? – user991558 – 2019-04-01T10:39:08.677

0

I don't think this helps, but this: https://stackoverflow.com/questions/45468824/printing-from-ppt-vba-to-an-excel-spreadsheet tries something similar using Lbound and Ubound to print to specific cells.

As long as the cells remain consistent over multiple ppt/xls, I don't really where the strings go...

(Though it also selects a specific xls file, while I want to create a new one for each print out, but that should no be a problem with the code I already have which either creates a specified file or uses the filename from the ppt.)

user991558

Posted 2019-03-27T16:58:35.377

Reputation: 31