How to Extract Text & Images Easily from MS Office Files

We may need to extract images or text from an MS Word or MS PowerPoint file. Usually this means copying and pasting by hand, one page at a time, and with very large files this takes quite a bit of time.

Well, we have a simple trick to help you extract images and text from files in the newer formats (DOCX, PPTX and XLSX). For files in the older formats (DOC, PPT and XLS), all you need is a free program to extract the images quickly and easily.

Note: For the purposes of this post, we will demonstrate using only an MS Word file. The process is the same for MS PowerPoint and MS Excel files, except that the folder inside the extracted archive is named ppt or xl instead of word.

Here’s what this article covers:

  1. How to extract images & text from DOCX, PPTX, XLSX files
  2. How to extract images from a single DOC, PPT or XLS file
  3. How to extract images from multiple DOC, PPT or XLS files
  4. How to extract images with “Save as Web Page” method
  5. How to extract plain text instead of XML

How to Extract Images & Text from DOCX, PPTX, XLSX Files

Before following the steps, open the folder containing your files. Click Organize > Folder and search options > View and uncheck Hide extensions for known file types. File extensions will now be shown after each filename.

  1. Locate and select the file you want to extract images and text from (tip: work on a copy of the file rather than the original). In this example, our target file is named Sample File.docx.

    Locate File
  2. Press F2 to rename the file and replace the .docx extension with .zip.

    Rename File
  3. Windows will show a warning asking you to confirm the change of file extension. Click Yes.

    Create Zip File
  4. Right-click the ZIP file and choose Extract All (in some archiving tools, such as WinRAR, the option is called Extract files).

    Extract Data
  5. Locate and open the folder containing the extracted data, and then open the word folder.

    Locate Word Folder
  6. In it you will see a few folders and XML files. The media folder contains the extracted images. For the extracted text, open the document.xml file with Notepad or XML Notepad.

    Locate Media Folder

Here’s what you will find in the media folder.

Media Folder
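Renaming and unzipping by hand works well for a single file. As a minimal sketch, assuming Python 3 and only its standard library, the same steps can be scripted: the `zipfile` module opens a DOCX, PPTX or XLSX file directly, so the rename to .zip is not needed. The function name `extract_media` is illustrative, and the media folder paths are the word, ppt and xl folders described above.

```python
import zipfile
from pathlib import Path

# Folder that holds embedded images inside each Office Open XML package.
MEDIA_PREFIXES = {
    ".docx": "word/media/",
    ".pptx": "ppt/media/",
    ".xlsx": "xl/media/",
}


def extract_media(office_file, out_dir):
    """Copy every file under the package's media folder into out_dir.

    The Office file is opened directly as a ZIP archive, so it does not
    have to be renamed first. Returns the list of paths that were written.
    """
    office_file = Path(office_file)
    prefix = MEDIA_PREFIXES.get(office_file.suffix.lower())
    if prefix is None:
        raise ValueError(f"unsupported file type: {office_file.suffix}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(office_file) as zf:
        for name in zf.namelist():
            # Skip directory entries and anything outside the media folder.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            target = out_dir / Path(name).name
            target.write_bytes(zf.read(name))
            written.append(target)
    return written
```

For example, `extract_media("Sample File.docx", "Sample File images")` would copy the same files you see in the media folder above into a new Sample File images folder, leaving the original document untouched.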

How to Extract Images from a Single DOC, PPT or XLS File

If you want to extract images from MS Office files in the older formats, the method above won't work, because DOC, PPT and XLS files are not ZIP archives. Instead, you need a free tool called Office Image Extraction Wizard. The tool works with MS Office files as far back as 2012, and it can process one or several MS Office files in one go.

  1. Download and install Office Image Extraction Wizard.

    Welcome Screen
  2. Choose the document you want to extract images from (in this example, a file named Ch1.doc) and select the output folder. You can have a folder created to hold all the output images by ticking the Create a folder here option. Once you are done, click Next.

    Input File & Output Folder
  3. Click Start to begin the process.

    Ready to Start
  4. Once the image extraction process is finished, click the Click here to open destination folder link to open the output folder.

    Finish Image Extraction
  5. As you can see below, the program has created a Ch1 folder.

    Output Folder
  6. Inside the folder are the extracted images.

    Extracted Images in Output Folder
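This post doesn't describe how Office Image Extraction Wizard works internally, so the sketch below uses a different and much cruder technique: signature carving. It scans the raw bytes of a DOC, PPT or XLS file for PNG and JPEG start markers and cuts out each image. This is an assumption-laden heuristic, not the wizard's method: images stored in other formats or in compressed form are missed, and because the older formats are OLE compound files whose internal streams can be split across non-contiguous sectors, some carved images may come out truncated or corrupt. The names `carve_images` and `carve_file` are hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_START = b"\xff\xd8\xff"  # SOI marker plus the 0xFF of the next marker
JPEG_END = b"\xff\xd9"        # EOI (end of image) marker


def _png_end(data, start):
    """Walk the PNG chunks that begin at start; return the offset just past IEND.

    Returns None if the chunk structure runs off the end of the data.
    """
    pos = start + len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if pos > len(data):
            return None
        if chunk_type == b"IEND":
            return pos
    return None


def carve_images(data):
    """Return (extension, image_bytes) pairs for PNG and JPEG data found in data."""
    found = []
    pos = 0
    while True:
        png = data.find(PNG_SIGNATURE, pos)
        jpg = data.find(JPEG_START, pos)
        hits = [i for i in (png, jpg) if i != -1]
        if not hits:
            return found
        start = min(hits)
        if start == png:
            ext, end = "png", _png_end(data, start)
        else:
            # Heuristic: stop at the first end-of-image marker, so a JPEG that
            # embeds a thumbnail may be cut short.
            marker = data.find(JPEG_END, start + len(JPEG_START))
            ext, end = "jpg", (None if marker == -1 else marker + len(JPEG_END))
        if end is None:
            pos = start + 1  # false positive; keep scanning after it
            continue
        found.append((ext, data[start:end]))
        pos = end


def carve_file(path, out_dir):
    """Write every carved image from path into out_dir as image1.png, image2.jpg, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    images = carve_images(Path(path).read_bytes())
    for n, (ext, blob) in enumerate(images, start=1):
        target = out_dir / f"image{n}.{ext}"
        target.write_bytes(blob)
        written.append(target)
    return written
```

Walking the PNG chunk lengths, rather than searching for the text IEND, means a stray "IEND" inside image data can't end a PNG early. The JPEG side has no such length field to follow, which is why it is the weaker half of the heuristic.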

How to Extract Images from Multiple DOC, PPT or XLS Files

  1. To extract images from multiple DOC, PPT or XLS files, tick the Batch mode option at the bottom left.

    Input File & Output Folder
  2. Click Add Files and then select the files you want to extract images from. Hold down the Ctrl key to select multiple files in one go. After selecting the files, click Next.

    Batch Mode
  3. Click Start.

    Ready to Start
  4. When the process is complete, locate and open the output folder. Here you will see one folder per input file, named after the original file (two in this example). Open these folders to see the images extracted from each original MS Office file.

    Locate Multiple Output Folders
    Open Folder 1
    Open Folder 2
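The wizard's batch mode is aimed at the older formats. For a folder full of DOCX, PPTX or XLSX files, the ZIP approach from the first section can be scripted in the same one-folder-per-file way. The sketch below assumes Python 3; the function name `batch_extract` and the `<name>_<extension>` output folder naming (used so that, say, Report.docx and Report.pptx don't share a folder) are hypothetical choices, not something the wizard does.

```python
import zipfile
from pathlib import Path

# Folder that holds embedded images inside each Office Open XML package.
MEDIA_PREFIXES = {".docx": "word/media/", ".pptx": "ppt/media/", ".xlsx": "xl/media/"}


def batch_extract(in_dir, out_dir):
    """Extract the media of every DOCX/PPTX/XLSX file in in_dir.

    Each document gets its own sub-folder of out_dir, named after the file.
    Returns a dict mapping each document name to the number of images written.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    counts = {}
    for doc in sorted(in_dir.iterdir()):
        prefix = MEDIA_PREFIXES.get(doc.suffix.lower())
        # Skip other file types, sub-folders, and the "~$" lock files that
        # Office creates while a document is open (they are not ZIP archives).
        if prefix is None or not doc.is_file() or doc.name.startswith("~$"):
            continue
        target_dir = out_dir / f"{doc.stem}_{doc.suffix[1:].lower()}"
        target_dir.mkdir(parents=True, exist_ok=True)
        count = 0
        with zipfile.ZipFile(doc) as zf:
            for name in zf.namelist():
                if name.startswith(prefix) and not name.endswith("/"):
                    (target_dir / Path(name).name).write_bytes(zf.read(name))
                    count += 1
        counts[doc.name] = count
    return counts
```

Returning the per-file counts makes it easy to spot documents that contained no images at all.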

How to Extract Images with the “Save as Web Page” Method

There is another method that works with both the newer and the older MS Office formats.

  1. Open the Word or Excel file (in either the old or the new format) and click File > Save As > Computer > Browse, then save the file as a Web Page.

    Save As File
    Save as Webpage
  2. Open the folder where you saved the web page. Next to the HTML file you will find a folder with the same name as the web page; it contains all the images extracted from the file.

    Html & Image Folder
    Extracted Images
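Besides the images, that companion folder also holds support files (XML and theme files, for example). As a minimal sketch, assuming Python 3 and assuming the folder is named after the web page with _files appended (the naming English-language Office versions typically use), a short script can copy out just the image files. The function name `collect_web_page_images` and the list of image extensions are illustrative assumptions.

```python
import shutil
from pathlib import Path

# File types treated as images; everything else in the folder is skipped.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".emf", ".wmf"}


def collect_web_page_images(html_file, out_dir):
    """Copy the images from a saved web page's companion folder into out_dir.

    Assumes the companion folder is named "<page name>_files" and sits next
    to the HTML file. Returns the list of copied paths.
    """
    html_file = Path(html_file)
    files_dir = html_file.with_name(html_file.stem + "_files")
    if not files_dir.is_dir():
        raise FileNotFoundError(f"companion folder not found: {files_dir}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(files_dir.iterdir()):
        if item.is_file() and item.suffix.lower() in IMAGE_EXTENSIONS:
            copied.append(Path(shutil.copy2(item, out_dir / item.name)))
    return copied
```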

How to Extract Plain Text Instead of XML

  1. Open the DOCX file and click File > Save As > Computer > Browse. Save the file as Plain Text (for XLSX files, save it as Text (Tab delimited)).

    Save As File
    Save As Text File
  2. Locate and open the text file you just saved. It contains only the text from your original file, without any formatting.

    Locating Text File
    Opening Extracted Text
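Saving as Plain Text needs Word itself. If you only have the document.xml file from the first method, a short script can pull the text out of the XML instead. This is a minimal sketch, assuming Python 3's standard library: it reads word/document.xml straight from the DOCX package and keeps only the text of each run, writing one line per paragraph. The function name `docx_plain_text` is hypothetical, and the output will not match Word's Plain Text export exactly; for instance, text inside text boxes may appear more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p>, <w:r> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(docx_path):
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    lines = []
    for para in root.iter(W_NS + "p"):
        parts = []
        # Only look inside runs, so tab-stop definitions in the paragraph
        # properties (<w:pPr><w:tabs><w:tab .../>) are not mistaken for tabs.
        for run in para.iter(W_NS + "r"):
            for child in run:
                if child.tag == W_NS + "t":
                    parts.append(child.text or "")
                elif child.tag == W_NS + "tab":
                    parts.append("\t")
                elif child.tag in (W_NS + "br", W_NS + "cr"):
                    parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)
```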

If you know any other method or tool to extract images from MS Office files, please mention it in the comments section.

Source: Hongkiat

