After extracting your blog posts and pages from the WordPress database, you have a file that uses XML formatting to describe every element exported. Needless to say, you need to convert this file to a simple text format, so that you can copy and paste desired chapters or sections into your working draft.
If you are technically proficient, or infinitely patient, you can use a general-purpose XML editor to view and retrieve your blog content. For Windows users, XML Notepad 2007 provides a familiar view of your blog content: a tree structure consisting of nodes and sub-nodes. Working with this editor directly, you can retrieve each post that you want to add to your eBook. Linux users can probably find support for XML editing with the Emacs editor. Here is one (very technical) example: How to Use Emacs for XML Editing.
If you take the time to create a special stylesheet, you can create a nice, readable list of posts, complete with links and images. For a great example of this, check out the efforts of Sacha Chua, who created an XSL stylesheet to build one version of her blog posts from 2008. Do note, however, that she had to create a second stylesheet for the main list of posts.
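If you are comfortable with a little scripting, a third do-it-yourself option is a short Python script. The sketch below is only an illustration, not how any of the tools in this post work. It assumes your export stores each post in an `item` element, with the body in a `content:encoded` element under the namespace shown. The function name `posts_as_text` and the sample data are made up for this example.

```python
import html
import re
import xml.etree.ElementTree as ET

# Namespace that WordPress export files commonly use for post bodies
# (an assumption; check the top of your own file).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def posts_as_text(xml_source):
    """Return a list of (title, plain_text_body) pairs from export XML."""
    root = ET.fromstring(xml_source)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        body = item.findtext(CONTENT_NS + "encoded", default="")
        # Crude tag stripping: fine for skimming, not a full HTML parser.
        body = re.sub(r"<[^>]+>", "", body)
        posts.append((title, html.unescape(body).strip()))
    return posts


# Tiny made-up sample standing in for a real export file.
SAMPLE = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello World</title>
      <content:encoded><![CDATA[<p>My first &amp; best post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

for title, body in posts_as_text(SAMPLE):
    print(title)
    print(body)
```

To try it on a real export, read the file's contents into a string and pass that to `posts_as_text` in place of `SAMPLE`.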
You can simplify the retrieval process by using a specialized program called Blogmogrifier. It’s a Windows-based desktop application, so your computer will need to be running Windows Vista, 7 or 8.
Here is a walk-through that shows you how to import one of your downloaded XML files, select categories and tags and output a text file with matching posts and pages. The file shown in the steps is from Sharon Hurley Hall‘s Get Paid to Write Online blog.
To avoid confusion, please remember the following:
- Blogmogrifier is just one tool inside of a larger program called Retrievem
- Retrievem is just one of the many programs developed using a framework called ParserMonster
- For simplicity, all ParserMonster programs display the ParserMonster shield logo, but use their own names
- This means that you will be downloading and launching Retrievem.exe to get to Blogmogrifier
This version is currently in beta. It is a free download. You must run it on a Windows PC that uses XP, Vista, Windows 7 or Windows 8.
Click image to download Blogmogrifier
The link takes you to Copy.com, where you can simply click the Save button to bring up the dialog box shown below:
Be sure to click download it to your computer!
Click "download it to your computer." The downloaded file is named Retrievem 3.exe. Put that file into a folder of your choice, as long as it is writable. Windows 7 and 8 do not allow applications to write into the \Program Files (x86) folder. However, your \Documents folder is acceptable.
Blogmogrifier is inside of Retrievem.exe
Step Two: Start Blogmogrifier
Double-click the Retrievem 3.exe icon. Retrievem is a portable application that does not require installation. You should see a splash screen for a few seconds before the Retrievem Dashboard appears.
The Retrievem dashboard
In the task list window, you’ll see two or more task icons. Click the one that looks like a purple ray gun, marked 1. This will select Blogmogrifier as the active task.
You have two ways to “prep” Blogmogrifier. The first way is to use the dashboard to specify where the XML files are located. The second way is to do that after clicking the Run Task button. The choice is yours; however, the first method has the advantage of remembering your settings the next time you run the application. So, let’s set things up from the dashboard. (I’ll briefly mention how to accomplish the same thing without using the dashboard.)
Step Three: Drag and drop XML Folder onto Dashboard
Using Windows Explorer, locate the folder where the WordPress XML files are located. Drag the entire folder onto the box marked 2. It is important that you drag folders only. Files will not be detected if dropped directly, even though they appear on the dashboard.
If the folder has any files, they will be listed in the box marked 3. In addition, a list of file types appears in the small box marked 4. Make sure these areas show the file(s) you want to import.
If you dragged the wrong folder, just click the red “X” to the right of the Paste Clipboard button and try again.
Step Four: Drag Output Folder onto Dashboard
By default, tasks send their output to the same folder where Retrievem.exe is stored. If you want to use a different folder, drag it onto the long, narrow box marked 5. Alternatively, you can use the Browse … button. However, the file dialog’s default behavior won’t let you choose a folder above your \Documents folder.
Step Four: Run the Blogmogrifier Task
Click the Run Task button, marked 6. This saves your folder choices and displays the Blogmogrifier form.
Blogmogrifier XML list
If you selected a valid folder in Step Three, this is the screen you will see when you first open Blogmogrifier. Click on the file you wish to import into Blogmogrifier. Then click the Import tab.
If you skipped Step Three or didn’t pick a valid folder, you’ll be forced to use drag and drop to select a single file (not folders!), as shown below:
Drag and drop a single file
In this case, after you drop a file with the .XML extension, Blogmogrifier automatically switches to the Import tab. By the way, the Drag and Drop tab is not smart enough to tell the difference between a WordPress XML file and any other XML file. It merely examines the file type and either switches the tabs or displays an error if an XML file was not dropped.
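If you would like to check a file yourself before dropping it, the difference between a WordPress export and "any other XML file" can be sketched in a few lines of Python. This is not Blogmogrifier's own logic. It assumes that a WordPress export has an `rss` root element and contains at least one element in a namespace beginning with `http://wordpress.org/export/`. The function name `looks_like_wordpress_export` is hypothetical.

```python
import xml.etree.ElementTree as ET


def looks_like_wordpress_export(xml_source):
    """Guess whether XML text is a WordPress export rather than generic XML."""
    try:
        root = ET.fromstring(xml_source)
    except ET.ParseError:
        return False  # not well-formed XML at all
    if root.tag != "rss":
        return False
    # Assumption: export files tag their WordPress-specific elements with a
    # namespace starting with this prefix; the trailing version number varies.
    marker = "{http://wordpress.org/export/"
    return any(el.tag.startswith(marker) for el in root.iter())


# Made-up samples: a minimal export-like file and an unrelated XML file.
WP_SAMPLE = (
    '<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">'
    "<channel><wp:wxr_version>1.2</wp:wxr_version></channel></rss>"
)
OTHER_SAMPLE = "<catalog><book>Not a blog</book></catalog>"

print(looks_like_wordpress_export(WP_SAMPLE))
print(looks_like_wordpress_export(OTHER_SAMPLE))
```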
Step Five: Import File
On the Import tab, you will see what, if anything, was imported successfully. Click on the different options to view how many of each post type were imported. (In Sharon’s case, she chose to export only posts from her blog.)
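For a rough cross-check of those counts outside the application, a small Python sketch can tally the post types in the same file. It assumes each `item` carries a `post_type` child element in the WordPress namespace, and it matches on the element's local name so that the namespace version does not matter. The function name `count_post_types` and the sample data are invented for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_post_types(xml_source):
    """Tally how many items of each post type appear in export XML."""
    root = ET.fromstring(xml_source)
    counts = Counter()
    for item in root.iter("item"):
        for child in item:
            # Match on the local name so any export-version namespace works.
            if child.tag.endswith("}post_type"):
                counts[(child.text or "").strip()] += 1
    return counts


# Made-up sample with two posts and one page.
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>A</title><wp:post_type>post</wp:post_type></item>
    <item><title>B</title><wp:post_type>post</wp:post_type></item>
    <item><title>About</title><wp:post_type>page</wp:post_type></item>
  </channel>
</rss>"""

for post_type, total in sorted(count_post_types(SAMPLE).items()):
    print(post_type, total)
```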
The hyperlinks don’t work inside the Import tab. If you wish to review a link, select it as you would in a text editor, copy it using CTRL-C and paste it into your web browser with CTRL-V.
Once you are satisfied that the content has been imported, click the Export tab.
The Import tab will be blank, with greyed-out controls, if you attempted to import a non-WordPress XML file. A terse message alerts you to the problem:
Blogmogrifier import error
Step Six: Include Categories and Tags
If your XML file contains information about categories and tags, they will be displayed on the Export tab. You can click either or both of the green Include All buttons to toggle the selection of keywords. Whenever you do this, the button will change to red and display Include None, as shown in the image below:
Blogmogrifier categories and tags
These two buttons and the four possible choices are handy for when you want all or most of the keywords in a list. Otherwise, you can just click on the desired checkboxes, like this example:
Let’s get the “Best of GPTWO!”
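The idea behind the checkboxes can be sketched in Python for anyone curious about what "matching posts" might mean. This is a guess at the general approach, not Blogmogrifier's actual code. It assumes each post lists its keywords as `category` elements whose `domain` attribute is `category` or `post_tag`, and that a post matches when it carries at least one selected keyword. The function name `select_titles` and the sample posts are hypothetical.

```python
import xml.etree.ElementTree as ET


def select_titles(xml_source, categories=(), tags=()):
    """Return titles of items carrying any selected category or tag."""
    wanted = {("category", c) for c in categories}
    wanted |= {("post_tag", t) for t in tags}
    root = ET.fromstring(xml_source)
    titles = []
    for item in root.iter("item"):
        keywords = {
            (kw.get("domain"), (kw.text or "").strip())
            for kw in item.findall("category")
        }
        # Assumption: one shared keyword is enough for a match.
        if keywords & wanted:
            titles.append(item.findtext("title", default="(untitled)"))
    return titles


# Made-up sample with three posts.
SAMPLE = """<rss version="2.0">
  <channel>
    <item>
      <title>Pitching Editors</title>
      <category domain="category">Freelancing</category>
      <category domain="post_tag">pitching</category>
    </item>
    <item>
      <title>Holiday Photos</title>
      <category domain="category">Personal</category>
    </item>
    <item>
      <title>Blogging Tools</title>
      <category domain="category">Blogging</category>
      <category domain="post_tag">tools</category>
    </item>
  </channel>
</rss>"""

print(select_titles(SAMPLE, categories=["Freelancing"], tags=["tools"]))
```

Requiring every selected keyword instead of any one of them would be a small change: compare with `wanted <= keywords` rather than `keywords & wanted`.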
Step Seven: Export Content to Text File!
Click the Export Text button. The bottom of the form displays the path to the output file. This is a plain .TXT file, so you can open it in your favorite editor. The screen shots below show the first few lines of the output, where you can see helpful information such as a list of the keywords. You will see that the source file is actually a temporary copy of your XML file. These temporary copies are safe to delete.
Sharon’s eBook (Raw Content)
A couple more of Sharon’s posts
Click the End Task tab to close the Blogmogrifier form and return to the Retrievem dashboard.
Tips and Limitations
If you select all categories and/or all tags, every single keyword will be included in the output’s header.
Be sure to rename your output files if you intend to export different sections separately.
The Help tab has an explanation for each tab, as well as a few links back to this blog.
Version 3.12 of Retrievem offers Blogmogrifier as a very basic tool for retrieving your content from the WordPress XML file. A few enhancements, tweaks and new tools are planned. These changes may occur rapidly, depending on your feedback.
Each version of Retrievem has an expiration date. This limits the number of outdated copies in operation. Version 3.12 does not have a simple way for you to get the next version. The next version should address that. You’ll be able to get that by visiting this post after March 31, 2015 and clicking any of the download links, including this one.
In order to keep the tutorial as concise as possible, I’ve ignored much of the dashboard. I’m building an online resource to explain ParserMonster in general. Learn more about the dashboard and the rest of The ParserMonster Project.