Using the WordPress Exporter

The built-in WordPress Exporter utility is a great tool for retrieving some or all of my blog content. Finally, I can describe a process that is potentially useful to others.

First of all, without getting too technical, the WordPress Exporter creates an RSS feed of the blog: an XML file in a format WordPress calls WXR (WordPress eXtended RSS). The developers chose XML because it simplifies the monumental task of describing a chunk of content, namely a blog post or blog page, by wrapping each piece (title, date, body) in its own labeled tag. If you use an RSS feed reader, you will appreciate the care taken to preserve the blog post layout.

I decided to start small. I set up the exporter to retrieve a single category of posts. I picked a category that had exactly one post. If I could manage the extraction of a single post, I figured that scaling up would be a simple matter of repetition.

The WordPress Exporter has three main choices for what to include in the export file:

  • All Content
  • Posts
  • Pages

Each choice reveals a second set of options that can be used to limit the amount of content exported. When I chose Posts, I saw these options:

WordPress Export Tool
WordPress Export offers many options

The ability to fine-tune the export makes this a great tool for a Blog to eBook project. With a bit of advance planning, I expect to save a lot of time by not having to sift through irrelevant posts.

When I clicked the Download Export File button, I received a tiny 8 KB file containing my one post. The next step is to extract that post and any other relevant information. Since my goal is to make this process useful to others, I will try to be as flexible as possible, probably grabbing more data than necessary. Stay tuned.
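Before deciding what to keep, it helps to see exactly what information the export file holds. The Python sketch below lists every element that appears inside each item of the file. WordPress writes one item element per exported entry. The sketch uses only the standard library's xml.etree.ElementTree and assumes the usual RSS layout: an rss root element that contains a channel. The function name summarize_export is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_export(source):
    """Count the <item> elements in a WordPress export and the child tags they use.

    `source` is a file name or a binary file object. Namespaced tags come back
    in ElementTree's "{namespace-uri}local-name" form.
    """
    root = ET.parse(source).getroot()  # the <rss> element
    items = root.findall("channel/item")
    tag_counts = Counter()
    for item in items:
        # Count each distinct child tag once per item
        tag_counts.update({child.tag for child in item})
    return len(items), tag_counts
```

Calling summarize_export("export.xml") (the file name is only an example) returns the number of items and a count of each tag. Namespaced tags such as content:encoded show up with their full namespace URI. That is a handy way to discover which namespaces your own export actually uses.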

Fieldnotes

Original Method: WordPress Backup Files

Convert WordPress Blog to e-Book
Backup WordPress Posts
Extracting Posts from WordPress Backup Files
Working with Extracted WordPress Blog Posts
Accessing the WordPress Database Posts Table
(While writing the last post in this list, I discovered a better way...)

How-to

Getting Started

New Method: WordPress Export Tool

Using the WordPress Exporter

Accessing the WordPress Database Posts Table

When I first wrote about converting some of my blog posts to an e-book, I hadn’t planned on repeating the process. Since I’m including a tutorial this time around, I will find out if my biggest concern is valid:

The most important things I had to know were the order and type of data used to store a blog post. This requirement is the main drawback to working directly with a file. If a future version of WordPress changes the database structure, my parser would have to be updated, as well.

from Extracting Posts from WordPress Backup Files

Although I used the iThemes Security plugin for the tutorial, I installed WP DB Backup so that I could compare its backup file with the one I created for my e-book.

WP DB Backup plugin
Visit WP DB Backup plugin

Regrouping

I quickly discovered two things: the table structure had indeed changed, and WP DB Backup and iThemes Security both extracted the same fields from the posts table. This meant that I could not use my old parsing pattern. On the other hand, it also meant that I stood a chance of building a pattern that would work regardless of the plugin used to create the backup file.

Clearly, I needed to standardize my extraction procedure; otherwise, this project would be of no use to anyone else. For the morbidly curious, here is a snapshot of the two table structures:

Changed Tables
Don’t count on table columns staying the same!

As I was putting this together, I realized that I didn’t have a clue about why the structures were different. Rather than speculate, I hunted for the answer and found it deep within the WordPress Codex. If you examine the Changelog for the Post Table, you’ll notice that the category field was dropped in version 2.8:

WordPress Codex
The Posts table changes frequently

It is one thing to account for table structure changes. It is quite another thing to map the changes to a complex pattern. In fact, doing so might not be the best option. I came across an interesting post about importing posts and pages from one website to another. This gave me a new direction to explore.

Using the WordPress Exporter

I played around with the export tool provided by WordPress. Its output turns out to be a simple XML file! Of course, simple is relative. The exporter has three options: all content, posts or pages. The good news is that if you don’t want to bother with adding pages to your e-book, you can use the posts option.

At this point, I was ready to abandon the old pattern in favor of parsing the XML file. After all, the XML file is much cleaner than the raw data from the database. I would need to extract the title, publication date and content. I found the specific XML tags that identify these elements:

  • <title> and </title>
  • <pubDate> and </pubDate>
  • <content:encoded><![CDATA[ and ]]></content:encoded>
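An XML parser can pull these three fields out directly, so there is no need to search for the tags as plain text. Here is a sketch in Python using the standard library's ElementTree. The function name extract_posts is hypothetical. The sketch assumes that the content prefix is bound to the RSS content-module namespace, http://purl.org/rss/1.0/modules/content/. That is the usual binding, but it is worth checking the xmlns:content declaration at the top of your own file.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Prefix-to-URI map for the RSS content module, which defines <content:encoded>
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def extract_posts(source):
    """Return a list of (title, published, html) tuples, one per <item>.

    `source` is a file name or a binary file object.
    """
    root = ET.parse(source).getroot()
    posts = []
    for item in root.iterfind("channel/item"):
        title = item.findtext("title", default="")
        raw_date = item.findtext("pubDate", default="")
        try:
            # pubDate uses the RFC 822 style, e.g. "Thu, 28 Aug 2014 10:00:00 +0000"
            published = parsedate_to_datetime(raw_date)
        except (TypeError, ValueError):
            published = None  # missing or unparseable date
        # ElementTree has already unwrapped the CDATA section, so this is the HTML body
        body = item.findtext("content:encoded", default="", namespaces=NS)
        posts.append((title, published, body))
    return posts
```

The parser unwraps the CDATA section itself, so the <![CDATA[ and ]]> markers never have to be matched by hand. The parser also finds each field by name, so a new or reordered element in a future export would not break the extraction the way it breaks a fixed text pattern.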

That is the topic of the next post.

How to Backup All Your WordPress Posts

Project update

Aug. 28, 2014:

The WordPress Exporter provides a better way to extract blog posts. I will use it for the rest of this project.


As part of the Blog-to-eBook Project, I will present a step-by-step procedure for acquiring your blog posts and pages. You will also gain the benefit of having a backup plan for your WordPress blog.

Plugins for backing up WordPress have many features. At the most basic, a good backup plugin will export your posts from the WordPress database on your web host. Once the posts have been extracted, you can save, download, email or copy them to a cloud service like Dropbox. As long as you can access the backup files and copy them to your local hard drive, you can use whatever plugin and storage scheme you’d like. (For the sake of clarity, I use the term posts only. WordPress considers pages and attachments to be posts as well, and they all get backed up. Be aware that only the links to attachments are backed up in the posts table. Depending on your chosen plugin, the actual attachments may be added to the backup file.)

For this tutorial, I used iThemes Security, a great plugin for securing and backing up WordPress installations. I set the backups to be emailed to me, so that I can easily download the attachments. (Plus, I don’t want to use up server space.)

To start, you have to install your chosen plugin. Once you have activated it, find the setting that allows you to configure the backups.

How to Backup WordPress Posts
Visit iThemes Security plugin page

Weirdly, the iThemes Security Backups tab emphasizes the Create Database Backup button when, in fact, your first step is to click the Adjust Backup Settings link.

iThemes Security backup tab
Do not click the button…yet

On the massive settings tab, the backup settings are about midway down. You have just three Backup Methods from which to choose. I selected Email Only. If you choose Save Locally Only, you’ll have to transfer the file via FTP. This might actually be necessary if your email chokes on huge attachments.

You should check the box for Zip Database Backups. Compressing the backup greatly reduces the size of the file you have to download (see the final image).

iThemes backup settings

Finally, set up scheduled backups. It doesn’t matter for this project, but if you are blogging actively, you may as well reap the benefits of current backups.

Enable scheduled backups
You may as well enable scheduled backups

Back on the main iThemes Security Backups tab, click the Create Database Backup button to generate a current backup. Get that file onto your hard drive so that you can begin the next step.

Now you can click the button


Here is the downloaded attachment. I opened it in 7-zip to show you the compression: the zip file is just over 20% of the original file’s size! (375 KB attachment vs. 1.7 MB original)

7-zip Info screen
Nearly 80% compression ratio
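Here is the arithmetic behind that caption. 375 KB ÷ 1.7 MB (about 1,700 KB) ≈ 0.22, so the archive keeps roughly 22% of the original size and saves about 78%. If you would rather check the numbers without 7-zip, Python's standard zipfile module records both sizes for every file in an archive. A minimal sketch; the function name compression_report is hypothetical:

```python
import zipfile


def compression_report(zip_path):
    """Return (name, original_bytes, compressed_bytes, fraction_saved) for each archive member."""
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            original, packed = info.file_size, info.compress_size
            # Fraction of space saved: 0.78 means the member shrank by 78%
            saved = 1 - packed / original if original else 0.0
            rows.append((info.filename, original, packed, saved))
    return rows
```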

Extracting Posts from WordPress Backup Files

The parsing and extracting portion of my Blog to eBook project was a fun, one-time exercise in reading the WordPress database backup file. Even though the database can be read using a powerful tool like phpMyAdmin, I took advantage of the fact that a WordPress database backup can be saved as a plain text file. Besides, I wasn’t about to tinker around with the original database!

The key to parsing and extracting my blog posts was deciphering the backup file. Every MySQL database can export some or all of its records into a single file. Records, such as details about blog posts and pages, are stored in tables. Each table has a structure, basically a list that describes how to store each detail. The most important things I had to know were the order and type of data used to store a blog post.

This requirement is the main drawback to working directly with a file. If a future version of WordPress changes the database structure, my parser would have to be updated, as well. (In the how-to portion of this project, I’ll discuss ways to mitigate this.)
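One way to reduce the dependence on column order is to read the column names from the CREATE TABLE statement in the backup. Each row's values can then be paired with those names. The Python sketch below rests on a few assumptions:

- It expects a mysqldump-style file, where each column definition starts on its own line with a back-quoted name.
- It handles one parenthesized row at a time.
- It leaves out splitting a multi-row INSERT into rows, doubled-quote escaping, and some MySQL escape sequences.

All function names are hypothetical.

```python
import re

# A column definition line in a CREATE TABLE statement, e.g. "  `post_title` text NOT NULL,"
COLUMN_RE = re.compile(r"^\s*`(\w+)`\s", re.MULTILINE)
# Common MySQL backslash escapes; any other escaped character stands for itself
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def column_names(create_table_sql):
    """List column names in table order; index lines (PRIMARY KEY, KEY ...) are skipped."""
    return COLUMN_RE.findall(create_table_sql)


def _finish(chars, quoted):
    text = "".join(chars)
    if quoted:
        return text
    return None if text.upper() == "NULL" else text


def split_values(row):
    """Split one "(1,'a, b',NULL)" tuple into Python values (strings or None)."""
    inner = row.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    values, chars = [], []
    in_quote = quoted = False
    i = 0
    while i < len(inner):
        ch = inner[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(inner):
                nxt = inner[i + 1]
                chars.append(ESCAPES.get(nxt, nxt))
                i += 1  # skip the escaped character
            elif ch == "'":
                in_quote = False
            else:
                chars.append(ch)
        elif ch == "'":
            in_quote = quoted = True
        elif ch == ",":
            values.append(_finish(chars, quoted))  # field boundary outside quotes
            chars, quoted = [], False
        elif not ch.isspace():
            chars.append(ch)  # unquoted value such as a number or NULL
        i += 1
    values.append(_finish(chars, quoted))
    return values


def row_to_dict(columns, row):
    """Pair each value with its column name instead of relying on position."""
    values = split_values(row)
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))
```

With each row stored as a dictionary, code can ask for row["post_title"] instead of relying on a fixed position. A dropped or added column then only matters if it is one you actually use.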

RegexBuddy

WordPress stores posts, pages and attachments in the same database table. I used a program called RegexBuddy to build two pattern-matching instructions. The first pattern matched all three types of entries. Attachments include images, video, spreadsheets and other documents. Since those were probably not going to be in my ebook, I used a second pattern to match just the attachments.

By running both patterns, I was able to extract the blog posts from the backup file. I pasted the extracted information into an Excel spreadsheet. Next, I compared the list of attachments to the list of everything and deleted the spreadsheet rows that contained attachments. Then, I sorted the records by date and went through each row, cherry-picking the posts that I wanted to include in my ebook. The last thing I had to do was to clean up the actual posts, by removing HTML tags, web addresses and embedded scripts.
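The spreadsheet steps described above can be expressed in a few lines of Python: delete every row that also appears in the attachment list, then sort by date. This is a sketch, not the author's actual workflow. It assumes each record is a dictionary with ID and post_date keys named after the wp_posts columns. The function name posts_by_date is hypothetical.

```python
def posts_by_date(everything, attachments):
    """Drop every row whose ID also appears in the attachment list, then sort oldest first.

    Rows are dictionaries keyed by wp_posts column names (an assumption of this sketch).
    """
    attachment_ids = {row["ID"] for row in attachments}
    posts = [row for row in everything if row["ID"] not in attachment_ids]
    # MySQL DATETIME text ("YYYY-MM-DD HH:MM:SS") sorts chronologically as a plain string
    return sorted(posts, key=lambda row: row["post_date"])
```

Matching on ID takes the place of comparing the two lists row by row. Every row in wp_posts also carries a post_type column (post, page, attachment and so on), so filtering on that column would give the same result in a single pass.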

I’ll explain the cleanup process next time. Here is a summary, in pictures. (To display the second pattern, which gets rid of attachments, I chose Excel rather than RegexBuddy; the raw pattern won’t mean anything unless you understand regular expressions.)

Parsing and extracting blog posts from a WordPress backup

Working with Extracted WordPress Blog Posts

The last thing I had to do was to clean up the actual posts, by removing HTML tags, web addresses and embedded scripts. I used Retrievem, software that I developed for just such tasks.

Gory Blog Post
Extracted Post BEFORE Cleanup

The raw text extracted from the WordPress database is practically unreadable. Line break markers, hyperlinks and HTML formatting tags had to be removed or replaced with their visual equivalents.

Not-so-Gory Blog Post
Extracted Post AFTER Cleanup
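The internals of Retrievem are not described here, so the following is only a generic stand-in. It is a regular-expression sketch in Python of the same cleanup steps:

- drop embedded scripts;
- turn line-break markers and paragraph ends into real line breaks;
- strip the remaining HTML tags;
- remove web addresses;
- decode entities such as &amp;.

The function name clean_post is hypothetical. The sketch assumes the line-break markers are the literal \r\n escapes written into SQL backup files. Regular expressions are fine for a one-off tidy-up like this, but they are not a full HTML parser and can trip over unusual markup.

```python
import html
import re

SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)
BREAK_RE = re.compile(r"<br\s*/?>|</p>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")


def clean_post(raw):
    """Turn a raw post body into readable plain text (a simplified, regex-based cleanup)."""
    # Literal "\r\n" and "\n" escapes, as written into a SQL backup file
    text = raw.replace("\\r\\n", "\n").replace("\\n", "\n")
    text = SCRIPT_RE.sub("", text)          # drop embedded scripts entirely
    text = BREAK_RE.sub("\n", text)         # <br> and </p> become real line breaks
    text = TAG_RE.sub("", text)             # strip all remaining HTML tags
    text = URL_RE.sub("", text)             # remove bare web addresses
    text = html.unescape(text)              # &amp; -> &, &#8217; -> right single quote
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse runs of spaces left by removals
    text = re.sub(r"[ \t]+\n", "\n", text)  # trim trailing spaces on each line
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line between paragraphs
    return text.strip()
```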

As part of the cleanup, I added some of my own markers: tags in brackets that identified each post. A combination of Word documents and spreadsheet references simplified the final task of choosing the posts I wanted in my e-book.
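The exact marker format is not shown, so the Python sketch below invents one purely for illustration. Each post gets a first line such as [POST 0001 | 2012-03-04 | First post]. The format and the function names add_marker and list_markers are hypothetical. The idea is that a consistent marker can be written with each post and found again later, for example to rebuild a spreadsheet index from a combined document.

```python
import re

# Hypothetical marker format: [POST 0001 | 2012-03-04 | Title]
MARKER_RE = re.compile(r"^\[POST (\d+) \| (\d{4}-\d{2}-\d{2}) \| (.*)\]$", re.MULTILINE)


def add_marker(number, date, title, body):
    """Prefix a cleaned post with a one-line bracketed marker."""
    return f"[POST {number:04d} | {date} | {title}]\n{body}"


def list_markers(document):
    """Find every marker in a combined document as (number, date, title) tuples."""
    return [(int(n), d, t) for n, d, t in MARKER_RE.findall(document)]
```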

Al Gore Blog Post
Extracted Post in e-Book

One Way to Convert Your WordPress Blog to an e-Book

My blog to ebook project is going to be an exercise in parsing and extracting. I briefly considered using Anthologize, a WordPress plugin that many people seem to love. Personally, I want total control over the entire process. So, I’ll begin by extracting all posts, sorting them and picking out the ones that will be added to the ebooks.

The WordPress database tables that store your posts, pages, comments and other data can be exported into a simple text file. In fact, that’s what happens when you perform a backup, using a plugin such as WordPress Database Backup (WPDB) or iThemes Security.

I am taking full advantage of this. I instructed WPDB to email my backups to my Gmail account. I can save any one of them to my hard drive and unzip it into a folder. I use 7-zip, a free, open-source program that creates and manages archive files. After a bit of parsing and extracting, I end up with a spreadsheet of post titles, dates and actual text.
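If you would rather script the unzip step than use 7-zip, Python's standard zipfile module can do it. A minimal sketch, assuming the backup arrives as a .zip archive that contains a .sql file; a .sql.gz file would need the gzip module instead. The function name unzip_backup is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_backup(zip_path, target_dir):
    """Extract a downloaded backup archive and return the paths of the .sql files inside."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Search subfolders too, in case the archive stores the dump inside a directory
    return sorted(target.rglob("*.sql"))
```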

I’ll explain the parsing and extracting next time. For now, here is a montage of the action:

Blog to eBook
From Database to Spreadsheet to Word Document

The ParserMonster Project

I have installed the ParserMonster Project wiki on this site to document the features of the new ParserMonster Framework. There is not much on it, at the moment, so you should bookmark it or subscribe to the blog feed if you want to keep up with it as it grows.

DokuWiki
Check out DokuWiki.org

I decided to use DokuWiki, mostly because it doesn’t rely on a database, but also because it reminds me of TiddlyWiki, which I used to create the older version’s documentation.

Updates

As with all of the projects you will find on Morpho Designs, I’ll be sharing updates on how I actually use the DokuWiki software. I think that one of the main things I will be doing is figuring out how to automate the documentation process.

Software documentation needs to be consistent. While DokuWiki provides a consistent interface, I need to ensure that the content is presented in a uniform manner. That’s why I will probably make a bunch of boilerplate snippets. Stay tuned!
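As an illustration of what such a boilerplate snippet might look like, here is a Python sketch that fills a fixed page template written in DokuWiki markup. In that markup, ====== marks a top-level heading, ===== marks a second-level heading, and <code> wraps a code sample. The section names (Syntax, Example) and the function render_page are hypothetical, not part of the ParserMonster wiki.

```python
from string import Template

# Hypothetical page skeleton in DokuWiki markup: one top-level heading,
# a summary line, then fixed second-level sections with code samples
PAGE = Template("""====== $name ======

$summary

===== Syntax =====

<code>
$syntax
</code>

===== Example =====

<code>
$example
</code>
""")


def render_page(name, summary, syntax, example):
    """Fill the boilerplate so every page has the same sections in the same order."""
    return PAGE.substitute(name=name, summary=summary, syntax=syntax, example=example)
```

Because every page comes from the same template, each feature page gets the same sections in the same order. That is the kind of uniform presentation the wiki needs.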

5 Reasons to Do Whatever You Want With Your Blog

Five simple truths may help you avoid analysis paralysis when it comes to radical changes on your blog:
Suffolk Downs 1975 Wooden Nickel
Creative Commons License Sean via Compfight

1. Be Yourself – Nobody Does it Better

You can’t please everybody. Generally, you can please yourself. If you take a stand, adopt a stance or draw a line, some people will stand with you, while others will try to knock you down. It’s your sandbox – don’t let anyone kick sand in your face.

2. Life is Short – Attention is Shorter

Your readers do not spend all day thinking about you. So, why do you care, really, what they think? It’s okay to ask for feedback, but you have to wear the outfit. The flip side is that you don’t have to ask for permission to let it all hang out.

3. Render Unto Google …

If you write for your readers, Google will get the message. SEO is important, but don’t let it dictate your creativity.

4. You’re Not Making an Omelette, After All

If you have some rotten eggs on your blog, what else can you do but throw them away? If you have some Grade A content, you don’t want to break it. Either way, you shouldn’t be walking on eggshells.

5. Even a Wooden Nickel Has Two Sides

Just as you can’t please everybody, everybody can’t please you. So, make a decision, already!

  1. 5 Reasons to Delete Outdated Blog Posts
  2. 5 Reasons Why Deleting Your Blog Posts Is Stupid

Heads or tails?
DAVID MELCHOR DIAZ via Compfight