
1 Sep 26, 2006 01:09    

Note: This is not a tutorial on migrating a WordPress blog to B2evolution. This is about using URL rewriting (via mod_rewrite) to map WordPress fancy URLs to a B2evolution post or search results page. Read the back story for more details.

The Back Story
As I've mentioned in several other posts already, I recently lost my old WordPress blog installation due to a botched server upgrade. Having no good database backup available (clearly in violation of my own motto "Save early, save often"), I was at least able to use Google's site search to save cached copies of my content, which I've been slowly back-posting to my new B2evolution blog ever since.

Unfortunately, I used trackbacks a lot in my old blog, so there are many outdated links on other blogs that point to my now-invalid WordPress installation, not to mention all of the old links that still show up in some search engine results. I couldn't just install B2evo over the old WP folder, since the folder and URL structures were so different that redirects would have been required anyway. At first, my solution was a generic redirect to my new blog for any request to an old URL. This was a quick hack that got me up and running with B2evolution, but it meant that anyone looking for an old article was SOL.

The quickest way to redirect on a post-by-post basis would have been to just forward the old post slug to the new blog, but this posed two problems: 1) B2evolution post slugs are shorter than WP slugs, so a direct match isn't always possible; and 2) WP encodes slugs differently than B2evo does. Besides, since I was back-loading over time, many old slugs would simply have returned no results, so it would have been a fairly useless fix for a while.

After toying around with several configurations, I finally settled on a hybrid solution that was part mod_rewrite and part B2evolution hack. This allowed me to do a simple catch-all redirect but still do some complex pattern matching in order to serve up as close to the correct thing as possible.

This is still a work in progress, and any suggestions for improvement are welcome. The various parts are listed below with descriptions following each section.

The Code

Part 1 - the .htaccess file
My old WordPress folder had its own .htaccess file, which I backed up (just in case) before replacing its contents with this:

RewriteEngine On

RewriteRule /feed/rss2/ [NC,R=301,L]
RewriteRule /feed/atom/ [NC,R=301,L]
RewriteRule /feed/rdf/ [NC,R=301,L]
RewriteRule /feed/rss/ [NC,R=301,L]
RewriteRule /comments/feed/ [NC,R=301,L]
RewriteRule /feed/ [NC,R=301,L]

# request for root
RewriteRule ^(index\.php)?$ [NC,R=301,L]

# request for anything but an old resource file (useful while backposting)
# we'll figure the rest out on the receiving file's end.
RewriteRule ^index\.php(.+)$$1&smode=backport [NC,R=301,L]


  • Line 1 turns on mod_rewrite's rewriting engine for this folder

  • Lines 3 - 8 redirect old feed URLs to the new feed URLs. This way any reader agents referencing the old URL will still get new posts.

  • Line 11 redirects a generic request to the new blog URL (i.e. home page to home page)

  • Line 15 redirects any other request (except images, CSS files, etc.) to the blog, but includes the requested URL as a search parameter. It also adds the "smode" URL parameter for use in the receiving script (see next part)

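To make the catch-all rule on line 15 a bit more concrete, here is a rough PHP sketch of the string transformation it performs. This is only an illustration: the function name backport_redirect_url() and the http://example.com/blog/index.php target are hypothetical placeholders, and it assumes the old path is handed over in the "s" search parameter, as the description above suggests. In practice mod_rewrite does this work, not PHP.

```php
<?php
// Hypothetical helper: mimics the catch-all RewriteRule for illustration only.
// $newBase is a placeholder, not the real blog address.
function backport_redirect_url(string $oldPath, string $newBase): ?string
{
    // Same pattern as the rule: "index.php" followed by at least one character.
    if (!preg_match('#^index\.php(.+)$#i', $oldPath, $m)) {
        return null; // the rule would not fire (e.g. a bare "index.php")
    }
    // $1 in the rule becomes the search string (assumed to be the "s" parameter);
    // smode marks the request as a redirect from the old blog.
    return $newBase . '?s=' . $m[1] . '&smode=backport';
}

echo backport_redirect_url(
    'index.php/2004/08/21/my-1st-post-slug/',
    'http://example.com/blog/index.php'
), "\n";
// prints: http://example.com/blog/index.php?s=/2004/08/21/my-1st-post-slug/&smode=backport
```

Note that everything after "index.php" is passed along untouched, which is why the receiving script has to do the real pattern matching.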

    Part 2 - the receiving script
    This requires a hack to the /inc/ file in B2evolution. I usually try to avoid hacking source files directly, preferring to use the extended plugin architecture available. However, in this case it was just quicker to hack. If I end up porting it to a plugin at some other point I'll update this post (feel free to contribute your own too!).

    We want to insert our code after the last call to $Request->param() which should be on or around line 102. Here's what I inserted: (I included a few lines of the original for reference before and after the custom code)

    $Request->param( 'disp', 'string', 'posts', true );
    $Request->param( 'stats', 'integer', 0 );                 // deprecated
    $Request->param( 'tempskin', 'string', '', true );
    // custom field, passed from old blogs to denote search for a back posted item
    $Request->param( 'smode', 'string', '', true );
    if ( $s && ( $smode == 'backport' || $smode == 'backpost' ) )
    {
    	// we received a request that was redirected from the old wordpress installation.
    	// let's try to find it.
    	$Messages->add( 'We suffered a <a href="">data loss</a> near '
    		. 'the beginning of September, 2006 and are working to restore all prior posts. If '
    		. 'the article you\'re looking for can\'t be found, please <a '
    		. 'href="'
    		. '&recipient_id=1&redirect_to='.urlencode( $ReqURI ).'">contact</a> me and I '
    		. 'may be able to send it to you directly.', 'note' );

    	// possible formats
    	//   date: /yyyy/(mm/)(dd/)(page/page_no/)
    	//   post: /yyyy/mm/dd/post_name/
    	//   category: /category/name/(subcat_name/)(page/page_no/)
    	$_matches = array();
    	$original_s = $s;
    	$s = preg_replace( '#/page/[0-9]+/?$#i', '/', $s ); //trim off any paging info

    	if( preg_match( '#^/([0-9]{4})/?$#', $s, $_matches ) )
    	{
    		//year only
    		$Request->set_param( 's', '' );
    		$Request->set_param( 'm', $_matches[1] );
    	}
    	elseif( preg_match( '#^/([0-9]{4})/([0-9]{2})/?$#', $s, $_matches ) )
    	{
    		//year and month
    		$Request->set_param( 's', '' );
    		$Request->set_param( 'm', $_matches[1].$_matches[2] );
    	}
    	elseif( preg_match( '#^/([0-9]{4})/([0-9]{2})/([0-9]{2})/?$#', $s, $_matches ) )
    	{
    		//year, month and date
    		$Request->set_param( 's', '' );
    		$Request->set_param( 'm', $_matches[1].$_matches[2].$_matches[3] );
    	}
    	elseif( preg_match( '#^/[0-9]{4}/[0-9]{2}/[0-9]{2}/([^/]+)/?$#', $s, $_matches ) )
    	{
    		//year, month, date and post
    		$Request->set_param( 's', trim( preg_replace( '/[^a-z0-9]/i', ' ', $_matches[1] ) ) );
    		$Request->set_param( 'sentence', '1' );
    		$Request->set_param( 'exact', 0 );
    	}
    	elseif( preg_match( '#^/category/(.+)$#', $s, $_matches ) )
    	{
    		// not sure what to do here - categories are referenced by numeric ID in b2evolution
    		// do a normal search for now?
    		$Request->set_param( 's', trim( preg_replace( '/[^a-z0-9]/i', ' ', $_matches[1] ) ) );
    		$Request->set_param( 'sentence', 'OR' );
    		$Request->set_param( 'exact', 0 );
    	}
    	else
    	{
    		//do a normal search for now?
    		$Request->set_param( 's', trim( preg_replace( '/[^a-z0-9]/i', ' ', $s ) ) );
    		$Request->set_param( 'sentence', '1' );
    		$Request->set_param( 'exact', 0 );
    	}
    }
    if( !isset($timestamp_min) ) $timestamp_min = '';
    if( !isset($timestamp_max) ) $timestamp_max = '';

    Explanation of added code

  • Line 2 uses B2evo's param() function to read the value of the smode param if it exists.

  • Line 3 tests for the value of the "smode" parameter and a valid search string ($s). I am testing for two values only because I started out using backport, but thought I might switch to backpost later on. Mainly because I'm anal.

  • Line 7 adds a message to be displayed when the page is rendered. This shows up as a note at the top of the page to give some context to what gets displayed. It also provides a link to my post about losing the data as well as a link to the contact form in case a user has any questions.

  • Line 18 sets up the array we'll use with our Regular Expression matching.

  • Line 19 saves a copy of the original search string request. It isn't used in the custom script, and the original B2evo code isn't going to use it, but I left it there just in case.

  • Line 20 uses a regular expression to get rid of any page data from the old WordPress URL. For instance, "/category/foo/bar/page/2/" becomes "/category/foo/bar/"

  • Line 22 is the start of our pattern matching. Here we are testing for a request for a specific year (e.g. "/2004/"). If the test is true, lines 25 and 26 clear the $s search parameter and set the $m parameter. The page will now display a list of posts from the requested year.

  • Line 28 tests for a request for a specific year and month (e.g. "/2004/08/"). If this test is true, lines 31 and 32 clear the $s search parameter and set the $m parameter. The page will now display a list of posts from the requested year and month.

  • Line 34 tests for a request for a specific year, month and day (e.g. "/2004/08/21/"). If this test is true, lines 37 and 38 clear the $s search parameter and set the $m parameter. The page will now display a list of posts from the requested year, month and day.

  • Line 40 tests for a request for a specific post slug. If this test is true line 43 reformats the slug by replacing any non-alphanumeric characters with a space ("my-1st-post-slug" becomes "my 1st post slug"). Line 44 sets the search mode to "sentence" for a more accurate search and line 45 disables exact matching.

  • Line 47 tests for a request for a category or sub-category. A match here does pretty much the same thing as the previous code block, except less specifically (the words are OR'ed together). This is because, whereas WordPress can reference a category by name, b2evolution uses the category ID. I could do a lookup in the database here, but since I didn't recreate my categories exactly as I had them in WordPress, there was no guarantee of a direct match anyway. This way I just look for keywords inside the posts themselves. It's not 100% accurate, but it's quick and at least gives the user a starting point.

  • Line 56 is the catch-all in case no other pattern was matched. This block performs the same operations as the post-slug block from line 40, but on the whole request string.

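If you want to try the patterns without touching a live install, the matching logic described above can be pulled out into a stand-alone function. The sketch below is just that, a sketch: classify_old_path() is a hypothetical name, it returns a plain array instead of calling $Request->set_param(), and it skips the smode check and the message. The regular expressions are the same ones used in the hack.

```php
<?php
// Hypothetical stand-alone version of the matching logic, for trying out
// patterns outside b2evolution. Returns the parameters the hack would set.
function classify_old_path(string $s): array
{
    // Replace non-alphanumerics with spaces, as the hack does for search terms.
    $words = function (string $text): string {
        return trim(preg_replace('/[^a-z0-9]/i', ' ', $text));
    };

    $s = preg_replace('#/page/[0-9]+/?$#i', '/', $s); // trim off any paging info

    if (preg_match('#^/([0-9]{4})/?$#', $s, $m)) {
        return ['s' => '', 'm' => $m[1]];                  // year only
    }
    if (preg_match('#^/([0-9]{4})/([0-9]{2})/?$#', $s, $m)) {
        return ['s' => '', 'm' => $m[1] . $m[2]];          // year and month
    }
    if (preg_match('#^/([0-9]{4})/([0-9]{2})/([0-9]{2})/?$#', $s, $m)) {
        return ['s' => '', 'm' => $m[1] . $m[2] . $m[3]];  // year, month and day
    }
    if (preg_match('#^/[0-9]{4}/[0-9]{2}/[0-9]{2}/([^/]+)/?$#', $s, $m)) {
        return ['s' => $words($m[1]), 'sentence' => '1', 'exact' => 0];  // post slug
    }
    if (preg_match('#^/category/(.+)$#', $s, $m)) {
        return ['s' => $words($m[1]), 'sentence' => 'OR', 'exact' => 0]; // category
    }
    // catch-all: search on whatever was requested
    return ['s' => $words($s), 'sentence' => '1', 'exact' => 0];
}

print_r(classify_old_path('/2004/08/page/2/'));
print_r(classify_old_path('/2004/08/21/my-1st-post-slug/'));
print_r(classify_old_path('/category/foo/bar/page/2/'));
```

The first call should come back with m set to "200408", the second with the search string "my 1st post slug", and the third with "foo bar" in OR mode.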

    Part 3 - disambiguation
    This part is optional, but it helps make your searches more accurate. At the end of each entry that I'm back posting I add an HTML comment that includes the text from the old WordPress slug, minus any non-alphanumeric characters. It's not guaranteed that my old post title will also appear in the original post content, so this seeds the article with text that is sure to be found using the search as formed in part 2. For instance, if my old post had the slug "it-slices-and-dices-and-julians-fries", I would add this to the end of the post content:

    <!-- backpost :: it slices and dices and julians fries -->

    The "backpost :: " part just makes it easier to find in the database later on if need be. Remember, since the WordPress slug is likely to be different from the new B2evo slug this gives us a better chance of finding the correct article. I could copy/paste, but since the B2evo slugs are shorter I'm still not guaranteed an exact match.
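Typing these comments by hand gets old fast, so a tiny helper could build the marker from the old slug. This is a minimal sketch and not part of the hack itself: backpost_comment() is a hypothetical name, and it simply reuses the same "non-alphanumerics become spaces" cleanup that the search side applies in part 2, so the two stay in step.

```php
<?php
// Hypothetical helper: builds the marker comment from an old WordPress slug.
function backpost_comment(string $wpSlug): string
{
    // Same cleanup as the search side: non-alphanumerics become spaces.
    $words = trim(preg_replace('/[^a-z0-9]/i', ' ', $wpSlug));
    return '<!-- backpost :: ' . $words . ' -->';
}

echo backpost_comment('it-slices-and-dices-and-julians-fries'), "\n";
// prints: <!-- backpost :: it slices and dices and julians fries -->
```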

    That's it (plus I'm late for dinner). If you find any bugs (which are likely just typos since it's working on my blog) or have any suggestions please let me know!
