[HACK]Code to generate 'Google Sitemaps'

Started by on Jun 14, 2005 – Contents updated: Jun 14, 2005

Jun 14, 2005 16:50    

Hi.

Google Sitemaps is a method to inform and direct Google's crawlers, and to enable them to find out what pages are present and which have recently changed (https://www.google.com/webmasters/sitemaps/docs/en/about.html).

I've created a script (based on 'rss.php') to generate this kind of file:
http://www.alianzo.com/resources/google-sitemaps.html

Hope you enjoy it.

Jun 14, 2005 19:02

Done, easy as falling off a log, thanks.
I look forward to the results :)

Jun 15, 2005 22:53

Great idea.

I found a few problems, though.

1. There's no need to include _blog_main.php. Including _main.php is enough, if you initialize the $Blog and $blogparams variables. (This skips all the other url processing stuff that's not necessary here.)

2. "Always" means that something is changed *each time it's accessed*, which is definitely not the case on any page except the stats page (which isn't included here, and rightly so). A permalink is not likely to ever change, and most people post to their blogs at most once a day. It would be better to set the freq to "daily" on blogs, "monthly" on posts that have been changed, and "yearly" on others. (Of course, you could modify this if you get a lot of comments. Processing this value automatically would be nice. My errors blog might not ever change now that it's populated, but my main blog and linkblog change all the time.)

3. The sitemap has to be in the same folder as your blogs. [url=https://www.google.com/webmasters/sitemaps/docs/en/protocol.html#sitemapLocation]According to google[/url], it can't reference pages above it in the directory structure.
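
For anyone who wants to check point 3 before submitting, here is a rough sketch of the location rule. The function name `sitemap_covers_url()` and the example.com URLs are made up for illustration, and the check is a plain string-prefix comparison, so it ignores things like host-name case or a query string on the sitemap URL.

```php
<?php
// Hypothetical helper (not part of b2evolution): returns true when $pageUrl
// lives at or below the directory that contains the sitemap file itself.
function sitemap_covers_url($sitemapUrl, $pageUrl)
{
    // Directory part of the sitemap URL, including the trailing slash.
    $base = substr($sitemapUrl, 0, strrpos($sitemapUrl, '/') + 1);

    // The page is covered if its URL starts with that directory.
    return strncmp($pageUrl, $base, strlen($base)) === 0;
}

// A page in the same folder as the sitemap, then a page above that folder.
var_dump(sitemap_covers_url('http://example.com/blogs/sitemap.xml.php', 'http://example.com/blogs/index.php?blog=2'));
var_dump(sitemap_covers_url('http://example.com/blogs/sitemap.xml.php', 'http://example.com/about.html'));
```

The first call reports a page inside the sitemap's folder, the second a page above it, which is the case the rule forbids.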

I made the changes that I suggested, and implemented it without much trouble. I also added a loop to go through all the different blogs, so you don't have to specify it in the URL. It works like a bloglist, and any blogs that wouldn't be shown in the bloglist have a lower priority. Posts are each only shown once. You can grab the file at
http://isaacschlueter.com/sitemap.xml.php.txt
Just strip off the .txt, and upload to your blog URL, and submit it to google.

:)

Jun 16, 2005 03:24

I love the sitemap idea, but there is a huge limitation for big blogs (like mine, with about 700-800 posts and several thousand comments):

Code

$MainList = & new ItemList...

is memory consuming, since it loads all the contents of all the posts in memory. With the default 8 MB limitation for PHP scripts, that becomes an issue.

In addition to the original script, I've added the latest comment date for each post, but when I process each post's comments with something like:

Code

$CommentList = & new CommentList( $blog, "'comment'", array(), $Item->ID...

with only one comment at once (the last one), I encounter memory leaks (tested with memory_get_usage()). I tried to figure out what objects have to be freed up (by setting them to NULL) before the next post processing:

Code

$Comment->author_User = NULL;
$Comment = NULL;
$CommentList->Obj[] = NULL;
$CommentList = NULL;

however, that does not appear to be enough, and I still have a memory leak of about 5-10 KB per loop because of unfreed data allocated somewhere inside the "new CommentList"...

Since b2evolution doesn't care about memory at all (ie. I didn't find any "new" counterpart anywhere), I believe the best way is to access the database directly, without going through the whole b2evolution overhead.
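
To make the direct-access idea a bit more concrete, here is a minimal sketch that writes one entry at a time instead of building a full list of post objects first. Everything in it is an assumption for illustration: `sitemap_stream_entries()` is a made-up name, the rows are plain arrays with 'url' and 'lastmod' keys, and the in-memory array only stands in for whatever database cursor you would actually read from. It is written for a current PHP version (closures), not for the PHP 4 setups of the time.

```php
<?php
// Hypothetical sketch: $fetchRow is any callable that returns the next row
// as an array with 'url' and 'lastmod' keys, or null when no rows are left.
// Only one row is held in memory at a time.
function sitemap_stream_entries($fetchRow, $out)
{
    $count = 0;
    while (($row = call_user_func($fetchRow)) !== null) {
        fwrite($out, "<url>\n");
        // Escape the URL so that characters such as & are valid inside XML.
        fwrite($out, '  <loc>' . htmlspecialchars($row['url'], ENT_QUOTES, 'UTF-8') . "</loc>\n");
        fwrite($out, '  <lastmod>' . date('Y-m-d', strtotime($row['lastmod'])) . "</lastmod>\n");
        fwrite($out, "</url>\n");
        $count++;
    }
    return $count;
}

// Demo data standing in for a database result set.
$rows = array(
    array('url' => 'http://example.com/blog/?p=1&more=1', 'lastmod' => '2005-06-14 16:50:00'),
    array('url' => 'http://example.com/blog/?p=2&more=1', 'lastmod' => '2005-06-15 22:53:00'),
);
$fetchRow = function () use (&$rows) {
    return array_shift($rows); // array_shift() returns null once the array is empty
};

$out = fopen('php://output', 'w');
$written = sitemap_stream_entries($fetchRow, $out);
```

Calling memory_get_usage() before and after the loop should show whether this approach really stays flat on a large blog.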

Has anyone a better idea?

Jun 16, 2005 10:01

isaac wrote:

Great idea.

I found a few problems, though.

1. There's no need to include _blog_main.php. Including _main.php is enough, if you initialize the $Blog and $blogparams variables. (This skips all the other url processing stuff that's not necessary here.)

Yep. You're right.

isaac wrote:

2. "Always" means that something is changed *each time it's accessed*, which is definitely not the case on any page except the stats page (which isn't included here, and rightly so). A permalink is not likely to ever change, and most people post to their blogs at most once a day. It would be better to set the freq to "daily" on blogs, "monthly" on posts that have been changed, and "yearly" on others. (Of course, you could modify this if you get a lot of comments. Processing this value automatically would be nice. My errors blog might not ever change now that it's populated, but my main blog and linkblog change all the time.)

Good point. I've included some modifications.
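
The frequency rule quoted above can be sketched as a small function. This is only an illustration of the "monthly if edited, yearly otherwise" idea: `sitemap_changefreq_for_post()` is a hypothetical name, and it assumes both dates arrive as strings that strtotime() can parse.

```php
<?php
// Hypothetical helper (not part of b2evolution): pick a <changefreq> value
// for a post from its issue date and its last-modification date.
function sitemap_changefreq_for_post($issueDate, $modDate)
{
    // Compare as Unix timestamps so differently formatted date strings
    // are not compared character by character.
    $issued   = strtotime($issueDate);
    $modified = strtotime($modDate);

    // Edited at least once after publication: it might change again.
    if ($issued !== false && $modified !== false && $modified > $issued) {
        return 'monthly';
    }

    // Never edited (or dates unreadable): a permalink page rarely changes.
    return 'yearly';
}

echo sitemap_changefreq_for_post('2005-06-01 10:00:00', '2005-06-10 09:30:00'), "\n";
echo sitemap_changefreq_for_post('2005-06-01 10:00:00', '2005-06-01 10:00:00'), "\n";
```

Blog front pages would simply use "daily", as suggested in the quote.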

isaac wrote:

3. The sitemap has to be in the same folder as your blogs. [url=https://www.google.com/webmasters/sitemaps/docs/en/protocol.html#sitemapLocation]According to google[/url], it can't reference pages above it in the directory structure.

That's right. I didn't notice it.

isaac wrote:

I made the changes that I suggested, and implemented it without much trouble. I also added a loop to go through all the different blogs, so you don't have to specify it in the URL. It works like a bloglist, and any blogs that wouldn't be shown in the bloglist have a lower priority. Posts are each only shown once.

I'd prefer to show one sitemap per blog, because we develop customized blogs.

isaac wrote:

You can grab the file at
http://isaacschlueter.com/sitemap.xml.php.txt
Just strip off the .txt, and upload to your blog URL, and submit it to google.

I also modified my original source code. Thank you very much, Isaac.

isaac wrote:

:)

Jun 20, 2005 02:19

Thanks for your great script. I implemented it, but when I try to submit it to Google, I get the following error:

The server returned an error when we tried to access the URL provided. Please verify that the Sitemap URL is correct and resubmit your Sitemap.

This is the URL I have been trying to submit:

http://www.traduzca.net/blogs/sitemap.xml.php

Thanks for your help!

Regards,

Robert

Jul 13, 2005 16:34

This is a great script. It's the second one I tried and it rocks. The first one I tried wrote an XML file, so you had to run the PHP file and then it would update the XML file. Doing it this way is much better because every time Google goes to update, it will get the latest information. Great work everyone!

Sep 11, 2005 22:25

In addition to the above mentioned [url=http://b2e.ex-code.com/index.php/soft/2005/09/11/b2e_google_sitemap_preliminary_release]new advanced Google sitemap for b2evolution[/url]

get the [url=http://b2e.ex-code.com/index.php/soft/2005/09/11/b2e_ping_google_patch]patch to ping Google after each post[/url] (above mentioned sitemap is required for this patch to be functional).

This patch is missing a user setting to turn it off. But would anyone need to turn it off anyway?
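
For readers wondering what a ping amounts to, here is a rough sketch that only builds the ping URL and does not send anything. It is not the code from the patch linked above. The endpoint is an assumption based on Google's Sitemaps documentation of the time and should be checked against current documentation before use; `sitemap_ping_url()` and the example.com address are made up for illustration.

```php
<?php
// Assumption: ping endpoint as described in Google's Sitemaps documentation
// of that period. Verify it before relying on it.
define('SITEMAP_PING_ENDPOINT', 'http://www.google.com/webmasters/sitemaps/ping');

// Hypothetical helper: build the URL to request after publishing a post.
function sitemap_ping_url($sitemapUrl)
{
    // The sitemap address travels as a query parameter, so it must be URL-encoded.
    return SITEMAP_PING_ENDPOINT . '?sitemap=' . urlencode($sitemapUrl);
}

echo sitemap_ping_url('http://example.com/blogs/sitemap.xml.php'), "\n";
```

A patch like the one described would request this URL after a post is saved, which is why the sitemap script has to be installed first.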

Oct 30, 2005 17:10

alianzo - awesome thank-you!

porton, i am fairly new and having trouble with your version.

some files seem to be missing, like google-sitemap-misc.php

The pinging idea is really cool, i was wondering how do i apply the patch?

Nov 01, 2005 04:22

Hi I posted a comment on your site. I just wanted to know how to apply the patch. Do I edit it into the original and if so on the top or on the bottom or does it matter?

Pie

Jan 05, 2006 23:53

It is not clear how the patch gets applied or which files need to be edited.

Also, how do you test it? Is it automatically done?

Also, the password seems to appear in plain text in the XML output... is that normal?

Feb 08, 2006 08:12

I tried to implement the Alianzo sitemap file. It works, but only for one blog. I really want to use the idea that Isaac uses, where it goes through all the blogs. What kind of mod do I need so that it will cycle through all the blogs and posts?

I couldn't get the solution posted by porton to work correctly. The Alianzo file works great, but only for one blog.

Anyone got the multiblog solution to work and can share the code?

Feb 10, 2006 12:01

guarriman wrote:

jeffposaka: just read the webpage by Alianzo:

Isaac Schlueter suggests to create a unique 'sitemap' for all the blogs inside your b2evo installation: http://isaacschlueter.com/sitemap.xml.php.txt

I downloaded the file "http://isaacschlueter.com/sitemap.xml.php.txt" from the site but it is an xml and map of Isaac's site. It doesn't produce a new xml map of my site. I tried it and got error messages. Am I doing something wrong?

The original Alianzo php works. It creates a sitemap but only for one blog and I want it to map all blogs.

I thought I was looking for a php file that would generate xml files of all my blogs and posts.

Any help?

Feb 10, 2006 12:20

When Isaac decided to downgrade his blog to WP a lot of the hacks he had became either dead links or useless glunk - like what you found. I don't know if anyone has the actual php file he created or not. If so it'd be nice if they shared it here...

Feb 10, 2006 12:38

Gotta love google cache ;)

Code

<?php
  // Released under GNU GPL License - www.gnu.org/copyleft/gpl.html
  // Original Author: Alianzo Networks - www.alianzo.com
  // Modified by Isaac Z. Schlueter - www.isaacschlueter.com
  
  $skin = '';
  $show_statuses = array();    
  $timestamp_min = '';          
  $timestamp_max = 'now';
  require dirname(__FILE__).'/b2evocore/_main.php';
  
  header("Content-type: application/xml");
  echo '<?xml version="1.0"?'.'>';
?>
 
<!-- generator="b2evolution/<?php echo $b2_version ?>" -->
 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
 
<?php
for( $blog = blog_list_start();
    $blog != false;
    $blog = blog_list_next() )
{
  $Blog = Blog_get_by_ID( $blog );
  $blogparams = get_blogparams_by_ID( $blog );
  $MainList = & new ItemList( $blog, '', '', '', '', '', array(), '', 'DESC', '', '', '', '', '', '', '', '', '', '999999', 'posts' );
  
  if($Item = $MainList->get_item() )
  {
    // there's something in this blog, so show it.
    ?>
    <url>
      <loc>
        <?php
        $Blog->disp('blogurl', 'xml');
        ?>
      </loc>
      <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
      <changefreq>daily</changefreq>
      <priority>
        <?php
        if($Blog->in_bloglist) echo '1.0';
        else echo '0.5';
        ?>
      </priority>
    </url>
    <?php
    // now, display all the posts in the blog.
    do {
      if( $Item->blog_ID == $blog )
      { // if it's a post for another blog, skip it.  we'll get to it later, or already did.
        ?>
        <url>
          <loc><?php $Item->permalink( 'single' ) ?></loc>
          <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
          <changefreq>
            <?php
            if($Item->mod_date > $Item->issue_date) echo 'monthly'; //not likely to change even that often, but it changed once, so it might again.
            else echo 'yearly'; //hasn't changed yet, might never
            ?>
          </changefreq>
          <priority>
            <?php
            if($Blog->in_bloglist) echo '0.8';
            else echo '0.3';
            ?>
          </priority>
        </url>
        <?php
      }
    } while( $Item = $MainList->get_item() );
  }
}
?>
</urlset>

*edit*
To add missing line of code :roll:

Apr 14, 2006 05:37

It worked great! Thanks for fixing it, Yabba.

Here is the code. I added the utf-8 statement.

Code

<?php
   // Released under GNU GPL License - www.gnu.org/copyleft/gpl.html
   // Original Author: Alianzo Networks - www.alianzo.com
   // Modified by Isaac Z. Schlueter - www.isaacschlueter.com
  
   $skin = '';
   $show_statuses = array();  
   $timestamp_min = '';              
   $timestamp_max = 'now';
   require dirname(__FILE__).'/b2evocore/_main.php';
  
   header("Content-type: application/xml");
   echo '<?xml version="1.0" encoding="UTF-8"?'.'>';
?>
 
<!-- generator="b2evolution/<?php echo $b2_version ?>" -->
 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
 
<?php
for( $blog = blog_list_start();
      $blog != false;
      $blog = blog_list_next() )
{
   $Blog = Blog_get_by_ID( $blog );
   $blogparams = get_blogparams_by_ID( $blog );
   $MainList = & new ItemList( $blog, '', '', '', '', '', array(), '', 'DESC', '', '', '', '', '', '', '', '', '', '999999', 'posts' );
  
   if($Item = $MainList->get_item() )
   {
      // there's something in this blog, so show it.
      ?>
      <url>
         <loc>
            <?php
            $Blog->disp('blogurl', 'xml');
            ?>
         </loc>
         <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
         <changefreq>daily</changefreq>
         <priority>
            <?php
            if($Blog->in_bloglist) echo '1.0';
            else echo '0.5';
            ?>
         </priority>
      </url>
      <?php
      // now, display all the posts in the blog.
      do {
         if( $Item->blog_ID == $blog )
         { // if it's a post for another blog, skip it.  we'll get to it later, or already did.
            ?>
            <url>
               <loc><?php $Item->permalink( 'single' ) ?></loc>
               <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
               <changefreq>
                  <?php
                  if($Item->mod_date > $Item->issue_date) echo 'monthly'; //not likely to change even that often, but it changed once, so it might again.
                  else echo 'yearly'; //hasn't changed yet, might never
                  ?>
               </changefreq>
               <priority>
                  <?php
                  if($Blog->in_bloglist) echo '0.8';
                  else echo '0.3';
                  ?>
               </priority>
            </url>
            <?php
         }
      } while( $Item = $MainList->get_item() );
   }
}
?>
</urlset>

Jun 07, 2006 13:40

I am trying to modify this to make it work in Phoenix. I get errors originating from:

Code

$skin = '';
$show_statuses = array();
$timestamp_min = '';
$timestamp_max = 'now';
require dirname(__FILE__).'/b2evocore/_main.php';

I changed it to:

Code

$skin = '';
$show_statuses = array();
$timestamp_min = '';
$timestamp_max = 'now';
require dirname(__FILE__).'/evocore/_main.inc.php';

But still get errors.

Any ideas how to hack this hack to make it work in 1.6?

Thanks

Jun 07, 2006 18:28

Try this version for phoenix:-

PHP

<?php
   // Released under GNU GPL License - www.gnu.org/copyleft/gpl.html
   // Original Author: Alianzo Networks - www.alianzo.com
   // Modified by Isaac Z. Schlueter - www.isaacschlueter.com
   
   $skin = '';
   $show_statuses = array();
   $timestamp_min = '';
   $timestamp_max = 'now';
   require dirname(__FILE__).'/evocore/_main.inc.php';
   
   header("Content-type: application/xml");
   echo '<?xml version="1.0" encoding="UTF-8"?'.'>';
?>
 
<!-- generator="<?php echo $app_name.' '.$app_version ?>" -->
 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
 
<?php
for( $blog = blog_list_start();
      $blog != false;
      $blog = blog_list_next() )
{
   $Blog = Blog_get_by_ID( $blog );
   $blogparams = get_blogparams_by_ID( $blog );
 
   $MainList = & new ItemList( $blog, array(), '', '', '', '', array(), '', 'DESC', '', 999999, '', '', '', '', '', '', 0, 'posts' );
   
   if($Item = $MainList->get_item() )
   {
      // there's something in this blog, so show it.
      ?>
      <url>
         <loc>
            <?php
            $Blog->disp('blogurl', 'xml');
            ?>
         </loc>
         <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
         <changefreq>daily</changefreq>
         <priority>
            <?php
            if($Blog->in_bloglist) echo '1.0';
            else echo '0.5';
            ?>
         </priority>
      </url>
      <?php
      // now, display all the posts in the blog.
      do {
         if( $Item->blog_ID == $blog )
         { // if it's a post for another blog, skip it.  we'll get to it later, or already did.
            ?>
            <url>
               <loc><?php $Item->permalink( 'single' ) ?></loc>
               <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
               <changefreq>
                  <?php
                  if($Item->mod_date > $Item->issue_date) echo 'monthly'; //not likely to change even that often, but it changed once, so it might again.
                  else echo 'yearly'; //hasn't changed yet, might never
                  ?>
               </changefreq>
               <priority>
                  <?php
                  if($Blog->in_bloglist) echo '0.8';
                  else echo '0.3';
                  ?>
               </priority>
            </url>
            <?php
         }
      } while( $Item = $MainList->get_item() );
   }
}
?>
</urlset>

Jun 09, 2006 10:03

Yabba,

It worked great. :D I submitted it to Google and it works.

Thanks again.

JeffPosaka

Oct 05, 2006 12:41

anyone get this to work with serenity?

Seems like the evocore/_main.php file has moved again!

any ideas, cheers!

Oct 05, 2006 14:09

Thanks...your code rocks!

Oct 05, 2006 16:38

Try this :-

PHP

<?php
   // Released under GNU GPL License - www.gnu.org/copyleft/gpl.html
   // Original Author: Alianzo Networks - www.alianzo.com
   // Modified by Isaac Z. Schlueter - www.isaacschlueter.com
   
   $skin = '';
   $show_statuses = array();
   $timestamp_min = '';
   $timestamp_max = 'now';
   require_once dirname(__FILE__).'/conf/_config.php';
 
   require $inc_path.'_main.inc.php';
   
   header("Content-type: application/xml");
   echo '<?xml version="1.0" encoding="UTF-8"?'.'>';
?>
 
<!-- generator="<?php echo $app_name.' '.$app_version ?>" -->
 
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
 
<?php
for( $blog = blog_list_start();
      $blog != false;
      $blog = blog_list_next() )
{
   $Blog = Blog_get_by_ID( $blog );
   $blogparams = get_blogparams_by_ID( $blog );
 
   $MainList = & new ItemList( $blog, array(), '', '', '', '', array(), '', 'DESC', '', 999999, '', '', '', '', '', '', 0, 'posts' );
   
   if($Item = $MainList->get_item() )
   {
      // there's something in this blog, so show it.
      ?>
      <url>
         <loc>
            <?php
            $Blog->disp('blogurl', 'xml');
            ?>
         </loc>
         <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
         <changefreq>daily</changefreq>
         <priority>
            <?php
            if($Blog->in_bloglist) echo '1.0';
            else echo '0.5';
            ?>
         </priority>
      </url>
      <?php
      // now, display all the posts in the blog.
      do {
         if( $Item->blog_ID == $blog )
         { // if it's a post for another blog, skip it.  we'll get to it later, or already did.
            ?>
            <url>
               <loc><?php echo $Item->get_permanent_url( ); ?></loc>
               <lastmod><?php $Item->issue_date('Y-m-d') ?></lastmod>
 
               <changefreq>
                  <?php
                  if($Item->mod_date > $Item->issue_date) echo 'monthly'; //not likely to change even that often, but it changed once, so it might again.
                  else echo 'yearly'; //hasn't changed yet, might never
                  ?>
               </changefreq>
               <priority>
                  <?php
                  if($Blog->in_bloglist) echo '0.8';
                  else echo '0.3';
                  ?>
               </priority>
            </url>
            <?php
         }
      } while( $Item = $MainList->get_item() );
   }
}
?>
</urlset>
