CPU Usage Reduction Hack: Auto pruning of old stats

Started by on Aug 30, 2005 – Contents updated: Sep 25, 2013

Aug 30, 2005 01:36    

My host urges me to reduce my b2evolution blog's CPU usage. He doesn't care about bandwidth, but he does care about CPU usage. So, to measure it, I've configured Internet Explorer's Synchronize feature to grab my blog's pages, beginning on the homepage and downloading every page linked from it, excluding images and external links. That downloads about 50 pages from the server.

After some experimenting, I've discovered that the log_hit() function (see b2evocore/_functions_hitlogs.php) was pretty slow. So, I tried to figure out what took so much time. It appears that a call to validate_url() (see b2evocore/_functions_hitlogs.php, about line #73) takes about 9.2% of a page generation when a referrer is supplied (when a visitor comes from another page to one of your blog's pages), as noted in the CPU Usage Reduction Suggestions: Antispam thread (http://forums.b2evolution.net/viewtopic.php?t=5243). However, that was not the heaviest feature.

Performing a synchronization of my blog (using the Internet Explorer Synchronize feature as explained above) took approximately 2 min 31.8 s. After removing the Auto pruning of old stats feature from log_hit() (see b2evocore/_functions_hitlogs.php, about line #200), the same synchronization took about 1 min 43.7 s. That means the Auto pruning of old stats took 48.1 seconds, or about 32% of the total page generation time.
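
For reference, the arithmetic behind those figures can be checked with a few lines of PHP. This is only a sketch: the variable names are made up for illustration, and only the two timings come from the measurements above.

```php
<?php
// Timings measured above, converted to seconds.
$with_pruning    = 2 * 60 + 31.8; // 2 min 31.8 s
$without_pruning = 1 * 60 + 43.7; // 1 min 43.7 s

// Time attributable to the auto pruning, and its share of the full run.
$pruning_cost  = $with_pruning - $without_pruning;
$pruning_share = $pruning_cost / $with_pruning * 100;

printf( "Pruning cost: %.1f s (%.0f%% of the run)\n", $pruning_cost, $pruning_share );
```

The share is computed against the full run (with pruning enabled), which is where the figure of about 32% comes from.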

In fact, the Auto pruning of old stats feature consists of the following code:

Code

/*
   * Auto pruning of old stats
   */
  if( isset($stats_autoprune) && ($stats_autoprune > 0) )
  {  // Autopruning is requested
    $sql = "DELETE FROM T_hitlog
             WHERE visitTime < '".date( 'Y-m-d', $localtimenow - ($stats_autoprune * 86400) )."'";
                                                            // 1 day = 86400 seconds
    $rows_affected = $DB->query( $sql );
    debug_log( 'Hit Log: autopruned '.$rows_affected.' rows.' );
  }

That code runs a DELETE query to remove old hitlog entries every time a page is requested. That prevents the hitlog from growing indefinitely!
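
To make the cutoff concrete, here is a minimal sketch of the query that code would build. The 30-day setting and the timestamp are assumed values for illustration, and gmdate() is swapped in for date() so the result does not depend on the server's time zone.

```php
<?php
// Assumed values for illustration only.
$stats_autoprune = 30;                                // days of stats to keep
$localtimenow    = gmmktime( 1, 36, 0, 8, 30, 2005 ); // Aug 30, 2005 01:36 UTC

// Same computation as in log_hit(), but with gmdate() instead of date().
$cutoff = gmdate( 'Y-m-d', $localtimenow - ($stats_autoprune * 86400) );
                                                      // 1 day = 86400 seconds

echo "DELETE FROM T_hitlog WHERE visitTime < '$cutoff'\n";
```

With those values the cutoff is 2005-07-31, so every hit logged before that date would be deleted.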

Now, we can significantly reduce the CPU usage of our blog by executing that code less often. Since the default $stats_autoprune value (see conf/_advanced.php, about line #58) is set to 30 days, nobody will complain if we clear the hitlog only once in a while, say once every 1000 hits. (Did you even know that b2evolution (http://b2evolution.net) had such a feature?!) That can be done by replacing the following line:

Code

if( isset($stats_autoprune) && ($stats_autoprune > 0) )

with that one:

Code

if( isset($stats_autoprune) && ($stats_autoprune > 0) && (($localtimenow%1000)==0) )

Please note that the new code above doesn't clear the hitlog exactly every 1000 hits. In fact, it clears the hitlog only when a page is requested while the current time (in seconds) is a multiple of 1,000.
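
To get a feel for how often that condition can be true, here is a small sketch. should_prune() is a hypothetical helper written for this example (it is not part of b2evolution), and the starting timestamp is an arbitrary assumption.

```php
<?php
// Hypothetical helper (not part of b2evolution) wrapping the modified condition.
function should_prune( $stats_autoprune, $localtimenow )
{
  return isset($stats_autoprune) && ($stats_autoprune > 0) && (($localtimenow % 1000) == 0);
}

// Walk through one day, second by second, and count the seconds during which
// a hit would trigger the pruning.
$start    = 1125360000; // Aug 30, 2005 00:00:00 UTC, a multiple of 1000
$eligible = 0;
for( $t = $start; $t < $start + 86400; $t++ )
{
  if( should_prune( 30, $t ) )
  {
    $eligible++;
  }
}

echo "Eligible seconds per day: $eligible\n";
```

That gives 87 eligible seconds for that day. A hit has to arrive during one of those seconds for the pruning to run at all, so a blog with little traffic may go a long time between prunes, and several hits within the same eligible second would each run the query.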

Yes, those few extra characters can reduce your b2evolution 0.9.1x (http://b2evolution.net) blog's CPU usage by about 32%.

Want to reduce your CPU usage even more? See the following threads:
- CPU Usage Reduction Suggestions: Antispam (http://forums.b2evolution.net/viewtopic.php?t=5243)
- Simple Cache Hack (http://forums.b2evolution.net/viewtopic.php?t=4672)

Aug 30, 2005 02:00

I've never enabled that feature, but I'll give you another hack that I think helps with server work load. Stop logging hits that are either robots or syndication! Maybe other people care about those hit types but I sure don't. I mean, do I really care that a robot indexed me or someone told their aggregator to update? You get hit by aggregators even if you have no new posts, so why bother? My opinion, and others probably have a different view. Anyway here's the hack to stop logging those hits. (BTW I'm assuming not logging a hit means your server works a teeny bit less.)

Open b2evocore/_functions_hitlogs.php and find this bit:

Code

$ignore = 'rss';
    // don't mess up the XML!! debug_log( 'Hit Log: referer ignored (RSS));
  }
  else
  {  // Lookup robots
    foreach ($user_agents as $user_agent)
    {
      if( ($user_agent[0] == 'robot') && (strstr($UserAgent, $user_agent[1])) )
      {
        $ignore = "robot";
        debug_log( 'Hit Log: '. T_('referer ignored'). ' ('. T_('robot'). ')');
        break;
      }
    }
  }
  
  if( $ignore == 'no' )

Now add two little ifs right before the last line above:

Code

debug_log( 'Hit Log: '. T_('referer ignored'). ' ('. T_('robot'). ')');
        break;
      }
    }
  }
  
  if( $ignore == 'rss' )
  {  // Syndication hit: don't log it
    return;
  }

  if( $ignore == 'robot' )
  {  // Robot hit: don't log it
    return;
  }
 
  if( $ignore == 'no' )
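
The two checks could also be folded into one test. This is only a sketch: is_skipped_hit() is a hypothetical helper made up for this example (it is not part of b2evolution), and it assumes $ignore holds plain strings such as 'no', 'rss' or 'robot' as in the code above.

```php
<?php
// Hypothetical helper (not part of b2evolution): returns true when the hit
// is a syndication or robot hit and should therefore not be logged.
function is_skipped_hit( $ignore )
{
  return in_array( $ignore, array( 'rss', 'robot' ), true );
}

// Show the decision for the three values used in the code above.
foreach( array( 'no', 'rss', 'robot' ) as $ignore )
{
  echo $ignore.': '.( is_skipped_hit( $ignore ) ? 'skipped' : 'continues' )."\n";
}
```

Inside log_hit() that would become a single `if( is_skipped_hit( $ignore ) ) { return; }` placed before the `if( $ignore == 'no' )` line. The behavior is the same as with the two separate ifs.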

EDIT: sorry, I submitted when I wanted to preview. It's a drop in the bucket, but *I think* every little bit helps. You lose the stats data, but - as I said - since it seems meaningless to me I figured why log it.

If you want to get really smart about it you would remove "log_hit()" from every file in your xmlsrv folder. That will stop rss hits from calling this function in the first place.

If you don't care about stats data AT ALL, then remove that function call from the files in your xmlsrv folder AND from your skin's _main.php file. I personally like to see referers and search engine hits, so I've left it intact. Obviously you make the call that's right for you.

Aug 30, 2005 02:12

EdB wrote:

Stop logging hits that are either robots or syndication! [...] Anyway here's the hack to stop logging those hits. (BTW I'm assuming not logging a hit means your server works a teeny bit less.)

Once the Auto pruning of old stats feature is removed or throttled, I don't notice any significant speed difference between an empty log_hit() function and a normal one. The bottlenecks have to be found elsewhere.

Sep 07, 2005 15:53

I just had to get rid of the log hit function completely. It didn't make a lot of sense to log visitors as I have thousands of visitors daily... the database server was going totally crazy, and in a shared hosting environment I was the black sheep causing problems... knowing who visits is nice, but not on a "production" site with many visitors...

Sep 07, 2005 16:21

Did you remove it from 8 of the 9 files in the xmlsrv folder as well as the _main.php files in your skin(s)? Syndication feeds make up a heck of a lot of the hits because people update their aggregators frequently. You might have nothing new, but you get hit by them in order for them to find that out.

Also you should still keep your antispam guard up because spammers don't seem to care if they actually get through to your page. In fact they don't seem to care if they are getting 403s sent their way - once you're on their list they keep trying to spam you. Blech. I hate them.

Sep 09, 2005 00:45

What actually takes a lot of time here is MySQL checking which stats need to be pruned and which do not.

This will be addressed in version 0.9.1 "Dawn", due very shortly (something like next week).

