Recent Topics

Huge performance issues on evo_hitlog with big visits

Started by on Jan 10, 2005 – Contents updated: Jan 10, 2005

Jan 10, 2005 15:54    

Hello all,

I am running b2 for almost 3 months and during this time my blog has gotten very big (around 4000 unique visitors PER DAY). http://goedzo.com

With this increase in visitors my server crahses reguraly and I was diving into it.

What I noticed was that the MysqlD service is using up about 99,8% of cpu time. When diving further into this the following query take just too much time:

SELECT COUNT( * ) AS totalHits, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
GROUP BY baseDomain
ORDER BY totalHits DESC
LIMIT 5

To make this a little faster I changed the index on baseDomain to a fulltext index. I noticed a performance increase. I also created a new index on hit_ignore. But I don't think this helps.

Also the query:
DELETE FROM evo_hitlog WHERE visitTime < '2004-12-11'

causes a lot of problems because this lockes the table in such case that the requests get queued which results in bad cases of a query time of 53! seconds.

Please think with me how to gain performance because this behaviour cannot be acceptable if you want to use this on a big b2 board.

p.s. at this moment the evo_hitlog has 173,779 records and is 36.5 MB

Jan 10, 2005 15:57

just to verify, here is a mysql querylist which shows clearly where to find the problem:

Truncate Shown Queries ID User Host Database Command Time Status SQL-query
Kill 702 goedzo localhost goedzo_comweblog Query 9 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 705 goedzo localhost goedzo_comweblog Query 25 Copying to tmp table SELECT COUNT( * ) AS totalHits, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
GROUP BY baseDomain
ORDER BY totalHits DESC
LIMIT 5
Kill 706 goedzo localhost goedzo_comweblog Query 13 Locked INSERT INTO evo_hitlog( visitTime, visitURL, hit_ignore, referingURL, baseDomain, hit_blog_ID, hit_remote_addr, hit_user_agent )
VALUES (
FROM_UNIXTIME( 1105368864 ) , '/index.php/goedzo/2004/12/02/hotmail_gratis_upgraden_naar_250_mb', 'invalid', '', '', 1, '82.173.232.186', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Creative; SV1; .NET CLR 1.1.4322)'
)
Kill 707 goedzo localhost goedzo_comweblog Query 17 updating DELETE FROM evo_hitlog WHERE visitTime < '2004-12-11'
Kill 710 goedzo localhost goedzo_comweblog Query 15 Locked SELECT COUNT( * ) AS totalHits, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
GROUP BY baseDomain
ORDER BY totalHits DESC
LIMIT 5
Kill 712 goedzo localhost goedzo_comweblog Query 14 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 713 goedzo localhost goedzo_comweblog Query 15 Locked SELECT COUNT( * ) AS totalHits, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
GROUP BY baseDomain
ORDER BY totalHits DESC
LIMIT 5
Kill 714 goedzo localhost goedzo_comweblog Query 17 Locked DELETE FROM evo_hitlog WHERE visitTime < '2004-12-11'
Kill 715 goedzo localhost goedzo_comweblog Query 7 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 716 goedzo localhost goedzo_comweblog Query 7 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 717 goedzo localhost goedzo_comweblog Query 8 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 718 goedzo localhost goedzo_comweblog Query 7 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 719 goedzo localhost goedzo_comweblog Query 1 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5
Kill 721 goedzo localhost goedzo_comweblog Query 0 Locked SELECT visitID, UNIX_TIMESTAMP( visitTime ) AS visitTime, referingURL, baseDomain
FROM evo_hitlog
WHERE hit_ignore
IN (
'no'
)
ORDER BY visitID DESC
LIMIT 5

Jan 10, 2005 16:08

I added an index to the visitTime column, because the delete statement takes a lot of time too, maybe this will speed things up.

Jan 10, 2005 17:16

Well, i just emptied the table because I don't know how to make this quick. It looks like nothing realy helps. Emptying the table does help, but you can imagine that is not realy a fix. Please help.

Jan 10, 2005 17:45

It doesn't hurt to be wrong so...

Autopruning happens every time you get a hit to your page, so, with the amount of traffic you're getting it's running like crazy. A possible method to reduce the overall load is to only run that particular bit of the function when admin is visiting the blog. In b2evocore/_functions_hitlogs.php find this bit:

	/*
	 * Auto pruning of old stats
	 */
	if( isset($stats_autoprune) && ($stats_autoprune > 0) )
	{	// Autopruning is requested
		$sql = "DELETE FROM T_hitlog
						 WHERE visitTime < '".date( 'Y-m-d', $localtimenow - ($stats_autoprune * 86400) )."'";
																														// 1 day = 86400 seconds
		$rows_affected = $DB->query( $sql );
		debug_log( 'Hit Log: autopruned '.$rows_affected.' rows.' );
	}

and make it only autoprune if the visitor is logged in as admin with this:

	/*
	 * Auto pruning of old stats
	 */
	if( is_logged_in() ) { // check for logged in
	if( $current_User->get('ID') == 1 ) { // check for admin
	if( isset($stats_autoprune) && ($stats_autoprune > 0) )
	{	// Autopruning is requested
		$sql = "DELETE FROM T_hitlog
						 WHERE visitTime < '".date( 'Y-m-d', $localtimenow - ($stats_autoprune * 86400) )."'";
																														// 1 day = 86400 seconds
		$rows_affected = $DB->query( $sql );
		debug_log( 'Hit Log: autopruned '.$rows_affected.' rows.' );
	}
	}
	}

I reckon you could give yourself another login with powers and abilities equal to ID#1, then log in as admin through a browser you don't like. That way once a day (or whatever) you could open that browser to autoprune instead of logging out and in and out.

Oh and I guess you would have to add $current_User to the globals.

Jan 10, 2005 17:45

Are you using 0.9.0.11?

If you are, crack open conf/_advanced.php and change (or add it in somewhere)

/**
 * How many days of stats do you want to keep before auto pruning them?
 *
 * Set to 0 to disable auto pruning
 *
 * @global int $stats_autoprune
 */
$stats_autoprune = 0; // Default: 30 days

To $stats_atuoprune to be 30 or something that leaves your database at an acceptable size.

Jan 10, 2005 19:18

Yes I am on 0.9.0.11

At this moment my $stats_autoprune is already 30. But that isn't really good I guess. I will try the option EdB proposes below.

Jan 10, 2005 22:03

EdB wrote:

It doesn't hurt to be wrong so...
Oh and I guess you would have to add $current_User to the globals.

How do I do that?

Jan 10, 2005 22:20

Up at the top of that function there are, well, lemme fire up an editor...

Up around line 33 you should see

/**
 * Log a hit on a blog page / rss feed
 *
 */
function log_hit()
{
	global $DB, $localtimenow, $blog, $tablehitlog, $blackList, $search_engines, $user_agents;
	global $doubleCheckReferers, $comments_allowed_uri_scheme, $HTTP_REFERER, $page, $ReqURI, $ReqPath;
	global $stats_autoprune;
	
	# TODO: check for already logged?


Change it to

/**
 * Log a hit on a blog page / rss feed
 *
 */
function log_hit()
{
	global $DB, $localtimenow, $blog, $tablehitlog, $blackList, $search_engines, $user_agents;
	global $doubleCheckReferers, $comments_allowed_uri_scheme, $HTTP_REFERER, $page, $ReqURI, $ReqPath;
	global $stats_autoprune, $current_User;
	
	# TODO: check for already logged?

Actually that's probably possibly no good because $current_User probably possibly doesn't exist if the visitor is not logged in. It's worth a shot, but you'll have to be logged out to see if it will work for non-member visitors. If it fails I think you can get away with something different.

I recently copied a hack that had a new global added inside the middle of a function instead of at the top like normal. I didn't even notice the hack had a global in the middle of a function, but it did and it worked so I guess it's allowable. Way down where you added the 2 new "if" conditions you could try adding it after you know it's a logged in user.

	/*
	 * Auto pruning of old stats
	 */
	if( is_logged_in() ) { // check for logged in
	global $current_User;
	if( $current_User->get('ID') == 1 ) { // check for admin
	if( isset($stats_autoprune) && ($stats_autoprune > 0) )
	{	// Autopruning is requested
		$sql = "DELETE FROM T_hitlog
						 WHERE visitTime < '".date( 'Y-m-d', $localtimenow - ($stats_autoprune * 86400) )."'";
																														// 1 day = 86400 seconds
		$rows_affected = $DB->query( $sql );
		debug_log( 'Hit Log: autopruned '.$rows_affected.' rows.' );
	}
	}
	}

One or the other should be legit, and I'll probably test this on my blog just to be sure of what works or not.

Jan 10, 2005 22:36

I used

global $current_User;
   if( $current_User->get('ID') == 1 )

And this works like a charm :)
thanks a lot EdB, my weblog is realy fast now! I hope the folks of b2 are reading this aswell, because this is something a big log would be killed if this happens too much.

Jan 10, 2005 22:50

Cool. I just tried it the other way - with the $current_User up top with the rest of the globals and it did not generate an error even when logged out. The real trick will be to see that it autoprunes when logged in.

I think I'm going to make up an id that doesn't do anything except trigger the autoprune bit, then log into that new account on a browser I don't use much. That way admin will be like normal and 'gardener' can take care of autopruning for me. Get it? Gardener? Pruning? :roll: I can also test that it really does autoprune by just not logging into that account for a few days and seeing if it trims the hitlog table when I finally do log in.

So uh . . . nims wrote:

To make this a little faster I changed the index on baseDomain to a fulltext index. I noticed a performance increase.

How do you do that?

Jan 10, 2005 23:05

EdB wrote:

nims wrote:

To make this a little faster I changed the index on baseDomain to a fulltext index. I noticed a performance increase.

How do you do that?

I logged in the php my admin tool to edit the mysql tables directly. I clicked on the table, en edited the index that was on the column. When you're at it, also add the index to the visitTime column. This realy helps aswell.

Please notice! creating and changing indexes can create a lot of performance drain on your server and it can even hang because it has to walk over the complete table! Changing the index to a full text index took the server about 30minutes! and during that time no process could use the blog, because the table was locked.

If you want to do it quickly, and you don't care mucht about your current stats, first empty the evo_hitlog table and then edit / add the indexes!

Jan 12, 2005 00:01

To the creators of b2evolution.

Please make it so that the autopruning only happens once per hour. This wil increase the performance dramatically. I have implemented EdB's proposal, but that is not really a "good" solution. I think the "right" soluation would be to prune the post only once per hour, instead doing it by every request.

Feb 17, 2005 22:28

I'll take care of it when refactoring the Hit logging..


Form is loading...

CMS software – This forum is powered by b2evolution CMS, a complete engine for your website.