1 Jul 31, 2005 12:45    

WARNING: Before any change, always make a backup copy!

About referrer stats
Many web sites display referrers in their stats, and [url=http://b2evolution.net]b2evolution[/url]-powered blogs are among them. Appending ?disp=stats to the URL of a [url=http://b2evolution.net]b2evolution[/url]-powered blog brings up its statistics (this may not work on blogs where the stats page has been removed or renamed).

How do referrer stats work?
A referrer is the web page a visitor comes from. When you follow a link, your web browser sends the URL of the page it is leaving to the page it is requesting. It's a kind of courtesy: web browsers are polite, and they say something like "Hello New Page! I came to you from www.blahblahblah.com/greatlinks.html". That helps webmasters identify the sites their traffic comes from.
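
To make this concrete, here is a minimal sketch of how a PHP page can read that header. The helper name referer_host is hypothetical and is not part of b2evolution; it only shows where the value comes from. Note that the header is supplied by the client, so nothing guarantees it is truthful.

```php
<?php
// Hypothetical helper (not part of b2evolution): returns the host name of the
// referring page, or '' when the browser sent no usable Referer header.
function referer_host(array $server)
{
    // PHP exposes the HTTP "Referer" header under this key (spelled with one "r").
    if (empty($server['HTTP_REFERER'])) {
        return '';
    }
    $host = parse_url($server['HTTP_REFERER'], PHP_URL_HOST);
    // parse_url() returns null when there is no host and false on a malformed URL.
    return is_string($host) ? strtolower($host) : '';
}

// Simulated request data; on a live page you would pass $_SERVER instead.
echo referer_host(array('HTTP_REFERER' => 'http://www.blahblahblah.com/greatlinks.html')), "\n";
```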

How do spammers modify referrer stats?
Spammers exploit sites' public stats to get their own sites listed there, hoping to increase their traffic. They do so by sending robots to crawl the web while claiming to come from the spammer's site, which they don't. Because the robots lie about the page they come from, your referrer stats end up wrong.

Some webmasters or their visitors also like to skew other sites' referrer stats by clicking endlessly on links to external pages...

How to reduce the weight of the referrer spam on referrer stats?
In order to reduce the weight of those behaviors, I modified the way the stats are counted by:

  • adding a period of time over which hits are taken into account when computing the best referrers (spammers tend to keep coming over long periods, while only the most recent referrers are interesting);

  • counting each hit's IP address only once (spammers use the same IP address several times).
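
As an illustration only, the two rules can be sketched on an in-memory list of hits. The function top_referers, the array layout and the example domains are all assumptions made for this sketch; the actual change described in this post is made in the SQL query, not in PHP arrays.

```php
<?php
// Hypothetical in-memory version of the two rules; the real change is made in SQL.
// Each hit is array('domain' => ..., 'ip' => ..., 'time' => Unix timestamp).
function top_referers(array $hits, $now, $period = 0)
{
    $ipsByDomain = array();
    foreach ($hits as $hit) {
        // Rule 1: when a period is given, skip hits that are not newer than $now - $period.
        if ($period > 0 && $hit['time'] <= $now - $period) {
            continue;
        }
        // Rule 2: index by IP so repeated hits from one address count once.
        $ipsByDomain[$hit['domain']][$hit['ip']] = true;
    }
    $counts = array();
    foreach ($ipsByDomain as $domain => $ips) {
        $counts[$domain] = count($ips);
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

$now = 1000000; // arbitrary "current time" for the demonstration
$hits = array(
    array('domain' => 'spam.example',   'ip' => '10.0.0.1', 'time' => $now - 60),
    array('domain' => 'spam.example',   'ip' => '10.0.0.1', 'time' => $now - 50),
    array('domain' => 'spam.example',   'ip' => '10.0.0.1', 'time' => $now - 40),
    array('domain' => 'friend.example', 'ip' => '10.0.0.2', 'time' => $now - 30),
    array('domain' => 'friend.example', 'ip' => '10.0.0.3', 'time' => $now - 20),
    array('domain' => 'old.example',    'ip' => '10.0.0.4', 'time' => $now - 90000),
);
print_r(top_referers($hits, $now, 86400));
```

With these sample hits, the three hits sent from a single address count as one, so the domain visited from two different addresses ranks first, and the hit older than 24 hours is ignored.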

    How to implement these changes?
    We are going to modify the refererList function you can find in the b2evocore/_functions_hitlogs.php file of your [url=http://b2evolution.net]b2evolution[/url] install.

    About line 296, add the $period parameter like this:

    function refererList(
    	$howMany = 5,
    	$visitURL = '',
    	$disp_blog = 0,
    	$disp_uri = 0,
    	$type = "'no'",		// 'no' normal refer, 'invalid', 'badchar', 'blacklist', 'rss', 'robot', 'search'
    	$groupby = '', 	// baseDomain
    	$blog_ID = '',
    	$get_total_hits = false, // Get total number of hits (needed for percentages)
    	$get_user_agent = false, // Get the user agent
    	$period = '' )	  		 // Period (in seconds) to check


    Then, about line 309, declare that we are going to use the $localtimenow global variable:

    {
    	global 	$DB, $tablehitlog, $res_stats, $stats_total_hits, $ReqURI;
    	global	$localtimenow;


    About line 329, make each IP count only once:

    	if( $groupby == '' )
    	{	// No grouping:
    		$sql = "SELECT visitID, UNIX_TIMESTAMP(visitTime) AS visitTime, referingURL, baseDomain";
    	}
    	elseif( $groupby == 'baseDomain' )
    	{	// group by
    		$sql = "SELECT COUNT(DISTINCT hit_remote_addr) AS totalHits, referingURL, baseDomain"; //<<<MK
    	}
    	else
    	{
    		$sql = "SELECT COUNT(*) AS totalHits, referingURL, baseDomain";
    	}
    


    About line 354, take only the most recent hits:

    	if ($visitURL != "global")
    	{
    		$sql_from_where .= " AND visitURL = '$visitURL'";
    	}
    	if( !empty( $period ) )
    	{
    		$sql_from_where .= " AND visitTime>'".date('YmdHis', $localtimenow-$period)."'";
    	}
    
    	$sql .= $sql_from_where;
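
For reference, here is a small sketch of what that cutoff expression evaluates to. The helper period_cutoff and the sample timestamp are assumptions made for the example; the time zone is pinned to UTC only so that the result is reproducible. MySQL accepts a 'YYYYMMDDHHMMSS' string as a date and time value, which is why the comparison against visitTime works.

```php
<?php
// Pin the time zone so the example gives the same value everywhere (an
// assumption for the demonstration only; the modified function does not do this).
date_default_timezone_set('UTC');

// Hypothetical helper mirroring the expression used in the modified query.
function period_cutoff($localtimenow, $period)
{
    return date('YmdHis', $localtimenow - $period);
}

$localtimenow = 1122813900; // stand-in value: 2005-07-31 12:45:00 UTC
$period = 3600 * 24;        // last 24 hours, in seconds

// Builds the same kind of clause that is appended to $sql_from_where.
echo " AND visitTime>'" . period_cutoff($localtimenow, $period) . "'\n";
```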


    About line 375, count each IP only once:

    	if( $get_total_hits )
    	{	// we need to get total hits
    		//$sql = "SELECT COUNT(*) ".$sql_from_where;
    		$sql = "SELECT COUNT(DISTINCT hit_remote_addr) ".$sql_from_where;
    		$stats_total_hits = $DB->get_var( $sql );
    	}
    	else
    	{	// we're not getting total hits
    		$stats_total_hits = 1;		// just in case someone tries a percentage anyway (avoid div by 0)
    	}
    

    You can also copy and paste the full listing below to replace the whole refererList function (starting around line 290):

    /**
     *
     * {@internal refererList(-) }}
     *
     * Extract stats
     */
    function refererList(
    	$howMany = 5,
    	$visitURL = '',
    	$disp_blog = 0,
    	$disp_uri = 0,
    	$type = "'no'",		// 'no' normal refer, 'invalid', 'badchar', 'blacklist', 'rss', 'robot', 'search'
    	$groupby = '', 	// baseDomain
    	$blog_ID = '',
    	$get_total_hits = false, // Get total number of hits (needed for percentages)
    	$get_user_agent = false, // Get the user agent
    	$period = '' )	  		 // Period (in seconds) to check
    {
    	global 	$DB, $tablehitlog, $res_stats, $stats_total_hits, $ReqURI;
    	global	$localtimenow;
    
    	autoquote( $type );		// In case quotes are missing
    
    	$ret = array();
    
    	//if no visitURL, will show links to current page.
    	//if url given, will show links to that page.
    	//if url="global" will show links to all pages
    	if (!$visitURL)
    	{
    		$visitURL = $ReqURI;
    	}
    
    	if( $groupby == '' )
    	{	// No grouping:
    		$sql = "SELECT visitID, UNIX_TIMESTAMP(visitTime) AS visitTime, referingURL, baseDomain";
    	}
    	elseif( $groupby == 'baseDomain' )
    	{	// group by
    		$sql = "SELECT COUNT(DISTINCT hit_remote_addr) AS totalHits, referingURL, baseDomain";
    	}
    	else
    	{
    		$sql = "SELECT COUNT(*) AS totalHits, referingURL, baseDomain";
    	}
    	if( $disp_blog )
    	{
    		$sql .= ", hit_blog_ID";
    	}
    	if( $disp_uri )
    	{
    		$sql .= ", visitURL";
    	}
    	if( $get_user_agent )
    	{
    		$sql .= ", hit_user_agent";
    	}
    	
    	$sql_from_where = " FROM $tablehitlog WHERE hit_ignore IN ($type)";
    	if( !empty($blog_ID) )
    	{
    		$sql_from_where .= " AND hit_blog_ID = '$blog_ID'";
    	}
    	if ($visitURL != "global")
    	{
    		$sql_from_where .= " AND visitURL = '$visitURL'";
    	}
    	if( !empty( $period ) )
    	{
    		$sql_from_where .= " AND visitTime>'".date('YmdHis', $localtimenow-$period)."'";
    	}
    
    	$sql .= $sql_from_where;
    
    	if( $groupby == '' )
    	{	// No grouping:
    		$sql .= " ORDER BY visitID DESC";
    	}
    	else
    	{	// group by
    		$sql .= "	GROUP BY $groupby ORDER BY totalHits DESC";
    	}
    	$sql .= " LIMIT $howMany";
    
    	$res_stats = $DB->get_results( $sql, ARRAY_A );
    	
    	echo( "\n<!-- $sql -->\n" );
    
    	if( $get_total_hits )
    	{	// we need to get total hits
    		//$sql = "SELECT COUNT(*) ".$sql_from_where;
    		$sql = "SELECT COUNT(DISTINCT hit_remote_addr) ".$sql_from_where;
    		$stats_total_hits = $DB->get_var( $sql );
    	}
    	else
    	{	// we're not getting total hits
    		$stats_total_hits = 1;		// just in case someone tries a percentage anyway (avoid div by 0)
    	}
    }

    After this update, you don't have to change anything else for your stats to display correctly. By default (as before), the whole hitlog is taken into account. The only difference is that each IP address is counted only once.

    You can also display your top 10 referrers, counting only unique visitors (based on their IP address) over the last 24 hours, like this (this is what I use in [url=http://blog.lesperlesduchat.com/perles.php]my own skin[/url], in the right margin):

    <div>
    	<h4><?php echo T_('Top Referers') ?></h4>
    	<?php refererList(10, 'global', 0, 0, 'no', 'baseDomain', '', false, false, 3600*24*1 ); ?>
    	<p>
    	<?php if( count( $res_stats ) ) foreach( $res_stats as $row_stats ) { ?>
    	<a target="_blank" href="<?php stats_referer() ?>" title="<?php stats_basedomain() ?>"><?php $baseDomain = stats_basedomain( false ); if( strlen( $baseDomain ) > 20 ) $baseDomain = preg_replace( '/^(.{18}).*/i', '${1}...', $baseDomain );  echo( $baseDomain ); ?></a> (<?php echo( stats_hit_count() ); ?>)<br/>
    	<?php } // End stat loop ?>
    	</p>
    </div>
    


    In the above example, the regular expression truncates very long referrer names so that they are not displayed on several lines (my margin is tight).
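
The truncation can be isolated in a small function to see what it does. The name shorten_domain and its $max parameter are hypothetical; the pattern is the same as in the skin snippet, with the 18 derived from the 20-character limit.

```php
<?php
// Hypothetical helper wrapping the truncation used in the skin snippet.
function shorten_domain($baseDomain, $max = 20)
{
    if (strlen($baseDomain) > $max) {
        // Keep the first ($max - 2) characters and replace the rest with "...".
        $keep = $max - 2;
        $baseDomain = preg_replace('/^(.{' . $keep . '}).*/', '${1}...', $baseDomain);
    }
    return $baseDomain;
}

echo shorten_domain('blog.example.com'), "\n";                  // short enough, left as is
echo shorten_domain('a-very-long-subdomain.example.com'), "\n"; // cut after 18 characters
```

A truncated name is therefore 21 characters long (18 kept plus the three dots). For single-line names, substr( $baseDomain, 0, 18 ).'...' gives the same result without a regular expression.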

    Any feedback is welcome.

    4 Sep 08, 2005 16:36

    I've just updated the above code so that the back office shows the right stats when checking robot and aggregator hits.

    Previously, I recommended replacing the following lines (starting about line 329):

       if( $groupby == '' ) 
       {   // No grouping: 
          $sql = "SELECT visitID, UNIX_TIMESTAMP(visitTime) AS visitTime, referingURL, baseDomain"; 
       } 
       else 
       {   // group by 
          $sql = "SELECT COUNT(*) AS totalHits, referingURL, baseDomain"; 
       }


    with the following (working, but not ideal) code:

       if( $groupby == '' ) 
       {   // No grouping: 
          $sql = "SELECT visitID, UNIX_TIMESTAMP(visitTime) AS visitTime, referingURL, baseDomain"; 
       } 
       else 
       {   // group by 
          //$sql = "SELECT COUNT(*) AS totalHits, referingURL, baseDomain"; 
          $sql = "SELECT COUNT(DISTINCT hit_remote_addr) AS totalHits, referingURL, baseDomain"; 
       }


    With the above modification, the robot and aggregator stats summaries are underestimated: each robot's IP address is counted only once in the summary, while it should be counted as many times as the robot accessed your blog, even from the same IP address. So, the right modification is:

    	if( $groupby == '' )
    	{	// No grouping:
    		$sql = "SELECT visitID, UNIX_TIMESTAMP(visitTime) AS visitTime, referingURL, baseDomain";
    	}
    	elseif( $groupby == 'baseDomain' )
    	{	// group by
    		$sql = "SELECT COUNT(DISTINCT hit_remote_addr) AS totalHits, referingURL, baseDomain";
    	}
    	else
    	{
    		$sql = "SELECT COUNT(*) AS totalHits, referingURL, baseDomain";
    	}


    The code in the original post above has been updated accordingly.

    I have never seen it happen, but referrer spammers might put the domain they want to promote in a fake user agent string instead of in the referrer string. They would become "user agent spammers"! ;)

