1 Sep 07, 2008 08:55    

There should be a better word count function since the current one doesn't count non-ASCII words, or at least it doesn't count Russian words :-/

By the way, the default PHP function str_word_count sucks: even with plain ASCII text it counts standalone - and ' characters as words!!!

I used the inc/items/model/_item.class.php file from b2evo 2.4.2 as the text source.

str_word_count - 12319
my_word_count - 11220

I was wondering why these results have about 10% difference... until I checked the words 8|

str_word_count( $string, 1 )


All these extra words were the characters ' and -. Then I removed the fake words with this code:

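function array_fix( $array )
{	
	$arr = array();
	foreach( $array as $str )
	{	// Keep only entries that contain at least one ASCII letter
		// (preg_match is used because eregi was removed in PHP 7)
		if( preg_match('/[a-z]/i', $str) )
			$arr[] = $str;
	}
	return $arr;
}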

echo str_word_count( implode( "\n", array_fix( str_word_count( $string,  1 ) ) ) );

And it finally displayed the same number as my function

echo 'str_word_count - '.str_word_count($string);
echo '<br />str_word_count fixed - '.str_word_count( implode( "\n", array_fix( str_word_count( $string,  1 ) ) ) ); 
echo '<br /> my_word_count - '.my_word_count($string);

str_word_count - 12319
str_word_count fixed - 11220
my_word_count - 11220

Here's the my_word_count function. It counts non-ASCII words (tested on UTF-8 only) and doesn't count ' and - as words.

function my_word_count( $str, $format = 0, $strip_tags = false )
{
	if( $strip_tags )
		$str = trim(strip_tags($str));
	
	$words = 0;
	$array = array();
	
	// Replace digits, ASCII punctuation and whitespace with a space, keeping letters, ' and -
	// The u modifier makes PCRE read the text as UTF-8, so \pL matches whole letters
	$pattern = "/[\d\"^!#$%&()*+,.\/:;<=>?@\]\[\\\_`{|}~ \t\r\n\v\f]+/u";
	$str = @preg_replace($pattern, " ", $str);
	$str_array = explode(' ', $str);
	
	foreach( $str_array as $word )
	{
		if( @preg_match('/[A-Za-z\pL]/u', $word) )
		{	// Count $word only if it has at least one letter
			$array[] = $word;
			$words++;
		}
	}	
	if( $format == 1 )
		return $array;	
	return $words;
}

Example #2

$string = " one and two - ' -- '' ";

echo 'str_word_count - '.str_word_count($string);
echo '<br />str_word_count fixed - '.str_word_count( implode( "\n", array_fix( str_word_count( $string,  1 ) ) ) ); 
echo '<br /> my_word_count - '.my_word_count($string);

str_word_count - 7
str_word_count fixed - 3
my_word_count - 3
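
For comparison, here is a minimal sketch of the same counting rule done with a single preg_match_all call. This is not the my_word_count function above and it may give different numbers on some texts (for example around non-ASCII punctuation). The name unicode_word_count is made up for this example, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper for illustration only; not part of b2evolution.
function unicode_word_count( $str )
{
	// A word starts with a letter (any alphabet) and may continue
	// with letters, ' and -. The u modifier makes PCRE read $str as UTF-8.
	$count = preg_match_all( "/\\pL[\\pL'-]*/u", $str, $matches );

	// preg_match_all returns false on error (e.g. invalid UTF-8)
	return $count === false ? 0 : $count;
}

echo unicode_word_count( " one and two - ' -- '' " ); // 3
echo '<br />';
echo unicode_word_count( "привет мир" ); // 2
```

Because every match has to start with a letter, a lone ' or - is never counted, so no clean-up pass like array_fix is needed.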

3 Sep 09, 2008 00:16

Thanks,

I forgot to mention: to use this function, create a new file conf/_config_TEST.php and copy the function into it.
Then open the file inc/items/model/_item.funcs.php and edit the bpost_count_words function, which starts on line 255, so that it returns my_word_count first:

function bpost_count_words($string)
{
	return my_word_count( $string, 0, true ); // added line, the original code below is no longer reached
	
	$string = trim(strip_tags($string));
	if( function_exists( 'str_word_count' ) )
	{ // PHP >= 4.3
		return str_word_count($string);
	}

Good luck

