
so any solution to HTML character entities problem in UTF-8?

Started by on Dec 18, 2005 – Contents updated: Dec 18, 2005

Dec 18, 2005 06:17    

i can't edit my arabic posts in the backoffice because they don't display correctly. the backoffice shows the posts as HTML character entities instead!

i attempted several tricks to solve the problem.

1. i changed the character set in the meta tag in _menutop.php of /admin/ from:

<meta http-equiv="Content-Type" content="text/html; charset=<?php locale_charset() ?>" />


to


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


it didn't work. so, i replaced UTF-8 with Windows-1256 for Arabic, and it still didn't work.

2. i used php function html_entity_decode() to try to translate input fields in the form. while

<?php
echo html_entity_decode('#1578;#1580;#1585;#1576;#1577;');
?>

note: a '&' sign should be added in front of each '#' sign above. i removed it so that it renders correctly for this example.


in a file by itself (ie convert.php) worked, the function didn't make any difference when i put it in _item.form.php line 139 like this:

$Form->text( 'post_title', html_entity_decode($post_title), 48, T_('Title'), '', 255 );


so i went to the _form.class.php in /evocore/ and tried to modify the text function directly, but it still didn't produce any results.
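
For reference, a minimal sketch of the standalone convert.php test described above, with the '&' signs restored. It assumes a PHP version where html_entity_decode() accepts an explicit charset argument; passing 'UTF-8' matters because, depending on the PHP version, the default charset may not be UTF-8. The variable names are only for illustration.

```php
<?php
// The entity form of the Arabic test word, with the '&' signs restored.
$entities = '&#1578;&#1580;&#1585;&#1576;&#1577;';

// Decode numeric entities into real UTF-8 characters.
$decoded = html_entity_decode( $entities, ENT_QUOTES, 'UTF-8' );

// When writing the value back into a form field, escape only the HTML
// special characters; the Arabic bytes pass through untouched.
$form_value = htmlspecialchars( $decoded, ENT_QUOTES, 'UTF-8' );

echo $form_value, "\n";
```

If the decoded text still shows up as entities in the form, the re-encoding most likely happens later, at the point where the field is printed.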

maybe someone more experienced can help us out!

thanks,

Dec 18, 2005 10:14

by html entity do u mean: %20, &nbsp, or \" ?

Dec 18, 2005 10:20

these #1578;#1580;#1585;#1576;#1577 (with a '&' in front of each '#') are character entities. browsers automatically convert these entities into human-readable format. see, i will write the same stuff above but with the & added, and look what i get تجربة (<- in my computer, this is normal Arabic text, even though i wrote it with &#numbers;)

Dec 18, 2005 10:23

i may be wrong, i'm merely uttering what my 30 minutes of research revealed.

Dec 18, 2005 10:28

alrite it's out of my league, u'll need to wait for an experienced person.

And you can still get &s to display correctly just throw them in a php tag ;)

//Someone fix the php tag problem
// تجربة

Do arabians read backwards btw coz when i was deleting those arabic characters it didnt do them in the expected order.....

Dec 18, 2005 10:40

the character entities are displayed as perfect arabic (for the most part) outside of the backoffice. only in the backoffice's input forms and fields is the arabic text converted back to character entities...
using the backoffices input forms and fields to enter arabic text for the first time works fine. this problem happens only when you try to display the text in those forms later.

yes, arabs (& hebrews) read from right to left. everything is right to left oriented, and that makes localizing b2evo twice as hard. if it's not handled correctly, the text can get garbled and messy.

oh, i just got the PHP tag thing...hehehe...thanks!

Dec 18, 2005 10:54

the only other thing i can think of is mayb when the $Form property is actually displayed it encodes the characters again....

Try finding the exact spot where $form is printed and perform ur decode there....

And mayb try adding this so u can make sure ur decode is working:

//FIX THE DAM PHP TAGS
<script language="javascript" type="text/javascript">alert("<?=html_entity_decode($post_title)?>");</script>

Dec 18, 2005 11:09

is there a missing 'php' in '<?' ? like should it be '<?php' or is this something else?

Dec 18, 2005 15:21

<?=$variable?>


Is short for

<?php echo($variable); ?>


So yes it is something else ;)

It will need to be placed outside a php tag if you don't know what i'm talking about just put this directly above the line that u tried html_entity_decode on before:

//FIX THE DAM PHP TAGS (Code attached is below)
?><script language="javascript" type="text/javascript">alert("<?=html_entity_decode($post_title)?>");</script><?php 

Dec 18, 2005 22:34

msafi, what b2evo version are you using?

I think the problem is format_to_output() that gets used by the Form class.

I hope that you're not using Phoenix, because it looks like it should be fixed there. The function get_field_params_as_string() formats the input's value as "formvalue" and other attributes as "htmlattr".
I suppose that your b2evo version does not differentiate between "value" and other attributes and therefore uses convert_chars() for the "htmlattr" format in format_to_output().

But from the file names you've mentioned, you are probably using phoenix.

So, probably it is a problem already when inserting it into DB.. how do the values look in DB?

Dec 18, 2005 22:44

i just checked phpMyAdmin for the value in the database. they are in html character entity format.

Dec 19, 2005 04:57

I see.

The fix I see would be to decode numeric entities in POSTed data. Then it would be inserted into DB correctly.

But when displaying, we don't want to convert everything into numeric entities again, so the charset must be set properly. Currently it is taken from the user's locale (UTF-8 in your case).

It will lead to problems when someone visits your blog and his locale gets detected as not-UTF8, as b2evo would use this user's locale in the http-equiv Content-type meta tag.

It even gets more complicated when Apache sends a Content-type header (AddDefaultCharset directive) - currently b2evo only sets the http-equiv meta tag; we might (and should IMHO) use header() to override any default charset from the webserver.
And we should always use UTF-8 as charset (because of mixed content).

Ok. Confused?

Here might be the fix:
1. Make sure you always send UTF-8 as character set. Either use

header( 'Content-Type: text/html; charset=UTF-8' );

at the top of your skins' _main.php (so it always gets sent, but not for your RSS feeds) or add an AddDefaultCharset directive to your webserver (in .htaccess for example).

2. Add the following two functions to /conf/hacks.php (create this file, if it does not exist yet).


/**
 * Decode numeric entities (&#\d+; and &#x[0-9a-fA-F]+;).
 *
 * @param string|array
 * @return string|array UTF-8 string (or array of UTF-8 strings)
 */
function decode_numeric_entities( $mixed )
{
	if( is_array( $mixed ) )
	{
		foreach( $mixed as $k => $v )
		{
			$mixed[$k] = decode_numeric_entities( $v );
		}
	}
	else
	{ // convert numeric entities to chr (unicode)
		$mixed = preg_replace( '/&#(\d+);/e', 'utf8_chr($1)', $mixed);
		// hex entities may contain a-f, so \d+ alone is not enough here
		$mixed = preg_replace( '/&#x([0-9a-fA-F]+);/e', 'utf8_chr(0x$1)', $mixed);
	}
	return $mixed;
}

/**
 * Return a character code utf8 encoded.
 * From {@link http://de.php.net/manual/en/function.html-entity-decode.php#57613}.
 */
function utf8_chr($code)
{
	if($code<128) return chr($code);
	else if($code<2048) return chr(($code>>6)+192).chr(($code&63)+128);
	else if($code<65536) return chr(($code>>12)+224).chr((($code>>6)&63)+128).chr(($code&63)+128);
	// "+" binds tighter than ">>" and "&", so every shift and mask needs its own parentheses
	else if($code<2097152) return chr(($code>>18)+240).chr((($code>>12)&63)+128)
                               .chr((($code>>6)&63)+128).chr(($code&63)+128);
	return ''; // code is outside the range handled here
}

3. Last but not least, edit the function param() in /evocore/_misc.funcs.php:
Find

		if( isset($_POST[$var]) )
		{
			$$var = remove_magic_quotes( $_POST[$var] );
			// $Debuglog->add( 'param(-): '.$var.'='.$$var.' set by POST', 'params' );
		}


and add our new function around it:


		if( isset($_POST[$var]) )
		{
			$$var = decode_numeric_entities(remove_magic_quotes( $_POST[$var] ));
			// $Debuglog->add( 'param(-): '.$var.'='.$$var.' set by POST', 'params' );
		}

Now, when a param is taken from POSTed data, it will decode the numeric entities to UTF-8.
Note that this will not work if register_globals (PHP setting, see phpinfo()) is on, because the $$var in param() is already set then. (If this fix gets integrated into b2evo, it may make sense to check whether $_POST[$var] isset() and then apply the fix to already set params as well.)
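
A note for anyone trying this on a current PHP version: the /e modifier of preg_replace() was removed in PHP 7.0, so the decoding step has to be written differently there. Below is a minimal sketch of the same idea using preg_replace_callback(). The function names with the _sketch suffix are made up for this example and are not part of b2evolution; it assumes PHP 5.3+ (for the closure) and does not treat surrogate code points specially.

```php
<?php
/**
 * Encode a Unicode code point as UTF-8 bytes.
 * Returns null when the code point is outside the Unicode range.
 */
function utf8_chr_sketch( $code )
{
	if( $code < 0x80 ) return chr( $code );
	if( $code < 0x800 ) return chr( 0xC0 | ($code >> 6) ).chr( 0x80 | ($code & 0x3F) );
	if( $code < 0x10000 ) return chr( 0xE0 | ($code >> 12) )
		.chr( 0x80 | (($code >> 6) & 0x3F) ).chr( 0x80 | ($code & 0x3F) );
	if( $code < 0x110000 ) return chr( 0xF0 | ($code >> 18) )
		.chr( 0x80 | (($code >> 12) & 0x3F) )
		.chr( 0x80 | (($code >> 6) & 0x3F) ).chr( 0x80 | ($code & 0x3F) );
	return null;
}

/**
 * Decode decimal (&#1578;) and hexadecimal (&#x62A;) entities to UTF-8.
 * Accepts a string or a (nested) array of strings.
 */
function decode_numeric_entities_sketch( $mixed )
{
	if( is_array( $mixed ) )
	{
		return array_map( 'decode_numeric_entities_sketch', $mixed );
	}
	return preg_replace_callback(
		'/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/',
		function( $m ) {
			// Group 1 is filled for hex entities, group 2 for decimal ones.
			$code = ( $m[1] !== '' ) ? hexdec( $m[1] ) : (int) $m[2];
			$char = utf8_chr_sketch( $code );
			// Leave entities we cannot encode exactly as they were.
			return ( $char === null ) ? $m[0] : $char;
		},
		$mixed );
}

echo decode_numeric_entities_sketch( '&#1578;&#x62C;' ), "\n";
```

It would be wrapped around the POSTed value the same way as decode_numeric_entities() in step 3.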

Please let me know, if and how it works for you.

Dec 19, 2005 05:25

register_globals is On. i don't think i can contact yahoo webhosting and ask them to turn it off!! does that mean i cannot apply the hacks in the post?

Dec 19, 2005 05:57

Ok, so we make it a little bit dirtier, but it should do the same:

Add the following to param() in /evocore/_misc.funcs.php (instead of what I've said in the last post). Add it before

	// type will be forced even if it was set before and not overriden

.

	// HACK start
	if( isset( $_POST[$var] ) && remove_magic_quotes($_POST[$var]) == $$var )
	{ // we're assuming that it was POSTed (this is register_globals safe).
		$$var = decode_numeric_entities( $$var );
	}
	// HACK end
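
To make the logic of the hack easier to follow outside of param(), here is a small stand-alone sketch of the same check: a value is only decoded when it is identical to what was POSTed under that name. Everything in it is hypothetical (decode_if_posted(), demo_decoder(), the sample data); demo_decoder() merely stands in for decode_numeric_entities(). It also leaves out remove_magic_quotes(), since magic quotes and register_globals no longer exist in PHP 5.4 and later.

```php
<?php
/**
 * Decode $current only when it matches the value POSTed under $name.
 * $decoder is the name of the function that does the actual decoding.
 */
function decode_if_posted( array $post, $name, $current, $decoder )
{
	if( isset( $post[$name] ) && $post[$name] === $current )
	{ // same value as in the POST data, so assume it came from there
		return call_user_func( $decoder, $current );
	}
	return $current; // set some other way, leave it alone
}

function demo_decoder( $value )
{ // stand-in for decode_numeric_entities()
	return html_entity_decode( $value, ENT_QUOTES, 'UTF-8' );
}

$post = array( 'post_title' => '&#1578;&#1580;' );
echo decode_if_posted( $post, 'post_title', '&#1578;&#1580;', 'demo_decoder' ), "\n";
echo decode_if_posted( $post, 'post_title', 'set elsewhere', 'demo_decoder' ), "\n";
```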

Dec 19, 2005 06:04

i see

nt color="#000000">

in all your php tags. could you please tell me what it means? it's just a displaying error that shouldn't be included in the actual code, right?

Dec 19, 2005 06:08

Yes, this forum is buggy when using [ php ] tags.. :/

Dec 22, 2005 06:18

There is no file called:

/evocore/_misc.funcs.php
:!:

Dec 22, 2005 06:48

I have that file and i'm running 1.6 alpha, you never said which version of b2evo you are running?

Dec 22, 2005 08:05

Okay, i tried the hacks. unfortunately, they don't work. even the values in the database are still the same.

Dec 23, 2005 04:37

Hi blueyed,

I'm using V. 0.9.0.12
and have located the functions you are talking about in /b2evocore/_functions.php.

When I tried to implement the code you suggested above, all I get is a blank page as my main page.

If I remove the functions from hacks.php, every thing goes back to normal (my old blog)

Any ideas?

Dec 23, 2005 04:44

kskhater,

you might have mistyped something somewhere.. make sure your syntax and everything else is in order.

Dec 23, 2005 06:32

This is the code I pasted to hacks.php


function decode_numeric_entities( $mixed )
{
    if( is_array( $mixed ) )
    {
        foreach( $mixed as $k -> $v )
        {
            $mixed[$k] = decode_numeric_entities( $v );
        }
    }
    else
    { // convert numeric entities to chr (unicode)
        $mixed = preg_replace( '/&#(\d+);/me', 'utf8_chr($1)', $mixed);
        $mixed = preg_replace( '/&#x(\d+);/me', 'utf8_chr(0x$1)', $mixed);
    }
    return $mixed;
}

/**
 * Return a character code utf8 encoded.
 * From {@link @link http://de.php.net/manual/en/function.html-entity-decode.php#57613}.
 */
function utf8_chr($code)
{
    if($code<128) return chr($code);
    else if($code<2048) return chr(($code>>6)+192).chr(($code&63)+128);
    else if($code<65536) return chr(($code>>12)+224).chr((($code>>6)&63)+128).chr(($code&63)+128);
    else if($code<2097152) return chr($code>>18+240).chr((($code>>12)&63)+128)
                               .chr(($code>>6)&63+128).chr($code&63+128);
} 

And this the code I added to /b2evocore/_functions.php:


   // HACK start
    if( isset( $_POST[$var] ) && remove_magic_quotes($_POST[$var]) == $$var )
    { // we're assuming that it was POSTed (this is register_globals safe).
        $$var = decode_numeric_entities( $$var );
    }
    // HACK end 
	// type will be forced even if it was set before and not overriden

Note: I've this hack for post rating in hacks.php before I added your two functions:


DEFINE("IMAGEROOT", "http://www.alkhater.net/blog/img/stars/");  // set the image path to show star
DEFINE("MAX_RATE", "50");        // more popular site adjust higher

function post_stars($p_ID, $show_img= true){
   global $DB;
   $sql_query = 'SELECT post_rating FROM post_rate WHERE post_ID = ' . $p_ID;
   $rating = $DB->get_var($sql_query);
   $alt = "alt=\"\"";
   if( $show_img ){
      if( $rating == 0 )
         $file="rating_0";
      if( ($rating > 0) && ($rating < MAX_RATE/10 ) ) $file="rating_0";
      if( ($rating >= MAX_RATE/10) && ($rating < 2*MAX_RATE/10) )  $file="rating_1";
      if( ($rating >= 2*MAX_RATE/10) && ($rating < 3*MAX_RATE/10) ) $file="rating_2";
      if( ($rating >= 3*MAX_RATE/10) && ($rating < 4*MAX_RATE/10) ) $file="rating_3";
      if( ($rating >= 4*MAX_RATE/10) && ($rating < 5*MAX_RATE/10) ) $file="rating_4";
      if( ($rating >= 5*MAX_RATE/10) ) $file="rating_5";
  //    if( ($rating >= 6*MAX_RATE/10) && ($rating < 7*MAX_RATE/10) ) $file="7";
  //    if( ($rating >= 7*MAX_RATE/10) && ($rating < 8*MAX_RATE/10) ) $file="8";
  //    if( ($rating >= 8*MAX_RATE/10) && ($rating < 9*MAX_RATE/10) ) $file="9";
  //    if( ($rating >= 9*MAX_RATE/10) ) $file="10";
      echo "Often Visited Rating: ";
	  echo "<img src=\"" . IMAGEROOT . "$file.gif\" $alt >";
   }
   else {
      if( $rating == '' )$rating = 0;
      echo "views: $rating";
   }
}

function save_rate($p_ID){
   global $DB;
   $sql_query = 'SELECT post_rating FROM post_rate WHERE post_ID = ' . $p_ID;
   $post_rating = $DB->get_var($sql_query);
   if( $post_rating > 0 ){
      $post_rating++;
      $sql_query = "UPDATE post_rate SET post_rating =  $post_rating  WHERE post_ID =". $p_ID;
   }
   else{
      $sql_query = 'INSERT INTO `post_rate` ( `post_ID`, `post_rating`) VALUES (' . $p_ID . ',1)';
   }
   $DB->query($sql_query);
}

Dec 23, 2005 07:29

as a start, the arrow in the first 'foreach' should be a => not ->

check that first and see if it works.

Dec 23, 2005 07:35

I've tried that before but still, it did not work!

Thanks

Dec 23, 2005 07:51

it could be because you're using an old version of b2evolution (ie amsterdam). the hacks blueyed wrote me were specifically for phoenix alpha (ie 1.6), i think. we'll have to wait and see what he says.

Dec 23, 2005 08:23

msafi,

How do I find phoenix alpha (ie 1.6)? Is it working fine with Arabic?

I carried out lots of hacking to get my Arabic blog the way it's now. What should I expect if I upgrade?

You can check my Arabic blog @ www.alkhater.net/ablog

Dec 23, 2005 08:34

i've read some of your previous posts on this forum, and i think i encountered less problems arabizing phoenix than you encountered with amsterdam.

i simply added the arabic entries to messages.po directly using notepad (not poedit) then i modified the skin for the alignment and the other stuff.

if you upgrade, i'm not sure if you'll be able to use your old .po file (most likely not because file names and placement of codes have changed), but i think you can use your old skin.

the process wasn't extremely tedious for me because all i wanted was to arabize the visitors' interface not the backoffice.

Dec 23, 2005 08:43

Thanks,

Do you see any advantage in upgrading to phoenix?

I would like to visit your blog and see how it looks with phoenix.

Please send me the URL at Khalid@alkhater.net

Thanks,

Dec 23, 2005 15:51

kskhater, just a blank page usually means that it's a fatal PHP error and you're not displaying errors.

So either look at the server log files or use


error_reporting( E_ALL );
ini_set( 'display_errors', 1 );

You should put that in /conf/_config.php so that it gets loaded/executed before the parse error in hacks.php. I'm not even sure if it will work with setting it just before.

Anyway, you should see it in the error log.
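
If the server logs are out of reach (as is common on shared hosting), PHP can also be told to write errors to a file of your own choosing. The sketch below wraps the two calls from above in a small helper; the helper name and the log path in the usage note are made up for illustration. As noted, none of this can run if the parse error sits in the same file as these calls, so they have to live in a file that is loaded earlier.

```php
<?php
/**
 * Show all PHP errors in the page and, optionally, log them to $log_file.
 */
function enable_error_display( $log_file = null )
{
	error_reporting( E_ALL );
	ini_set( 'display_errors', '1' );
	if( $log_file !== null )
	{ // also keep a copy in a file you can read yourself
		ini_set( 'log_errors', '1' );
		ini_set( 'error_log', $log_file );
	}
}

enable_error_display();
```

To log to a file as well, you would call it as enable_error_display( dirname( __FILE__ ).'/php_errors.log' ); and make sure the web server may write to that location.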

Dec 23, 2005 16:17

This might sound stupid, but where I can find the server error log file?

And the code you suggested should go where in _config.php ? Is it a PHP code?

Thanks for all your time.
:oops:

Dec 23, 2005 16:34

You'll find the server error logs on your server, usually in /var/log/apache2, /var/log/httpd, /var/log/apache or something like that. If you've never seen your logs, you probably don't have access to them.

Yes, the part for config.php is php code.

Dec 23, 2005 18:44

Still I get a blank page without any error message!

These two functions are supposed to translate the edited message in the backoffice from HEX to the correct form.

I just wonder why, when I go to the EDIT tab in the BO, I get the post in Arabic, but when I click edit, it goes back to HEX.

And why do some variables used in the BO translate and some don't, even though they are all of the same form, just a text input or even a textarea? For example:

Blog
Full name
Short name
Tagline

do not translate,
while
Long Description:
translates OK.

I used the function format_to_output()

with the above three but it did not work.

This is why I call it a mystery!

Dec 23, 2005 19:13

These functions are meant to not transform the params TO numeric entities ON SAVING. So you would have to edit your post, save it, and then it should show up without the entities.

However, given a blank page, you cannot actually do that. There might be just a bad typo (e.g. when you've copied it from the forum something got replaced by a "smiley"-arrow) somewhere.

Dec 23, 2005 19:41

I followed the code line by line and I can't see any problem.
Are you sure this works? Did someone try it and find it working?

I had the same problem with comments but I solved it with format_to_output().

But the chars were not hex; they were those strange ones, although when I look them up in MySQL they show up as hex.

Dec 23, 2005 19:55

kskhater, it's hard to tell from your post what you've pasted where, or whether you've messed up the "<?php" tags, for example.


<?php
echo 1;
<?php
echo 2;


for example would produce a parse error and without error output a blank page.
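
If no syntax-highlighting editor is at hand, PHP itself can point at the offending line. The following is only a sketch and assumes PHP 7 or later with the tokenizer extension enabled, where token_get_all() with the TOKEN_PARSE flag throws a ParseError for invalid code. The source is only parsed, never executed, and find_parse_error() is a made-up helper name.

```php
<?php
/**
 * Return null if $source parses cleanly, otherwise a short message
 * with the line number of the first parse error.
 */
function find_parse_error( $source )
{
	try
	{
		token_get_all( $source, TOKEN_PARSE );
		return null;
	}
	catch( ParseError $e )
	{
		return 'line '.$e->getLine().': '.$e->getMessage();
	}
}

// The broken example from above and a repaired version of it.
$broken = "<?php\necho 1;\n<?php\necho 2;\n";
$fixed  = "<?php\necho 1;\necho 2;\n";

var_dump( find_parse_error( $broken ) );
var_dump( find_parse_error( $fixed ) );
```

For a real file you would pass file_get_contents() of that file, for example the hacks.php in question.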

Please try to find the error with some decent editor, that supports syntax highlighting.

It is hard for me to understand what is wrong. The "hack" I've provided was meant to test if it fixes this issue.

Also try to be clear with your terms. What do you mean by "hex"? I think you mean "numeric entities", but I cannot be sure.

Dec 23, 2005 20:08

First, i use Notepad++ which is an excellent editor with color highlights and collapseable section so you can see where each section start-end.

The Hex I'm talking about is this:
&#****
where **** is some number.

Mar 12, 2006 16:47

wtf? the linked post above has gone..

Mar 12, 2006 16:51

blueyed wrote:

wtf? the linked post above has gone..

Remember back when the forums were hacked lots of stuff got lost. The time frame between most recent backup and the assault was significant. No promises that this is the reason the above referenced post is lost - only that I suspect it is highly likely.

