## Storing non-English characters in the DB

Started by edb on May 27, 2008 – Contents updated: May 27, 2008

#### #1 edb May 27, 2008 07:11

How do I do that? I'm (still) trying to make a 'plugin translation support system' and it's working pretty good except for some stuff. Specifically, when I upload the French translation of the core (from messages.po of course) it does a rather nice job of making a useful _global.php file. Unfortunately it doesn't store the translated bit at all well.

For example

Code

    #: ..\..\..\summary.php:32
    #: ..\..\..\summary.php:75
    msgid "Summary Demo"
    msgstr "Démo Résumé"

comes from the messages.po file, and my _global.php file shows what it is supposed to

Code

 'Summary Demo' => 'Démo Résumé',

So far so good, except I want to store the translations in a table the plugin made. A line for each translatable string with all existing translations for that particular string. This way IF a string shows up in a plugin that happens to have already been translated then I can automagically create a _global.php file for the plugin even though I personally know no other languages. Therefore when my plugin writes each line in _global.php it also stores the values as "translatable_text" and "translated_fr_fr", which is where the problem comes in. For this particular line, my table shows

D

for the French translation of "Summary Demo". So obviously it choked on the "é" character.

I'm quite certain b2evolution has a way to deal with storing non-English characters. I'm hoping there is a reasonably easy way to take advantage of it when this plugin is doing its thing. Do I need to somehow temporarily set the locale to *whatever* during execution the way $Item->locale_temp_switch() does for a skin? Or is there a cool function I can call prior to or at the moment of inserting the stuff into the database? Right now after I have $english_bit and $other_bit and write them to _global.php I try to put them in the database table with

Code

    $query = "INSERT INTO ".$this->get_sql_table('translations')." SET "
        ."plugin_name = '".$DB->escape( $plugin_name )."', "
        ."plugin_version = '".$DB->escape( $plugin_version )."', "
        ."translatable_text = '".$DB->escape( $english_bit )."', "
        .$translated_locale." = '".$DB->escape( $other_bit )."'";
    if( ! $DB->query( $query ) )
    {
        $this->bad_news .= ' INSERTING the translation table failed :( ';
    }

There is of course a similar line for if I already have $english_bit in there but not in this language so that, in theory, I can follow the French with the German and the Russian and the Turkish and so on and so forth.

AND IT'S ALL A WASTE OF TIME IF I CAN'T STORE NON-ENGLISH CHARACTERS BUT I KNOW I CAN BECAUSE NON-ENGLISH-BLOGGING BLOGGERS DON'T GET "D" WHEN THEY WANT "Démo" SO IT MUST BE STORING THEM BUT DAMMIT I HAVE NO IDEA HOW!

I should just give up :'(

#### #2 yabba May 27, 2008 09:16

It might be worth changing your table to UTF-8 ... dunno if you need to fake a locale when storing to make it all happen, because English works for me ;)

¥

#### #3 ianlewis May 27, 2008 10:50

It should work to make sure that all your tables are set to UTF-8 or whatever you are using. For what it's worth I've been blogging using Japanese characters for quite a while.

    ALTER TABLE table_name CHARACTER SET utf8;

That might not change the collation on the fields so be careful.

It might also be good to set the database's default encoding to UTF-8 as well. Something like the following should do that.

    ALTER DATABASE db_name CHARACTER SET utf8;

#### #4 edb May 27, 2008 19:04

Already set to utf-8 (or is it utf8 ... or UTF8?) in the database, compliments of Afwas' well-documented method to make that happen. This is an installation that has no past so it was easy to make it be purely utf8 yah?

Anyway the problem exists in this utf8'd installation so I figure there must be something b2evolution does to a post with non-English characters to get it to store properly.

hmmm... I can just follow the comment path I guess because when ¥åßßå leaves a comment on my web it stores and plays back as those characters. This would be an old-fashioned "not utf-8" database by the way. Except when I had to restore stuff due to a mistake on my part. I then got junk where his name belongs so I went in and manually altered the database to have the right stuff in it.

hmmm... I can also fake a messages.po file and have "¥åßßå" be the translation for something and see if I can store that. I noticed the issue with the French translation of Summary Demo because it was the earliest entry in the file and therefore the earliest for me to see. I did not dig deep to find out if ALL non-English characters get bonked.

Must be something it is doing to make it happen!

#### #5 yabba May 27, 2008 19:07

Code

    #: ..\..\..\sanity.php:1
    #: ..\..\..\displaced.php:7
    msgid "¥åßßå"
    msgstr "Réplace Usér and Réboot"

:D

¥

#### #6 edb May 27, 2008 19:15

Nah man msgid is the English bit. msgstr is the fancy character bit. Like this:

Code

    #: ..\..\..\in\sanity.php:1
    #: ..\..\..\displaced\oxygen.php:7
    msgid "All your base are belong to us"
    msgstr "Usér ¥åßßå friéd sérvér"

BTW YES I will use that in my test file. Drop it into the top of the French translation, then hack off about 9000 lines from the end of it. When I first ran my little plugin I thought I was actually going to pull a ¥åßßå and take down my server given how long it was taking to read a line and figure out what to do with it ;)

#### #7 edb May 27, 2008 23:56

I think this is going to be a dead end for me.

I can post all sorts of characters no matter what my locale is set to, and I can see them stored in the database as the actual characters, but I can't store them as they are via a fairly straightforward query.

I tried echoing the values before and after $DB->escape() and they do not show properly either way.

When the actual output file is stored in a format that will open up in a viewing window (like .php for _global.php) I get ? for non-English characters, but when I make it be a file that automagically gets downloaded the characters are properly visible. Creating the file happens before trying to fill in the database, but hey I was looking for clues and found that I can not store the file in the format it needs to be used in ... but I haven't made .php files be forced to a download which will probably fix that problem.

I looked into (but haven't played with) how I can take advantage of b2_htmltrans in conf/_formatting - it seems to be all about displaying what the database has in it. Same with the convert_chars and convert_charset functions - useful for displaying but doesn't help me actually store stuff like à or é or whatever.

Obviously b2evolution can store these characters in the database, but I sure can't figure out how it gets done :'(
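One possible explanation for these symptoms, offered as an assumption since nothing in the thread confirms it: the French messages.po may be saved as ISO-8859-1 rather than UTF-8. In that encoding "é" is the single byte 0xE9, which is not valid UTF-8 on its own. A browser showing "?" in its place is typical of that mismatch, and a MySQL connection that expects UTF-8 may cut the value off at the first invalid byte, which would match the lone "D". If that is what is happening, converting each string to UTF-8 before writing or inserting it might help. A minimal PHP sketch, assuming the mbstring extension is available; `to_utf8()` is a hypothetical helper, not a b2evolution function:

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting from a fallback
// charset only when the bytes are not already valid UTF-8.
function to_utf8(string $text, string $fallback_charset = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback_charset);
}

// "Démo Résumé" as it would look in an ISO-8859-1 file (é = byte 0xE9).
$latin1 = "D\xE9mo R\xE9sum\xE9";

$utf8 = to_utf8($latin1);

// In UTF-8, é becomes the two bytes 0xC3 0xA9.
var_dump($utf8 === "D\xC3\xA9mo R\xC3\xA9sum\xC3\xA9"); // bool(true)

// Running the helper again on valid UTF-8 changes nothing.
var_dump(to_utf8($utf8) === $utf8); // bool(true)
```

A Latin-1 string can occasionally be valid UTF-8 by accident, so reading the real charset from the .po header (its Content-Type line normally declares one) is safer than guessing.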

#### #8 sam2kb May 28, 2008 01:13

Can you run this query and check if the blog name is correct?

Code

 mysql_query("UPDATE evo_blogs SET blog_name='йёщяъ' WHERE blog_ID=1");

You may want to save the file in UTF-8 first.

#### #9 edb May 28, 2008 01:41

Wow those are really cool characters! English is boring :(

Okay I'll give it a shot, but I don't understand "save the file in UTF-8 first". What file? I will back up the database first, then I figured on doing a copy/paste from the forum to a file with enough stuff to connect to the database. I guess the part I don't understand is how I would save a file with a charset?

#### #10 sam2kb May 28, 2008 01:54

> Wow those are really cool characters! English is boring

There are some more funny letters, but I'll keep them for other queries ;)

You need to save the file (with code to connect to db, where you are going to paste this query) in UTF-8 or you'll see ??????. Open in text editor, select save as and choose UTF-8.

I have no idea how to do it on Macs :( , but I'm pretty sure your editor can do it.

#### #11 edb May 28, 2008 01:56

Actually I just found out the hard way that good old notepad was the friendliest for this task. Uploading and testing ... NOW.

Okay that was interesting. So okay notepad saved the file as UTF-8 after telling me I would lose my stuff if I saved it ANSI style. I then opened the file with html kit (my preferred editor) and saw your characters as pretty much junk. Specifically, I see Ð¹Ñ‘Ñ‰ÑÑŠ in that editor, but re-opening in notepad shows me йёщяъ so I feel confident the characters are stored properly in my quick_hack.php file.

Running that file results in the following on my monitor: ï»¿

There is, of course, nothing other than that to view when I view-source because quick_hack.php was not designed to actually output anything.

Now when I view the database I get the following in that field: Ð¹Ñ‘Ñ‰ÑÑŠ which, of course, is what is showing up when I actually look at the blog.

For the record: the collation on that particular field in that particular table is utf8_unicode_ci, as it is for all other fields that have anything in that column and all tables in that database. This is what I get after having followed the instructions that Afwas rounded up and documented concerning how to make a new b2evolution blog be friendly with UTF-8.
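What shows up here looks like the classic double-encoding pattern: the file holds UTF-8 bytes, but something along the way reads each byte as a separate Latin-1 character and then stores that as UTF-8. The stray ï»¿ is most likely the UTF-8 byte order mark (bytes EF BB BF) that Notepad writes at the start of the file, seen the same way. A small PHP sketch of the effect, assuming mbstring is enabled; it only mimics the byte-level reinterpretation and does not talk to MySQL:

```php
<?php
// "é" in UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Read those bytes as ISO-8859-1 and re-encode the result as UTF-8.
// Each original byte turns into its own character.
$mojibake = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $mojibake, "\n";          // Ã©
echo bin2hex($mojibake), "\n"; // c383c2a9

// The same treatment applied to the UTF-8 byte order mark.
$bom_seen = mb_convert_encoding("\xEF\xBB\xBF", 'UTF-8', 'ISO-8859-1');
echo $bom_seen, "\n";          // ï»¿

// Going the other way undoes the damage, as long as nothing was lost.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8); // bool(true)
```

If the BOM gets in the way, saving the script as UTF-8 without a BOM in an editor that offers that option avoids the extra output.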

#### #12 sam2kb May 28, 2008 02:22

I forgot one more thing...
Try to add this stuff after select_db

PHP

    mysql_query("SET NAMES 'utf8'");
    mysql_query("SET collation_connection='utf8_general_ci'");
    mysql_query("SET collation_server='utf8_general_ci'");
    mysql_query("SET character_set_client='utf8'");
    mysql_query("SET character_set_connection='utf8'");
    mysql_query("SET character_set_results='utf8'");
    mysql_query("SET character_set_server='utf8'");

If it won't help try to paste this query in phpMyAdmin.

#### #13 edb May 28, 2008 02:42

HOORAY!

Hey if you get rid of the fourth character it probably stops being a word but it also looks sort of like it could be "noob" ;)

Can you make an explanation of what these 7 lines do? Also for my purposes here would I need to do these SET commands once in the plugin or each time I am about to perform another INSERT or UPDATE command?

But yeah HOORAY because now I know of a method that actually lets me store non-English characters in a database without resorting to posting it via the back office.

#### #14 sam2kb May 28, 2008 02:53

Great!
These lines do the same thing as $db_config['connection_charset'] = 'utf8';

I think you only have to do it once right after you select a db.

And you can delete some of these lines, try to leave the first line only.
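For reference, SET NAMES 'utf8' sets character_set_client, character_set_connection and character_set_results in one statement, which is why the first line alone is usually enough. The mysql_* functions used in this thread were removed in PHP 7, so a rough present-day equivalent with mysqli might look like the sketch below. The host, user, password and database name are placeholders, the table and column come from the test query earlier in the thread, and none of this is b2evolution's own code:

```php
<?php
// Placeholder connection details: replace with real values.
$mysqli = new mysqli('localhost', 'db_user', 'db_password', 'db_name');
if ($mysqli->connect_errno) {
    die('Connect failed: ' . $mysqli->connect_error);
}

// Once per connection, right after connecting: tell the server that
// this client sends and expects UTF-8.
if (!$mysqli->set_charset('utf8')) {
    die('Could not set connection charset: ' . $mysqli->error);
}

// Every later query on this connection uses that charset, so there is
// no need to repeat the call before each INSERT or UPDATE.
$stmt = $mysqli->prepare('UPDATE evo_blogs SET blog_name = ? WHERE blog_ID = ?');
$name = 'йёщяъ'; // this script itself must be saved as UTF-8
$id = 1;
$stmt->bind_param('si', $name, $id);
$stmt->execute();
$stmt->close();
$mysqli->close();
```

The charset name 'utf8' matches the thread; on newer MySQL versions 'utf8mb4' with matching column charsets is the usual choice when characters outside the Basic Multilingual Plane are needed.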

#### #15 edb May 28, 2008 03:22

Cool. I'll report back any and all successes I have with this. At the moment my mission has changed to "mowing the lawn". Funny how something that gets no water still manages to grow enough to become an issue...