UTF-8 problem restoring from backup

Started on Dec 13, 2005 – Contents updated: Dec 13, 2005

Dec 13, 2005 14:52    

Hope you can help me solve this problem.

Situation: I just restored all the comments on my blog from a backup (created with 'mysqldump --opt -p database > backup-file.sql').

Problem: My whole blog is in UTF-8, and the accented characters displayed correctly before. After the restore, they are all garbled.
Can this be fixed, either by converting the backup file or somehow in MySQL or PHP?

Compare: [url=http://tinyurl.com/d83lw]before[/url] and [url=http://tinyurl.com/afp3g]after[/url] (see the comments section)

Dec 13, 2005 18:14

You could probably do one of these:
a) convert the dumped file using "iconv"
b) specify a charset when importing the dump (something like "mysql --default-character-set=utf8 -u user -p database < dump_file.sql"), if the dump is UTF-8
c) specify a charset when exporting the SQL (mysqldump also accepts --default-character-set)
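A minimal sketch of option (a) in Python, for the case where the garbling is double encoding: the original UTF-8 bytes were read as a single-byte charset and converted to UTF-8 a second time. Undoing one such layer is the same transformation as `iconv -f UTF-8 -t LATIN1` when that charset is Latin-1. `repair_dump` is a hypothetical helper, and which charset to pass is an assumption that has to be checked against text known to be correct.

```python
from pathlib import Path


def repair_dump(src: str, dst: str, wrong_charset: str) -> None:
    """Undo one layer of double encoding in a dump file (hypothetical helper).

    Assumes `src` is valid UTF-8 in which every non-ASCII character stands for
    one byte of the original UTF-8 text, misread as `wrong_charset`.
    """
    text = Path(src).read_bytes().decode("utf-8")  # e.g. "Å«" (two characters)
    # Map each character back to its single byte in the guessed charset, e.g.
    # b"\xc5\xab". UnicodeEncodeError here means the guess is wrong (or the
    # file was not double encoded in the first place).
    original = text.encode(wrong_charset)
    # The recovered bytes must be valid UTF-8 again ("ū"); UnicodeDecodeError
    # here means some bytes were lost or mangled before the dump was written.
    original.decode("utf-8")
    Path(dst).write_bytes(original)


# Usage (the charset is a guess to be checked against the "before" page):
# repair_dump("backup-file.sql", "backup-fixed.sql", "latin-1")
```

It refuses to write output when either step fails, so a wrong guess shows up as an exception rather than as a second layer of garbage.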

Dec 13, 2005 20:12

Thanks, Blueyed!

I tried 'iconv' and also 'recode', which is a bit more powerful, but neither worked. Most probably the difficulty is determining which encoding the backup is currently in and which encoding MySQL needs to be fed so that b2evolution displays the text correctly.

I use UTF-8 in all my blog posts (and display the blog in UTF-8), but I just checked and my database encoding setting is "$dbcharset = 'iso-8859-1';" (I never changed it, since the comment for that setting says 'If you don't know, don't change this setting.')

Could you suggest a combination of source and target encodings that might work?

P.S. Option (c) does not apply, as I no longer have the original data. I was restoring from the backup because I lost data while experimenting with something :(

P.P.S. The characters in the backup look like UTF-8 but apparently are not: when I view the backup in a browser, they still appear as garbage.
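One way to look for a working combination, sketched under the assumption that the damage is a single layer of double encoding: take a short garbled sample from the dump, undo the layer with each candidate charset, and compare the results with how the text should read. The candidate list (Latin-1, its Windows variant, and two Baltic charsets) is a guess, and `try_charsets` is a hypothetical helper.

```python
def try_charsets(garbled: bytes,
                 candidates=("latin-1", "cp1252", "cp1257", "iso8859_13")) -> dict:
    """Return {charset: repaired text} for each candidate that undoes `garbled` cleanly."""
    results = {}
    for charset in candidates:
        try:
            # Undo one layer: UTF-8 text -> one byte per character -> UTF-8 text.
            results[charset] = garbled.decode("utf-8").encode(charset).decode("utf-8")
        except UnicodeError:
            # Not valid UTF-8, a character with no byte in `charset`, or
            # recovered bytes that are not UTF-8: this candidate is rejected.
            pass
    return results


# A garbled sample copied from the dump (here, the bytes quoted for "ū" later
# in the thread); compare each result with the known-good text.
for charset, text in try_charsets(b"\xc3\x85\xc2\xab").items():
    print(charset, text)
```

Several candidates may agree on a short sample, so a sample containing many different accented letters narrows the choice down better.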

Dec 13, 2005 20:36

Have you tried $dbcharset = 'utf-8'; then? (Maybe "utf8", not sure.)

What does "mysql --help | grep default-character-set" give you? (The value is probably on the second line.)

Have you tried importing the dump via mysql, specifying a character set?

I'd open the file in vim and see whether it says "Converted". If it does, the file is not UTF-8.
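The vim check can also be scripted. This sketch decodes the file strictly as UTF-8 and reports the first byte that breaks it; `check_utf8` is a hypothetical helper. Note that passing the check does not rule out double encoding, because double-encoded text is itself valid UTF-8.

```python
def check_utf8(data: bytes) -> str:
    """Describe whether `data` is valid UTF-8 (strict decoding, no guessing)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        bad = data[err.start:err.end]  # the offending byte(s)
        return f"not UTF-8: byte(s) {bad.hex(' ')} at offset {err.start}"
    return "valid UTF-8"


# Usage, assuming the dump file name from the first post:
# with open("backup-file.sql", "rb") as f:
#     print(check_utf8(f.read()))
```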

Dec 18, 2005 16:28

I did try changing $dbcharset; it did not help. But maybe it should have been set to UTF-8 when I started using Unicode in my blog posts?

The default character set for MySQL turned out to be "windows-1257" (the Windows Baltic code page, used for Latvian). I did try specifying UTF-8 when importing into MySQL, but that did not help (maybe something else changed; I have to explore further).

Vim does not say 'Converted' when opening the file.

It looks like the problem started during the export:
- the characters are written in UTF-8
- the database has $dbcharset = 'iso-8859-1', so how do these UTF-8 characters end up in the database?
- the backup then exports these UTF-8 characters, possibly stored in the database as Latin-1 (if that's what $dbcharset determines), into a plain-text backup file
- MySQL's default character set may also influence how they are exported
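The suspected chain can be reproduced in a few lines. This is a minimal sketch, assuming (as a hypothesis, not a confirmed diagnosis) that the UTF-8 bytes were stored as Latin-1 characters, one per byte, and re-encoded as UTF-8 when dumped; `double_encode` is a hypothetical helper. ASCII passes through unchanged, but every byte of 0x80 or above becomes two bytes, so a two-byte letter such as 'ū' grows to four.

```python
def double_encode(text: str, single_byte_charset: str) -> bytes:
    """Simulate UTF-8 bytes misread as a single-byte charset, then re-encoded as UTF-8."""
    utf8_bytes = text.encode("utf-8")                 # what the blog writes, e.g. "ū" -> C5 AB
    misread = utf8_bytes.decode(single_byte_charset)  # one character per byte, e.g. "Å«"
    return misread.encode("utf-8")                    # what ends up in the dump


for sample in ("abc", "ū"):
    print(repr(sample), double_encode(sample, "latin-1").hex(" ").upper())
# 'abc' 61 62 63
# 'ū' C3 85 C2 AB
```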

Special characters in the exported file look strange: they are between 3 and 5 bytes long, which seems too long, since these letters take two bytes each in UTF-8. Strangely enough, some of them (those that are 4 or 5 bytes long) display correctly.

For example:

ū imports back OK; it is encoded as C3 85 C2 AB
š does not; it is encoded as C3 85 3F (note that the last byte is '?', which may be what is wrong)
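One explanation consistent with these bytes, offered as a hypothesis: 'š' is C5 A1 in UTF-8, and Windows-1257 (the MySQL default reported above) has no character at byte A1, so a conversion through that charset has nothing to map it to and writes '?' instead. Latin-1 does define A1, so a purely Latin-1 path would not lose it. The sketch below lists, for a given charset, which UTF-8 bytes of a character have no mapping; `unmapped_bytes` is a hypothetical helper, and Python's `cp1257` codec stands in for MySQL's.

```python
def unmapped_bytes(text: str, charset: str) -> list[str]:
    """List the UTF-8 bytes of `text` (as hex) that have no character in `charset`."""
    missing = []
    for byte in text.encode("utf-8"):
        try:
            bytes([byte]).decode(charset)  # does this single byte map to anything?
        except UnicodeDecodeError:
            missing.append(f"{byte:02X}")  # no mapping: a conversion would lose it
    return missing


for ch in "ūš":
    print(ch, unmapped_bytes(ch, "cp1257"), unmapped_bytes(ch, "latin-1"))
# ū [] []
# š ['A1'] []
```

If that is what happened, the '?' in C3 85 3F replaced the A1 byte before the dump was written, so no conversion of the backup can restore it; those characters would have to be fixed by hand, while ones like 'ū' can be repaired by undoing the double encoding.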



This forum is powered by b2evolution CMS, a complete engine for your website.