Today I upgraded from 5.0.6 to 5.1.2. First of all, it took more than one try to bring my database to the current version (12280?).
After that, I decided to convert my DB to UTF-8. There is an option for this in the installer too. It was the best way to kill almost 5 hours in the afternoon. I am happy about this, really.
If I had written this script, I would have built a test routine into it, so that I would get a list of the errors it is going to hit before it actually runs.
The script maps different words to the same entry, so the same value ends up in more than one row. This causes an error, and the script stops until you change or delete the data. Great thing, really, because all of these become the same (in `evo_track__keyphrase`):
- wunschmarke
- Wunschmarke
- Wünschmarke
- wünschmarke
- wuenschmarke
- Wuenschmarke
Only one of them is allowed to exist, so every collision gives you back an error 1062. You then have to look for the entry in the database and change or delete it. After that, you need to start the conversion script again.
I have a database with German words in it. I don't know how many of my entries will become duplicates after the change to UTF-8, but so far it has taken almost 5 hours of deleting or changing entries and starting the script again.
This is not a good solution; I would say it isn't a solution at all. The error contains no ID, so you need to search for the entry yourself. Who tested this script before release? I guess he/she had only 2 or 3 duplicate entries.
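A rough sketch of the kind of test routine I mean, in Python. It does not touch the database and it is not what the conversion script does: the `(id, phrase)` rows, the function names and the folding rule (ignore case, strip accents) are my own assumptions, only meant to approximate a case- and accent-insensitive comparison. The point is that it reports every colliding group, with IDs, in one pass.

```python
import unicodedata
from collections import defaultdict


def fold_key(phrase):
    """Approximate a case- and accent-insensitive comparison key.

    Assumption: the real collation may fold differently (e.g. ü -> ue).
    """
    decomposed = unicodedata.normalize("NFKD", phrase.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_collisions(rows):
    """Group (id, phrase) rows by folded key; keep groups with 2+ rows."""
    groups = defaultdict(list)
    for row_id, phrase in rows:
        groups[fold_key(phrase)].append((row_id, phrase))
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hypothetical sample data; real rows would come from the keyphrase table.
    sample_rows = [
        (1, "wunschmarke"),
        (2, "Wunschmarke"),
        (3, "Wünschmarke"),
        (4, "wünschmarke"),
        (5, "wuenschmarke"),
        (6, "Wuenschmarke"),
    ]
    for key, members in find_collisions(sample_rows).items():
        print(key, "->", members)
```

With a report like this you could fix all the duplicates first and run the conversion only once.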
I'm not happy about this thing.
Thanks for reporting.
So let me isolate the problem:
1) Can you confirm that the DB upgrade actually worked, even if you needed several passes to get to the end of it? (Several passes are required because your PHP setup has a maximum time limit for scripts to run.)
2) I understand the UTF-8 script generated problems. Can you confirm that ALL the problems were with the table `evo_track__keyphrase`? If not, which other tables had problems?
3) Can you confirm that ALL the problems were due to different variations of the same word being treated as duplicate index entries, like `Wünschmarke` vs `wuenschmarke`, and that the solution in every case was to delete the "similar duplicates"?
---
Note: the lowercase vs uppercase collision is very weird. This should not happen.
The `Wünschmarke` vs `Wuenschmarke` collision is a German-specific case which we clearly had not tested. I also don't understand how that happens with a generic UTF-8 collation that should not be specific to German.
We will investigate of course, but please answer the above questions as clearly as possible.
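To make the two suspects concrete, here is a small Python sketch (not the conversion script's code) of two folding rules. As far as I know, MySQL's general UTF-8 collations compare `ü` equal to `u`, while its German phone-book collation compares `ü` equal to `ue`. The function names and the mapping table below are assumptions for illustration only; which rule the target collation really applies still has to be checked against the database.

```python
import unicodedata

# Assumed phone-book style expansions (illustration only).
UMLAUT_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def fold_accent_insensitive(word):
    """Lowercase and strip accents, so that ü compares like u."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_phonebook(word):
    """Lowercase and expand umlauts, so that ü compares like ue."""
    lowered = unicodedata.normalize("NFC", word.lower())
    return "".join(UMLAUT_EXPANSIONS.get(ch, ch) for ch in lowered)


if __name__ == "__main__":
    for word in ["Wunschmarke", "Wünschmarke", "Wuenschmarke"]:
        print(word, fold_accent_insensitive(word), fold_phonebook(word))
```

Under the first rule `Wünschmarke` collides with `Wunschmarke`; under the second it collides with `Wuenschmarke`. Neither rule on its own makes all six reported spellings equal, so knowing which exact pairs triggered error 1062 would tell us which folding is in play.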