In working with Bitweaver, I have come across a nasty side affect of using MySQL- it's handling of encoding is sucky.
I have posted into the Bitweaver Wiki about this, and am sure that it applies to many similar softwares ( I did a quick search and found Moodle and EZ-Publish forum postings with people fighting this, along with numerous other blogging softwares).
In a nutshell, the default MySQL encoding settings are Latin1. Most modern software needs UTF8. This stuffs up the saving of characters,into the database, that are unusual.
UTF8 is huge, literally hundreds of unique characters. If you need to convert the Latin1 characters in your MySQL DB into UTF8, then you will need to understand what is going on and how to fix it.
To understand the character conversion craziness, I recommend you create a new page/blog post/ whatever in your software, leaving it in draft mode, and inserting all the weird characters that your audience is likely to use, in a SINGLE COLUMN.At the same time, do it within a spreadsheet column. Use a character map for this, or if you are unsure, check this page.
Now save the page. And check it in your PHPMyAdmin(I cannot tell you how to do this- it varies hugely). You should have a single column of garbage characters. Congratulations! Copy/paste this into your next spreadsheet column.
Now you have a spreadsheet table detailing the conversions you need to do - first column= character in UTF8 that you want, second= Latin1/UTF8 hybrid mangle that fills your database and needs replacement.
Read the wiki link above, apply those settings to your software if you can. Fix the cause.
Now to clean your database. Use the Export feature of PHPMyAdmin, export with DROP Tables. The resulting SQL file is the code required to rebuild your database.
Examine the following shell script (unix/macos natively; windows users need to use their web host's command line, unless somebody wants to write a BATch file):
echo I\'ll work on "$@" ...
for i in "$@" ; do
echo -n $i...
-e 's/=latin1/=utf8/g' \
-e 's/Ã©/é/g' -e 's/Ã¨/è/g' -e 's/Ãª/ê/g' -e 's/Ã«/ë/g' \
-e 's/Â / /g' -e 's/Â«/«/g' -e 's/Â»/»/g' -e 's/Â°/°/g' \
-e 's/Ã¤/ä/g' -e 's/Ã¢/â/g' -e 's/Ã®/î/g' -e 's/Â/’/g' \
-e 's/Ã¯/ï/g' -e 's/Ã¬/ì/g' -e 's/Ã²/ò/g' -e 's/Ã´/ô/g' \
-e 's/Ã¶/ö/g' -e 's/Ã¿/ÿ/g' -e 's/Ã¹/ù/g' -e 's/Ã¼/ü/g' \
-e 's/Ã»/û/g' -e 's/Ã§/ç/g' -e 's/Ã‰/É/g' -e 's/Ãˆ/È/g' \
-e 's/ÃŠ/Ê/g' -e 's/Ã‹/Ë/g' -e 's/Ã€/À/g' -e 's/Ã„/Ä/g' \
-e 's/Ã‚/Â/g' -e 's/ÃŽ/Î/g' -e 's/Ã/Ï/g' -e 's/ÃŒ/Ì/g' \
-e 's/Ã’/Ò/g' -e 's/Ã”/Ô/g' -e 's/Ã–/Ö/g' -e 's/Å¸/Ÿ/g' \
-e 's/â€¦/…/g' -e 's/â€™/’/g' -e 's/àƒâ‚¬/ä/g' -e 's/â€˜//g' \
-e 's/â€œ/“/g' -e 's/â€/”/g' -e 's/â€¹/‹/g' -e 's/â€º/›/g' \
-e 's/â€”/—/g' -e 's/â€“/—/g' -e 's/â€?/”/g' -e 's/àƒ’/à/g' \
-e 's/â’€’™/’/g' -e 's/â’€’œ/“/g' -e 's/â’€?/”/g' -e 's/àƒ’©/é/g' \
-e 's/â’€’¦//g' -e 's/â€“/\•/g' -e 's/â‚¬/€/g' \
-e 's/Ã™/Ù/g' -e 's/Ãœ/Ü/g' -e 's/Ã›/Û/g' -e 's/Ã‡/Ç/g' $i|
sed -e 's/Ã/à/g' > $i.clean
This shell script uses SED to open your exported(and thrice backed up) SQL file, fix it, then save it with .clean at the end of the file name. The code is highly customizable, and will need to be, if you have another character set giving you grief. That above is mostly Western European and abbreviated (the Cleanup script below is "Bigger, Better, Stronger!").
Now to use it, save that code as "cleanup.sh", then do
chmod +x cleanup.sh, which makes it executable. Then: sh cleanup.sh olddb.sql (where olddb.sql is the saved name of your exported SQL file from PHPMyAdmin).
Check the file in your text editor (needs to be UTF8 aware), and search for weird characters like Â and Ã. If they are still there, then there is some more characters to add to your script.
When you are finished, create the new database in PHPMyadmin, with Collation set to UTF8_general_ci. Then use the Import feature to lift your new clean SQL file upstream.
Don't forget to change your program's settings if you've renamed your database.
For PHP/MySQL apps you will want to send the query "SET NAMES UTF8" to the database JUST ONCE, immediately after first connection, so you will need to find the right place in your code. HINT: find mysql_connect.
eg: $dbconnect=@mysql_connect($host, $user ,$password);
$setnamesutf8 = mysql_query("SET NAMES UTF8");
Resources: (requires Unix/Linux/BSD and OpenOffice Calc)
Sed for Windows
Script by Rowan inc. Greek
Thanks to schwarzvogel.de and Tommie-Lie.
Trackback address for this post
two things were missing for my case:
-e 's/ÃŸ/ß/g' -e 's/â€ž/„/g'
and sed is complaining about missing whitespace between regex and backslash at line 6 and line 20
This post has 65 feedbacks awaiting moderation...