Menü schliessen
Created: January 29th 2013
Last updated: May 1st 2020
Categories: Common Web Development,  Databases,  Linux,  MySQL
Author: Marcus Fleuti

Convert your MySQL database from ANY charset to UTF-8 with a simple script, automatically! (Konvertiere eine MySQL Datenbank mit einem einfachen Skript von irgend einem Charset nach UTF-8, automatisch!)

Donation Section: Background
Monero Badge: QR-Code
Monero Badge: Logo Icon Donate with Monero Badge: Logo Text
82uymVXLkvVbB4c4JpTd1tYm1yj1cKPKR2wqmw3XF8YXKTmY7JrTriP4pVwp2EJYBnCFdXhLq4zfFA6ic7VAWCFX5wfQbCC

The problem - converting MySQL databases from any charset to UTF-8 automatically

It often occurs that an old MySQL database is using an ancient or other charset than UTF-8. There are many ways to convert a database but most of them need the user to execute a lot of SQL commands in order to convert all the data properly.

The approach to the solution

We (LEXO) have developed a script to change the charset of files within a given directory recursively. The script does that by iterating through all files and change the charset of each file using the "iconv" program on a BA(SH) shell. you can find mor information about this here.
What we now did is extending this script so that it first converts the dump file of a MySQL database into a pure UTF-8 charset (every character in the database will be converted to UTF-8) and to make sure that it will be UTF-8 in the end the database collation and charsets within the file are being set properly to UTF-8 - automatically!

Howto

This will explain how the whole process works. It's basically very simple:

  1. Download the Linux shell script for converting the database dump file into UTF-8 here:
    Download convert_db.sh Linux shell script
  2. Create a dump of your MySQL database by either using the MySQL Workbench, the MySQL GUI tools or a shell command. I prefer the MySQL-GUI tools for creating DB dump files. For this article to be complete I'm gonna explain how to create a MySQL dump on the Linux command line:
    mysqldump --opt -Q -u DB_USER -p PASSWORD -h DB_HOSTNAME DB_NAME > sql_dump.sql
  3. Put this dump file into a directory. Let's call this directory mysqldump. Please check the the file extension of the exported database file is .sql. Reason: Our shell script checks the file extension and converts only files that end with .sql.
  4. Execute the shell script like this:
    ./convert_db.sh ./mysqldump
  5. The shell script will convert all the files in the directory mysqldump (currently there's just the one sql_dump.sql file) into pure UTF-8.
  6. Re-Import the converted database-dump into your database like this:
    mysql -u DB_USER -p PASSWORD -h DB_HOSTNAME --default-character-set=utf8 DB_NAME < sql_dump.sql

Changing the database MySQL properties to UTF-8

In order for MySQL to properly allow UTF-8 connections you need to set the following settings to your MySQL configuration (usually /etc/mysql/my.cnf):

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

Also you need to check that your online application / CMS is set to work on UTF-8 as well. But that's not part of this KB entry.
Search for the keyword UTF on our Knowledge Base to find other articles about making your application ready for UTF-8

Convert database in Windows using Notepad++

See: Howto convert a MySQL/MariaDB database from any charset into UTF-8 - with Notepad++ in Windows

[Update: 06.05.2014]

There was a little but within the convert_db.sh script which sometimes generated errors. This has been fixed.