eEcho blog

Character Sets and Collations

A character set is a mapping from binary encodings to a defined set of symbols; you
can think of it as how to represent a particular alphabet in bits. A collation is a set of
sorting rules for a character set. In MySQL 4.1 and later, every character-based value
can have a character set and a collation. MySQL’s support for character sets and
collations is world-class, but it can add complexity, and in some cases it has a
performance cost.
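
To make the two definitions concrete, here is a minimal sketch in Python rather than in MySQL itself. It assumes Python's latin1 and utf8 codecs are close enough stand-ins for the MySQL character sets of the same names, and it uses str.casefold as a rough approximation of a case-insensitive collation (such as utf8_general_ci) next to a plain code-point comparison (roughly what a binary collation such as utf8_bin does). It is not how MySQL implements either.

```python
# A character set maps symbols to byte sequences.
text = "é"
latin1_bytes = text.encode("latin1")  # one byte: 0xE9
utf8_bytes = text.encode("utf8")      # two bytes: 0xC3 0xA9

# A collation defines how strings compare and sort.
words = ["banana", "Apple", "Cherry"]

# Code-point order: every uppercase ASCII letter sorts before lowercase ones.
binary_order = sorted(words)

# Case-insensitive order: compare the case-folded form of each word.
case_insensitive_order = sorted(words, key=str.casefold)

print(latin1_bytes, utf8_bytes)
print(binary_order)
print(case_insensitive_order)
```

The same symbol takes a different number of bytes in each character set, and the same three words come out in a different order under each set of sorting rules.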

Defaults for creating objects
MySQL has a default character set and collation for the server, for each database,
and for each table. These form a hierarchy of defaults that influences the character
set that’s used when you create a column. That, in turn, tells the server what
character set to use for values you store in the column.
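
The hierarchy can be sketched as a "most specific setting wins" lookup. The function below is hypothetical and only models the decision made when a column is created. It assumes each level is either explicitly set or left unspecified (None), and it ignores collations entirely.

```python
from typing import Optional


def effective_charset(
    server: str,
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    """Return the most specific character set that was explicitly given.

    Hypothetical helper: models the fallback chain
    column -> table -> database -> server at column-creation time.
    """
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


# No explicit setting anywhere below the server: the server default applies.
server_only = effective_charset("latin1")

# The table default overrides the database and server defaults.
table_wins = effective_charset("latin1", database="latin1", table="utf8")

# An explicit column character set overrides everything else.
column_wins = effective_charset("latin1", database="utf8", table="utf8", column="latin1")

print(server_only, table_wins, column_wins)
```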

Settings for client/server communication
When the server and the client communicate with each other, they may send data
back and forth in different character sets. The server will translate as needed:
• The server assumes the client is sending statements in the character set specified
by character_set_client.
• After the server receives a statement from the client, it translates it into the
character set specified by character_set_connection. It also uses this setting to
determine how to convert numbers into strings.
• When the server returns results or error messages back to the client, it translates
them into the character set specified by character_set_results.
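
The translation steps in the list above can be simulated as a pair of re-encoding functions. This is a sketch under stated assumptions: server_receive and server_send are hypothetical names, Python codecs stand in for MySQL character sets, and real servers do considerably more than re-encode bytes.

```python
def server_receive(raw: bytes, character_set_client: str,
                   character_set_connection: str) -> bytes:
    """Interpret the client's bytes, then re-encode them for the connection."""
    text = raw.decode(character_set_client)
    return text.encode(character_set_connection)


def server_send(value: bytes, stored_charset: str,
                character_set_results: str) -> bytes:
    """Re-encode a stored value into the character set the client expects."""
    return value.decode(stored_charset).encode(character_set_results)


statement = "SELECT 'café'"

# The client sends latin1 bytes; the server works in utf8 internally.
on_the_wire = statement.encode("latin1")
on_the_server = server_receive(on_the_wire, "latin1", "utf8")

# A value stored as utf8 is translated back to latin1 for the client.
returned = server_send("café".encode("utf8"), "utf8", "latin1")

print(on_the_server)
print(returned)
```

As long as each setting matches what the other side really uses, the text survives the round trip even though the bytes change at every step.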

You can use the SET NAMES statement and/or the SET CHARACTER SET statement to
change these three settings as needed. However, note that these statements affect
only the server’s settings. The client program and the client API also need to be set
correctly to avoid communication problems with the server.
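
The two statements do not behave identically. The sketch below models a session as a plain dictionary, following the behavior described in the MySQL reference manual as I understand it: SET NAMES sets all three variables to the named character set, while SET CHARACTER SET sets the client and results variables and resets the connection variable to the default database's character set. The function names and the dictionary are illustrative assumptions, not a real API.

```python
def set_names(session: dict, charset: str) -> dict:
    """Model of SET NAMES: all three variables take the given charset."""
    updated = dict(session)
    for name in ("character_set_client",
                 "character_set_connection",
                 "character_set_results"):
        updated[name] = charset
    return updated


def set_character_set(session: dict, charset: str) -> dict:
    """Model of SET CHARACTER SET: client and results take the given
    charset; the connection falls back to the database's charset."""
    updated = dict(session)
    updated["character_set_client"] = charset
    updated["character_set_results"] = charset
    updated["character_set_connection"] = session["character_set_database"]
    return updated


session = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_database": "latin1",
}

after_set_names = set_names(session, "utf8")
after_set_character_set = set_character_set(session, "utf8")

print(after_set_names)
print(after_set_character_set)
```

Neither function touches anything on the client side, which is the point of the paragraph above: the client library still has to be told separately which character set it is using.
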
Suppose you open a client connection with latin1 (the default character set, unless
you’ve used mysql_options() to change it) and then use SET NAMES utf8 to tell the
server to assume the client is sending data in UTF-8. You’ve created a character set
mismatch, which can cause errors and even security problems. Instead, you should set
the client’s character set and use mysql_real_escape_string(), which takes the
connection’s character set into account, when escaping values. In PHP, you can change
the client’s character set with mysql_set_charset().
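
The mismatch described above can be reproduced without a server. In this sketch, Python's codecs again stand in for MySQL's character sets, and the "server" is just a decode call. It shows the two typical failure modes: bytes that are invalid in the assumed character set, and bytes that decode successfully into the wrong characters.

```python
# Case 1: the client really sends latin1, but the server was told utf8.
client_bytes = "café".encode("latin1")  # b"caf\xe9"
try:
    client_bytes.decode("utf8")
    outcome = "decoded"
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte UTF-8 sequence that never completes.
    outcome = "invalid utf8"

# Case 2: the client really sends utf8, but the server assumes latin1.
# Every byte is valid latin1, so nothing fails; the text is silently wrong.
garbled = "café".encode("utf8").decode("latin1")

print(outcome)
print(garbled)
```

The second case is the more dangerous one in practice, because nothing raises an error and the corrupted text is stored as if it were correct.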
