There’s a lot to learn about character encoding, but the good news is that unless you really want to, you don’t have to bother. If you’re looking for a recommendation, just use UTF-8. It’s rapidly becoming the new ASCII because it’s reasonably efficient, supported by everything, and can comfortably handle transmitting every character a modern font might support.
Your goal should be to do absolutely everything using this encoding. That applies to source-code files you save on your personal machine that might have literal text in them, configuration properties that tell your database how to sort strings, you name it. Rather than a long explanation, this is a quick reference for setting UTF-8 in many of the places you’ll need to:
Assuming that you’re correctly authoring your webpages (static or dynamic) in UTF-8, you need to tell the world to expect that encoding. The best way to do that is to include an HTTP header such as:
Content-Type: text/html; charset=UTF-8
The first part of the Content-Type should reflect what you’re actually sending, which tells the receiver how to use the file; the second part tells it how to read the file. You can also send the encoding within the file, for example in a meta tag, but that only works with XML or HTML files and will break the first time you need to push something like a CSV, so get the headers right and don’t worry about it again.
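As a rough illustration of the CSV case, here is a minimal sketch of a WSGI application that declares the encoding in the header rather than in the body. The names app and call_app and the sample data are made up for this example; your framework will have its own way of setting response headers.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app: a CSV response whose encoding lives in the header."""
    # "Zoë,München" written with escapes so the example does not depend on
    # how this source file itself was saved.
    body = "name,city\nZo\u00eb,M\u00fcnchen\n".encode("utf-8")
    headers = [
        # First part: what is being sent. Second part: how to read it.
        ("Content-Type", "text/csv; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(application):
    """Call the app in-process and return (status, headers dict, raw body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    raw = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], raw
```

Calling call_app(app) returns the status line, the headers, and the raw bytes, so you can confirm that the body decodes cleanly with the charset the header announces.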
Email messages carry headers as well. The appropriate header to set is:
Content-Type: text/html; charset="UTF-8"
This will also allow you to properly send complex characters in parts of the email (such as a text version, or the subject line) that would be interpreted before an HTML meta tag was referenced.
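If you build messages in code, a library will usually write these headers for you. The following is a small sketch using Python’s standard email package; the addresses and text are placeholders, and build_message is a name invented for this example.

```python
from email.message import EmailMessage


def build_message():
    """Build a message with a text part and an HTML part, both UTF-8."""
    msg = EmailMessage()
    # Non-ASCII subject: "Café menu ✓" (escaped to keep this source ASCII-only).
    msg["Subject"] = "Caf\u00e9 menu \u2713"
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address
    # Each part gets its own Content-Type header with charset="utf-8".
    msg.set_content("Plain text: caf\u00e9", charset="utf-8")
    msg.add_alternative("<p>HTML: caf\u00e9</p>", subtype="html", charset="utf-8")
    return msg
```

Printing build_message().as_string() shows a Content-Type header with a utf-8 charset on each part, plus a subject line that the library has encoded for transport on its own.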
If you have access to your MySQL my.cnf file, set the following parameters. MySQL’s utf8 character set stores at most three bytes per character, so use utf8mb4 to cover all of Unicode:
[client]
default-character-set=utf8mb4

[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
Otherwise, when you create your database, use the following syntax (utf8mb4 is MySQL’s full four-byte UTF-8):
CREATE DATABASE db_name
  DEFAULT CHARACTER SET utf8mb4
  DEFAULT COLLATE utf8mb4_unicode_ci;
When you create your PostgreSQL database, use the following syntax:
CREATE DATABASE db_name
  WITH ENCODING 'UTF8'
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8';
When you create your Oracle database, specify the AL32UTF8 character set:
CREATE DATABASE db_name
  CHARACTER SET AL32UTF8
  NATIONAL CHARACTER SET AL16UTF16;
SQL Server natively uses UTF-16. If you’re using a Microsoft stack, you’re probably running in UTF-16 anyway, so at least you won’t have to worry about communicating with the database. If you’re not, make sure that you convert to and from UTF-16 in your DB wrapper, whichever one you choose.
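The conversion itself is small. Here is a sketch of what the round trip looks like, assuming little-endian UTF-16 without a byte-order mark; the function names are invented for this example, and many database drivers do this step for you, so check yours before adding your own.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16LE (assumed byte order, no BOM)."""
    # decode() is strict by default, so invalid UTF-8 raises UnicodeDecodeError
    # instead of silently corrupting the text.
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes coming back from the database as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")
```

The important part is that you always pass through a decoded string in the middle: bytes go in with one named encoding and come out with another, and nothing is ever guessed.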
The bad news is that Couchbase doesn’t support different collation or encoding settings. The good news is that it uses UTF-8 as its fixed default, so everything’s fine here.
MongoDB uses UTF-8 as its native representation too, as do most other modern platforms.
Make sure that these variables are set in any environment (interactive or automatic):
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
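Automatic environments such as cron jobs or build agents are the ones that tend to miss these settings, so it can help to check at startup. This is a minimal sketch of such a check; missing_utf8_vars is a hypothetical helper, and it simply mirrors the three variables listed above.

```python
import os

REQUIRED_VARS = ("LC_ALL", "LANG", "LANGUAGE")


def missing_utf8_vars(env=None):
    """Return the locale variables that are unset or do not end in a UTF-8 suffix."""
    env = os.environ if env is None else env
    bad = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # Accept both spellings: "en_US.UTF-8" and "en_US.utf8".
        normalized = value.lower().replace("-", "")
        if not normalized.endswith(".utf8"):
            bad.append(name)
    return bad
```

Calling missing_utf8_vars() with no arguments inspects the current process environment; an empty list means all three variables name a UTF-8 locale.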
Windows and its standard libraries inherently “think” in UTF-16. Make sure that every text editing application you use is manually set to expect (and write) UTF-8. Expect to change everything you use for development (IDEs, appservers, etc.) and be suspicious if you can’t find a way to do so.
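The same rule applies to your own code: never rely on the platform default when reading or writing text. Here is a short sketch in Python, where the default file encoding can depend on the operating system and its settings; write_and_read_utf8 is a name made up for this example.

```python
import tempfile
from pathlib import Path


def write_and_read_utf8(text):
    """Write text to a temporary file as UTF-8 and read it back explicitly."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        # Naming the encoding on both sides avoids the platform default.
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8"), path.read_bytes()
```

The second return value is the raw bytes on disk, which lets you confirm that the file really contains UTF-8 rather than whatever the local default happens to be.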
On a Mac, go to Terminal > Preferences > Advanced and make sure that UTF-8 is set as your character encoding. Also make sure that “Set locale environment variables on startup” is checked.
Also, many applications seem to follow the Safari default settings, so go to Safari > Advanced > Default Encoding and make sure that UTF-8 is set there as well.
If you use bash, add the following to your ~/.inputrc file:
set meta-flag on
set input-meta on
set convert-meta off
set output-meta on
Add the following to your Apache httpd.conf file:
Add this as an attribute to the Connector element in your Tomcat server.xml file:
Add this to your php.ini file:
default_charset = "utf-8"
Set the following in your Web.config file:
Ruby on Rails
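Ruby 2.0 and above default to UTF-8 for source files (Ruby 1.9 needs an encoding comment at the top of each file), and Rails itself defaults to UTF-8. You’re good to go!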
In Hudson’s /configure page, go to the Global Properties section and check the Environment Variables checkbox. Add a variable pair with the name
JAVA_TOOL_OPTIONS and the value
In Eclipse’s preferences dialog, go to General > Workspace and set the Text file encoding to UTF-8.
In IntelliJ IDEA’s settings dialog, go to Template Project Settings > File Encodings and set the IDE encoding to UTF-8.