Setting the Characters

Quick description of “charsets,” and a recommendation.

Did you ever get an email from your friends in Bulgaria with the subject line "???? ?????? ??? ????"? — Joel Spolsky

Charsets, or character sets, are one of those things that you can set early and forget about — or you can not set them properly and have easily avoidable trials and tribulations as a result.

Put the charset declaration (the <meta> tag shown below) in the head of your HTML page, right after the opening <head> tag. It comes before all the other elements inside <head>, always.
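If you want to double-check the placement on a finished page, here is a minimal sketch in Python. It assumes the page uses the http-equiv form of the tag recommended in this lesson. The names HeadOrderChecker, charset_comes_first, GOOD_PAGE, and BAD_PAGE are made up for illustration and are not part of any library.

```python
from html.parser import HTMLParser


class HeadOrderChecker(HTMLParser):
    """Records the tags that open inside <head>, in document order."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_tags = []      # tag names seen inside <head>, in order
        self.first_attrs = {}    # attributes of the first tag inside <head>

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            if not self.head_tags:
                self.first_attrs = dict(attrs)
            self.head_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def charset_comes_first(html_text):
    """Return True if the first element inside <head> is a charset <meta>."""
    checker = HeadOrderChecker()
    checker.feed(html_text)
    checker.close()
    if not checker.head_tags or checker.head_tags[0] != "meta":
        return False
    content = (checker.first_attrs.get("content") or "").lower()
    return "charset=" in content


# Hypothetical sample pages, used only to exercise the check.
GOOD_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Setting the Characters</title>
</head>
<body><p>Hello</p></body>
</html>"""

BAD_PAGE = """<!DOCTYPE html>
<html>
<head>
<title>Setting the Characters</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body><p>Hello</p></body>
</html>"""

if __name__ == "__main__":
    print(charset_comes_first(GOOD_PAGE))
    print(charset_comes_first(BAD_PAGE))
```

The first page passes because the <meta> tag is the first thing inside <head>. The second fails because the <title> comes before it.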

My friend Tommy Olsson gives us an excellent definition:

A character set is the total set of abstract characters that we have at our disposal. For HTML, the standard character set is ISO 10646, which is virtually the same thing as Unicode. It is a set of tens of thousands of characters representing most of the written languages on the planet.
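To make the idea of "abstract characters" a little more concrete, here is a small Python sketch. It shows the number Unicode assigns to a character (its code point), and then the bytes that one particular encoding, UTF-8, uses to store that number. The helper name describe is hypothetical.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes in hex) for a single character."""
    code_point = "U+%04X" % ord(ch)
    utf8_hex = " ".join("%02x" % b for b in ch.encode("utf-8"))
    return code_point, utf8_hex


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u044f", "\u20ac"):  # A, e-acute, Cyrillic ya, Euro sign
        print(ch, *describe(ch))
```

The character set answers "which number is this character?" The encoding answers "which bytes store that number?"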

Here’s some basic information about charsets. My first advice is to use this particular one in every design you make, unless you have a good reason not to do so:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Having said that, you should know that you will also run into this one quite often:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

So what’s the difference? The difference is in the ending: UTF-8 vs. ISO-8859-1.

And your response is, "So?"

I could give you a longwinded, abstrusely technical response that would put all of us to sleep (or you could do some research yourself, if you’re interested). However, let me see if I can sum this up in a quick sentence or two.

Charsets set the character encoding for a Web page. The charset ending in UTF-8 can encode every character in Unicode, so it supports virtually all the characters (letters, numbers, and symbols) you’re ever likely to use in a Web page. Unicode is the computing industry’s standard for mapping the numbers a computer stores to recognizable letters, numbers, and symbols. As for ISO-8859-1, it recognizes only 191 printable characters from the Latin script. It handles the letters and numbers used in English and most Western European languages, but not all punctuation marks (curly quotation marks and the Euro sign, for example), and not the characters of non-Latin scripts such as Cyrillic or Greek.
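If you would rather see the difference than take my word for it, the following Python sketch counts the printable characters in ISO-8859-1 and tests whether a few sample strings fit in each charset. The function names printable_latin1_count and fits are made up for this example, and the sample strings are arbitrary.

```python
import unicodedata


def printable_latin1_count():
    """Count the ISO-8859-1 byte values that are not control characters."""
    all_chars = bytes(range(256)).decode("iso-8859-1")
    return sum(1 for ch in all_chars if unicodedata.category(ch) != "Cc")


def fits(text, charset):
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(printable_latin1_count())
    samples = ["caf\u00e9", "\u201cquoted\u201d", "\u0417\u0434\u0440\u0430\u0432\u0435\u0439"]
    for text in samples:
        print(text, fits(text, "iso-8859-1"), fits(text, "utf-8"))
```

The accented "café" fits in both charsets. The curly quotation marks and the Bulgarian greeting fit only in UTF-8.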

If you use ISO-8859-1, at some point you’ll find that your browser doesn’t recognize a punctuation mark or a symbol you’ve typed. And you might get angry e-mails from users asking why they can’t read your work. So just avoid the controversy, use the UTF-8 charset, and move on to something more interesting.
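Here is a rough Python sketch of the two most common ways this goes wrong. One is saving text in a charset that lacks the characters, which produces question marks like the ones in the Spolsky quote above. The other is serving UTF-8 bytes under an ISO-8859-1 label, which produces stray accented letters. The helper names save_as and misread are hypothetical.

```python
def save_as(text, charset):
    """Encode text, substituting '?' for any character the charset lacks."""
    return text.encode(charset, errors="replace")


def misread(text, declared="iso-8859-1"):
    """Encode text as UTF-8, then decode it with a different declared charset."""
    return text.encode("utf-8").decode(declared)


if __name__ == "__main__":
    greeting = "\u0417\u0434\u0440\u0430\u0432\u0435\u0439"  # a Bulgarian greeting, 7 letters
    print(save_as(greeting, "iso-8859-1"))  # seven question marks
    print(save_as(greeting, "utf-8").decode("utf-8"))  # round-trips intact
    print(misread("\u00e9"))  # one accented letter turns into two wrong characters
```

Once the question marks are saved, the original letters are gone for good. The mislabeled case is recoverable, because the bytes are still correct and only the label is wrong.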

Olsson has written much more extensively and informatively on charsets.

Note: There are times when using the ISO-8859-1 standard is appropriate. By the time that occurs, you’ll know why you need to do it, and you won’t worry about this little lesson. And, there are lots of other charsets. If you need them, you’ll know it. One example: as Olsson writes, “Avoid Windows-1252 on public web pages, since it’s a Windows-specific encoding. Use ISO 8859-1 instead (or ISO 8859-15, if you need the Euro sign).” A lot of sites recommend the Windows-1252 charset, and some site creation software packages use it. Don’t you make that mistake. You can find out more about the various 8859 variants from this site.
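The Euro sign is a handy way to see how these charsets differ. This Python sketch encodes it in each of the charsets mentioned in the note and reports the bytes, or None where the charset has no Euro sign at all. The function name euro_bytes is made up for illustration. The charset labels are ones Python's standard codecs accept.

```python
def euro_bytes():
    """Map each charset name to the bytes it uses for the Euro sign, or None."""
    euro = "\u20ac"
    result = {}
    for charset in ("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8"):
        try:
            result[charset] = euro.encode(charset)
        except UnicodeEncodeError:
            result[charset] = None  # this charset has no Euro sign
    return result


if __name__ == "__main__":
    for charset, encoded in euro_bytes().items():
        print(charset, encoded)
```

Each charset that has the Euro sign stores it as different bytes. That is why a page saved in one charset and labeled as another shows the wrong character.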

2nd Note: Also, the charset declaration must be written just so, including the rather persnickety placement of the quotation marks. Just copy and paste the tag directly from this page. Easier on everybody.
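To see why the quotation marks matter, note that the content value uses the same "type; parameter=value" syntax as an e-mail or HTTP Content-Type header. That means Python's standard MIME header parser can read it. The sketch below is only an illustration. The function name charset_from_content is hypothetical, and real browsers use their own parsing rules.

```python
from email.message import Message


def charset_from_content(content_value):
    """Return the lowercased charset named in a content value, or None."""
    msg = Message()
    msg["Content-Type"] = content_value
    return msg.get_content_charset()


if __name__ == "__main__":
    # The whole value sits inside one pair of quotation marks in the tag.
    print(charset_from_content("text/html; charset=UTF-8"))
    # If the closing quotation mark lands right after text/html, this is all that is left.
    print(charset_from_content("text/html"))
```

When the quotation marks wrap the whole value, the charset comes through. When the closing mark slips to just after text/html, the value carries no charset at all.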