UTF-8

UTF-8 is an elegant Unicode encoding. Every ASCII character, which needs only 7 bits, is stored as itself in a single byte whose first bit is 0. If the first bit is 1, the byte belongs to a multi-byte sequence: the leading byte's high bits tell how many bytes (2 to 4) make up the character, and each following continuation byte starts with the bits 10. Let's use the editor yudit to create an example. Save a text u.txt in Unicode format UTF-8, containing nothing but an "A". Then run:

 cat u.txt
 ll u.txt
 hexdump u.txt

This shows that the text consists of one byte, an "A": 41h, 65d, 01000001b. Now save a single "Ä" instead. With ll, you see that your text (one character) now needs two bytes. With cat, you see that your Ä has been saved correctly. With hexdump, you see the two bytes as "84c3", both with their first bit set. Note that plain hexdump prints 16-bit words in little-endian order; the actual byte sequence in the file is c3 84.
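The same bytes can be inspected without an editor. This is a small sketch using printf to write the characters and hexdump's -C flag, which prints bytes in file order instead of word-swapped 16-bit groups:

```shell
# Inspect the UTF-8 bytes directly; hexdump -C shows bytes in
# file order, avoiding the word-swapping of plain hexdump.
printf 'A' | hexdump -C    # 41 — one byte, first bit 0
printf 'Ä' | hexdump -C    # c3 84 — two bytes, first bit of each is 1
```

This only works as shown in a UTF-8 locale, where the shell passes the literal Ä to printf as the two bytes c3 84.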

As a test, use the following loop to output all 256 byte values, \000 to \377 octal (plain ASCII only covers \000 to \177; the values above that are not valid UTF-8 on their own, so a UTF-8 terminal shows them as replacement characters):

 for l in $(seq 0 3); do for i in $(seq 0 7); do for n in $(seq 0 7); do echo -en "\0${l}${i}${n} "; done; done; done
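Beyond two bytes, UTF-8 sequences scale up to four. A quick way to see this is wc -c, which counts bytes rather than characters; the sample characters below are our own choice for illustration:

```shell
# wc -c counts bytes, not characters, so it exposes the UTF-8 length.
printf 'A' | wc -c     # 1 byte  (U+0041, ASCII)
printf 'Ä' | wc -c     # 2 bytes (U+00C4)
printf '€' | wc -c     # 3 bytes (U+20AC)
printf '😀' | wc -c    # 4 bytes (U+1F600)
```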

= See also =
 * unicode
 * utf-16
 * http://en.wikipedia.org/wiki/Utf-8