Unicode study

Sam Stuewe samuel.stuewe at gmail.com
Wed Feb 6 16:20:24 UTC 2019


Sorry, I know I am late to the party. But, out of interest, why are
you looking at version 5 of the standard? The latest version is 11,
and it's freely available (both as HTML text:
<https://www.unicode.org/versions/Unicode11.0.0/> and as PDF:
<http://www.unicode.org/versions/Unicode11.0.0/UnicodeStandard-11.0.pdf>)
from the Unicode Consortium itself.

In addition, if you are starting to look into Unicode as a study, I
would recommend the following resources as intro points and
programming tools:

* ICU (the Unicode Consortium's own library, and the usual choice for
handling Unicode properly): http://site.icu-project.org/
* “The Absolute Minimum Every Software Developer Absolutely,
Positively Must Know About Unicode and Character Sets (No Excuses!)”
(an article by Joel Spolsky that is a great introduction):
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
* graphemica (a website for quickly looking up information on a
particular code point): https://graphemica.com

As a final note, for those of us who have a C background (or are
headed in that direction), please pay special attention to the
standard annex on text segmentation (UAX #29:
<http://www.unicode.org/reports/tr29/tr29-33.html>). It lays out the
proper way to do the equivalent of iterating over characters in a
string for Unicode text, where one user-perceived character (a
grapheme cluster) can span several code points. (Note that ICU handles
a lot of this for you, which is one of the big reasons to leverage it
wherever possible.)
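
To make the segmentation point concrete, here is a minimal sketch in
Python (chosen only because it needs nothing beyond the standard
library). It shows that the byte count, the code point count, and the
user-perceived character count of one string can all differ. The
helper name rough_clusters is hypothetical, and it is only a crude
approximation of UAX #29: it attaches combining marks to the preceding
code point and ignores Hangul jamo, emoji ZWJ sequences, regional
indicators, and the other rules in the annex. For real work, use ICU's
break iterators instead.

```python
import unicodedata


def rough_clusters(text):
    """Group each code point with the combining marks that follow it.

    Assumption: this is NOT the full UAX #29 algorithm, just enough
    to show why iterating over code points is not iterating over
    user-perceived characters.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "café" spelled as e + U+0301 COMBINING ACUTE ACCENT
s = "cafe\u0301"

print(len(s.encode("utf-8")))   # 6 bytes in UTF-8
print(len(s))                   # 5 code points
print(len(rough_clusters(s)))   # 4 approximate grapheme clusters
```

A C program that walks a char array sees the first number, and one
that decodes UTF-8 into code points sees the second; only a
segmentation pass along the lines of UAX #29 gets to the third.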

Also, please do keep posting to the list about the plans for the
study; I'm always interested in getting more familiar with Unicode.

All the best,

-Sam

