Unicode study

Joe Nelson joe at begriffs.com
Wed Feb 27 07:15:19 UTC 2019


Sam Stuewe wrote:
> * ICU (the only officially-endorsed (by the Consortium) library for
> handling unicode properly): http://site.icu-project.org/
> ...
> Also, please do keep posting to the list about the plans for the
> study; I'm always interested in getting more familiar with unicode.

I've begun making some utilities with ICU, and I'm wondering if you can
help debug this one: https://github.com/begriffs/unicode-utils/blob/master/diacritic.c

It is supposed to strip diacritics, e.g. "résumé façade" -> "resume
facade", and it does work for that string. It also works on the Ō
character, but not on Ō̄. It appears that unorm2_getDecomposition()
returns no decomposition for the latter grapheme. Any thoughts?
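
For comparison, here is a minimal sketch of the same idea in Python's
standard unicodedata module rather than ICU: normalize the whole string
to NFD, then drop the nonspacing combining marks (category Mn). The name
strip_diacritics is hypothetical, and the sketch assumes the failing
grapheme is U+014C (Ō) followed by U+0304 (combining macron), i.e. two
code points rather than one. It is not the logic in diacritic.c.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )


# Assumption: the second test grapheme is U+014C + U+0304 (two code points).
single = "\u014c"
double = "\u014c\u0304"

print(strip_diacritics("résumé façade"))
print(strip_diacritics(single))
print(len(double), strip_diacritics(double))
```

Normalizing the whole string handles both cases the same way, because
the extra combining macron is just one more mark to drop. In ICU,
unorm2_getDecomposition() takes a single code point, while
unorm2_normalize() with the instance from unorm2_getNFDInstance() works
on a whole string. Whether that difference explains the behavior in
diacritic.c is not verified here.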

I'll be at the Hack Factory on Weds if you want to join me and hack on
more ICU stuff.

