Unicode study

Joe Nelson joe at begriffs.com
Sun Feb 3 22:02:44 UTC 2019


> The character set extends the ASCII seamlessly by detecting the first
> bit as an indication of non-printable ASCII, which implies a second
> byte is needed. The process is repeated for (I think) up to 4 bytes
> total.

I think what you're describing is UTF-8, one of the "Unicode
Transformation Formats", rather than Unicode per se or the Universal
Coded Character Set. But I'm sure our studies will cover all the nuances.
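
To make the distinction concrete, here's a rough Python sketch (my own
illustration, not taken from any document we've looked at). It prints
the code point of a few sample characters next to the bytes UTF-8 uses
to store them. The helper name utf8_length and the choice of sample
characters are made up for the example.

```python
# Code points are abstract numbers; UTF-8 is one way to serialize them as bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single code point (hypothetical helper)."""
    return len(ch.encode("utf-8"))

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show each byte in binary so the lead/continuation bit patterns are visible.
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {bits}")
```

ASCII characters stay one byte with the high bit clear. Everything else
gets a lead byte with the high bit set, followed by continuation bytes
that start with the bits 10, for at most four bytes in total.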

> Joe, please point me to the document you printed.

Which document was that again?

> Also, I will encourage the cluttering of your inbox with responses to
> this thread. Simply put, whatever Joe and I (and whoever else) will be
> discussing will be lost in the electron-passages, while it will remain
> on our list archive otherwise.

Sounds good, I'll reply on-list as things come up.

> And finally, I will set one more goal. Can we think about what utility
> would make our life better from a shell-based unicode use perspective
> and either write it or find one? (I am sure all useful things exist.)

Yeah, that sounds fun! Some ideas:

* A filter to normalize text (like the apostrophe thing we were talking
  about)
* A tool to detect tricky characters in a file, so you could pipe URLs
  into it from a mutt message to scan for possible phishing attacks that
  are designed to mimic a reputable domain
* A filter to go the other way: disguise a string. Use weird, slightly
  off characters in the string to fool naive search engines, making the
  text hard to find
* A Unicode-enabled word counter that properly segments words in any
  language
* A grep that works modulo diacritics, so I could search for "eclat" and
  match éclat
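
A couple of these can be roughed out with Python's standard unicodedata
module. This is only a sketch under my own assumptions: the function
names strip_diacritics, loose_match and non_ascii_report are invented
for illustration, and "tricky" here just means "not ASCII", which is a
much cruder test than a real phishing detector would want.

```python
import unicodedata

def strip_diacritics(text):
    """Decompose to NFD, then drop combining marks (accents and the like)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_match(pattern, line):
    """True if pattern occurs in line once diacritics and case are ignored."""
    return strip_diacritics(pattern).casefold() in strip_diacritics(line).casefold()

def non_ascii_report(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]

if __name__ == "__main__":
    print(loose_match("eclat", "un \u00e9clat de rire"))
    # The second letter below is CYRILLIC SMALL LETTER A, not Latin "a".
    for hit in non_ascii_report("p\u0430ypal.com"):
        print(hit)
```

One known gap: letters such as ø or ł have no canonical decomposition,
so stripping combining marks leaves them untouched. And a serious
lookalike detector would probably lean on the Unicode confusables data
rather than a plain ASCII check.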

> Some things just aren't what they used to be, and I mean that companies
> with a large market share are starting to influence the direction in
> which things are going.

It's interesting because that apostrophe is the one the Unicode
committee recommends, so in this case Apple is going with the standards.
But in general, yeah, those big companies do like to meddle.

