Unicode study

Ioannis Nompelis nompelis at nobelware.com
Sun Feb 17 16:37:08 UTC 2019

> Are you reading Unicode Demystified? Please keep notes about things that
> surprised you or anything extra you want to add that the book left out.

No. I was reading the more readily available material on encodings, mostly from
Wikipedia and some of the links to the standards. I am doing a "first pass"
to collect some notes on points of interest, which you suggested and which
is a good idea.

I will pull the code from my regular system when I boot it up and post it
here (it is a very short piece of C code).

> A few years ago I used base64 encoding/decoding as an example to study
> property based testing. The tests would generate random binary data
> and check that the byte length of the message expanded by a certain
> percent after encoding, that the result was padded with the correct
> number of equal signs, and that encoding-decoding would undo each other
> properly. (RFC4648 guarantees the roundtrip will work only on "canonical
> encodings," for example decoding "1yx=" and re-encoding it produces
> "1yw=". Found this out through a fuzz testing failure actually.)
> Here's the relevant section in my article about it:
> https://begriffs.com/posts/2017-01-14-design-use-quickcheck.html#test-case-distribution-and-shrinking
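
The properties described in the quoted text can be sketched roughly as
follows. This is not the code from the linked article; it is a minimal
illustration using Python's standard base64 module and plain random inputs
rather than a property-testing library. The function names (check_roundtrip,
run_checks, reencode) are made up for this sketch, and it relies on the
default decoder being lenient about non-zero trailing bits.

```python
import base64
import random


def check_roundtrip(data: bytes) -> None:
    """Check the size, padding, and round-trip properties for one input."""
    encoded = base64.b64encode(data)
    # Every 3 input bytes become 4 output characters, rounded up.
    assert len(encoded) == 4 * ((len(data) + 2) // 3)
    # Padding is 0, 2, or 1 "=" when len(data) % 3 is 0, 1, or 2.
    expected_pad = (3 - len(data) % 3) % 3
    assert len(encoded) - len(encoded.rstrip(b"=")) == expected_pad
    # Decoding the encoder's own (canonical) output returns the input.
    assert base64.b64decode(encoded) == data


def run_checks(trials: int = 200, seed: int = 0) -> int:
    """Run the checks on random byte strings; return the number of trials."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randrange(0, 64)
        check_roundtrip(bytes(rng.randrange(256) for _ in range(n)))
    return trials


def reencode(text: str) -> str:
    """Decode a base64 string and encode the resulting bytes again."""
    return base64.b64encode(base64.b64decode(text)).decode("ascii")


if __name__ == "__main__":
    run_checks()
    # "x" carries two non-zero bits that fall outside the two decoded bytes,
    # so they are dropped on decode and come back as zeros ("w") on encode.
    print(reencode("1yx="))  # 1yw=
```

The round trip starting from bytes always holds; the one starting from text
holds only if the text was already in canonical form.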

Solid! I will look at it. I want to bring up a quick point. From a more
philosophical point of view, base64 encoding is not uniquely determined: there
is a specific choice of which printable characters to use beyond the
alphanumerics. (A-Z, a-z and 0-9 give 62, but a 6-bit encoding needs 2^6 = 64
symbols, so a choice has to be made for 2 more printable characters.) In the
standard base64 encoding -- I gather -- it is "+" and "/" that are used,
while "=" is the printable padding character; it covers the case where the
number of bits in the buffer is not a multiple of 6 (equivalently, the byte
length is not a multiple of 3). I think this is close to what UTF-7 does, which
uses a modified base64 for some characters, but UTF-8 is the more interesting one.
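
To make the alphabet choice and the padding concrete, here is a small
hand-rolled encoder. It is only a sketch for illustration (the names
ALPHABET and encode_manual are made up here), checked against Python's
standard base64 module rather than meant to replace it.

```python
import base64
import string

# 26 + 26 + 10 = 62 alphanumerics, plus the two extra symbols chosen by
# the standard alphabet, for 64 symbols in total.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_manual(data: bytes) -> str:
    """Encode bytes as base64 by slicing 24-bit groups into 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit group, zero-filled on the right.
        group = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the group into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 meaningful characters; the rest
        # of the 4-character block is filled with "=".
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"a", b"ab", b"abc"):
        print(sample, encode_manual(sample))
    # The same two bytes under the standard and the URL-safe alphabets.
    print(base64.b64encode(b"\xfb\xff"), base64.urlsafe_b64encode(b"\xfb\xff"))
```

The last line shows that the two extra symbols really are a convention: the
URL-safe variant in RFC 4648 swaps "+" and "/" for "-" and "_" and leaves
everything else alone.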

I will look at your weblog post this afternoon. (You have a lot on that
site that is interesting to me and I never get to sit and pick at it.)

We will hold a separate discussion on obscurity/obfuscation. I had sent a
link to this dude's website with those text-generating gems and that
incredible prime number generator from Hell. Look back in the list archives.
