Linux: Unicode emoji issues
The MacBook I put Linux on has been pretty stellar. For the most part, the things that annoy me are things I can live with, or that I just don’t care strongly enough about to put in the effort to fix (i.e., I’d very much like to have the “Debian style” permanent dock at the bottom of the Gnome desktop, but I don’t care enough to figure out how to make it happen on Arch).
However, I fell into one trap earlier this week. Quite some time ago, I started using elaborate Unicode emoji (for example: ಠ_ಠ) “ironically”, because I found it annoyed others around me, to the point where I created aliases for them in my IRC client. Like most things you start doing “ironically”, after some time a habit forms and you begin doing them in earnest and well… here we are.
The problem was that I never set the locale on this Arch installation. Most other OSes and Linux distributions do it for you, picking a sensible default for your region, and frankly I just never got around to it, so I was surprised to see a pair of squares and an underscore where my awesome “disapproval” face should be. This seemed like an easy fix: I checked locale -a and noticed all the things were C, so I fixed it. Then that didn’t work, so I read (at Avi’s direction) the Arch Wiki page on Locale and fixed it properly (having to generate the locale I selected). I also made sure the ttf-liberation package was installed so that I had some reasonable fonts to work with.
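The "properly" part can be sketched roughly as follows. This illustrates the usual Arch procedure (uncomment the locale in /etc/locale.gen, run locale-gen, set LANG in /etc/locale.conf) and is not a record of the exact commands run here. The helper names locale_listed and generate_locale are hypothetical, en_US.UTF-8 is just the locale assumed for the example, and the second helper needs root.

```shell
# Hypothetical helper: succeeds if the locale named in $1 appears in a
# `locale -a` style listing read from stdin. glibc lists en_US.UTF-8 as
# "en_US.utf8", so both sides are lowercased and stripped of hyphens first.
locale_listed() {
    want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$want"
}

# Hypothetical helper, needs root: generate en_US.UTF-8 the Arch way and
# make it the system default. The locale name is an assumption; swap in your own.
generate_locale() {
    # Uncomment the en_US.UTF-8 line in /etc/locale.gen
    sed -i 's/^#\(en_US\.UTF-8 UTF-8\)/\1/' /etc/locale.gen
    # Build every locale that is now uncommented
    locale-gen
    # Set the default language for new sessions
    printf 'LANG=en_US.UTF-8\n' > /etc/locale.conf
}
```

Something like locale -a | locale_listed en_US.UTF-8 || generate_locale ties the two together; logging out and back in afterwards lets the new LANG take effect.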
This fixed most of them, but a couple of characters like the above-referenced face still eluded me. Things quickly spiraled out of control: this was no longer a problem I could ignore. It was outright bothering me, as it should have been so simple!
The facts I knew at this point:
- My locale was 100% set correctly - I was using en_US.UTF-8, and I even tried switching to en_AU.UTF-8 because of a misleading (IMHO) StackOverflow answer, to no avail. LC_ALL was not overriding anything (it was set to C at first).
- The font I’m using for most things - IRC, my editor (Sublime Text), my Terminal (kitty, should be Unicode capable) - Fantasque Sans Mono, works fine on my other Mac, and worked fine on Ubuntu.
- It didn’t work in Firefox either, when I went to fetch a fresh copy of the emoji to make sure it hadn’t somehow been corrupted in my IRC client’s settings.
- I have the ttf-liberation font package installed, which should give me sufficient fonts to cope with Unicode.
After some time and a bit of experimenting, I learned something new about Unicode and fonts: it’s typical for fonts, even those designed well into the Unicode era, to not include the full Unicode set, and for the OS to basically do its best to mix and match fonts to cover the entire set (which is fair enough, as they’re continually adding to it).
So my assumption that installing a modern font like Fantasque was enough was incorrect - I was missing the various regional alphabets that we westerners abuse for our stupid emoji. After installing a shotgun blast of other fonts, I finally landed on the package with the correct alphabet for the face above: pacman -Syu ttf-indic-otf.
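In hindsight, the shotgun blast could probably have been replaced by asking fontconfig which installed fonts cover a given character. The eyes of the face above are U+0CA0, a letter from the Kannada script, which is why an Indic font package turned out to be the missing piece. A rough sketch, assuming iconv, od and fontconfig's fc-list are available; the function names codepoint_of and fonts_covering are hypothetical:

```shell
# Hypothetical helper: print the hex code point of a single UTF-8 character,
# e.g. the Kannada letter used for the eyes of the disapproval face.
codepoint_of() {
    printf '%s' "$1" \
        | iconv -f UTF-8 -t UTF-32BE \
        | od -An -tx1 \
        | tr -d ' \n' \
        | sed 's/^0*//'
}

# Hypothetical helper: list the installed font families whose character set
# includes the given character, using fontconfig's charset pattern.
fonts_covering() {
    fc-list ":charset=$(codepoint_of "$1")" family
}
```

An empty result from fonts_covering 'ಠ' would mean nothing installed can draw the character, and once a suitable package is in place it should list at least one family.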
I’m not sure how many others I installed that I didn’t actually need, but I no longer have issues with silly emoji not rendering, and these are the fonts I have installed:
pacman -Qe | grep ttf
ttf-dejavu 2.37+18+g9b5d1b2f-3
ttf-droid 20121017-10
ttf-indic-otf 0.2-11
ttf-liberation 2.1.4-1
Hopefully, should the need arise to blow away this install and start again, I can avoid the same issue.