A Brief Intro to Input Method Framework, Linux IME, and XIM

Jun 27, 2017 23:06 @ Ithaca

There are chances one need an input method editor (IME). For CJK users, supporting unicode and wide characters from Chinese, Japanese and Korean is not enough, since it only gives the display of their native languages, not the way of input. Western people, especially who can manage to type their characters and words directly from a standard keyboard, may not understand the need for such input facility, which could possibly be the reason why CJK support is usually added as an additional feature in the end of a software system.

Briefly speaking, imagine the case where English has more than 26 alphabets, far more than that, what would happen? Imagine a language with tens of thousands of basic alphabets (characters, or typographically, glyphs). How would you design the input stack of a computer system to let users input efficiently? Since we cannot introduce a "super" keyboard having thousands of keys, a better way is to try to "spell" each character by making a series of key strokes. So, inaccurately, if you do this in English, it is like you spend some time pressing the keys to get an "a" in the end. Or press more than five keys (probably 15 keys or more) to have "linux" shown up in your text editing software. This way, we only incur logarithmic time complexity to index a character in CJK space (thinking about looking up a word in an English dictionary by tracing the leading letters). Another good news is, using very basic statistical methods or advanced NLP effort, such way of making input can be fairly efficient in spite of multiple candidates given the same key press combination. The ambiguity comes from the fact that, many mainstream input methods of Asian languages use English alphabets (some language, such as Japanese, calls it "Romaji", related to old Romanian alphabets) to represent the pronunciation of a character. It is likely that, in some languages, for example Chinese, to have different characters or words spelled with the same sequence of alphabets. For example, both 「元音」("vowel") and 「原因」("reason") are spelled by "yuan yin" in pinyin scheme, the pronunciation notation standardized by government of China (mainland). Another scheme, zhuyin (or Mandarin Phonetic Symbols), advocated by Taiwan, is also used for users in that area.