A Brief Intro to Input Method Framework, Linux IME, and XIM

Sooner or later, some users need an input method editor (IME). For CJK users, support for Unicode and wide characters from Chinese, Japanese and Korean is not enough: it only makes it possible to display their native languages, not to type them. Western users, especially those who can type their characters and words directly on a standard keyboard, may not see the need for such an input facility, which may be why CJK support is usually added as an extra feature late in the development of a software system.

Put briefly: imagine that English had far more than 26 letters. Imagine a language with tens of thousands of basic symbols (characters or, typographically, glyphs). How would you design the input stack of a computer system so that users can type efficiently? Since we cannot build a "super" keyboard with thousands of keys, a better way is to "spell" each character with a series of keystrokes. Loosely speaking, doing this in English would mean pressing several keys to get a single "a", or pressing many more than five keys (probably 15 or more) to make "linux" show up in your text editor. This way, selecting a character costs a number of keystrokes that grows only logarithmically with the size of the CJK character set (think of looking up a word in an English dictionary by following its leading letters).

More good news: with very basic statistical methods, or with more advanced NLP, this way of typing can be fairly efficient even though the same key sequence may yield multiple candidates. The ambiguity arises because many mainstream input methods for Asian languages use the Latin alphabet (in Japanese this is called "Romaji", literally "Roman letters") to represent the pronunciation of a character. In some languages, Chinese for example, different characters or words are often spelled with the same sequence of letters. For example, both 「元音」 ("vowel") and 「原因」 ("reason") are spelled "yuan yin" in pinyin, the romanization scheme standardized by the government of mainland China. Another scheme, zhuyin (Mandarin Phonetic Symbols), is promoted in Taiwan and widely used there.
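To make the lookup idea concrete, here is a minimal sketch in Python, not how any real IME is implemented. It stores spellings in a trie, so finding candidates follows one key press at a time, and it ranks candidates that share a spelling by a weight. The class names and the weights are made up for illustration; a real input method would derive weights from corpus statistics or a language model.

```python
class Node:
    """One trie node: child nodes per key, plus candidates ending here."""

    def __init__(self):
        self.children = {}
        self.candidates = []  # list of (weight, word) pairs


class PinyinTrie:
    """Toy candidate table keyed by the typed letters (spaces ignored)."""

    def __init__(self):
        self.root = Node()

    def add(self, spelling, word, weight):
        node = self.root
        for key in spelling.replace(" ", ""):
            node = node.children.setdefault(key, Node())
        node.candidates.append((weight, word))

    def lookup(self, typed):
        """Return the words spelled exactly by `typed`, heaviest first."""
        node = self.root
        for key in typed.replace(" ", ""):
            node = node.children.get(key)
            if node is None:
                return []
        ranked = sorted(node.candidates, key=lambda c: c[0], reverse=True)
        return [word for _, word in ranked]


ime = PinyinTrie()
# The weights below are invented for this example, not measured frequencies.
ime.add("yuan yin", "原因", 90)  # "reason"
ime.add("yuan yin", "元音", 10)  # "vowel"

print(ime.lookup("yuan yin"))  # ['原因', '元音']
```

The same key sequence returns both words, and the user picks from the ranked list, which is where the statistical ranking pays off.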


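From a "Fool" to a "Lunatic"

If a trace of wisdom still remains in me
I would offer up all of it
to wind and twine around
every thread of your thoughts
reasonable or mad
And then
only the foolish me is left
gazing at you with a silly smile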

Scripts for Adding Bookmarks to a PDF

As I gradually get used to reading more and more e-books, I find that reading a scanned PDF could be 100x more efficient if it had bookmarks that let you jump between chapters. It can be even more pleasant than reading the original paper book, since the table of contents can always stay on one side of the screen, within easy reach.

In practice, however, scanned PDFs rarely come with such detailed bookmarks. Adding bookmarks one by one through the graphical UI of a PDF editor is painstaking, so a scripted way of adding them is more desirable. Luckily, with the help of the Adobe pdfmark Reference and this article, it is fairly easy to achieve:
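A minimal sketch of such a generator in Python, not necessarily the original script. It assumes `toc` is a nested list of (title, page, children) tuples (the entries below are hypothetical) and emits one `/OUT pdfmark` line per bookmark, with `/Count` set to the number of immediate children as the pdfmark reference describes. The helper names `escape_ps` and `to_pdfmarks` are illustrative, and titles are assumed to be plain ASCII.

```python
def escape_ps(text):
    """Escape backslashes and parentheses for a PostScript literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmarks(entries):
    """Turn nested (title, page, children) tuples into pdfmark lines."""
    lines = []
    for title, page, children in entries:
        # /Count is only needed when the bookmark has sub-bookmarks.
        count = "/Count {} ".format(len(children)) if children else ""
        lines.append("[{}/Title ({}) /Page {} /OUT pdfmark".format(
            count, escape_ps(title), page))
        # Children must follow their parent directly.
        lines.extend(to_pdfmarks(children))
    return lines


# Hypothetical table of contents; replace with your own.
toc = [
    ("Preface", 5, []),
    ("Chapter 1", 9, [
        ("Section 1.1", 10, []),
        ("Section 1.2 (draft)", 14, []),
    ]),
]

print("\n".join(to_pdfmarks(toc)))
```

Redirecting the printed lines into a file gives the bookmark description used in the next step.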

The script generates the bookmark description in the format given by the pdfmark reference. The last step is to combine the original PDF file with the generated bookmark file:
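One common way to do this is Ghostscript's `pdfwrite` device. The exact command used originally is not shown here, so the following is only a sketch under that assumption. It needs the `gs` executable on the PATH, and the helper names are hypothetical.

```python
import subprocess


def gs_command(src_pdf, marks_file, out_pdf):
    """Build a Ghostscript command that rewrites src_pdf with the pdfmarks."""
    return [
        "gs", "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=" + out_pdf,
        src_pdf,     # the scanned PDF without bookmarks
        marks_file,  # the generated pdfmark description
    ]


def add_bookmarks(src_pdf, marks_file, out_pdf):
    """Run Ghostscript; raises CalledProcessError if gs fails."""
    subprocess.run(gs_command(src_pdf, marks_file, out_pdf), check=True)
```

For example, `add_bookmarks("book.pdf", "bookmarks.ps", "book-with-toc.pdf")` would write a new file and leave the input untouched; the file names are placeholders.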

Note that you need to change the variable toc in the script so that it describes your own table of contents.


Application for PhD


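So-Called Happiness

I wish there were a girl
with whom I could listen to a golden concert
and also gulp down beef noodles on the street
In a cramped little shop
amid the brimming steam
we sit knee to knee
slurping a mouthful of noodles
mumbling words that cannot be made out
and share a smile
This is the happiness in my heart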

Contents & Theme © 2017 Ted Yin
TypoPro & Adobe TypeKit