Because I choose to.

A Brief Intro to Input Method Framework, Linux IME, and XIM

There are times when one needs an input method editor (IME). For CJK users, supporting Unicode and wide characters for Chinese, Japanese and Korean is not enough, since that only covers displaying their native languages, not entering them. Western users, especially those who can type their characters and words directly on a standard keyboard, may not see the need for such an input facility, which could be the reason why CJK support is usually added as an additional feature at the end of a software system's development.

Briefly speaking, imagine that English had more than 26 letters, far more than that. What would happen? Imagine a language with tens of thousands of basic letters (characters, or typographically, glyphs). How would you design the input stack of a computer system to let users type efficiently? Since we cannot introduce a "super" keyboard with thousands of keys, a better way is to "spell" each character with a series of keystrokes. So, loosely speaking, if you did this in English, it would be like spending several keystrokes just to get an "a" in the end, or pressing more than five keys (probably 15 or more) to have "linux" show up in your text editor. This way, we only incur logarithmic time complexity to index a character in the CJK space (think of looking up a word in an English dictionary by tracing its leading letters).

Another piece of good news is that, with very basic statistical methods or more advanced NLP techniques, this way of typing can be fairly efficient even though the same key sequence may yield multiple candidates. The ambiguity comes from the fact that many mainstream input methods for Asian languages use Latin letters (Japanese, for example, calls this "Romaji", literally "Roman letters") to represent the pronunciation of a character. In some languages, Chinese for example, different characters or words are often spelled with the same sequence of letters. For example, both 「元音」("vowel") and 「原因」("reason") are spelled "yuan yin" in pinyin, the romanization scheme standardized by the government of mainland China. Another scheme, zhuyin (or Mandarin Phonetic Symbols), is promoted in Taiwan and is used by users there.
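To make the "spell, then pick a candidate" idea concrete, here is a minimal sketch in Python. It is not how any particular IME is implemented: the lexicon, the usage counts, and the names `TrieNode`, `build_trie` and `candidates` are all made up for illustration. It only shows a prefix tree indexed by keystrokes, with candidates for the same spelling ranked by a simple frequency count.

```python
# Hypothetical toy lexicon: (pinyin spelling, word, made-up usage count).
LEXICON = [
    ("yuanyin", "原因", 900),
    ("yuanyin", "元音", 120),
    ("yuan", "元", 500),
    ("yuan", "原", 450),
    ("yin", "因", 300),
    ("yin", "音", 280),
]


class TrieNode:
    def __init__(self):
        self.children = {}  # next keystroke -> child node
        self.words = []     # (count, word) pairs spelled exactly by this path


def build_trie(lexicon):
    """Index every word by the sequence of keys that spells it."""
    root = TrieNode()
    for spelling, word, count in lexicon:
        node = root
        for key in spelling:
            node = node.children.setdefault(key, TrieNode())
        node.words.append((count, word))
    return root


def candidates(root, keys):
    """Return the words spelled exactly by `keys`, most frequent first."""
    node = root
    for key in keys:
        node = node.children.get(key)
        if node is None:
            return []  # no word in the lexicon has this spelling
    ranked = sorted(node.words, key=lambda pair: pair[0], reverse=True)
    return [word for _count, word in ranked]


root = build_trie(LEXICON)
print(candidates(root, "yuanyin"))
```

With the counts assumed above, typing "yuanyin" lists 「原因」 before 「元音」. The lookup walks one trie node per keystroke, so its cost depends on the length of the spelling rather than on the size of the lexicon. A real input method would use far richer statistics (context, user history) than a single count per word.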

Contents & Theme © 2020 Ted Yin