Write a Diary with Sphinx

Jun 22, 2018 13:49 @ Palo Alto

My favorite lightweight markup language is reStructuredText [1]. Compared with XML/HTML, a lightweight markup language is far more readable as plain text, while still preserving the basic ability to express hypertext content. This article (and the rest of this blog) is written in reStructuredText. Though not as famous as Markdown [2], reStructuredText is, in my opinion, more feature-rich and extensible. The official Python documentation [3] is written in it with the help of Sphinx [4], a documentation generator built on the lower-level reST engine, docutils [5].

Documentation generators are usually used by code projects to generate accompanying references and tutorials. However, thanks to the versatility of Sphinx, we can also keep our personal diaries with it! Here I'm going to show how one can keep multiple diaries at the same time in a single Sphinx project.

Quickstart

The first thing to do is to make sure Sphinx is installed on your machine. On Arch Linux, it is simply:

Then we can use the quickstart tool, sphinx-quickstart, to create a template documentation project:

This starts a wizard that asks several questions about your project preferences. If you mostly accept the default settings, the project directory diaries/ will end up looking like this:

>> cd diaries/
** 02:28:50 /t/diaries ymf@Pixelbook **
>> ls -l
total 16
drwxr-xr-x 2 ymf ymf   40 Jun 22 02:28 _build
-rw-r--r-- 1 ymf ymf 4723 Jun 22 02:28 conf.py
-rw-r--r-- 1 ymf ymf  449 Jun 22 02:28 index.rst
-rw-r--r-- 1 ymf ymf  607 Jun 22 02:28 Makefile
drwxr-xr-x 2 ymf ymf   40 Jun 22 02:28 _static
drwxr-xr-x 2 ymf ymf   40 Jun 22 02:28 _templates

Here, we can include our reST files in index.rst, each of which keeps track of one diary. In index.rst:

As you might have guessed, we should accordingly add two reST files, life.rst and research.rst, to the project directory. In life.rst:

In research.rst:

Finally, try to build your diaries:

This command should generate HTML output in the _build/html directory. Open it in a browser to see the compiled diaries:

The link of "My Life" leads to the diary content:

Hacking

So far so good. Using sections as diary entries is not a bad idea, because there won't be too many entries for a browser to handle: if you wrote 10 entries per day for 100 years, the total number of sections would only be about 365,000, which modern browsers can render easily (I suppose). Scalability won't be a real issue for diary purposes. If it ever becomes one, just partition your diaries into multiple files, for example by year.
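The estimate above works out as follows. Leap days are ignored; they would add only about 250 more entries over a century.

\begin{equation*} 10 \times 365 \times 100 = 365{,}000 \end{equation*}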

The real inconvenience of keeping diaries this way is the order of sections. One tends to append new writing at the end of the reST file rather than prepend it to the beginning. However, docutils renders sections in their order of appearance in the reST source. That makes sense for an article with only a few sections, but not so much for a diary that "misuses" sections. Naturally, we'd like to see the latest entry at the top of the page instead of scrolling all the way to the bottom. Luckily, there is a way to reorder the sections both in the text and in the table of contents. Add the following code to the end of your conf.py:

The sections are reversed as expected with the hack:

[1]	http://docutils.sourceforge.net/rst.html

[2]	https://daringfireball.net/projects/markdown/

[3]	https://docs.python.org/3/

[4]	http://www.sphinx-doc.org/en/master/

[5]	http://docutils.sourceforge.net/


A Brief Intro to Input Method Framework, Linux IME, and XIM

Jun 27, 2017 23:06 @ Ithaca

Sometimes one needs an input method editor (IME). For CJK users, supporting Unicode and the wide characters of Chinese, Japanese, and Korean is not enough. That only lets their native languages be displayed, not typed. Western users, especially those who can type their characters and words directly on a standard keyboard, may not appreciate the need for such an input facility. This may be why CJK support is often added to a software system as an afterthought.

Briefly speaking, imagine that English had far more than 26 letters: a language with tens of thousands of basic characters (or, typographically, glyphs). How would you design the input stack of a computer system so that users can type efficiently? Since we cannot build a "super" keyboard with thousands of keys, a better way is to "spell" each character with a series of keystrokes. Loosely speaking, in English this would be like pressing several keys just to get an "a", or pressing far more than five keys (probably 15 or more) to have "linux" show up in your text editor. This way, we only incur logarithmic time complexity to index a character in the CJK space (think of looking up a word in an English dictionary by following its leading letters).

The good news is that, with very basic statistical methods or more advanced NLP techniques, this way of typing can be fairly efficient, even though the same key sequence may yield multiple candidates. The ambiguity arises because many mainstream input methods for Asian languages use Latin letters to represent the pronunciation of a character. Japanese calls this "Romaji", literally "Roman letters". In some languages, Chinese for example, different characters or words are spelled with the same sequence of letters. For example, both 「元音」("vowel") and 「原因」("reason") are spelled "yuan yin" in pinyin, the romanization scheme standardized by the government of mainland China. Another scheme, zhuyin (Mandarin Phonetic Symbols), is used mainly in Taiwan.
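To make the dictionary-lookup analogy concrete, here is a toy sketch; it is not how any particular IME is implemented. It uses a tiny, hypothetical pinyin lexicon kept sorted by key. Binary search (`bisect_left`) finds the first key that starts with the typed letters in O(log n) steps, and the matching candidates are then read off in order. It also shows the ambiguity: "yuanyin" yields both 元音 and 原因. Real input methods rely on far larger dictionaries and statistical ranking; the candidate order here is just code-point order.

```python
from bisect import bisect_left

# Toy, hypothetical lexicon: (toneless pinyin key, word), sorted by key.
LEXICON = sorted([
    ("yuan", "元"), ("yuan", "原"), ("yuan", "圆"),
    ("yin", "音"), ("yin", "因"),
    ("yuanyin", "元音"),  # "vowel"
    ("yuanyin", "原因"),  # "reason"
])
KEYS = [key for key, _ in LEXICON]


def candidates(typed: str) -> list[str]:
    """Return the words whose pinyin key starts with `typed`."""
    # O(log n): locate the first key that is >= the typed prefix.
    i = bisect_left(KEYS, typed)
    out = []
    # Keys sharing the prefix are contiguous in sorted order.
    while i < len(LEXICON) and LEXICON[i][0].startswith(typed):
        out.append(LEXICON[i][1])
        i += 1
    return out
```

For example, `candidates("yuanyin")` returns `["元音", "原因"]`. An IME would then let the user pick one, ideally with the statistically more likely word listed first.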

Scripts for Adding Bookmarks to a PDF

Dec 26, 2015 17:26 @ Singapore

As I gradually get used to reading more and more e-books, I find that reading a scanned PDF could be far more efficient if it had bookmarks for jumping between chapters. It could even be more pleasant than reading the original paper book, since the table of contents can always stay open on one side of the screen.

However, in reality, scanned PDFs usually don't come with such detailed bookmarks. Adding them one after another through the graphical UI of PDF editing software is a painstaking task, so a more desirable solution is a scriptable way to add them. Luckily, with the help of the Adobe pdfmark Reference and this article, it is fairly easy to achieve:

The code above generates the bookmark descriptions according to the pdfmark reference. Finally, we concatenate the original PDF file with the generated bookmark file:

Note that you need to change the variable toc in the above script to one that describes your table of contents.
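The original script is not reproduced here. As a rough sketch of the same idea in Python (not the author's code), the snippet below turns a hypothetical nested `toc` list of `(title, page, children)` tuples into pdfmark outline entries. Each bookmark is a `[/Title (...) /Page n /OUT pdfmark` line. A parent also carries `/Count k` for its k direct children; the pdfmark Reference uses a negative count to show the entry collapsed. Non-ASCII titles are written as UTF-16BE hex strings with a byte-order mark. That encoding is an assumption worth checking against your Ghostscript version and PDF viewer.

```python
# Hypothetical table of contents: (title, page, children) tuples.
toc = [
    ("Preface", 1, []),
    ("Chapter 1 (Basics)", 5, [
        ("1.1 Setup", 6, []),
        ("1.2 Usage", 9, []),
    ]),
]


def ps_string(text: str) -> str:
    """Encode `text` as a PostScript string literal for pdfmark."""
    if text.isascii():
        # Escape backslashes first, then parentheses.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    # Non-ASCII: UTF-16BE with a byte-order mark, as a hex string.
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def pdfmarks(entries, lines=None):
    """Return pdfmark lines for `entries`, each parent before its children."""
    if lines is None:
        lines = []
    for title, page, children in entries:
        # /Count tells the viewer how many direct children follow this entry.
        count = f"/Count {len(children)} " if children else ""
        lines.append(f"[{count}/Title {ps_string(title)} /Page {page} /OUT pdfmark")
        pdfmarks(children, lines)
    return lines
```

First write the lines to a file, for example `open("pdfmarks", "w").write("\n".join(pdfmarks(toc)) + "\n")`. Then merge that file with the original PDF using Ghostscript: `gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf pdfmarks`. Note that `/Page` counts the physical pages of the PDF, so for a scanned book you may need to add an offset to the printed page numbers.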


Code Syntax Highlight

Jul 06, 2014 06:07 @ Singapore

Test music score:

Test syntax highlight for Python:

Test syntax highlight for C++:

Test LaTeX:

\begin{equation*} e^{ix} = \cos x + i\sin x \end{equation*}
