Monday, June 28, 2010

where programming and art collide

I was looking for a higher-level abstraction for image creation and finally found one worthy of note. The NodeBox Gallery shows what the toolkit is capable of, and it is the first accessible drawing library I've seen that allows for Tufte's Just Noticeable Differences. I'm only now beginning to explore it, and am uncertain whether it encourages working in JND or whether the gallery curator just has an eye for such.

In other tech news - by which I mean "news to me" - I've long had a love affair with the elegance of the TeX typesetting system and the amount of typographical art that Knuth formalized in its page layout algorithms. To my knowledge nobody has bested its running text composition abilities (it may in fact be optimal; see below the fold). But, put simply, TeX is a pain to work with. It is a batch-oriented system dating from a time when editing ASCII markup was quick but page layout couldn't be computed in real time. I'd sort of vaguely thought that computers had become fast enough to do TeX in real time. Well, they had; I just didn't know that the people at LyX had already done it. I was creating good-looking documents again within three minutes of installing LyX. I'm perhaps too happy about that.


So, he asked parenthetically (and rhetorically), how could a paragraph layout mechanism be optimal? Well, first of all, this is Donald Knuth we are talking about, whose stature in the computer-science realm for examining the bounds of optimality is unparalleled. Still, there is a difference between an optimal sort algorithm and a claim of optimality for something so much a practical art as typesetting. But first, a historical aside.

Knuth set out to write a multi-volume set of definitive review works in computer science. His first volume, Fundamental Algorithms, was set in hot-metal type, in the good old tradition of Gutenberg. If photo-typesetters were in use at the time, I'm pretty sure that they couldn't handle the extensive mathematical "penalty copy" of Algorithms. By the time Knuth was preparing the second edition of volume two, automated typesetting had come into play, and Knuth was disgusted with the initial galley proofs. So what does a skilled developer do when existing software isn't good enough? He writes his own, of course. In the course of his excursion into developing TeX, Knuth also made a stop in the realm of typography and created Metafont, a language for describing how to draw letters. True to his academic roots, the TeX language is one of the few I know that allows the redefinition of the language syntax within the language; I've never decided if this was overkill.

But my favorite story from the design of TeX is how Knuth decided to deal with mathematical rounding errors. Floating-point rounding is a problem well known to programmers. For esoteric reasons, if you ask a modern computer to add 0.1 + 0.1 + 0.1 ... ten times, you'd expect it to come up with 1.0, but it doesn't; it yields something like 1.000000119, depending on how you do the math. People have the same sort of problem when we do arithmetic: just try to write a complete and accurate decimal expansion for 1/3. For a precise guy like Knuth, this wasn't good enough, so he decided to do only integer arithmetic inside TeX and then was left with the question of how big and how small the numbers should be so that the limits would never be a practical constraint. This is from memory, but he set the upper bound of his numbers used for measurement to allow for a page the size of a large room. On the other end of the scale, he allowed measurements so precise that rounding errors would be smaller than the wavelength of visible light, so - by definition - any rounding errors would be literally, physically, invisible.
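
As a rough sketch of that arithmetic in Python (this is not TeX's own code), the snippet below adds 0.1 ten times in double and in single precision, then repeats the sum in whole "scaled points". The constants are TeX's documented units: 65536 scaled points (sp) to the point, 72.27 points to the inch, and a largest dimension of 2^30 - 1 sp. The helper `to_float32` and the variable names are made up for illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Add 0.1 ten times in double precision.
double_sum = 0.0
for _ in range(10):
    double_sum += 0.1

# Same sum in single precision, rounding after every addition.
tenth32 = to_float32(0.1)
single_sum = 0.0
for _ in range(10):
    single_sum = to_float32(single_sum + tenth32)

# TeX-style integer arithmetic: lengths are whole numbers of scaled points.
SP_PER_PT = 65536          # 1 pt = 65536 sp
MAX_DIMEN_SP = 2**30 - 1   # largest dimension TeX accepts
PT_PER_INCH = 72.27        # TeX's printer's point
MM_PER_INCH = 25.4

tenth_pt = round(0.1 * SP_PER_PT)           # rounded once, on input: 6554 sp
int_sum = sum(tenth_pt for _ in range(10))  # integer addition is exact

# Physical size of the smallest and largest lengths.
sp_in_nm = MM_PER_INCH / PT_PER_INCH / SP_PER_PT * 1e6
max_in_m = MAX_DIMEN_SP / SP_PER_PT / PT_PER_INCH * MM_PER_INCH / 1000

print("double precision:", double_sum)   # 0.9999999999999999
print("single precision:", single_sum)   # 1.0000001192092896
print("integer sum in sp:", int_sum)     # 65540
print("one sp in nanometers: %.2f" % sp_in_nm)
print("largest dimension in meters: %.2f" % max_in_m)
```

Note that the integer sum is not exactly one point either (65540 sp rather than 65536), because 0.1pt had to be rounded once when it was converted. The difference is that every later addition is exact and comes out the same on every machine. One scaled point works out to a little over 5 nanometers, against a few hundred nanometers for visible light, and the largest dimension is a bit under 6 meters.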

Such thinking has led me to believe that it is possible for a paragraph layout algorithm to be optimal. The TeX paragraph compositor has a notion of "badness" which measures how far the spaces between the words on a line had to be stretched or squeezed to make the line fit, and it also penalizes a line whose spacing differs sharply from that of the lines above and below. Other bad elements of composition are widow and orphan lines and Knuth's nemesis: artificially inserted hyphenation, which carries a penalty of its own. If the compositor lays out a paragraph and cannot stay under a threshold amount of badness, it goes back and redoes the composition with different choices, such as allowing hyphenation. Indeed, because TeX weighs the paragraph as a whole rather than line by line, trouble on the last line can change the break it chooses all the way back on the first line. How does this make for optimality? Because the badness factors and thresholds can be set by a skilled typographer to reproduce all the artistic judgments she would make had she been doing the layout herself. Indeed, Knuth knew his limitations and had a highly skilled typographer set the "knobs" for him because it was beyond his skills.
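
To make the idea concrete, here is a toy sketch in Python of choosing breaks for a whole paragraph at once. It is a deliberately simplified stand-in, not TeX's actual algorithm: it assumes fixed-width characters, ignores hyphenation, stretchable spaces, widows and orphans, and invents a badness of (unused columns) cubed for every line but the last. The function names `break_lines`, `greedy_lines` and `total_badness` are made up for the example.

```python
def break_lines(words, width):
    """Pick line breaks that minimize the total badness of the paragraph."""
    n = len(words)
    inf = float("inf")
    # best[i] is the lowest total badness for setting words[i:].
    best = [inf] * n + [0]
    next_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            # The last line is allowed to run short at no cost.
            cost = 0 if j == n else (width - length) ** 3
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                next_break[i] = j
    if best[0] == inf:
        raise ValueError("a word is wider than the line")
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:next_break[i]]))
        i = next_break[i]
    return lines, best[0]

def greedy_lines(words, width):
    """Fill each line as full as possible, never looking back."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def total_badness(lines, width):
    """Score a finished layout with the same toy badness measure."""
    return sum((width - len(line)) ** 3 for line in lines[:-1])

words = "aaa bb cc ddddd".split()
greedy = greedy_lines(words, 6)
optimal, score = break_lines(words, 6)
print(greedy, total_badness(greedy, 6))   # ['aaa bb', 'cc', 'ddddd'] 64
print(optimal, score)                     # ['aaa', 'bb cc', 'ddddd'] 28
```

In this small case the line-by-line approach fills the first line perfectly and then gets stuck with a very loose second line. The whole-paragraph approach accepts a slightly loose first line to get a better total, which is the sense in which a problem late in the paragraph can reach back and change the first break.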
