The Most Uncopyrightable Dermatoglyphics in all of Swashbucklerdom


Seeking out words with lots of unrepeated letters...

Some weeks ago, in response to my post about words with interesting spellings, Miranda posted a couple of comments about counting non-repeating letters. This got me thinking about writing small programs to scan lists of English words, to locate words having especially many or few repeated letters, and I've now got some early results to share.

First, a few comments on how I went about this. I decided that this might be a good time to start exploring a relatively new programming language, known as Ruby, that has been getting a lot of attention lately as a useful scripting language, especially for web development. Free Ruby implementations are readily available, but I use a Mac at home, and Ruby is packaged with Mac OS X, so I didn't even have to download it. So far, I'm pretty impressed with the language; it seems quite sensibly designed. But I'm not going to delve into it here.

Besides the programming language, I also wanted one or more fairly comprehensive lists of English words. Such lists are not hard to find, because they are useful for applications such as spelling checkers. Several web sites offer lists in simple formats, with a range of properties (such as whether they use American or British spellings, whether they include inflected forms, proper nouns or acronyms, and how comprehensive they are). For spelling checkers, you may not want to include too many rare words, because a misspelling that happens to match one of them will slip through unflagged. But for my purposes, I mostly just wanted lots of words. I eventually settled on a combination of lists from SCOWL Revision 6 that amounts to over 400,000 words, but excludes proper nouns and acronyms.

For today, I'm going to focus on words with large numbers of non-repeated letters, which I'll call LOJOs (Letters Occurring Just Once). A word's LOJO score is simply the number of distinct letters that appear in it exactly once. The highest LOJO score I found was 15, for the words uncopyrightable and the rather more obscure dermatoglyphics (the study of skin patterns--such as fingerprints--on hands and feet), both having no repeated letters at all.
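For the curious, here is a minimal sketch of how such a count might be computed in Ruby. The method name lojo_score is made up for illustration, and the choice to ignore case and skip anything other than the letters a-z is an assumption; this is not necessarily how the original script worked.

```ruby
# Count the letters that occur exactly once in a word (its LOJO score).
# Assumption: case is ignored and characters other than a-z are skipped.
def lojo_score(word)
  counts = Hash.new(0)                 # letter => number of occurrences
  word.downcase.each_char do |ch|
    counts[ch] += 1 if ch =~ /[a-z]/
  end
  counts.count { |_letter, n| n == 1 } # keep only letters seen once
end

puts lojo_score("uncopyrightable")      # 15
puts lojo_score("superacknowledgement") # 14
```

Note that a repeated letter contributes nothing to the score, so a long word with a few repeats can tie with a shorter word that has none.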

I found 16 words with 14 LOJOs. Only ambidextrously seemed to me to be a word you might actually encounter in everyday life, though a few others such as pseudocharitably, pseudomythical, troublemakings and undiscoverably are readily understood. Some of the others are technical terms such as benzhydroxamic, dermatoglyphic (of course), hydropneumatic and ventriculography. The longest is superacknowledgement, at 20 letters, in which e appears four times and n twice. The one I initially found most puzzling was sulphogermanic, for which I lacked even a plausible use (except maybe to describe a fire-and-brimstone preacher from Stuttgart?) until I guessed that it is probably a term for certain chemical compounds containing both sulphur and germanium.

There are more than 90 words with 13 LOJOs, ranging from commonplace (unpredictably) through the unlikely (swashbucklerdom) to the obscure (lepidothamnus, a genus of small conifers).
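Tallies like the ones above could be produced by grouping a word list by score. The following is only a sketch under stated assumptions: group_by_lojo is a hypothetical name, the input is assumed to be one word per entry, and a handful of words from this post stands in for the full list.

```ruby
# Letters occurring exactly once in a word, ignoring case and non-letters.
def lojo_score(word)
  letters = word.downcase.scan(/[a-z]/)
  letters.uniq.count { |ch| letters.count(ch) == 1 }
end

# Group words by LOJO score, keeping only scores of at least min_score.
# (group_by_lojo is a hypothetical name chosen for this sketch.)
def group_by_lojo(words, min_score)
  groups = Hash.new { |hash, score| hash[score] = [] }
  words.each do |raw|
    word = raw.strip
    next if word.empty?
    score = lojo_score(word)
    groups[score] << word if score >= min_score
  end
  groups
end

# A tiny stand-in for a real word list.
sample = %w[uncopyrightable dermatoglyphics ambidextrously
            unpredictably swashbucklerdom banana]

by_score = group_by_lojo(sample, 13)
by_score.keys.sort.reverse.each do |score|
  puts "#{score}: #{by_score[score].join(', ')}"
end
```

For a real run, the sample array could be replaced with something like File.foreach("words.txt"), where words.txt is a placeholder for whatever word-list file you have, with one word per line.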

Words with very low LOJO counts (for their length) will be a topic for another day.

Posted: Wed - March 21, 2007 at 04:47 AM       by email
