The Worst Ideas. Updates every Monday!

Your weekly source for terrible ideas.

Tag: Language

Programmers love this one weird trick to handle Unicode characters without any complexity! “Visual-literation” replaces the old-fashioned way of transliteration. Watch as linguists wail mournfully at the years they wasted trying to transliterate sounds between alphabets!

The issue:

Many computers are unable to handle letters that don’t fall into the set of Latin characters used by English.

Even though the Unicode standard has greatly improved multi-character-set accessibility, problems still arise:

  • A character might not exist in a chosen font. For example, “Egyptian Hieroglyph of a bird catching a fish” is probably not available in Comic Sans.
  • Systems may be unable to cope with characters that look exactly the same (“homoglyphs”: https://en.wikipedia.org/wiki/Homoglyph).
    • For example, “Latin A” and “Cyrillic A” look identical, but have different underlying Unicode code points (U+0041 and U+0410).
    • So an email from “YOUR BANK.COM” might actually be from a different site, with an imposter letter “A” (https://en.wikipedia.org/wiki/IDN_homograph_attack).
    • (This is an issue in English as well, with 0 (zero) versus O (capital “o”) and “I / l / 1” (capital i, lower-case L, numeral 1).)
  • Systems may not allow certain letters for certain situations; for example, if your username is “Linear B ‘stone wheel’ + Mayan jaguar glyph,” it is extremely unlikely that you will have an easy time logging into your user account.
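
To make the homoglyph problem concrete, here is a minimal Python sketch (not part of the original proposal). It uses only the standard `unicodedata` module. The helper names `describe` and `mixed_scripts` are made up for illustration, and guessing a letter's script from the first word of its Unicode name is a rough heuristic assumed here, not a real spoof detector.

```python
import unicodedata

latin_a = "A"          # U+0041
cyrillic_a = "\u0410"  # U+0410, drawn like a Latin "A" in most fonts


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def mixed_scripts(text: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


print(describe(latin_a))      # U+0041 LATIN CAPITAL LETTER A
print(describe(cyrillic_a))   # U+0410 CYRILLIC CAPITAL LETTER A
print(latin_a == cyrillic_a)  # False

# Hypothetical spoofed name with a Cyrillic "A" hiding in "BANK".
spoofed = "YOUR B\u0410NK.COM"
print(len(mixed_scripts(spoofed)) > 1)  # True: more than one script present
```

A string that mixes scripts is not necessarily malicious, so a check like this is only a hint that something deserves a closer look.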

The current failure mode is usually to display a blank rectangle instead, which is unhelpful.

Proposal:

Instead, we can use a sophisticated image-recognition system to map each letter from every language onto one or more Latin characters (Fig. 1).

Usually, this is called transliteration (https://en.wikipedia.org/wiki/Transliteration). But in this case, rather than using the sound of a symbol to convert it, we are using the symbol’s visual appearance, so it’s more like “visual-literation.”

Fig. 1: With a limited character set, it may be easy to display the “Å” as “A”, or “ñ” as “n.” But it’s unclear what should be done with the Chinese character at the bottom, which isn’t similar to any specific Latin letter.
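
The easy cases in Fig. 1 don't strictly need image recognition. As a rough sketch, assuming plain Unicode normalization is an acceptable stand-in for the proposed system, Python's standard `unicodedata` module can decompose a letter and drop its accent marks. The function name `strip_marks` is invented for this example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose characters (NFKD), then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("\u00c5"))  # "Å" becomes "A"
print(strip_marks("\u00f1"))  # "ñ" becomes "n"

# A Chinese character has no decomposition, so it passes through unchanged.
# U+662F is used here as an assumed example of a character meaning "is".
print(strip_marks("\u662f") == "\u662f")  # True
```

This handles the top of the figure and leaves the hard case at the bottom exactly as unsolved as before.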

Fig. 2: Top: Image analysis reveals that the Chinese character (meaning “is”) can be most closely matched to the Latin capital “I.” Bottom: The Greek capital “Π” (pi) is disassembled into two Ts.

Some letters actually do somewhat resemble their Latin-ized versions (like “Π” as “TT”). However, some mappings are slightly less immediately obvious (Fig. 3).

Fig. 3: Many complex symbols can, with a great degree of squinting, be matched to multi-letter strings.

Conclusion:

Linguists will love this idea, which forever solves the problem of representing multiple character sets using only the very limited Latin letters.

PROS: Gives every word in every language an unambiguous mapping to a set of (26*2) = 52 Latin letters.

CONS: Many symbols may map to the same end result (for example, “I” could be the English word “I,” or it could have been a “visual-literated” version of the Chinese character from Fig. 2).
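
The collision problem is easy to reproduce. Below is a minimal sketch in which a hand-written lookup table stands in for the image-recognition system. The table `VISUAL_TABLE` and the functions `visual_literate` and `collisions` are hypothetical names. The table holds the pairings mentioned in this post, plus a Cyrillic “A” entry added as an assumption, and U+662F is assumed to be the Chinese character meaning “is.”

```python
from collections import defaultdict

# Hypothetical lookup table standing in for the image-recognition step.
VISUAL_TABLE = {
    "\u00c5": "A",   # Å
    "\u00f1": "n",   # ñ
    "\u03a0": "TT",  # Greek capital pi
    "\u662f": "I",   # assumed: the Chinese character meaning "is"
    "\u0410": "A",   # Cyrillic capital A (entry assumed for this sketch)
}


def visual_literate(text: str) -> str:
    """Replace each known symbol with its look-alike; keep everything else."""
    return "".join(VISUAL_TABLE.get(ch, ch) for ch in text)


def collisions(symbols: str) -> dict:
    """Group symbols by output, keeping outputs shared by several symbols."""
    groups = defaultdict(list)
    for ch in symbols:
        groups[visual_literate(ch)].append(ch)
    return {out: srcs for out, srcs in groups.items() if len(srcs) > 1}


print(visual_literate("\u03a0"))  # TT

# "I", "A", and "n" are each reachable from more than one source symbol.
print(collisions("I\u662fA\u00c5\u0410n\u00f1"))
```

Because several inputs share one output, the mapping cannot be reversed, which is exactly the ambiguity described above.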

 

Fig. 4: A collection of potential mappings from various symbols to an ASCII equivalent. Finally, the days of complex transliteration are over!

 

 


Modern “Emoji” characters will become the basis for writing systems of civilizations 1000 years from now.

Background:

Our current alphabet is derived from an ancient system of representational icons. These icons were once pictures of actual objects, but have been simplified to an easier-to-write form over the millennia.

For example, according to the inerrant source of knowledge known as Wikipedia (http://en.wikipedia.org/wiki/Phoenician_alphabet):

The letter “Q” used to be one of these:

[Image: the Phoenician letter “qop”]

This is the head of a needle, called “qop.” Presumably the ancient Phoenician word for “head of a needle” sounded something vaguely like “qop.”

Similarly with “K,” which used to look like this:

[Image: the Phoenician letter “kap”]

Supposedly this was the palm of a hand, called “kap.” Just like above, presumably the ancient word for “palm” started with a “k” sound.

Today:

So in the modern era, whenever we want to write out a “k” sound, we draw a tiny pictogram of the palm of a hand, all because the word for “palm” started with a “k” three thousand years ago.

Some letters are indirectly derived from ancient Egyptian hieroglyphs.

[Image: an Egyptian hieroglyph of an owl]

So if we ask why a specific letter is shaped in a certain way, the answer is because it looked like a sketch of an owl that some scribe drew 5000 years ago!

The predicted future:

In the future, we expect that these trends will continue.

In the example below, we see the icon of a floppy disk (which also represents the word “Save”). A floppy disk is a device that was once used by the ancestral people of Silicon Valley to store written knowledge.

Here are two predicted possible evolutions of a new character (the final form of which is based loosely on Chinese characters), which may represent one of three things:

  1. In a fully ideographic system, it would continue to represent the verb “to save.”
  2. In a syllabic system, it would represent the syllable “sa” or “say.”
  3. In an alphabetic system, it would represent the sound “sss.”

Fig. 1: In the distant future, the “save” icon (left) will become an ideogram via one of the two paths seen at right. The two paths (top row and bottom row) represent different ways of abstracting away the floppy disk; in the top path (green arrow), the angled edge is exaggerated, while in the bottom path, the metal slide cover is emphasized.

Conclusion:

Just as obsolete iconography of the past continues to live on today (the head of a needle, the Egyptian owl, etc.), our Emoji of the beginning of the third millennium will undoubtedly influence the writing systems of people in the distant future.

PROS: Since this is inescapably our future, it has no “pros.” It merely is.

CONS: As above, there are no cons to this vision of the future. We must simply accept it as destiny.

How to destroy a programming language (or natural language?) that you don’t like in one easy step with three difficult sub-steps

The issue:

Sometimes, you don’t like a programming language (like Perl or Python), or a natural language (like English or Spanish).

You might have your reasons, or maybe not—maybe you just want to destroy it completely for no reason at all!

 

Proposal: Here’s a simple way to go about wreaking destruction on the language in question while leaving no one the wiser:

  1. Propose a “new and improved” version of the language. Example: “Perl 6 will be so much better than Perl 5!” Or: “Esperanto: it’s like English, but the spelling is much more regular!”
    1. Make sure it’s very similar at first glance, but annoyingly incompatible in key regards.
    2. Next, make sure there are a few bonus features, but not enough to actually justify the switching cost.
  2. For programming languages, start creating software in this language. For natural languages, start creating novels, newspapers, and works of art in this language.
  3. Make sure there is a HUGE delay in switching; “everyone should learn English 2.0, but it isn’t ready quite yet… so in the meantime, English 1.0 is deprecated.”
  4. Finally, you just have to wait! Instead of switching to the “upgraded” language, people will probably switch to an entirely different one.

 

Great examples in history:

  • Successful destruction: Perl 5 –> Perl 6
  • To be determined: Python 2 –> Python 3
  • Failure: English –> Esperanto

PROS: Lets you surreptitiously destroy the language that has drawn your wrath.

CONS: None!