17.6. Accent marks and compound lerfu words

Many languages that make use of the Latin alphabet add special marks to some of the lerfu they use. French, for example, uses three accent marks above vowels, called (in English) “acute”, “grave”, and “circumflex”. Likewise, German uses a mark called “umlaut”; a mark which looks the same is also used in French, but with a different name and meaning.

These marks may be considered lerfu, and each has a corresponding lerfu word in Lojban. So far, no problem. But the marks appear over lerfu, whereas the words must be spoken (or written) either before or after the lerfu word representing the basic lerfu. Typewriters (for mechanical reasons) and the computer programs that emulate them usually require their users to type the accent mark before the basic lerfu, whereas in speech the accent mark is often pronounced afterwards (for example, in German “a umlaut” is preferred to “umlaut a”).

Lojban cannot settle this question by fiat. Either it must be left up to default interpretation depending on the language in question, or the lerfu-word compounding cmavo tei (of selma'o TEI) and foi (of selma'o FOI) must be used. These cmavo are always used in pairs; any number of lerfu words may appear between them, and the whole is treated as a single compound lerfu word. The French word “été”, with acute accent marks on both “e” lerfu, could be spelled as:

Example 17.18.

tei	.ebu	.akut.bu	foi	ty.	tei	.akut.bu	.ebu	foi
(	e	acute	)	t	(	acute	e	)

and it does not matter whether .akut.bu appears before or after .ebu; the tei…foi grouping guarantees that the acute accent is associated with the correct lerfu. Of course, the level of precision represented by Example 17.18 would rarely be required: it might be needed by a Lojban-speaker when spelling out a French word for exact transcription by another Lojban-speaker who did not know French.
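The grouping in Example 17.18 can be produced mechanically. The following is a minimal Python sketch, not a definitive tool: it assumes the text is available as Unicode, uses canonical decomposition (NFD) to separate each basic lerfu from its accent marks, and knows only the three lerfu words that appear in Example 17.18. The names LERFU, spell, and mark_first are invented for this illustration.

```python
import unicodedata

# Lerfu words taken from Example 17.18 only; a real table would be larger.
LERFU = {
    "e": ".ebu",
    "t": "ty.",
    "\u0301": ".akut.bu",  # COMBINING ACUTE ACCENT
}

def spell(word, mark_first=False):
    """Spell `word` in lerfu words, wrapping each accented lerfu in tei ... foi.

    mark_first (an assumption of this sketch) selects whether the accent
    words precede or follow the basic lerfu word inside the group.
    """
    out = []
    group = []  # lerfu words for the current basic lerfu and its marks

    def flush():
        if not group:
            return
        if len(group) == 1:
            out.append(group[0])  # unaccented lerfu: no grouping needed
        else:
            base, marks = group[0], group[1:]
            inner = marks + [base] if mark_first else [base] + marks
            out.append("tei " + " ".join(inner) + " foi")
        group.clear()

    # NFD splits an accented character into base character + combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) == 0:
            flush()  # a new basic lerfu starts a new group
        group.append(LERFU[ch])
    flush()
    return " ".join(out)
```

Calling spell on the French word “été” yields the first grouping of Example 17.18 for both accented lerfu; with mark_first=True it yields the second. Either result identifies the same lerfu, which is the point of the tei…foi grouping.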

This system breaks down in languages which use more than one accent mark on a single lerfu; some other convention must be used for showing which accent marks are written where in that case. The obvious convention is to represent the mark nearest the basic lerfu by the lerfu word closest to the word representing the basic lerfu. Any remaining ambiguities must be resolved by further conventions not yet established.
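The nearest-mark convention described above can be sketched in Python as follows. This is an illustration under stated assumptions, not an established rule: the function name compound and its parameters are invented here, and the caller is assumed to supply the accent lerfu words already ordered from the mark nearest the basic lerfu to the mark farthest from it.

```python
def compound(base, marks_inner_to_outer, marks_first=False):
    """Build a tei ... foi compound in which the mark nearest the basic
    lerfu is named by the lerfu word adjacent to the basic lerfu word."""
    marks = list(marks_inner_to_outer)
    if marks_first:
        # Marks precede the base: outermost first, innermost just before it.
        words = marks[::-1] + [base]
    else:
        # Marks follow the base: innermost just after it, outermost last.
        words = [base] + marks
    return "tei " + " ".join(words) + " foi"
```

For example, with .ebu as the basic lerfu, .akut.bu as the nearer mark, and a second mark written here as .grav.bu (a lerfu word assumed purely for illustration), the sketch produces “tei .ebu .akut.bu .grav.bu foi” when marks follow the basic lerfu and “tei .grav.bu .akut.bu .ebu foi” when they precede it. In both orders .akut.bu stays next to .ebu.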

Some languages, like Swedish and Finnish, consider certain accented lerfu to be completely distinct from their unaccented equivalents, but Lojban does not make a formal distinction, since the printed characters look the same whether they are reckoned as separate letters or not. In addition, some languages consider certain two-letter combinations (like “ll” and “ch” in Spanish) to be single letters; such a combination may be represented by enclosing its lerfu words in tei…foi.

Finally, when discussing a specific language, it is permissible to make up new lerfu words, as long as they are either explained locally or well understood from context: thus Spanish “ll” or Croatian “lj” could be called .ibu, but that usage would not necessarily be universally understood.

Section 17.19 contains a table of proposed lerfu words for some common accent marks.