Lingua Franca

Brian Foote and Don Roberts

DRAFT

Department of Computer Science
University of Illinois at Urbana-Champaign
1304 W. Springfield
Urbana, IL 61801 USA

foote@cs.uiuc.edu (217) 328-3523
droberts@cs.uiuc.edu (217) 244-0431

Monday, 8 June 1998

Fifth Conference on Pattern Languages of Programs (PLoP '98)
Monticello, Illinois, 11-14 August 1998

Contents

  1. Abstract
  2. Introduction
  3. English Only
  4. Convertible Currencies
  5. Lingua Franca
  6. Discussion
  7. References

Abstract

This paper examines a set of three patterns that addresses the problems one encounters when multiple representations and languages emerge in some domain. When freely CONVERTIBLE CURRENCIES are available, data are not isolated and trapped by incompatible representations. Where an ENGLISH ONLY/STANDARD REPRESENTATION can be established, communication is simplified. However, this simplicity is not without its cost. A LINGUA FRANCA addresses the problem posed when representations proliferate. If a standard secondary tongue is adopted, a much smaller number of translators or converters is required to ensure universal translation.

Introduction

Tower of Babel

It has long been recognized that a variety of problems can be greatly simplified if everyone would just speak the same language. It has been recognized for nearly as long that it is impractical to expect this to always be so.

This paper examines a set of three patterns that addresses the problems one encounters when multiple representations and languages emerge in some domain. Where an ENGLISH ONLY/STANDARD REPRESENTATION can be established, communication is simplified. However, this generality is not without its cost. When freely CONVERTIBLE CURRENCIES are available, data are not isolated and trapped by incompatible representations. A LINGUA FRANCA addresses the problem posed when representations proliferate. If a standard secondary tongue is adopted, a smaller number of translators or converters is required to ensure universal translation.

ENGLISH ONLY

alias
STANDARD REPRESENTATION
COIN OF THE REALM
EURO
MONOPOLY
VHS

Pentium(R) II processor logo

100 Euro Note

In 1999, eleven European countries will begin replacing their currencies with a single, common European currency, the euro.

v v v

Programs that are separated physically or temporally need to communicate information between them.

Sometimes, you can greatly simplify things by mandating a single format. Sometimes, a de facto standard format will emerge anyway.

Therefore, allow one format to serve as a standard language, or vernacular.

There is no question that if you can avoid using multiple representations for the same subject matter, you can avoid a lot of problems.

Euro Coins

There are those who claim that software is a natural monopoly. Certainly one way to avoid the complexity associated with supporting and converting among multiple formats is to vanquish any competing formats, and claim the entire field for yourself.

One force that comes into play here is that when multiple representations exist, and free conversion between them is desired, designers can be reduced to supporting the intersection of the capabilities of all the formats. Any enhancement that is added by a single representation is lost in the translations. However, when the need for such compatibility is removed, this restraint on the power of competing representations disappears too. This is a mixed blessing. Compatibility considerations, be they with competing products or obsolete versions, can act as brakes on excess, as well as on innovation.

Another force that encourages the emergence of standard representations is that they avoid duplication and redundancy, and encourage reuse. Once a standard representation is established, APIs, protocols, frameworks, and tools can emerge to help support it. The efficiencies brought about by such specialization can allow scarce resources to be focused on more worthy tasks than the generation of me-too representations.

The world of CPU architecture is a good place to find examples of beneficial convergence towards a handful of standard approaches. For instance, ten or twenty years ago there were a variety of incompatible, ad hoc schemes for representing floating point values. With the advent of the IEEE floating point standard [praise Kahan here], these have all but vanished, and byte order is the only issue left to contend with when portable binary floating point values are to be dealt with. This makes the design of portable binary images and virtual machines easier.

Another example is the convergence of word sizes towards standard powers of two. During the '70s and '80s, machines with 24, 36, and 60 bit words were still common. Over the last several years, these word sizes have been supplanted, and fully byte-addressable memories have become the norm. The effective elimination of odd word size considerations has allowed for a considerable simplification of subsequent portable binary formats. When you can assume such exceptional cases are gone, your job becomes easier.

Byte order itself has evolved to use either pure big-endian or pure little-endian [cite Cohen] representations. The troublesome hybrid orders seen in older DEC and Z8000 processors, for example, have effectively vanished. This, in turn, has made dealing with networked binary easier. (Indeed, network byte order itself is an example of the LINGUA FRANCA pattern at work.)
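As a minimal sketch of network byte order acting as a common tongue, the following Python fragment (Python is our choice here, not something this paper otherwise uses) packs one 32-bit value in little-endian, big-endian, and network order with the standard struct module. The value and variable names are purely illustrative.

```python
import struct
import sys

value = 0x0A0B0C0D

little = struct.pack("<I", value)   # little-endian layout
big = struct.pack(">I", value)      # big-endian layout
network = struct.pack("!I", value)  # network byte order (big-endian by convention)

# A sender converts from its native order to network order; a receiver
# converts from network order back into whatever its own native order is.
wire = struct.pack("!I", value)
(received,) = struct.unpack("!I", wire)

print(sys.byteorder, little.hex(), big.hex(), network.hex(), hex(received))
```

Neither party needs to know the other's native order; each only needs to know how to get to and from the agreed wire format.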

The emergence of standard tongues puts native speakers at an advantage, and non-native speakers at a disadvantage. In the realm of technology, similar technical and economic advantages accrue to those best versed in a dominant representation.

v v v

The WINNING TEAM pattern addresses some of these issues from the perspective of evolving code itself.

With any monopoly, there are benefits and dangers. For instance, because competition is stifled, the need to innovate is diminished, and incremental adaptation can be deferred. The SOFTWARE TECTONICS pattern discusses the possible consequences of letting such strain accumulate.

CONVERTIBLE CURRENCIES

alias
TRANSLATION
DYNAMIC CHANGE OF REPRESENTATION

China

v v v

Programs that are separated physically or temporally need to communicate information between them.

Sometimes, no single representation for data does everything we need. Sometimes we must match the representation to the task at hand. This matching process might be relatively coarse-grained or static, or fine-grained, and highly dynamic, with transformations taking place at runtime.

Therefore, provide for free conversion among various representations. Hence, one can use the representation best suited to the task at hand.

Representations are freely convertible when one can move back and forth between them with relative impunity. They need not be totally invertible functions. It is only necessary that the losses incurred be of minor practical significance.

Just as with currency conversions and translations among natural languages, there are costs associated with changing representations. There are a variety of forces and issues with which designers may need to concern themselves when designing a system that employs multiple, freely convertible representations.

MS Word Conversions

Time: Conversions take time. Indeed, they can be extremely time consuming in some cases. Sometimes this overhead can be avoided by caching all or part of the converted objects so that they need not be converted again until one of the copies changes. This strategy requires that a mechanism be put in place to keep these multiple versions consistent, and that the copies be invalidated or updated when changes occur.
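The caching strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CachedDocument class is hypothetical, and upper-casing a string stands in for a genuinely expensive conversion.

```python
class CachedDocument:
    """Holds a source text and caches a converted form until the source changes."""

    def __init__(self, text):
        self._text = text
        self._cache = None        # cached converted representation, if any
        self.conversions = 0      # how many times the slow conversion actually ran

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, new_text):
        self._text = new_text
        self._cache = None        # invalidate the stale converted copy

    def as_upper(self):
        """Stand-in for an expensive conversion (hypothetical)."""
        if self._cache is None:
            self.conversions += 1
            self._cache = self._text.upper()
        return self._cache


doc = CachedDocument("lingua franca")
doc.as_upper()
doc.as_upper()              # served from the cache, no new conversion
doc.text = "english only"   # changing the source invalidates the cache
result = doc.as_upper()     # converted again
print(result, doc.conversions)
```

Here the source text is treated as the authoritative copy, and the converted form is simply discarded when the source changes. Propagating changes in both directions would need more machinery.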

Consistency: When multiple versions of the same data exist, questions of consistency naturally arise. The designer might require that a change made to any copy be propagated to all versions, or might designate a particular version as the authoritative copy. Indeed, many of the issues that arise in distributed systems and databases can arise when this pattern is employed.

Space: One representational scheme may take up more space than another. Indeed, trading off space and time is a recurring theme when such representations and conversions are devised. For example, some formats for representing images take more space than others. Issues such as pixel depth and compression strategies can dramatically affect the space requirements associated with images.

Overhead: Some representations are more costly to use than others. For instance, rendering compressed images can incur more runtime computational overhead than rendering uncompressed ones.

Complexity: Some representations are more complicated than others. While complexity often comes hand-in-hand with power, it can mean that programmers find these representations harder to deal with. When this complexity is exposed to users, it can cause confusion as well.

Predictability: Not only do representations differ in the time that it takes to convert among them, and the overhead associated with using them, but they can differ in the degree to which this overhead is predictable. Here again, image compression schemes illustrate this issue. An all white bitmap might be represented quite concisely, while random noise might be essentially incompressible. A consequence of this is that the overhead associated with rendering an image might vary tremendously in ways that are highly dependent on the data.
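The data dependence is easy to observe. The sketch below uses Python's standard zlib module (DEFLATE compression) as a stand-in for an image compression scheme, which is an assumption on our part. The buffer size is arbitrary.

```python
import os
import zlib

size = 64 * 1024  # 64 KiB of 8-bit pixels (an arbitrary, illustrative size)

white = b"\xff" * size      # an all-white bitmap
noise = os.urandom(size)    # random noise

white_packed = zlib.compress(white)
noise_packed = zlib.compress(noise)

# Same input size, very different compressed sizes.
print(len(white_packed), len(noise_packed))
```

The all-white buffer shrinks to a tiny fraction of its original size, while the noise comes out about as large as it went in.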

Swiss Army Knife

Utility: When a particular representation supports tools or operations that are unavailable in other formats, it may be easier to convert to this representation, use the tool, and convert back, than it is to attempt the manipulation in some other representation. For instance, some word processors might provide easy access to spelling checkers or grammar tools, while others might generate output suitable to certain printers or distribution formats.

This is the difference between finding the right tool for the right job and employing a Swiss Army Knife.

Fidelity: Fidelity is always an issue when data are converted. A major consideration in the design and use of conversions is data loss. When the target format is less rich than the source, detail must be sacrificed. When it is richer, decisions must be made as to reasonable default values and assumptions during conversion. When you convert from ASCII to Word 97 format, decisions about default fonts and sizes must be made. Conversely, when you convert to pure text formats, decisions about line breaks must be made, and font choices are lost. Of course, nearly all formatting information is lost.

When you convert from JPG to GIF89, you need to decide how to handle transparent background colors. Converting from GIF to JPG can result in a loss of image detail, because JPG's compression is lossy. So can a change from a 24-bit to an 8-bit image depth. Sometimes, the price of such conversions is obvious and immediate. Other times it is more subtle, as in multiple natural language translations, image processing, and printing.
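A small Python sketch makes the loss from a reduced image depth concrete. It assumes a fixed 3-3-2 bit packing (three bits of red, three of green, two of blue) for the 8-bit form. Real 8-bit image formats usually use a palette instead, so this illustrates the principle rather than modelling any particular format.

```python
def to_8bit(rgb):
    """Pack a 24-bit (r, g, b) color into 8 bits: 3 bits red, 3 green, 2 blue."""
    r, g, b = rgb
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def to_24bit(packed):
    """Expand an 8-bit 3-3-2 color back to 24 bits; the dropped low bits are gone."""
    r = (packed >> 5) & 0b111
    g = (packed >> 2) & 0b111
    b = packed & 0b11
    return (r << 5, g << 5, b << 6)


original = (200, 100, 50)
round_trip = to_24bit(to_8bit(original))   # detail is lost here
again = to_24bit(to_8bit(round_trip))      # a second round trip changes nothing

print(original, round_trip, again)
```

In this scheme the damage is done on the first conversion, and further round trips are harmless. Not every lossy conversion is so well behaved.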

Versions are just alternative representations. You can convert among these too. This can be thought of as a sort of floating exchange rate, or inflation adjustment.

Physical unit conversions are conversions too. Ward Cunningham refers to this as the WHOLE VALUE pattern. Conversions work when specific objects are fungible commodities. When object identity is significant, conversions are harder to use.

Word processors usually provide mechanisms for importing and exporting a variety of formats besides their native formats. These provide compatibility with previous versions of the same product, products from other vendors, and standard formats such as ASCII text or PostScript. Some formats may be read but not written, and vice versa. PostScript might be generated as output, but not read as input. Text with a variety of line separators might be read, but only a single text format might be written. For instance, Microsoft Word can read data from Excel spreadsheets, but cannot save a document as a spreadsheet.


This oft recounted tale from the journals of the Lewis and Clark expedition illustrates the joys of multiple translations:

Sacagawea Statue (Bismarck, ND)

Ordway filled in more details of the visit: "our officers took down Some of their language found it verry troublesome Speaking to them as all they Say to them has to go through Six languages, and hard to make them understand." The captains' questions were translated into French by Drouillard or Labiche, directed to Charbonneau, who conveyed them to Sacagawea in Hidatsa, who rephrased them in her native language to a young Shoshone who was with the tribe and could speak Salishan - "a gugling kind of languaje," noticed Clark, "Spoken much thro the Throught." Answers followed the same route in reverse.

To further illustrate this phenomenon, the following prose shows how Alta Vista's Babelfish translates this paper's abstract into French, and then retranslates the result into English.

This paper examines a set of three patterns that addresses the problems one encounters when multiple representations and languages emerge in some domain. When freely CONVERTIBLE CURRENCIES are available, data are not isolated and trapped by incompatable representations. Where a ENGLISH ONLY/STANDARD REPRESENTATION can be established, communication is simplified. However, this generality is not without its cost. A LINGUA FRANCA addresses the problem posed when representations proliferate. If a standard secondary tongue is adopted, a much smaller number of translators or converters is required to ensure universal translation.

Cet article examine un ensemble de trois configurations qui adresse les problèmes un rencontre quand les representations multiples et les langages émergent dans un certain domaine. Quand librement les DEVISES de CONVERTIBLE sont disponibles, des données ne sont pas isolées et ne sont pas emprisonnées par les représentations incompatable. Là où une REPRÉSENTATION ANGLAISE d'cOnly/standard peut être établie, la transmission est simplifiée. Cependant, cette généralité n'est pas sans son coût. Une LINGUA-FRANCA adresse le problème a posé quand les représentations prolifèrent. Si une langue secondaire standard est adoptée, un plus petit nombre de traducteurs ou de convertisseurs est exigé pour assurer la traduction universelle.

This article examines a whole of three configurations which addresses the problems a meeting when the multiple representations and the languages emergent in a certain field. When freely the CURRENCIES of CONVERTIBLE are available, from the data are not isolated and are not imprisoned by the representations incompatable. Where an ENGLISH REPRESENTATION of ONLY/STANDARD can be established, the transmission is simplified. However, this general information is not without its cost. A LINGUA-FRANCA addresses the problem posed when the representations proliferate. If a standard secondary language is adopted, a smaller number of translators or converters is required to ensure the universal translation.


Currency Converter pages abound on the web. For example:

Instructions: You can view an exchange rate on any day from 1 January 1990 to the present. Type the amount you wish to exchange. Choose the desired currencies using the scroll bars to see more currencies. Choose a date. Push the button 'See the exchange rate' to see the results.

164 Currency Converter ©1997-1998 by OANDA


v v v

Contrast ENGLISH ONLY/STANDARD REPRESENTATION and LINGUA FRANCA.

Contrast ADAPTER too.

The OBSERVER pattern can be used to implement mechanisms to enforce mutual consistency among multiple views of a set of objects.

Free convertibility among different subjects, guises, facets, roles, aspects, extensions, "unknowns", perspectives, or views can counteract the tendency to build monolithic, one-size-fits-all objects, and encourage the emergence of better factored architectures that exploit finer-grained, convertible parts using representations that fit the task at hand.

Oman

LINGUA FRANCA

alias
ENGLISH

President Robert Mugabe of Zimbabwe addresses the Security Council

v v v

Programs that are separated physically or temporally need to communicate information between them.

Otherwise compatible data are stored in a variety of different formats. Converting among them can require, in the worst case, a conversion program for each pair of formats. One problem that is often seen when representations proliferate is that the number of converters needed to convert among them can increase with the square of the number of representations.

Therefore, allow one format to serve as a universal second language, a lingua franca. Then conversion between arbitrary formats will require only two conversions: one from the source format to the lingua franca, and one from the lingua franca to the target format.
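The shape of this solution can be sketched in Python. Everything named here is an assumption made for illustration: the hub representation is a list of rows of strings, the four "formats" are naive separator-delimited text with no quoting or escaping, and make_codec, CODECS, and convert are hypothetical names.

```python
# Hub representation (an assumption for this sketch): a list of rows,
# each row a list of strings.

def make_codec(sep):
    """Build a (to_hub, from_hub) pair for a naive separator-delimited text format."""
    def to_hub(text):
        return [line.split(sep) for line in text.splitlines()]

    def from_hub(rows):
        return "\n".join(sep.join(row) for row in rows)

    return to_hub, from_hub


# One converter pair per format, rather than one converter per pair of formats.
CODECS = {
    "csv": make_codec(","),
    "tsv": make_codec("\t"),
    "pipe": make_codec("|"),
    "semicolon": make_codec(";"),
}


def convert(text, source, target):
    """Convert between any two registered formats in two steps via the hub."""
    to_hub, _ = CODECS[source]
    _, from_hub = CODECS[target]
    return from_hub(to_hub(text))


converted = convert("USD,DEM\n1,1.77", "csv", "pipe")
print(converted)

hub_converters = 2 * len(CODECS)                        # to-hub and from-hub per format
pairwise_converters = len(CODECS) * (len(CODECS) - 1)   # one per ordered pair of formats
print(hub_converters, pairwise_converters)
```

Adding a fifth format to this registry means writing one new pair of functions. With direct pairwise converters it would mean writing eight.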

Just as with currency conversions, there are costs associated with changing representations. Conversions take time. In this approach, two conversions are usually needed, one to convert to the standard representation and one to convert from the standard representation. If these are extremely time consuming, this can be a serious drawback. Another force that impacts this solution is lossiness. If the translations to and from the standard representation lose information (e.g., GIF->JPEG conversion), then this solution may not be acceptable. For instance, consider the case where the standard representation is JPEG (a poor choice). If we want to convert from GIF to TIFF, we must convert from GIF to JPEG and then JPEG to TIFF. However, each of these conversions degrades the image slightly. If this series of conversions occurs many times, the image will get increasingly corrupted.

Examples abound.

Word processors usually provide mechanisms for importing and exporting a variety of formats besides their native formats.

The Portable Bitmap Tools or Portable Bitmap Utilities [cite Jef Poskanzer] that come with the X Window System are one example. PBM format is widely used as a lingua franca.

Byte Code or other machine independent code representations are examples too, in the sense that multiple languages can be translated to these representations, and a single native code generator that translates from this representation to native code can be written for each native instruction set or operating system.

Byte code representations serve as a lingua franca that stands between compilers and machines. They have a long history. Kay recalls that he first saw them in Euler. Smalltalk has used bytecode to represent executable methods since the mid '70s. UCSD Pascal also used machine independent byte code. The phenomenal success of the Java language has led to a resurgence of interest in byte code. Indeed, it is because byte code serves as a lingua franca that stands between the compiler and the sundry platforms on which Java is to run that its write-once/run-anywhere promise can be made. When virtual machine instruction sets are well-designed, they can support multiple languages as well as multiple platforms.


The table below illustrates the problem. To convert among all the currencies in question, a conversion between each pair is needed. For the 10 currencies shown, the number of conversions needed is N^2-N (discounting the identity conversions).

Currency               Code USD    DEM    CNY    BEF    GBP    IEP    MYR    OMR    RUB    CHF
US Dollar              USD  1      1.77   8.28   36.60  0.61   0.7    3.86   0.385  6.14   1.48
German Mark            DEM  0.56   1      4.67   20.64  0.35   0.4    2.17   0.217  3.46   0.84
Chinese Yuan Renminbi  CNY  0.12   0.21   1      4.422  0.07   0.1    0.47   0.047  0.74   0.18
Belgian Franc          BEF  0.03   0.05   0.23   1      0.02   0      0.11   0.011  0.17   0.04
British Pound          GBP  1.63   2.9    13.5   59.77  1      1.2    6.3    0.629  10     2.42
Irish Punt             IEP  1.42   2.52   11.7   51.92  0.87   1      5.47   0.546  8.71   2.1
Malaysian Ringgit      MYR  0.26   0.46   2.15   9.495  0.16   0.2    1      0.1    1.59   0.38
Omani Rial             OMR  2.6    4.61   21.5   95.07  1.59   1.8    10     1      15.9   3.85
Russian Rouble         RUB  0.16   0.29   1.35   5.963  0.1    0.1    0.63   0.063  1      0.24
Swiss Franc            CHF  0.68   1.2    5.59   24.71  0.41   0.5    2.6    0.26   4.14   1

Currency               Code USD    DEM    CNY    BEF    GBP    IEP    MYR    OMR    RUB    CHF
US Dollar              USD  1      1.77   8.28   36.60  0.61   0.7    3.86   0.385  6.14   1.48
German Mark            DEM  0.56   ---    ---    ---    ---    ---    ---    ---    ---    ---
Chinese Yuan Renminbi  CNY  0.12   ---    ---    ---    ---    ---    ---    ---    ---    ---
Belgian Franc          BEF  0.03   ---    ---    ---    ---    ---    ---    ---    ---    ---
British Pound          GBP  1.63   ---    ---    ---    ---    ---    ---    ---    ---    ---
Irish Punt             IEP  1.42   ---    ---    ---    ---    ---    ---    ---    ---    ---
Malaysian Ringgit      MYR  0.26   ---    ---    ---    ---    ---    ---    ---    ---    ---
Omani Rial             OMR  2.6    ---    ---    ---    ---    ---    ---    ---    ---    ---
Russian Rouble         RUB  0.16   ---    ---    ---    ---    ---    ---    ---    ---    ---
Swiss Franc            CHF  0.68   ---    ---    ---    ---    ---    ---    ---    ---    ---

The table above illustrates how this problem is simplified when one representation (in this case, the dollar (USD)) becomes the lingua franca. Here, the number of conversions needed becomes 2*N-2 (again discounting the identity conversion), or 18 rather than 90 for the 10 currencies shown.

Of course, each time a full conversion is needed, two conversions need to be done. One converts from the first currency into dollars, and the second converts from dollars to the second currency. The time and complexity involved in performing two conversions are traded off against the simplicity of providing a smaller number of converters.
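The two-step conversion can be sketched in Python using a few of the rates from the USD row and USD column of the tables above. The dictionary and function names are hypothetical, and only four of the ten currencies are included to keep the sketch short.

```python
# Rates copied from the USD row and USD column of the tables above.
FROM_USD = {"USD": 1, "DEM": 1.77, "GBP": 0.61, "CHF": 1.48}   # 1 USD in each currency
TO_USD = {"USD": 1, "DEM": 0.56, "GBP": 1.63, "CHF": 0.68}     # 1 unit of each in USD


def convert(amount, source, target):
    """Convert in two steps: source currency to dollars, then dollars to target."""
    dollars = amount * TO_USD[source]
    return dollars * FROM_USD[target]


via_usd = convert(100, "DEM", "GBP")
direct = 100 * 0.35   # the direct DEM-to-GBP rate from the full table
print(round(via_usd, 2), round(direct, 2))
```

Converting 100 marks to pounds by way of dollars gives roughly 34.16, while the direct table entry gives 35. The gap comes from the rounding already present in the published rates, and it shows in miniature how a trip through an intermediate representation can compound small losses.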

On one level, the example above is artificially simple. Maintaining a fully stocked table of numeric conversions among all the currencies listed is an easy programming task. However, in general, format conversions will be more complex. For that matter, anyone who thinks real world currency conversion is a simple problem hasn't traveled very much.


Simultaneous Translation

v v v

The term lingua franca comes from an Italian phrase for "Frankish language". The term is often associated with the world of diplomacy. It is something of an anachronism. At one time Latin and Greek played this role. These days, English has assumed the role of the lingua franca in many parts of the world. French is still widely spoken in parts of Africa, Canada, the South Pacific and the Caribbean, and, of course, in France, where it is used exclusively. Its role as a lingua franca, or second language of choice, is seen most often in francophone Africa.

Contrast CONVERTIBLE CURRENCIES and ENGLISH ONLY/STANDARD REPRESENTATION.

Discussion

Interestingly, many of the forces that drive the emergence of standard and non-standard representations resemble those that drive biological speciation. Analogs to geographic separation, convergent evolution, niches, divergent needs, and specialization can all be seen. It is not always obvious how we should distinguish priceless, irreplaceable info-diversity from pointless blind alleys for which the best requiem is a chorus of "good riddance".

The forces that drive representations towards oligarchy and monopoly, and the forces that drive them to Balkanization and Babel, are in constant tension. A common tongue promotes simplicity, harmony and community. Yet, an obsolete or dead language is of no use to anyone. A standard that is inadequate to address the needs of its users, and promotes incompatible extensions (e.g. Pascal's lack of string support?), can lead to a din of diverging dialects, which must ultimately be reconciled. When can one size fit all? It is essential that lines of communication be kept open, translations maintained, and that the need for a lingua franca be kept in mind as simple domains grow more rich and complex.

While in an ideal world, a minimal number of representations for data might be needed, in ours, this is seldom the case. When a wide range of representations is present, one (or more) will frequently emerge as a lingua franca. Such a common format permits the benefits of focused, domain-specific representations to be retained, while providing a path from such representations to a wealth of others.

Acknowledgments

We are soon to be eternally indebted, as usual, to the members of the University of Illinois Patterns Group, John Brant, Ian Chai, Ralph Johnson, Lewis Muir, Dragos Manolescu, Brian Marick, Eiji Nabika, and Your Name Here, who will soon have to endure several similarly raw drafts of this work.

Dragos Manolescu has looked at several drafts of these patterns, and suggested a number of improvements.

Molly Lojo proofread an early version of this paper and provided valuable corrections and suggestions. We are grateful to OANDA for making it easy to embed their currency converters in third-party web pages.

References


[Alexander 1979]
Christopher Alexander
The Timeless Way of Building
Oxford University Press, Oxford, UK, 1979
http://www.oup-usa.org/

[Alexander et al. 1977]
C. Alexander, S. Ishikawa, and M. Silverstein
A Pattern Language
Oxford University Press, Oxford, UK, 1977
http://www.oup-usa.org/

[Coplien 1995]
James O. Coplien
A Generative Development-Process Pattern Language
First Conference on Pattern Languages of Programs (PLoP '94)
Monticello, Illinois, August 1994
Pattern Languages of Program Design
edited by James O. Coplien and Douglas C. Schmidt
Addison-Wesley, 1995

[Cunningham 1995]
Ward Cunningham
The CHECKS Pattern Language of Information Integrity
First Conference on Pattern Languages of Programs (PLoP '94)
Monticello, Illinois, August 1994
Pattern Languages of Program Design
edited by James O. Coplien and Douglas C. Schmidt
Addison-Wesley, 1995

[Foote & Opdyke 1995]
Brian Foote and William F. Opdyke
Lifecycle and Refactoring Patterns that Support Evolution and Reuse
First Conference on Pattern Languages of Programs (PLoP '94)
Monticello, Illinois, August 1994
Pattern Languages of Program Design
edited by James O. Coplien and Douglas C. Schmidt
Addison-Wesley, 1995

This volume is part of the Addison-Wesley Software Patterns Series.

[Foote & Yoder 1996]
Brian Foote and Joseph W. Yoder
Evolution, Architecture, and Metamorphosis
Second Conference on Pattern Languages of Programs (PLoP '95)
Monticello, Illinois, September 1995
Pattern Languages of Program Design 2
edited by John M. Vlissides, James O. Coplien, and Norman L. Kerth
Addison-Wesley, 1996
This volume is part of the Addison-Wesley Software Patterns Series.

[Gamma et al. 1995]
Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
Design Patterns: Elements of Reusable Object-Oriented Software
Addison-Wesley Longman, Reading, MA, 1995

[Roberts & Johnson 1996]
Don Roberts and Ralph E. Johnson
Evolve Frameworks into Domain-Specific Languages
Third Conference on Pattern Languages of Programs (PLoP '96)
Monticello, Illinois, September 1996
Pattern Languages of Program Design 3
edited by Robert Martin, Dirk Riehle, and Frank Buschmann
Addison-Wesley, 1997


Brian Foote foote@cs.uiuc.edu
Last Modified: 11 June 1998