Metadata

and

Active Object-Models

 

Brian Foote

Joseph Yoder

 

Department of Computer Science

University of Illinois at Urbana-Champaign

1304 W. Springfield

Urbana, IL  61801

The Refactory, Inc.

209 W. Iowa

Urbana, IL  61801  USA

 

foote@refactory.com  (217) 328-3523

yoder@refactory.com  (217) 244-4695

 

Wednesday, 29 August 2001

 

Tuesday - August 04, 1998

 

 

Abstract

 

A number of forces shape the way in which software evolves.  One is a desire to make programs as general as possible.  Another is to push configuration decisions out into the data.  Yet another is to push them out onto the users.  Still another is to defer such decisions until runtime. 

 

The patterns herein explore how complexity migrates from the code to the data as systems mature.  As the data become more sophisticated, the power that can, in turn, be brought to bear upon them at runtime increases.

 

This paper presents several patterns from a larger, emerging pattern language:  DATA, METADATA, PROPERTY, SMART VARIABLE, SCHEMA, and ACTIVE OBJECT-MODEL.  It focuses on PROPERTIES, and observes that three distinct intents underlie what have commonly been called "properties".

 


 

Introduction

 

 


 

Inside every domain-specific framework, there is a language crying to get out.

Thomas Jay Peckish II

 

A number of forces shape the way in which software evolves.  One is a desire to make programs as reusable as possible.  Another is to push configuration decisions out into the data.  Yet another is to push such decisions out onto the users.  Still another is to defer these decisions until runtime. 

 

Data themselves become more universal and reusable when they are accompanied by descriptions of themselves that let other programs make sense of them.  They can become even more independent when they are accompanied in their travels by code. 

 

The patterns in our emerging pattern language begin to chronicle how domain-specific languages emerge as programs evolve.  A program may begin simply, performing but a single task.  Later, programmers may broaden its utility by adding options and parameters.  When more configuration information is needed, separate configuration files may emerge.  As these become more complex, entries in these files may be connected to entities in the program using properties, dynamic variables, and dialogs.  Simple values may not suffice.  Once properties and dynamic values are present, simple parsers and expression analyzers often are added to the system.  This, in turn, creates a temptation to add control structures, assignment, and looping facilities to the functional vocabulary provided by the expression analyzer.  These can flower into full-blown scripting facilities.

 

After a while, the domain or business objects come to constitute a program of sorts, which can be dynamically constructed and manipulated by users themselves.  During this evolutionary process, descriptions of the data, such as maps of the layouts of data objects, and references to methods or code, are needed to permit these heretofore anonymous capabilities to be accessible during runtime.  These descriptions allow these objects to be composed, edited, stored, imported, exported, and (these are programs, after all) debugged. 

 

As this evolutionary process unfolds, and the architecture of a system matures, knowledge about the domain becomes embodied more and more by the relationships among the objects that model the domain, and less and less by logic hardwired into the code.  Objects in such an ACTIVE OBJECT-MODEL are subject to runtime configuration and manipulation like any other data.  Changes to this runtime constellation of objects constitute changes to the model, and to the operations that traverse or interpret it.

 

Data that describe other data, rather than aspects of the application domain itself, are called metadata.  Naturally, these layout and code descriptions should be objects too.  Hence, metadata have metadata as well.

 

A successful application inevitably draws a crowd.  A host of users on a host of hosts will want to use such a program, and the data that go with it.  It is important that data produced by one copy of the program be usable by other users at other sites.  Such data might reside in a shared or distributed repository such as a database or persistent object base.  They might also migrate across a network, via wires, satellites, fibers, radio waves, and even diskettes or tapes. 

 

It is important, too, that these data be accessible not only from copies of the applications that spawned them.  Other programs must be able to deal with them as well.  When such data are mere "punch card images", or undifferentiated byte streams, this is hard to do.  However, when data are escorted by machine readable descriptions of what they mean, they become welcome in a wider range of processing venues.

 

Our story, then, is about how data earn their wings.  It chronicles the forces that drive data to become more general.  It describes their ascent from digits on punch cards, to lines on data files, and bytes in streams, through structures, and on through their marriage to behaviors, which begot objects.  It continues as the need to describe these objects incubates self-descriptions, which themselves are cast as objects, which, in turn, allow objects to aspire to escape the processes and images in which they were trapped, and roam unencumbered across the network.

 

The drive to become more general begins modestly.  A simple application may acquire command line switches and parameters, to allow its behavior to vary, or permit additional input streams to be specified.  As a program becomes yet more general, additional configuration information may be needed.  This information may be complex, and may even be provided interactively, by end users.  Simple, textual interfaces may yield to graphical user interfaces, which themselves may grow more powerful, and, alas, more complicated.

 

As an object-oriented application evolves, the elements of an object-oriented framework emerge.  Where raw, undifferentiated, white-box code once was, dynamically pluggable black-box components begin to appear.  Internal structure, which was once haphazard, becomes better differentiated, and more refined.

 

As such a framework evolves, these elements themselves, together with the protocols and interfaces they expose, come to constitute a domain-specific language for the framework's target domain. 

 

Often, something else happens as well.  The configuration user interface and tools grow more powerful too, so as to expose more and more flexibility and power to the users.  At first, simple parameters are exposed.  Later, expressions and simple logical rules may be proffered.  Finally, control structures might emerge, and the full power of this emerging language is exposed to the user.  Users may be offered existing behaviors, or new behaviors might be added using scripts which might be interpreted, or even compiled at runtime.  Editors emerge that allow users to directly manipulate the objects that constitute their "programs".

 

This story might have a familiar ring to those readers who have followed the research done over the years into reflection and metalevel architectures.  Of course, the reflection literature has earned its recondite reputation the hard way (that is, through unrepentant abstruseness).  Our tale might be seen as an attempt to render their Finnegans Wake as, if not a Mother Goose tale, at least a trip Through the Looking Glass.

 

The patterns in this paper are part of a larger pattern language that we are writing.  We currently envision a language that will include the following patterns. 

 

The patterns included in this OOPSLA '98 version of this work are shown in bold:

 

The METADATA patterns in this collection can be broken down into the following categories:

 

1.      DATA

2.      METADATA

 

Patterns that arise from pushing decisions out onto the user:

 

3.      PARAMETERIZATION

4.      CONFIGURATION

5.      EXPRESSIONS

6.      SCRIPTS

7.      DIALOGS

8.      TABLES

 

Patterns that arise as a domain-specific language emerges:

 

9.      PROPERTY

10.   SMART VARIABLES

11.   SCHEMA / DESCRIPTOR

12.   ACTIVE OBJECT-MODEL

13.   SPECS

14.   MESSAGE ROUTING

15.   CONTEXT

16.   NAMESPACES

17.   EDITOR

18.   VISUAL BUILDER

19.   DYNAMIC VALIDATION

20.   HISTORY

21.   VALUE HOLDER / SMART VALUES

 

Patterns that become relevant as data become "self aware" (or more reflective):

 

22.   METACLASS

23.   IDEMPOTENCE

24.   SYNTHETIC CODE

25.   CODE AS DATA

26.   CAUSAL CONNECTION

27.   BOOTSTRAPPING

 

Global Forces

 

A variety of forces impinge upon evolving systems.  Some of them pervade the pattern languages below, and are enumerated here to avoid duplication:

 

Portability: When an artifact works with a variety of applications, on a variety of platforms, it is more likely to be reused.

 

Efficiency: Highly dynamic systems can be inimical to efficiency.  However, efficiency is often a false idol.  For instance, referencing an object in a remote database may be several orders of magnitude more expensive than accessing a local object, and such overhead may overwhelm secondary concerns, such as the cost of accessors vs. direct variable references.

 

Complexity: Complex data structures and code are hard to debug and comprehend.  Alas, many programmers are better at creating complexity than simplicity.

 

Dynamism: Interactive programming environments, visual builders and debuggers, and distributed applications all benefit from a more dynamic approach to software system architecture. 

 

Dynamism can be dangerous, though.  More dynamic systems can be harder to debug, maintain, and understand.  One wouldn't let a child learn to ride a bicycle on a busy highway.

 

Resources: Dynamic strategies can be costly in terms of space, processing time, secondary storage, etc.

 

Safety: Dynamic strategies allow users to circumvent and undermine compile-time safeguards.

 

Flexibility: A program should be versatile, and usable in a variety of contexts.  This, in turn, enhances:

 

Reusability: A versatile, flexible application, or, for that matter, a code-level artifact, should be as reusable as possible.  The reuse of such code avoids duplicated effort, eases the learning and comprehension burden of new programmers, and makes maintenance easier, since multiple, redundant copies of essentially the same code need not be maintained. 

 

Adaptability: It is essential that an artifact be flexible enough to confront and address changing requirements.  We distinguish several "shades" of adaptability.

 

Maintainability: It is important that an artifact be maintainable enough to confront and address changing requirements.  Code that can't be worked on will lapse into stagnation.

 

Tailorability: One size does not fit all.  Often, an artifact will not fit the needs of a particular user "off the rack", but can be tailored to do so when certain "alterations" can be made.

 

Customizability: Just as an artifact can be tailored to a particular user or users, it can be customized to adapt it better for a particular task.  This may seem at first to be a lot like tailorability, but we find it useful to distinguish between forces for change that emanate from individual users and those that arise from taking on different tasks.

 

Pushing Complexity into the Data: When complexity is pushed into the data, it can be coped with dynamically, at runtime.  Configuration information can travel with the data, rather than being locked up in explicit code.

 

Pushing Configuration Decisions out onto the User: As a framework evolves, more and more configuration decisions are pushed out onto the user.  Users become programmers of sorts.  The trick, of course, is not to force them to be general purpose programmers.  They don’t have the training for this, and would fear that their social lives would be ruined.  And, real programmers would be out of jobs. 

 

Autonomy/Mobility: Once behavior and data, together with their descriptions, are liberated from application code, they can travel independently of these applications, and be used in a wider range of programs, on a wider range of platforms.

 

Comprehensibility: Metadata helps to document its associated data.  Indeed, data files with metadata in them were often referred to as “self-documenting” data files during the ‘70s.  Of course, the opposite can be true as well.

 


 

DATA

 

also known as

INPUT

OUTPUT

 

v v v

 

How do you allow your program to compute more than one result?

 

For the most part, a program that computes the same result every time is of little use.

 

Therefore, arrange for data to be read into and written out of your program.

 

The distinction between program and data is so familiar that it hardly needs to be made.  Still, it will become useful when we discuss metadata and self-modifying code, so we begin our story here.

 

Data are bits.  Bits, in and of themselves, are meaningless.  Data, therefore, are not merely disembodied bits.  Data are about something.  Data represent something.  Bits, whether parceled into masks, bytes, words, arrays, or structures, are symbols that represent relevant notions drawn from the application domain.

 

Random bits are about nothing.  Barring a vanishingly small, monkey-at-a-typewriter fluke, a processor almost certainly can't execute very many of them as non-trivial instructions.  They will make little sense when interpreted as complex data, but can (to the consternation of programmers tracking down uninitialized memory problems) masquerade as simple data on occasion.  On those rare occasions where "random bits" are used to simulate stochastic processes, they are data.  Of course, such bits must be chosen carefully, since generating genuine randomness is a chore to which computers are particularly ill-suited.

 

Traditionally, data have come in two fundamental guises, text and binary.  Text, in turn, was strung together as strings and streams.  Text is interpreted in terms of standard character sets, and has traditionally carved memory into 5, 6, 7, 8, 16 or 32-bit chunks.  There are fewer such restrictions with binary.

 

One good thing about binary data is that if you can get back the same bits you had the last time you ran, you can pick up where you left off.  This may seem obvious, but it is actually an important principle.  As long as data are read back in the same order in which they were written out, it is possible to get the state of memory back to the way it was when the data were written in the first place.  Every beginning programmer learns this "trick", and it is the cornerstone of truly traditional (> nine months) streamed data processing. 
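
As a minimal sketch of this write-then-read-back discipline, the following Java fragment streams the state of an object out through a DataOutputStream and restores it by reading the fields back in the same order.  The class Reading, its fields, and the helper names toBytes() and fromBytes() are hypothetical and appear nowhere in the pattern itself; only the java.io stream classes are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; its name and fields are illustrative only.
class Reading {
    final int channel;
    final double value;
    final String label;

    Reading(int channel, double value, String label) {
        this.channel = channel;
        this.value = value;
        this.label = label;
    }

    // Write the fields in a fixed order.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(channel);
        out.writeDouble(value);
        out.writeUTF(label);
    }

    // Read the fields back in exactly the order in which they were written.
    static Reading readFrom(DataInputStream in) throws IOException {
        int channel = in.readInt();
        double value = in.readDouble();
        String label = in.readUTF();
        return new Reading(channel, value, label);
    }

    // Snapshot the object's state as a byte array.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restore an object from a snapshot produced by toBytes().
    static Reading fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readFrom(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in the stream says what the bytes mean.  The reader recovers the state only because it shares the writer's knowledge of the field order, which is precisely the limitation that METADATA addresses.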

 

Complications such as byte order, word size, and floating point formats enter the discussion quickly, though.  More complex objects require more complex layout schemes.

 

This approach should have a name.  Is it CLASSIC STREAMING, or SNAPSHOTTING?  Is it two classic patterns?  It is an example of a broad principle from which the MEMENTO pattern is derived.

 

v v v

 

 


 

METADATA

 

also known as

SELF-DESCRIBING DATA

DESCRIPTORS

DATA ABOUT THE DATA

MANIFEST

 

v v v

 

How do you avoid the limitations of fixed-format data?

 

Data are about what the program does.

 

The data in an airline reservation are about travel plans.  These are data.  The data that steer the program, and the program itself, are about the program.  These are metadata.  Of course, the data steer the program too. 

 

You know what?  This is wrong.  The programs themselves are about manipulating travel plans.  The objects that represent the programs are about the program.  It's subtle, but it's crucial.

 

Of course, this is a fuzzy boundary to draw.  Are the bits that represent the algorithm for making seat assignments about air travel, or are they about how to perform a computation?

 

Every program has its expectations as to how it wants its data laid out.  When no descriptive data is present, this "fixed format" must be adhered to by any data set that wishes to work with a particular program.

 

Therefore, arrange for a description of the data to accompany your data wherever they go.

 

A better solution is to let data carry around with them a description of how they are laid out.  Subsequent patterns describe how to do this.

 

You can embed extra flags, descriptive information, and other interpretable information in your data stream.  For instance, a program that operates on two-dimensional time series data might break it up into chunks, and precede each chunk with a count of the number of channels and points in each chunk.  A program that reads such data would read the metadata before each chunk, and then read the data. 
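
The time series example might be sketched in Java as follows.  The header layout (two 32-bit counts, channels first) and the names ChunkedSeries, writeChunk(), readChunk(), encode(), and decode() are assumptions made for illustration, not a format defined by this pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; the header layout (two 32-bit counts) is an assumption.
class ChunkedSeries {
    // Write one chunk: the metadata (channel and point counts) precede the samples.
    static void writeChunk(DataOutputStream out, double[][] samples) throws IOException {
        int channels = samples.length;
        int points = channels == 0 ? 0 : samples[0].length;
        for (double[] channel : samples) {
            if (channel.length != points) {
                throw new IllegalArgumentException("every channel needs " + points + " points");
            }
        }
        out.writeInt(channels);
        out.writeInt(points);
        for (double[] channel : samples) {
            for (double sample : channel) {
                out.writeDouble(sample);
            }
        }
    }

    // Read one chunk: the metadata tell the reader how much data follows.
    static double[][] readChunk(DataInputStream in) throws IOException {
        int channels = in.readInt();
        int points = in.readInt();
        double[][] samples = new double[channels][points];
        for (int c = 0; c < channels; c++) {
            for (int p = 0; p < points; p++) {
                samples[c][p] = in.readDouble();
            }
        }
        return samples;
    }

    // Encode a single chunk into a byte array.
    static byte[] encode(double[][] samples) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeChunk(out, samples);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decode a single chunk produced by encode().
    static double[][] decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readChunk(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each chunk announces its own dimensions, a reader written this way needs no compiled-in knowledge of how many channels or points a particular data set contains.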

 

A simpler example is a binary stream broken into chunks where each chunk is preceded by its size in bytes.  Another example is the frequently seen scheme that prepends a signature word to each chunk, both for initial identification, and for byte-order identification.
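
A sketch of the signature word scheme, under stated assumptions: the signature value (the ASCII bytes of "META"), the frame layout (signature, size, payload), and the names SignedChunk, frame(), detectOrder(), and unframe() are all hypothetical.  Only java.nio.ByteBuffer and ByteOrder are standard library classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical framing helper; the layout is: signature word, payload size, payload.
class SignedChunk {
    // Assumed signature: the ASCII bytes "META".  Any word that differs from
    // its own byte-reversed form would serve.
    static final int SIGNATURE = 0x4D455441;

    // Frame a payload using the writer's native byte order.
    static byte[] frame(byte[] payload, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(8 + payload.length).order(order);
        buffer.putInt(SIGNATURE).putInt(payload.length).put(payload);
        return buffer.array();
    }

    // Infer the writer's byte order from how the signature word reads.
    static ByteOrder detectOrder(byte[] framed) {
        int word = ByteBuffer.wrap(framed).order(ByteOrder.BIG_ENDIAN).getInt(0);
        if (word == SIGNATURE) {
            return ByteOrder.BIG_ENDIAN;
        }
        if (Integer.reverseBytes(word) == SIGNATURE) {
            return ByteOrder.LITTLE_ENDIAN;
        }
        throw new IllegalArgumentException("unrecognized signature word");
    }

    // Recover the payload, reading the size field in the detected byte order.
    static byte[] unframe(byte[] framed) {
        ByteBuffer buffer = ByteBuffer.wrap(framed).order(detectOrder(framed));
        buffer.getInt();                 // skip the signature word
        int size = buffer.getInt();      // payload size in bytes
        byte[] payload = new byte[size];
        buffer.get(payload);
        return payload;
    }
}
```

The signature does double duty here: it identifies the chunk, and the way it reads tells the consumer whether the multi-byte fields that follow need to be swapped.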

 

v v v

 

Metadata can have schemas too.

 


 

PROPERTY

 

also known as

ATTRIBUTES

ANNOTATIONS

DYNAMIC ATTRIBUTES

DYNAMIC VARIABLES

VARIABLE STATE

DYNAMIC SLOTS

PROPERTY LIST

 

v v v

 

How do you allow individual objects to augment their state at runtime?

 

Imagine a system in which objects that track the assembly of products in a manufacturing shop are themselves routed through this system.  The original designs for these objects might have focused on concerns such as part numbers and inventory information.  New requirements might dictate that certain objects have a manufacturing routing slip attached to them as they move through the system.  The original system made no provisions for such attachments.  One way to address this problem might be to add a new field for these routing slip attachments.  However, there are several problems associated with this approach.  One is that only a handful of instances will ever need such attachments, while the overhead cost for this field will be paid by every product object in the system.  Another is that there may be a variety of these attachments.  For instance, some products might have timestamp annotations made as they pass certain stations.  We could add fields for all such annotations, but the costs and complexity would escalate rapidly.  What we really want is a way to add a new variable to any object on-the-fly.

 

The following minimal set of operations on properties will usually be present in some form:

 

void addProperty(Indicator name, Descriptor descriptor, Object value);

void removeProperty(Indicator name);

boolean hasProperty(Indicator name);

void setProperty(Indicator name, Object value);

Object getProperty(Indicator name);

 

The hasProperty() operation will be either explicitly or implicitly present.  When it is not explicitly present, a distinguished value such as Property.ABSENT might be returned by getProperty() and setProperty() to indicate the absence of a property, or an exception might be generated. 
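
One way these five operations might be realized is sketched below.  To keep it short, the sketch assumes plain String indicators and omits Descriptors; the class name PropertyHolder and the choice to throw IllegalStateException from addProperty() and setProperty() are illustrative assumptions, and ABSENT stands in for the Property.ABSENT value mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical owner of a property dictionary; Strings stand in for Indicators.
class PropertyHolder {
    // Distinguished value returned when a property is missing
    // (stands in for Property.ABSENT).
    static final Object ABSENT = new Object();

    private final Map<String, Object> properties = new HashMap<>();

    // Create a new property; adding the same indicator twice is an error here.
    void addProperty(String name, Object value) {
        if (properties.containsKey(name)) {
            throw new IllegalStateException("property already present: " + name);
        }
        properties.put(name, value);
    }

    void removeProperty(String name) {
        properties.remove(name);
    }

    boolean hasProperty(String name) {
        return properties.containsKey(name);
    }

    // In this variant a property must be added before it can be set.
    void setProperty(String name, Object value) {
        if (!properties.containsKey(name)) {
            throw new IllegalStateException("no such property: " + name);
        }
        properties.put(name, value);
    }

    // Returns ABSENT rather than null, so null remains a legal property value.
    Object getProperty(String name) {
        return properties.containsKey(name) ? properties.get(name) : ABSENT;
    }
}
```

A client that attaches a routing slip to a single product object pays for the dictionary entry on that object alone; every other instance carries only an empty map.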

 

Some implementations don't provide an explicit addProperty() operation, and allow the first call to setProperty() to create a new property instead.  This is often the case when property Attributes are not present.

 

Similarly, the removeProperty() operation can be dispensed with by providing for removal of a property when a designated value is assigned to it, such as Property.REMOVE.  This value, naturally, must be one that need never be the value of a Property.
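
The two-operation variant can be sketched as follows.  The class name CompactProperties and the use of String indicators are assumptions made for illustration; REMOVE and ABSENT stand in for the Property.REMOVE and Property.ABSENT values described in the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-operation variant: no addProperty() or removeProperty().
class CompactProperties {
    // Returned by getProperty() when no such property exists.
    static final Object ABSENT = new Object();
    // Assigning this value removes the property; it can never be stored.
    static final Object REMOVE = new Object();

    private final Map<String, Object> properties = new HashMap<>();

    // The first assignment creates the property; assigning REMOVE deletes it.
    void setProperty(String name, Object value) {
        if (value == REMOVE) {
            properties.remove(name);
        } else {
            properties.put(name, value);
        }
    }

    Object getProperty(String name) {
        return properties.containsKey(name) ? properties.get(name) : ABSENT;
    }
}
```

Because both sentinels are private-identity objects compared with ==, no ordinary value a client supplies can collide with them.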

 

One or more of the following additional operations might be present in some form as well:

 

Descriptor getDescription(Indicator name);

Descriptor[] getDescriptors();

Object[] propertyList();

 

The role, if any, of the Descriptor objects will vary depending upon the language and implementation strategy used.  In dynamically typed languages such as CLOS, Smalltalk, or Self, they may not be present at all.  In languages such as C++, Java, and C, minimal type information might be used to indicate how different property values should be downcast.  Descriptors are also used by tools such as editors, visual builders, and debuggers.
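
In Java, a Descriptor that carries minimal type information might look like the sketch below.  The generic Descriptor class, its displayName and type fields, the typed getProperty() overload, and the use of String indicators are assumptions made for this illustration; the pattern does not prescribe them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical descriptor: a display name plus the runtime type of the value.
class Descriptor<T> {
    final String displayName;
    final Class<T> type;

    Descriptor(String displayName, Class<T> type) {
        this.displayName = displayName;
        this.type = type;
    }
}

// Hypothetical property owner that keeps a descriptor alongside each value.
class DescribedProperties {
    private final Map<String, Descriptor<?>> descriptors = new HashMap<>();
    private final Map<String, Object> values = new HashMap<>();

    <T> void addProperty(String name, Descriptor<T> descriptor, T value) {
        descriptors.put(name, descriptor);
        values.put(name, value);
    }

    // Tools such as editors can ask for the description without the value.
    Descriptor<?> getDescription(String name) {
        return descriptors.get(name);
    }

    // Typed read: the descriptor's type information guards the downcast.
    <T> T getProperty(String name, Class<T> expected) {
        Descriptor<?> descriptor = descriptors.get(name);
        if (descriptor == null) {
            throw new IllegalArgumentException("no such property: " + name);
        }
        if (!expected.isAssignableFrom(descriptor.type)) {
            throw new ClassCastException(name + " is described as "
                    + descriptor.type.getName() + ", not " + expected.getName());
        }
        return expected.cast(values.get(name));
    }
}
```

The descriptor turns a blind downcast from Object into a checked one, and gives an editor or debugger something to display besides the raw value.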

 

Therefore, provide mechanisms for accessing, altering, adding, and removing properties or attributes at runtime.

 

An implementation of the PROPERTY pattern will involve the following participants:

 

Indicators

 

These are the key or name values with which properties will be looked up.  The name is taken from the original Lisp 1.5 implementation of property lists.

 

Descriptors

 

Objects that describe the attributes of a property.  They may include display names, type information, the indicator objects, constraints, default values, and references to accessor functions.  

 

List

 

Properties are usually stored in a keyed lookup structure, such as a Dictionary or Hashtable, or, when only a few properties are expected, a Linked List of indicator/value pairs.

 

Owner

 

This dictionary is owned by the object that possesses the properties.  Usually each instance of an object has its own property dictionary.  However, an external data structure that maps instances, or instance/indicator pairs, to property values might also be used.
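
The external alternative might be sketched as a side table keyed by object identity.  The class name ExternalProperties, its method names, and the use of String indicators are hypothetical; java.util.IdentityHashMap is a standard class that compares keys with == rather than equals().

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical side table: owners need no property field of their own.
class ExternalProperties {
    // Keyed by identity, so two equal-but-distinct objects keep separate properties.
    // Note: owners stay strongly reachable until their last property is removed.
    private final Map<Object, Map<String, Object>> table = new IdentityHashMap<>();

    void setProperty(Object owner, String name, Object value) {
        table.computeIfAbsent(owner, key -> new HashMap<>()).put(name, value);
    }

    boolean hasProperty(Object owner, String name) {
        Map<String, Object> properties = table.get(owner);
        return properties != null && properties.containsKey(name);
    }

    // Returns null when the owner or the property is unknown.
    Object getProperty(Object owner, String name) {
        Map<String, Object> properties = table.get(owner);
        return properties == null ? null : properties.get(name);
    }

    // Drop the owner's entry once its last property is gone.
    void removeProperty(Object owner, String name) {
        Map<String, Object> properties = table.get(owner);
        if (properties == null) {
            return;
        }
        properties.remove(name);
        if (properties.isEmpty()) {
            table.remove(owner);
        }
    }

    int ownerCount() {
        return table.size();
    }
}
```

This arrangement lets properties be attached to instances of classes that cannot be modified, at the price of an extra lookup and of having to clean up entries for owners that are no longer needed.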

 

Client

 

Clients, when transparent implementations of the PROPERTY pattern are used, can be unaware they are using PROPERTIES.  More often, properties will be referenced using a different syntax than for normal variables.  Also, clients must take particular care to cope with the consequences of a property's absence, since most objects won't be carrying them.

 

Value

 

In dynamically typed languages, an object of any type will usually be permitted as the value of a property.  Where type checking is present, downcasting from types like Object is usually used.  Some implementations use String values as property values.

 

The following minimal set of operations on properties will usually be supplied in some form by objects that have properties.  These operations are generic, but are presented here using a Java-like syntax:

 

      void addProperty(Indicator name, Descriptor descriptor, Object value);

      void removeProperty(Indicator name);

      boolean hasProperty(Indicator name);

      void setProperty(Indicator name, Object value);

      Object getProperty(Indicator name);

 

The hasProperty() operation will be either explicitly or implicitly present.  When it is not explicitly present, a distinguished value such as Property.ABSENT might be returned by getProperty() and setProperty() to indicate the absence of a property, or an Exception might be generated. 

 

Some implementations don't provide an explicit