Metadata and Active Object-Models
The Refactory, Inc.
209 W. Iowa
Urbana, IL 61801 USA
foote@refactory.com (217) 328-3523
yoder@refactory.com (217) 244-4695
Wednesday, 29 August 2001
Abstract
A number of forces shape the way in which software evolves. One is a desire to make programs as general as possible. Another is to push configuration decisions out into the data. Yet another is to push them out onto the users. Still another is to defer such decisions until runtime.
The patterns herein explore how complexity migrates from the
code to the data as systems mature. As the data
become more sophisticated, the power that can, in turn, be
brought to bear upon them at runtime increases.
This paper presents several patterns
from a larger, emerging pattern language: DATA,
METADATA, PROPERTY, SMART VARIABLE, SCHEMA, and ACTIVE
OBJECT-MODEL. It focuses on PROPERTIES, and observes that three
distinct intents underlie what have commonly been called "properties".
Introduction
Inside
every domain-specific framework, there is a language
crying to get out.
Thomas
Jay Peckish II
A number of forces shape the way in which
software evolves. One is a desire to
make programs as reusable as possible.
Another is to push configuration decisions out into the data. Yet another is to push such decisions out
onto the users. Still another is
to defer these decisions until runtime.
Data themselves become more universal and reusable when they are accompanied by descriptions of themselves that let other programs make sense of them. They can become even more independent when they are accompanied in their travels by code.
The patterns in our emerging pattern language
begin to chronicle how domain specific languages emerge as programs
evolve. A program may begin simply,
performing but a single task. Later,
programmers may broaden its utility by adding options and parameters. When more configuration information is
needed, separate configuration files may emerge. As these become more complex, entries in these files may be
connected to entities in the program using properties, dynamic variables, and
dialogs. Simple values may not
suffice. Once properties and dynamic
values are present, simple parsers and expression analyzers often are added to
the system. This, in turn, creates a
temptation to add control structures, assignment, and looping facilities to the
functional vocabulary provided by the expression analyzer. These can flower into full-blown scripting
facilities.
After a while, the domain or business objects come to constitute a program of sorts, which can be dynamically constructed and manipulated by users themselves. During this evolutionary process, descriptions of the data, such as maps of the layouts of data objects, and references to methods or code, are needed to permit these heretofore anonymous capabilities to be accessible during runtime. These descriptions allow these objects to be composed, edited, stored, imported, exported, and (these are programs, after all) debugged.
As this evolutionary process unfolds, and the architecture
of a system matures, knowledge about the domain becomes embodied more and more
by the relationships among the objects
that model the domain, and less and less by logic hardwired into the code. Objects in such an ACTIVE OBJECT-MODEL
are subject to runtime configuration and manipulation like any other data. Changes to this runtime constellation of
objects constitute changes to the model, and to the operations that traverse or
interpret it.
Data that describe other data, rather than aspects of the application domain itself, are called metadata. Naturally, these layout and code descriptions should be objects too. Hence, metadata have metadata as well.
A successful application inevitably draws a crowd. A host of users on a host of hosts will want to use such a program, and the data that go with it. It is important that data produced by one copy of the program be usable by other users at other sites. Such data might reside in a shared or distributed repository such as a database or persistent object base. They might also migrate across a network, via wires, satellites, fibers, radio waves, and even diskettes or tapes.
It is important, too, that these data be accessible not only from copies of the applications that spawned them. Other programs must be able to deal with them as well. When such data are mere "punch card images", or undifferentiated byte streams, this is hard to do. However, when data are escorted by machine readable descriptions of what they mean, they become welcome in a wider range of processing venues.
Our story then, is about how data earn their wings. It chronicles the forces that drive data to become more general. It describes their ascent from digits on punch cards, to lines on data files, and bytes in streams,
through structures, and on through their marriage to behaviors, which begot objects. It continues as the need to describe these objects incubates self-descriptions, which themselves are cast as objects, which, in turn, allow objects to aspire to escape the processes and images in which they were trapped, and roam unencumbered across the network.
The drive to become more general begins modestly. A simple application may acquire command line switches and parameters, to allow its behavior to vary, or permit additional input streams to be specified. As a program becomes yet more general, additional configuration information may be needed. This information may be complex, and may even be provided interactively, by end users. Simple, textual interfaces may yield to graphical user interfaces, which themselves may grow more powerful, and, alas, more complicated.
As an object-oriented application evolves, the elements of an object-oriented framework emerge. Where raw, undifferentiated, white-box code once was, dynamically pluggable black-box components begin to appear. Internal structure, which was once haphazard, becomes better differentiated, and more refined.
As such a framework evolves, these elements themselves, together with the protocols and interfaces they expose, come to constitute a domain specific language for the framework's target domain.
Often, something else happens as well. The configuration user interface and tools grow more powerful too, so as to expose more and more flexibility and power to the users. At first, simple parameters are exposed. Later, expressions and simple logical rules may be proffered. Finally, control structures might emerge, and the full power of this emerging language is exposed to the user. Users may be offered existing behaviors, or new behaviors might be added using scripts which might be interpreted, or even compiled at runtime. Editors emerge that allow users to directly manipulate the objects that constitute their "programs".
This story might have a familiar ring to those readers who
have followed the research done over the years into reflection and metalevel
architectures. Of course, the
reflection literature has earned its recondite reputation the hard way (that
is, through unrepentant abstruseness). Our tale might be seen as an attempt to
render their Finnegans Wake as, if
not a Mother Goose Tale, at least a
trip Through the Looking Glass.
The patterns in this paper are part of a larger pattern language that we are writing. We currently envision a language that will include the following patterns.
The patterns included in this OOPSLA '98 version of this work are shown in bold:
The METADATA patterns in this collection
can be broken down into the following categories:
1. DATA
2. METADATA
Patterns that arise from pushing decisions out onto the user:
3. PARAMETERIZATION
4. CONFIGURATION
5. EXPRESSIONS
6. SCRIPTS
7. DIALOGS
8. TABLES
Patterns that arise as a domain specific language emerges:
9. PROPERTY
10. SMART VARIABLES
11. SCHEMA / DESCRIPTOR
12. ACTIVE OBJECT-MODEL
13. SPECS
14. MESSAGE ROUTING
15. CONTEXT
16. NAMESPACES
17. EDITOR
18. VISUAL BUILDER
19. DYNAMIC VALIDATION
20. HISTORY
21. VALUE HOLDER / SMART VALUES
Patterns that become relevant as data become "self aware" (or more reflective):
22. METACLASS
23. IDEMPOTENCE
24. SYNTHETIC CODE
25. CODE AS DATA
26. CAUSAL CONNECTION
27. BOOTSTRAPPING
Global Forces
A variety of forces impinge upon evolving systems. Some of them pervade the pattern
languages below, and are enumerated here to avoid duplication:
Portability: When an artifact works with a variety of applications, on a variety of platforms, it is more likely to be reused.
Efficiency: Highly dynamic systems can be inimical to efficiency. However, efficiency is often a false idol. For instance, referencing an object in a remote database may be several orders of magnitude more expensive than accessing a local object, and such overhead may overwhelm secondary concerns, such as the cost of accessors vs. direct variable references.
Complexity: Complex data structures and code are hard to debug and comprehend. Alas, many programmers are better at creating complexity than simplicity.
Dynamism: Interactive programming environments, visual builders and debuggers, and distributed applications all benefit from a more dynamic approach to software system architecture.
Dynamism can be dangerous, though. More dynamic systems can be harder to debug, maintain, and understand. One wouldn't let a child learn to ride a bicycle on a busy highway.
Resources: Dynamic strategies can be costly in terms of space, processing time, secondary storage, etc.
Safety: Dynamic strategies allow users to circumvent and undermine compile-time safeguards.
Flexibility: A program should be versatile, and usable in a variety of contexts. This, in turn, enhances:
Reusability: A versatile, flexible application, or, for that matter, a code-level artifact, should be as reusable as possible. The reuse of such code avoids duplicated effort, eases the learning and comprehension burden of new programmers, and makes maintenance easier, since multiple, redundant copies of essentially the same code need not be maintained.
Adaptability: It is essential that an artifact be flexible enough so as to confront and address changing requirements. We distinguish several "shades" of adaptability.
Maintainability: It is important that an artifact be maintainable enough so as to confront and address changing requirements. Code that can't be worked on will lapse into stagnation.
Tailorability: One size does not fit all. Often, an artifact will not fit the needs of a particular user "off the rack", but can be tailored to do so when certain "alterations" can be made.
Customizability: Just as an artifact can be tailored to a particular user or users, it can be customized to adapt it better for a particular task. This may seem at first to be a lot like tailorability, but we find it useful to distinguish between forces for change that emanate from individual users and those that arise from taking on different tasks.
Pushing Complexity into the Data: When complexity is pushed into the data, it can be coped with dynamically, at runtime. Configuration information can travel with the data, rather than being locked up in explicit code.
Pushing Configuration Decisions out onto the User: As a framework evolves, more and more configuration decisions are pushed out onto the user. Users become programmers of sorts. The trick, of course, is not to force them to be general purpose programmers. They don’t have the training for this, and would fear that their social lives would be ruined. And, real programmers would be out of jobs.
Autonomy/Mobility: Once behavior and data, together with their descriptions, are liberated from application code, they can travel independently of these applications, and be used in a wider range of programs, on a wider range of platforms.
Comprehensibility: Metadata helps to document its
associated data. Indeed, data files
with metadata in them were often referred to as “self-documenting” data files
during the ‘70s. Of course, the opposite can be
true as well.
DATA
also known as
INPUT
OUTPUT
v v v
How do you allow your program to compute more than
one result?
For the most part, a program that computes the same
result every time is of little use.
Therefore,
arrange for data to be read into and written out of your program.
The distinction between program and data is so
familiar that it hardly needs to be made.
Still, it will become useful when we discuss metadata and self-modifying
code, so we begin our story here.
Data are bits.
Bits, in and of themselves, are meaningless. Data, therefore, are not merely disembodied bits. Data are about something. Data represent something. Bits, whether parceled into masks, bytes,
words, arrays, or structures, are symbols that represent relevant notions drawn
from the application domain.
Random bits are about nothing. Barring a vanishingly small,
monkey-at-a-typewriter fluke, a processor almost certainly can't execute very
many of them as non-trivial instructions.
They will make little sense when interpreted as complex data, but can
(to the consternation of programmers tracking down uninitialized memory
problems) masquerade as simple data on occasion. On those rare occasions where "random bits" are used to
simulate stochastic processes, they are
data. Of course, such bits must be
chosen carefully, since generating genuine randomness is a chore to which
computers are particularly ill-suited.
Traditionally, data have come in two fundamental
guises, text and binary. Text, in turn,
was strung together as strings and streams.
Text is interpreted in terms of standard character sets, and has
traditionally carved memory into 5, 6, 7, 8, 16 or 32-bit chunks. There are fewer such restrictions with
binary.
One good thing about binary data is that if you can
get back the same bits you had the last time you ran, you can pick up where you
left off. This may seem obvious, but it
is actually an important principle. As
long as data are read back in the same order in which they were written out, it
is possible to get the state of memory back to the way it was when the data
were written in the first place. Every
beginning programmer learns this "trick", and it is the cornerstone
of truly traditional (> nine months) streamed data processing.
Complications such as byte order, word size, and
floating point formats enter the discussion quickly, though. More complex objects require more complex
layout schemes.
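The write-then-read-in-the-same-order "trick" can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name SnapshotDemo and the three fields it saves are hypothetical, not part of the original text. It uses the standard DataOutputStream and DataInputStream classes, which always write multi-byte values high byte first, so this particular sketch sidesteps the byte order complication mentioned above rather than solving it in general.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; the names are illustrative only.
class SnapshotDemo {
    // Write the state in a fixed order: an int, then a double, then a string.
    static byte[] save(int count, double reading, String label) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(count);
            out.writeDouble(reading);
            out.writeUTF(label);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the state back in exactly the order in which it was written.
    // Nothing in the stream says what the bytes mean; the order is the contract.
    static Object[] restore(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            double reading = in.readDouble();
            String label = in.readUTF();
            return new Object[] { count, reading, label };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object[] state = restore(save(3, 98.6, "probe"));
        System.out.println(state[0] + " " + state[1] + " " + state[2]);
    }
}
```

Swapping any two reads in restore() would silently yield garbage, which is the limitation of fixed-format data that the METADATA pattern addresses.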
This approach should have a name. Is it CLASSIC STREAMING, or
SNAPSHOTTING? Is it two classic
patterns? It is an example of a broad
principle from which the MEMENTO pattern is derived.
v v v
METADATA
also known as
SELF-DESCRIBING DATA
DESCRIPTORS
DATA ABOUT THE DATA
MANIFEST
v v v
How do you avoid the limitations of fixed-format
data?
Data are about what the program does.
The data in an airline reservation are about travel
plans. These are
data. The data that steer the program,
and the program itself, are about the program.
These are metadata. Of course,
the data steer the program too.
You know what? This is wrong. The programs themselves are about
manipulating travel plans. The objects
that represent the programs are about the program. It's subtle, but it's crucial.
Of course, this is a fuzzy boundary to draw. Are the bits that represent the algorithm
for making seat assignments about air travel, or are they about how to perform
a computation?
Every program has its expectations as to how it
wants its data laid out. When no
descriptive data are present, this "fixed format" must be adhered to
by any data set that wishes to work with a particular program.
Therefore, arrange
for a description of
the data to accompany your data wherever they go.
A better solution is to let data carry a description of
how they are laid out around with them.
Subsequent patterns describe how to do this.
You can embed extra flags, descriptive information,
and other interpretable information in your data stream. For instance, a program that operates on
two-dimensional time series data might break it up into chunks, and precede
each chunk with a count of the number of channels and points in each
chunk. A program that reads such data
would read the metadata before each chunk, and then read the data.
A simpler example is a binary stream broken into
chunks where each chunk is preceded by its size in bytes. Another example is the frequently seen
scheme that prepends a signature word to each chunk, both for initial
identification, and for byte-order identification.
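The chunking schemes described above might be sketched along the following lines. Everything specific here is an assumption made for illustration: the class name ChunkDemo, the signature value, and the header layout (signature word, channel count, point count, then the samples channel by channel) are hypothetical rather than taken from any real file format. The reader uses the signature word both to recognize a chunk and to detect the writer's byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical chunk format: [signature][channels][points][channels*points doubles].
class ChunkDemo {
    // Assumed signature word; any value that differs from its own byte-reversal works.
    static final int SIGNATURE = 0x43484E4B;

    // Write one chunk of time series data, preceded by its metadata.
    static byte[] writeChunk(ByteOrder order, int channels, int points, double[] samples) {
        if (samples.length != channels * points) {
            throw new IllegalArgumentException("sample count does not match header");
        }
        ByteBuffer buf = ByteBuffer.allocate(12 + 8 * samples.length).order(order);
        buf.putInt(SIGNATURE).putInt(channels).putInt(points);
        for (double s : samples) {
            buf.putDouble(s);
        }
        return buf.array();
    }

    // Read the metadata first, then use it to interpret the data that follow.
    static double[][] readChunk(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian until told otherwise
        int signature = buf.getInt();
        if (signature == Integer.reverseBytes(SIGNATURE)) {
            buf.order(ByteOrder.LITTLE_ENDIAN); // writer used the other byte order
        } else if (signature != SIGNATURE) {
            throw new IllegalArgumentException("not a recognized chunk");
        }
        int channels = buf.getInt();
        int points = buf.getInt();
        double[][] series = new double[channels][points];
        for (int c = 0; c < channels; c++) {
            for (int p = 0; p < points; p++) {
                series[c][p] = buf.getDouble();
            }
        }
        return series;
    }

    public static void main(String[] args) {
        double[] samples = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
        double[][] series = readChunk(writeChunk(ByteOrder.LITTLE_ENDIAN, 2, 3, samples));
        System.out.println(series.length + " channels, " + series[0].length + " points each");
    }
}
```

Because the counts travel with the data, the reader needs no prior agreement about how many channels or points a chunk contains.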
v v v
Metadata can have schemas too.
PROPERTY
also known as
ATTRIBUTES
ANNOTATIONS
DYNAMIC ATTRIBUTES
DYNAMIC VARIABLES
VARIABLE STATE
DYNAMIC SLOTS
PROPERTY LIST
v v v
How do you allow individual objects
to augment their state at runtime?
Imagine
a system in which objects that track the assembly of products in a manufacturing
shop are themselves routed through this system. The original designs for these objects might have focused on
concerns such as part numbers and inventory information. New requirements might dictate that certain
objects have a manufacturing routing slip attached to them as they move through
the system. The original system made no
provisions for such attachments. One
way to address this problem might be to add a new field for these routing slip
attachments. However, there are several
problems associated with this approach.
One is that only a handful of instances will ever need such attachments,
while the overhead cost for this field will be paid by every product object in
the system. Another is that there may
be a variety of these attachments. For
instance, some products might have timestamp annotations made as they pass
certain stations. We could add fields
for all such annotations, but the costs and complexity would escalate
rapidly. What we really want is a way
to add a new variable to any object on-the-fly.
The following minimal set of operations on
properties will usually be present in some form:
void addProperty(Indicator name, Descriptor descriptor, Object value);
void removeProperty(Indicator name);
boolean hasProperty(Indicator name);
void setProperty(Indicator name, Object value);
Object getProperty(Indicator name);
The hasProperty() operation will
be either explicitly or implicitly present.
When it is not explicitly present, a distinguished value such as Property.ABSENT
might be returned by getProperty() and setProperty() to
indicate the absence of a property, or an exception might be generated.
Some implementations don't provide an explicit addProperty()
operation, and allow the first call to setProperty() to
create a new property instead. This is
often the case when property Attributes
are not present.
Similarly, the removeProperty()
operation can be dispensed with by providing for removal of a property when a
designated value is assigned to it, such as Property.REMOVE. This value, naturally, must be one that need
never be the value of a Property.
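The implicit variants just described (no explicit addProperty(), hasProperty(), or removeProperty()) might look something like the sketch below. It makes several assumptions that are not in the text above: the class name PropertyBag is hypothetical, plain strings stand in for Indicator objects, the two sentinels stand in for Property.ABSENT and Property.REMOVE, and setProperty() is assumed to return the previous value rather than void.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Strings stand in for Indicators, and there are no Descriptors.
class PropertyBag {
    // Distinguished values that can never be legitimate property values.
    static final Object ABSENT = new Object(); // stands in for Property.ABSENT
    static final Object REMOVE = new Object(); // stands in for Property.REMOVE

    private final Map<String, Object> properties = new HashMap<>();

    // Returns the value, or ABSENT when the object carries no such property.
    Object getProperty(String name) {
        return properties.containsKey(name) ? properties.get(name) : ABSENT;
    }

    // The first call creates the property; assigning REMOVE deletes it.
    // Returns the previous value, or ABSENT if there was none (an assumption).
    Object setProperty(String name, Object value) {
        if (value == ABSENT) {
            throw new IllegalArgumentException("ABSENT may not be stored as a value");
        }
        Object previous = getProperty(name);
        if (value == REMOVE) {
            properties.remove(name);
        } else {
            properties.put(name, value);
        }
        return previous;
    }

    // hasProperty() is only implicit in this variant, but is trivial to derive.
    boolean hasProperty(String name) {
        return getProperty(name) != ABSENT;
    }

    public static void main(String[] args) {
        PropertyBag part = new PropertyBag();
        part.setProperty("routingSlip", "Station 4");
        System.out.println(part.getProperty("routingSlip"));
        part.setProperty("routingSlip", REMOVE);
        System.out.println(part.hasProperty("routingSlip"));
    }
}
```

Only the instances that actually receive a routing slip pay for one, which is the cost argument made in the motivating example.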
One or more of the following additional operations
might be present in some form as well:
Descriptor getDescription(Indicator name);
Descriptor[] getDescriptors();
Object[] propertyList();
The role, if any, of the Descriptor
objects will vary depending upon the language and implementation strategy
used. In dynamically typed languages
such as CLOS, Smalltalk, or Self, they may not be present at all. In languages such as C++, Java, and C,
minimal type information might be used to indicate how different property
values should be downcast. It is also
used by tools such as editors, visual builders and debuggers.
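For a statically typed language, the explicit operations and the role of descriptors could be sketched as follows. This is one possible arrangement, not a definitive implementation: the classes Descriptor and PropertyOwner, the fields displayName and type, and the use of strings as indicators are all assumptions introduced for the example. The descriptor's type information is what tells a client how a value may safely be downcast.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical descriptor carrying a display name and minimal type information.
class Descriptor {
    final String displayName;
    final Class<?> type;

    Descriptor(String displayName, Class<?> type) {
        this.displayName = displayName;
        this.type = type;
    }

    // Reject values that could not later be downcast to the declared type.
    void check(Object value) {
        if (value != null && !type.isInstance(value)) {
            throw new ClassCastException(displayName + " expects " + type.getName());
        }
    }
}

// Hypothetical owner; Strings stand in for Indicator objects.
class PropertyOwner {
    private final Map<String, Descriptor> descriptors = new LinkedHashMap<>();
    private final Map<String, Object> values = new LinkedHashMap<>();

    void addProperty(String name, Descriptor descriptor, Object value) {
        if (hasProperty(name)) {
            throw new IllegalStateException("duplicate property: " + name);
        }
        descriptor.check(value);
        descriptors.put(name, descriptor);
        values.put(name, value);
    }

    void removeProperty(String name) {
        getDescription(name); // throws if the property is absent
        descriptors.remove(name);
        values.remove(name);
    }

    boolean hasProperty(String name) {
        return descriptors.containsKey(name);
    }

    void setProperty(String name, Object value) {
        getDescription(name).check(value);
        values.put(name, value);
    }

    Object getProperty(String name) {
        getDescription(name); // absence is reported with an exception here
        return values.get(name);
    }

    Descriptor getDescription(String name) {
        Descriptor descriptor = descriptors.get(name);
        if (descriptor == null) {
            throw new NoSuchElementException("no such property: " + name);
        }
        return descriptor;
    }

    Descriptor[] getDescriptors() {
        return descriptors.values().toArray(new Descriptor[0]);
    }

    public static void main(String[] args) {
        PropertyOwner part = new PropertyOwner();
        part.addProperty("timestamp", new Descriptor("Station timestamp", Long.class), 1000L);
        long stamp = (Long) part.getProperty("timestamp"); // downcast guided by the descriptor
        System.out.println(part.getDescription("timestamp").displayName + ": " + stamp);
    }
}
```

A tool such as an editor or debugger could walk getDescriptors() to list an object's properties by display name without knowing about them in advance.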
Therefore, provide mechanisms for accessing, altering, adding, and removing properties or attributes at runtime.
An
implementation of the PROPERTY pattern will involve the following participants:
Indicators
These are the key or
name values with which properties will be looked up. The name is taken from the original Lisp 1.5 implementation of property lists.
Descriptors
Objects that describe
the attributes of a property. They may
include display names, type information, the indicator objects, constraints,
default values, and references to accessor functions.
List
Properties are usually
stored in a lookup structure, such as a linked list, Dictionary, or Hashtable.
Owner
This dictionary is
owned by the object that possesses the properties. Usually each instance of an object has its
own property dictionary. However,
an external data structure that maps instances to property lists, or instance/indicator pairs
to values, might also be used.
Client
Clients,
when transparent implementations of the PROPERTY pattern are used, can be
unaware they are using PROPERTIES. More
often, properties will be referenced using a different syntax than for normal
variables. Also, clients must take
particular care to cope with the consequences of a property's absence, since most objects
won't be carrying them.
Value
In dynamically typed
languages, an object of any type will usually be permitted as the value of a
property. Where type checking is
present, downcasting
from types like Object is usually used. Some implementations use String values as property
values.
The
following minimal set of operations on properties will usually be supplied in some form by objects that have
properties. These operations are generic, but are
presented here using a Java-like syntax:
void addProperty(Indicator name,
Descriptor descriptor,
Object value);
void removeProperty(Indicator name);
boolean hasProperty(Indicator name);
void setProperty(Indicator name, Object value);
Object getProperty(Indicator name);
The
hasProperty() operation will be either
explicitly or implicitly present. When
it is not explicitly present, a distinguished value such as Property.ABSENT might be returned by getProperty() and setProperty() to indicate the
absence of a property, or an Exception might be
generated.
Some implementations don't provide an explicit