Appendix B. Metaprogramming and DSL design – DSLs in Action

Appendix B. Metaprogramming and DSL design

Metaprogramming is a technique that’s commonly associated with designing DSLs. Using metaprogramming, you can write code that generates code. When you design a DSL, you can let the language runtime or the compile-time infrastructure generate code for you. This code might seem extremely verbose or boilerplate to your users. In this appendix, I’ll discuss common techniques of metaprogramming used in designing DSLs and how you can use these techniques to make your DSL expressive and succinct.

B.1. The meta in the DSL

In section 2.1, you learn that the powerful metaprogramming capabilities of Groovy can help you write a more expressive DSL than Java can. Languages like Groovy and Ruby let you inflect dynamic runtime behaviors on objects. You can add capabilities on the fly that make the resultant semantics much more malleable. These dynamic behaviors are governed by the metaobject protocol (MOP) (see [5] in section B.8) that each of these languages implements in their runtime. The metaobject protocol of a language defines the semantics of the extensibility of the language artifacts. Take a look at the accompanying callout for a gentle introduction to the concept of MOP in programming languages.

 

Definition

A meta-object is an abstraction that manipulates the behavior of other objects. In an OOP language, a metaclass might be responsible for creating and manipulating classes. To do that, the metaclass needs to store all information that’s relevant to the class, like type, interface, methods, and extension objects.

A meta-object protocol (MOP) for a language defines the semantics of the extensibility of programs written in that language. The behavior of the program is determined by the MOP, including aspects of the program that can be extended by the programmer during compile time or runtime.

 

Metaprogramming is the ability to write programs that generate new programs or that change the behavior of existing programs. In an OO language like Ruby or Groovy, metaprogramming implies capabilities that extend existing object models, add hooks to alter the behaviors of existing methods (or even classes), and synthesize new methods, properties, or modules during runtime through introspection. Languages like Lisp use macros as the metaprogramming tool that let you syntactically extend the language during the compilation stage. Although the primary form of metaprogramming that’s supported by Groovy or Ruby is runtime, Lisp metaprogramming is compile time, and doesn’t incur any runtime overhead. (Both Groovy and Ruby have library support for compile-time metaprogramming through explicit manipulation of the ASTs. But it’s nowhere near as elegant as Lisp. See section B.2 to see why.) Java also offers metaprogramming capabilities through annotation processing and aspect-oriented programming (AOP); it also defines all its extensibility mechanisms through its MOP.

Does your code generate code? This question is possibly the most important one to ask yourself when you’re trying to determine whether your language is effective enough to implement a DSL. You can use metaprogramming to grow your host language toward the domain syntax; it’s considered to be a potent force in DSL design. Statically typed languages like Haskell and OCaml that have traditionally relied on pure embedded semantics for designing DSLs now offer type-safe compile-time metaprogramming through extensions like Template Haskell and MetaOCaml respectively. For more information, see http://www.haskell.org/th/ and http://www.metaocaml.org/.

In this section, let’s review the basic metaprogramming capabilities in some of today’s languages that make them useful for designing a DSL. Part 2 of this book discusses each of these capabilities in greater detail, with lots of examples from the real world.

B.1.1. Runtime metaprogramming in DSL implementation

Why is metaprogramming support such an important feature for a language to host a DSL? The answer is that because metaprogramming support makes a language extensible, the DSL that you’ve implemented in an extensible language also becomes transitively extensible. OK, I’ll admit that was a mouthful. Let’s dig into an example so you can grasp this concept of extensibility when it comes to a DSL.

Consider figure B.1, which illustrates the execution model of a DSL in a language that supports runtime metaprogramming. If the MOP of the language supports extensibility of core language features, the DSL implementation can leverage this feature; programmers can alter or extend core behaviors on the collaborating objects. In this way, the DSL surface syntax remains concise, and the underlying implementation does all the heavy lifting in collaboration with the MOP of the language. Figure B.1 illustrates this behavior as a sample DSL snippet gets interpreted by the underlying implementation and finally processed through the combination of the core language runtime and the language metaprogramming behaviors.

Figure B.1. The role of the language metamodel in DSL execution

Now that you’ve seen an abstract model of the role that metaprogramming plays in the DSL execution context, let’s revisit our order-processing DSL and check out how the Groovy MOP plays a central role in adding to the expressivity of the language. See figure B.2 for a diagrammatic representation of the inflection points.

Figure B.2. Groovy metaprogramming inflection points in our order-processing DSL

All the annotated parts in figure B.2 indicate points at which core language abstractions are dynamically altered or extended through metaprogramming support that Groovy’s MOP defines. These extensions have been implemented as part of the DSL implementation, so the client contracts are concise and free of any incidental complexity.

B.1.2. Compile-time metaprogramming in DSL implementation

As you see in chapter 2, the Groovy MOP extends the core language semantics to implement dynamic program behaviors during runtime. All code generation, method synthesis, and message interception take place when the program is being executed, which means that all meta-objects in Groovy and Ruby are the runtime artifacts of the language. Compile-time metaprogramming lets you construct and manipulate programs during compile time. You can define new constructs and interact with the compiler to perform syntactic transformations and application-specific optimizations. Isn’t this in perfect harmony with Steele’s vision (see [6] in section B.3) that “a main goal in designing a language should be to plan for growth?” Truly, with compile-time metaprogramming, you can grow a language seamlessly toward your domain syntax.

To use the most common form of compile-time metaprogramming you implement syntactic macros. Macro implementations vary in complexity and power, from the textual macros offered by C preprocessors to the sophisticated AST-based ones offered by variants of Lisp and some statically typed languages like Template Haskell and MetaOCaml. In this section, we’ll take a detailed look at some of the capabilities that macros and compile-time metaprogramming add to the power of designing succinct DSLs.

Besides macros, some languages offer other preprocessor-based capabilities for compile-time metaprogramming, like the templates in C++, AOP, and annotation processing. Some languages like Groovy and Scala also have implementations of explicit compiler plugins, which provide some metaprogramming capabilities in the form of AST manipulation. We’ll discuss these features later, though our main focus will be on macro-based implementations like those in the Lisp family of languages.

C++: Templates

C++ offers templates as one of the most dominant protocols for metaprogramming. C++ templates support powerful code-generation mechanisms through manipulation of data structures during compile time. This form of compile-time metaprogramming has been used quite successfully in scientific and numerical applications for generating inline versions of algorithms that employ techniques like loop unrolling for optimizing performance.

Another useful application of C++ metaprogramming is in techniques like expression templates (see [1] in section B.3) that can serve as a useful alternative to C-style callbacks. Instead of incurring the overhead of function calls associated with callback functions, expression templates let you put logical and algebraic expressions directly inline in the function body. The C++ array-processing library Blitz++ (see [2] in section B.3) uses this technique by creating parse trees of array expressions that are used to generate customized kernels. By generating code during compile time, these techniques have also been used to design DSLs that let programmers write code of the following form on higher-order data structures like vectors and matrices:

Vector<double> result(20), x(20), y(20), z(20);
result = (x + y) / z;

Besides generating code through template instantiation, C++ offers operator overloading as another primitive form of metaprogramming. As a follower of the C programming language, C++ also inherits the macro system processed by a preprocessor to the compiler. Languages belonging to the Lisp family make use of macros to provide support for compile-time metaprogramming. Let’s look at those now.

Lisp and Clojure: macros

Lisp has the most sophisticated and complete support for compile-time metaprogramming, through its system of macros. Unlike C macros that have limited expressive power and that operate based on textual substitution, Lisp macros are powered by the full extensibility of the language.

When you have a macro call in your Lisp expression, the Lisp compiler doesn’t evaluate the arguments. Instead, it passes them unevaluated to the macro code. The macro code is processed and returns a new Lisp form that’s evaluated in place of the original macro form. This entire transformation of macro calls runs at compile time, generating code that consists of valid Lisp forms that are integrated with the AST of the main program. Look at figure B.3 for an illustration of the compile-time metaprogramming support that Lisp offers.

Figure B.3. Lisp uses macros to provide compile-time metaprogramming support

Besides the support of syntactic macros, the Common Lisp programming language has lots of other features that make it a natural fit for metaprogramming purposes. Its uniform representation of code and data, the recursive evaluation model, and the fact that it uses expression-based semantics as opposed to statement-based ones are some of the features that you’ll enjoy using.

Clojure (http://www.clojure.org) is a Lisp implementation on the JVM that was written by Rich Hickey. Clojure offers metaprogramming through syntactic macros, much like Common Lisp does. Because Clojure is implemented on the JVM, it integrates seamlessly with Java and offers interoperability with Java objects. In the following sections, I’ll use Clojure code snippets to demonstrate the Lisp way of DSL design. I’ll be using the term Lisp for general reference to the paradigm that we’re talking about. After all, Clojure is a Lisp. In section B.2 you’ll find more details about how the Lisp language design is aligned to expressive DSL implementation. If you want, look at figure B.3 again. It provides an abstract visualization of how Lisp macros generate code in the precompilation phase.

Let’s go through the macro expansion process and how the final Lisp form that’s evaluated by the compiler is generated. Suppose you’ve got the following DSL snippet that processes client orders and, based on some criteria, submits the order to the trading engine:

(when (and (> (value order) 1000000)
          (is-premium-client? client))
 (make-trade order broker)
 (update-journal client))

In that snippet, when is a macro that has the following form:

(defmacro when [test & body]
 (list 'if test (cons 'do body)))

When the Lisp compiler encounters the call to the macro, it doesn’t have a runtime parameter to evaluate the arguments with. All that the compiler has is the source code. What it does is pass the following Lisp lists that represent the source code, unevaluated:

(and (> (value order) 1000000) (is-premium-client? client))
(make-trade order broker)
(update-journal client)

The compiler then runs the macro with these three list forms as the arguments. The parameter condition is bound to the list form (and (> (value order) 1000000) (is-premium-client? client)) and the forms (make-trade order broker) and (update-journal client) are bound to the &rest body parameter. The macro expansion takes place and is replaced with the following code, which is generated from the backquote expression in the macro body:

(if (and (> (value order) 1000000)
        (is-premium-client? client))
 (do
   (make-trade order broker)
   (update-journal client)))

Like the Groovy MOP, code generation takes place with Common Lisp macros as well, but unlike Groovy, it takes place during the precompilation phase. The Lisp runtime never sees any of the meta-objects; it works only on valid Lisp forms.

Java: annotation processing and AOP support

Java also supports a limited form of compile-time metaprogramming through annotation processing and support for AOP (see [4] in section B.3). You can use annotations in a Java program that get processed during build-time. These annotations generate code that can supplement or alter existing program behavior.

AspectJ (see [3] in section B.3) is the aspect-oriented extension to Java that offers a small set of powerful constructs that let you inject additional behavior into existing programs through bytecode instrumentation. You can specify well-defined points in the execution of a program, known as join points, at which you can inject advices to define additional behaviors. A collection of join points is referred to as a pointcut. Pointcuts, advices, and Java member declarations make modular units called aspects. Aspects are used to generate code at specific pointcuts and define one form of MOP for Java. You can use aspects to implement a limited form of DSLs in Java. They’ve been used quite successfully in Java EE frameworks like Spring (http://www.spring-framework.org) to offer concise little languages to the developer.

B.2. Lisp as the DSL

In section 2.3, we look at the role of metaprogramming and code generation in designing successful DSLs. One of the ways you can make your DSL acceptable to users is to have a concise surface syntax and yet be sufficiently expressive with the domain vocabulary. What this implies is that the host language has rich program-transformation semantics, whether at the compilation level or at the runtime level.

In section 2.3.1, you see how languages like Groovy offer runtime MOPs to generate code that alters program behavior through method synthesis, method interception, and other forms of meta-object manipulation. Runtime metaprogramming does have performance issues, because the transformation structures are being manipulated through the reflection and introspection of the meta-objects.

Languages like Lisp offer compile-time metaprogramming using macros, which we discuss in section B.1. It’s the power of syntactic macros that make the Lisp runtime completely free of any metastructures; a Lisp program execution has to deal only with the valid Lisp forms that are defined as part of the core language runtime. This makes Lisp metaprogramming special and syntactic macros the bedrock of DSL implementation in Lisp. In this section, we’ll go into further detail about the structure of Lisp programs and try to understand the differences between them and other language implementations.

 

When you write a program in most languages, what you’re writing is being represented as the concrete syntax tree (CST). The CST is the faithful representation of your program, including the white spaces, comments, and any metainformation you generate. Your program is then passed through scanners, lexical analyzers, and parsers to generate what we call the abstract syntax tree (AST). The AST is the syntactic essence of your program that gets pipelined into the next phases of the compilation process. It goes through the usual steps of transformation, optimization, and code generation. The parser of your language is primarily responsible for all such transformations that lead to the generation of the AST from the CST.

 

B.2.1. What’s so special about Lisp?

What makes compile-time metaprogramming difficult in languages like Java or C++? In order to work effectively with compile-time metaprogramming, you need to make transformations on the AST of the program. Read the sidebar in this section for a brief outline of abstract and concrete syntax trees.

In most languages like Java or C++, a program is represented as a string of characters. The only way the AST can be generated from the CST is through the language parser, which can parse only valid syntax for that language. The parser isn’t available as a separate module during the precompilation phase of the program. (Well, this isn’t strictly true. Now there are languages like Template Haskell and MetaOCaml that implement compile-time metaprogramming using macros. I talk about them briefly in chapter 9.) Because the parser isn’t available during this time, new syntax processing or precompile-time program transformation in these languages is restricted to one of the following primitive techniques:

  • Macros based on textual substitutions through a compiler preprocessor as in C
  • Selective preprocessing during the precompiler phases through annotations (as in Java) or templates (as in C++)
  • Instrumentation of bytecodes as in AOP using AspectJ in Java

If your background is in C, you’re probably ruminating on the extremely messy, painful, and error-prone artifacts that preprocessor macros gave you as part of that language. This is where Lisp shines. Lisp has been designed ground-up with an infrastructure that supports syntactic extensibility. The first step toward this outcome was decided when John McCarthy, the father of Lisp, decided that the language should have access to its abstract syntax.

So far I’ve mostly been talking about macros that process the AST and transform new syntax to original Lisp forms. Macros are what make Lisp extensible, but the real reason behind the power of macros lies in the design of the language itself. Let’s look at some of the philosophies of Lisp language design that make it significantly different from what you see in Java or C++.

B.2.2. Code as data

In Lisp, every program is a list structure, which is the AST of the code itself. As a result, code is seen as having the same representation and syntax as data. By standardizing on this simple protocol, the language publishes the abstract syntax to the programmer. This abstract syntax is simple—it’s a list. Any metaprogram that you generate using Lisp needs to conform only to this simple, standard representation.

B.2.3. Data as code

By using the QUOTE special form, data syntax can be embedded easily into the code syntax. Lisp macros are good examples of this philosophy. In fact, Lisp extends this paradigm of data as code and offers a full-blown template mechanism for writing metaprograms. In Common Lisp, we call this quasiquotation. In Clojure, the same feature is implemented through syntax quote, unquote, and splicing unquote. Here’s the definition of the macro defstruct in Clojure:

(defmacro defstruct
 [name & keys]
 `(def ~name (create-struct ~@keys)))

The syntax quote character, represented by a backquote (`) causes the form that follows it to be interpreted as Lisp data and works like normal quoting. But within the syntax quoted form, the unquote character (~) turns off quoting and lets Lisp compute that form. A similar technique in Common Lisp, the quasiquotation, lets you write templates of data in which some parts of the data are fixed, but the others are computed. It’s pretty much a complete template sublanguage embedded within the syntax of Lisp.

We’ll take a detailed look at this Lisp feature when I talk about metaprogramming in detail in chapter 5. If you’re not used to the paradigms of Lisp programming, now might be a good time for you to take a break and think about the awesomeness and dynamism that this feature can bring to your code generation capabilities.

B.2.4. A simple parser that parses only list structures

Lisp is a language with minimal syntax. The Lisp parser is so simple because all it needs to parse are lists! Both the data and code syntax are represented uniformly using list structures. The Lisp macro body is also a list structure.

Lisp is a homoiconic language, and as you now know, it has powerful compile-time metaprogramming abilities. You might be wondering how this relates to the awesomeness of Lisp in implementing DSLs. The answer is simple: stick to the Lisp philosophy of representing your DSL as a list structure and organize repeatable constructs and patterns as macros. If you design your DSL this way, you won’t need a separate parser for your DSL. You can use the Lisp parser to parse something that isn’t even a valid Lisp form and extend the language with new syntax and semantics that speak the vocabulary of your domain. Look at figure B.4 for a visual of the Lisp-as-the-DSL metaphor.

Figure B.4. Lisp as the DSL. Lisp macros get transformed into valid Lisp forms and get submitted to the compiler.

 

Definition

Homoiconic is a five-dollar word that describes a language in which a program can be represented using a data structure that the language can process. In the case of Lisp, it’s a list that makes a uniform representation for code and data.

 

By looking at figure B.4, can you figure out how Lisp unifies the differences between external and internal DSLs? On one hand, there’s external syntax in our DSL, also known as macros. Macros aren’t valid Lisp forms. At the same time, we don’t need an external parser to process those external forms. The ubiquitous list data structure unifies them all and the native Lisp parser serves as the universal processor for our DSL. All these capabilities that I’ve discussed make Lisp almost a perfect language for implementing custom DSL structures.

Metaprogramming is a technique that lets you write programs that write programs. You just saw how Lisp achieves this through compile-time macros. Section B.1 gives you a general overview of compile-time metaprogramming. This section describes a concrete implementation in one of the earliest metaprograming-powered languages. When you have a solid understanding of the power of meta, you’ll appreciate how you can exploit this paradigm in real-life implementations of DSLs. In chapters 4 and 5, you’ll find lots of examples of compile-time and runtime metaprogramming using dynamic languages like Ruby, Groovy, and Clojure.

B.3. References

  1. T. Veldhuizen. Expression templates. C++ Report, 7 (5) pp. 26-31, June 1995.
  2. Blitz++, http://www.oonumerics.org/blitz/.
  3. AspectJ, http://www.eclipse.org/aspectj/.
  4. Kiczales, G., J. Lamping, A. Mendhekar, C. Maeda, C. Lopes, J.-M. Loingtier, and J. Irwin. 1997. Aspect-Oriented Programming. Proceedings of the European Conference on Object-Oriented Programming, pp. 220-242.
  5. Kiczales, Gregor, Jim des Rivieres, and Daniel G. Bobrow. 1991. The Art of the Metaobject Protocol. The MIT Press.
  6. Steele, Jr, G.L. Growing a language. Higher-Order and Symbolic Computation 12 (1999), pp. 221-236.