Chapter 2. The DSL in the wild – DSLs in Action

This chapter covers

  • Designing your first Java-based DSL
  • Using Groovy to make your DSL more expressive
  • Patterns of DSL implementation
  • Choosing a DSL type

In the previous chapter, you saw how DSLs improve communication between the development team and the domain experts. We discussed the overall architecture of DSLs and the various execution models that they support. But what good are those DSLs without a meaningful, real-world use case? Given a real-world problem, how can you judge whether designing a DSL would be a better solution than using the traditional model of software development? In this chapter, we dive into these real-world pragmatics of DSL design.

We’ll start with a motivating example of the ground-up design, implementation, and refinement of a real-world DSL from our preferred domain of the financial brokerage business. We’ll look at a couple of implementations, then proceed to explain some of the general patterns that you’ll come across when you design DSL implementations. Figure 2.1 shows a visual roadmap of how we’re going to explore real-world DSLs in this chapter.

Figure 2.1. Roadmap for chapter 2

In every section, we’ll discuss a real-world application of DSLs, either in the form of an implementation use case or as a collection of patterns that you can use in your own model. At the end of this chapter, you’ll know how to think in terms of modeling your problem domain using DSL-based paradigms. I’ll show you a typical API-based model and a DSL-based model side by side, and you’ll learn to appreciate how the latter gives your domain users a more expressive presentation.

2.1. Building your first Java DSL

An example is worth a thousand words. As I hinted in chapter 1, the examples we’ll be working with are primarily from the financial securities domain, with specific references and explanations to set up the context of the implementation. (Be sure to read the sidebars for details about this domain.) Not only will the explanations help you understand the domain, but you can also refer back to them when we discuss DSL implementations that are related to these concepts. Because the examples share the same domain, you’ll be able to improve and add to the DSL snippets as we move along.

In section 1.3, we saw Bob, the trader, working on snippets of the DSL that processes client orders before placing them on the stock exchange for the trade transaction. Let’s build on that scenario as you develop your first DSL.

Suppose you’re in charge of implementing a DSL that processes orders using domain vocabulary similar to what Bob was using. As you saw in chapter 1, one of the primary forces that drives DSL development is the involvement of a domain expert. With a sufficiently expressive DSL, he can comprehend the business rules and logic that your development team implements. He can verify the logic before the code base leaves the development labs. You can even involve him in writing functional test suites as a user of your DSL. Not only do you get comprehensive test coverage from the domain knowledge of an expert, but your DSL also gets a real-world usability check. As the project leader, you need to orchestrate the involvement of the Bobs of your team early in the process.

 

Financial brokerage system: processing client orders

As we discussed in chapter 1, the trading process involves buying and selling securities in the marketplace, guided by the rules of the stock exchange. These transactions take place in response to orders placed by investors through registered agents. These agents can be brokers, clearing banks, or financial advisers. A typical client order contains information such as the security to be transacted, whether to buy or sell it, the quantity, and unit price details that specify any constraint the counterparty wants to impose on the execution price. The following steps take place from the time the order is placed until the execution notice for the trade is generated:

  1. The investor places the order with the agent.
  2. The agent records the order and forwards it to the stock exchange.
  3. The order is executed and the notice of execution comes back to the agent.
  4. The agent records the execution details and passes the notice to the investor.
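The four steps above form a simple linear lifecycle. As an illustration only, here’s a minimal Java sketch that models them as an enum whose constants know their successor; the names OrderLifecycle, Stage, next, and runToCompletion are hypothetical and aren’t part of any DSL in this chapter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the four order-processing steps as a linear state machine.
class OrderLifecycle {

    enum Stage {
        PLACED_WITH_AGENT,      // 1. investor places the order with the agent
        FORWARDED_TO_EXCHANGE,  // 2. agent records the order and forwards it
        EXECUTED,               // 3. order executed; notice comes back to the agent
        INVESTOR_NOTIFIED;      // 4. agent records the details and notifies the investor

        // Returns the following stage; the final stage has no successor.
        Stage next() {
            if (this == INVESTOR_NOTIFIED) {
                throw new IllegalStateException("Order processing is already complete");
            }
            return values()[ordinal() + 1];
        }
    }

    // Walks an order through every stage, recording each one in order.
    static List<Stage> runToCompletion() {
        List<Stage> trail = new ArrayList<>();
        Stage stage = Stage.PLACED_WITH_AGENT;
        trail.add(stage);
        while (stage != Stage.INVESTOR_NOTIFIED) {
            stage = stage.next();
            trail.add(stage);
        }
        return trail;
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion());
    }
}
```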

 

Let’s assume that the DSL snippet you implement builds new orders for a specific client request. The language, needless to say, speaks the vocabulary of the domain and allows the user (Bob on our team) to manipulate all combinations of order-processing rules within the semantic constraints of valid business rules. Don’t get hung up on finding the best syntax for this DSL right at the beginning. As I mentioned in chapter 1, DSLs always need to evolve iteratively and are never done right the first time. In the following sections, you’ll learn how the order-processing DSL evolves gradually, how its expressivity increases depending on the implementation language you select, and how the example culminates in a language expressive enough to make Bob happy. The important thing is to start the process with a limited scope and moderate expectations. And, as you learned in chapter 1, any exercise in DSL building starts with setting up a common vocabulary across the stakeholders of the project.

2.1.1. Setting up the common vocabulary

Bob looked at the problem domain, identified the core requirements, and immediately came up with the necessary language constructs for the order-processing DSL. They are shown in table 2.1.

Table 2.1. Preliminary vocabulary for a DSL that processes orders

Domain concept: New order
  • Must specify an instrument name.
  • Quantity is mandatory.
  • Must specify whether to buy or sell.
  • An order can be specified as all-or-none, indicating that either the whole order is completed or none of it is. No partial orders should be fulfilled.
Domain concept: Order pricing
  • Must specify a unit price.
  • Examples of unit price types are limit-price, limit-on-close-price, and limit-on-open-price.
Domain concept: Order valuation
  • The full order needs to be valued based on a pricing scheme.
  • The pricing scheme can be predetermined, or the user can specify an ad hoc scheme inline.

Now that the vocabulary is in place, we’ll start the initial implementation in the dominant language of our programming community: Java. Java has one of the largest developer communities in the industry, so anything you build with Java as the backbone has a good chance of being readily accepted within that community. Let’s start the exercise and explore the limits of expressiveness that Java offers as an implementation language. Our goal is to make Bob feel comfortable as he steps in to write the functional tests and validate the business rules.

2.1.2. Your first Java implementation

Java is an object-oriented (OO) language. As the first step in designing the DSL, you need an object representation of the Order abstraction that encapsulates the various attributes of a client order.

Building the Order Abstraction

The following listing is the Order class in Java that Bob will use to process new orders.

Listing 2.1. Order abstraction for Java DSL

The implementation of the class shown in this listing uses some of Java’s common idioms and design patterns to make the published API more expressive. The Builder design pattern lets the user of the API construct orders incrementally. The pattern uses fluent interfaces that provide an easy-to-read representation of the domain problem. (I discuss fluent interfaces more in chapter 4.) Because the builder is the only mutable object, the data members of Order itself can be immutable, which makes the core abstraction easier to use concurrently.
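To make the builder discussion concrete, here’s a minimal Java sketch in the spirit of the Order abstraction described above. It isn’t the book’s Listing 2.1: the method names follow the vocabulary of table 2.1 (buy, sell, atLimitPrice, allOrNone, valueAs), while the validation rules in build() and the wrapper class OrderDsl are assumptions made for illustration.

```java
// Hypothetical sketch of an immutable Order built through a fluent Builder.
// Not the book's Listing 2.1: method names and validation rules are assumptions.
class OrderDsl {

    // Same contract as the OrderValuer interface shown later in this section.
    interface OrderValuer {
        int valueAs(int qty, int unitPrice);
    }

    static final class Order {
        private final String instrument;
        private final int quantity;
        private final boolean buy;
        private final int limitPrice;
        private final boolean allOrNone;
        private final OrderValuer valuer;

        // Private: only the Builder can create an Order, so every field can be final.
        private Order(Builder b) {
            instrument = b.instrument;
            quantity = b.quantity;
            buy = b.buy;
            limitPrice = b.limitPrice;
            allOrNone = b.allOrNone;
            valuer = b.valuer;
        }

        String instrument() { return instrument; }
        int quantity() { return quantity; }
        boolean isBuy() { return buy; }
        int limitPrice() { return limitPrice; }
        boolean isAllOrNone() { return allOrNone; }

        // Values the whole order with the pricing scheme given to the builder.
        int value() { return valuer.valueAs(quantity, limitPrice); }

        // The only mutable object: collects settings, then hands out an immutable Order.
        static final class Builder {
            private String instrument;
            private int quantity;
            private boolean buy;
            private int limitPrice;
            private boolean allOrNone;
            private OrderValuer valuer;

            Builder buy(int qty, String instr) { return side(true, qty, instr); }
            Builder sell(int qty, String instr) { return side(false, qty, instr); }

            private Builder side(boolean isBuy, int qty, String instr) {
                buy = isBuy;
                quantity = qty;
                instrument = instr;
                return this;
            }

            Builder atLimitPrice(int price) { limitPrice = price; return this; }
            Builder allOrNone() { allOrNone = true; return this; }
            Builder valueAs(OrderValuer v) { valuer = v; return this; }

            // Checks rules loosely based on table 2.1 before creating the Order.
            Order build() {
                if (instrument == null || instrument.isEmpty())
                    throw new IllegalStateException("buy or sell an instrument first");
                if (quantity <= 0)
                    throw new IllegalStateException("quantity must be positive");
                if (limitPrice <= 0)
                    throw new IllegalStateException("a unit price is required");
                if (valuer == null)
                    throw new IllegalStateException("a valuation scheme is required");
                return new Order(this);
            }
        }
    }

    // A named, non-inline valuation strategy, like StandardOrderValuer in this section.
    static final class StandardOrderValuer implements OrderValuer {
        public int valueAs(int qty, int unitPrice) { return unitPrice * qty; }
    }

    public static void main(String[] args) {
        Order order = new Order.Builder()
                .buy(100, "IBM")
                .atLimitPrice(300)
                .allOrNone()
                .valueAs(new StandardOrderValuer())
                .build();
        System.out.println(order.instrument() + " x" + order.quantity() + " = " + order.value());
    }
}
```

Note that the client still has to name the Builder class explicitly, which is exactly the nondomain noise Bob objects to later in this section.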

 

Definition

The Builder design pattern is commonly used to build objects incrementally. It separates the construction of an object from its representation, so that the same construction process can create different representations. For more information, see [5] in section 2.6.

 

That’s the implementation side of the Builder pattern. We’ll come back to some of the issues in the code later. First, let’s find out how the DSL shapes up in the real world when Bob uses it.

Using the Order Builder

The following usage snippet has sufficient domain vocabulary density; almost all the keywords that we noted in table 2.1 are in the published API language:

But even though we’ve used the right vocabulary, the DSL is also Java, so it has to abide by the syntax restrictions and verbosity that Java requires as a programming language. The call to valueAs takes as input an implementation artifact that you have to specify nonlocally to the current context. Java doesn’t support higher-order functions out of the box, so we can’t specify a pretty inline valuation strategy. For the Java implementation, the user of the DSL can define only concrete implementations for each of the order valuation strategies. In the DSL implementation, we define the contract for order valuation as an interface:

public interface OrderValuer {
  int valueAs(int qty, int unitPrice);
}

 

Simulating higher-order functions in Java

Though Java doesn’t support higher-order functions out of the box, some libraries simulate them by using objects. See lambdaj (http://code.google.com/p/lambdaj), Google Collections (http://code.google.com/p/guava-libraries), and Functional Java (http://functionaljava.org) for samples. If you’re stuck with Java, these libraries provide options for modeling higher-order functions. The drawback is that these options are quite verbose and far less elegant than those offered by languages like Groovy, Ruby, or Scala.
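To see what simulating a higher-order function with an object looks like, here’s a minimal sketch that passes a valuation strategy inline as an anonymous class implementing the OrderValuer contract from this section. The valueOrder helper is hypothetical. For comparison, it also shows that since Java 8 the same single-method interface can be the target of a lambda.

```java
// Sketch: passing a valuation strategy "inline" in Java.
class InlineValuation {

    // Same single-method contract as the OrderValuer interface in this section.
    interface OrderValuer {
        int valueAs(int qty, int unitPrice);
    }

    // Hypothetical helper standing in for the DSL's valueAs step.
    static int valueOrder(int qty, int unitPrice, OrderValuer valuer) {
        return valuer.valueAs(qty, unitPrice);
    }

    public static void main(String[] args) {
        // Before Java 8: an anonymous class simulates a function object, verbosely.
        int discounted = valueOrder(100, 300, new OrderValuer() {
            @Override
            public int valueAs(int qty, int unitPrice) {
                return qty * unitPrice - 500;
            }
        });

        // Java 8 and later: a lambda can target the same single-method interface.
        int sameResult = valueOrder(100, 300, (qty, unitPrice) -> qty * unitPrice - 500);

        System.out.println(discounted + " " + sameResult); // prints "29500 29500"
    }
}
```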

 

The DSL user defines separate concrete implementations for specific valuation strategies:

public class StandardOrderValuer implements OrderValuer {
  public int valueAs(int qty, int unitPrice) {
    return unitPrice * qty;
  }
}

Now Bob can’t define his valuation policies inline, which was one of his original requirements. He thinks that’s a major deterrent, given that we’ve claimed that DSLs can help nonprogramming domain experts write meaningful functional tests. He has other observations about the order-processing DSL:

  • Verbosity in syntax— The language contains lots of unnecessary parentheses and other extra flourishes that interrupt the flow and get in the way of a nonprogrammer domain expert.
  • Extra nondomain complexity in syntax— Bob’s referring to the Builder class that had to be explicitly used by the DSL user. The DSL could have been implemented without using the complexities of the Builder class. We could have used chained setter methods of the Order class itself to build fluent interfaces. But the Builder class encourages immutable abstraction design without mutable properties. Can we get rid of this additional syntax from our language? Using more abstraction power, we can hide the explicit builder from the surface syntax and make it even more succinct:
new Order.toBuy(100, "IBM")
         .atLimitPrice(300)
         .allOrNone()
         .valueAs(new StandardOrderValuer())
         .build();

This solution only pushes the complexity from the syntax to the implementation. The bottom line is that the verbosity remains at the implementation level, if not at the usage level of the DSL.
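The new Order.toBuy(100, "IBM") form compiles in Java if toBuy is a static nested class of Order whose constructor takes the quantity and instrument; the unconventional lowercase class name is what lets the builder vanish from the surface syntax. Here’s a minimal sketch under that assumption (only the name toBuy comes from the snippet above; everything else is hypothetical). It shows where the complexity goes: into the implementation.

```java
// Sketch: hiding the builder behind a lowercase static nested class named toBuy,
// so that client code reads `new Order.toBuy(100, "IBM")...build()`.
class HiddenBuilderSketch {

    interface OrderValuer {
        int valueAs(int qty, int unitPrice);
    }

    static final class StandardOrderValuer implements OrderValuer {
        public int valueAs(int qty, int unitPrice) { return unitPrice * qty; }
    }

    static final class Order {
        final String instrument;
        final int quantity;
        final int limitPrice;
        final boolean allOrNone;
        final OrderValuer valuer;

        private Order(toBuy b) {
            instrument = b.instrument;
            quantity = b.quantity;
            limitPrice = b.limitPrice;
            allOrNone = b.allOrNone;
            valuer = b.valuer;
        }

        int value() { return valuer.valueAs(quantity, limitPrice); }

        // The builder, named after the domain verb instead of "Builder".
        // The lowercase name breaks Java naming conventions: that is the
        // implementation-level price of the nicer surface syntax.
        static final class toBuy {
            private final String instrument;
            private final int quantity;
            private int limitPrice;
            private boolean allOrNone;
            private OrderValuer valuer;

            toBuy(int quantity, String instrument) {
                this.quantity = quantity;
                this.instrument = instrument;
            }

            toBuy atLimitPrice(int price) { limitPrice = price; return this; }
            toBuy allOrNone() { allOrNone = true; return this; }
            toBuy valueAs(OrderValuer v) { valuer = v; return this; }
            Order build() { return new Order(this); }
        }
    }

    public static void main(String[] args) {
        Order o = new Order.toBuy(100, "IBM")
                .atLimitPrice(300)
                .allOrNone()
                .valueAs(new StandardOrderValuer())
                .build();
        System.out.println(o.instrument + " value=" + o.value());
    }
}
```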

Analyzing the Java DSL

We as Java programmers can fully appreciate the concerns that an explicit Builder pattern addresses and that make APIs fluent. The Java-based DSL that we designed looks pretty good when Java programmers are using it. Still, there’s no denying that we can overcome Java’s verbosity by choosing an implementation language that is more expressive for its users and also yields a more concise code base. Let’s analyze the Java code in more detail and look at the Java features that lead to the syntactic complexities that Bob complained about. Table 2.2 lists the Java features that map to Bob’s reported issues.

Table 2.2. Mapping issues reported against Java’s limitations

Issue reported: Verbose (unnecessary parentheses and syntax)
Responsible Java features:
  • Part of basic Java syntax.
  • Parentheses are mandatory for method calls. Dots are mandatory for method dispatch on objects and classes.
Issue reported: Nondomain complexity
Responsible Java features:
  • Java is not a malleable language. Many common idioms need to be expressed through additional layers of indirection, also known as design patterns.
  • Additional class structures need to be constructed as part of the abstraction design. Some bubble up as surface syntax in the final published API. The Builder class in our earlier DSL is an example of such an unnecessary syntactic barrier.
  • Java is not a scripting language. Executing any snippet of Java code requires you to define a class with a public static void main method. The DSL user ultimately perceives these as added syntactic noise.
Issue reported: Inability to express an inline valuation strategy function
Responsible Java feature:
  • Java doesn’t offer higher-order functions as first-class features of the language.

In the following sections, we’ll explore options that can honor Bob’s suggestion of making the DSL friendlier to the domain experts.

2.2. Making friendlier DSLs

The expressiveness of a DSL is judged by your user. In this case, Bob has identified areas in your Java-based solution that need to be more closely aligned to the problem domain. Let’s try to make the DSL friendlier for Bob to use. One of the strategies you’ll look at introduces an additional layer in the form of XML that externalizes the domain language in a more human-readable form. The second strategy discusses implementing the DSL in an entirely new and more expressive programming language, Groovy.

2.2.1. Externalizing the domain with XML

XML is frequently used for business markup, so why not use it to design the domain language in our application? XML has rich tooling support, is understood by browsers and IDEs, and has a slew of frameworks and libraries for parsing, processing, and querying.

True, XML is externalizable in the sense that a domain expert can write XML structures that are separate from the programming machinery. But XML is completely declarative, inordinately verbose, and doesn’t easily support the expression of control structures. The following snippet shows sample XML for the order-processing DSL shown in listing 2.1. I’ve intentionally elided parts of it to avoid showing the ugliness that arbitrary expressions can bring to an XML structure.

<orders>
  <order>
    <buySell>buy</buySell>
    <quantity>100</quantity>
    <instrument>IBM</instrument>
    <limitPrice>300</limitPrice>
    <allOrNone>true</allOrNone>
    <valueAs>...</valueAs>
  </order>
  ...
</orders>
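A short sketch shows why XML leaves the logic outside the document: the parser can only hand back text, and the program must decide what each element means. The code below uses the JDK’s standard DOM API (javax.xml.parsers) to read an order like the one above; the valueAs element is omitted because its content is elided, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: reading the declarative order XML with the JDK's DOM parser.
class XmlOrderReader {

    // Returns the text of each child element of the first <order>, keyed by tag name.
    static Map<String, String> readFirstOrder(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element order = (Element) doc.getElementsByTagName("order").item(0);
        Map<String, String> fields = new HashMap<>();
        NodeList children = order.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                Element e = (Element) children.item(i);
                fields.put(e.getTagName(), e.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order>"
                + "<buySell>buy</buySell><quantity>100</quantity>"
                + "<instrument>IBM</instrument><limitPrice>300</limitPrice>"
                + "<allOrNone>true</allOrNone>"
                + "</order></orders>";
        Map<String, String> f = readFirstOrder(xml);
        // Everything arrives as text; types and behavior must come from Java code.
        int qty = Integer.parseInt(f.get("quantity"));
        int limit = Integer.parseInt(f.get("limitPrice"));
        boolean allOrNone = Boolean.parseBoolean(f.get("allOrNone"));
        System.out.println(f.get("buySell") + " " + qty + " " + f.get("instrument")
                + " @ " + limit + " allOrNone=" + allOrNone);
    }
}
```

Any valuation formula would have to be encoded as yet more elements or as a string that some other language interprets, which is the expressivity wall described next.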

The idea behind XML is not to do programming, but to express document structures in a completely portable way. DSLs often contain control structures that can’t be expressed elegantly in XML. Many Java EE (Java Platform, Enterprise Edition) frameworks use XML to provide declarative configuration parameters. But if you try to write business logic and domain rules in XML, you’ll soon hit the same expressivity bottleneck that our Java implementation faced. Try a more direct approach, without going beyond the boundaries of your natural programming language. Remember, the language is the most powerful programming tool that you have.

2.2.2. Groovy: a more expressive implementation language

By now you must have realized that you’re trying to design a DSL that fits within the confines of the underlying implementation language. The DSL that clients will be using is the same language that the DSL is implemented in. In your first attempt, all the problems that Bob mentioned are the innate limitations of the Java programming language, which you couldn’t work around in your implementation of the DSL. The technique used is called embedding the DSL within the host language, which you’ve already seen in section 1.7 when we discussed the taxonomy of internal and external DSLs.

Now let’s try to embed our DSL in a language that’s more expressive than Java. Groovy is a dynamically typed language that runs on the JVM, is more expressive than Java, and supports higher-order functions.

A Groovy solution

As you progress through this book, you’ll look at the features of Groovy that can help you design better DSLs. You are going to implement the order-processing DSL using Groovy, but first, here’s a sample of that DSL in Groovy that has the same functions as the earlier Java example:

newOrder.to.buy(100.shares.of('IBM')) {
  limitPrice   300
  allOrNone    true
  valueAs      {qty, unitPrice -> qty * unitPrice - 500}
}

This snippet creates a new client order for buying 100 shares of IBM at a limit price of 300 dollars in all-or-none mode. The order valuation is computed using the specified formula. The end result is the same as in the earlier Java example; the difference is the expressivity that Groovy’s higher-order abstractions bring to the implementation. DSL constructs like 100.shares.of('IBM') are possible only because Groovy offers fabulous metaprogramming capabilities, which make the language feel more natural to the domain user. The following listing is the complete implementation of the DSL in Groovy.

Listing 2.2. Order processing DSL in Groovy

In the following sections, I’m going to be a cheerleader for DSL-based implementations. I’ll only touch on the features of Groovy that stand out in this specific implementation; chapters 4 and 5 cover in detail all the features that make Groovy a great language for DSL implementation. For now, let’s look at the specific Groovyisms that make this expressivity possible.

Method synthesis using methodMissing

You can invoke nonexistent methods in Groovy; methodMissing offers the hook to intercept all such invocations. In the order-processing DSL, every invocation of methods like limitPrice and allOrNone is intercepted by methodMissing and converted to calls of property setters on the Order object. The methodMissing hook provides conciseness in the code base and flexibility when you’re adding method calls without explicit definitions.
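Java has no methodMissing, but its closest standard relative is a dynamic proxy: java.lang.reflect.Proxy routes every call on an interface to a single InvocationHandler. The sketch below uses that to turn calls like limitPrice(300) into property assignments, roughly what the Groovy hook does for Order. Unlike Groovy, the methods still have to be declared on an interface; OrderSpec and the property map are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: intercepting method calls with a JDK dynamic proxy.
class InterceptingOrder {

    // The DSL "verbs". No class implements these methods by hand.
    interface OrderSpec {
        OrderSpec limitPrice(int price);
        OrderSpec allOrNone(boolean flag);
    }

    // Creates a proxy whose handler records each call as name -> first argument.
    // Object methods such as toString are not handled specially in this sketch.
    static OrderSpec recorder(Map<String, Object> properties) {
        return (OrderSpec) Proxy.newProxyInstance(
                OrderSpec.class.getClassLoader(),
                new Class<?>[] {OrderSpec.class},
                (proxy, method, args) -> {
                    // One generic hook for every method, acting like a property setter.
                    properties.put(method.getName(), args == null ? null : args[0]);
                    return proxy; // return the proxy itself so calls can be chained
                });
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        recorder(props).limitPrice(300).allOrNone(true);
        System.out.println(props); // prints "{limitPrice=300, allOrNone=true}"
    }
}
```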

Groovy metaprogramming techniques for dynamic method injection

Using metaprogramming techniques, we’ve injected methods into built-in classes like Integer that add to the expressivity of the language. The method getShares adds a property named shares to the class Integer that makes a great combinator for forming the natural flow of the DSL.
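Java can’t add a shares property to Integer, so the closest approximation of 100.shares.of('IBM') is a static factory method. In this hypothetical sketch, shares(100).of("IBM") builds a small value object; the names Holding, Shares, shares, and of are illustrative assumptions, not part of the Groovy listing.

```java
// Sketch: approximating Groovy's 100.shares.of('IBM') in Java with a static factory.
class SharesSketch {

    // Immutable quantity-of-instrument value produced by the mini-DSL.
    static final class Holding {
        final int quantity;
        final String instrument;

        Holding(int quantity, String instrument) {
            this.quantity = quantity;
            this.instrument = instrument;
        }

        @Override
        public String toString() { return quantity + " shares of " + instrument; }
    }

    // Intermediate step that only knows the quantity so far.
    static final class Shares {
        private final int quantity;

        private Shares(int quantity) { this.quantity = quantity; }

        Holding of(String instrument) { return new Holding(quantity, instrument); }
    }

    // Entry point; in packaged code, a static import lets call sites read shares(100).of("IBM").
    static Shares shares(int quantity) {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
        return new Shares(quantity);
    }

    public static void main(String[] args) {
        System.out.println(shares(100).of("IBM")); // prints "100 shares of IBM"
    }
}
```

The number still can’t be the receiver of the call, which is the gap that Groovy’s method injection closes.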

First-class support for higher-order functions and closures

This support is possibly the most important feature that makes languages like Groovy shine over Java in offering expressive DSLs. The difference this makes is huge; just look at the valueAs method invocations in the Groovy and Java versions of the language.

Now you’ve got your DSL implementation in Groovy and a DSL usage snippet. But you still need the mechanism to integrate the two and set up an execution environment that can execute any instance of the DSL supplied to it. Let’s see how to do that.

2.2.3. Executing the Groovy DSL

Groovy has scripting abilities. Any Groovy code can be executed through the interpreter, and you can use this capability to set up an interactive execution environment for your order-processing DSL. Enter the DSL implementation (listing 2.2) in a file called ClientOrder.groovy. Enter the usage snippet in another text file named order.dsl. The script reads both files through new File(...), which resolves relative paths against the current working directory, so make sure both files are in that directory; then submit the following script to the Groovy interpreter:

def dslDef = new File('ClientOrder.groovy').text
def dsl = new File('order.dsl').text
def script = """
  ${dslDef}
  ${dsl}
"""
new GroovyShell().evaluate(script)

 

Integrating a DSL into your core application

The example in this section shows only one way of integrating DSL implementation along with the DSL invocation. We’ll talk about more integration methods in chapter 3 when we discuss integrating DSLs into your core application.

The example uses string concatenation to build the final script that gets executed. One disadvantage of this approach is that if there are any errors in execution, the line numbers in the stack trace won’t match the line numbers in the source file order.dsl. As I’ve mentioned, building a DSL and integrating it with your application is an iterative process. We’ll improve on this strategy in chapter 3 when we discuss yet another method of integrating a Groovy DSL into your application.
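One possible workaround for the line-number mismatch, sketched here in Java, is to count how many lines precede the DSL text in the concatenated script and subtract that offset from reported line numbers. The sketch assumes the script is assembled exactly as in the Groovy example above (a newline, two spaces, the definition, a newline, two spaces, then the DSL); the class and method names are hypothetical.

```java
// Sketch: mapping a line number in the concatenated script back to order.dsl.
class ScriptLineMapper {

    // Mirrors the Groovy template: "\n  " + dslDef + "\n  " + dsl + "\n".
    static String assemble(String dslDef, String dsl) {
        return "\n  " + dslDef + "\n  " + dsl + "\n";
    }

    // Number of script lines that come before the first line of the DSL text.
    static int dslLineOffset(String dslDef) {
        String prefix = "\n  " + dslDef + "\n  ";
        return (int) prefix.chars().filter(c -> c == '\n').count();
    }

    // Converts a 1-based script line into a line of order.dsl,
    // or returns -1 if the line falls inside the DSL definition instead.
    static int toDslLine(int scriptLine, String dslDef) {
        int dslLine = scriptLine - dslLineOffset(dslDef);
        return dslLine >= 1 ? dslLine : -1;
    }

    public static void main(String[] args) {
        String dslDef = "class Order {}\nclass Helper {}";
        String dsl = "newOrder.to.buy(100.shares.of('IBM')) {\n  limitPrice 300\n}";
        String[] lines = assemble(dslDef, dsl).split("\n", -1);
        // Line 2 of order.dsl, located in the assembled script through the offset.
        int scriptLine = 2 + dslLineOffset(dslDef);
        System.out.println(lines[scriptLine - 1].trim());  // prints "limitPrice 300"
        System.out.println(toDslLine(scriptLine, dslDef));  // prints "2"
    }
}
```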

 

Congratulations! You’ve successfully designed and implemented a DSL that’ll make any domain person happy. The Groovy-based order-processing DSL that you’ve implemented meets expressivity criteria that put it well ahead of the earlier Java version. More importantly, you now know that DSL design is an iterative process. Had we not developed the Java version, you wouldn’t have realized the importance of using a more expressive language as the base of the implementation.

 

In part 2 of this book (chapters 4-8), we’ll look at other DSL implementations, not only in Groovy, but in other JVM languages like Scala, Clojure, and JRuby. This comparison will help you realize how DSL implementation techniques can vary depending on the features that the underlying host language offers.

 

Now that you’ve seen a complete implementation of a DSL that solves a real-life use case, you’ve got an inside-out view of how an implementation evolves through the stages of successive refinement. The Groovy implementation turned out to be expressive to the users. But what are some of the underlying implementation techniques that contributed to its expressiveness?

Depending on the language you choose for implementing an internal DSL, you get some of these techniques for free. Building a well-designed DSL is the art of mixing the idioms of the host language with these techniques in a way that transforms the host language into the shape of your DSL. You used some of the techniques that Groovy offers when you designed your implementation. But not all DSLs are alike. As with any other kind of language, DSL design follows definite patterns that depend on the platform of your implementation, the core skill set of your team members, the overall architecture of your application, and other constraints related to your development ecosystem.

Up next, we’ll take a look at some of the implementation patterns of DSLs. Patterns are like ready-made packages of reusable design knowledge that you can use in your own implementations. They teach you how to make friendlier DSLs using the power of the host language. In the next section, you’ll learn about the variations in patterns that internal and external DSLs exhibit under the constraints of a particular implementation. You can’t implement all these patterns in every language, but you need to understand all the patterns so that you can make the optimal choice within your implementation platform.

2.3. DSL implementation patterns

Dividing DSLs into internal and external is too coarse a classification, considering the multitude of architectural patterns that these languages implement in practice. All internal DSLs share the common trait of being built on top of a host language. The common trait of all external DSLs is that they build their language infrastructure from scratch. But a common origin isn’t enough to characterize the entire taxonomy of DSLs. As we saw in chapter 1, a DSL is an embodiment of good abstraction design principles. To design a good abstraction, you need to consider not only the commonality of forms between the participating components but also the variabilities that each exhibits.

In the next two sections, we’ll explore some of these patterns of variability. When you have an idea of the patterns that exist even within the same family of DSLs, you’ll be able to map your own DSL requirements to the concrete implementation architecture more easily. The more you identify such recurring patterns, the easier it’ll be for you to reuse your abstractions. As you learned from our discussions in appendix A about designing abstractions, when you can reuse your abstraction, the language that you design becomes more extensible. In case you haven’t read appendix A yet, do that now. The information it contains will help you during your journey through the rest of this book.

2.3.1. Internal DSL patterns: commonality and variability

Internal DSLs are everywhere. With languages like Ruby and Groovy offering flexible and concise syntax and a powerful metaprogramming model, you’ll find DSLs that piggyback on these capabilities in all kinds of software. The common pattern across all internal DSLs is that they are always implemented on top of an existing host language. I tend to use the term embedded more when I talk about internal DSLs, because it makes one aspect of their architecture explicit. You can use the infrastructure of an existing language in a number of ways, each of which results in DSL implementations that vary in form, structure, flexibility, and expressivity.

Internal DSLs manifest primarily in two forms:

  • Generative— Domain-specific constructs are transformed to generate code through compile-time macros, preprocessors, or some form of runtime meta-object protocol (MOP).
  • Embedded— Domain-specific types are embedded within the type system of the host language.

Even this micro-classification is not entirely without its share of ambiguity. Consider Ruby and its accompanying web framework Rails, written as an internal DSL in Ruby. From that point of view, Rails is embedded within Ruby. But Rails also uses Ruby’s metaprogramming power to generate lots of code during runtime. From this point of view, it’s generative as well.

Some statically typed languages embed DSLs purely within the type system of the host language. Haskell and Scala are the dominant players in this category; the DSL that you design inherits all the power of the host type system. There are also language extensions to Haskell (Template Haskell) that add generative capabilities to the language through macros.

We have numerous variations even within the classification of internal DSLs, including instances when a single language offers multiple paradigms of DSL development. Figure 2.2 shows a diagrammatic view of the taxonomy and some languages that implement these variations.

Figure 2.2. An informal micro-classification of patterns used in implementing internal DSLs

In this section, we’ll look at these common variations found among some of the internal DSL implementation techniques, using figure 2.2 as a reference.

 

Chapters 4 and 5 are supplements to the material in this section. In those chapters, we’ll discuss DSL patterns and implementations in much more detail.

 

Smart API

Smart API is possibly the simplest and most frequently used implementation of internal DSLs you’ll encounter. This technique is based on chaining methods in sequence, similar to the Builder pattern implementation (see [1] in section 2.6). Martin Fowler calls the Smart API a fluent interface (http://www.martinfowler.com/bliki/FluentInterface.html). For this pattern, you create APIs that get wired up in the natural sequence of the domain action that you’re trying to model. This process makes the API fluent, and the domain-based method names make it readable and meaningful to the DSL user. The following code snippet is from the Guice API (http://code.google.com/p/google-guice/), the dependency injection (DI) framework from Google. If you’re the user trying to wire up a Java interface with an implementation as a declarative module of your application, the following use of the API flows naturally and expresses the intent of your use case:

binder.bind(Service.class).to(ServiceImpl.class).in(Scopes.SINGLETON)

Figure 2.3 illustrates how the APIs chain forward through repeated invocations on the returned object.

Figure 2.3. Smart API using method chaining. Note how the method calls progress forward and return only at the end to the client.
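To show how such a chain can be wired, here’s a minimal sketch of a Guice-like binding API. It isn’t Guice’s implementation: MiniBinder, BindingBuilder, ScopedBuilder, Scope, and Binding are hypothetical. Each call returns an object that offers only the next sensible step, which is what keeps the chain reading like a sentence and returns control to the client only at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Smart API (fluent interface): bind(...).to(...).in(...).
class MiniBinder {

    enum Scope { SINGLETON, PROTOTYPE }

    // Immutable record of one completed binding.
    static final class Binding {
        final Class<?> iface;
        final Class<?> impl;
        final Scope scope;

        Binding(Class<?> iface, Class<?> impl, Scope scope) {
            this.iface = iface;
            this.impl = impl;
            this.scope = scope;
        }
    }

    private final List<Binding> bindings = new ArrayList<>();

    // Step 1: name the interface; the returned builder only knows "to".
    <T> BindingBuilder<T> bind(Class<T> iface) { return new BindingBuilder<>(iface); }

    List<Binding> bindings() { return bindings; }

    final class BindingBuilder<T> {
        private final Class<T> iface;

        private BindingBuilder(Class<T> iface) { this.iface = iface; }

        // Step 2: the generic bound accepts only implementations of T.
        ScopedBuilder to(Class<? extends T> impl) { return new ScopedBuilder(iface, impl); }
    }

    final class ScopedBuilder {
        private final Class<?> iface;
        private final Class<?> impl;

        private ScopedBuilder(Class<?> iface, Class<?> impl) {
            this.iface = iface;
            this.impl = impl;
        }

        // Step 3: the final call registers the binding and ends the chain.
        void in(Scope scope) { bindings.add(new Binding(iface, impl, scope)); }
    }

    interface Service {}
    static final class ServiceImpl implements Service {}

    public static void main(String[] args) {
        MiniBinder binder = new MiniBinder();
        binder.bind(Service.class).to(ServiceImpl.class).in(Scope.SINGLETON);
        Binding b = binder.bindings().get(0);
        System.out.println(b.iface.getSimpleName() + " -> " + b.impl.getSimpleName() + " " + b.scope);
    }
}
```

The price is visible here too: three small classes exist only to make one line read fluently, which is the proliferation of small methods discussed next.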

With method chaining, you use the infrastructure of the host language and build Smart APIs that speak the vocabulary of your domain. The drawback of this technique is that it can lead to the proliferation of many small methods that might not make much sense on their own. Also, not all use cases can be implemented using fluent interfaces. Typically, using the Builder pattern to incrementally construct and configure objects is most effective when you use method chaining to model the process; the Java implementation of the order-processing DSL in section 2.1 is an example. In languages like Groovy or Ruby that offer named arguments, the Builder pattern and fluent interfaces become somewhat redundant. (Named arguments with defaults are also available in Scala 2.8.) For example, the previous Java snippet turns into more concise yet equally expressive Groovy code using a mix of normal and named parameters:

binder.bind Service, to: ServiceImpl, in: Scopes.SINGLETON

Smart API is a common pattern used in internal DSLs. The exact implementation of it depends on what language you’re using. The main takeaway from this discussion is: Always choose the most idiomatic implementation technique when you’re using DSL patterns. I’ll come back to this topic with more examples and implementation variations when I talk about fluent interfaces in the context of internal DSL implementation in chapter 4. For now, let’s move on to another pattern.

Syntax tree manipulation

Syntax tree manipulation is yet another option that’s used for implementing internal DSLs. The design follows the interpreter pattern (see [1] in section 2.6) and uses the infrastructure of the host language to create and manipulate the abstract syntax tree (AST) of the language. After you’ve generated the AST, it’s your responsibility to traverse the AST and do the manipulations that will generate the necessary code for the domain logic. Groovy and Ruby have developed this infrastructure through library support that can generate code by manipulating the AST.

Come to think of it, this is what Lisp offers you out of the box with its language infrastructure. In Lisp, every program is a list structure, which is the AST that the programmer has access to. Manipulating the AST to generate code is the basis of the Lisp macros. You can extend the core language syntax by manipulating the AST.
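A small Java sketch can make the interpreter-pattern idea concrete: the valuation formula qty * unitPrice - 500 from the Groovy example is represented as a tree of node objects that can be evaluated, or rewritten before evaluation. The node classes and the substitute rewrite (which replaces a variable with a constant and folds constant subtrees) are hypothetical and far simpler than the AST libraries available for Groovy or Ruby.

```java
import java.util.Map;

// Sketch of the interpreter pattern: an AST for a valuation formula,
// with one evaluation pass and one tree-rewriting pass.
class ValuationAst {

    interface Expr {
        int eval(Map<String, Integer> env);
        // Returns a tree with the named variable replaced and constants folded.
        Expr substitute(String name, int value);
    }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        public Expr substitute(String name, int v) { return this; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) throw new IllegalArgumentException("unbound variable " + name);
            return v;
        }
        public Expr substitute(String n, int v) { return n.equals(name) ? new Num(v) : this; }
        @Override public String toString() { return name; }
    }

    static final class BinOp implements Expr {
        final char op; // '+', '-' or '*'
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }

        static int apply(char op, int a, int b) {
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default: throw new IllegalArgumentException("unknown operator " + op);
            }
        }

        public int eval(Map<String, Integer> env) {
            return apply(op, left.eval(env), right.eval(env));
        }

        public Expr substitute(String name, int v) {
            Expr l = left.substitute(name, v);
            Expr r = right.substitute(name, v);
            // Constant folding: if both sides are now numbers, collapse this node.
            if (l instanceof Num && r instanceof Num) {
                return new Num(apply(op, ((Num) l).value, ((Num) r).value));
            }
            return new BinOp(op, l, r);
        }

        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    public static void main(String[] args) {
        // qty * unitPrice - 500
        Expr formula = new BinOp('-', new BinOp('*', new Var("qty"), new Var("unitPrice")), new Num(500));
        System.out.println(formula);                                              // prints "((qty * unitPrice) - 500)"
        System.out.println(formula.eval(Map.of("qty", 100, "unitPrice", 300)));  // prints "29500"
        System.out.println(formula.substitute("qty", 100));                       // prints "((100 * unitPrice) - 500)"
    }
}
```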

Typed embedding

DSL patterns based on metaprogramming rely on code generation techniques to keep the interface of the DSL precisely at the level of abstraction that the domain demands. But what if your host language doesn’t support any form of metaprogramming? When you’re designing a DSL, it’s extremely important to be minimal in what you offer to your users as the syntax of the language. The more support the host language infrastructure provides for abstraction, the easier it is for you to achieve this minimalism.

Statically typed languages offer types as one of the means to abstract domain semantics and make the surface syntax of your DSLs concise. Instead of generating code to express the domain behavior you want, you can define domain-specific types and implement them in terms of the types and operations offered by your host language. These types will form the language interface of your DSL that the user will be working with; he won’t care about their concrete implementations. Typed models come with a guarantee of some level of implicit consistency in your programming model. Figure 2.4 is a snapshot of what types can offer to your DSL.

Figure 2.4. An embedded typed DSL comes with lots of implicit guarantees of consistency. Use a type to model your DSL abstraction. The constraints that you define within your type are automatically checked by the compiler, even before the program runs.

The biggest advantage of this technique is that, because your DSL’s type system is embedded in the type system of the host language, your DSL scripts are automatically type-checked by the host language compiler. Your DSL users can also take full advantage of the host language’s IDE integration capabilities, like smart assist, code completion, and refactoring.

Consider modeling the abstraction of a Trade in Scala, where Trade, Account, and Instrument are domain-specific types that have business rules encapsulated within them. With Ruby or Groovy, we generated additional code to implement domain behavior; in Scala, we implement similar semantics within types and leave it to the compiler to check for consistency.

Languages like Haskell and Scala that offer advanced static typing let you design purely typed embedded DSLs without resorting to code generation techniques, preprocessors, or macros. As a DSL user, you can compose typed abstractions using combinators that are implemented in the language itself. The type systems that these languages offer provide advanced capabilities like type inferencing and support for higher-order abstractions that make your language concise yet sufficiently expressive. Paul Hudak demonstrated this with Haskell in 1998 (see [2] in section 2.6), when he used the techniques of monadic interpreters, partial evaluation, and staged programming to implement purely embedded DSLs that can be evolved incrementally over time. Christian Hofer et al. discuss similar implementations with Scala in [3] in section 2.6. They also discuss how you can polymorphically embed multiple implementations within a single DSL interface using the techniques of Scala traits, virtual types, higher-order generics, and family polymorphism. In chapter 6, I’ll use sample implementations to explain how static typing in Scala helps you to design pure, embedded domain-specific languages (EDSLs).

 

Definition

Monads figure in a model of computation popularized by Haskell. Using monads, you can compose abstractions, following predefined rules. I discuss monadic structures in chapter 6 when I talk about DSL implementations in Scala. For more information, go to http://en.wikipedia.org/wiki/Monad_(functional_programming).

 

Now we’re going to talk about several metaprogramming patterns that we use frequently in DSL implementations. In languages that support metaprogramming, DSL implementations lean on these patterns heavily. In the world of DSLs, metaprogramming offers some of the richest techniques for designing custom syntax.

Reflective metaprogramming

You can apply patterns at a local level of implementation; Smart API was an example of that. But when you design a DSL, you might need to adopt patterns as general implementation strategies. They shape the way you structure your whole implementation, and they usually build on key features of your host language. In our discussion of implementation patterns for internal DSLs (refer to our roadmap in figure 2.1), metaprogramming is one such concept that manifests itself in various forms when you design a DSL. Reflective metaprogramming is the pattern that we’ll discuss in this section.

Suppose you’re designing a DSL where you need to read stuff from configuration files and invoke methods dynamically, depending on the contents of the file. Here’s a real-life example in Ruby that reads from a YAML file, composes the method name, and dynamically invokes the method using arguments read from the file:

require 'yaml'

# For each key, call the setter "k=" unless the getter "k" already returns a value
YAML.load_file(x_path).each do |k, v|
  foo.send("#{k}=", v) unless foo.send(k)
end

Because the method name isn’t known until runtime, we use the metaprogramming abilities of Ruby to do a dynamic dispatch on the object using Object#send(), instead of the usual dot notation for invoking methods statically. This coding technique is reflective metaprogramming: Ruby discovers methods at runtime and does the invocation. DSL implementations that deal with dynamic objects use this technique to delay method invocations until the last moment, when they have complete information (possibly read from configuration files).
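Because x_path and foo are placeholders, the YAML snippet can’t run on its own. The following sketch is a self-contained variant under a few assumptions: a hypothetical TradeDefaults class stands in for foo, the YAML comes from an inline string rather than a file, and respond_to? is used to skip keys that don’t correspond to any setter.

```ruby
require "yaml"

# Hypothetical domain object whose attribute names match keys in the config
class TradeDefaults
  attr_accessor :market, :settlement_days, :currency
end

config = YAML.safe_load(<<~CONFIG)
  market: NYSE
  settlement_days: 3
  currency: USD
  unknown_key: ignored
CONFIG

defaults = TradeDefaults.new
defaults.currency = "EUR" # already set, so the config must not overwrite it

config.each do |key, value|
  setter = "#{key}="
  # Discover at runtime whether the object understands the composed method name
  next unless defaults.respond_to?(setter)

  # Dynamic dispatch by name: call the getter, then the setter if still unset
  defaults.public_send(setter, value) unless defaults.public_send(key)
end
```

public_send is used instead of send so that the configuration can only reach public methods of the object.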

Runtime metaprogramming

Reflective metaprogramming discovers existing methods at runtime; other forms of metaprogramming go further and generate code dynamically during runtime. Runtime metaprogramming is another way to achieve a small surface syntax for your DSL. It makes your DSL look lightweight on the surface; the heavy lifting of code generation is transferred to the backend infrastructure of your host language.

Some languages expose their runtime infrastructure components as meta-objects that programmers can manipulate. In Ruby or Groovy, you can use such components in your programs to dynamically change the behavior of meta-objects during runtime and inject new behavior to implement your domain constructs. Figure 2.5 shows a brief overview of the runtime behavior of metaprogramming in Ruby and Groovy.

Figure 2.5. Languages that support runtime metaprogramming let users generate code on the fly. This code can add behaviors dynamically to existing classes and objects.

In the order-processing DSL that we developed in section 2.1, we used this same technique in Groovy to generate additional methods like shares and of in built-in classes like Integer. These methods don’t have any meaningful role to play in the semantics of the action that the DSL performs. Rather, they serve as useful glue to make the language more natural to the domain we’re modeling. Figure 2.6 annotates the return types of each method that’s called in sequence for a section of the Groovy-based DSL. You can see how the power of metaprogramming generates code during runtime to string together the new methods and adds to the expressivity of the language.
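Here’s a rough Ruby analogue of that Groovy technique, offered as a sketch rather than the book’s implementation. It injects a shares method into the built-in Integer class while the program runs; the Order and ShareQuantity structs are hypothetical glue types that exist only to make the phrase read naturally.

```ruby
# Hypothetical domain value produced by the fluent phrase
Order = Struct.new(:quantity, :security)

# Glue object: exists only so that `of` reads naturally after `shares`
ShareQuantity = Struct.new(:quantity) do
  def of(security)
    Order.new(quantity, security)
  end
end

# Inject a new method into the built-in Integer class at runtime
Integer.class_eval do
  define_method(:shares) { ShareQuantity.new(self) }
end

order = 100.shares.of("IBM")
```

Reopening Integer like this affects the whole program. If that’s too broad for your DSL, Ruby’s refinements (refine Integer inside a module, activated with using) confine the same change to the files that opt in.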

Figure 2.6. Enriching the domain syntax through runtime metaprogramming

Rails and Grails are two of the most powerful web development frameworks that use the power of runtime metaprogramming. In Rails, when you write the following snippet, the Ruby metaprogramming engine generates the attribute accessors from the definition of the employees table, along with all the relevant code for the associations and validation logic that the class body declares.

class Employee < ActiveRecord::Base
  has_many :dependants
  belongs_to :organization
  validates_presence_of :last_name, :title, :date_of_birth

  # ..
end
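To see the general mechanism without depending on Rails, here’s a toy class macro in plain Ruby. It is not how ActiveRecord implements validates_presence_of; it only shows how a declarative call in a class body can generate methods while the class is being defined. ToyValidations and missing_fields are hypothetical names, and valid? here is this sketch’s own method.

```ruby
# Toy stand-in for an ActiveRecord-style class macro (NOT Rails' implementation)
module ToyValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class macro: generates accessors and presence checks for the given attributes
    def validates_presence_of(*attrs)
      attr_accessor(*attrs)

      define_method(:missing_fields) do
        attrs.select do |attr|
          value = public_send(attr)
          value.nil? || value.to_s.strip.empty?
        end
      end

      define_method(:valid?) { missing_fields.empty? }
    end
  end
end

class Employee
  include ToyValidations
  validates_presence_of :last_name, :title, :date_of_birth
end

employee = Employee.new
employee.last_name = "Smith"
```

In this toy, calling the macro a second time would replace the generated methods rather than accumulate checks; a real framework keeps a registry of validations instead.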

Runtime metaprogramming makes your DSL dynamic by generating code during runtime. But there’s another form of code generation that takes place during compilation and doesn’t add any overhead during runtime. Our next DSL pattern is compile-time metaprogramming, which is mostly found in the Lisp family of languages.

Compile-time metaprogramming

With compile-time metaprogramming, you can add custom syntax to your DSL, much like you can with the pattern you just learned about (runtime metaprogramming). Although these patterns are similar, there are some crucial differences between the two, as table 2.3 makes clear.

Table 2.3. Comparison of compile-time and runtime metaprogramming

Compile-time metaprogramming | Runtime metaprogramming
You define syntax that gets processed before runtime, during the compilation phase. | You define syntax that gets processed through the meta-object protocol (MOP) of the language during runtime.
No runtime overhead, because the language runtime has to deal only with valid forms. | Some runtime overhead, because meta-objects are processed and code is generated during runtime.

In typical implementations of compile-time metaprogramming, the user interacts with the compiler and generates program fragments during the compilation phase.

 

Macros are the most common way to implement compile-time metaprogramming. In section 4.5 we’ll delve into the details of how compile-time metaprogramming works, with specific examples from Clojure.

 

Preprocessor-based macros in C and templates in C++ are some examples of language infrastructure that can generate code during the compilation phase. But in the long history of programming languages, Lisp is the granddaddy of compile-time metaprogramming. C macros operate at the lexical level through textual substitution. Lisp macros work with ASTs and offer significant power in designing abstractions at the syntax level. Figure 2.7 shows a schematic diagram of how the custom syntax that you define in your DSL gets transformed through the macroexpansion phase into valid program forms, which are then forwarded to the compiler.

Figure 2.7. You use macros to do compile-time metaprogramming. Your DSL script has some valid language forms and some custom syntax that you’ve defined. The custom syntax is in the form of macros, which get expanded during the macroexpansion phase into valid language forms. These forms are then forwarded to the compiler.

That was the last of the internal DSL implementation patterns that we’ll discuss in this chapter. We’ve discussed various flavors of metaprogramming that you’ll find mostly in dynamic languages like Ruby, Groovy, and Clojure. We also talked about static typing and the benefits that it brings when you’re designing type-safe DSL scripts. In chapters 4, 5, and 6, we’ll get back to all these patterns and discuss each of them, with specific examples in each of the languages.

We started the chapter with the promise that we’d talk about real-world DSL design. We discussed DSL implementations in Java and Groovy, and we just now finished looking into the patterns that come up in internal DSL implementation. Each pattern is a snippet of experience that you, as a practitioner, should feel free to reuse. All of them have been used successfully in building real-world DSL implementations.

Now we’ll move on to the natural next question. What do you do when your host language doesn’t support the syntax that you’re looking for in your DSL? You need to get out of the confines of your host language and build alternatives from scratch. You need to use external DSLs. In the next section, we’ll look at some of the implementation patterns that external DSLs offer.

2.3.2. External DSL patterns: commonality and variability

External DSL design follows the same lifecycle and principles as general-purpose language design. I know this statement sounds off-putting and might drive you away from designing external DSLs in your next project. Although the statement is true in theory, it’s not all that grim when you consider that your DSL isn’t necessarily as complex in syntax and semantics as a general-purpose programming language. In reality, you can process some external DSLs by manipulating strings using regular expressions. The one trait common to all external DSLs is that they aren’t implemented using the infrastructure of a host language.

External DSL processing consists of the following two broad phases, as figure 2.8 explains:

1.  Parse: tokenize the text and use a parser to recognize valid inputs

2.  Process: do the business processing on the valid inputs that the parser recognized in the first phase

Figure 2.8. The processing stages of an external DSL. Note that unlike internal DSLs, the parser is now part of what you need to build. In internal DSLs, you use the parser of the host language.

If you’re designing a simple DSL in which the parser itself processes the inputs inline, the two phases might be combined. The more common and realistic approach is one in which the parser generates an intermediate representation of the input text. Depending on the scenario and the complexity of the DSL, this intermediate representation can be an AST or a more sophisticated metamodel of the language you’re designing. The parser can also vary in complexity, ranging from simple string processing to detailed syntax-directed translation (a parsing technique discussed more in chapter 8) using parser generators like YACC and ANTLR. The processing phase works on the intermediate representation and either generates the target output directly or transforms it into an internal DSL that gets processed using the infrastructure of the host language.
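A minimal Ruby sketch of the two phases, assuming a hypothetical comma-separated order syntax. The parse phase uses one regular expression per clause (the simplest kind of parser mentioned above) and builds an intermediate representation out of Instruction structs; the process phase computes a net cash flow and never touches the raw text.

```ruby
# Intermediate representation built by the parse phase (hypothetical fields)
Instruction = Struct.new(:side, :quantity, :security, :price)

# Phase 1, parse: tokenize each clause and accept only valid inputs
def parse(text)
  text.split(",").map(&:strip).map do |clause|
    match = clause.match(/\A(buy|sell) (\d+) ([A-Z]+) at (\d+)\z/)
    raise ArgumentError, "unrecognized instruction: #{clause}" unless match

    Instruction.new(match[1].to_sym, match[2].to_i, match[3], match[4].to_i)
  end
end

# Phase 2, process: business logic that sees only the IR, never raw text
def net_cash_flow(instructions)
  instructions.sum do |ins|
    value = ins.quantity * ins.price
    ins.side == :buy ? -value : value
  end
end

program = parse("buy 100 IBM at 45, sell 20 GOOG at 600")
puts net_cash_flow(program) # prints 7500
```

Keeping the IR separate means other processors (a validator, a report generator) can reuse the same parse result.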

In the following sections, we’ll briefly discuss each of the patterns that you’re likely to encounter in external DSL implementations. In chapter 7, we’ll have a more detailed discussion about the implementation aspects of each of them. Figure 2.9 lists some of the common patterns found in real-world external DSL implementations.

Figure 2.9. An informal micro-classification of common patterns and techniques of implementing external DSLs

Each pattern shown in figure 2.9 provides a way to describe the syntax of your DSL using a form that’s external to the host language of implementation. This means that the DSL script that you’ll write won’t pass as valid syntax in the implementation language. For each of these patterns, you’ll see how you can transform the custom DSL syntax into an artifact that can be consumed by your host language.

Context-driven string manipulation

Suppose you need to process business rules, but instead of traditional APIs you want to provide a DSL interface to your users. Consider the following example:

commission of 5% on principal amount for trade
                              values greater than $1,000,000

This is a string that doesn’t make sense in any programming language. With appropriate pre-processing and scrubbing, you can coerce it into valid Ruby or Groovy code. The parser will be a fairly simple one that tokenizes the string and does simple transformations through regular expression manipulations. The resulting form will be Ruby or Groovy code that can be executed right away as an implementation of the business rule.
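Here’s one way that scrubbing might look in Ruby, sketched under the assumption that only this one rule shape needs to be supported. A regular expression recognizes the sentence and captures the rate and threshold, the captures are spliced into Ruby source for a lambda, and eval turns that source into executable code. RULE_PATTERN and to_ruby are hypothetical names.

```ruby
RULE_TEXT = 'commission of 5% on principal amount for trade values greater than $1,000,000'

# Recognizes the single supported rule shape; \s+ tolerates the line break in the source text
RULE_PATTERN = /\Acommission of (\d+(?:\.\d+)?)% on principal amount for trade\s+values greater than \$([\d,]+)\z/

# Scrub the English-like rule into Ruby source using the regex captures
def to_ruby(rule)
  match = RULE_PATTERN.match(rule.strip)
  raise ArgumentError, "unsupported rule: #{rule}" unless match

  rate = match[1]
  threshold = match[2].delete(",")
  "->(principal) { principal > #{threshold} ? principal * #{rate} / 100 : 0 }"
end

source = to_ruby(RULE_TEXT)
commission = eval(source) # only regex-validated digits reach the generated code
puts commission.call(2_000_000) # prints 100000
```

Only digits captured by the regex reach the generated source, which keeps eval from executing arbitrary text; a real implementation would need a richer vocabulary and better error reporting.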

Transforming XML to a consumable resource

Many of you have probably worked with the Spring dependency injection (DI) framework. (If you’re unfamiliar with Spring, go to http://www.springframework.org.) One of the ways you can configure the DI container is through an XML-based specification file, in which you declare your abstractions, their implementations, and the dependencies between them. At runtime, the Spring container bootstraps from the specification file and wires up all dependencies into a BeanFactory or ApplicationContext, which remains alive during the lifecycle of the application and serves up all the necessary context information. The XML specification that you write is an external DSL that gets parsed and persisted as a resource to be consumed by the application.

Figure 2.10 shows a schematic overview of how Spring uses XML as the external DSL to bootstrap its ApplicationContext abstraction.

Figure 2.10. XML is being used as the external DSL to abstract Spring configuration specification. The container reads and processes XML during startup and produces the ApplicationContext that your application uses.
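To illustrate the parse-and-persist lifecycle, here’s a toy container in Ruby that reads bean definitions from XML with REXML and wires them into a frozen registry that lives for the rest of the program. It’s an analogue, not Spring: the element names mimic Spring’s bean and property tags, but FixedPriceFeed, TradeValuer, and build_context are hypothetical, and real containers do far more (scopes, lifecycle callbacks, constructor injection). It assumes the rexml gem, which ships with standard Ruby installations, is available.

```ruby
require "rexml/document"

# Hypothetical application classes that the configuration wires together
class FixedPriceFeed
  def quote(_security)
    100
  end
end

class TradeValuer
  attr_accessor :price_feed

  def value(security, quantity)
    price_feed.quote(security) * quantity
  end
end

CONFIG_XML = <<~XML
  <beans>
    <bean id="feed" class="FixedPriceFeed"/>
    <bean id="valuer" class="TradeValuer">
      <property name="price_feed" ref="feed"/>
    </bean>
  </beans>
XML

# Parse the external DSL once at startup into a long-lived registry
def build_context(xml)
  beans = REXML::XPath.match(REXML::Document.new(xml), "/beans/bean")
  # First pass: instantiate every bean by class name
  context = beans.to_h do |bean|
    [bean.attributes["id"], Object.const_get(bean.attributes["class"]).new]
  end
  # Second pass: inject references through setters
  beans.each do |bean|
    bean.each_element("property") do |prop|
      dependency = context.fetch(prop.attributes["ref"])
      context.fetch(bean.attributes["id"]).public_send("#{prop.attributes['name']}=", dependency)
    end
  end
  context.freeze
end

context = build_context(CONFIG_XML)
puts context.fetch("valuer").value("IBM", 10) # prints 1000
```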

Another similar example is the Hibernate mapping file, which maps your entity classes to the database schema. (For more information about Hibernate, go to http://hibernate.org.) Both examples follow the parse and process stages of execution, albeit with different lifecycles and persistence strategies. They exhibit the commonality of external DSLs, but differ from the pattern we discussed earlier (context-driven string manipulation) in the form and complexity of the parser and the lifetime of the intermediate representation.

DSL workbench

The core concepts of metaprogramming that we discussed in the context of internal DSLs have been extended to another level by some of the language workbenches and metaprogramming systems that are currently available. When you write code in text form, the compiler needs to parse the code to generate the AST. What if the system already maintains the code that you write in the form of an AST? If a system could do that, the result would be easier transformation, manipulation, and subsequent code generation from the intermediate representation.

Eclipse Xtext (http://www.eclipse.org/Xtext) is a great example of a system that offers a complete solution for end-to-end development of external DSLs. Xtext derives a metamodel from your DSL grammar and represents parsed DSL scripts as instances of that metamodel rather than as plain text. These metamodels can then be integrated seamlessly with a number of other frameworks like code generators, rich editors, and so on. Tools like Xtext are called DSL workbenches because they offer a complete environment for developing, managing, and maintaining your external DSLs. We’ll discuss designing DSLs based on Xtext with a detailed case study in chapter 7.

JetBrains’ Meta Programming System (http://www.jetbrains.com/mps/index.html) supports nontextual representations of program code that eliminate the need for code parsing. The code is always available as the AST, with all its annotations and references, and allows you to define generators that generate code in numerous programming languages. It’s as if you’re designing your external DSL in a metalanguage offered by the metaprogramming system of the workbench. You can define business rules, types, and constraints like you would do through your programming language. The difference is in the external presentation, which is much friendlier, might be graphical, and is easier for you to manipulate.

Looking back at figure 2.9, you’ve just learned about three of the commonly used techniques for external DSL implementation. We’ll discuss two others that might be the two most prominent techniques that you’ll use in your real-world applications. We’re almost to the end of our third milestone of the chapter. By now, you have a good appreciation for all the techniques of internal and external DSL implementations that we’ve discussed so far. I’m sure that you’re anxiously waiting to see some of them being used in the larger context of modeling a domain. We’ll be taking that journey soon.

Mixing DSL with embedded foreign code

Parser generator tools like YACC and ANTLR let programmers use syntax notation that’s similar to Extended Backus-Naur Form (EBNF) to declare the grammar of the language. The tool processes the production rules and generates the parser of the language. When you implement a parser, you usually want to also define some actions that your parser should take when it recognizes a fragment of input. One example of such an action is building up an intermediate representation of the input language string, which will be used in later stages by your application.

Tools like YACC and ANTLR let you embed host language code for action definitions within the production rules. Associated with each rule, you can write code fragments in C, C++, or Java that get bundled into the final parser code that the tool generates. In this pattern of external DSL design, you extend the grammar notation, itself a DSL, with foreign code embedded from a general-purpose language. We’ll discuss a complete DSL design using this pattern with ANTLR as the parser generator in chapter 7. Now let’s move on to our final classification.

DSL design based on parser combinators

This classification is the final unit shown in figure 2.9. Now we get to discuss one of the most innovative ways to design an external DSL. In the previous section, you saw how you can use external tools like YACC or ANTLR, along with code embedded from a general-purpose language, to generate a parser for your DSL. These tools generate efficient parsers for the grammar that you feed them. The drawback is that they aren’t exactly the friendliest of tools to use. Many of today’s languages offer a better alternative in the form of parser combinators.

In combination with a powerful type system, you can design parser combinators to be an expressive DSL that’s implemented as a library within the language itself. You can develop parsers using the full power of your host language artifacts like classes, methods, and combinators, without resorting to an external tool set.
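To show that combinators really are just a library, here’s a toy combinator set in Ruby, a sketch that is neither Scala’s API nor any gem’s. A parser is a lambda from a token array to a [result, remaining_tokens] pair, or nil on failure; seq, alt, and transform (a stand-in for a semantic action) build bigger parsers out of smaller ones. The order grammar it parses is a simplified, hypothetical cousin of the Scala example that follows.

```ruby
# A parser is a lambda: tokens -> [result, remaining_tokens], or nil on failure.
def lit(word)
  ->(toks) { toks.first == word ? [word, toks.drop(1)] : nil }
end

def token(regex)
  ->(toks) { toks.first&.match?(regex) ? [toks.first, toks.drop(1)] : nil }
end

# Sequence: run parsers one after another, collecting their results in order
def seq(*parsers)
  lambda do |toks|
    parsers.reduce([[], toks]) do |state, parser|
      next nil unless state

      result = parser.call(state[1])
      result && [state[0] + [result[0]], result[1]]
    end
  end
end

# Alternation: the first parser that succeeds wins
def alt(*parsers)
  lambda do |toks|
    result = nil
    parsers.find { |parser| result = parser.call(toks) }
    result
  end
end

# Semantic action: transform the result of a successful parse
def transform(parser, &action)
  ->(toks) { (result = parser.call(toks)) && [action.call(result[0]), result[1]] }
end

# Succeeds only if the parser consumes all of the input
def parse_all(parser, text)
  result = parser.call(text.split)
  result && result[1].empty? ? result[0] : nil
end

number = transform(token(/\A\d+\z/), &:to_i)
security = token(/\A[A-Z]+\z/)
side = transform(alt(lit("buy"), lit("sell")), &:to_sym)
order = transform(seq(side, number, security, lit("shares"), lit("at"), number)) do |sd, qty, sec, _shares, _at, price|
  { side: sd, quantity: qty, security: sec, price: price }
end
```

The grammar rules at the bottom are ordinary Ruby values, so you can name, reuse, and test them like any other code; a statically typed host language additionally checks how they compose.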

Scala offers a parser combinator library (historically bundled with the standard library; since Scala 2.11 it’s distributed as the separate scala-parser-combinators module). Using Scala’s higher-order functions, we can define combinators that make the parser DSL look like declarative EBNF production rules. Check out the following grammar that’s declared using Scala parser combinators. It defines the grammar for a small order-processing language using pure Scala.

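import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object OrderDSL extends StandardTokenParsers {
  lexical.delimiters ++= List("(", ")", ",")
  lexical.reserved ++= List("buy", "sell", "shares", "at",
    "max", "min", "for", "trading", "account")

  // instr ::= "(" trans_spec {"," trans_spec} ")" account_spec
  def instr = trans ~ account_spec
  def trans = "(" ~> repsep(trans_spec, ",") <~ ")"
  def trans_spec = buy_sell ~ buy_sell_instr
  def account_spec = "for" ~> "trading" ~> "account" ~> stringLit
  def buy_sell = ("buy" | "sell")
  def buy_sell_instr = security_spec ~ price_spec
  def security_spec = numericLit ~ ident ~ "shares"
  def price_spec = "at" ~ ("min" | "max") ~ numericLit

  // Entry point: tokenize the input and require all of it to match instr
  def parseOrder(input: String) = phrase(instr)(new lexical.Scanner(input))
}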

If you can’t follow all the details of this snippet, that’s totally OK. I threw in this sample implementation only to demonstrate the power of declarative parser development within the confines of a host language. Using this technique, you’ll be able to develop your external DSL parser fully in Scala.

 

I’ll revisit the topic of parser combinators in chapter 8, which contains a comprehensive external DSL that’s built using Scala parser combinators.

 

We’ve come to the end of this road. We’ve covered all the DSL implementation patterns and techniques that were listed earlier in the chapter. These descriptions were sort of thumbnails aimed at providing the bigger picture in the context of real-world DSL implementation. In chapters 4 through 8, you’ll see in detail how to use each of these in various forms when we take up real-world domain problems and implement DSLs for each of them.

Before we end the chapter, we need to take a realistic view of a topic you’ll find useful every time you step out to design a DSL: how to decide on the pragmatics of which form of DSL to use in your application. In chapter 1, we discussed when to use a DSL. Now I’m going to explain how to choose between internal and external DSLs when you’re designing an application. DSLs make perfect sense when they’re used to model specific problems of the domain, but engineering them is a balancing act. Whether you choose to design an internal DSL or an external one can depend on a lot of factors; not all of them will necessarily be driven by technology choices.

2.4. Choosing DSL implementations

As programmers, we’re always faced with many options, be it in design methodology, programming paradigms, or using idioms in specific implementations. We’ve been talking about designing DSLs, the virtues of well-designed abstractions, and a multitude of options to make your language expressive enough to your user community. Now we have to talk about some other options that you’ll face.

Suppose you’ve decided to adopt a DSL-based development approach for your project and you’ve already identified a couple of business domain components that make good candidates for expressive DSL design. How do you decide on your DSL implementation strategy? Do you want to use the host language and model your problem as an internal DSL? Or would you prefer to design an external DSL to get to the level of expressivity that your users need? As with most problems in software engineering, there’s no universal choice. It all depends on the set of constraints that the problem domain presents and the set of options your solution domain offers. In this section, let’s review some of the factors you need to consider before jumping in to decide on the DSL implementation technique.

Reusing existing infrastructure

Internal DSLs piggyback on the host language infrastructure, syntax, semantics, module system, type system, method of error reporting, and the complete tool chain that it integrates with. This piggybacking is possibly the most definitive advantage of implementing internal DSLs. For external DSLs, you need to build all these from the ground up, which is never an easy proposition. Even within internal DSLs, you have lots of implementation patterns to choose from, as we saw in the last section. Your choice here will mostly depend on the capabilities of your host language and the level of abstraction that it supports.

If you use a language like Scala or Haskell that offers rich type systems, you can decide to use them to encode your domain types and have a purely embedded DSL. But embedding might not always be the most appropriate option available. For embedding to work, the language that you’re trying to embed needs to have concrete syntax and semantics similar to those of the host language. A mismatch in either will make your DSL look foreign to the ecosystem of the host language and keep it from composing cleanly with the host language’s native control structures. In such cases, you might want to resort to metaprogramming techniques, if they’re offered by your host language. As I discussed earlier, metaprogramming lets you extend the base language with your own domain constructs and can often lead to the design of more expressive surface syntax for your DSL compared to the embedded variant.

Leveraging existing knowledge

There are situations when your decision to use an implementation paradigm is driven by the available knowledge base of your team members. Internal DSLs are more likely to score on this point. The important point to consider is that being familiar with the language doesn’t imply that the programmers are aware of the DSL-friendly idioms that it offers. Fluent interfaces are commonly used in Java and Ruby, but they have their pitfalls too. And there are situations when you need to consider aspects like mutability of abstractions, context sensitivity of the fluent API, and the finishing problem of finalizing the chain (see [4] in section 2.6) to make your DSL semantically consistent. All these things involve subtle idiomatic usage of the language, which contributes to the consistency of your DSL.
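Here’s a small Ruby sketch of two of those pitfalls and one common way around each, assuming a hypothetical order vocabulary. Each step returns a new, immutable builder, so sharing a partial chain is safe, and an explicit place method terminates the chain and checks that the mandatory parts are present. Other answers to the finishing problem exist, such as wrapping the chain in a block.

```ruby
# Immutable fluent builder: every step returns a NEW builder, so a shared
# partial chain can't be changed behind another caller's back.
class OrderBuilder
  def initialize(fields = {})
    @fields = fields.freeze
  end

  def buy(quantity)
    with(side: :buy, quantity: quantity)
  end

  def sell(quantity)
    with(side: :sell, quantity: quantity)
  end

  def of(security)
    with(security: security)
  end

  def limit(price)
    with(limit: price)
  end

  # Explicit terminator: the chain is finished only here, and only when the
  # mandatory parts are present (one answer to the finishing problem).
  def place
    missing = %i[side quantity security] - @fields.keys
    raise ArgumentError, "incomplete order, missing: #{missing.join(', ')}" unless missing.empty?

    @fields
  end

  private

  def with(extra)
    OrderBuilder.new(@fields.merge(extra))
  end
end

ibm = OrderBuilder.new.buy(100).of("IBM")
limit_order = ibm.limit(45).place
market_order = ibm.place # unaffected by the limit added on the line above
```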

Leveraging existing knowledge is certainly an important consideration. As the leader of the team, judge the expertise of your team members based on the context of DSL implementation, not on their familiarity with the surface syntax of the language. I’ve seen instances when a team decided to use XML as the external DSL instead of trying to shoehorn internal DSLs into Java, and gained a lot in productivity and user acceptance.

Learning curve with external DSLs

Maybe you’re afraid to choose external DSLs because you think that designing them is just as complex as designing a general-purpose programming language. If that’s what you’re thinking, I don’t blame you. Just having to deal with terms like syntax-directed translation, recursive descent parsers, LALR and SLR seems to remind you of how complex the whole thing can be.

In reality, most of the external DSLs required in application development don’t need to be as complicated as a full-blown programming language. Then again, some external DSLs will be complex, and there is an associated learning curve as part of the cost of development. The advantage is that you can customize almost everything, including how you handle errors and exceptions, instead of being confined within the constraints of an underlying host language.

The right level of expressivity

Although internal DSLs score a lot of points by reusing existing infrastructure, it’s also true that the constraints that the base language forces on you can make it difficult to achieve the right level of expressivity for your domain users. More often than not, modules are identified as candidates for DSLs long after the development environment and the tool chain have been finalized. It’s not always possible to switch to an alternate language that might have been a better candidate for the DSL design.

When this happens, you need to consider external DSLs as part of your application infrastructure. The main advantage of modeling a problem with an external DSL is that you can design it precisely to the level of sophistication that you need for the problem at hand. It also gives you ample room for tweaking, based on user feedback. This isn’t always possible with internal DSLs, because you have to abide by the basic constraints of syntax and semantics that the base language enforces.

Composability

In a typical application development scenario, you need to compose DSLs with each other and also with the host language. Composing internal DSLs with the host language is easy. After all, the DSL uses the same language and is mostly implemented as a library that embeds into your host language.

But let’s talk a bit about combining multiple DSLs, even when they’re implemented using the same host language. If you’re using statically typed languages for implementation and you’ve designed embedded DSLs, you need the support of the host language’s type system to ensure seamless composability between them. Languages that support functional programming paradigms encourage you to design internal DSLs based on functional combinators. The internal DSL and the combinators can be completely composable, if they’re designed properly. External DSLs are harder to design in this manner, particularly if they were designed separately and without considering composability as an upfront criterion.

2.5. Summary

From the rationale of DSLs in chapter 1 to the real-world pragmatics of DSL use, implementation, and classification, you’ve come a long way in a short time. If chapter 1 gave you a precursor to DSL-based development paradigms, this chapter has exposed you to the pragmatics of real-world usage.

I started the chapter with an example to emphasize how DSL-based program development focuses on making abstractions more expressive. A Java-based implementation of the order-processing DSL was expressive enough for the programmer as a user. But when we speak of DSLs as being an effective vehicle for non-programming domain experts, you need to have an implementation language that helps you express more in the language of the domain. The Groovy implementation did precisely that; the level of expressiveness increased considerably when we moved from Java to Groovy.

Next, we changed gears from the specifics of an implementation to the broader topic of DSL patterns. You learned that distinct implementation patterns exist even within the broad classification of internal and external DSLs. DSLs can be of varying complexity. As a DSL designer, you need to decide on the strategy of implementation that best suits the problem at hand. We discussed all of those patterns in section 2.3 to give you an overall idea of the implementation techniques.

In this chapter, you’ve seen how DSLs vary in form and structure. It’s the architect’s responsibility to give each DSL the final shape that models the domain. Before we talk about that, we need to discuss how you integrate DSLs with your development environment.

 

Key takeaways & best practices

  • Java offers features that make your DSL expressive enough. If you feel limited with Java, target other languages on the JVM that have good interoperability with Java.
  • When you’re using a specific language for DSL implementation, keep an eye on the patterns that it offers to make your DSL idiomatic. If you’re using Groovy or Ruby, metaprogramming is your friend. With Scala, the rich type system can form the backbone of your DSL implementation.
  • Keep an open mind when you’re selecting the type of DSL you’re going to design. External DSLs might seem difficult, but most likely you won’t require the sophistication needed to build a full-blown language from the ground up.

 

So far, you’ve seen the macromodel of DSLs in action. Now it’s time to think in terms of the micromodeling artifacts of DSL-based development. If you’re on the JVM and your core application is in Java, how do you integrate your Groovy DSL so that it can talk to the Java components and still maintain its own identity as a separately evolving entity? There are quite a few options you can adopt, and a few pitfalls to avoid. But that’s for the next chapter.

In all the following chapters, you’ll implement DSL snippets from the securities trading domain, and Bob will always be there with us as the all-seeing eye to help us make our DSLs more expressive.

2.6. References

  1. Gamma, E., R. Helm, R. Johnson, and J. Vlissides. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional.
  2. Hudak, P. 1998. Modular Domain-Specific Languages and Tools. Proceedings of the 5th International Conference on Software Reuse.
  3. Hofer, C., K. Ostermann, T. Rendel, and A. Moors. 2008. Polymorphic Embedding of DSLs. Proceedings of the 7th International Conference on Generative Programming and Component Engineering, pp. 137-148.
  4. Ford, N. Advanced DSLs in Ruby. http://github.com/nealford/presentations/tree/master.