In this chapter you’ll...
- Be introduced to declarative programming
- Learn the architecture of a typical rule-based system
- See a method for developing rule-based systems
- Read about industry standards for rules and rule engine APIs
There’s an old saying that “when all you’ve got is a hammer, everything looks like a nail.” In computer programming, too, it’s important to choose the right tool for the right job. Some problems can be solved easily using traditional programming techniques, whereas writing a rule-based system is the easiest way to solve others. Other problems are in the middle, and either technique will work equally well. This chapter will help you understand which problems are well suited to being solved with a rule-based system, and what this type of software is all about.
Imagine your first day on the job at Acme Advanced Robotics Corporation. Acme has built some great humanoid robot prototypes, but the company is having trouble teaching its robots to survive in the real world. You’ve been brought in to write some high-level robot control software so that Acme can begin selling robot butlers to the masses. You decide to start with something simple: breakfast.
How do you teach a robot to prepare a bowl of breakfast cereal? Assume for the moment that the robot knows how to recognize, pick up, and interact with objects, and that the robot will operate in a closed, controlled, optimal environment. It is be a straightforward program: Tell the robot to get a bowl, a spoon, and a napkin, and set them on the table. Then, the robot should open a box of cereal, fill the bowl, and add milk. If you were writing a computer program for a sophisticated cereal-serving robot, it might look something like this:
START putOnTable(Bowl) putOnTable(Spoon) putOnTable(Napkin) open(Cereal) pour(Cereal) open(Milk) pour(Milk) eat(Cereal, Spoon) END
The cereal program, with its predictable linear control flow, is typical of many small computer programs. Calculating the value of an equation, computing a customer’s bill at a shopping mall, even rendering a single frame of a complex video game, are all variations on the same sort of linear, deterministic process, in which each step follows inevitably from the last. Such programs are often called procedural programs—they solve a problem in a straightforward, predictable way. Traditional software development techniques can be used to write programs such as the cereal program to good effect. You wouldn’t write a rule-based program to solve the cereal problem. In general, problems that have a well-known algorithmic solution should be solved using a traditional procedural approach.
Emboldened by your success, you now think about teaching the same robot how to drive a car in the real world. You have to tell the robot to search for its keys and driver’s license—but only if it isn’t already carrying them. You tell the robot how to start the car, put the car in reverse, and back out of the garage—but only if the garage door is already open! The instructions probably need to cover the different behaviors of cold motors and warm motors, how to use both manual and automatic transmissions, and different types of emergency brakes. And the instructions will become far more complex once the car begins to move.
This time, the instructions are filled with many context-sensitive decisions—in fact, there are more decisions than actions. The control flow includes many loops and branches. It would be next to impossible to write a single list of instructions covering every situation that might possibly arise, considering that circumstances interact to constantly change the correct outcomes of each decision. For example, seeing a child’s ball roll into the road is not a serious situation when the car isn’t moving, but it requires prompt and decisive action when the car is driving toward the bouncing ball.
If it’s impossible to write a procedural program to control your robot, how can you do it? You can use declarative programming instead.
Much of the programming we do is procedural. Rule-based programming, however, is declarative.
In procedural programming, the programmer tells the computer what to do, how to do it, and in what order. Procedural programming is well suited for problems in which the inputs are well specified and for which a known set of steps can be carried out to solve the problem. Mathematical computations, for example, are best written procedurally.
Note that I’m using procedural in a slightly different way than is conventional. Although object-oriented programming is traditionally contrasted with the older procedural programming, for the purposes of this discussion, the two are equivalent. In both procedural and object-oriented programming, the programmer writes all the logic that controls the computer, although it is partitioned differently in an object-oriented program.
A purely declarative program, in contrast, describes what the computer should do, but omits much of the instructions on how to do it. Declarative programs must be executed by some kind of runtime system that understands how to fill in the blanks and use the declarative information to solve problems. Because declarative programs include only the important details of a solution, they can be easier to understand than procedural programs. Furthermore, because the control flow is chosen by the runtime system, a declarative program can be more flexible in the face of fragmentary or poorly conditioned inputs (when you removed some of the information from Mrs. Rosencrantz’s problem in chapter 1, the same program, unchanged, was still able to find the possible solutions). Declarative programming is often the natural way to tackle problems involving control, diagnosis, prediction, classification, pattern recognition, or situational awareness—in short, many problems without clear algorithmic solutions. Programming a cooking, driving robot declaratively will be a breeze!
Although the driving program would be hard to write in a procedural style, it is an ideal candidate to be written as a declarative program. A rule-based program doesn’t consist of one long sequence of instructions; instead, it is made up of discrete rules, each of which applies to some subset of the problem. A few rules plucked from the robot’s driving program might look like these:
IF the engine has stalled THEN start car END IF you hear sirens AND you are driving THEN pull over to curb END IF you see brake lights AND you are getting closer to them THEN depress the brake pedal END
In a rule-based program, you write only the individual rules. Another program, the rule engine, determines which rules apply at any given time and executes them as appropriate. As a result, a rule-based version of a complex program can be shorter and easier to understand than a procedural version. Writing the program is simpler, because you can concentrate on the rules for one situation at a time. Modifying the program is also simpler—if you’ve ever had to work on a program containing a dozen levels of nested if statements, you’ll understand why. In the rest of this chapter, we’ll formalize some of the ideas behind rule-based systems and see how they are constructed.
A rule is a kind of instruction or command that applies in certain situations. “No chewing gum in school,” “no running with scissors,” and other rules of that ilk, are some of the first explicit rules we learn. “Where there is smoke, there’s fire” and Murphy’s Law (“Whatever can go wrong, will go wrong”) are others that we learn throughout our lives. Using this very general definition, you might conclude that all the knowledge you have about the world can be encoded as rules. Experience shows that this is often (but not always) the case. In general, any information you can think about in logical terms can be expressed as rules.
1 One reviewer pointed out that this popular proverb is properly called Finagle’s Law, and that the original formulation of Murpny’s Law was, “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it.” I chose not to go against popular usage here, but the pedant in me appreciated this fact enough to add a footnote. For more information, see The Jargon File—for instance, http://info.astrian.net/jargon/terms/f/Finagle_s_Law.html.
Rules are a lot like the if-then statements of traditional programming languages. You might write a gum-chewing rule like this, in an English-like pseudocode:
IF I am in school AND I am chewing gum THEN spit out the gum END
The if part of a rule written in this form is often called its left-hand side (often abbreviated LHS), predicate, or premises; and the then part is the right hand side (RHS), actions, or conclusions.
The domain of a rule is the set of all information the rule could possibly work with. In this hypothetical case, the domain of the chewing rule is a set of facts about the location and oral fixations of one particular person.
A rule-based system is a system that uses rules to derive conclusions from premises: Given the gum-chewing rule and the premise that you are in school, you (as an advanced kind of rule-based system yourself) might conclude that it’s time to spit out your gum. In this book, the systems we’re talking about are a specific category of computer programs. These programs are sometimes called rule engines. A rule engine doesn’t contain any rules until they are programmed in. A rule engine knows how to follow rules, without containing any specific knowledge itself.
A rule engine is generally part of a rule development and deployment environment. The features offered by these environments vary widely, depending on the intended applications for the rule engine and on the type of programmer intended to develop the systems. This book will show you how to develop and deploy rule-based systems in general. To do so, it will use the Jess rule engine in all its examples.
Expert systems, rule-based computer programs that capture the knowledge of human experts in their own fields of expertise, were a success story for artificial intelligence research in the 1970s and 1980s. Early, successful expert systems were built around rules (sometimes called heuristics) for medical diagnosis, engineering, chemistry, and computer sales. One of the early expert system successes was MYCIN, a program for diagnosing bacterial infections of the blood. Expert systems had a number of perceived advantages over human experts. For instance, unlike people, they could perform at peak efficiency, 24 hours a day, forever. There are numerous dramatic examples in the computer science literature of these early systems matching or exceeding the performance of their human counterparts in specific, limited situations. Predictions were made that someday, sophisticated expert systems would be able to reproduce general human intelligence and problem-solving abilities.
2 R. Davis, B. G. Buchanan, and E. H. Shortliffe, “Production Systems as a Representation for a Knowledge-Based Consultation Program,” Artifical Intelligence 8 (1977): 15–45.
Over time, of course, the drama receded, and it became clear that researchers had vastly underestimated the complexity of the common-sense knowledge that underpins general human reasoning. Nevertheless, excellent applications for expert systems remain to this day. Modern expert systems advise salespeople, scientists, medical technicians, engineers, and financiers, among others.
Today, general rule-based systems, both those intended to replace human expertise and those intended to automate or codify business practices or other activities, are a part of virtually every enterprise. These systems are routinely used to order supplies, monitor industrial processes, prescreen résumés, route telephone calls, and process web forms. Many commercial application servers incorporate a rule engine, and most others explicitly or implicitly offer integration with one. Expert systems really have become ubiquitous—we just don’t call them by that name anymore.
The rules in the first expert systems were intertwined with the rest of the software, so that developing a new expert system meant starting from the ground up. The folks who wrote MYCIN, recognizing this fact, created a development tool named EMYCIN. EMYCIN (Empty MYCIN) was developed by removing all the medical knowledge from MYCIN, leaving behind only a generic framework for rule-based systems. EMYCIN was the first expert system shell. An expert system shell is just the inference engine and other functional parts of an expert system with all the domain-specific knowledge removed. Most modern rule engines can be seen as more or less specialized expert system shells, with features to support operation in specific environments or programming in specific domains. This book is about this kind of rule engine.
3 W. Van Melle, “A Domain-Independent Production Rule System for Consultation Programs,” International Joint Conference on Artificial Intelligence (1979): 923–925.
A typical rule engine contains:
- An inference engine
- A rule base
- A working memory
The inference engine, in turn, consists of:
- A pattern matcher
- An agenda
- An execution engine
These components are shown schematically in figure 2.1.
Figure 2.1. The architecture of a typical rule-based system. The pattern-matcher applies the rules in the rule-base to the facts in working memory to construct the agenda. The execution engine fires the rules from the agenda, which changes the contents of working memory and restarts the cycle.
If you wanted to write your own rule engine, where would you start? You might begin with the most important component. The primary business of a rule engine is to apply rules to data. That makes the inference engine the central part of a rule engine.
The inference engine controls the whole process of applying the rules to the working memory to obtain the outputs of the system. Usually an inference engine works in discrete cycles that go something like this:
- All the rules are compared to working memory (using the pattern matcher) to decide which ones should be activated during this cycle. This unordered list of activated rules, together with any other rules activated in previous cycles, is called the conflict set.
- The conflict set is ordered to form the agenda—the list of rules whose right-hand sides will be executed, or fired. The process of ordering the agenda is called conflict resolution. The conflict resolution strategy for a given rule engine will depend on many factors, only some of which will be under the programmer’s control.
- To complete the cycle, the first rule on the agenda is fired (possibly changing the working memory) and the entire process is repeated. This repetition implies a large amount of redundant work, but many rule engines use sophisticated techniques to avoid most or all of the redundancy. In particular, results from the pattern matcher and from the agenda’s conflict resolver can be preserved across cycles, so that only the essential, new work needs to be done.
Many beginning rule programmers have difficulty with the idea that the rule engine will decide the order in which the rules will be fired, but this is actually one of the great strengths of rule-based programming. The rule engine can more or less create a custom program for each situation that arises, smoothly handling combinations of inputs the programmer might not have imagined.
Your rule engine will obviously need somewhere to store rules. The rule base contains all the rules the system knows. They may simply be stored as strings of text, but most often a rule compiler processes them into some form that the inference engine can work with more efficiently. For an email filter, the rule compiler might produce tables of patterns to search for and folders to file messages in. Jess’s rule compiler builds a complex, indexed data structure called a Rete network. A Rete network is a data structure that makes rule processing fast. Chapter 8 describes how Jess’s rule compiler works.
In addition, the rule compiler may add to or rearrange the premises or conclusions of a rule, either to make it more efficient or to clarify its meaning for automatic execution. Depending on the particular rule engine, these changes may be invisible to the programmer.
Some rule engines allow (or require) you to store the rule base in an external relational database, and others have an integrated rule base. Storing rules in a relational database allows you to select rules to be included in a system based on criteria like date, time, and user access rights.
You also need to store the data your rule engine will operate on. In a typical rule engine, the working memory, sometimes called the fact base, contains all the pieces of information the rule-based system is working with. The working memory can hold both the premises and the conclusions of the rules. Typically, the rule engine maintains one or more indexes, similar to those used in relational databases, to make searching the working memory a very fast operation.
It’s up to the designer of the rule engine to decide what kinds of things can be stored in working memory. Some working memories can hold only objects of a specific type, and others can include, for example, Java objects.
Your inference engine has to decide what rules to fire, and when. The purpose of the pattern matcher is to decide which rules apply, given the current contents of the working memory. In general, this is a hard problem. If the working memory contains thousands of facts, and each rule has two or three premises, the pattern matcher might need to search through millions of combinations of facts to find those combinations that satisfy rules. Fortunately, a lot of research has been done in this area, and very efficient ways of approaching the problem have been found. Still, for most rule-based programs, pattern matching is the most expensive part of the process. Beginning rule programmers often overlook this fact, expecting the procedural right-hand sides of their rules to represent all the computational effort in their program. The solution to Mrs. Rosencrantz’s problem involved lots of pattern matching and no procedural code at all (except to print a report at the end). Often the pattern-matching technique used by a particular rule engine will affect the kinds of rules you write for that engine, either by limiting the possibilities or by encouraging you to write rules that would be particularly efficient.
Once your inference engine figures out which rules should be fired, it still must decide which rule to fire first. The list of rules that could potentially fire is stored on the agenda. The agenda is responsible for using the conflict strategy to decide which of the rules, out of all those that apply, have the highest priority and should be fired first. Again, this is potentially a hard problem, and each rule engine has its own approach. Commonly, the conflict strategy might take into account the specificity or complexity of each rule and the relative age of the premises in the working memory. Rules may also have specific priorities attached to them, so that certain rules are more important and always fire first.
As an example, the driving robot’s control program might have two rules like these:
IF the light is green THEN go END IF a person is in front of you THEN stop END
If the robot is stopped for a red light, and the light turns green when someone is still in the crosswalk, then both rules will apply. It is important that the second rule fire before the first, or the future of driving robots will be in serious peril. This second rule should therefore be given a very high priority.
Finally, once your rule engine decides what rule to fire, it has to execute that rule’s action part. The execution engine is the component of a rule engine that fires the rules. In a classical production system such as MYCIN, rules could do nothing but add, remove, and modify facts in the working memory. In modern rule engines, firing a rule can have a wide range of effects. Some modern rule engines (like Jess) offer a complete programming language you can use to define what happens when a given rule fires. The execution engine then represents the environment in which this programming language executes. For some systems, the execution engine is a language interpreter; for others, it is a dispatcher that invokes compiled code.
This book is a hands-on guide to building useful rule-based systems. Each individual project in this book covers some aspect of this task, presenting realistic examples of every step along the way. In this section, we look at an overview of the development process we will follow in later chapters.
The first step in the development of any rule-based system is to begin collecting the knowledge from which the rules will be derived. People who do this for a living are called knowledge engineers. Knowledge engineering can be tricky, particularly if the knowledge has to come from human experts. Experts aren’t always cooperative, and even if they are, they don’t always know how to explain the procedures they follow. On the other hand, many experts respond well to interviews, and you can ask questions to fill in gaps in the expert’s explanations.
If you are developing a rule-based system that is strictly based on a procedures manual or other document, or if a human expert is not available, then the knowledge may be collected directly from written sources. Collecting knowledge from books and other reference material has its own advantages and disadvantages. Although books are generally more organized than human experts, they can be lacking in the kind of practical rules of thumb (or heuristics) that a practitioner can supply. On the other hand, you rarely have scheduling and other logistical problems when attempting to read a book, but these can be annoying obstacles when working with a human expert.
Another important aspect of knowledge engineering is organizing and structuring knowledge. A typical rule-based system contains hundreds or thousands of rules. Organizing the collected knowledge so that translation to rules will be straightforward is a challenging task for the knowledge engineer.
We’ll discuss the knowledge engineering process in greater detail in chapter 9.
When all the knowledge has been collected, the task of programming the system begins. The best first step is to examine the knowledge and design data structures that will make it easy to implement the rules clearly and directly. This process resembles object-oriented analysis. First, the major concepts are identified. For an employee benefits consultant, these might include employee, health plan, claim, time, and money. The important thing at this stage is to identify all the concepts referred to in the collected knowledge—the irrelevant ones can be removed later.
Then, you list all the variable characteristics of each concept: Employees have a name, a health plan, years of service, and a salary, among other things. Again, at this stage you try to identify all the characteristics mentioned in the collected knowledge. The pants-color and position templates in chapter 1 were simple examples of data structures for working memory elements. Designing data structures for rule-based systems is discussed in chapter 10.
You may wonder why I’m mentioning testing now, when you haven’t written any code yet. Actually, this is the perfect time to begin testing a rule-based system: at the beginning. If rigorous tests are applied to the system at every stage of its development, it will naturally be more robust, more modular, and better understood than a system that wasn’t tested until the end. Therefore, before writing a group of rules, you should develop an automated test to exercise them. You can write tests in Java, in your rule language, or in a convenient scripting language. You should run all the tests you have written quite often, ideally each time a change is made to the system. When the final system is delivered, the tests can be part of the deliverable—they will be a great help to anyone who needs to modify the system in the future.
How do you develop tests? Some tests will be very small and check intermediate results, whereas others will be fully worked problems. In the former case, you might develop the test by yourself. The larger tests, though, should be based when possible on actual case studies of how problems were solved in the past.
It is important that the tests be automated, so no human checking of results is required; otherwise the tests will require too much effort to run and will not be used. It helps to have an automated test framework—you can often quickly develop one yourself using Perl, shell scripts, or similar scripting facilities. This testing technique, known as test-driven development, is one facet of eXtreme Programming, a methodology that is rapidly gaining acceptance in many computer programming fields. An automated test framework that I use for testing Jess programs is described in appendix C.
4 K. Beck, Extreme Programming Explained: Embrace Change (Reading, Mass.: Addison-Wesley), 2000.
For most rule-based systems to do any useful work, they need to be connected in some way to their environment. Sometimes this means database access; other times it means directly reading values from sensors and sending commands to embedded hardware. Before you begin to code your rules, you try to develop a picture of what your system will need to realize these connections. Depending on your development environment, your rules may already have a built-in ability to connect to all the data sources and sinks they’ll need to reach, directly from the rule language. In other situations, you may need to write interface code in another language. If you do, I hope you’ll use test-first programming to develop it. We’ll look at interface building many times throughout this book.
Once the data structures are defined, the interfaces are specified, and the tests are in place, it’s time to begin writing the rules. As in all programming, this process involves a significant amount of art; there are always multiple ways to accomplish a task. The good news is that because each rule can be independent of the others, rule-based programs can be developed iteratively: code a little, test a little, and then code some more. The bad news is that it’s relatively easy to write unstructured rule-based programs, which can become hard to understand.
You can give structure to your rule-based programs by thinking in terms of phases or modules, groups of rules that are the only ones relevant at specific phases of the execution of your system. Most rule development languages offer explicit support for this kind of modularity, and it’s a good idea to use it whenever possible. The driving robot’s rules might be divided into separate modules devoted to starting the car, parking the car, city driving, highway driving, passing other cars, and so on. By breaking rules into small groups, you can make a rule-based program easier to write and to understand. We’ll first study writing rules for a real application (an information kiosk) in chapter 11.
Once you’ve developed some rules, you’ll often find that you don’t have all the information you need to write more. When this happens, you’ll need to go back to the source and do some more knowledge engineering. The development of a rule-based system lends itself well to this sort of iterative procedure. You can show the early incarnations of the system to the human experts, if they exist, and ask them for corroboration of the results. You might have to change your tests, if the experts disagree with what they are testing.
It’s also worthwhile to have another knowledge engineer look over your work at this point. Code reviews are amazingly effective at finding problems with software before a release, and they work for rule-based software as well. Whether you hold formal code reviews or just ask a friend for advice, a second pair of eyes can really help to increase the quality of your work.
Various commercial off-the-shelf products (other than application servers) can be designed to work together with rule engines. Historically, there has been a certain amount of vendor lock-in, because each rule engine has its own programmer’s interface. The Java Rule Engine API (http://www.jcp.org/jsr/detail/94.jsp), defined by the javax.rules package, is a standard enterprise API for accessing rule engines, currently being developed by a consortium of rule engine vendors under the Java Community Process. The javax.rules package will allow the host program to interact generically with multiple rule engines, the same way the Java Database Connectivity (JDBC) API makes it possible to write vendor-neutral database programs. The API will include mechanisms for creating and managing sets of rules; for adding, removing, and modifying objects in working memory; and for initializing, resetting, and running the engine. Soon (perhaps even by the time you read this), most popular rule engines, including Jess, will be accessible using the javax.rules API. In fact, the official reference implementation of the javax.rules API is currently slated to be a driver for Jess.
The javax.rules API will not specify a standard rule language, however. Other groups are working on developing standardized rule languages, although less consensus exists in this area. For the same reason there is no one standard general programming language, it is likely that vendor-specific rule languages will be with us for a long time. Each rule language has its own strengths and weaknesses, and the expressiveness, elegance, and power of a rule language can be a major factor in choosing an engine.
A rule-based system is a computer program that uses rules to reach conclusions from a set of premises. Its historical roots include production systems and expert systems, but nowadays their broad range of applications includes everything from real-time control of embedded systems to enterprise resource planning for multinational corporations.
Rule based systems are not procedural, but declarative programs. They require a different approach to programming in which a runtime system is used to make scheduling and control-flow decisions. Modern rule-based systems often include hybrid procedural/declarative languages, broadening their applicability.
A wide range of commercial rule development and deployment environments is available, but all have an essential architecture in common. Efforts are underway to standardize rule engine APIs and rule programming languages.
These first two chapters provided an introduction to the fundamental concepts of rule-based programs. This is really a practitioner’s book, however, so we want to begin writing new rule-based programs as soon as possible. You will learn the Jess programming language in the next part of this book. You’ll start by learning about the Jess software itself in chapter 3 and proceed from there.