CHAPTER 13: Those Pesky Usability Tests – User Interface Design for Programmers


Many software companies have usability testing labs. Here's the theory behind a usability lab. (To those of you who have done tests in a usability lab before, I must ask that you please try to refrain from snickering and outright guffaws until I get to the end of the theory, please. I'll get to the reality soon enough.)

A Story of Silicon Jungle

One fine day, Eeny the Elephant is lumbering through the jungle when he hits upon a great idea. "Everybody tried B2C and that didn't work," thinks Eeny. "Then they tried B2B, and all those companies are in the dumpster, too! The answer is obvious: B2A!"

Eeny quickly raises $1.5 million in seed financing from a friendly group of largish birds who promise a mezzanine round of "five at twenty with preferred warrants" and starts his company. After hiring a couple dozen executives with experience at places like failed dehydrated-hamburger chains and mustache-waxing drive-ins, he finally gets around to hiring a Chief Technology Orangutan (CTO), who, at least, is smart enough to realize that "That Really Cool but Top Secret B2A Company" (as it's now known) is going to need some UI designers, software architects, Directors of Technical Research, usability testing engineers, and, "oh, maybe one or two actual programmers? If we have any stock options left, that is."

So, the architects and UI designers get together and design the software. They make nice storyboards, detailed functional and technical specifications, and a schedule with elegant Gantt and PERT charts that fills an entire wall. Several million dollars later, the programmers have actually built the thing, and it looks like it's actually working. It's on time and under budget, too! In theory. But this story is the theory, not the reality, remember?

"That Really Cool, etc." (as it's now known) has hired usability engineers who have spent the last six months building a state-of-the-art usability testing lab. The lab has two rooms: one for observers, with a one-way mirror that allows them to spy on the other room where the "participants" sit. (The usability testers have been warned not to call users "users," because users don't like being called users. It makes them feel like drug addicts.) The participants sit down at a computer with several video cameras recording their every move while they attempt to use The Product.

For a few weeks, a steady stream of mysterious visitors representing all walks of life and most of the important phyla come to the "That Really, etc." campus to take part in the usability tests. As the "participants" try out the software, the usability testers take detailed notes on their official-looking clipboards. The test lasts a couple of weeks while the programmers take a well-deserved, all-expenses-paid rest in Bali to "recharge the ol' batteries" and maybe get tans for the first time in their young, dorky lives.

After about three weeks of this, the Chief Tester of Usability (CTU) emerges from the lab. A hush comes over the company cafeteria (free gourmet lunches). All eyes are on the CTU, who announces, "The results of the usability test will be announced next Tuesday." Then she retreats back into her lab. There is an excited hubbub. What will the results be? Eeny can hardly wait to find out.

The next Tuesday, the entire staff of "That, etc." (as it's now known) have gathered in the company cafeteria to hear the all-important usability results. The programmers are back from the beach, freshly scrubbed and sunburned, wearing their cleanest Star Trek-convention T-shirts. Management arrives dressed identically in Gap pleated khakis. The marketing team hasn't been hired yet. (Don't argue with me, it's my story, and in my story we don't hire marketing until we have a product.)

The tension is palpable. When the CTU comes into the room, the excitement is incredible. After a tense moment fumbling with PowerPoint and trying to get the LCD projector to work (surprise! it doesn't work the first time), the results of the usability test are finally presented.

"We have discovered," says the CTU, "that 73% of the participants were able to accomplish the main tasks of the product." A cheer goes up. Sounds pretty good! "However, we've discovered that 23.3% of the users had difficulty or were completely unable to check their spelling and make much-needed corrections. The usability team recommends improving the usability of the spell checker." There are a few other problems, too, and the designers and programmers take detailed notes in their identical black notebooks.

The Chief Code Compiling and Programming Officer (C3PO) stands up. "Well, looks like we've got our work cut out for us, boys and girls!" The programming team, looking earnest and serene, files out of the cafeteria to get back to their dual Pentium workstations and fix those usability problems!

Well!

A Bitter Descent into Reality

"In theory there is no difference between theory and practice. In practice there is," as Yogi Berra probably didn't say. Unless you've been working at a big software company, you may have never seen an actual usability lab. The reality of usability testing is really quite different from the theory.

You Don't Need to Test with a Lot of Users

In Chemistry Lab back in high school, the more times you repeated your experiment, the more precise the results were. So, your intuition would probably tell you that the more people you bring in for usability tests, the better.

As it turns out, with a usability test, you don't really care about statistics. The purpose of a usability test is simply to find the flaws in your design. Interestingly, in real life, if you have major usability problems, it only takes about five or six people before you find them. Usability testers have long since discovered that the marginal number of usability problems that you find drops off sharply after the sixth tester and is virtually zero by the twelfth user. This is not science here; it's digging for truffles. Take about 3 or 4 pigs out to the forest, let them sniff around and you'll find most of the truffles. Bringing out 1024 pigs is not going to find any more truffles.
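
The drop-off described above is often modeled in the usability literature with a simple formula: if each tester independently has probability p of stumbling on any given problem, then n testers are expected to uncover a fraction 1 - (1 - p)^n of the problems. The sketch below is only an illustration of that shape, not a measurement. The value p = 0.31 is an assumption (a commonly cited average from Nielsen and Landauer's work), and the function name is made up for this example; your own product may behave quite differently.

```python
def fraction_found(testers: int, p: float = 0.31) -> float:
    """Expected fraction of usability problems seen by at least one tester.

    Assumes every tester independently hits any given problem with
    probability p. The default p = 0.31 is an assumed, commonly cited average.
    """
    return 1.0 - (1.0 - p) ** testers


previous = 0.0
for n in range(1, 13):
    found = fraction_found(n)
    # "gain" is the extra share of problems contributed by the n-th tester
    print(f"{n:2d} testers: {found:6.1%} found, gain {found - previous:5.1%}")
    previous = found
```

Under these assumptions, each additional tester adds less than the one before, which is the truffle-hunting intuition in numerical form.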

You Usually Can't Test the Real Thing

It's a common sport among usability pundits to make fun of software teams that don't leave enough time in the schedule to do usability tests, change things in response, and retest. "Build one to throw away!" say the pundits.

Pundits, however, don't work in the real world. In the real world, software development costs money, and schedules are based on real world problems (like trying to be first to market, or trying to complete a fixed-budget project on time before it becomes a money-losing proposition). Nobody has time to throw one away, OK? When the product is done, we have to ship it ASAP. I've never seen a project where it is realistic to do a usability test on the final product and then open up the code again to fix problems.

Given the constraints of reality, it seems like you have three choices:

  1. You can test the code long before it's complete. It may crash too often, and it's unlikely to reflect even the current understanding of what the final product is going to look like, so the quality of the usability results may be limited.
  2. You can test a prototype. But then you have to build the prototype, which is almost never easier than building the final product. (Sometimes you can build a prototype faster using a rapid development tool like Visual Basic, while your final product is in C++. Let me clue you in: if you can build working prototypes faster than you can build the real code, you're using the wrong tools.)
  3. You can test the code when it's done, then ignore the results of the test because you have to rush the code out to market.

None of these approaches is very satisfactory. I think that the best times to do usability tests are as follows:

  1. Do hallway usability tests, also known as "fifty-cent usability tests," when you first design a new feature. The basic idea is that you just show a simple drawing or screen shot of your proposed design to a few innocent bystanders (secretaries and accountants in your company make good victims), and ask them how they would use it.
  2. Do full-blown usability tests after you ship a version of your product. This will help you find a whole slew of usability problems to fix for the next version.

The Joel Uncertainty Principle

The Joel Uncertainty Principle holds that:


You can never accurately measure the
usability of a software product.


When you drag people into a usability lab to watch their behavior, the very act of watching their behavior makes them behave differently. For example, they tend to read instructions much more carefully than they would in real life. And they have performance anxiety. And the computer you're testing them on has a mouse when they're used to a trackball. And they forgot their reading glasses at home. And when you ask them to type in a credit card number, they're reading a fake credit card number off a sheet you gave them, not off a real credit card. And so on and so forth.

Many usability testers have tried to ameliorate this by testing users "in their natural setting," in other words, by following them home with a zoom-lens spy camera and hiding behind a bushy bougainvillea. (Actually, it's more common just to sit behind them at their desk at work and ask them to "go about their usual activities.")

Usability Tests Are Too Rigged

In most usability tests, you prepare a list of instructions for the user. For example, if you were usability testing an Internet access provider, you might have an instruction to "sign up for the service." (I have actually done this very usability test several times in my career.)

So far, so good. The first user comes in, sits down, starts signing up for the service, and gets to the screen asking them how they want to pay. The user looks at you helplessly. "Do I gotta pay for this myself?"

"Oh wait," you interrupt. "Here, use this fake credit card number."

The sign-up procedure then asks if they would like to use a regular modem, a cable modem, or a DSL line.

"What do I put here?" asks the user. Possibly because they don't know the answer, but possibly because they know the answer for their computer, only they're not using their computer, they're using yours, which they've never seen before, in a usability lab, where they've never been before. So you have no way of knowing whether your UI is good enough for this question). At Juno, we knew that the dialog in Figure 13-1 was likely to be the source of a lot of confusion. People certainly had a lot of trouble with it in the lab, but we weren't quite sure if that was because they didn't understand the dialog or if they just didn't know how the lab computer was set up. We even tried telling them "pretendyou're at home," but that just confused them more.

FIGURE 13-1


The dialog that we couldn't figure out how to usability test.

Five minutes later, the program asks for the user's address, and then it crashes when they put in their zip code because of a bug in the early version of the code that you're testing. You tell the next person who comes in, "when it asks for your zip code, don't type anything in."

"OK, sure boss!" But they forget and type the zip code anyway, because they're so used to filling out address forms onscreen from all the crap they've bought on the Web.

The next time you do the usability test, you're determined to prevent these problems. So you give the user a nice, step-by-step, detailed list of instructions, which you have carefully tested so they will work with the latest development build of the software. Aha! Now, suddenly you're not doing a usability test. You're doing something else. Charades. Theatre of the Macabre. I don't know what it is, but it's not a usability test because you're just telling people exactly what to do and then watching them do it.

One solution to this problem has been to ask people to bring in their own work to do. With some products (maybe word processors), that's possible, although it's hard to imagine how you could get someone to test your exciting new mailing list feature if they don't need a mailing list. But with many products there are too many reasons why you can't get a realistic usability test going "in the lab."

Usability Tests Are Often Done to Resolve an Argument

More than half of the usability tests I've been involved in over my career have been the result of an argument between two people about the "best" way to do something. Even if the original intent of the usability test was innocent enough, whenever two designers (or a designer and programmer, or a programmer and a pointy-haired manager) get into a fight about whether the OK button should be on the left or the right of the Cancel button, this dispute is inevitably resolved by saying, "we'll usability test it!"

Sometimes this works. Sometimes it doesn't. It's pretty easy to rig a usability test to show the righteousness of one side or the other. When I was working on the Microsoft Excel team, and I needed to convince the Visual Basic team that object-oriented programming was "just as easy" as procedural programming, I basically set up a usability test in which some participants were asked to write cell.move and other participants were asked to write move(cell). Since the audience for the usability test was programmers anyway, the success rates of the non-object-oriented group and the object-oriented group were (surprise, surprise) indistinguishable. It's great what you can prove when you get to write the test yourself.
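
For readers who have not seen the two styles side by side, here is a rough sketch of the kind of difference the participants were asked to type. It is written in Python rather than Visual Basic, and the `Cell` class, its fields, and both `move` definitions are hypothetical stand-ins invented for illustration, not the syntax used in the actual test.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical spreadsheet cell identified by row and column."""
    row: int
    col: int

    def move(self, d_row: int, d_col: int) -> None:
        # Object-oriented style: the cell is the receiver of the call.
        self.row += d_row
        self.col += d_col


def move(cell: Cell, d_row: int, d_col: int) -> None:
    # Procedural style: the cell is just another argument.
    cell.row += d_row
    cell.col += d_col


a = Cell(1, 1)
b = Cell(1, 1)
a.move(2, 0)    # object-oriented: cell.move
move(b, 2, 0)   # procedural: move(cell)
print(a == b)   # both calls leave the cells in the same state
```

To anyone who already programs, the two spellings are close enough that a test built around them was unlikely to show much of a difference.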

In any case, even if a usability test resolves a dispute, it doesn't do it in any kind of a statistically valid way. Unless you test thousands of people from all walks of life under all kinds of conditions, something that not even Microsoft can afford to do, you are not actually getting statistically meaningful results. Remember, the real strength of usability tests is in finding truffles: finding the broken bits so you can fix them. Actually looking at the results as if they were statistics is just not justified.

Some Usability Test Results I Might Believe:
  • Almost nobody ever tried right-clicking, so virtually nobody found the new spell-checking feature.
  • 100% of the users were able to install a printer the new way; only 25% could install the printer the old way.
  • There were no problems creating a birthday card.
  • Several participants described the animated paper clip as "unhelpful" and "getting in the way."
  • Many people seemed to think that you had to press "Enter" at the end of every line.
  • Most participants had difficulty entering an IP address into the TCP/IP control panel because the automatic tabbing from field to field was unexpected.
Some Usability Test Results I Would Not Believe:
  • When we used brighter colors, 5% more participants were able to complete the tasks. (Statistically insignificant with such a small sample, I'm afraid).
  • Most participants said that they liked the program and would use it themselves if they operated their own steel forge. (Everybody says that in a usability test. They're just being nice, and they want to be invited back to your next usability test.)
  • Most participants read the instructions carefully and were able to assemble the model airplane from Balsa wood right the first time. (They're only reading the instructions because you told them to.)
  • 65% of the people took more than four and a half minutes to complete the task. (Huh? It's those precise numbers again. They make me think that the tester doesn't get the point of usability tests. Truffles! We're looking for truffles!)
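
To see why a "5% more participants" result deserves suspicion, it helps to run the numbers. The sketch below is one possible check, under loudly stated assumptions: the participant counts are invented, the 70% versus 75% completion rates are hypothetical, and the two-sided Fisher exact test is simply one reasonable way to ask "could this gap be chance?", not something the usability testers in these examples used.

```python
from math import comb


def fisher_exact_two_sided(success_a, fail_a, success_b, fail_b):
    """Two-sided Fisher exact test for a 2x2 table of pass/fail counts.

    Returns the probability, if the two designs were really equally usable,
    of seeing a split at least as improbable as the observed one.
    """
    group_a = success_a + fail_a
    group_b = success_b + fail_b
    successes = success_a + success_b
    total = group_a + group_b
    denominator = comb(total, successes)

    def prob(x):
        # Hypergeometric probability that group A holds x of the successes.
        return comb(group_a, x) * comb(group_b, successes - x) / denominator

    lowest = max(0, successes - group_b)
    highest = min(group_a, successes)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    observed = prob(success_a)
    # Sum every outcome that is no more likely than the one observed.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))


# Hypothetical data: 70% vs 75% task completion at three sample sizes.
for per_group in (20, 200, 2000):
    old = per_group * 70 // 100
    new = per_group * 75 // 100
    p_value = fisher_exact_two_sided(old, per_group - old, new, per_group - new)
    print(f"{per_group:5d} per group: {old} vs {new} completed, p = {p_value:.4f}")
```

With 20 people per group, the "improvement" is a single extra participant, and the test cannot tell it apart from luck. In this made-up scenario the same five-point gap only starts to look real when each group runs into the thousands, which is exactly the scale nobody can afford.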

Usability Tests Create Urban Legends

My last employer's software was a bit unusual for a Windows program. In addition to the usual File Exit menu item that has been totally standard on all GUI programs since about 1984, this program had an Exit menu at the top level menu bar, visible at all times (see Figure 13-2). When you consider that closing windows is probably the only thing in Microsoft Windows that nobody has trouble with, I was a bit surprised that this was there. Somehow, every other Windows program on the planet manages without a top-level Exit menu.

FIGURE 13-2


Huh? What's that doing there?

Well, Exit menus don't just spontaneously appear. I asked around. It turned out that when the product was first designed, they had actually done some kind of marketing "focus groups" on the product, and for some reason, the one thing that everybody remembered from the focus group was that there were people who didn't know how to exit a Windows program. Thus, the famous Exit menu. But the urban legend about this focus group lasted far longer than it should have. For years after that, nobody had the guts to take out the Exit menu.

Most software organizations do usability tests pretty rarely, and, worse, they don't retest the improvements they made in response to the test. One of the risks of this is that some of the problems observed in the test will grow into urban legends repeated through generations of software designers and achieve a stature that is completely disproportional to their importance. If you're a giant corporation with software used by millions of people and you usability test it every few months, you won't have this problem. In fact, if you even bother to retest with the changes you made, you won't have this problem (although nobody ever manages to find time to do this before their product has to ship). Microsoft tested so many doggone versions of the Start button in Windows 95 that it's not even funny, and people would still come into usability labs not realizing that they were supposed to click on it to start things. Finally, the frustrated designers had to insert a big balloon, which basically said, "Click Me, You Moron!" (see Figure 13-3). The balloon doesn't make Windows any more usable, but it does increase the success rate in the usability test.

FIGURE 13-3


If you obsess about getting a 100% success rate on your usability test, you can probably force it, but it hardly seems worth the effort. (Somebody who doesn't even know to click the button isn't likely to understand what's going on when they do.)

A Usability Test Measures Learnability, Not Usability

It takes several weeks to learn how to drive a car. For the first few hours behind the wheel, the average American teenager will swerve around like crazy. They will pitch, weave, lurch, and sway. If the car has a stick shift, they will stall the engine in the middle of busy intersections in a truly terrifying fashion.

If you did a usability test of cars, you would be forced to conclude that they are simply unusable.

This is a crucial distinction. When you sit somebody down in a typical usability test, you're really testing how learnable your interface is, not how usable it is. Learnability is important, but it's not everything. Learnable user interfaces may be extremely cumbersome to experienced users. If you make people walk through a fifteen-step wizard to print, people will be pleased the first time, less pleased the second time, and downright ornery by the fifth time they go through your rigmarole.

Sometimes all you care about is learnability: for example, if you expect to have only occasional users. An information kiosk at a tourist attraction is a good example; almost everybody who uses your interface will use it exactly once, so learnability is much more important than usability. But if you're creating a word processor for professional writers, well, now usability is more important.

And that's why, when you press the brakes on your car, you don't get a little dialog popping up that says, "Stop now? (Yes/No)."

One of the Best Reasons to Have a Usability Test

I'm a programmer. You may think I'm some kind of (sneer) computer book writer or usability "guru," but I'm not. I spend most of my time at work actually writing lines of code. Like most programmers, when I encounter a new program, I'm happy to install it and try it out. I download tons of programs all the time; I try out every menu item and I poke around every nook and cranny, basically playing. If I see a button with a word I don't understand, I punch it. Exploring is how you learn!

A very significant portion of your users are scared of the darn computer. It ate their term paper. It may eat them if they press the wrong button. And although I've always known this intellectually, I've never really felt this fear of the computer.

Until last week. You see, last week I set up the payroll for my new company. I have four people to pay, and the payroll company has set me up with a Web-based interface in which I enter payroll information. This interface has a suctionlike device directly hooked up to vacuum money straight out of my bank account.

Yow.

Now this Web site is scary. There are all kinds of weird buttons that say things like "MISC (99) DEDUCTION." The funny thing is, I even know what a MISC (99) DEDUCTION is—because I called up to ask them—but I have no idea whether the deduction should be in dollars, hours, negative dollars, or what, and the UI doesn't tell me, and it's not in the help file anywhere. (Well, the help file does say "Enter any MISC (99) deductions in the MISC (99) DEDUCTION box," in the grand tradition of help files written by people who don't know any more about the product than what they can figure out by looking at it.)

If this were just a word processor or a bitmap editor, I'd just try it and see what happens. The trouble is, this is a vacuum-cleaner-like device programmed to suck money directly out of my bank account. And due to the extreme incompetence of the engineers who built the site, there is no way to find out what's going on until it's too late: the money has been sucked out of my bank account and direct-deposited into my employees' accounts and I don't even find out what happened until the next day. If I type 1000, thinking it means dollars, and it really meant hours, then I'll get $65,000 sucked out of my account instead of $1000.

So, now I know what it feels like to be one of those ordinary mortals who will not do something until they understand it fully.

Programmers, on the whole, are born without a lot of sympathy for how much trouble ordinary people have using computers. That's just the way of the world. Programmers can keep nineteen things in their short-term memory at once; normal people can keep about five. Programmers are exceedingly rational and logical, to the point of exasperation; normal people are emotional and say things like "my computer hates me." Programmers know how hierarchical file systems work and think they are a neat metaphor; many normal people don't understand how you could have a folder inside a folder. They just don't.

One of the best reasons, if not the only good reason, to have a usability test is that it's a great way to educate programmers about the real world. In fact, the more you can get people from your engineering team involved in the usability tests, the better the results. Even if you throw away the "formal" results of the test. And that's because one of the greatest benefits of a usability test is to hammer some reality into your engineers' noggins about the real-world humans who use their product. If you do usability tests, you should require every member of the programming team (including designers and testers) to participate in some way and observe at least some of the participants. Usually this is pretty amusing to watch. The programmers have to sit on their hands behind one-way glass as the user completely fails to figure out the interface they just coded. "Right there, you moron!" the programmer shouts. "The damn CLEAR button, right under your ugly pug nose!" Luckily, the room is soundproof. And the programmer, chastened, has no choice but to come to grips with reality and make the interface even easier.

Needless to say, if you outsource your usability test to one of those friendly companies that does all the work for you and returns a nice, glossy report in a three-ring binder, you're wasting your money. It's like hiring someone to go to college for you. If you're thinking of doing this, I suggest that you take the money you would have spent on the usability test and mail it directly to me. I accept Visa, MasterCard, and American Express. For $100,000, I'll even send you a three-ring binder that says, "get rid of the Exit item on the main menu bar."