Chapter 2 Writing Good Questions – An Introduction to Survey Research, Volume II, 2nd Edition

CHAPTER 2

Writing Good Questions

More than four decades ago, Warwick and Lininger indicated that

Survey research is marked by an unevenness of development in its various subfields. On the one hand, the science of survey sampling is so advanced that discussion of error often deals with fractions of percentage points. By contrast, the principles of questionnaire design and interviewing are much less precise. Experiments suggest that the potential of error involved in sensitive or vague opinion questions may be twenty or thirty rather than two or three percentage points.1

While both sampling methodology and survey question development have advanced significantly since Warwick and Lininger made that observation, the development of question methodology continues to be crucial because of its relevance to measurement error (which is an important component of the total survey error). Continuing research across different disciplines is expanding our horizons to address subtle problems in survey question construction. For example, a study by Fisher2 examined the effect of survey question wording in a sensitive topic area involving estimates of completed and attempted rape and verbal threats of rape. The results of the study show significant differences between the two sets of rape estimates from two national surveys: the “National Violence against College Women” study and the “National College Women Sexual Victimization” study, with the latter study’s estimates ranging from 4.4 to 10.4 percent lower than those of the former. While Fisher attributes the difference between the two surveys to four interrelated reasons, “the use of behaviorally specific questions cannot be overemphasized, not necessarily because they produce larger estimates of rape but because they use words and phrases that describe to the respondent exactly what behavior is being measured.”3 Fisher’s point, essentially, is that surveys that use more specific language describing victims’ experiences in behavioral terms, such as “he put his penis into my vagina,” produce more accurate responses and thus improve the overall quality of the survey.

In a different discipline, potential response bias resulting from racial or ethnic cultural experience was found in research on health behavior by Warnecke et al.4 Among the researchers’ findings was evidence in support of differences in question interpretation related to respondent ethnicity. They suggest that providing cues in the question that help respondents better understand what is needed may address these problems.

Finally, in a third discipline, an economic study examining household surveys asking individuals about their economic circumstances, financial decisions, and expectations for the future conducted by Bruine de Bruin et al.5 found that even slight changes in question wording can affect how respondents interpret a question and generate their answer. Specifically, the authors concluded that questions about “prices in general” and “prices you pay” focused respondents more on personal price experiences than did questions about “inflation.” They hypothesized that thoughts about personal price experiences tend to be biased toward extremes, such as large changes in gas prices, leading respondents to overestimate overall inflation. Essentially, what would be considered irrelevant changes in question wording affected responses to survey questions.

These three examples from different disciplines serve as important illustrations of the sensitivities continuing to be explored in relation to the structure and format of survey questions. The goal of this chapter is to acquaint you with some structural issues in question design and to provide general guidelines on writing good survey questions. Before we plunge into the topic, however, some context on survey questions will help clarify some of the points we make later.

We begin by highlighting the distinction between the survey mode, the survey instrument, and the survey questions. The mode, discussed in Chapter 5 (Volume I), is the method of delivery for the survey. Survey instruments, often referred to as questionnaires, can range from a traditional paper-and-pencil mail-out form to a self-administered electronic online survey with embedded audio and video to the interview screen seen by computer-assisted telephone interviewing (CATI) interviewers when they call interview participants and record the responses into a computerized database. Essentially, the survey instrument is the platform for the questions, while the questions are the expressions (a word, phrase, sentence, or even image) used to solicit information from a respondent. When talking about survey elements, the mode, the survey instrument, and the survey questions are interwoven and frequently discussed simultaneously (see, for example, Snijkers et al.6). However, in our discussion here, we have decided to focus on the questions rather than the questionnaire for two reasons. First, as Dillman notes in his discussion of the evolution from his Total Design Method (introduced in 1978)7 to the more recent developments in Tailored Design Method,8 advances in survey methodology, changes in culture, rapidly changing technology, and greater emphasis on online surveys have created a need to move away from a one-size-fits-all approach. This more formulaic approach has been replaced by a more customized one that can be adapted to the situation and participants. As Dillman et al. point out:

Rapid technological development in the past 15 years has changed this situation substantially so that there are now many means for contacting people and asking them to complete surveys. Web and cellular telephone communication have undergone rapid maturation as means of responding to surveys. In addition, voice recognition, prerecorded phone surveys that ask for numerical and/or voice recorded responses, fillable PDFs, smartphones, tablets, and other devices have increasingly been used for data collection. Yet, for many reasons traditional phone, mail, and in-person contacts have not disappeared, and are often being used in combination to maximize the potential of reaching people. In addition, offering multiple ways of responding (e.g., web and mail in the same survey) is common. It is no longer practical to talk about a dominant mode of surveying, as in-person interviews were described in the middle of the 20th century and telephone was referred to from about 1980 to the late 1990s.9

The advantages of such mixed-mode surveys include a means to reduce total survey error within resource and time limitations.10 For example, by using mixed-mode surveys, researchers can increase response rates. A common way to mix modes to reduce costs is to collect as many responses as possible in a cheaper mode before switching to a more expensive mode to try to obtain additional responses. This strategy was used by the U.S. Census Bureau for the 2010 Decennial Census. Paper questionnaires were first mailed to nearly every address in the United States, and about 74 percent of them responded (U.S. Census Bureau n.d.). Only then was the more expensive method used: sending interviewers to try to obtain responses from households that did not respond by mail. The Census Bureau was able to avoid considerable expense by getting most households to respond by mail and minimizing the number that would need to be visited by in-person interviewers.11

While the survey field now embraces the idea of mixed-mode surveys (as discussed in Chapter 5, Volume I), the ability to garner information across different modes requires even greater attention to creating questions that can be formatted to fit into different modes. Further, while the modes of survey delivery continue to change, with a greater emphasis on self-administration and rapid electronic delivery and response, the fundamentals of good questions remain the same. In summary, it is essential to understand the most basic element in survey design and administration—the question—as we try to ensure compatibility across a range of delivery modes and formats if we are to produce valid and reliable surveys. Even a well-designed and formatted questionnaire cannot undo the damage to a survey caused by questions that are poorly conceived, badly constructed, or offensive to the respondent.

Begin at the Beginning

One of the themes you will see throughout this book is the importance of having a clear idea of what information you need to get to answer your research question(s). Before you ever begin to tackle the development of your questions for the survey, you should have these research questions and objectives written down, with a clear understanding and agreement between the sponsor and the researchers as to what they are. At the end of the survey project, the test of whether a survey was successful will be whether those original research questions were answered.

Dillman notes three goals for writing good questions for self-administered surveys: every potential respondent should (1) interpret the question the same way, (2) be able to respond accurately, and (3) be willing to answer.12 These goals help frame the actual design of questions. We then have to organize those questions into a questionnaire that can be administered to respondents by an interviewer or that respondents can process on their own. We also have to find sample members and convince them to complete and return the questionnaire.

Validity and Reliability in Survey Questions

The first two question qualities that Dillman’s goals highlight center on two important concepts that must be addressed in our questions: reliability and validity. Reliability and validity are two of the most important concepts in research, generally, and much of the effort we put into survey research is directed toward maximizing both to the greatest extent possible. In surveys, reliability refers to the consistency in responses across different respondents in the same situations.i Essentially, we should see consistency of the measurement, either across similar respondents or across different administrations of the survey. In a questionnaire, this means that the same question elicits the same type of response across similar respondents. To illustrate, the question “In what city do you currently live?” is an extremely reliable question. If we asked 100 people this question, we would expect to see a very high percentage responding to city of residence in a similar fashion, by naming the city in which they are currently living. Likewise, if we were to ask similar groups this same question in three successive decades, we would again expect the kind of responses we would get to be very parallel across those years. By contrast, if we asked 100 people, “How much money in dollars does it take to be happy?” we would find a great deal of inconsistency in their responses. Further, if we posed the same question to groups across three decades, we would likely find a great deal of variation in their responses. One of the major differences between the two questions is the degree to which perceptions of key concepts are shared among participants. In the first question, the concept of city of residence has a generally shared definition. By contrast, in the second question, the definition of the concept of happiness is vague and not as widely shared. Many people, for example, would probably have great difficulty in putting a monetary value on what they view as essential to being happy.

While perceptions of the question by survey participants affect reliability on written survey instruments, when the questions are presented in an interview format, we must also deal with the differences between interviewers. Reliability can be impacted if the question is asked differently by different interviewers or if a single interviewer varies the way the question is presented to different participants. Once the question has been asked, reliability can also be impacted by how the response is recorded by the interviewer. Interviewer issues, including the importance of interviewer training, are discussed later in Chapter 3 (Volume II).

Validity,ii on the other hand, refers to the extent to which the measure we are using accurately reflects the concept we are interested in, or, as Maxfield and Babbie note, “Put another way, are you really measuring what you say you are measuring?”13 Let’s revisit our first question: “In what city do you currently live?” If we asked a group of high school students that question, we would likely get accurate responses. On the other hand, suppose we were trying to assess the intelligence of high school students and asked them what their grade point average was. If we concluded that the percentage of students with an A average was the percentage of very intelligent students, the percentage with a B average was the percentage of students with above-average intelligence, the percentage with a C average was the percentage of students with average intelligence, and so forth, that conclusion would be invalid because grade point average isn’t an accurate measure of intelligence.

There is an interplay between reliability and validity in survey research, and when creating survey questions, we must pay attention to both. One of the common analogies used to help understand the relationship between reliability and validity is the target shown in Figure 2.1. The objective in our questions is to hit the bull’s-eye on the target.

Figure 2.1 Relationship of reliability and validity in question design
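The target analogy can also be expressed numerically. The following Python sketch is illustrative only and rests on assumptions that are not in the original text: it treats validity as how far the average answer lands from a hypothetical true value (the bull's-eye) and reliability as how tightly the answers cluster. The names `TRUE_VALUE`, `simulate_measure`, and `describe`, along with the bias and noise settings, are invented for the illustration.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical bull's-eye: the value the question is meant to capture

def simulate_measure(bias, noise, n=1000, seed=0):
    """Simulate n answers that miss the true value by a fixed bias plus random noise."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [TRUE_VALUE + bias + rng.gauss(0, noise) for _ in range(n)]

def describe(responses):
    """Return (average offset from the bull's-eye, spread of the answers)."""
    offset = statistics.mean(responses) - TRUE_VALUE
    spread = statistics.stdev(responses)
    return offset, spread

# Three hypothetical questions, one per pattern on the target.
scenarios = {
    "reliable and valid": simulate_measure(bias=0, noise=1),
    "reliable, not valid": simulate_measure(bias=10, noise=1),
    "not reliable": simulate_measure(bias=0, noise=10),
}

for name, responses in scenarios.items():
    offset, spread = describe(responses)
    print(f"{name}: offset {offset:+.1f}, spread {spread:.1f}")
```

A small offset with a small spread corresponds to shots clustered on the bull's-eye. A large offset with a small spread corresponds to a tight cluster that consistently misses it, which is the case of a reliable question that is not valid.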

Willingness to Answer Questions

Being asked to complete a survey, whether it’s conducted in person or online, by mail, or on the phone, probably isn’t on the top of most people’s list of things they most enjoy. The issue surrounding potential participants’ willingness to take part in a survey has become very visible in the past couple of decades because of substantially declining survey response rates. As a result, considerable attention has been paid to the use of incentives as a way to improve participation, as well as to the design factors that intrinsically make participation more likely.14 Research has consistently shown that reducing the burden on respondents and rewarding them for their help have a significant impact on improving responsiveness.15 Participants must see a value to their involvement that outweighs the effort they need to expend by participating. If participants are not motivated to answer each question, if they see no benefit from their effort, if a question is offensive or demeaning, if they don’t understand a question, or if they believe that answering a question will result in harm to them (such as a violation of their privacy), it is likely they simply won’t answer the question.

Key Elements of Good Questions

So, what are the essential elements of good questions? A reading of the literature from general textbooks to highly specialized journal articles will provide a vast, almost alarming assortment of recommendations, cautions, and directions—as Dillman puts it, “a mind-boggling array of generally good, but often confusing and conflicting directions about how to do it.”16 To avoid falling into this trap ourselves, we will use a slightly modified version of attributes suggested by Alreck and Settle to attempt to distill these recommendations down into three major areas: specificity, clarity, and brevity.17

Specificity, Clarity, and Brevity

By question specificity, we are referring to the notion that the question addresses the content of the information sought as precisely as possible. Does the information targeted by the question match the target of the needed information? If a question does not, the results it produces will have low validity in terms of addressing the research objective. A rough analogy might be using a pair of binoculars to look at a distant object. If you close first your left eye, then your right, each eye will see the object independently. However, if you open both eyes and instead of seeing one image with both, you see two images, then you know some adjustment to the binoculars is needed. Similarly, there should be a high level of congruence between the research objectives and the question(s) asked. If there is not, some tweaking of the question(s) will be needed. For the question to accurately address the research question(s), it must also be relevant to the survey respondent. If you ask survey respondents about a topic with which they are unfamiliar, the question may have high congruity between its topic and the information needed but would do a poor job of getting that information from respondents.

The second area, question clarity, is one of the biggest problems in survey research, particularly when it’s used with self-administered survey instruments. Lack of clarity has a large impact on both question validity and reliability because the question must be equally understandable to all respondents. The core vocabulary of the survey question should be attuned to the level of understanding of the participants. There is oftentimes a disparity between what the survey sponsors or the researchers know about the question content and the respondents’ level of understanding. Frequently, this happens when technical terms or professional jargon, very familiar to sponsors or researchers but totally unknown to respondents, is used in survey questions. To illustrate, consider the following question that was found on a consumer satisfaction survey sent to one of the authors of this book: “How satisfied are you with your ability to engage the safety lock-out mechanism?” The response categories ranged from very satisfied to not satisfied at all. The problem was the author wasn’t aware there was a safety lock-out mechanism on this product or how it was used! The reverse of this problem can also hamper question clarity. This happens when sponsors or researchers have such a superficial knowledge of the topic that they fail to understand the intent of their question. For example, if a research firm asked small businesses if they supported higher taxes for improved city services, they might find respondents asking, “Which taxes?” “Which services?”

The third area, brevity, has to do with the length of the question. The length and complexity of questions affect the response rate of participants as well as the validity and reliability of the responses. Basically, questions should be stated in as straightforward and uncomplicated a manner as possible, using simple words rather than specialized ones and as few words as possible to pose the question18 (although this last caution may be more applicable to self-administered questionnaires than to interview formats).19 More complex sentence structures should be avoided. For example, compound sentences (two simple sentences joined by a conjunction such as and or or) or compound–complex sentences (those combining two or more independent clauses with at least one dependent clause) should be broken down into two simpler questions.20

Avoiding Common Question Pitfalls

Before moving on to question types, let’s look at some common question pitfalls that apply equally to different question types and formats.

  • Double-barrel questions: These are created when two different topics are specified in the question, essentially asking the respondent two questions in one sentence. This leaves the respondent puzzled as to which part of the question to answer.
    • Example. How would you assess the success of the Chamber of Commerce in creating a favorable business climate and an awareness of the negative impact of overtaxation on businesses?

      Correction: The question should be split into two separate questions:

      1. How would you assess the success of the Chamber of Commerce in creating a favorable business climate?
      2. How would you assess the success of the Chamber of Commerce in creating an awareness of the negative impact of overtaxation on businesses?
  • Loaded or leading questions: These originate when question wording directs a respondent to a particular answer or position. As a result, the responses are biased and create false results. Political push polls, which are sometimes unethically used in political campaigns, illustrate extreme use of loaded questions.21 They create the illusion of asking legitimate questions but really use the question to spread negative information by typically using leading questions (see the second example).
    • Example. Don’t you see some problem in letting your children consume sports drinks?

      Correction: The question should be reworded to a neutral statement.

      1. Is letting your children consume sports drinks a problem?
    • Example. Are you upset by Senator ____________’s wasteful spending of your tax dollars on programs for illegal immigrants?

      Correction: All negative references in the question should be removed.

      1. Should programs for undocumented immigrants be supported with tax dollars?
  • Questions with built-in assumptions: Some questions contain assumptions that must first be considered either true or false in order to answer the second element of the question. These pose a considerable problem as the respondent may feel disqualified from answering the second part of the question, which is the real topic focus.
    • Example. In comparison with your last driving vacation, was your new car more comfortable to ride in?

      Correction: Potential respondents may hesitate to answer this question because of an assumption contained in it: that the individual has taken a driving vacation. This question could be split into two separate questions.

      1. Have you previously taken a driving vacation?
      2. If yes, in comparison with your last driving vacation, was your new car more comfortable to ride in?
  • Double-negative questions: Questions that include two negatives not only confuse the respondent but may also create a level of frustration resulting in nonresponse.
    • Example. Please indicate whether you agree or disagree with the following statement. A financial adviser should not be required to disclose whether the adviser gets any compensation for clients who purchase any of the products that the financial adviser recommends.

      Correction: The not be required in the question adds a layer of unnecessary complexity. The question should be worded as follows: “A financial adviser should be required to disclose whether or not he or she gets any compensation for clients who purchase any of the products that the adviser recommends.”

Question Types and Formats

The formatting of the survey question considers the research objectives, the characteristics of the respondent, the survey instrument and mode of delivery, and the type of analysis that will be needed to synthesize and explain the survey’s results. Broadly speaking, there are two principal types of survey questions: unstructured and structured. Unstructured questions are sometimes called open-ended because they do not restrict the possible answers that the survey respondent may give. The second general question type, structured, is commonly referred to as closed-ended because the survey participant is limited to responses or response categories (pre)identified by the researchers. For example, if we are doing a telephone survey of community needs, we might use either of the following two survey questions.

  1. What do you like best about living in this city?
  2. What do you like best about living in this city?
    1. A good transportation system
    2. A low crime rate
    3. A lot of entertainment and recreational opportunities
    4. A good school system
    5. Good employment opportunities

The first question is open-ended, while the second is a closed-ended question. Let’s briefly look at the characteristics of both types.

Open-Ended Questions

In responding to the first question, which is open-ended, a respondent’s answer could obviously cover many different areas, including some the researchers had not previously considered. In this respect, open-ended questions are particularly well suited to exploring a topic or to gathering information in an area that is not well known. With open-ended questions, the scope of the response is very wide, so they generally work best when the researcher wants to provide, in essence, a blank canvas to the respondent. On the downside, the open-ended question may elicit a response that may be well outside the question’s focus: An answer such as “My best friend lives here” or “I get to live rent-free with my parents” doesn’t really address the community needs issue, which was the intent of the question. Thus, the response to the question would have low validity.

When self-administered surveys use open-ended questions, the ability to interactively engage in follow-up questions or probe to clarify answers or get greater detail is limited. For this reason, considering the aspects of specificity, clarity, and brevity in question design is especially important. Wordy, ambiguous, or complex open-ended question formats not only create difficulty in terms of the respondent’s understanding of the questions but may also present a visual image format that suggests to the respondent that this question will be difficult and time consuming to answer. For this reason, an open-ended question should never exceed one page on a written survey or require respondents to read across different screens in a computerized format. Similarly, if the question is provided in a paper format, the space provided to answer the question should directly follow the question rather than being placed on a separate page or after additional questions. Simple formatting elements such as providing sufficient space to allow the respondent to write or type in a narrative-type response in a paper or online survey are very important.22

When open-ended survey questions are administered to the survey participant in an interview, the response is often recorded verbatim or with extensive notes, which is useful to later pick up on the nuances of the response, such as how the person responding phrases an answer or the strength of a feeling they express in their response. Similarly, with in-person, telephone, or interactive online interviews, the interviewer can use follow-up questions to obtain more specific information or probe for more detail or seek explanation of the open response. In some cases, these probes are anticipated and preprogrammed into the interview questionnaire, but in others they are spontaneously developed by the interviewer on the basis of answers that are not clear or lack detail. Obviously, the experience and qualifications of the interviewers have a major impact on the ability to follow up with conditional probes.

Open-ended questions can be time consuming for both the respondent and the researcher. For the respondent, it requires the individual to not only recall past experiences but also make a judgment as to how best and with how much detail to answer. For the researcher, open-ended questions often yield many different responses, which may require additional coding and can complicate or even prevent the analysis. Also, since responses to open-ended questions are typically in narrative format, a qualitative rather than quantitative analysis of the information must be anticipated. Such analysis, even when aided by computerized qualitative analysis programs,23 typically requires more time and effort. Owing to the increased effort on the part of the respondents and researchers with these types of questions, from a practical perspective, the number of open-ended questions must be held to a reasonably low number on a survey instrument. It also prompts researchers to avoid using open-ended questions when they are working with large samples.
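To make the coding burden concrete, here is a minimal Python sketch of keyword-based coding for answers to the open-ended question "What do you like best about living in this city?" It is only an illustration: the codebook, its keywords, and the first two sample answers are hypothetical, and real coding is normally done or reviewed by trained coders rather than by simple keyword matching.

```python
from collections import Counter

# Hypothetical codebook: category -> keywords that signal it.
# Plain substring matching is crude ("bus" would also match "business").
CODEBOOK = {
    "transportation": ("bus", "transit", "transportation"),
    "safety": ("safe", "crime"),
    "schools": ("school",),
    "employment": ("job", "employment"),
    "recreation": ("park", "entertainment", "recreation"),
}

def code_response(text):
    """Return every codebook category whose keywords appear in the answer."""
    lowered = text.lower()
    codes = [
        category
        for category, keywords in CODEBOOK.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return codes or ["uncoded"]  # off-topic or unanticipated answers

responses = [
    "The schools are great and I feel safe here",   # hypothetical answer
    "Lots of parks and good bus service",           # hypothetical answer
    "My best friend lives here",                    # example from the text
    "I get to live rent-free with my parents",      # example from the text
]

# One answer can receive several codes, so tally codes rather than answers.
tally = Counter(code for answer in responses for code in code_response(answer))
print(tally)
```

The two answers quoted in the text fall outside every category, so they end up as "uncoded". Each such answer must then be read and judged individually, which is part of why open-ended questions become costly with large samples.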

Closed-Ended Questions

The closed-ended question format is more defined and has been standardized to a greater extent than the open-ended style. While both open- and closed-ended questions require researchers to craft questions carefully, closed-ended questions place an additional burden on researchers to carefully consider what responses are needed and are appropriate. As can be seen in the second question, the closed-ended responses are restrictive and therefore must be precisely targeted to research questions. The closed-ended format does allow researchers to impart greater uniformity to the responses and to easily determine the consensus on certain items, but only on those items that were specified by the answers provided. This, in turn, can lead to another problem, as pointed out by Krosnick and Fabrigar:24 because the researcher specifies the possible answers in closed-ended questions, they are more subject to the effect of the researcher than open-ended questions are.

With closed-ended questions, the response choices should be both exhaustive and mutually exclusive. This means that all potential responses are listed within answer choices and that no answer choice is contained within more than one response category. We should point out a distinction here between single- and multiple-response category questions. In single-response questions, only one of the choices can be selected, and therefore the response choices must be exhaustive and mutually exclusive. However, some questions are worded in a way that a respondent may choose more than one answer from the choices provided. In this situation, the response choices are each still unique, but the person responding can select more than one choice (see Example 4).

Consider the following closed-ended question:

Example 1

In which of the following categories does your annual family income fall?

a) Less than $20,000
b) $21,000–$40,000
c) $40,000–$60,000
d) $61,000–$80,000

Can you see a problem with the response set? If you said that the categories are not mutually exclusive and that they do not provide an exhaustive listing of possible family income levels, you are right. If you look carefully, you will notice that a respondent whose family income is between $20,000 and $20,999 has no response category from which to choose. Similarly, if a respondent’s family income level is $105,000 a year, the individual would be in a similar quandary, as again there is no appropriate answer category. A problem also arises if the respondent has an annual family income of $40,000 a year. Which category would the individual choose, (b) or (c)? Fortunately, these two problems are easy to fix. To solve the issue of nonmutually exclusive categories, we would change response (c) to “$41,000 to $60,000.” To make the response set exhaustive, we could change the first response option (a) to “less than $21,000” and add another response option at the end of the current group, (e) “more than $80,000.” If incomes are not reported in round thousands, the upper bounds of (b) and (c) should also be extended to $40,999 and $60,999 so that no income falls between adjacent categories.
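The exhaustive and mutually exclusive checks can also be run mechanically before a questionnaire is fielded. The Python sketch below is illustrative and makes assumptions that are not in the original text: incomes are treated as whole dollars, each category is represented as an inclusive (low, high) range, and the function name `check_categories` is invented for this example.

```python
import math

def check_categories(categories, minimum=0):
    """Return gap and overlap messages for inclusive whole-dollar ranges.

    `categories` maps a label to a (low, high) pair; use math.inf as `high`
    for an open-ended top category such as "more than $80,000".
    """
    problems = []
    next_uncovered = minimum  # lowest income not yet covered by any category
    covering = None           # label of the category that reaches highest so far
    for label, (low, high) in sorted(categories.items(), key=lambda kv: kv[1][0]):
        if low > next_uncovered:
            problems.append(f"gap: {next_uncovered} to {low - 1} has no category")
        elif low < next_uncovered:
            top = min(high, next_uncovered - 1)
            problems.append(f"overlap: {low} to {top} is in both ({covering}) and ({label})")
        if high + 1 > next_uncovered:
            next_uncovered = high + 1
            covering = label
    if next_uncovered != math.inf:
        problems.append(f"gap: {next_uncovered} and above has no category")
    return problems

# Response set from Example 1 ("less than $20,000" becomes 0 to 19,999).
original = {
    "a": (0, 19_999),
    "b": (21_000, 40_000),
    "c": (40_000, 60_000),
    "d": (61_000, 80_000),
}

# A revised set with (c) starting at $41,000, (a) widened, and (e) added.
revised = {
    "a": (0, 20_999),
    "b": (21_000, 40_000),
    "c": (41_000, 60_000),
    "d": (61_000, 80_000),
    "e": (80_001, math.inf),
}

for name, response_set in (("original", original), ("revised", revised)):
    print(name)
    for problem in check_categories(response_set):
        print("  " + problem)
```

Under the whole-dollar assumption, the check reports the missing $20,000 to $20,999 range, the $40,000 overlap, and the missing top category in the original set. It also flags the sub-thousand ranges between adjacent categories (for example, $40,001 to $40,999 in the revised set), which disappear only if the upper bounds are written as $40,999 and $60,999 or if respondents are told to round to the nearest thousand.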

One of the first considerations with closed-ended questions is the level of precision needed in the response categories. Take, for example, the following three questions that are essentially looking for the same type of information.

Example 2

When you go clothes shopping, which of the following colors do you prefer?

(Please check your preference)

______ Bright colors ______ Dark colors

Example 3

When you go clothes shopping, which of the following colors do you prefer?

(Please check your preference)

______ Bright colors _____ Dark colors _____ No preference

Example 4

When you go clothes shopping, which of the following colors do you prefer?

(Please check each of your preferences)

 

____ Yellows ____ Browns ____ Reds ____ Greens
____ Blues ____ Pinks ____ Blacks ____ Whites
____ Oranges ____ Purples ____ Lavenders
____ No preference

 

Each of these questions provides a different level of precision. In the first question, the choices are limited to two broad, distinct color categories, which would be fine if researchers were looking for only general impressions. However, if the question were on a fashion survey, this level of detail wouldn’t be sufficient. The second question also opens up another answer possibility, that is, a response that indicates the individual doesn’t have a color preference. The third question, which could be expanded to any number of response categories, not only provides an indication of specific color choices but also allows the individual to select specific colors from both bright and dark areas. This question could be further enhanced by providing a visual image, such as a color wheel, that would let the respondents mark or check precise colors, thus ensuring greater reliability in the answers provided across different respondents.

Unfortunately, there is always a delicate balance in trying to get to the greatest level of precision in question responses, on the one hand, while not sacrificing the respondent’s ability to answer the question accurately, on the other. With closed-ended questions, the formatting of the response categories can impact important response dimensions, such as the ability to accurately recall past events. More than 40 years ago, Seymour Sudman and Norman Bradburn described the problem of memory error in recall:

There are two kinds of memory error that sometimes operate in opposite directions. The first is forgetting an episode entirely, whether it is a purchase of a product, a trip to the doctor, a law violation, or any other act. The second kind of error is compression of time (telescoping) where the event is remembered as occurring more recently than it did. Thus, a respondent who reports a trip to the doctor during the past seven days when the doctor’s records show that it took place three weeks ago has made a compression-of-time error.25

The second problem in trying to make the response categories too precise occurs when the response categories become impossible to differentiate in the respondent’s mind. It’s a little bit like the average wine drinker trying to distinguish whether the sauvignon blanc wine promoted by the local wine shop actually has a fruit forward taste with plum and cherry notes and a subtle flowery finish. In a survey question directed to office workers, the problem might look like this:

Example 5

If you spend more time responding to business e-mails and text messages this year compared with last year, please choose the category below that best describes the difference in the average amount of time per day you spend responding to business e-mails and text messages this year compared with last year?

 

a) _____ Less than 8 minutes

b) _____ 8–16 minutes more

c) _____ 17–25 minutes more

d) _____ 26–34 minutes more

e) _____ 35–43 minutes more

f) _____ More than 43 minutes

 

As you can imagine, the average worker would likely have great difficulty in trying to estimate time differences with this degree of specificity. Essentially, the researchers are trying to draw too fine a distinction between the categories. This issue is often described in terms of the granularity of the response categories, that is, their level of detail.

There are two basic formats for closed-ended questions: (a) unordered or unscalar and (b) ordered or scalar.26 The first of these, the unordered or unscalar response category, is generally used to obtain information or to select items from simple dichotomous or multiple-choice lists. The data obtained from this type of question are usually categorical, measured at the nominal level, which means there are discrete categories but no value is given to the categories. Here are some examples.

Example 6

Will your company be doing a significant amount of hiring in the next year?

_____ Yes _____ No

Example 7

If your company plans to expand its workforce in the coming year, which of the following best explains that expansion?

____ Rehiring from previous downsizing

____ New markets have created greater product demand

____ Expansion in existing markets has created greater product demand

____ New products coming to market

Such unordered response categories are sometimes referred to as forced-choice categories because the respondent can choose only one answer. Some forced-choice questions are used to help determine choices between areas that, on the surface, appear to have an equal likelihood of being selected.

Example 8

When buying a new home, which of the following do you consider most important?

____ Cost ____ Location ____ Size ____ Age

Unordered response categories may be partially opened by including an “Other” category with a blank line, which allows a respondent to insert a response in addition to those listed.

Example 9

When buying a new home, which of the following do you consider most important?

_____ Cost

_____ Location

_____ Size

_____ Age

_____ Other (please explain)

The ordered or scalar response category, as the name implies, arranges responses in an order by requiring the respondent to select a response that conveys some order of magnitude among the possible choices. These response choices are measured by ranking or rating the response on a scale at the ordinal, interval, or ratio level. With ordinal ranking, the response categories are sorted by relative size, but the actual degree of difference between the items cannot be determined. For example, consider commonly seen scales that ask respondents to indicate whether they strongly agree, agree, neither agree nor disagree, disagree, or strongly disagree. Thus, each identified response category becomes a point along a continuum. One of the first and most commonly used rating scales is the Likert Scale, which was first published by psychologist Rensis Likert in 1932.27 The Likert Scale presents respondents with a series of (attitude) dimensions, which fall along a continuum. For each of the attitude dimensions, respondents are asked whether, and how strongly, they agree or disagree, using one of a number of positions on a five-point scale. Today, Likert and Likert-type scales are commonly used in surveys to measure opinions or attitudes. The following example shows a Likert-scale question and response set.

 

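Question: Please rate the employees of this company for each of the areas listed below

For each item below, please check the answer that best applies, from strongly disagree to strongly agree

1. The employees in this company are hard working

____ Strongly disagree ____ Disagree ____ Neither agree nor disagree ____ Agree ____ Strongly agree

2. The employees in this company have good job skills

____ Strongly disagree ____ Disagree ____ Neither agree nor disagree ____ Agree ____ Strongly agree

3. The employees in this company are dependable

____ Strongly disagree ____ Disagree ____ Neither agree nor disagree ____ Agree ____ Strongly agree

4. The employees in this company are loyal to the company

____ Strongly disagree ____ Disagree ____ Neither agree nor disagree ____ Agree ____ Strongly agree

5. The employees of this company produce high-quality work

____ Strongly disagree ____ Disagree ____ Neither agree nor disagree ____ Agree ____ Strongly agree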
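Responses to a Likert item are usually converted to numeric codes before analysis. The sketch below shows one way this might be done; the 1-to-5 coding, the helper name `summarize_item`, and the seven responses are all made up for illustration. Because the text treats these categories as ordinal, the sketch reports frequencies and the median code rather than a mean, which would presume equal distances between the categories.

```python
from collections import Counter
from statistics import median

# Assumed coding scheme: 1 = strongly disagree ... 5 = strongly agree.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def summarize_item(responses):
    """Return (frequency of each label, median code) for one Likert item."""
    counts = Counter(responses)
    # Report every category, including those nobody selected.
    frequencies = {label: counts.get(label, 0) for label in LIKERT_CODES}
    codes = [LIKERT_CODES[r] for r in responses]
    return frequencies, median(codes)

# Hypothetical answers to item 1 ("hard working") from seven respondents.
item_1 = [
    "Agree", "Strongly agree", "Agree", "Neither agree nor disagree",
    "Disagree", "Agree", "Strongly agree",
]
frequencies, median_code = summarize_item(item_1)
print(frequencies)
print(median_code)  # 4, which corresponds to "Agree"
```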

 

A variant of the Likert Scale is the Semantic Differential Scale, another closed-ended format, which is used to gather data and interpret it on the basis of the connotative meaning of the respondent’s answer. It uses a pair of clearly opposite words that fall at the ends of a continuum and can be either marked or unmarked. Here are some examples of a semantic differential scale.

Marked Semantic Differential Scale

Please answer based on your opinion regarding the product:

 

             Very      Slightly  Neither   Slightly  Very
Inexpensive  [ ]       [ ]       [ ]       [ ]       [ ]       Expensive
Effective    [ ]       [ ]       [ ]       [ ]       [ ]       Ineffective
Useful       [ ]       [ ]       [ ]       [ ]       [ ]       Useless
Reliable     [ ]       [ ]       [ ]       [ ]       [ ]       Unreliable

 

Unmarked Semantic Differential Scale

The central line serves as the neutral point:

 

Inexpensive  ________________________|________________________  Expensive
Effective    ________________________|________________________  Ineffective
Useful       ________________________|________________________  Useless
Reliable     ________________________|________________________  Unreliable

Source: Sincero (2012).28

 

With interval scales, by contrast, the difference between the categories is of equal distance and can be measured, but there is no true zero point. A common example of an interval scale is calendar years. For example, there is a specific distance, 100 years, between 1776 and 1876, yet it makes no sense to say 1776 is 95 percent of the later year. With ratio scales, which do have a true zero point, you can calculate the ratios between the amounts on the scale. For example, salaries measured on a dollar scale can be compared in terms of true magnitude. A person who makes $200,000 a year makes twice as much as someone whose salary is $100,000.
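The distinction can be checked with a little arithmetic. The sketch below uses the figures from the paragraph above; the 1,000-year shift is an arbitrary, hypothetical change of the calendar's starting point, included only to show that a difference between years survives a change of zero point while a ratio does not.

```python
# Interval scale (calendar years): differences are meaningful, ratios are not.
year_a, year_b = 1776, 1876
year_difference = year_b - year_a  # 100 years
year_ratio = year_a / year_b       # about 0.95, but depends on where year 0 sits

# Hypothetical calendar whose year 0 falls 1,000 years later (illustrative only).
offset = 1000
shifted_difference = (year_b - offset) - (year_a - offset)  # unchanged
shifted_ratio = (year_a - offset) / (year_b - offset)       # changes

# Ratio scale (salaries in dollars): zero dollars is a true zero point.
salary_low, salary_high = 100_000, 200_000
salary_ratio = salary_high / salary_low

print(year_difference, shifted_difference)            # 100 100
print(round(year_ratio, 2), round(shifted_ratio, 2))  # 0.95 0.89
print(salary_ratio)                                   # 2.0
```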

In summary, question content, design, and format serve as the fundamental elements in building and executing good surveys. Research has provided some guidance on best practices. For example, the following general design and question order recommendations have emerged from research: (a) order questions from easy to difficult, (b) place general questions before specific questions, (c) do not place sensitive questions at the beginning of the survey, and (d) place demographics at the end of the questionnaire to prevent boredom and to engage the participant early in the survey.29

It is also recognized that survey responses can be affected by how the question and response categories are presented, particularly ordinal scale questions.30 There is considerable debate over many of the facets of response sets and scales. For example, there is little agreement on the optimum number of points on a scale. The only general agreement is that a scale with between 5 and 10 points works well, with many researchers considering 7 the optimal number.31 But there is a range of opinions on this issue and on whether extending the number of points to 10 or more increases the validity of the data.32

Similarly, while there seems to be general agreement on using an odd number of categories, so as to have a defined midpoint in the scale, issues such as the inclusion of “don’t know” categories33 remain controversial, as some contend that respondents choose them when they mean “no” or do not want to make a choice.

Because the primary mode of delivery continues to undergo changes as technology drives our ways of communicating and interacting, questions need to be adaptable over multiple platforms and retain their validity and reliability in mixed-mode designs. While we have a solid research base, we are struggling to see how new technologies such as web-based surveys and smartphone applications (apps) change the dynamics of survey design and administration.

Summary

  • The design, format, and wording of questions are extremely important in surveys.
    • Questions form the basic building blocks of surveys.
    • Questions have a major impact on measurement error (which is an important component of the total survey error).
    • Question construction has a major impact on how individuals respond.
    • Questions must be targeted to answer the research questions.
  • The relationship between survey mode, survey instrument, and survey questions is important to consider.
    • The method of delivery of the survey (mode); the platform for the questions (survey instrument or questionnaire); and the expressions of words, phrases, images, and so forth used to solicit information (questions) are all interrelated and must be developed in concert.
    • Because mixed-mode surveys are becoming increasingly popular, questions must be designed to be adaptable across different instruments and modes.
  • Addressing question reliability and validity
    • If a question is reliable, we see consistency in responses across different respondents in similar situations.
    • If a question is valid, it accurately measures what we say we are measuring.
    • Questions must have both reliability and validity.
  • Addressing participants’ willingness to answer questions
    • Declining participation rates have focused more attention on ways of improving respondents’ willingness to answer questions.
    • Motivation to answer questions can be increased or decreased by several factors.
      • Value of participation to the respondents
        • Balance of effort needed to respond against benefit of responding
        • Respect and courtesy shown to participants
        • Providing incentives
      • Avoiding questions that are offensive or demeaning
      • Making questions understandable
      • Assuring participants that they will not be put at risk of harm (such as violating privacy) by responding
  • Key elements of good questions
    • Specificity—Addressing the content of information as precisely as possible
    • Clarity—Ensuring that question wording and concepts are understandable to the respondent
    • Brevity—Making the question as short, straightforward, and simply worded as possible
  • Common question pitfalls
    • Double-barrel questions
    • Loaded or leading questions
    • Questions with built-in assumptions
    • Double-negative questions
  • Open-ended questions
    • Can be administered through interviews, other interactive formats, or in self-administered forms
    • Are good for exploring topics or gathering information in areas that are not well known
    • Allow participants a blank canvas to respond, usually in narrative format
    • Have some problems with validity because responses may miss question intent
    • Are frequently used with follow-up questions or probes to get more detail or further information
    • Require more effort on the part of both respondents and researchers
  • Closed-ended questions
    • They are more defined and standardized than open-ended.
    • They generally require less effort on the part of respondents and researchers.
    • Response categories are restricted and predetermined by researchers.
    • Wording and format of response categories must be carefully constructed to ensure that the information required to answer research questions is obtained.
      • Response categories may be (a) ordered or scalar or (b) unordered or unscalar.
      • Response categories may be measured at the nominal, ordinal, interval, or ratio level.
      • Response categories must be mutually exclusive and exhaustive.

Annotated Bibliography

General

  • See Chapters 7 and 8 in Robert Groves et al., Survey Methodology, 2nd ed.34

Characteristics of Good Questions

  • The Dillman et al. text on surveys35 provides a great deal of information on the characteristics of good questions, including Dillman’s 19 principles for good question design.

Reliability and Validity

  • For a conversational and easy-to-understand overview of reliability and validity in survey research, see “Understanding Evidence-Based Research Methods: Reliability and Validity Considerations in Survey Research” by Etchegaray and Fischer.36
  • For an in-depth review of reliability in surveys, see Alwin’s Margins of Error: A Study of Reliability in Survey Measurement.37

Question Type and Structuring Questions on a Survey

  • Ian Brace38 provides a good overview of question types and the importance of how questions are structured on survey instruments.

i. It is important for researchers to recognize changes in the context of the situation, which might affect the consistency of responses. For example, if a survey on school safety was administered to children in a particular school district before and after a major school shooting was reported in another part of the country, the situation of the survey might appear to be the same, but the situation context would be substantially different.

ii. There are four major areas of validity, namely, face, content, criterion, and construct, which we will not discuss. The interested reader can go to any introductory statistics or social research methodology text to learn more about these different types of validity.