Better measurement for better results: A practical guide to strengthening your survey-based research

Authors: Ms Jennie Walker BSc(Hons) MBiostat1, Dr Hafiz Khusyairi PhD2, Prof Philip J. Batterham PhD3, Dr Amy Page PhD4

1. Faculty of Medicine and Health, University of Sydney, Camperdown, Australia
2. COVID-19 Public Health Response Branch, New South Wales Ministry of Health, 1 Reserve Road, St Leonards, Australia
3. Centre for Mental Health Research, Australian National University, 63 Eggleston Road, Acton, Australia
4. Centre for Medicine Use and Safety, Monash University, The Alfred, 55 Commercial Road, Melbourne, Australia

SHPA proudly supports the Research Toolkit series, which aims to support members in conducting and publishing their research. This series is coordinated by the SHPA Research Leadership Committee, and hence shares the insights and experience of our most research-passionate members. If you’re keen to make a difference to patient care, not just in your daily practice, but in improving practice itself, then this series is for you.

Contents:

First things first: work smarter, not harder
Where to find existing scales and how to evaluate them
How to strengthen the rigour and trustworthiness of your research findings
Content validity: How do you capture the concept of your scale?
Content and face validity: How do you develop and test your scale?
Do your items show evidence of validity and reliability?
So, what now?

Helpful Resources
Acknowledgements
References


 

Introduction

Surveys are commonly used in many areas of health research. Accurate, reproducible survey results lead to new discoveries and expand our knowledge in the health sciences. Applying better practices to our research puts us well on the way to achieving such results.

One of many factors to consider during the design of a survey is the inclusion of rigorous and trustworthy measures. We have created this guide to give you the know-how to do this. First, we provide you with information about survey research and how measures within a survey can provide a foundation for rigorous results. Then, we delve into the details of collecting evidence to support the rigour of these measures. Armed with this information you will be well on your way to reporting accurate and reproducible results.

 


First things first: work smarter, not harder

Surveys are commonly used to collect information from respondents about their behaviours, beliefs, attitudes, or intentions. Asking respondents to provide details about themselves is often the only way to measure specific outcomes. This is particularly the case where there are limitations in obtaining accurate, observable, or biological markers such as mental health and wellbeing, perceived psychosocial supports, or attitudinal factors. In the context of survey research, these markers are better known as constructs. They are typically considered subjective and capture an abstract idea such as behaviours, beliefs, or attitudes. Generally, a construct is measured by summing or averaging responses on a scale (other methods exist but these are outside the scope of this guide). Scales are a set of multiple items and provide a good source of information from respondents about a particular construct. For example, a widely used scale to measure the construct psychological distress is the Kessler-10 scale (K10).
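As a simple illustration of the summing approach described above, the sketch below totals one respondent's item responses to produce a single construct score. The item values and response range are invented for illustration and are not the official K10 scoring instructions.

```python
# A minimal sketch of scoring a multi-item scale by summing item responses.
# The 10-item layout and 1-5 response range are illustrative assumptions.

def score_scale(responses):
    """Sum item responses to give a single score for the construct.

    responses: list of integer item responses, e.g. 1 ('none of the time')
    through 5 ('all of the time') on each item.
    """
    return sum(responses)

# One hypothetical respondent's answers to a 10-item scale:
answers = [1, 2, 1, 3, 2, 1, 1, 2, 4, 1]
total = score_scale(answers)  # higher totals indicate more of the construct
```

Averaging instead of summing is the same idea divided by the number of items; either way, every respondent must answer every item for scores to be comparable.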

You don’t have to look far to find a plethora of existing scales that have been rigorously tested. Because they underwent extensive testing when first developed, these scales can provide a precise and reliable measure of your construct. Using an existing scale will also help you compare the data you gather with data already available.

When designing your study, it can be easy to dismiss an existing scale based on factors such as length, a non-preferred format, or personal dislike of the questions. While developing a scale may sound straightforward, it is a costly and lengthy process involving a literature review, focus groups or interviews, question development, pre-testing, and statistical analysis. Throughout this process it is essential you collect evidence to show the new scale is reliable and trustworthy. Ideally, you will collect evidence across time and settings; so, some say gathering this evidence is never really finished.1 We therefore recommend you adopt, where feasible, existing scales already tested by others. The time and effort you invest upfront in finding a suitable scale will be far less than the time and effort you would spend developing your own.


 

Where to find existing scales and how to evaluate them

So how do you find and review existing scales? The first step is to clearly define your study aims, the related constructs, and the group of individuals you want to include in your study (target population). This information will help you to narrow your search for suitable scales. Start by searching an academic database like Medline, or using a search engine such as Google Scholar, or identifying relevant books (see Appendix A in Health Measurement Scales: A Practical Guide to Their Development and Use under Helpful Resources).2 Some professional associations also provide helpful resources, so be sure to check out their web pages. Once you have found a scale or multiple scales that appear to suit your needs, conduct a critical assessment of each scale individually. The purpose of this is to make sure the scale aligns with your study aims. We have provided three questions below to get you started for a critical review.

  • Are the questions in the scale appropriate to your study?
  • Is there evidence of validity and reliability reported by the author? You can use this guide as a starting point.
  • Has the scale been tested with participants similar to your target population?

If you conclude no suitable scale exists, then this guide will assist you in developing a scale relevant to your desired construct.

Key message: Use existing validated scales as much as possible. Critically assess the suitability of existing scales for your research purposes by reviewing the available evidence.


 

How to strengthen the rigour and trustworthiness of your research findings

The validity and reliability of a scale underpin the study results. Validity and reliability not only strengthen your study conclusions but also give other researchers confidence to use your scale. A reliable scale will give a consistent result regardless of who is responding and regardless of the context, while a valid scale will adequately capture the construct you intend to measure. For example, if our body weight remained stable and we weighed ourselves every day, a reliable set of bathroom scales would show the same weight each day (reliability); if the scales also showed our true weight in kilograms, they would be valid (validity).

To provide support to your scale, a collection of evidence demonstrating validity and reliability is essential. In the following sections of this guide, we address aspects of validity and reliability while navigating you through the process of scale development. Alternatively, you can use this guide when evaluating the evidence of existing scales. In the remainder of the guide, we concentrate on five different types of validity and reliability (see Table 1).

Type of validity/reliability | Description | Definition
Content validity | Do the questions capture the construct you want to measure? | Refers to the extent to which items on your scale are representative of the construct you are seeking to measure
Face validity | Superficially, do the questions appear to measure what they should be measuring? | Refers to the degree to which your items appear to be a reasonable or accurate measure of the construct
Convergent validity | Does your scale have a relationship with a similar measure? | Refers to the extent to which your scale relates to conceptually similar scales
Discriminant validity | Does your scale lack a relationship with an unrelated measure? | Refers to the degree to which your scale diverges from a measure of a conceptually unrelated construct
Internal consistency (reliability) | Are the scores on each item related to one another and to one overall construct? | Refers to the extent to which the items in your scale measure the same construct

Table 1. The types of validity and reliability included in this guide.
 

Key message: Testing the validity and reliability of your scale is essential in all stages of development. The accumulation of evidence for different types of validity and reliability will support your research results and give other researchers confidence to use your scale.


 

Content validity: How do you capture the concept of your scale?

As a first step, you will need to collect information about the construct you want to measure. Making use of your previous work searching for existing scales will come in handy here. Next, you will need to find information related to your construct. This can come from various sources including focus groups, interviews with people from your target population, and engagement with expert panels. These methods are qualitative in nature and offer an in-depth, rich source of information. A search of existing literature for any current research and identifying themes grounded in theory may also prove helpful.2 The information gathered during this step will provide you with content for your scale questions (outlined in the next section). If any concept was omitted, the items of your scale might not be representative of the construct you are aiming to measure. This would compromise the content validity of your scale. For example, if a scale aiming to measure depression didn’t include a question about low mood, it would probably have poor content validity. For more information on the methods described in this section, see Health Measurement Scales: A Practical Guide to Their Development and Use in Helpful Resources.

Key message: Use a variety of methods to capture key information to guide your research questions. Questions that adequately capture your construct will provide evidence for content validity.


 

Content and face validity: How do you develop and test your scale?

Now it’s time to develop your questions and test your scale. We have provided a checklist below to help you write clear and well-worded questions for your scale.3, 4

  • Aim to write questions rather than statements
  • Avoid asking:
    • double-barreled questions or two questions in one (e.g. ‘The care I received was timely and courteous?’)
    • double negatives (e.g. ‘Do you agree long waiting times are not uncommon?’)
    • leading questions (e.g. ‘Don’t you agree free transport to the hospital would be a good idea?’)
    • ambiguous questions (e.g. ‘Do you use drugs?’). It is unclear if the question is asking about illegal drugs, prescription drugs, or over-the-counter drugs.
  • Write closed-ended questions with options for participants to choose a response. Find existing pre-coded response options that are mutually exclusive and exhaustive (e.g. 0 Never, 1 Rarely, 2 Sometimes, 3 Often, 4 Always). These allow for ease of interpretation and coding.
  • Make sure questions share a consistent grammatical person, typically first or second person
  • Questions should share a distinct timeframe. Recent events are easier for participants to recall (e.g. ‘In the last two weeks, how often did you…’).
  • Ensure the questions are not conditional on experiencing a specific event, so everyone in your target population can answer them (e.g. avoid ‘In the last two weeks, I felt anxious on trains’, which only applies to respondents who travel by train).
  • Make sure the questions are clear and well-worded, and the language matches your target population. A good way to achieve this is to capture commonly used phrases and terminology when defining the concept of your scale (see the previous section).
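The pre-coded response options mentioned in the checklist can be mapped to numeric codes at analysis time; the sketch below shows one way to do this. The labels come from the example above, while the variable and function names are ours.

```python
# Mapping of pre-coded response labels to numeric codes (mutually exclusive
# and exhaustive, as recommended in the checklist above).
RESPONSE_CODES = {
    "Never": 0,
    "Rarely": 1,
    "Sometimes": 2,
    "Often": 3,
    "Always": 4,
}

def code_response(label):
    """Convert a response label to its numeric code for scoring and analysis."""
    return RESPONSE_CODES[label]

# Coding one respondent's answers to a three-item scale:
coded = [code_response(ans) for ans in ["Sometimes", "Never", "Often"]]
```

Keeping a single mapping like this ensures every item is coded consistently, which makes summing or averaging item scores straightforward later on.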

It is also important you consider the feasibility of the scale. A scale may not be useful if it is difficult for people to complete in the intended setting. For example, a scale that requires an expert or interpreter to assist respondents when answering the questions would not be useful. Also, if the burden the scale places on respondents outweighs its usefulness, consider starting with a shorter set of questions. For example, it is unlikely a large number of respondents will complete a 20-minute scale that is not relevant to their needs.

It is good practice to identify any issues with your scale early on by pre-testing the questions with colleagues and people in your target population. Pre-testing will give you an opportunity to identify any issues with comprehension, question relevance, question-wording, conceptual ambiguities, unclear reference periods, completion time, and a mismatch between the questions and the response options. Again, qualitative methods will provide a rich source of information. Such methods include expert panels, behavioural coding, focus groups, or cognitive interviewing.1, 3 Cognitive interviewing is a useful technique and explores how respondents interpret the questions and their response options. You can conduct the interviews one-on-one or in a small group setting. If your questions appear to measure the intended construct, you will have collected evidence of face validity. For more information on cognitive interviewing, see ‘Research synthesis: The practice of cognitive interviewing’ in Helpful Resources.

Key message: Make sure your questions are well-considered and clearly written. Pre-test your questions with people in your target population to identify any issues before survey production.


 

Do your items show evidence of validity and reliability?

Convergent and discriminant validity: Is there a relationship with other scales?

After you have collected your data, what now? It’s time to assess the evidence of validity and reliability for your scale. Let’s start by checking if the score from your scale correlates with a score from a similar scale (convergent validity). Usually an identical scale will not exist, so find a conceptually related scale with evidence of validity and reliability. Once you have the scores from both measures, correlate these using a Pearson r correlation coefficient. Scores for a Pearson r range from -1 to +1. A score between 0.4 and 0.8 will provide evidence that your scale is capturing a similar construct as the existing scale.2 On the other hand, you can test whether your scale diverges from a conceptually unrelated scale (discriminant validity). A Pearson r correlation coefficient closer to zero will indicate little or no relationship between the measures. General statistical packages include a measure of the Pearson r correlation coefficient, as does Microsoft Excel (Microsoft Corporation, Redmond, Washington, USA). Online resources are available to get you started using Microsoft Excel,5 but also consider reaching out to an academic researcher or colleague with statistical experience for assistance.
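As a sketch of the convergent validity check, the snippet below computes a Pearson r between scores on a new scale and an existing, conceptually similar scale. The scores are invented for illustration.

```python
import numpy as np

# Hypothetical scores from eight respondents on a new scale and on an
# existing, conceptually similar scale (made-up numbers for illustration).
new_scale = np.array([12, 18, 25, 30, 22, 15, 28, 20])
existing_scale = np.array([10, 20, 24, 33, 21, 14, 30, 19])

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(new_scale, existing_scale)[0, 1]

# A value between roughly 0.4 and 0.8 would support convergent validity;
# a value near zero against an unrelated scale would support discriminant
# validity.
```

The same call works for the discriminant validity check: substitute scores from a conceptually unrelated scale and look for r close to zero.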

Internal consistency (reliability): Are the items related to one another?

Let’s now focus on how the items relate to one another (internal consistency or reliability). There are a few measures you can use to test the reliability of your scale, such as split-half (odd-even) reliability or Kuder-Richardson. But one popular measure is Cronbach’s alpha.1 A Cronbach’s alpha score will be between 0 and 1, with 0 showing your scale is not at all reliable and 1 showing perfect reliability. Generally, an alpha score of 0.7 and above will provide you with evidence of a reliable scale. Most general statistical packages include a measure of Cronbach’s alpha. If this doesn’t make sense, reach out to an academic researcher or colleague with statistical experience for assistance.
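Cronbach's alpha can also be computed directly from its standard formula if your statistical package does not report it; a minimal sketch with made-up item scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the scale
    item_var = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' totals
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Five hypothetical respondents answering a three-item scale (invented data):
data = [[1, 1, 2],
        [2, 2, 2],
        [3, 3, 4],
        [4, 5, 4],
        [5, 4, 5]]
alpha = cronbach_alpha(data)  # values of 0.7 and above suggest reliability
```

In this invented data the items rise and fall together across respondents, so alpha comes out well above the 0.7 rule of thumb; unrelated items would drive it toward zero.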

While the previously described process of testing your scale is an important step, you can take this further. There is a range of advanced statistical methods that provide more thorough testing of the validity and reliability of your scale. Also, keep in mind you may need to test your scale again if you want to use it in a different setting. For example, interpretation of the items may vary when testing your scale in a population with different cultural norms. See ‘Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research’ in Helpful Resources for information on more advanced statistical methods.

Key message: There are a range of methods that may be used to assess evidence for validity (e.g. correlations with existing scales) and reliability (e.g. internal consistency). Administering your scale with other scales during the data collection phase will assist in establishing whether your scale is valid and reliable.


 

So, what now?

By the end of this process, you will have gathered evidence of validity and reliability for your scale. By reporting this evidence, you not only add strength to your study conclusions but provide confidence in the use of your scale by other researchers. For information on reporting this important evidence, see ‘Writing your first research paper: A practical guide for Clinicians’ and Health Measurement Scales: A Practical Guide to Their Development and Use in Helpful Resources.


 

Helpful Resources

  • Beatty PC, Willis GB. Research synthesis: The practice of cognitive interviewing. Public Opin Q 2007; 71: 287–311.
  • Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health 2018; 6: Article 149. doi.org/10.3389/fpubh.2018.00149.
  • Liu S, Mill D, Page A, Lee K. Writing your first research paper: A practical guide for Clinicians. Pharmacy GRIT 2021; Autumn-Winter: 40–4.
  • Streiner DL, Norman GR, Cairney J. Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed. Oxford: Oxford University Press; 2015.


 

Acknowledgements

The authors appreciate the feedback from Alana Rose and members of the SHPA Research Leadership Committee: Jacinta Johnson, Elizabeth McCourt, and Sanja Mirkov.


 

References

  1. Rickards G, Magee C, Artino AR Jr. You Can't Fix by Analysis What You've Spoiled by Design: Developing Survey Instruments and Collecting Validity Evidence [editorial]. J Grad Med Educ 2012; 4: 407–10.
  2. Streiner DL, Norman GR, Cairney J. Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed. Oxford: Oxford University Press; 2015.
  3. Holyk G. Questionnaire design. Encyclopedia of Survey Research Methods 2008; 657–60.
  4. Kelley K, Clark B, Brown V, Sitzia J. Good practice in the conduct and reporting of survey research. Int J Qual Health Care 2003; 15(3): 261–6.
  5. statstutor. Scatterplots and correlation in Excel. Ellen Marshall, Sheffield Hallam University and Tanya Waquanika, University of Sheffield; 2017. Available from <sheffield.ac.uk>. Accessed 13 June 2022.