References

Aranson E, Wilson T, Akert R New York: Harper Collins; 1994

Bowling A Maidenhead: Open University Press; 2002

Bowling A, Ebrahim S Maidenhead: Open University Press; 2005

Barclay JE, Weaver HB Comparative reliabilities and the ease of construction of Thurstone and Likert attitude scales. Journal of Social Psychology. 1962; 58:109-20

Campbell M, Machin D, Walters S Sussex: John Wiley & Sons; 2007

In: Cormack D Oxford: Blackwell Science Ltd; 2000

Dawis R Scale construction. Journal of Counselling Psychology. 1987; 34:(4)481-9

Edelmann R Attitude Measurement cited. In: Cormack D Oxford: Blackwell Science Ltd; 2000

Edwards AL, Kenney KC A comparison of the Thurstone and Likert techniques of attitude scale construction. Journal of Applied Psychology. 1946; 30:(1)72-83

Ferguson L A study of the Likert technique of attitude scale construction. The Journal of Social Psychology. 1941; 13:51-7

Likert R A technique for the measurement of attitudes. Archives of Psychology. 1932; 140:5-53

Mueller DJ New York: Teachers College Press; 1986

Parahoo K Hampshire: Palgrave Macmillan; 2006

Petty RE, Cacioppo JT Dubuque, IA, USA: William C Brown; 1981

Roberts J, Laughlin J, Wedell D Validity Issues in the Likert and Thurstone Approaches to Attitude Measurement. Educational and Psychological Measurement. 1999; 59:(2)211-33

Stone S, Abbott J, McClung C Paramedic Knowledge, Attitudes, and Training in End-of-Life Care. Pre-Hospital and Disaster Medicine. 2009; 24:(6)529-34

Thomas H IQ, interval scales and normal distributions. Psychological Bulletin. 1982; 91:198-202

Thurstone LL Attitudes can be measured. The American Journal of Sociology. 1928; 26:249-69

Upshaw H The effect of variable perspectives on judgements of opinion statements for Thurstone scale. Journal of Personality and Social Psychology. 1965; 2:(1)60-9

Analysing Thurstone and Likert attitude scales as data collection methods

04 April 2011

Features

02 April 2011

Volume 3 · Issue 5

ISSN (print): 1759-1376

ISSN (online): 2041-9457

Abstract

The development of the paramedic as a healthcare professional and the movement of paramedic education into the higher education setting have resulted in the need for paramedics and student paramedics to be aware of and understand research methods. This article does not explore or apply the entire research process; it focuses on one specific part of it. The article explores and contrasts two data collection methods used to measure attitudes, one of which will be familiar to most healthcare professionals: the Likert scale. The Thurstone method is less frequently used, and the reasons for this are discussed. The author offers an example of how these methods might be used to measure attitudes about the preparedness of paramedics to address end-of-life care issues.

This article will critically analyse two types of attitude scale as data collection methods. Commonly, measurement of both patients’ and professionals’ assessments of healthcare is based on attitude scales (Cormack, 2000; Bowling, 2002; Parahoo, 2006). There are several methods that have been developed to measure attitudes. However, this article will focus on the Thurstone and Likert methods. In order to give an example of the application of both scales, the relevance of these methods to measure the attitudes of paramedics supporting the suddenly bereaved will be discussed. The author acknowledges that several sources referred to are dated. Nevertheless, they are important primary sources, written when the scales were initially developed.

What are attitudes?

It is first important to define what an attitude is. Edelmann (2000) and Bowling (2002) both define an attitude as a disposition to evaluate a phenomenon in a particular way. In the context of psychological research, people’s attitudes do not wax and wane; they are consistent beliefs and feelings about things (Aranson et al. 1994). Bowling (2002) goes on to explain that attitudes are usually evaluated in terms of cognitive, evaluative and behavioural components. These components may or may not be consistent with each other (Edelmann, 2000).

For example, a paramedic might hold particular beliefs about death and dying (cognitive) and might feel that some patients should not be resuscitated (evaluative), yet this would not necessarily influence his or her clinical practice (behavioural).

Attitude scales

Attitude scales are used extensively in the collection of self-report data in public health and social science research and evaluation (Bowling, 2002; Parahoo, 2006). While there are several variations in these types of scales, they typically involve a statement about the particular attitude being measured, sometimes referred to as the stem, and a response arrangement in which the respondent is asked to indicate on an ordinal range the extent of agreement or disagreement (Edelmann, 2000; Bowling, 2002).

There is an assumption on the part of the researcher when using either of these methods that the attitudes of the participant can be represented by a numerical score and that each descriptor will mean the same thing to each respondent.

Thurstone scale

When Thurstone developed his ‘method of equal appearing intervals’ in 1928, it was the first defined method of measuring attitudes (Cormack, 2000; Bowling, 2002). The method begins with a chosen attitude object, about which a wide range of belief statements, both positive and negative, is collected (Bowling, 2002). Belief statements are usually collected from literature, discussions with experts or interviews with people for whom the topic is relevant. To construct a scale from these statements, a large panel of ‘judges’ is needed, which can make construction of the scale a lengthy process.

The judging panel undertakes what is essentially a sorting task. Each judge is asked to sort cards, each bearing an individual statement, into eleven piles ranging from most positive to most negative. Judges are not asked to give their own opinions but are required to estimate the degree of favourableness or unfavourableness expressed by each statement (Edwards and Kenney, 1946; Barclay and Weaver, 1962; Edelmann, 2000). The middle pile represents a neutral opinion.

The results are then calculated and each statement is given a score that reflects where the judges placed it on the continuum and how far they agreed on that placement. Statements with poor inter-judge agreement are discarded. The attitude questionnaire can then be produced using statements of each mean value. The scale will consist of 20–40 statements, each numerically equidistant from the previous and next statement. The statements are placed on the scale in random order (Edwards and Kenney, 1946).
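The scoring and discarding of statements described above can be sketched in a few lines of Python. This is an illustration only: the judges' placements are invented, pile 1 is assumed to be the most negative and pile 11 the most positive, and the use of the standard deviation with a cut-off of 2.0 as the measure of inter-judge agreement is an assumption made for the example rather than part of the published method.

```python
from statistics import mean, pstdev

# Hypothetical pile placements (1 = most negative, 11 = most positive)
# given by five judges to three illustrative statements.
judge_ratings = {
    "I feel adequately prepared to provide end-of-life care": [10, 9, 10, 11, 10],
    "More education about end-of-life care would be beneficial": [6, 5, 6, 6, 7],
    "The legal implications of end-of-life care are concerning": [2, 9, 5, 11, 3],
}

MAX_SPREAD = 2.0  # assumed cut-off for acceptable inter-judge agreement


def scale_statements(ratings, max_spread=MAX_SPREAD):
    """Return {statement: scale value} for statements the judges agree on."""
    kept = {}
    for statement, piles in ratings.items():
        spread = pstdev(piles)  # how widely the judges' placements vary
        if spread <= max_spread:
            kept[statement] = mean(piles)  # scale value = mean pile number
    return kept


scale_values = scale_statements(judge_ratings)
for statement, value in scale_values.items():
    print(f"{value:.1f}  {statement}")
```

In this invented data set the third statement is discarded because the judges placed it anywhere from pile 2 to pile 11, while the first two statements keep scale values of 10 and 6.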

Participants are asked to determine whether they agree, disagree or are neutral towards each attitude statement. The mean value of statements that participants agree or disagree with, given that each statement has a numerical value, can then be calculated (Edwards and Kenney, 1946; Edelmann, 2000; Bowling, 2002). This score reveals whether the participant has a positive or negative attitude toward the topic in question.
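As a rough sketch of this calculation, the following Python fragment scores one respondent. The statement labels and scale values are hypothetical, and the function assumes that only the statements a respondent agrees with contribute to the score, which is one possible reading of the procedure described above.

```python
from statistics import mean

# Hypothetical scale values (1 = most negative, 11 = most positive).
scale_values = {"S1": 10.0, "S2": 8.0, "S3": 6.0, "S4": 4.0, "S5": 2.0}

# One respondent's answers: "A" = agree, "D" = disagree, "" = neutral.
responses = {"S1": "A", "S2": "A", "S3": "", "S4": "D", "S5": "D"}


def thurstone_score(values, answers):
    """Mean scale value of the statements the respondent agreed with."""
    endorsed = [values[s] for s, answer in answers.items() if answer == "A"]
    if not endorsed:
        return None  # no statement endorsed, so no score can be given
    return mean(endorsed)


score = thurstone_score(scale_values, responses)
print(score)  # 9.0, towards the positive end of the 1-11 continuum
```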

Critics assert that, while Thurstone’s method was an important development in the field of attitude measurement, it rested on assumptions that call the validity of the scale into question. The ranking of statements by a panel of judges is not guaranteed to be independent of their beliefs. There is no assurance that the attitudes of the panel do not have a bearing on the resulting score awarded to each statement (Edwards and Kenney, 1946).

Barclay and Weaver (1962) also asserted that Thurstone’s method was time consuming and arduous, taking a third more time to construct than a Likert scale. Edelmann (2000) concedes that for this reason the Thurstone method is out of favour in modern research practice. However, given the significance of this method in the development of attitude measurement scales, it is given a place in several modern research texts.

Table 1 illustrates an example of how a Thurstone questionnaire might be presented.

Table 1. Hypothetical example of Thurstone questionnaire

This is a hypothetical example of a Thurstone questionnaire. Note how the statements of attitude toward end-of-life care change. A validated scale would contain statements ranked and scored by the panel of experts.

Indicate whether you agree (A), disagree (D) or are neutral (leave blank) with the following statements:

___I feel adequately prepared to effectively provide care in end-of-life situations

___I am able to appropriately provide support to the bereaved

___I do not feel that I need further education about end-of-life care issues

___I feel able to make an autonomous decision on whether to withhold resuscitation

___More education about end-of-life care issues would be beneficial

___The legal implications involved in end-of-life care are concerning

___The ethical issues surrounding end-of-life care mean I am often unsure whether I do the right thing

___I am not adequately prepared to provide effective end-of-life care

___More education is needed about end-of-life care issues

___Quality end-of-life care is not an important concern for paramedics

Likert scale

Attempting to shorten this seemingly laborious procedure, Likert (1932) presented a technique which, according to him, did not need a judgment group to produce item scale values (Edwards and Kenney, 1946; Barclay and Weaver, 1962; Mueller, 1986). However, it is worth noting that, according to Ferguson (1941), Likert originally developed his scale from one that had already been constructed using the Thurstone sifting method. This brings into question the initial claim that he did away with the need for a judging panel entirely.

The construction of a Likert scale is similar to that of a Thurstone scale in that attitude statements about a particular phenomenon are collected from relevant sources, usually literature (Edelmann, 2000; Bowling, 2002). These statements need to be carefully phrased without ambiguity; however, individual statements are not given a score and there is no assumption that the difference between each response can be measured equally. The Likert scale can therefore report the order of respondents’ attitudes but does not measure the difference between agreeing and strongly agreeing with a particular statement.

Scoring is usually done on a five-point scale, with a higher score revealing a more positive attitude. Table 2 shows an example of a Likert questionnaire.

Table 2. Example of Likert Scale questionnaire

From your paramedic training indicate how prepared you are in each of the following knowledge and skill areas?
Knowing when to honour written do not attempt resuscitation orders?
Not at all prepared	poorly prepared	somewhat prepared	well prepared
4	3	2	1
Knowing when not to commence resuscitation?
Not at all prepared	poorly prepared	somewhat prepared	well prepared
4	3	2	1
Understanding potential grief reactions?
Not at all prepared	poorly prepared	somewhat prepared	well prepared
4	3	2	1
Appropriate delivery of death notification?
Not at all prepared	poorly prepared	somewhat prepared	well prepared
4	3	2	1
How important do you think the following knowledge and skill areas are?
Knowing when to honour written do not attempt resuscitation orders?
Not at all important	of little importance	somewhat important	very important
4	3	2	1
Knowing when not to commence resuscitation?
Not at all important	of little importance	somewhat important	very important
4	3	2	1
Understanding potential grief reactions?
Not at all important	of little importance	somewhat important	very important
4	3	2	1
Appropriate delivery of death notification?
Not at all important	of little importance	somewhat important	very important
4	3	2	1

Adapted from: Stone et al (2009)

Half the statements are worded so that a ‘strongly agree’ response is favourable to the issue in question; the other half are worded so that a ‘strongly agree’ response is unfavourable. The scoring is reversed for these statements (Edwards and Kenney, 1946; Roberts et al. 1999). This method goes some way to check the reliability of the scale, as those respondents who answer strongly positively should answer strongly negatively to the opposing statements.

The overall score for each respondent is reported in Likert scales, as opposed to responses to individual statements. This means that respondents may produce the same score from different sets of answers. Therefore, the same score does not necessarily represent the same attitude (Edelmann, 2000). Individual statements might also be reported on in a study; these statements are known as Likert items.

However, when Likert first constructed his scale it was not his intention for items to be reported on individually; data obtained are usually summarised using the total scores obtained (Dawis, 1987). Nevertheless, it is common practice for researchers to report on individual questions in order to clarify their data analysis, as this goes some way to differentiating between identical scores obtained from different responses.
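The point that identical totals can conceal different response patterns can be shown with invented data. In the sketch below, three hypothetical respondents answer four items (already reverse scored where necessary, on an assumed 1 to 5 coding); the totals are identical, while the item-by-item counts reveal the differences.

```python
from collections import Counter

# Hypothetical answers (1-5) from three respondents to four items.
respondents = {
    "A": [5, 5, 1, 1],
    "B": [3, 3, 3, 3],
    "C": [4, 2, 4, 2],
}

# Total score per respondent: all three come to 12.
totals = {name: sum(answers) for name, answers in respondents.items()}
print(totals)

# Item-level reporting: how often each answer was chosen for each item.
item_counts = [Counter(column) for column in zip(*respondents.values())]
for number, counts in enumerate(item_counts, start=1):
    print(f"Item {number}: {dict(counts)}")
```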

Reliability and validity

Attitude scales, like all other data collection tools, need to be checked for reliability and validity. Internal consistency and reliability would be supported if the various individual items correlate with each other, indicating that they belong together in assessing this attitude. In order for an attitude scale to be reliable, all statements and instructions must be unambiguous and understood in the same way by all participants. Few studies have attempted to compare the reliability of Thurstone and Likert scales directly; however, those that have done so concluded that Likert scales offered greater reliability (Edwards and Kenney, 1946; Barclay and Weaver, 1962). These studies also concluded that the use of a judging group was not necessary for the construction of a reliable attitude scale.
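One widely used summary of internal consistency is Cronbach's alpha, which is not discussed in this article and is introduced here purely as an illustration of how the correlation between items might be checked. The scores below are invented, and the sketch uses the standard formula of k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores.

```python
from statistics import variance

# Hypothetical data: one row per respondent, one column per item (1-5).
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])  # number of items
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Values closer to 1 suggest that the items vary together; in this invented data set the items are strongly related, so alpha is high.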

It is impossible to state categorically which method would be more valid; this would depend on whether the individual tool answered the research question. Validity could be assessed by determining whether the scale can differentiate between groups thought to differ on the attitude in question or by correlations with other reports that are theoretically related to the attitude object (Bowling, 2002).

In regard to his scale, Thurstone (1928) questioned whether judges could rate opinion statements without any bias; indeed a different set of judges might not arrive at the same ratings for statements. However, Upshaw (1965) determined that Thurstone’s methods provided scales that are valid.

Data types

The data commonly measured by Thurstone and Likert scales are of ordinal type. This means that data are ranked according to a certain characteristic but the difference between them cannot be measured accurately (Campbell et al. 2007). Thurstone’s equal appearing interval scale initially appears to deliver data that are interval in nature, because the statements are assigned equidistant scores. However, the resultant scores are based on the ranking of statements by a panel of judges and it is difficult to see how attitudes can be given an accurate numerical value. According to Thomas (1982) few, if any, psychological attitude scales are even-interval scales.

Ordinal type data must be analysed using non-parametric statistics. These methods are less authoritative than those developed for use with interval or ratio data. The resulting analysis will provide descriptive data that will summarise and indicate significant points from the results. Parametric analysis should not be performed on rank-ordered data and inferences to a greater population cannot be made.

However, Bowling (2002) asserts that researchers often make assumptions about ordinal data (that intervals are equal) and apply parametric statistical analysis, albeit incorrectly. This enables the researchers to make more use of the data, but the resulting conclusions must be questioned.
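To illustrate what a non-parametric treatment of attitude scores might look like, the sketch below summarises two invented groups by their medians and computes the Mann-Whitney U statistic, which relies only on the rank order of the scores. The test is chosen here for illustration and is not named in the sources cited; the group names and data are hypothetical, and looking up the significance of U is left out.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    first_position = {}
    counts = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
        counts[value] = counts.get(value, 0) + 1
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(group_a + group_b)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical total attitude scores for two groups of paramedics.
with_training = [18, 16, 17, 19, 15]
without_training = [12, 14, 11, 13, 16]

print(median(with_training), median(without_training))  # 17 13
print(mann_whitney_u(with_training, without_training))  # 1.5
```

A small U relative to the group sizes indicates that the scores of one group tend to rank above those of the other; the same comparison could also serve the known-groups check of validity mentioned earlier.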

Use of the scales

In the context of studying paramedics’ attitudes towards supporting the suddenly bereaved, the decision as to which method is best to use rests with the researcher. Consideration of the aims of the study will be of paramount importance.

There is a paucity of research that considers the attitudes of paramedics to death or bereavement and there appear to be no pre-existing data collection tools specifically for this purpose.

The question remains therefore which of the two methods would be most appropriate to use in developing such a tool.

The Thurstone method might prove impractical, as a large pool of judges would be needed and the specialist nature of the paramedic role means that the judges would need to be practising paramedics. Using this group of staff to develop the tool would reduce the available population with which to conduct the study. The time consuming nature and lesser reliability of the Thurstone method account for the greater popularity of the Likert procedure for attitude measurement in health sciences (Petty and Cacioppo, 1981).

Paramedics would also be familiar with the Likert type of attitude scale given its popularity in health related studies (Bowling, 2002). However, it is worth considering whether this popularity has bred ‘Likert scale contempt’: there are so many attitude scale questionnaires that respondents might not read, assimilate and answer truthfully, or they may use an automatic response. This would mean that data obtained would not reflect true attitudes.

It is important to consider that measuring attitudes can be extremely difficult and respondents may not always be truthful with their responses (Parahoo, 1997). Paramedics may wish to hide their true beliefs and attitudes. This would significantly affect the data collected.

The context within which the scale is administered might be important. In the case of assessing paramedics’ attitudes toward death, it might be important for the questionnaire not to be given to them by their managers during shift time, as there may be concerns that the data obtained might be used against the paramedic. One of the limitations of both methods is the presence of the neutral or undecided response option. Respondents might use this option in order to conceal any extreme beliefs that they feel might be unpopular.

However, the number of alternatives is often manipulated: it is not unusual to see more response categories, and some researchers remove the neutral category altogether (Edelmann, 2000). These are issues that will need to be addressed by the researcher in the design of the scale and are beyond the scope of this piece to discuss fully.

Conclusion

Both of the attitude scales discussed are valid, reliable data collection methods that would be appropriate to investigate the attitudes of paramedics, but neither is without weakness. Considering the literature, it appears that Likert is the superior tool of the two; however, there are other techniques that warrant consideration before deciding which to use for the purpose of the research in question.

Key points

Attitudes are consistent beliefs and feelings about a phenomenon and are widely measured in healthcare research.

Both Thurstone and Likert methods measure ordinal data. Data are ranked, but the distance between them cannot be measured.

The Thurstone method involves the collection of a variety of different statements about a phenomenon which are ranked by an expert panel in order to develop the questionnaire. This can be a lengthy process and the validity of the method has been questioned.

The Likert scale is more widely used and studies have shown that it offers greater reliability. However, there are potential problems with the reporting of results if the scale is poorly constructed.

The prolific use of attitude scales might negate their effectiveness. It is worth considering whether respondents complete them accurately.

Attitude measurement scales

Data collection

Likert Scale

Thurstone method