A surprising discovery

A fundamental assumption of all survey research is that people who hold the same opinion express it in an interview in the same way, apart from random errors. If that were not the case, their opinions could not be compared, and the analysis would run into problems because part of the differences in responses between people could not be explained. One would think that such a fundamental issue for survey research would have been studied, but that was not the case. The problem, of course, is that we know the respondents' answers but not their opinions, so the issue cannot be studied easily.
Milton Lodge of the State University of New York at Stony Brook did experiments to determine the numeric values to be attached to the labels of categorical scales, such as: excellent, very good, good, neither good nor bad, bad, very bad and awful. He had reported quite large variations in the numeric responses for the different labels. That made me curious about what was happening. We knew each other because we had met before to talk about the use of continuous scales. Irmtraud and I were interested in going to the US once more, so we made an appointment to meet him in Stony Brook.

The Thanksgiving dinner for the Indians

The first surprise
The first surprise was that he invited us to his parents' house in New York for Thanksgiving dinner. This dinner is held every year to commemorate that the Indians helped the first settlers survive their first winter in the New World. The tradition is to invite some Indians to this dinner. This year we were the “Indians” they invited. For us it was a very nice surprise, and we enjoyed this special invitation.

The second surprise
The next surprise came when I started to look at the data. The numeric judgments of the labels did indeed vary considerably, but what surprised me more was that the judgments of the labels correlated with each other: people who gave a higher number to one label also gave higher numbers to the other labels, and people who gave a lower number to one label gave lower numbers to the others as well. We thought that there could be three different explanations for these correlations:

The people have different opinions about these labels
The people express themselves differently in numbers
Both phenomena occur at the same time

Whatever the reason for these correlations, the phenomenon suggested that people's responses are not comparable with each other, either in the original labels, on the number scale, or in both.
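The second explanation can be made concrete with a small simulation. The Python sketch below is only an illustration under assumptions of my own choosing, not an analysis of Milton's data: the label values, the multiplicative personal factor and the names `LABEL_VALUES`, `simulate` and `pearson` are all hypothetical. Everyone perceives the labels identically, but each person multiplies the shared value by a personal factor before adding random error.

```python
import random
import statistics

# Hypothetical values that every person attaches to the labels (assumption).
LABEL_VALUES = {"bad": 20.0, "good": 60.0, "excellent": 90.0}


def simulate(n_persons, style_sd, noise_sd, seed=0):
    """Return, per label, the numeric judgments of n_persons simulated people.

    Each person has one personal factor (the way he or she uses numbers)
    that multiplies the shared label value; random error is added on top.
    With style_sd = 0 the factor is exactly 1 for everyone.
    """
    rng = random.Random(seed)
    judgments = {label: [] for label in LABEL_VALUES}
    for _ in range(n_persons):
        style = rng.lognormvariate(0.0, style_sd)
        for label, value in LABEL_VALUES.items():
            judgments[label].append(style * value + rng.gauss(0.0, noise_sd))
    return judgments


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    with_style = simulate(2000, style_sd=0.3, noise_sd=5.0, seed=1)
    no_style = simulate(2000, style_sd=0.0, noise_sd=5.0, seed=1)
    print("personal factor:", pearson(with_style["good"], with_style["excellent"]))
    print("random error only:", pearson(no_style["good"], no_style["excellent"]))
```

With a personal factor the judgments of two labels correlate strongly across persons; with random error alone the correlation is close to zero. The sketch does not separate the three explanations: differences in the perception of the labels would produce similar correlations, which is why the correlations alone could not tell us what was going on.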

The third surprise
Then we searched the literature to see whether this issue had ever been studied in the context of survey research. We could not find any paper discussing it. This is all the more surprising because all of us probably realize that some people speak in overstatements and others in understatements. We probably correct for these differences automatically in our minds; in survey research, however, it seems that the issue had not yet been studied.

Then we looked at the literature on scaling. There were indeed studies suggesting that a person's own position on the scale in question determines his or her judgments. For example, a person of 20 thinks that people of 40 are old, but a person of 40 sees people of 50 as old. So the label "old" describes different people depending on the age of the person making the evaluation. However, this could not explain the differences and correlations we found in Milton's data.

Psychophysical scaling
Then we asked David Cross, a colleague of Milton's who had worked with S.S. Stevens in the past, whether he knew of any studies in psychophysics that suggested variation in responses. He did indeed mention that psychophysical studies suggested that the choice of the standard used as the comparison stimulus in the experiments, and the value attached to this standard (the modulus), affected the judgments. For that reason they always worked with the means of the judgments across persons and not with individual curves.

Different curves by Poulton
Next, I had several interesting discussions with David Cross about this phenomenon and how to model it. But we could not formulate a good measurement model without new data. In Milton Lodge's data the systematic differences in judgments could be due to differences in the perception of the labels or in the use of the numbers. We needed data in which the perceptions were undoubtedly the same. Then we could test whether people differ in their use of the labels, in the way they use the number scales, or in both.

Shuttle diplomacy by Kissinger

A fundamental problem required further research
For me the situation was clear. New research was needed to study this fundamental issue of survey research. If people indeed express themselves differently when they have the same opinion, then this is a fundamental problem for survey research. Alert to this issue, I came across, a little later, an article by a journalist, Valeriani, who followed Henry Kissinger during his shuttle diplomacy in the Middle East and noted the following translations of Kissinger's remarks by the Israelis:

If Kissinger said          | The Israelis “translated” that as
Suicidal                   | difficult to obtain
Impossible                 | unlikely
Difficult to get           | achievable
I will see what I can do   | I got that concession a long time ago
