Back to fundamental research
Before our adventure at the University of Amsterdam, Willem had already noticed that people express themselves in different ways. Because of all the problems at the UvA and the work for the SRF, he could not do much about this issue. Now that he was free again, he thought that the most important problem to study was what he called "Variation in Response Functions" (VRF). Convinced by his research with Ton Veugen, Leo van Doorn and Peter Neijens of the better quality of data collected with continuous scales using psychophysical scaling, he concentrated the research on VRF using continuous scales, although the same problem was also observed with categorical scales.
Correcting or preventing VRF?
The problem of VRF was also detected by other people. It was usually tackled by dividing the answers by the individual standard deviations and/or by subtracting the individual mean from the responses in order to reduce the variation. However, Emiel Bon had shown in his master's thesis, theoretically and empirically, that the first procedure leads to correlations close to 1 between items on the same scale across persons, while the second procedure leads to correlations close to zero for the same data. These results suggested that, once the data were collected, correction was not possible in these simple ways. Therefore, theoretical research and experiments were needed to prevent VRF.
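Bon's derivation is not reproduced here, but the effect of the two "corrections" can be illustrated with a minimal simulation sketch. It assumes a simple linear response-function model in which each respondent applies a personal origin and unit to true opinions that are independent across items; the model, the variable names and all parameter values are illustrative assumptions, not the original study design.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4000, 8  # respondents, items on one scale (illustrative sizes)

# Assumed linear response-function model: person i reports
#   a_i + b_i * (true opinion) + noise,
# with widely varying origins a_i and units b_i (the VRF).
a = rng.normal(0.0, 5.0, size=(n, 1))    # person-specific origin
b = rng.uniform(0.5, 2.0, size=(n, 1))   # person-specific unit
f = rng.normal(0.0, 1.0, size=(n, m))    # true opinions, independent across items
y = a + b * f + rng.normal(0.0, 0.3, size=(n, m))  # observed responses

def mean_pairwise_corr(x):
    """Average correlation between item columns, computed across persons."""
    c = np.corrcoef(x, rowvar=False)
    return c[np.triu_indices_from(c, k=1)].mean()

# "Correction" 1: divide each person's answers by their own standard deviation.
y_std = y / y.std(axis=1, ddof=1, keepdims=True)
# "Correction" 2: subtract each person's own mean.
y_ctr = y - y.mean(axis=1, keepdims=True)

# The true opinions are independent (true correlation 0), yet:
print(mean_pairwise_corr(y_std))  # close to 1
print(mean_pairwise_corr(y_ctr))  # close to 0
```

Under these assumptions both results reflect the correction, not the data: dividing by the individual standard deviation leaves every item sharing the same large person-specific term, pushing all item correlations toward 1, while subtracting the individual mean removes the person-level signal entirely, collapsing the correlations toward zero.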
Preventing VRF
An obvious suggestion is to restrict the freedom of the respondents in their choice of the response scale. To show that this indeed improves the situation, Willem designed new experiments with two new students, Bas van den Putte and Kees Maas, who had followed Willem's course on measurement and liked this approach, and with the statistician of the department, Harry Seip. In this experiment the respondents were confronted with different instructions: complete freedom (instruction 1), one reference point on the scale (instruction 2) and two reference points (instruction 3). In the last case the scale is fixed completely for all respondents if their response function is linear. To avoid the complication that people have different opinions, we used as stimuli lines whose lengths varied from 0 to 70 mm. The three instructions are presented in the pictures below.
The consequences of these instructions can be seen in the graphs below, which present the response functions of the different respondents under each instruction when evaluating the lengths of the lines.
Under instruction 1 a line of 70 mm is evaluated very differently: most responses vary between 0 and 10, but there are also responses as large as 300, 400 and even 1000. On the other hand, very small lines were evaluated with numbers close to zero, which seems logical for this experiment. In fact, the larger the stimulus (the length of the line), the larger the differences.
Note that people perceive the differences between the lengths of the lines very well. This was clear from the correlations close to 1 between the responses and the stimuli for all respondents. So the large differences in responses are not measurement error but are due to variation in response functions.
Using the second instruction, with one reference point, the differences in responses for the longest line are already much smaller, but there are still two people who give very different responses. This can still happen if one does not assume that a very small line should get a score close to zero.
With an explicit instruction to give the smallest line a score of 0 and the largest one a score of 900, the VRF problem has nearly completely disappeared. This seemed a promising result. The question remained how this approach could be translated to real social science measurement.
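Why two reference points are enough can be made explicit with a small calculation, assuming (as stated above) that each respondent's response function is linear over the stimulus range of 0 to 70 mm; the notation R_i is ours, introduced for illustration:

```latex
% Respondent i's assumed linear response function:
R_i(x) = a_i + b_i x
% The two reference points fix both parameters identically for everyone:
R_i(0) = 0 \;\Rightarrow\; a_i = 0, \qquad
R_i(70) = 900 \;\Rightarrow\; b_i = \tfrac{900}{70}
```

Hence every linear respondent ends up with the same function R_i(x) = (900/70)x, so the variation in response functions disappears up to measurement error, which is what the graphs for this instruction show.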
Application in social science research
In order to study this topic further on social science data, Willem got the support of Juan Manuel Batista, one of his students at the Summer School in Essex. We chose "job satisfaction" as the research topic and created items on the basis of four aspects of a job: job security, salary, the work atmosphere and the fit with one's education. For each aspect we created a positive and a negative statement. In this way 16 combinations were made, like the one below:
The formulation of the questions was similar to the instructions presented above, with no reference points, one reference point or two reference points, using line responses and numeric responses. The data were collected in the small random sample of the SRF using the Telepanel procedure, in three different weeks. Each week a different procedure was presented, and the order of the procedures varied across people. For the analysis we could only use people who did not change their opinion, so this was tested and only those people were selected. Next, the relationship between their opinions about these work situations and their answers under the six different methods was estimated. This means that for each individual respondent the response functions under the different conditions were determined. The expectation was that the variation would decrease with the number of reference points. It was also interesting to see the difference between line responses and numeric responses. The results were indeed very clear, as can be seen in the table below.
This table clearly shows the effect of the response instructions with respect to the number of reference points. For both response scale types, numbers and lines, one sees more or less the same reduction in the variation of the response functions as the number of reference points increases. It was clear that one should use the measurement procedure with two reference points.
A last experiment
Finally, we had to decide how the reference points should be chosen. In social science research using category scales, people use scales like
“very bad, bad, neither bad nor good, good, very good”
or
“strongly agree, agree, neither agree nor disagree, disagree, strongly disagree”
However, what counts as "very bad" and "very good", or as "strongly agree" and "strongly disagree", is not at all clear. Therefore, different interpretations of these terms can lead to differences in response functions.
We thought that this difference of opinion should be eliminated. The solution could be to use so-called "fixed reference points". Instead of "very good" and "very bad" we suggested "extremely good" and "extremely bad". For "strongly agree" and "strongly disagree" we suggested "completely agree" and "completely disagree". These are fixed reference points because there is no doubt that these terms indicate the endpoints of the scales. In fact, "neither good nor bad" is also a fixed reference point because it is the midpoint of the scale.
Willem did this experiment together with another new student, Kees de Rooy. In this experiment, using line and number responses with two reference points, the hypothesis mentioned above was confirmed: the variation in the response functions using fixed reference points was much smaller than with non-fixed reference terms.
Two big mistakes
With his students Willem had done many experiments and had also derived statistical models that showed why there was a problem. Then we adjusted the measurement procedure, using two reference points, and showed that the problem was largely solved; the models could also explain why this was so. We were very proud of our work and wanted to publish it in its completeness. The reader with all the experiments was published by the SRF in 1986. We were very proud of the book, but it was a big mistake: the book has so far received only 81 citations according to Google Scholar.
Willem also published a journal article on the same topic, entitled "A measurement model for psychophysical scaling", in a European journal (1988). This was also a big mistake, because the paper has obtained only 12 citations. We should have published our results in an international journal, and not under the label of psychophysical scaling, which social scientists do not recognize as relevant for them.
Our research deserved much more attention, because VRF is a fundamental problem in survey research and we provided sufficient evidence that the problem can be reduced. The importance of this topic became clear when King et al. published a paper in 2004 entitled "Enhancing the validity and cross-cultural comparability of measurement in survey research" in the American Political Science Review. That paper has so far received 958 citations, yet the authors did not mention our work, even though we had discussed the same problem and evaluated partially similar items. They concentrated on categorical scales.
Fortunately, our approach of continuous scales with two fixed reference points has frequently been used in the European Social Survey, and several experiments have shown that these scales have much better quality than categorical scales and than continuous scales without fixed reference points.