Reliability and validity of psychological tests

Validity is one of the basic quality criteria for tests and techniques in psychodiagnostics and is closely related to the concept of reliability. It describes how well a technique measures exactly what it is intended to measure; the more faithfully the quality under study is captured, the higher the validity of the technique.

The question of validity arises first while the material is being developed, and again after a test or technique has been applied, when it is necessary to find out whether the measured degree of expression of a personality characteristic corresponds to the method used to measure it.

Validity is expressed as the correlation between the results of a test or technique and other characteristics that are also studied; it can also be supported comprehensively, using several techniques and criteria. Several types of validity are distinguished: conceptual, construct, criterion, and content validity, each with specific methods for establishing it. Sometimes a reliability check is a mandatory requirement when psychodiagnostic methods are in doubt.

For psychological research to have real value, it must be not only valid but also reliable. Reliability allows the experimenter to be confident that the measured value is very close to the true value. Validity is important because it indicates that what is being measured is exactly what the experimenter intends to measure. Note that validity implies reliability, but reliability does not imply validity: reliable values may not be valid, but valid ones must be reliable. This is the essence of successful research and testing.

Reliability of psychological tests

In ordinary life, the reliability of a person or an object means the confidence that you can rely on it. How can one check that a psychological test can be relied upon?

The first way to check the reliability of a psychological test is to analyze the stability of its results. If the results obtained on the same sample do not change significantly over several administrations, this can serve as a criterion of the test's reliability.

Repeated testing is called a retest. It is carried out at intervals ranging from a week to a year, and the correlations between the measurements are then analyzed. If the correlation between the test and retest results is at least 0.76, the test is considered reliable.
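The retest check described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, the function name pearson_r is a hypothetical helper, and the Pearson coefficient is assumed as the correlation measure (the text does not say which coefficient is meant). The threshold of 0.76 is the one quoted in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores of the same 8 subjects on the first test and on the retest.
first_test = [12, 15, 9, 20, 17, 11, 14, 18]
retest = [13, 14, 10, 19, 18, 10, 15, 17]

RELIABILITY_THRESHOLD = 0.76  # value quoted in the text

r = pearson_r(first_test, retest)
print(f"test-retest r = {r:.2f}")
print("reliable" if r >= RELIABILITY_THRESHOLD else "not reliable by this criterion")
```

With real data the two lists must hold the scores of the same subjects in the same order, otherwise the coefficient is meaningless.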

Disadvantages of test-retest reliability of psychological tests.

1. Some psychological indicators are unstable and changeable. For example, by measuring mood and well-being at different times of the day or on different days, you can get different results, and this will not be a consequence of the unreliability of the test.

2. When completing the same test repeatedly, subjects “get used to” it. They can remember their answers and respond the same way. They may, on the contrary, change their answers in the direction of social desirability. Thus, test-retest reliability will not fully reflect the reliability of the test.

The second way to check the reliability of a psychological test is to analyze the consistency of its parts. For example, suppose the test contains one indicator (scale) that is diagnosed by 10 questions. The consistency of this scale is indicated by a high correlation between the answers to each question and the overall score on the scale.
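As a rough illustration of this check, the sketch below correlates each question with the overall scale score. The data are invented and shortened to 5 questions for readability, the function names are hypothetical, and the Pearson coefficient is an assumption. The last question is deliberately scored in the opposite direction so that it stands out as inconsistent.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def item_total_correlations(responses):
    """Correlate every question (column) with the total score of each subject (row)."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson_r([row[i] for row in responses], totals)
        for i in range(n_items)
    ]

# Invented answers: 6 subjects (rows) x 5 questions (columns), rated 1 to 5.
responses = [
    [4, 5, 4, 3, 2],
    [2, 1, 2, 3, 4],
    [5, 4, 5, 4, 1],
    [1, 2, 1, 2, 5],
    [3, 3, 4, 3, 3],
    [4, 4, 3, 5, 2],
]

item_total = item_total_correlations(responses)
for number, value in enumerate(item_total, start=1):
    print(f"question {number}: r with total = {value:+.2f}")
```

A question with a low or negative correlation, like the last one here, is a candidate for rewording, reverse scoring, or removal from the scale.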

Often, to determine the consistency of a psychological test, it is split into two parts. This can be done by alternating questions (odd-numbered versus even-numbered), or by separating the first and second halves of the test. The correlation between the answers to the two parts is then analyzed. The higher the correlation, the higher the test's consistency and reliability.
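Both ways of splitting can be sketched as follows. The data and function names are invented for illustration. The last step applies the Spearman-Brown formula, which is not mentioned in the text; it is added here as a common adjustment, because each half is only half as long as the full test and the raw half-to-half correlation tends to understate the reliability of the whole.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_correlation(responses, method="odd_even"):
    """Correlate the subjects' sums on the two halves of the test."""
    n_items = len(responses[0])
    if method == "odd_even":
        first = [sum(row[0::2]) for row in responses]   # questions 1, 3, 5, ...
        second = [sum(row[1::2]) for row in responses]  # questions 2, 4, 6, ...
    elif method == "first_second":
        half = n_items // 2
        first = [sum(row[:half]) for row in responses]
        second = [sum(row[half:]) for row in responses]
    else:
        raise ValueError("method must be 'odd_even' or 'first_second'")
    return pearson_r(first, second)

def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Invented answers: 6 subjects (rows) x 6 questions (columns), rated 1 to 5.
responses = [
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 3, 2, 2],
    [5, 4, 5, 4, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 4, 3, 5, 4, 3],
]

r_half = split_half_correlation(responses, method="odd_even")
r_full = spearman_brown(r_half)
print(f"odd/even half correlation: {r_half:.2f}")
print(f"Spearman-Brown estimate:   {r_full:.2f}")
```

The odd/even split is usually the safer default, since a first-half versus second-half split can be affected by fatigue or by questions growing harder toward the end.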

So, the reliability of a psychological test is a characteristic of its formal suitability for diagnosing psychological indicators. For example, if a test for diagnosing anxiety is reliable, this means that if you use it on different samples at different times, you will get similar results. But will these results characterize the anxiety of the subjects? The reliability of a psychological test does not guarantee this. Another indicator is responsible for this: the validity of the psychological test.

Psychometric properties of psychodiagnostic methods

The psychometric basis of any technique is its scales. The concept of “scale” is interpreted in a broad and a narrow sense: in the broad sense, a scale is a specific technique; in the narrow sense, it is a measurement scale that records the characteristics being studied. Each element of the technique corresponds to a certain score or index, which expresses the severity of a particular mental phenomenon.

Measuring scales are divided into:

Metric: interval, ratio scales.
Non-metric: nominal (nominative), ordinal.

Scale name	Explanation, examples
Nominal (nominative, scale of names)	Assigns an observed phenomenon to the appropriate class on the basis of a common property or symbol. The nominal scale is the most common in research psychodiagnostic methods. It is used, for example, in test questionnaires, where the subject's denial or affirmation is compared with the answers in the key. A nominal scale may also involve selecting one or more characteristics from those proposed.
Ordinal	Orders characteristics on a “more or less” principle, arranging the results in ascending or descending order. An ordinal scale is used in the color choice test: the subject is asked to choose one of the colored squares on a white background, after which the selected square is put aside and the procedure is repeated. Result: the colors arranged according to their attractiveness to the subject, each with its own serial number.
Interval	Orders elements not only by the severity of the measured characteristic but also by magnitude: equal intervals between the assigned numbers correspond to equal differences in the degree of expression of the characteristic. Interval scales are often used when standardizing primary test scores.
Ratio (scale of relationships)	Arranges elements by numerical value, maintaining proportionality between them. The numbers assigned to classes of objects are proportional to the degree of expression of the property being measured. Used, for example, to determine the sensitivity thresholds of sensory analyzers. Often used in psychophysics.
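The remark about standardizing primary test scores can be illustrated with a short sketch that converts raw scores into z-scores and T-scores. The raw scores are invented, the function name is hypothetical, and two choices are assumptions rather than anything stated in the text: the population standard deviation is used, and the T-score convention (mean 50, standard deviation 10) is picked as one familiar standard scale among several.

```python
from statistics import mean, pstdev

def standardize(raw_scores):
    """Return (z_scores, t_scores) for a list of raw (primary) test scores."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population standard deviation (an assumption)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    z = [(score - m) / sd for score in raw_scores]
    t = [50 + 10 * value for value in z]  # T-scores: mean 50, SD 10
    return z, t

# Invented raw scores of 8 subjects.
raw = [12, 15, 9, 20, 17, 11, 14, 18]

z_scores, t_scores = standardize(raw)
for score, z, t in zip(raw, z_scores, t_scores):
    print(f"raw {score:2d} -> z {z:+.2f} -> T {t:.1f}")
```

After the conversion, equal differences in T-scores stand for equal distances from the group mean, which is what makes scores from different tests comparable on an interval scale.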

After determining the scale used to build the test, it is necessary to evaluate the psychometric properties of the technique.

These include:

Representativeness.
Standardization.
Reliability.
Validity.

Representativeness is a property of a sample of subjects. It describes how well the sample reflects a specific population or the general population. Representativeness has two parameters: qualitative and quantitative. The qualitative parameter characterizes the choice of subjects and the method of constructing the sample.

The quantitative parameter is the sample size expressed in numbers.

In psychological research, this property determines the extent to which results can be generalized. For example, suppose relationships between men and women are being studied. If the sample mixes subjects of very different ages (schoolchildren, students, adults, pensioners), its representativeness will be low.

However, if the subjects are approximately the same age and work in the same field (for example, only schoolchildren, only students, only working adults, or only pensioners, of both sexes), then the sample will be highly representative of that group. In psychodiagnostics, representativeness indicates whether a technique can be applied to the entire population.

Standardization is a simplification of the technique that brings the parts of a psychodiagnostic method (PDM) and the procedures for applying it to uniform standards. A PDM should be universal and applicable by different specialists in different situations. If the structure of the PDM deviates from the standards, its results will not be comparable with the results of other studies. Non-standardized methods are used mainly for scientific research.

With their help, new mental phenomena are studied, but such methods cannot be used for psychodiagnostic purposes. Another important parameter of the PDM is reliability. It characterizes the accuracy, stability, and consistency of the results obtained using a specific technique.

The high reliability of a technique minimizes the influence of extraneous factors and brings the experiment significantly closer to a “pure” one. Reliability and validity are different concepts. Moreover, reliability is the broader of the two (reliability > validity): a valid technique must be reliable, but a reliable technique is not necessarily valid.

For example, on a day off a person can spend time either fishing or hunting. If he decides to go hunting but takes a fishing rod with him, then his choice of tool is not valid. If, however, he goes hunting with a gun and it misfires, then the chosen tool is unreliable.

Validity of psychological tests

The validity of psychological tests reflects the correspondence of their results to the essence of the measured psychological phenomena: for example, the extent to which the result of an aggressiveness test reflects the respondent's real level of aggressiveness.

There are two main ways to determine the validity of psychological tests.

The first way to determine the validity of a psychological test involves correlating the test results with similar indicators of other tests. For example, to check the validity of a self-esteem test, you can do the following:

conduct testing of subjects using a new test;
identify the self-esteem of subjects on another test (assuming that it is valid);
calculate the correlation of self-esteem indicators using two psychodiagnostic methods;
a statistically significant correlation will give grounds to talk about the validity of the new test.
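The four steps above can be sketched as follows. The scores are invented and the function names are hypothetical. To stay within the Python standard library, statistical significance is estimated here with a permutation test, which is a substitute chosen for this sketch; the text does not say which significance test is meant, and a t-test on the correlation coefficient is the more conventional choice.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value: how often shuffled data correlate as strongly as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Invented self-esteem scores of the same 10 subjects on the two tests.
new_test = [22, 30, 18, 35, 27, 15, 32, 25, 20, 29]
reference_test = [24, 28, 17, 36, 25, 18, 30, 27, 19, 31]

r = pearson_r(new_test, reference_test)
p = permutation_p_value(new_test, reference_test)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
print("correlation is significant at 0.05" if p < 0.05 else "correlation is not significant")
```

A significant correlation supports the new test only to the extent that the reference test is itself valid, which is the assumption made in the second step.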

This method establishes what is known as construct validity (more specifically, its convergent aspect): the correspondence of the measured psychological indicator to the psychological construct.

The second way to determine the validity of a psychological test involves correlating the test results with external criteria. This is called the criterion validity of a psychological test.

For example, an indicator of the criterion validity of a test of propensity for deviant behavior can be the actual number of offenses of a teenager. In relation to the test of achievement motivation, the indicator of criterion validity can be the success of performing a particular activity.
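The first example can be sketched by correlating test scores with the recorded number of offenses. All data and function names are invented for illustration. Spearman's rank correlation is used here as an assumption, since offense counts are small whole numbers with ties; the text does not prescribe a particular coefficient.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented data for 8 teenagers: test score and recorded number of offenses.
propensity_scores = [35, 12, 28, 40, 18, 22, 31, 15]
offense_counts = [4, 0, 2, 5, 1, 1, 3, 0]

rho = spearman_rho(propensity_scores, offense_counts)
print(f"criterion validity (Spearman rho) = {rho:.2f}")
```

The same function could be applied to the second example by replacing the offense counts with a measure of success in the relevant activity.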

What is the validity of the methodology

A technique (methodology), in contrast to a method, is a set of specific actions of a specialist aimed at achieving a particular result. A research method may include several techniques. For example, the survey method in the classification of B. G. Ananyev can be carried out using different test questionnaires.

Validity in psychology is the correspondence of a psychodiagnostic method as a whole, and of its individual parts, to the mental characteristic being studied.

The PDM may include several scales. For example, a test questionnaire that determines the level of neuroticism and psychopathization consists of the following scales: psychopathization, neuroticism, and the “lie” scale. The third scale is used to test the sincerity of the subject. The most common reason for lying is the desire for approval. This factor greatly distorts statistical and individual data.

A valid PDM is a technique that diagnoses only the narrow range of characteristics specified by the experimenter. It enjoys great confidence among specialists and is used in scientific research. The higher the validity coefficient, the more trustworthy the data obtained during the experiment.

The relationship between the reliability and validity of psychological tests

The reliability of a test reflects its quality as a diagnostic method in terms of formal indicators, without taking into account a content analysis of the results.

Validity evaluates the content of the test results: the extent to which they correspond to real psychological phenomena.

A reliable test may not be valid. For example, a test of initiative may show high test-retest reliability and high consistency between its parts. However, from a content point of view, the test results reflect not so much initiative as willpower. That is, the reliability of this test is high, but its validity is low.

In the practice of psychological testing, the reliability of tests is typically checked using a retest. The validity of psychological tests is typically tested by analyzing relationships with scores on other tests that measure the same or similar psychological indicators.

Threats

Validity is a property of a high-quality technique, but factors may arise that distort the results of even a theoretically well-constructed PDM. Side factors are more pronounced when working with poorly structured stimuli or with tasks that are new or unclear to the subject.

Studying emotionally unstable and insecure individuals is particularly difficult. The main threats to high validity are the individual characteristics of the test taker and situational factors.

The reliability of the results is reduced by:

test subject's errors;
specialist errors;
errors caused by the testing conditions or by an incorrect diagnostic procedure.

If the diagnostic procedure does not require a specialist to be in the room, then his presence may distort the results of the study. The specialist's comments on and interpretations of the test tasks also reduce the reliability of the data obtained.

A subject who has an interest in making deliberate errors, or in presenting himself in a favorable light to management, distorts the diagnostic results. No less dangerous is the psychophysiological state of the person being tested: for example, the individual may be very hungry, tired, or suffering from a migraine.

Extraneous noise, voices, and the opportunity to discuss test tasks with other subjects reduce the accuracy of the results. These are errors of diagnostic conditions and procedure.