“Psychometrics” is the field of study concerned with psychological measurement, which is carried out through testing. There are various types of psychometric tests, but most are objective tests designed to measure educational achievement, knowledge, attitudes, or personality traits. Beyond the tests themselves, psychometrics also covers statistical research on the quantities that those tests attempt to measure.
Psychological measurement rests on two classical concepts: validity and reliability. A valid measure is one that measures what it is intended to measure, while a reliable measure is one that yields consistent results. Both properties can be quantified, most often with correlation coefficients.
For example, consistency can be measured with the Pearson product-moment correlation coefficient. Validity, on the other hand, is evaluated through several kinds of evidence, including concurrent validity, predictive validity, and construct validity.
Both reliability and validity may be assessed mathematically. Internal consistency may be assessed by correlating performance on two halves of a test (split-half reliability); the resulting Pearson product-moment correlation coefficient is then adjusted with the Spearman-Brown prediction formula so that it corresponds to the correlation between two full-length tests. Other approaches include the intra-class correlation (the proportion of the total variance in the measurements that is attributable to differences between targets). A commonly used measure is Cronbach’s α, which, under certain assumptions, equals the mean of all possible split-half coefficients. Stability over repeated measurements is assessed with the Pearson coefficient, as is the equivalence of different versions of the same measure (different forms of an intelligence test, for example). Other measures are also used.
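The reliability calculations described above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names, the odd/even split, and the small score matrix are all assumptions made for the example, and the Cronbach’s α shown is the usual variance-based formula computed with sample variances.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def spearman_brown(r, factor=2.0):
    """Predicted reliability when test length is multiplied by `factor`."""
    return factor * r / (1.0 + (factor - 1.0) * r)


def split_half_reliability(scores):
    """Split-half reliability using an odd/even item split (an assumption;
    other splits are possible), stepped up to full test length.

    `scores` holds one row per test-taker, one item score per column.
    """
    odd_half = [sum(row[0::2]) for row in scores]
    even_half = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson_r(odd_half, even_half))


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical data: 5 test-takers, 4 items scored 0 (wrong) or 1 (right).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("split-half (Spearman-Brown):", round(split_half_reliability(scores), 3))
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
```

With real data the two figures will generally differ, since a single split is only one of the many possible splits that α summarizes.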
There are a number of different forms of validity. Criterion-related validity refers to the extent to which a test or scale predicts a sample of behavior, i.e., the criterion, that is “external to the measuring instrument itself.” That external sample of behavior can be many things: another test; college grade point average, as when the high school SAT is used to predict performance in college; or even behavior that occurred in the past, as when a test of current psychological symptoms is used to predict the occurrence of past victimization (strictly speaking, this is postdiction rather than prediction). When the criterion measure is collected at the same time as the measure being validated, the goal is to establish concurrent validity; when the criterion is collected later, the goal is to establish predictive validity. A measure has construct validity if it is related to measures of other constructs as required by theory. Content validity is a demonstration that the items of a test do an adequate job of covering the domain being measured. In personnel selection, for example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a job analysis.
Item response theory (IRT) models the relationship between latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location.
For example, under item response theory a university student’s knowledge of history can be estimated from his or her score on a university test and then compared meaningfully with a high school student’s knowledge estimated from a less difficult test, provided the items of both tests have been placed on a common scale. Scores derived from classical test theory do not have this characteristic: actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a norm group randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
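To make the idea of locating a test-taker on a latent trait more concrete, here is a minimal sketch in Python. It assumes a two-parameter logistic (2PL) model, which is only one of several item response models and is not named in the text above. The item parameters (discrimination a, difficulty b), the response patterns, and all function names are hypothetical, and the maximum-likelihood estimate is found by a simple grid search rather than the iterative methods used in practice. The standard error is taken as the reciprocal square root of the test information at the estimate.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at trait level `theta`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def total_information(theta, items):
    """Sum of item informations a^2 * P * (1 - P) at `theta`."""
    total = 0.0
    for a, b in items:
        p = p_correct(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at `theta`."""
    ll = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll


def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta and its standard error.

    All-correct or all-incorrect patterns have no finite maximum, so the
    estimate then lands on the edge of the grid.
    """
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    theta = max(grid, key=lambda t: log_likelihood(t, items, responses))
    se = 1.0 / math.sqrt(total_information(theta, items))
    return theta, se


# Hypothetical (a, b) parameters, assumed to be calibrated on one common scale.
easy_test = [(1.0, -2.0), (1.2, -1.5), (0.8, -1.0), (1.0, -0.5)]
hard_test = [(1.0, 0.5), (1.2, 1.0), (0.8, 1.5), (1.0, 2.0)]

print("easy test:", estimate_theta(easy_test, [1, 1, 0, 1]))
print("hard test:", estimate_theta(hard_test, [1, 0, 1, 0]))
```

Because both sets of item parameters are assumed to sit on the same scale, the two estimates are directly comparable even though the tests differ in difficulty. With so few items the standard errors are large, which is the kind of information a raw score does not provide.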
For some, the field of psychometrics has controversial aspects relating to the human implications of applied measurement. In part, the controversy concerns the very notion of standardized tests. For others, the problem lies in the history of the field, which is entangled with eugenics. Many psychometricians are also concerned with finding and eliminating bias in their psychological tests. Test bias is a form of systematic (i.e., non-random) error that gives examinees from one demographic group an unwarranted advantage over examinees from another demographic group. Test bias may cause differences in average scores across demographic groups, but such differences are not sufficient evidence that bias is actually present, because the test could be measuring real differences among groups. Psychometricians use statistical methods to search for test bias and eliminate it. Research shows that it is usually impossible for people reading a test item to accurately determine whether it is biased or not.
All of psychology depends on solid research and correctly interpreted results. Originally, psychology was considered a quasi-science because of its over-reliance on theory and subjective observation. Modern psychology, however, relies heavily on statistical data and research to support its theories.
Cognitive psychologists use psychological measurements to assess biological and cognitive processes. Behavioral psychologists, on the other hand, rely on psychological measurements to quantify human behavior. Psychologists who specialize in abnormal psychology use psychological measurements to assess their patients, understand mental disorders, establish diagnostic guidelines, and screen new patients. Finally, quantitative psychologists primarily apply statistical methods to psychological measurement across different research areas.