1. REPORT DATE: Dec 2005
2. REPORT TYPE: Technical Paper
4. TITLE AND SUBTITLE: Tests of Cognitive Ability
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Air Force Materiel Command, Air Force Research Laboratory, Human Effectiveness Directorate, Warfighter Interface Division, Wright-Patterson AFB, OH 45433-7022
10. SPONSOR/MONITOR'S ACRONYM(S): AFRL/HECV
12. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
13. SUPPLEMENTARY NOTES: This is a book chapter. Clearance No. AFRL/WS-05-2719, 1 Dec 05.
14. ABSTRACT: This chapter consists of six parts. Part one briefly reviews the historical foundation of the concept of cognitive ability and early attempts to measure it. Part two reviews modern theories of the structure of cognitive ability and the emergence of the concept of general cognitive ability. Next, part three introduces the concepts of specific abilities, knowledge, and noncognitive traits. Part four discusses psychometric characteristics of tests, including reliability and validity. Part five reviews the issues to be considered when deciding whether to choose from among commercially available tests or develop a test. Example questions to help in test construction are provided. The sixth and final part is a general summary.
15. SUBJECT TERMS: Cognitive ability, knowledge
TESTS OF COGNITIVE ABILITY

Malcolm James Ree, Our Lady of the Lake University, San Antonio, Texas
Thomas R. Carretta, Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio

This chapter consists of six parts. Part one briefly reviews the historical foundation of the concept of cognitive ability and early attempts to measure it. Part two reviews modern theories of the structure of cognitive ability and the emergence of the concept of general cognitive ability. Next, part three introduces the concepts of specific abilities, knowledge, and noncognitive traits. Part four discusses psychometric characteristics of tests, including reliability and validity. Part five reviews the issues to be considered when deciding whether to choose from among commercially available tests or develop a test. Example questions to help in test construction are provided. The sixth and final part is a general summary.

HISTORICAL FOUNDATIONS

The concept of cognitive ability can be traced back over 2,500 years. Zhang (1988) reported that in the sixth century BC, the great Chinese philosopher Confucius divided people into three groups based on intelligence: people of great wisdom, people of average intelligence, and people of little intelligence. Another Chinese philosopher, Mencius (fourth century BC), likened intellectual measurement to measurement of physical properties. Within a century, the Han dynasty (202 BC to 200 AD) had heeded Confucius and Mencius and implemented a system of civil service tests in China.

In the fourth century BC, Aristotle made a distinction between ability (dianoia) and emotional and moral capacity (orexis). Zhang (1988) reported on the custom of testing children at one year of age, beginning in the sixth century AD in China, particularly in southern China. This was described in the writings of Yen (531-590 AD). Zhang (1988) also noted that the use of puzzles to test cognitive ability was popularized during the Song dynasty (960-1127 AD). One example consisted of several geometric shapes that could be manipulated and fit into a variety of designs. The test was designed to measure creativity, divergent thinking, and visual-spatial perception. Another popular Chinese puzzle test, designed to measure reasoning ability, consisted of interconnected copper rings mounted on a bar with a rod running through their center. The goal of the test was to remove the bar from the center of the rings.

In the West, the examination of human cognitive abilities was taken up by religious philosophers. In the 16th century AD, Descartes, the French secular philosopher, regarded ability as res cogitans, the thing that thinks.

In 1575, Juan Huarte published in Spanish (Peiró & Munduate, 1994) a treatise on work and human ability called Examen de Ingenios. It was later published in English as The examination of men's wits: Discovering the great differences of wits among men and what sort of learning suits best with each.

The modern scientific study of human cognitive abilities, however, is often attributed to Binet in France and to the World War I Army Alpha and Beta tests in the United States.

GENERAL COGNITIVE ABILITY

The English polymath Sir Francis Galton (1869) invented the construct of general cognitive ability, calling it g as shorthand. Charles Spearman (1927, 1930) made the concept of g more accessible to psychology through his two-factor theory of human abilities, which proposed that every measure of ability had two components: a general component (g) and a specific component (s).
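
The two-factor idea can be written compactly. The notation below is only an illustrative sketch and is not taken from Spearman or from this chapter: x_{ij} is assumed to be person i's score on test j, a_j is an assumed loading of test j on g, and g and s are assumed to be uncorrelated.

```latex
x_{ij} = a_j\, g_i + s_{ij}, \qquad
\operatorname{Var}(x_j) = a_j^{2}\,\operatorname{Var}(g) + \operatorname{Var}(s_j)
```

Under these assumptions, the term a_j^2 Var(g) is the part of a test's variance attributable to the general component, and the remainder belongs to the component specific to that test.
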
While the general component was measured by every test, the specific component was unique to each test. Though each test might have a different specific component, Spearman also observed that s could be found in common across a limited number of tests. Therefore, the two-factor theory allowed for a spatial factor or other factor that was distinct from g but could be found in several tests. These factors shared by tests were called group factors. Spearman (1927) identified several group factors and noted (Spearman, 1937) that group factors could be either narrow or broad. He further observed that s could not be measured without measuring g. As we have written elsewhere (Ree & Carretta, 1996, 1998):

    To be accurate, we should call mathematics not M but [...], with g written large to indicate its contribution to the variance of the factor. (Ree & Carretta, 1996, p. 113)

In fact, tests that do not even appear to measure g do so, as illustrated by Rabbitt, Banerji, and Szymanski (1989), who demonstrated a strong correlation (.69) between "Space Fortress," a psychomotor task that looks like a video game, and an IQ test.

Controversy about g has not abated despite Spearman's early assertion (1930) that g was beyond dispute. In contrast to Spearman's model, Thurstone (1938) proposed a multiple ability theory. Thurstone allowed no general factor, only seven unrelated abilities that he called primary. Spearman (1938) reanalyzed Thurstone's data, noting that g had been submerged through rotation. He then demonstrated the existence of g in Thurstone's tests. This finding was independently confirmed by Holzinger and Harmon (1938) and finally by Thurstone and Thurstone (1941). Despite empirical evidence, theories of multiple abilities held sway (Fleishman & Quaintance, 1984; Gardner, 1983; Guilford, 1956, 1959; Sternberg, 1985). This was particularly true in psychometrics, where these theories led to the construction of numerous multiple ability tests such as the Differential Aptitude Test, General Aptitude Test Battery, Armed Services Vocational Aptitude Battery, Air Force Officer Qualifying Test, Flanagan Aptitude Tests, Flanagan Industrial Tests, and others. Cleaving to the empirical data, other researchers continued to study g (Arvey, 1986; Gottfredson, 1986, 1997; Gustafsson, 1980, 1984, 1988; Jensen, 1980, 1993, 1998; Schmidt & Hunter, 1998, 2004; Thorndike, 1986; Vernon, 1950, 1969).

Fairness and Similarity: Near Identity of Cognitive Structure

There are several issues that must be addressed when measuring ability in sex and ethnic groups. One of these is that the same factors should be measured for all groups. McArdle (1996), among others, has advocated that factorial invariance (i.e., equality of factor loadings) should be demonstrated before other group comparisons (e.g., mean differences) are considered. McArdle stated that if factorial invariance is not observed, the psychometric constructs being measured may be qualitatively different for the groups being compared, obscuring the interpretation of other group comparisons.

Several studies of cognitive factor similarity have been conducted.
Comparing the factor structure of World War II U.S. Army pilot selection tests for Blacks and Whites, Michael (1949) found virtually no differences. Humphreys and Taber (1973) also found no differences when they compared factor structures for high and low socioeconomic status boys from Project Talent. Although the ethnicity of the participants in Project Talent was not specifically identified, they expected that the ethnic composition of the two groups would differ significantly. Using 15 cognitive tests, DeFries, Vandenberg, McClearn, Kuse, Wilson, Ashton, and Johnson (1974) compared the structure of ability for Hawaiians of either European or Japanese ancestry. They found the same four factors and nearly identical factor loadings for the two groups.

These studies all examined common factors. Using a hierarchical model, Ree and Carretta (1995) examined the comparative structure of ability across sex and ethnic groups. They observed only small differences on the verbal/math and speed factors. No significant differences were found for g on ability measures.

Carretta and Ree (1995) made comparisons of aptitude factor structures in large samples of young Americans. The factor model was hierarchical, including g and five lower-order factors representing verbal, math, spatial, aircrew knowledge, and perceptual speed. The model showed good fit and little difference for both sexes and all five ethnic groups (White, Black, Hispanic, Asian American, and Native American). Correlations between factor loadings for the sex groups and for all pairs of ethnic groups were very high, approaching r = 1.0. Comparisons of regression equations between pairs of groups indicated that there was no mean difference in loadings between males and females or among the ethnic groups. These and previous findings present a consistent picture of near identity of cognitive structure for sex and ethnic groups.
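
As a rough illustration of what "correlations between factor loadings" means in practice, the following minimal sketch compares two sets of loadings on the same tests. The loading values are hypothetical and are not data from any study cited here. Besides the Pearson correlation mentioned above, the sketch also computes Tucker's congruence coefficient, an additional index that is not used in the text and is included only as an assumption about what an analyst might also report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of loadings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def tucker_congruence(x, y):
    """Tucker's congruence coefficient (not mean-centered, unlike Pearson r)."""
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Hypothetical g loadings of the same five tests in two groups (illustration only).
group_a = [0.82, 0.75, 0.68, 0.71, 0.60]
group_b = [0.80, 0.77, 0.66, 0.73, 0.58]

print("Pearson r between loadings:", round(pearson_r(group_a, group_b), 3))
print("Tucker congruence:", round(tucker_congruence(group_a, group_b), 3))
```

Values of either index near 1.0 would be read as the two groups showing a similar pattern of loadings. Neither index alone is a formal test of factorial invariance, which is usually assessed with multi-group factor models.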
Predictive Fairness

Several researchers have conducted studies of predictive fairness of cognitive ability tests. Jensen (1980) noted that numerous large-scale studies provided no evidence for predictive unfairness. He concluded that predictive bias did not exist, although intercept differences could be observed and were likely due to sampling error or differences in reliability for the two groups (p. 514). Putting a finer point on it, Carretta (1997) demonstrated that even when intercept differences were observed in statistical tests of differences of regression equations for two groups, the differences were due solely to differing reliability found in the two groups.

Hunter and Schmidt (1979) investigated 39 studies of Black-White validity and found no evidence of differential prediction for the groups. Schmidt and Hunter (1982) illuminated pitfalls in assessing the fairness of regressions using tests of differences in regression linear models. In these two studies, Hunter and Schmidt concluded that artifacts accounted for the apparent differential prediction and that no predictive bias was present. Carretta (1997) and Jensen (1980) provide clear statistical explanations of the issues.

In sum, no evidence exists that cognitive ability tests are unfair.

SPECIFIC ABILITY, KNOWLEDGE, AND NONCOGNITIVE TRAITS

The measurement of specific abilities, knowledge, and noncognitive traits often has been proposed as crucial for understanding human characteristics and occupational performance. Ree and Earles (1991) have demonstrated the lack of predictiveness for specific abilities, while Ree and others (Olea & Ree, 1994; Ree, Carretta, & Doub, 1998/1999; Ree, Carretta, & Teachout, 1995; Ree, Earles, & Teachout, 1994) demonstrated the predictiveness of job knowledge.

McClelland (1993), for example, suggested that under some circumstances noncognitive traits such as motivation may be better predictors of job performance than cognitive abilities. Sternberg and Wagner (1993) proposed the use of measures of tacit knowledge and practical intelligence in lieu of measures of
academic intelligence. They define tacit knowledge as "the practical know-how one needs for success on the job" (p. 2). Practical intelligence is defined as a more general form of tacit knowledge. Schmidt and Hunter (1993), in a review of Sternberg and Wagner, note that their concepts of tacit knowledge and practical intelligence are redundant with the well-established construct of job knowledge. Additionally, Ree and Earles (1993) pointed out the lack of rigorous empirical evidence to uphold the assertions of McClelland, Sternberg and Wagner, as well as other critics.

The construct of Emotional Intelligence (Goleman, 1995) has been proposed as another facet that is more important than ordinary cognitive ability. Although its proponents (e.g., Mayer, Salovey, & Caruso, 2002) consider it to be a distinct construct, Schulte, Ree, and Carretta (2004) have demonstrated that it is not much more than a combination of the existing constructs of cognitive ability and personality.

PSYCHOMETRIC CHARACTERISTICS OF MEASURES OF COGNITIVE ABILITY

Courses in statistics and research methods are common for human
resources personnel specialists, and there are established guidelines for conducting studies of personnel measurement and selection (American Psychological Association, American Educational Research Association, & National Council on Measurement in Education, 1999; Society for Industrial and Organizational Psychology, 2003). Reliability and validity are two core concepts that must be considered whether choosing a commercial test or developing a test.

Reliability

Reliability is best defined as precision of measurement, that is, how much of the measurement is true and how much is error. In this statistical context, "error" does not mean wrong, but random fluctuation. An error has not been committed; rather, random fluctuation happens perforce and cannot be avoided, although it can be minimized. From this basic definition flow the other popular definitions of reliability, such as stability over time and consistency across test forms, as well as internal consistency. Stability over time typically is measured by retesting people after a period of time to ensure that their scores are consistent (i.e., test-retest reliability). Stability across test forms measuring the same construct(s) is referred to as alternate form reliability. Internal consistency is measured by assessing the extent to which items are correlated with each other (e.g., correlating odd items with even items, or split-half reliability, or coefficient alpha). All three of these indices of reliability are typically measured using correlations or approximations to correlations. Although correlations usually range from -1.0 to 1.0, a reliability coefficient is a ratio of true variance to total variance.

Two widely used cognitive ability tests are the Wonderlic Personnel Test and the Watson-Glaser Critical Thinking Appraisal. According to research cited in the Wonderlic Personnel Test & Scholastic Level Exam User's Manual, the test-retest reliability ranges from .82 to .94, alternate form reliability ranges from .73 to .95, and split-half reliabilities range from .88 to .94. Similarly high levels of reliability are noted in the Watson-Glaser Critical Thinking Appraisal Manual, Form S. Test-retest reliability was .81 for a sample of 42 employees, and internal consistency reliabilities ranged from .66 to .87 in a wide variety of jobs. The data from these two well-known and frequently used tests show that cognitive ability is a reliably measured construct.

For a test to be reliable, there must also be consistent administration, consistent collection of answers, and objective scoring. Test administration procedures must not vary from examinee to examinee, and the data collection methods must be consistent. For example, Ree and Wegner (1990) showed that apparently minor changes in machine-scored answer sheets could produce major changes in test scores, particularly in speeded tests. This issue looms larger as we consider placing our test for the selection of applicants on a computer, where the presentation could vary by screen size, contrast, and font type. Additionally, when different administration modes or response collection are necessary, it is essential to develop statistical corrections for the scores (Carretta & Ree, 1993). The use of tests of poor reliability to make decisions about excluding applicants, especially applicants near the minimum cutting point, from a training program is bad practice and may lead to indefensible consequences in court should a legal challenge arise.

Scoring must be objective. A correct answer must be counted correct by all scorers. To deviate from this will cause scores to vary by who did the scoring and will reduce reliability of the test, leading to reduced validity and possibly an indefensible position in court. This is less of a problem for a multiple-choice test, where the answer is presented and must be identified from among answers presented. It is more of a problem for an essay-type exam, where the answer must be produced and evaluated.

General cognitive ability can be reliably measured through several
methods. Because it is the greatest source of variance in cognitive tests, it is relatively easy to get acceptable reliability by careful item construction and by adding items. However, as Thompson (2003) has pointed out, the reliability to be considered is the reliability in the sample currently being investigated, not that from previous test administrations or the normative sample.

    It is important to evaluate score reliability in all [emphasis in original] studies, because it is the reliability of the data in hand that will drive study results, and not the reliability of the scores described in the test manual. (Thompson, 2003, p. 5)

Validity

The important question about validity is whether a test measures what it claims to measure. Although it is convenient to distinguish several types of validity, the argument can be made that all validity studies are really construct validity studies. If the test can be shown to be valid, it is shown to be measuring the construct, and therefore construct validity is bolstered.

However, a caveat must be offered here. A measure can have predictive validity where it is assumed that it measures a certain construct but in fact measures a different construct. For example, Walters, Miller, and Ree (1993), in a validation of a structured pilot candidate selection interview, reported validity for

