## Measurement and Statistics

## Iowa Testing Programs Research 1966 - 2011

### Occasional Papers (1973-2011)

1) Coffman, W.E. (1973). Moratorium? What kind? (ERIC document #099408)

2) Coffman, W.E. & Mathews, W.M. (1973). Narrative reports and the Iowa Tests of Basic Skills.

3) Forsyth, R.A. & Feldt, L.S. (1973). On the validity of the I.T.E.D. as an aid in program evaluation. (ERIC document #142558)

4) Gulliksen, H.O. (1973). Applications of psychological scaling methods.

5) Lee, L.P. & Coffman, W.E. (1974). A study of the "I don't know" response in multiple-choice tests. (ERIC document #141371)

6) Novick, M.R. (1975). A course in Bayesian statistics. OUT-OF-PRINT (subsequently published in American Statistician, vol. 29, no. 2).

7) Hood, J., Leslie, L. & Kendall, J.R. (1974). Instructional effectiveness in primary grade reading: A pilot study.

8) Petersen, N.S. & Novick, M.R. (1976). An evaluation of some models for test bias. OUT-OF-PRINT (subsequently published in Journal of Educational Measurement, vol. 13, no. 1).

9) Kendall, J.R. & Hood, J. (1974). Instructional practices and easy-to-change surrounding conditions variables in effective primary reading programs. (ERIC document #123557).

10) Petersen, N.S. (1976). An expected utility model for "optimal" selection. OUT-OF-PRINT (subsequently published in Journal of Educational Statistics, vol. 1, no. 4).

11) Lindley, D.V. (1975). The effect of ethical design considerations on statistical analysis.

12) Lindley, D.V. (1975). The class of utility functions. OUT-OF-PRINT (subsequently published in Annals of Statistics, vol. 4, no. 1)

13) Lindley, D.V. (1975). Probability and medical diagnosis; and The role of utility in decision-making.

14) Lindley, D.V. (1975). Inference for a Bernoulli process: A Bayesian view; and The future of statistics: a Bayesian 21st century.

15) Shigemasu, K. (1976). Development and validation of a simplified m-group regression model. OUT-OF-PRINT (subsequently published in Journal of Educational Statistics, vol. 1, no. 2).

16) Cantor, G.N. (1975). Sex and race effects in the community behavior of upper-elementary-school-aged children. (ERIC document #115658).

17) Forsyth, R.A. (1976). Describing what Johnny can do. (ERIC document #181063).

18) Sedere, M.U. & Feldt, L.S. (1977). The sampling distribution of the Kristof reliability coefficient, the Feldt coefficient, and Guttman's lambda2. (also published in Journal of Educational Measurement, vol. 14, no. 1).

19) Coffman, W.E. (1976). Those achievement tests - what for?

20) Hunt, D.E. (1977). Teachers are psychologists, too: On the applications of psychology to education. (ERIC document #129651).

21) Johnson, S.T. (1977). Self-concept in a school setting: Construct validations by factor analysis. (ERIC document #142573).

22) Coffman, W.E. & Shigemasu, K. (1978). Appraising school effectiveness using a Bayesian method. (ERIC document #159220).

23) Coffman, W.E. (1979). Classical test development solutions.

24) Olsen, S.A. (1979). An investigation of possible test floor effects by comparing score distributions for students tested with two different levels of a test battery at four different times over a two-year interval. (ERIC document #175914).

25) Haebara, T. (1979). A method for investigating item bias using Birnbaum's three-parameter logistic model. (ERIC document #185090).

26) Coffman, W.E. & Olsen, S.A. (1980). The first two years of PLAN*: An evaluation of program impact.

27) Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. (subsequently published in Japanese Psychological Research, vol. 22, no. 3) (ERIC document #193300).

28) Mayekawa, S. & Haebara, T. (1980). Estimation of the reliability of a test consisting of more than three congeneric parts. (ERIC document #193302).

29) Laksana, S. & Coffman, W.E. (1980). A comparison of an ANOVA approach and an ICC approach for assessing item bias in an achievement test.

30) Haebara, T. (1981). Least squares method for equating logistic ability scales: a general approach and evaluation. (ERIC document #211609).

31) Gilmer, J.S. & Feldt, Leonard S. (1982). The standard errors of the Feldt-Gilmer congeneric reliability coefficients.

32) Feldt, L.S. & Melican, G.J. (1983). Interval estimation of ω², the proportion of variance associated with a set of fixed treatments.

33) Feldt, L.S., Woodruff, D.J., Salih, F.A. & Srichai, M. (1987). Statistical tests and confidence intervals for Cronbach's coefficient alpha. (also published in Applied Psychological Measurement, vol. 11, no. 1).

34) Coffman, W.E. (1988). Measurement of thinking skills--an historical perspective. (ERIC document #299322).

35) Forsyth, R.A. (1990). The NAEP proficiency scales: Do they yield valid criterion-referenced interpretations? (subsequently published as "Do NAEP scales yield valid criterion-referenced interpretation?" in Educational Measurement: Issues and Practice, vol. 10, no. 3).

36) Waltman, K. K. (1995). An investigation of the use of performance standards to link Iowa statewide ITBS results with results from NAEP.

37) Waltman, K. K. (1995). The use of pre-defined performance standards to establish performance regions on the ITBS--an achievement level setting study.

38) Jordan, R. P. (1996). Searching for information on tests: reference sources and a search strategy. (revision of article of the same title in The Reference Librarian, no. 48, 1995).

39) Omar, M. H. (1996). An investigation into the reasons item response theory scales show smaller variability for higher achieving groups.

40) Brennan, R. L. (1996). Conditional standard errors of measurement in generalizability theory.

41) Brennan, R. L. & Lee, W. (1997). Conditional standard errors of measurement for scale scores using binomial and compound binomial assumptions.

42) Brennan, R. L. (1998). Manual for urGENOVA version 1.3.

43) Lee, W., Brennan, R. L., & Kolen, M. J. (1998). A comparison of some procedures for estimating conditional scale-score standard errors of measurement.

44) Manhart, Jim J. (1998). Gender differences in scientific literacy.

45) Brennan, Robert L. (1999). Manual for mGENOVA, version 1.0.

46) Brennan, Robert L. (1999). Manual for urGENOVA version 1.4.

47) Brennan, Robert L. (1999). Manual for mGENOVA version 2.0.

48) Frisbie, David A. (2001). Checking the Alignment of an Assessment Tool and a Set of Content Standards.

49) Brennan, Robert L. (2001). Manual for urGENOVA version 2.1.

50) Brennan, Robert L. (2001). Manual for mGENOVA version 2.1.

51) Frisbie, David A. (2003). Checking the Alignment of an Assessment Tool and a Set of Content Standards. (Revision of Occasional Paper No. 48.)

52) Kim, Seonghoon & Kolen, Michael J. (2005). Methods for Obtaining a Common Scale Under Unidimensional IRT Models: A Technical Review and Further Extensions.

53) Kim, Seonghoon & Feldt, Leonard S. (2011). Comparisons Among Coefficient Alpha and Congeneric-Model-Based Reliability Estimators for Tests Composed of Clusters of Items.

### Research Reports (1966-1990)

1) Feldt, L.S. & Forsyth, R.A. (1966). The relationship of I.T.E.D. composite score to the expectations, aspirations, activities and sociological characteristics of Iowa high school students.

2) Hieronymus, A.N. & Stroud, J.B. (1969). Comparability of IQ scores on five widely used intelligence tests.

3) Forsyth, R.A., Hilpert, F.M. & Feldt, L.S. (1970). Norms for class growth on the Iowa Tests of Educational Development.

4) Forsyth, R.A., Feldt, L.S. & Brandenburg, D.C. (1973). Perceptions of Iowa teachers related to the use of I.T.E.D. results by administrators and counselors.

6) Forsyth, R.A. (1976). Readability levels of the reading passages in the I.T.E.D.: Final report (supersedes No. 5).

7) Feldt, L.S. & Melican, G.J. (1976). Tables for determining the tetrachoric correlation coefficient.

8) Forsyth, R.A. (1982). Survey of ITED testing programs in Iowa high schools.

9) Forsyth, R.A. & Allen, N. (1985). Norms for class growth on the Iowa Tests of Educational Development: 1976-1983.

10) Forsyth, R.A. & Becker, D.F. (1990). ITED testing practices of Iowa high schools: 1989 fall testing program.

## Dissertations: UI Measurement and Statistics Program, 1999-present

Kim, Han Yi (2014). A comparison of smoothing methods for the common item nonequivalent groups design

LaFond, Lee James (2014). Decision consistency and accuracy indices for the bifactor and testlet response theory models

Peterson, Jaime Leigh (2014). Multidimensional item response theory observed score equating methods for mixed-format tests

Dockery, Lori (2013). Testing Accommodations for ELL Students on an Achievement Test Battery

Lee, Eunjung (2013). Equating multidimensional tests under a random groups design: a comparison of various equating procedures

Su, Yu-Lan (2013). Cognitive diagnostic analysis using hierarchically structured skills

Topczewski, Anna (2013). Effect of Violating Unidimensional Item Response Theory Vertical Scaling Assumptions on Developmental Score Scales

Wang, Wei (2013). Mixed-format test score equating: effect of item-type multidimensionality, length and composition of common-item set, and group ability difference

Wang, Xuan (2013). Linking across forms in vertical scaling under the common-item nonequivalent groups design

DenBleyker, John (2012). Comparing Trend and Gap Statistics Across Tests: Distributional Change Using Ordinal Methods and Bayesian Inference

Jarr, Karoline (2012). Education practitioners' interpretation and use of assessment results

Stephens, Christopher Neil (2012). An Investigation into the Psychometric Properties of the Proportional Reduction of Mean

Westrick, Paul (2012). Validity Decay Versus Validity Stability in STEM and Non-STEM Fields

Andrews, Benjamin (2011). Assessing First- and Second-Order Equity for the Common-Item Non Equivalent Groups Design Using Multidimensional IRT

Castellano, Katherine (2011). Unpacking Student Growth Percentiles: Statistical Properties of Regression-based Approaches with Implications for Student and School Classifications

He, Yi (2011). Evaluating Equating Properties for Mixed-format Tests

Liu, Chunyan (2011). A Comparison of Statistics for Selecting Smoothing Parameters for Loglinear Presmoothing and Cubic Spline Postsmoothing under a Random Groups Design

Shin, Seonho (2011). A Comparison of Van der Linden's Conditional Equipercentile Equating Method with other Equating Methods under the Random Groups Design

Wall, Nathan (2011). Augmented Testing and Effects on Item and Proficiency Estimates in Different Calibration Designs

Wang, Chunxin (2011). An Investigation of the Bootstrap Methods for Estimating the Standard Error of Equating under the Common Item Nonequivalent Groups Design

Wood, Scott (2011). Differential Item Functioning Procedures for Polytomous Items When Examinee Sample Sizes Are Small

Brossman, Bradley (2010). Observed Score and True Score Equating Procedures for Multidimensional Item Response Theory

Hagge, Sarah (2010). The Impact of Equating Method and Format Representation of Common Items on the Adequacy of Mixed-format Test Equating Using Nonequivalent Groups

Moore, Joann (2010). Estimating Standard Errors of Estimated Variance Components in Generalizability Theory Using Bootstrap Procedures

Powers, Sonya (2010). Impact of Matched Samples Equating Methods on Equating Accuracy and the Adequacy of Equating Assumptions

Knupp, Tawnya (2009). Estimating Decision Indices Based on Composite Scores

Lai, Emily (2009). Interim Assessment Use in Iowa Elementary Schools

Tao, Shuqin (2009). Using Collateral Information in the Estimation of Sub-Scores -- a Fully Bayesian Approach

Beard, Jonathan (2008). An Investigation of Vertical Scaling With Item Response Theory Using a Multi Stage Testing Framework

Beimers, Jennifer (2008). The Effects of Model Choice and Subgroup on Decisions in Accountability Systems Based on Student Growth

Chien, Yueh-mei (2008). An Investigation of Testlet-Based Item Response Models with a Random Facets Design in Generalizability Theory

Clough, Sara (2008). Computerized Versus Paper-and-Pencil Measurement of Socially Desirable Responding: Score Congruence, Completion Time, and Respondent Preferences

Hazen, Timothy (2008). Assessing Information Literacy - The Multiple Narrative Approach

Magda, Tracey (2008). Comparing Trends at Multiple Cut Scores Under NCLB Accountability Policies

Nozawa, Yuki (2008). Comparison of Parametric and Nonparametric IRT Equating Methods Under The Common-Item Nonequivalent Groups Design

Thiessen, Bradley (2008). Relationship Between Test Security Policies and Test Score Manipulations

Zhang, Su (2008). Prior Predictive Checking of Item Response Theory (IRT) Models

Zhao, Xiaohui (2008). Investigation of the Impact of Various Factors on the Validity of Customized Norms

Cho, Youngwoo (2007). Comparison of Bootstrap Standard Errors of Equating Using IRT and Equipercentile Methods with Polytomously-Scored Items under the Common-Item Nonequivalent-Groups Design.

Croft, Michelle (2007). Modified Assessments for the NCLB "Two Percent" Students: Analysis of the Legal Requirements, Psychometric Standards, and Policy

Hou, Jianlin (2007). Effectiveness of the Hybrid Levine Equipercentile and Modified Frequency Estimation Equating Methods under the Common-Item Non-Equivalent Groups Design

Hsieh, Ming-Chuan (2007). An Investigation of a Bayesian Decision-Theoretic Procedure in the Context of Mastery Tests

Kim, Jungnam (2007). A Comparison of Calibration Methods and Proficiency Estimators for Creating IRT Vertical Scales

Li, Dongmei (2007). Models of Individual Growth and School Accountability

Meng, Huijuan (2007). A Comparison Study of IRT Calibration Methods for Mixed-Format Tests in Vertical Scaling

Middleton, Kyndra (2007). The Effect of a Read-aloud Accommodation on Items on a Reading Comprehension Test for Students with Reading-Based Learning Disabilities

Proctor, Thomas (2007). An Investigation of the Effects of Varying the Domain Definition of Science and Method of Scaling on a Vertical Scale

Zhang, Jin (2007). Dichotomous or Polytomous Model? Equating of Testlet-Based Tests In Light of Conditional Item Pair Correlations

Akour, Mutasem (2006). A Comparison of Various Equipercentile and Kernel Equating Methods under the Random Groups Design

Cui, Zhongmin (2006). Two New Alternative Smoothing Methods in Equating: The Cubic B-spline Presmoothing Method and the Direct Presmoothing Method

Hall, Erika (2006). Using Collateral Item and Examinee Information to Improve IRT Item Parameter Estimation

Kim, Hee Kyoung (2006). The Effect of Repeaters on Equating: A Population Invariance Approach

Von Schrader, Sarah (2006). On the Feasibility of Applying Skills Assessment Models to Achievement Test Data

Wan, Lei (2006). Estimating Classification Consistency for Single-administration Complex Assessments Using Non-IRT procedures

Huh, Noo Ree (2005). Group Invariance and Concordance

Kirkpatrick, Jr. (2005). The Effects of Item Format in Common Item Equating

Mao, Xia (2005). An Investigation of the Accuracy of the Estimates of Standard Errors for the Kernel Equating Functions

Vanden Berk, Eric (2005). Improving the Evaluation of Students Through Teacher Training: An Investigation of the Utility of the Student Evaluation Standards

Yi, Hyun sook (2005). A Method for Estimating Classification Consistency of Alternate Forms Under Equating Situations

Pitkin, Angela (2004). The Effect of Multidimensionality on the Distribution of a Person Fit Index

Shin, Ching-Wei David (2004). A Comparison of Methods of Estimating Objective Scores

Skulason, Sigurgrímur (2004). An Investigation of the Validity of the Iowa Early Learning Inventory

Sullivan, Richard (2004). The Rescue of Inactive Items by a Rescoring Scheme of Partial Credit: A Feasibility Study

Tong, Ye (2004). Comparisons of Methodologies And Results in Vertical Scaling for Educational Achievement Tests

Kim, Seonghoon (2003). Unidimensional IRT Scale Linking Procedures for Mixed-format Tests and Their Robustness to Multidimensionality

Widiatmo, Heru (2003). A Simulation and Evaluation of Computerized Adaptive Testing Designs for the Verbal Battery of the Cognitive Ability Test (CogAT)

Yin, Ping (2003). Estimating Reliability of Group Mean Difference Scores in Longitudinal Designs

Al-Mahrazi, Rashid Saif (2002). Investigating a New Modification of the Residual-Based Person Fit Index and Its Relationship with Other Indices in Dichotomous Item Response Theory

Cumming, Tammie Lea (2002). Reliability Estimation in Complex Simulation Environments

Deng, Hui (2002). An Investigation of Stratified and Maximum Information Item Selection Procedures In Computerized Adaptive Testing

Feng, Wenchu (2002). Applicability of the Jackknife Procedures for Estimating Standard Errors of Variance Component Estimates in Selected Random Effects G Study Designs

Mengeling, Michelle (2002). An Analysis of District and School Variance Using Hierarchical Linear Modeling and Longitudinal Standardized Achievement Data

Cumming, Tammie (2001). Reliability Estimation in Complex Simulation Environments

Hendrickson, Amy (2001). Scaling of Two-stage Adaptive Test Configurations for Achievement Testing

Lei, Pui-Wa (2001). Power Estimation Methods in Structural Equation Modeling

Lin, Chuan-Ju (2001). Comparisons Between Classical Test Theory and Item Response Theory in Automated Assembly of Alternate Test Forms

Monahan, Patrick (2001). The Mantel-Haenszel Procedure for DIF: Alternative Matching Scores to Control Type I Error and Improve Distributional Properties

Perkhounkov, Elena (2001). Modeling the Dimensions of Language Achievement

Bishop, Norman "Scott" (2000). The Validity of ITBS Reading Comprehension Test Scores: Evidence of Generalizability across Different Test Administration Conditions

Chen, Huan-Wen (2000). Calibration of the ITBS Survey Test Battery to the Complete Test Battery: A Comparison of Five Linking Methods

Kim, Dong In (2000). A Comparison of IRT Equating and Beta 4 Equating

Xing, Huilan (2000). An Empirical Evaluation of an Improved Version of the Difference Method for Estimating the Measurement Error Variance at Specific Score Levels

Ban, Jae-Chun (1999). An Investigation of Practitioners' Program Theories in an Evaluation of Minority Teacher Recruitment Programs

Chen, Shu-Ying (1999). A Comparison of Item Selection Rules Including Precision, Content, and Exposure Considerations at the Early Stages of Computerized Adaptive Testing

Etsey, Young Kafui (1999). Teacher Educators' Perceptions of Classroom and Standardized Assessments

Twu, Bor-Yaun (1999). A Comparative Study of Stout's T-Statistic and McDonald's Nonlinear Factor Analysis in Detecting the Departure from Unidimensionality

## Iowa Testing Programs Founder: Everett F. Lindquist (1901-1978) Bibliography

