Research & Expertise

This page presents research supported by Measurement Incorporated (MI) as well as unaffiliated research conducted by current MI staff. These publications and presentations illustrate our staff's wide-ranging expertise in the field of educational measurement.

Featured Research

Industry-leading, open-access publications authored by our staff.

Wilson, J., Zhang, S., Palermo, C., Cruz Cordero, T., Zhang, F., Myers, M., Potter, A., Eacker, H., & Coles, J. (2024). A latent Dirichlet allocation approach to understanding students' perceptions of automated writing evaluation. Computers and Education Open.

Abstract: Automated writing evaluation (AWE) has shown promise in enhancing students' writing outcomes. However, further research is needed to understand how AWE is perceived by middle school students in the United States, as they have received less attention in this field. This study investigated U.S. middle school students' perceptions of the MI Write AWE system. Students reported their perceptions of MI Write's usefulness using Likert-scale items and an open-ended survey question. We used Latent Dirichlet Allocation (LDA) to identify latent topics in students' comments, followed by qualitative analysis to interpret the themes related to those topics. We then examined whether these themes differed among students who agreed or disagreed that MI Write was a useful learning tool. The LDA analysis revealed four latent topics: (1) students desire more in-depth feedback, (2) students desire an enhanced user experience, (3) students value MI Write as a learning tool but desire greater personalization, and (4) students desire increased fairness in automated scoring. The distribution of these topics varied based on students' ratings of MI Write's usefulness, with Topic 1 more prevalent among students who generally did not find MI Write useful and Topic 3 more prominent among those who found MI Write useful. Our findings contribute to the enhancement and implementation of AWE systems, guide future AWE technology development, and highlight the efficacy of LDA in uncovering latent topics and patterns within textual data to explore students' perspectives on AWE.
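
For readers less familiar with the method, the following is a minimal sketch of an LDA topic-modeling pass over open-ended survey comments, assuming scikit-learn is available; the comments, the four-topic setting, and all parameter choices are illustrative stand-ins, not the authors' actual pipeline.

```python
# Illustrative sketch only: a toy LDA topic-modeling pass over open-ended
# survey comments, in the spirit of the study's method. The comments, the
# four-topic setting, and all parameters are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "I want more detailed feedback on my essays",
    "the site is slow and hard to navigate",
    "it helps me learn but the suggestions feel generic",
    "the computer score seems unfair sometimes",
]

# LDA operates on raw term counts rather than TF-IDF weights.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)

# The study reported four latent topics; n_components is set to match.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-comment topic mixtures

# Inspecting top terms per topic supports the qualitative interpretation step.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k + 1}: {', '.join(top_terms)}")
```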

Cruz Cordero, T., Wilson, J., Myers, M., Palermo, C., Eacker, H., Potter, A., & Coles, J. (2023). Writing motivation and ability profiles and transition after a technology-based writing intervention. Frontiers in Psychology: Educational Psychology, 14.

Abstract: Students exhibit heterogeneity in writing motivation and ability. Profiles based on measures of motivation and ability might help to describe this heterogeneity and better understand the effects of interventions aimed at improving students' writing outcomes. We aimed to identify writing motivation and ability profiles in U.S. middle-school students participating in an automated writing evaluation (AWE) intervention using MI Write, and to identify transition paths between profiles as a result of the intervention. We identified profiles and transition paths of 2,487 students using latent profile and latent transition analysis. Four motivation and ability profiles emerged from a latent transition analysis with self-reported writing self-efficacy, attitudes toward writing, and a measure of writing ability: Low, Low/Mid, Mid/High, and High. Most students started the school year in the Low/Mid (38%) and Mid/High (30%) profiles. Only 11% of students started the school year in the High profile. Between 50 and 70% of students maintained the same profile in the spring. Approximately 30% of students were likely to move one profile higher in the spring. Fewer than 1% of students exhibited steeper transitions (e.g., from the High to the Low profile). Random assignment to treatment did not significantly influence transition paths. Likewise, gender, membership in a priority population, and receipt of special education services did not significantly influence transition paths. Results provide a promising profiling strategy focused on students' attitudes, motivations, and ability and show students' likelihood of belonging to each profile based on their demographic characteristics. Finally, despite previous research indicating positive effects of AWE on writing motivation, results indicate that simply providing access to AWE in schools serving priority populations is insufficient to produce meaningful changes in students' writing motivation profiles or writing outcomes. Therefore, interventions targeting writing motivation, in conjunction with AWE, could improve results.
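
Latent profile analysis is typically fit in dedicated software such as Mplus or R; as a rough Python analogue, the sketch below treats the profile step as a Gaussian mixture over continuous motivation and ability indicators, with BIC guiding the number of profiles. The simulated scores and every setting here are hypothetical assumptions, not the study's analysis.

```python
# Rough analogue only: the study used latent profile and latent transition
# analysis; a Gaussian mixture over continuous indicators conveys the profile
# step. All simulated scores and settings here are hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Columns: writing self-efficacy, attitudes toward writing, writing ability.
low = rng.normal([2.0, 2.2, 1.8], 0.5, size=(300, 3))
high = rng.normal([4.0, 4.1, 3.9], 0.5, size=(300, 3))
scores = np.vstack([low, high])

# Compare 1- through 6-profile solutions by BIC, a common LPA selection rule.
fits = {k: GaussianMixture(n_components=k, random_state=0).fit(scores)
        for k in range(1, 7)}
best_k = min(fits, key=lambda k: fits[k].bic(scores))
print("Profiles selected by BIC:", best_k)

# Profile assignments; a transition analysis would cross-tabulate these
# memberships between fall and spring administrations.
profiles = fits[best_k].predict(scores)
print("Profile sizes:", np.bincount(profiles))
```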

White Papers

Our collection of white papers, written by our own industry experts and psychometricians, includes the latest industry research and best practices.

The Quest for Consistency: Double-Scoring Policies and Impacts on Fairness

by Corey Palermo, Ph.D. (April 2023)
"Double-scoring a proportion of responses serves the goal of measuring scoring consistency, but when the process includes third readings or resolutions, score comparability is undermined. Alternatives that ensure scoring quality and fairness should be a priority in assessment programs."

A Gentle Introduction to Automated Scoring

by Corey Palermo, Ph.D. (October 2017)
"Automated scoring also offers a variety of benefits for assessment of learning. One benefit is that it is much faster than scoring by teachers or professional raters; once models have been generated, responses can be scored in seconds. This allows assessment results to be available to stakeholders very rapidly. A second benefit is that automated scoring tends to be as accurate or more accurate than multiple professional raters. Furthermore, automated-scoring engines are perfectly reliable in ways that raters are not: an automated-scoring engine will assign the same score to a response every time."

White Paper: PEG Changes

by Michael B. Bunch, Thomas Davis, Ann Hayes, Derek Justice, and Julie St. John (July 2017)
"MI continues to monitor advancements in the automated essay scoring field while searching for ways to make PEG as effective as possible in helping students learn to write. As a result, PEG will be ever-evolving."

The Case for Professional Learning Communities

by Tina B. Clayton (2017)
"A Professional Learning Community (PLC) is a small group of professionals who continuously seek cutting-edge ideas and collaboratively evaluate how to best apply the new information to the work. The PLC operates under the assumption that to stay ahead of the competition, an organization must learn faster than the competition and consistently produce exceptional work."

The Future of Testing

by Michael B. Bunch, Ph.D. (2013)
"The future still looks a lot like it did 25 years ago: cognitive-based assessment, online assessment, widespread use of computer adaptive testing, universal access to technology, and instantaneous reporting of test results. So many wonderful things, still within our view but just beyond our grasp!"

It Takes Three

by Michael B. Bunch, Ph.D. (2012)
"Making sure all students are college and career ready requires not only an alignment of curriculum and instruction with college and career requirements but also an approach to monitoring student progress on a continual basis, with in-class formative assessments, frequent interim assessments, and focused summative assessments. Taken together, formative, interim, and summative assessments, aligned to Common Core State Standards (CCSS), will support instructional decision making and enhance daily learning activities."

Aligning Curriculum, Assessment, and Instruction

by Michael B. Bunch, Ph.D. (2012)
"A key component of educational achievement test validation is alignment of the test to both curriculum and instruction. By alignment, we mean the degree to which the items of the test, both individually and collectively, match the structure and intent of the curriculum and instruction."

Publications

Peer-reviewed scholarly works by our staff.

2024

Palermo, C., & Wibowo, A. (2024). Automated essay evaluation at scale: Hybrid automated scoring/hand scoring in the summative assessment program. In M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation. Routledge.

2023

Cui, Z. & He, Y. (2023). Practical considerations in choosing an anchor test form for equating under the random groups design. Measurement: Interdisciplinary Research and Perspectives, 2, 101-113.

2022

Palermo, C. (2022). Rater characteristics, response content, and scoring contexts: Decomposing determinants of scoring accuracy. Frontiers in Psychology, 13, 937097.

2021

Clauser, B. E., & Bunch, M. B. (Eds.). (2021). The history of educational measurement: Key advancements in theory, policy, and practice. Routledge.

Jiang, N., Rogers, B., Fan, X., Hu, X., Lewis, A., & Cai, B. (2021). School-level factors related to visual arts achievement for fourth graders: A longitudinal analysis. Studies in Art Education, 62(1), 47-62.

2020

DiStefano, C., & Jiang, N. (2020). Applying the Rasch rating scale method to questionnaire data. In M. Khine (Ed.), Rasch measurement. Springer.

Fan, X., Jiang, N., & Lewis, A. (2020). Factors associated with fourth graders' music knowledge assessed by SCAAP. International Journal of Music Education, 38(4), 644-656. https://doi.org/10.1177/0255761420926664

He, Y. & Cui, Z. (2020). Evaluating robust scale transformation methods with multiple outlying common items under IRT true score equating. Applied Psychological Measurement, 44, 296-310.

Wang, W., Chen, J. & Kingston, N. (2020). How well do simulation studies inform decisions about multistage testing? Journal of Applied Measurement, 21(3), 271-281. PMID: 33983899.

2019

Liu, J., Burgess, Y., DiStefano, C., Pan, F., & Jiang, N. (2019). Validating the Pediatric Symptoms Checklist-17 in the preschool environment. Journal of Psychoeducational Assessment, 38(4), 460-474.

Murray, A. K., Daoust, C. J., & Chen, J. (2019). Developing instruments to measure Montessori instructional practices. Journal of Montessori Research, 5(1), 50-87.

Palermo, C., Bunch, M., & Ridge, K. (2019). Scoring stability in a large-scale assessment program: A longitudinal analysis of leniency/severity effects. Journal of Educational Measurement, 56(3), 626-652.

Palermo, C., & Thomson, M. M. (2019). Large-scale assessment as professional development: Teachers' motivations, ability beliefs, and values. Teacher Development, 23(2), 192-212.

2018

Chen, J. (2018). KR-20. In B. Frey (Ed.), Encyclopedia of educational research, measurement, and evaluation. Sage Publishing.

Chen, J. (2018). Interstate School Leaders Licensure Consortium (ISLLC) standards. In B. Frey (Ed.), Encyclopedia of educational research, measurement, and evaluation. Sage Publishing.

Chen, J., & Perie, M. (2018). Comparability with computer-based assessment: Does screen size matter? Computers in the Schools, 35(4), 268-283.

Cui, Z., Liu, C., He, Y., & Chen, H. (2018). Evaluation of a new method for providing full review opportunities in computerized adaptive testing: Computerized adaptive testing with salt. Journal of Educational Measurement, 55(4), 582-594.

2017

DiStefano, C., Liu, J., Jiang, N., & Shi, D. (2017). Examination of the weighted root mean square residual: Evidence for trustworthiness? Structural Equation Modeling: A Multidisciplinary Journal, 25(3), 453-466.

2016

Bunch, M. B., Vaughn, D., & Miel, S. (2016). Automated scoring in assessment systems. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 611-626). IGI Global.

2015

He, Y., Cui, Z., & Osterlind, S.J. (2015). New robust scale transformation methods in the presence of outlying common items. Applied Psychological Measurement, 39(8), 613-626.

2014

Sotaridona, L. S., Wibowo, A., & Hendrawan, I. (2014). A parametric approach to detect a disproportionate number of identical item responses on a test. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 54-68). Routledge.

2013

He, Y., Cui, Z., Fang, Y., & Chen, H. (2013). Using a linear regression method to detect outliers in IRT common item equating. Applied Psychological Measurement, 37, 522-540.

Presentations

Conference paper and poster presentations by our staff.

2023

Wibowo, A., Palermo, C., Vaughn, D., Justice, D., & He, Y. (2023, April). Combining linguistic features with deep neural network models to fine-tune response predictions for NAEP reading items. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Cruz Cordero, T., Wilson, J., Palermo, C., Eacker, H., Myers, M., Potter, A., & Coles, J. (2023, August). Writing motivation and ability profiles and transition after a technology-based writing intervention. Paper to be presented at the biennial European Association for Research on Learning and Instruction (EARLI) conference, Thessaloniki, Greece.

Cruz Cordero, T., Wilson, J., Palermo, C., Eacker, H., Myers, M., Potter, A., & Coles, J. (2023, April). Middle-school writing motivation: Profiles and transition in response to a technology-based writing intervention. Poster presented at the annual conference of the American Educational Research Association, Chicago, IL.

He, Y., & Chen, T. (2023, April). Using normalized theta score differences to evaluate equating with item parameter drifts. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Jiang, N., Chen, J., & DiStefano, C. (2023, April). Investigating model fit indices in multiple-group confirmatory factor analysis with ordinal data. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Wilson, J., Palermo, C., Myers, M., Cruz Cordero, T., Eacker, H., Coles, J., & Potter, A. (2023, April). Impact of MI Write automated writing evaluation on middle grade writing outcomes. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Zhang, F., Wilson, J., Cruz Cordero, T., Palermo, C., Eacker, H., Myers, M., Coles, J., & Potter, A. (2023, April). Identifying predictors of middle school students' perceptions of automated writing evaluation. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

2022

He, Y., Jing, S., & Lu, Y. (2022, April). A multilevel multinomial logit approach to bias detection. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Jiang, N., Zhang, T., Gao, R., DiStefano, C., & Dou, J. (2022, April). Measurement invariance testing using multiple-group CFA: A systematic review. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Justice, D. (2022, April). A linear model approach to bias detection. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Palermo, C. (2022, April). Examining hybrid automated scoring/handscoring results in a multi-state design. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

2021

Cui, Z., Liu, C., & He, Y. (2021, June). Using machine learning to administer salt items in computerized adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education (virtual).

Gao, R., Jiang, N., DiStefano, C., & Liu, J. (2021, April). Young children's behavior adjustment trajectories: A latent growth curve analysis. Paper presented at the annual meeting of the American Educational Research Association, Orlando, FL.

Gao, R., DiStefano, C., Liu, J., & Jiang, N. (2021, April). Longitudinal invariance analysis of Pediatric Symptom Checklist-17 (PSC-17). Paper presented at the annual meeting of the American Educational Research Association, Orlando, FL.

Jiang, N., Gao, R., DiStefano, C., & Liu, J. (2021, April). Using latent profile analysis to classify primary students' social-emotional and behavioral functioning. Paper presented at the annual meeting of the American Educational Research Association, Orlando, FL.

Thacker, A., Word, A., Sinclair, A., Nash, B., & Chen, J. (2021, June). Moving bookmark standards setting from in-person to virtual: Best practices/lessons learned. Paper presented at the annual meeting of the National Council on Measurement in Education (virtual).

2020

He, Y., Wu, Y.F., & Tao, W. (2020, September). Comparing CTT postequating and IRT preequating in the embedded field-test model. Paper presented at the annual meeting of the National Council on Measurement in Education.

Jiang, N., Pompey, K., & Burgess, Y. (2020, April). A comparison of two DIF methods for analyzing Rasch model data: A Monte Carlo investigation. Paper presented virtually at the annual meeting of the American Educational Research Association, San Francisco, CA.

Murray, A., Daoust, C., & Chen, J. (2020, April). Validating tools for measuring Montessori implementation. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Wu, Y.F., He, Y., & Tao, W. (2020, September). Evaluating impacts on operational item performance in the embedded field-test model. Paper presented at the annual meeting of the National Council on Measurement in Education.

2019

Cui, Z., Liu, C., & He, Y. (2019, April). On administering salt items in computerized adaptive testing with salt. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto, Canada.

Daoust, C., Murray, A., & Chen, J. (2019, March). A reexamination of implementation practices in Montessori early childhood education. Paper presented at The Montessori Event, Washington, DC.

Murray, A., Chen, J., Daoust, C., & Amos, A. (2019, April). Dimensions of fidelity in a constructivist classroom. Paper presented at the annual meeting of American Educational Research Association, Toronto, Canada.

Pompey, K., Jiang, N., Burgess, Y., Lewis, A., & Dou, J. (2019, April). Differential item functioning analysis of a state-wide visual arts assessment using a two-stage procedure. Paper presented at the annual meeting of the American Educational Research Association, Toronto, Canada.

2018

Chen, T., Tao, W., & Gao, X. (2018, July). Evaluating item position effects on scrambled form pre-equating. Paper presented at the annual meeting of the International Test Commission Conference, Montreal, Canada.

Jiang, N., DiStefano, C., Liu, J., & Shi, D. (2018, April). An investigation of statistical power and sample size for CFA models with ordinal data: A Monte Carlo study. Paper presented at the Modern Modeling Methods Conference, Storrs, CT.

Jiang, N., Liu, J., Shi, D., & DiStefano, C. (2018, April). Performance of the weighted root mean square residual with categorical and continuous data. Paper presented at the Modern Modeling Methods Conference, Storrs, CT.

Jiang, N., Liu, J., Shi, D., & DiStefano, C. (2018, July). Sample size and statistical power for SEM: A simulation study. Paper presented at the International Meeting of Psychometric Society, New York, NY.

Jiang, N., Zheng, J., & Lewis, A. (2018, April). An HLM approach to investigate factors influencing visual arts achievement for elementary school students. Paper presented at the Chinese American Educational Research and Development Association Conference, New York, NY.

Wang, W., Zheng, Z., & Chen, J. (2018, October). Clustering students in a state classroom assessment system: Exploring the usages for classroom assessment. Paper presented at the National Council on Measurement in Education Special Conference on Classroom Assessment, Lawrence, KS.

2017

Burgess, Y., Lewis, A., & Jiang, N. (2017, November). Increasing stakeholder use of assessment data through improved reporting. Paper presented at the annual meeting of the American Evaluation Association, Washington, DC.

Chen, T., Huang, C.H., & Liu, C. (2017). An imputation approach to handling incomplete computerized tests. Paper presented at the annual meeting of the International Association of Computerized Adaptive Testing, Niigata, Japan.

Fang, Y., Lu, Y., & He, Y. (2017, April). Can subtest equating borrow information from the full test? Paper presented at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.

Guo, Z., Jiang, N., & Robert, J. (2017, April). Interrater reliability estimator accuracy and double-rated percentages: A Monte Carlo investigation. Paper presented at the annual meeting of the American Educational Research Association, San Antonio, TX.

He, Y., & Yi, Q. (2017, April). Impact of item parameter drift on mixed-format tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.

Leighton, E., Fan, X., Jiang, N. & Lewis, A. (2017, April). Using item response theory to investigate assessment quality in a large-scale music assessment program. Paper presented at the 6th International Symposium on Assessment in Music Education, Context Matters, Birmingham, UK.

Liu, J., Jiang, N., & DiStefano, C. (2017, May). Performance of weighted root mean square residual (WRMR) in structural equation modeling. Poster presented at the Modern Modeling Methods Conference, Mansfield, CT.

2016

Cui, Z., Liu, C., He, Y., & Chen, H. (2016, April). A modified procedure in applying CATS to allow unrestricted answer changing. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, D.C.

Cui, Z., Liu, C., He, Y., & Chen, H. (2016, April). Evaluation of a new method in providing full review opportunities in computerized adaptive testing: Computerized adaptive testing with salt. Paper presented at the annual meeting of the National Council on Measurement in Education SRERA Distinguished Paper Session, Washington, D.C.

He, Y., Liu, R., & Cui, Z. (2016, April). Bayesian estimation of null categories in constructed-response items. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, D.C.

Jiang, N., Pan, F., Liu, J. & DiStefano, C. (2016, February). Paper presented at the South Carolina Educators for the Practical Use of Research (SCEPUR) annual conference, Columbia, SC.

Yi, Q., He, Y., & Wei, H. (2016, April). Sample size requirement for trend scoring in mixed-format test equating. Paper presented at the annual meeting of the National Council on Measurement in Education, Washington, D.C.

2015

Chen, T., & Tao, W. (2015, April). Linking multiple scaling tests under IRT. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Cui, Z., & He, Y. (2015, July). Practical considerations in choosing an anchor form for equating. Poster presented at the annual meeting of the Psychometric Society, Beijing, China.

Cui, Z., Liu, C., He, Y., & Chen, H. (2015, April). Allowing unrestricted answer changing through computerized adaptive testing with salt. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Cui, Z., Liu, C., He, Y., & Chen, H. (2015, September). Comparing CATS and the block review method in providing review options in CAT. Paper presented at the International Association for Computerized Adaptive Testing Summit, Cambridge, UK.

Cui, Z., Liu, C., He, Y., & Chen, H. (2015, December). Evaluation of a new method in providing full review opportunities in computerized adaptive testing: Computerized adaptive testing with salt. Paper received the IEREA Distinguished Research Award from the Iowa Educational Research and Evaluation Association, Iowa City, IA.

Harris, D. J., Liu, C., & Chen, T. (2015). An exploratory study of starting a CAT with a non-scaled item pool. Paper presented at the annual meeting of the International Association of Computerized Adaptive Testing, Cambridge, England.

He, Y., Cui, Z., & Osterlind, S.J. (2015, April). Using robust scale transformation methods for multiple outlying common items. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Wibowo, A., & Sotaridona, L.S. (2015, July). Incorporating suspicious answer changes in the detection of aberrant response patterns. Paper presented at the International Meeting of the Psychometric Society (IMPS), Beijing, China.

2014

Cui, Z., He, Y., & Osterlind, S.J. (2014, August). New robust scale transformation methods in the presence of outlying common items. Paper presented at the annual meeting of the Psychometric Society, Madison, WI.

Cui, Z., Liu, C., He, Y., & Chen, H. (2014, October). Comparison of algorithms that allow item review in computerized adaptive testing. Paper presented at the International Association for Computerized Adaptive Testing Summit, Princeton, NJ.

He, Y. & Cui, Z. (2014, April). Comparison of IRT preequating methods when item positions change. Paper presented at the annual meeting of the National Council on Measurement in Education, Philadelphia, PA.

Su, I., He, Y., & Osterlind, S. J. (2014, April). Comparing different model-based standard setting procedures. Poster presented at the NCME Graduate Student Issues Committee (GSIC) poster session, Philadelphia, PA.

Yang, P., He, Y., & Wang, Z. (2014, April). Hierarchical bayesian modeling for two-parameter nested logit model with parallel computing. Poster presented at the NCME Graduate Student Issues Committee (GSIC) poster session, Philadelphia, PA.

2013

He, Y., Yang, P., & Osterlind, S. J. (2013, April). Weighted moment approaches in scale transformation for IRT equating. Poster presented at the NCME Graduate Student Issues Committee (GSIC) poster session, San Francisco, CA.

Sotaridona, L. S., Wibowo, A., & Hendrawan, I. (2013, October). Item-level analysis of wrong-to-right erasures. Paper presented at the 2nd Annual Conference on Statistical Detection of Possible Test Fraud, Madison, WI.

Sotaridona, L. S., Wibowo, A., & Hendrawan, I. (2013, April-May). The utility of dichotomous IRT models on group-level cheating detection method. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.

Sotaridona, L. S., Wibowo, A., Hendrawan, I., & Pornel, J. (2013, October). An application of nominal response model to identify erroneously scored test items. Invited paper presentation at the 12th National Convention on Statistics, Manila, Philippines.

Sotaridona, L. S., Wibowo, A., & Pornel, J. (2013, October). The stability of point biserial correlation coefficient estimates against different sampling schemes. Paper presented at the 12th National Convention on Statistics, Manila, Philippines.

Wibowo, A., Sotaridona, L. S., & Hendrawan, I. (2013, October). Item-level analysis of response similarity. Paper presented at the 2nd Annual Conference on Statistical Detection of Possible Test Fraud, Madison, WI.

Wibowo, A., Sotaridona, L. S., & Hendrawan, I. (2013, April-May). Statistical models for flagging unusual number of wrong-to-right erasures. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Francisco, CA.

Zopluoglu, C., Chen, T., Huang, C., & Mroch, A. (2013, April). Using previous test performance to improve the efficiency of statistical indices in detecting answer copying. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Zopluoglu, C., Chen, T., Huang, C., & Mroch, A. (2013, October). The performance of statistical indices in detecting answer copying on multiple-choice examinations using dichotomous item scores. Paper presented at the 2nd Annual Statistical Detection of Potential Test Fraud Conference, Madison, WI.