Date of Award


Document Type


Degree Name



Quantitative Research Methods

First Advisor

Kathy E. Green

Second Advisor

Frédérique Chevillot

Third Advisor

Duan Zhang

Fourth Advisor

Elizabeth Anderson


Differential item functioning, Item response theory, Latent class, Measurement invariance, Rasch mixture model, Simulation


Measurement invariance is crucial for an effective and valid measure of a construct. Invariance holds when the latent trait varies consistently across subgroups; in other words, mean differences among subgroups are due only to true differences in latent ability. Differential item functioning (DIF) occurs when measurement invariance is violated. There are two kinds of traditional tools for DIF detection: non-parametric methods and parametric methods. Mantel-Haenszel (MH), SIBTEST, and standardization are examples of non-parametric DIF detection methods. The majority of parametric DIF detection methods are based on item response theory (IRT). Both non-parametric and parametric methods compare subgroups defined by observed covariates such as gender and grade. As a result, differences among unobserved subgroups are likely to be neglected. The Rasch mixture model (RMM), a combination of the Rasch model and the mixture model, is an alternative that extracts latent classes (LCs) by grouping respondents with similar underlying latent trait patterns. DIF can then be calculated among LCs from the differences in mean item difficulty between LCs.
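
For readers unfamiliar with the model, the following is a sketch of a common dichotomous formulation of the Rasch mixture model. The abstract does not state the exact parameterization used in the study, so the symbols here (person v, item i, class c, ability θ, difficulty b, mixing proportion π) are assumptions chosen for illustration.

```latex
% Within latent class c, responses follow a Rasch model with
% class-specific item difficulties b_{ic}:
P(X_{vi} = 1 \mid \theta_{vc}, c)
  = \frac{\exp(\theta_{vc} - b_{ic})}{1 + \exp(\theta_{vc} - b_{ic})}

% The classes are mixed with proportions \pi_c:
\sum_{c=1}^{C} \pi_c = 1, \qquad 0 < \pi_c < 1

% For two latent classes, DIF on item i is the difference in
% class-specific difficulties:
\Delta b_i = b_{i1} - b_{i2}
```

Under this formulation, an item shows no DIF between two classes when its difficulty is the same in both, that is, when Δb is zero for that item.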

The purpose of this study was to examine the robustness of the RMM in detecting DIF by manipulating five factors: number of items (i.e., test length; 2 levels), proportion of DIF items (3 levels), LC structure (2 levels), group size (2 levels), and DIF type (2 levels), which yields 2 × 3 × 2 × 2 × 2 = 48 scenarios. A sample size of 3,000 was used for each replication of each scenario. The robustness of the RMM in detecting DIF was assessed from two perspectives: latent class structure recovery and parameter recovery. One hundred replications per scenario were used for LC structure recovery, and 200 replications per scenario were used for parameter recovery.
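
The fully crossed design described above can be enumerated with a short sketch. The abstract gives only the number of levels per factor, not the level values, so the levels below are plain indices and the factor names are hypothetical labels.

```python
from itertools import product

# Number of levels per manipulated factor, as stated in the abstract.
# The actual level values are not given there, so levels are indexed 0, 1, ...
FACTOR_LEVELS = {
    "number_of_items": 2,
    "proportion_of_dif": 3,
    "lc_structure": 2,
    "group_size": 2,
    "dif_type": 2,
}

# Replications per scenario, as stated in the abstract.
REPS_LC_STRUCTURE = 100
REPS_PARAMETER = 200


def enumerate_scenarios(factor_levels):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factor_levels)
    ranges = [range(factor_levels[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*ranges)]


scenarios = enumerate_scenarios(FACTOR_LEVELS)
print(len(scenarios))                      # 48 scenarios
print(len(scenarios) * REPS_LC_STRUCTURE)  # 4800 datasets for LC structure recovery
print(len(scenarios) * REPS_PARAMETER)     # 9600 datasets for parameter recovery
```

Each dict in `scenarios` identifies one cell of the design, which makes it straightforward to loop over cells when generating data.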

The main and interaction effects of the five manipulated factors on LC structure recovery and parameter recovery were examined with factorial analysis of variance (ANOVA). Both AIC and BIC showed a conservative pattern in LC structure recovery: the recovered LCs did not match the true structure in every replication, or even in the majority of replications. That is, the correct latent structure was rarely recovered 100% of the time. For classifier recovery, all five manipulated factors except DIF type (η² < 0.06) showed effect sizes that were medium or larger, and two interactions had medium effect sizes: number of items by group size (η² = 0.10) and LC structure by group size (η² = 0.13). For DIF recovery, three main effects and three interaction effects were found (η² > 0.06): number of items, proportion of DIF items, LC structure, number of items by LC structure, proportion of DIF items by LC structure, and DIF type by LC structure. Among these effects, group size (η² = 0.45) had the strongest effect on classifier recovery and LC structure (η² = 0.86) had the strongest effect on DIF recovery.

Practitioners who use an RMM to determine DIF are advised to have latent classes of similar size, a proportion of DIF items between 20% and 40%, and an LC structure close to a two-class structure. Neither AIC nor BIC is recommended as a model selection method for DIF detection with the RMM. Instead, the Cressie-Read statistic can be an option for choosing the correct number of latent classes from observed response patterns, because it includes statistical tests rather than relying on likelihood ratios. A practitioner can identify DIF and its direction by calculating the item difficulty difference Δb between two latent classes. With the RMM method, an item can be considered to show no DIF when Δb < 0.3, small DIF when 0.3 ≤ Δb < 0.9, medium DIF when 0.9 ≤ Δb < 1.5, and large DIF when Δb ≥ 1.5.
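
The Δb thresholds above can be applied with a small helper. This is a minimal sketch, not the study's code: the function name is hypothetical, and it assumes the thresholds apply to the absolute value of Δb while the sign carries the direction, which the abstract does not state explicitly.

```python
def classify_dif(delta_b):
    """Label DIF size from the item difficulty difference between two classes.

    Assumption: thresholds are applied to |delta_b|; the sign of delta_b is
    reported separately as the direction.
    """
    size = abs(delta_b)
    if size < 0.3:
        label = "none"
    elif size < 0.9:
        label = "small"
    elif size < 1.5:
        label = "medium"
    else:
        label = "large"

    if delta_b > 0:
        direction = "positive"
    elif delta_b < 0:
        direction = "negative"
    else:
        direction = "zero"
    return label, direction


print(classify_dif(0.2))   # ('none', 'positive')
print(classify_dif(-1.0))  # ('medium', 'negative')
print(classify_dif(1.5))   # ('large', 'positive')
```

Which latent class a positive or negative Δb favors depends on the order in which the two class difficulties are subtracted, so that convention should be fixed before interpreting the direction.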

Directions for future study include finding more reliable model selection indices for DIF detection with the RMM, increasing the efficiency of the simulation, and including a single latent class structure as a comparison. The number of replications used in this study is recommended for practitioners who want to conduct simulation studies using the Rasch mixture model.

Publication Statement

Copyright is held by the author. User is responsible for all copyright compliance.


Received from ProQuest

Rights holder

Jinjin Huang

File size

274 p.

File format





Statistics, Educational tests and measurements