2020 FSA Posters
P062: SIMULATION-BASED ASSESSMENT OF PEDIATRIC PROVIDERS: A GENERALIZABILITY STUDY
Gulsen Tasdelen-Teker, PhD1; Marie Nader, MD2; Anthony DeStephens, MSME2; Samsun Lampotang, PhD2; Jennifer Munoz-Pareja, MD2; 1Hacettepe University; 2University of Florida
Background: The educational role of simulation is expanding with its use in low-stakes formative assessments and higher-stakes summative assessments. However, the use of checklists, rating scales and clinical cases can introduce different sources of measurement error, which should be accounted for when studying the score reliability of the tool in question. Common reliability coefficients such as Cronbach's alpha or inter-rater reliability are useful but insufficient in these situations, because each captures only one source of error at a time. When building a simulation curriculum with multiple cases, it is essential to evaluate the consistency of participants' performance across different scenarios. We hypothesize that generalizability (G) theory can address multiple sources of measurement error at once to yield a more robust reliability coefficient.
Methods: We conducted generalizability (G) and decision (D) studies, within the framework of G theory, to analyze data from a simulation-based formative assessment of crisis resource management skills during pediatric resuscitation. Participants were divided into groups of five to ten. A 15-minute crash course focusing on major PALS algorithms was followed by a 25-minute simulated resuscitation exercise using a low-fidelity simulator. A 20-minute debriefing session was then performed. The exercise was repeated three days later. The participants were video-recorded and then independently evaluated by two raters using the Ottawa Global Rating Scale (O-GRS) by Kim et al. (2006). We used these data to (1) examine the psychometric characteristics of the O-GRS, (2) illustrate the use of G theory in measuring multiple sources of error variance in a study design and (3) determine the number of cases, raters and items needed for optimal reliability of the O-GRS as an assessment tool.
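To make the G-study machinery concrete, the sketch below estimates variance components for a simplified fully crossed participant (p) x case (c) x rater (r) design via the expected-mean-squares method. This is an illustration only, not the study's analysis code: the actual design also crossed rating-scale items, and the input data in the usage example are randomly generated.

import numpy as np

def g_study_pxcxr(X):
    """Estimate variance components for a fully crossed p x c x r design.

    X: array of shape (n_p, n_c, n_r), one score per participant/case/rater.
    Returns a dict of variance components via the expected-mean-squares method.
    """
    n_p, n_c, n_r = X.shape
    m = X.mean()
    m_p = X.mean(axis=(1, 2)); m_c = X.mean(axis=(0, 2)); m_r = X.mean(axis=(0, 1))
    m_pc = X.mean(axis=2); m_pr = X.mean(axis=1); m_cr = X.mean(axis=0)

    # Mean squares for main effects and two-way interactions
    ms_p = n_c * n_r * np.sum((m_p - m) ** 2) / (n_p - 1)
    ms_c = n_p * n_r * np.sum((m_c - m) ** 2) / (n_c - 1)
    ms_r = n_p * n_c * np.sum((m_r - m) ** 2) / (n_r - 1)
    ms_pc = n_r * np.sum((m_pc - m_p[:, None] - m_c[None, :] + m) ** 2) / ((n_p - 1) * (n_c - 1))
    ms_pr = n_c * np.sum((m_pr - m_p[:, None] - m_r[None, :] + m) ** 2) / ((n_p - 1) * (n_r - 1))
    ms_cr = n_p * np.sum((m_cr - m_c[:, None] - m_r[None, :] + m) ** 2) / ((n_c - 1) * (n_r - 1))

    # Three-way residual (confounded with measurement error)
    resid = (X - m_pc[:, :, None] - m_pr[:, None, :] - m_cr[None, :, :]
             + m_p[:, None, None] + m_c[None, :, None] + m_r[None, None, :] - m)
    ms_pcr = np.sum(resid ** 2) / ((n_p - 1) * (n_c - 1) * (n_r - 1))

    # Solve the expected-mean-squares equations; negative estimates truncated to 0
    v = {"pcr,e": ms_pcr}
    v["pc"] = max((ms_pc - ms_pcr) / n_r, 0.0)
    v["pr"] = max((ms_pr - ms_pcr) / n_c, 0.0)
    v["cr"] = max((ms_cr - ms_pcr) / n_p, 0.0)
    v["p"] = max((ms_p - ms_pc - ms_pr + ms_pcr) / (n_c * n_r), 0.0)
    v["c"] = max((ms_c - ms_pc - ms_cr + ms_pcr) / (n_p * n_r), 0.0)
    v["r"] = max((ms_r - ms_pr - ms_cr + ms_pcr) / (n_p * n_c), 0.0)
    return v

# Usage with hypothetical random scores: 30 participants, 2 cases, 2 raters
rng = np.random.default_rng(0)
print(g_study_pxcxr(rng.normal(size=(30, 2, 2))))

Each component can then be reported as a proportion of their sum, as in the results below.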
Results: The G study attributed the largest proportion of total variance, 25.9% each, to the participant-case and participant-case-rater interactions. The object of measurement (the participants) accounted for 16.1% of the total variance. Although the variances attributed to cases, raters, rating-scale items and the case-rater-item interaction were almost zero, the interactions of participants with those facets were estimated to be greater than zero.
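For reference, these proportions follow from the standard G-theory partition of observed-score variance, assuming a fully crossed participant (p) x case (c) x rater (r) x item (i) design:

\sigma^2(X_{pcri}) = \sigma^2_p + \sigma^2_c + \sigma^2_r + \sigma^2_i + \sigma^2_{pc} + \sigma^2_{pr} + \sigma^2_{pi} + \sigma^2_{cr} + \sigma^2_{ci} + \sigma^2_{ri} + \sigma^2_{pcr} + \sigma^2_{pci} + \sigma^2_{pri} + \sigma^2_{cri} + \sigma^2_{pcri,e}

where \sigma^2_{pcri,e} confounds the four-way interaction with residual error.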
The D study suggests that the number of items in the rating scale has a smaller effect on the reliability coefficient than the number of cases or raters. However, minimizing the number of raters while maximizing the reliability coefficient improves adaptability and implementation potential. A design with two cases, two raters and six items yielded an estimated reliability coefficient of 0.70, which we consider an appropriate threshold; this level of reliability is adequate for formative assessments. Increasing the number of raters and cases to three each while keeping six items raises the estimated reliability to 0.80; however, it is more difficult to implement a curriculum with three raters in practice.
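These trade-offs follow from the generalizability coefficient for relative decisions (standard notation, where n'_c, n'_r and n'_i denote the numbers of cases, raters and items in the intended design):

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pc}/n'_c + \sigma^2_{pr}/n'_r + \sigma^2_{pi}/n'_i + \sigma^2_{pcr}/(n'_c n'_r) + \sigma^2_{pci}/(n'_c n'_i) + \sigma^2_{pri}/(n'_r n'_i) + \sigma^2_{pcri,e}/(n'_c n'_r n'_i)}

Because the dominant error components here involve cases and raters (the participant-case and participant-case-rater interactions, 25.9% each), increasing n'_c and n'_r shrinks the error variance faster than adding items, which is consistent with the estimated coefficient rising from 0.70 to 0.80 when cases and raters increase from two to three.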
Discussion: Our results suggest adequate reliability of the O-GRS when using two cases, two raters and six items for formative assessment of healthcare providers' performance in simulated pediatric resuscitation. This study demonstrates the ability of G theory to quantify multiple sources of error at once and to determine the number of cases and raters required to obtain optimal reliability for assessment.