Optimizing and Diffusing a Handover Behavioral Assessment Tool for Simulation




Chen, Rodney


Content Notes


INTRODUCTION: With multiple simulated and clinical scenarios included in the ongoing Quality Enhancement Plan (QEP), a standardized approach to assessing and trending handover quality across class years could quantify the improvements established through the QEP. This study assesses the utility of the Liang Handover Assessment Tool for Simulation (L-HATS), a valid and reliable behavioral assessment tool tested during the transition to clerkship (T2C) handover module. Here, we use the L-HATS to assess handovers delivered during the residency essentials (RE) and COVID-19 telehealth courses, checking tool reliability in settings other than T2C. Where we find the tool to be less reliable, we optimize the L-HATS by improving the observer training course. The study aims to confirm an intraclass correlation coefficient (ICC) above 0.75, consistent with the reliability found during testing in the T2C module.
METHODS: We select volunteer observers from a group of medical students who had completed the T2C course, and assign each observer a set of videos to score for each activity. The primary outcome measure is the two-way random effects ICC, which represents the tool's inter-rater reliability in each novel activity. An ICC>0.75 is considered good reliability, an ICC of 0.5-0.75 moderate reliability, and an ICC<0.5 poor reliability. As the volunteer observer training improves across activities, we also assess observers' intra-rater reliability, interpreted on the same scale as inter-rater reliability.
RESULTS: RE inter-rater reliability was 0.561 [0.167, 0.953], with each of six observers scoring four videos. COVID-19 telehealth inter-rater reliability was 0.644 [0.244, 0.964], with each of five observers scoring four videos. The intra-rater reliability calculated for the telehealth course ranged from 0.105 [-0.361, 0.863] to 0.667 [0.020, 0.971].
CONCLUSION: This study demonstrates moderate levels of reliability in both the RE and telehealth courses. However, neither novel activity matched the reliability scores calculated during original L-HATS testing, suggesting that the tool is less reliable in settings outside of the T2C course. Future studies might increase the number of graded videos per handover activity to narrow the confidence intervals found in the present study. Moreover, we find that a universally flexible assessment tool is difficult to design, suggesting that each new learning activity may require a uniquely tailored behavioral assessment tool.
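
For readers unfamiliar with the outcome measure, the following is a minimal sketch of how a two-way random effects ICC can be computed from a videos-by-observers score table and mapped onto the good/moderate/poor scale used in the abstract. It assumes the single-measure, absolute-agreement form often written ICC(2,1); the abstract does not state which variant the study used, and it does not describe how the confidence intervals were obtained, so none are computed here. The function names `icc_2_1` and `classify_reliability` and the example scores are hypothetical and are not study data.

```python
def icc_2_1(ratings):
    """Two-way random effects, single-measure, absolute-agreement ICC.

    ratings: list of rows, one row per video, one column per observer.
    ASSUMPTION: the ICC(2,1) variant; the source only says
    "two-way random effects ICC".
    """
    n = len(ratings)        # number of videos (subjects)
    k = len(ratings[0])     # number of observers (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-video mean square
    msc = ss_cols / (k - 1)               # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_reliability(icc):
    """Thresholds as stated in the abstract: >0.75, 0.5-0.75, <0.5."""
    if icc > 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Hypothetical scores: 4 videos (rows) x 5 observers (columns).
    example = [
        [12, 14, 13, 15, 12],
        [18, 17, 19, 18, 16],
        [9, 11, 10, 12, 8],
        [15, 13, 16, 14, 15],
    ]
    value = icc_2_1(example)
    print(round(value, 3), classify_reliability(value))
```

With only four videos per activity, as in the present study, a point estimate from a table of this size would be expected to carry wide uncertainty, which is consistent with the abstract's suggestion to grade more videos per activity.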
