Investigating Inter-rater Reliability in Assessing Social Behavior of Monodelphis domestica

Presenting Author

Bianca A Camacho

Presentation Type

Poster

Discipline Track

Other: Social Behavior

Abstract Type

Research/Clinical

Abstract

Background: Reliable, consistent, and objective data are a goal every study aims to achieve, yet one that many struggle to attain when subjective biases arise between researchers. Inter-rater reliability (IRR) is a statistical measure that quantifies the degree of agreement between researchers qualitatively scoring the same phenomenon. The primary goal of this study is to refine the methodology used to achieve optimal IRR. Using an established ethogram, our team scored the social behavior of the adult gray short-tailed opossum (Monodelphis domestica) to propose an effective method for achieving high IRR, one that can inform future work on data accuracy across disciplines, including clinical research and practice.

Methods: A team of two raters used the Behavioral Observation Research Interactive Software (BORIS) to individually score, from observational video recordings, the social behavior of each pair of Monodelphis domestica subjects against an ethogram of species-typical behaviors. Raters paused throughout each observational video while scoring the subjects' behaviors. After the first scoring session, the team reviewed their respective behavioral scores for the first pair of animals at various timestamps and discussed misunderstandings of the ethogram. The team then individually scored the behavior of the remaining four subjects (two pairs), continuing to pause but without further consultation between raters on IRR. IRR for every subject was measured with Cohen's kappa coefficient, which corrects for the agreement expected to occur between raters by chance.
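
As a minimal illustration of the statistic, the sketch below computes Cohen's kappa, k = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance given each rater's label frequencies. The behavior labels and variable names are hypothetical, not drawn from the study's ethogram or data, and the sketch assumes each rater's BORIS output has been reduced to one behavior code per time bin.

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same sequence of labels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of time bins where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's
    # marginal frequency for that label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical time-binned scores from two raters (illustrative labels only).
rater_1 = ["groom", "groom", "approach", "rest", "rest", "approach"]
rater_2 = ["groom", "rest", "approach", "rest", "rest", "groom"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.3f}")  # kappa = 0.500

In practice, the same value can be obtained from an existing implementation such as sklearn.metrics.cohen_kappa_score.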

Results: Initial scoring sessions for the first pair of subjects yielded low IRR scores (k=0.356, k=0.317). Subsequent scoring sessions for the second pair demonstrated an increase in IRR (k=0.743, k=0.730). Our final scoring sessions for the third pair produced the highest IRR of all sessions (k=0.906, k=0.879). Agreement and IRR scores increased progressively as each rater's familiarity with the ethogram and the BORIS program grew.

Conclusions: By using an animal model to train our research team to improve IRR, our study demonstrated how adjusting a scoring approach to reduce subjectivity can improve consistency in future studies. Reliable evaluations are critical for providing excellent healthcare, decreasing misdiagnoses, and optimizing treatment plans for patients. Examining methods for increasing inter-rater reliability among healthcare providers and scientists can strengthen the standardization of clinical research and practice. A limitation of our study was that the frequent pausing while scoring social behaviors led to variation in the time raters spent scoring each subject; these variations may act as confounding variables and yield potentially inaccurate IRR scores. This limitation underscores the need for further research into IRR toward a more standardized scoring approach. In future investigations, we therefore aim to revise our initial methodology by scoring behavior in real time, decreasing the time spent scoring, and assessing any disparities in IRR scores attributable to the change in methodology.

Academic/Professional Position

Graduate Student

Mentor/PI Department

Neuroscience
