Assessing the Accuracy and Readability of ChatGPT 4.0’s Responses to Common Patient Questions Regarding Periacetabular Osteotomy (PAO)

Posters

Presenting Author

John M Gaddis

Presenting Author Academic/Professional Position

Medical Student

Academic Level (Author 1)

Medical Student

Discipline/Specialty (Author 1)

Orthopedic Surgery

Academic Level (Author 2)

Medical Student

Discipline/Specialty (Author 2)

Internal Medicine

Academic Level (Author 3)

Medical Student

Discipline/Specialty (Author 3)

Otolaryngology (OHNS)

Academic Level (Author 4)

Other

Discipline/Specialty (Author 4)

Orthopedic Surgery

Academic Level (Author 5)

Faculty

Discipline/Specialty (Author 5)

Orthopedic Surgery

Presentation Type

Poster

Discipline Track

Patient Care

Abstract Type

Research/Clinical

Abstract

Aims: This study evaluated the accuracy, comprehensiveness, and readability of responses generated by ChatGPT 4.0 to 30 common patient questions about the Bernese periacetabular osteotomy (PAO).

Methods: Two fellowship-trained orthopaedic surgeons specializing in hip preservation selected 30 questions from a prior study identifying common PAO questions on social media. Each question was entered into ChatGPT 4.0, and the surgeons independently graded each response using an established grading system: “excellent,” “satisfactory requiring minimal clarification,” “satisfactory requiring moderate clarification,” or “unsatisfactory.” Accuracy and comprehensiveness were assessed based on the concordance of response content with current literature. Readability was analyzed by calculating the Flesch-Kincaid Grade Level and Flesch Reading Ease score. Interrater reliability was measured with Cohen's kappa.
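
For context, the readability and agreement statistics described above can be reproduced with standard open-source tools. The Python sketch below is illustrative only, not the authors' pipeline; it assumes the textstat and scikit-learn packages and uses placeholder response texts and reviewer grades.

```python
# Illustrative sketch of readability and interrater-agreement analysis
# for graded ChatGPT responses. Assumes `textstat` and `scikit-learn`;
# texts and grades below are placeholders, not study data.
import textstat
from sklearn.metrics import cohen_kappa_score

responses = [
    "Placeholder ChatGPT answer about PAO recovery time...",
    "Placeholder ChatGPT answer about PAO surgical risks...",
]

# Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
grade_levels = [textstat.flesch_kincaid_grade(r) for r in responses]
# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
reading_ease = [textstat.flesch_reading_ease(r) for r in responses]

# Independent grades from the two reviewers (1 = excellent ... 4 = unsatisfactory)
reviewer_1 = [1, 2]  # placeholder grades
reviewer_2 = [1, 1]
kappa = cohen_kappa_score(reviewer_1, reviewer_2)

print(f"Mean grade level: {sum(grade_levels) / len(grade_levels):.2f}")
print(f"Mean reading ease: {sum(reading_ease) / len(reading_ease):.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```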

Results: Regarding accuracy and comprehensiveness, 96.7% of responses were graded as "excellent" or "satisfactory requiring minimal clarification." One reviewer rated 24 responses (80%) as "excellent," while the second reviewer assigned this rating to 17 responses (56.7%). Of the remaining responses, 6 (20%) and 12 (40%) were rated as "satisfactory requiring minimal clarification" by the first and second reviewers, respectively. Only one response (3.3%) was graded as "satisfactory requiring moderate clarification," and none were rated as "unsatisfactory." Interrater reliability showed moderate agreement (κ = 0.5). Readability analysis revealed a mean Flesch-Kincaid Grade Level corresponding to an 11th-grade reading level (11.07 ± 1.60) and a mean Flesch Reading Ease score in the range requiring college-level reading comprehension (39.89 ± 8.37). Notably, 93.3% of responses required at least a college-level education to comprehend (Grade Level ≥ 12.5 or Reading Ease ≤ 50.0).

Conclusion: ChatGPT 4.0 provided accurate, satisfactory answers to common questions about PAO, with most rated as excellent. However, the advanced reading level may pose comprehension challenges for patients. ChatGPT is a promising educational resource for PAO patients; future iterations should prioritize improving readability without compromising quality.

Included in

Orthopedics Commons
