Breaking Barriers: ChatGPT’s Radiology Exam Triumph and Limitations Unveiled!

Yana Khare 31 May, 2023

In a groundbreaking development, the latest version of ChatGPT has astounded the medical community by passing a rigorous radiology board-style exam. The achievement sheds light on the immense potential of large language models while also highlighting the limitations that still make them unreliable. Two recent studies published in the journal Radiology reveal both the triumphs and the challenges of bringing ChatGPT into radiology.

Rise of ChatGPT in the Medical World

ChatGPT, hailed as the fastest-growing consumer application in history, has gained tremendous traction. Its popularity has been further fueled by the integration of similar chatbots into major search engines such as Google and Bing, potentially revolutionizing how physicians and patients seek medical information. Dr. Rajesh Bhayana, an abdominal radiologist who led the research, explains the significance of ChatGPT’s performance in radiology.

Assessing ChatGPT’s Radiology Expertise

To evaluate ChatGPT’s aptitude in radiology, Dr. Bhayana and his colleagues first tested the most commonly used version, based on GPT-3.5. The researchers designed 150 multiple-choice questions, carefully matching them to the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations.

Understanding ChatGPT’s Performance

The questions posed to ChatGPT contained no images and were grouped by type to probe its capabilities: lower-order thinking questions (knowledge recall and basic understanding) and higher-order thinking questions (applying, analyzing, and synthesizing). The higher-order questions were further divided into subcategories: description of imaging findings, clinical management, calculation and classification, and disease associations.

Results and Limitations of ChatGPT

Overall, ChatGPT based on GPT-3.5 answered 69% of the questions correctly. It performed well on lower-order thinking questions (84%) but struggled with higher-order thinking questions, where it scored only 60%. It had particular difficulty describing imaging findings, with calculation and classification, and with applying concepts. This weakness was not surprising, given that the model has no radiology-specific pretraining.
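
To make the arithmetic behind these figures concrete, here is a minimal Python sketch, not the researchers’ code, that scores a multiple-choice exam by question category and checks the overall result against the 70% passing threshold cited for these exams. The function name `score_exam`, the category labels, and the ten graded answers are hypothetical and made up for illustration; they are not the study’s data.

```python
from collections import defaultdict

PASS_THRESHOLD = 0.70  # the 70% passing threshold cited in the article


def score_exam(graded):
    """Return (overall accuracy, per-category accuracy, passed flag).

    `graded` is a list of (category, is_correct) pairs, one per question.
    This is an illustrative helper, not the study's scoring code.
    """
    totals = defaultdict(int)   # questions seen per category
    correct = defaultdict(int)  # correct answers per category
    for category, is_correct in graded:
        totals[category] += 1
        correct[category] += int(is_correct)

    # Overall accuracy pools every question, so each category is
    # weighted by how many questions it contributes.
    overall = sum(correct.values()) / sum(totals.values())
    per_category = {cat: correct[cat] / totals[cat] for cat in totals}
    return overall, per_category, overall >= PASS_THRESHOLD


if __name__ == "__main__":
    # Hypothetical 10-question mini exam (not study data).
    graded = (
        [("lower-order", True)] * 4 + [("lower-order", False)]
        + [("higher-order", True)] * 2 + [("higher-order", False)] * 3
    )
    overall, per_category, passed = score_exam(graded)
    print(f"overall: {overall:.0%}, passed: {passed}")
    for cat, acc in per_category.items():
        print(f"  {cat}: {acc:.0%}")
```

Because overall accuracy is a question-weighted average of the category scores, it always falls between them. GPT-3.5’s 69% sits closer to its 60% higher-order score than to its 84% lower-order score, which suggests that higher-order questions made up the larger share of the 150.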

Advent of GPT-4: An Improvement in Reasoning Capabilities

In March 2023, GPT-4 was released in limited form to paid users, promising more advanced reasoning capabilities than its predecessor, GPT-3.5. In a follow-up study, GPT-4 correctly answered 81% of the same questions, clearing the 70% passing threshold that GPT-3.5, at 69%, had narrowly missed. Notably, GPT-4 made significant progress on higher-order thinking questions, particularly those involving the description of imaging findings and the application of concepts.

Duality of GPT-4’s Performance

Although GPT-4 improved markedly on higher-order thinking questions, it showed no significant improvement over GPT-3.5 on lower-order thinking questions. In addition, GPT-4 answered 12 questions incorrectly that GPT-3.5 had answered correctly, raising concerns about its reliability for information gathering. Dr. Bhayana said he was surprised by ChatGPT’s accurate and confident answers to some challenging radiology questions, but he also acknowledged that occasional illogical and inaccurate assertions are inherent to how these models work.
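
Finding regressions like these requires a per-question comparison of the two models, because headline scores alone cannot reveal them. Below is a minimal Python sketch of that comparison; the helper `regressions`, the question IDs, and the answers are hypothetical and are not taken from the study.

```python
def regressions(answer_key, old_answers, new_answers):
    """Return IDs of questions the old model got right but the new model got wrong.

    All three arguments map a question ID to a chosen option letter.
    This is an illustrative helper, not the study's analysis code.
    """
    return [
        qid
        for qid, correct in answer_key.items()
        if old_answers.get(qid) == correct and new_answers.get(qid) != correct
    ]


if __name__ == "__main__":
    # Hypothetical five-question example; IDs and answers are made up.
    key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D", "q5": "A"}
    gpt35 = {"q1": "A", "q2": "C", "q3": "D", "q4": "D", "q5": "B"}
    gpt4 = {"q1": "A", "q2": "B", "q3": "B", "q4": "D", "q5": "A"}
    print("regressions:", regressions(key, gpt35, gpt4))
    print("improvements:", regressions(key, gpt4, gpt35))
```

Calling the same helper with the models swapped lists the improvements instead. In this toy example the newer model gains two questions and loses one, so its net score rises even though it regressed on `q2`. This is the same pattern that made the 12 regressions notable despite GPT-4’s higher overall score.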

Unleashing the Potential of ChatGPT

Despite its limitations, ChatGPT’s progress from GPT-3.5 to GPT-4 shows impressive growth potential in radiology. Dr. Bhayana emphasizes that ChatGPT is currently most useful for sparking ideas, helping with the medical writing process, and summarizing data. When it is used for quick information recall, however, its answers must always be fact-checked.

Our Say

ChatGPT’s success in passing a radiology board-style exam has unleashed a wave of excitement within the medical community. Although limitations persist, the leap from GPT-3.5 to GPT-4 points to a promising future for large language models in radiology and beyond. As researchers continue to refine these models, ChatGPT has already begun to reshape medical education and practice.