Written by: Nikita Saha
December 18, 2024
Johns Hopkins University researchers have introduced DREAM, an AI system designed to generate synthetic patient portal messages for training large language models (LLMs).
The tool addresses key challenges in medical AI by balancing patient privacy and data accuracy.
The study was published in the Journal of Biomedical Informatics and highlights the potential of AI-generated data to enhance clinical research and innovation.
DREAM leverages anonymized data to create realistic messages, ensuring HIPAA compliance.
“High-quality synthetic medical data can significantly advance health research and improve patient care,” said Casey Taylor, senior author and associate professor of biomedical engineering at Johns Hopkins.
Taylor noted that DREAM-generated datasets allow AI models to be developed without compromising patient privacy.
Using OpenAI’s GPT-4, the research team crafted synthetic patient messages focused on symptoms and medications. Prompt engineering—a technique for guiding AI to produce specific outputs—was key to generating humanlike, natural messages.
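The study's exact prompts aren't reproduced here; purely as an illustration of the general approach, the sketch below asks GPT-4 for one synthetic portal message through OpenAI's standard Python client. The topic label, urgency wording, and temperature are assumptions made for the example, not the researchers' actual configuration.

```python
# Minimal sketch of the prompt-engineering step (illustrative only).
# The category labels, urgency levels, and prompt wording below are
# placeholders, not the prompts used in the DREAM study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_synthetic_message(category: str, urgency: str) -> str:
    """Ask GPT-4 for one synthetic patient portal message."""
    prompt = (
        "Write a short, realistic patient portal message to a provider. "
        f"Topic: {category}. Urgency: {urgency}. "
        "Do not include any real names, dates, or identifying details."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_synthetic_message("medication refill", "high urgency"))
```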
“We based our prompts on a carefully classified set of real-world messages from patients to providers,” said Natalie Wang, a PhD student and co-author.
The study evaluated 450 AI-generated patient messages using tools like Linguistic Inquiry and Word Count (LIWC) to assess sentiment and politeness.
Results showed that DREAM-generated messages closely mirrored the tone and quality of human-written communications. For instance, prompts indicating "high urgency" resulted in messages reflecting heightened need or concern.
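LIWC itself is licensed software, so as a rough stand-in the sketch below scores sentiment with NLTK's freely available VADER analyzer; the sample messages are invented for illustration, and the scores are not comparable to the study's LIWC results.

```python
# Rough stand-in for the sentiment portion of the evaluation: LIWC is
# licensed software, so this sketch uses NLTK's VADER analyzer instead.
# The example messages are made up for illustration.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

synthetic_messages = [
    "Hi, I'm still having chest pain after starting the new medication. "
    "Could someone call me back today? Thank you.",
    "Just checking whether my refill was sent to the pharmacy. No rush.",
]

for msg in synthetic_messages:
    scores = sia.polarity_scores(msg)  # keys: neg, neu, pos, compound
    print(f"compound={scores['compound']:+.2f}  {msg[:50]}...")
```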
Despite its promise, the study uncovered challenges, including racial bias in synthetic message generation. Messages generated from prompts containing terms such as "Black or African American" often scored lower on politeness and accuracy than those generated with "white."
Researchers emphasized the importance of refined prompt engineering to reduce such biases.
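One way to probe for this kind of effect, sketched below under the assumption of the same OpenAI client as above, is to generate paired messages whose prompts differ only in the stated race term and compare simple surface metrics. The prompt wording and the crude politeness count are illustrative placeholders, not the study's protocol.

```python
# Illustrative probe for prompt-induced bias: generate paired messages that
# differ only in the stated race term, then compare simple text metrics.
# The prompt wording and metrics are placeholders, not the study's protocol.
from openai import OpenAI

client = OpenAI()

def message_for(race_term: str) -> str:
    prompt = (
        f"Write a short patient portal message from a {race_term} patient "
        "asking about a blood pressure medication side effect."
    )
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # keep the comparison as repeatable as possible
    )
    return reply.choices[0].message.content

for term in ["Black or African American", "white"]:
    text = message_for(term)
    politeness_markers = sum(text.lower().count(w) for w in ["please", "thank"])
    print(f"{term:>26}: {len(text.split())} words, "
          f"{politeness_markers} politeness markers")
```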
Looking ahead, the team believes synthetic datasets could streamline medical research by enabling experiments with AI applications while safeguarding patient privacy. "Synthetic data may help build tools that direct patient portal messages to appropriate providers based on urgency and severity," said Taylor.
The researchers also plan to explore applications in genomic medicine, aiming to identify messages relevant to pharmacogenomics specialists.
Additional authors of the study include Ayah Zirikly, Yuzhi Lu, Sukrit Treewaree, Michelle Nguyen, Bhavik Agarwal, Jash Shah, and James Stevenson. DREAM is publicly available on GitHub for further use and research.