Written by: Arti Ghargi
June 3, 2024
Jivi AI, a Gurugram-based healthcare startup specializing in artificial intelligence, has secured the top spot on the Open Medical LLM (Large Language Model) Leaderboard.
Its medical LLM, Jivi MedX, achieved an average score of 91.65 across the leaderboard's nine benchmark categories.
This places Jivi MedX ahead of established names such as OpenAI's GPT-4 and Google's Med-PaLM 2.
Founded by former BharatPe chief product officer Ankur Jain and GV Sanjay Reddy, chairman of Reddy Ventures, Jivi AI aims to leverage generative AI to transform primary healthcare.
Ankur Jain, elaborating on the startup’s mission, said, "Jivi is revolutionizing primary healthcare through generative AI, making top-quality care accessible 24/7 at a fraction of the cost. Our mission is to harness artificial intelligence to enhance patient care. The platform accelerates diagnostics and ensures higher accuracy, enabling timely and precise treatment for all."
The leaderboard, hosted by Hugging Face, the University of Edinburgh, and Open Life Science AI, assesses medical-specific LLMs based on their ability to answer medical questions from exams and research.
Per the startup, Jivi MedX ranked higher than ChatGPT and Med-PaLM 2.
The evaluation encompasses a variety of medical exams, including Indian entrance exams (AIIMS and NEET), the US Medical Licensing Exams (USMLE), and in-depth assessments in clinical knowledge, medical genetics, and professional medicine.
Jivi MedX leverages a massive, proprietary medical dataset for training. This dataset, one of the world's largest, includes millions of medical research papers, journals, clinical notes, and other sources.
The training method uses Odds Ratio Preference Optimization (ORPO), a fine-tuning algorithm that combines instruction tuning with preference alignment in a single step.
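For readers curious about what Odds Ratio Preference Optimization optimizes, the sketch below follows the publicly described formulation of the method: a standard supervised loss on the preferred answer plus a weighted penalty based on the odds ratio between the preferred and rejected answers. It is only an illustration and not Jivi's actual training code, which the startup has not published. The function names, the example log-probabilities, and the weight `lam=0.1` are assumptions made for this example.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)), where p = exp(avg_logp).

    avg_logp is the length-normalized (per-token average) log-probability
    the model assigns to a response, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float,
              lam: float = 0.1) -> float:
    """Illustrative ORPO objective for one (preferred, rejected) pair.

    lam is a hypothetical weighting of the odds-ratio term.
    """
    # Supervised fine-tuning term: negative log-likelihood of the
    # preferred response.
    sft_term = -chosen_avg_logp

    # Log of the odds ratio between preferred and rejected responses.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)

    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))

    return sft_term + lam * odds_ratio_term


# Made-up log-probabilities: the loss is lower when the model already
# favors the preferred answer than when it favors the rejected one.
good = orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
bad = orpo_loss(chosen_avg_logp=-2.0, rejected_avg_logp=-0.5)
print(good < bad)
```

In practice the log-probabilities would come from the language model being trained and the loss would be averaged over a batch of preference pairs; plain floats are used here to keep the example self-contained.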
"Jivi's mission is to make top-of-the-line healthcare available to everyone globally. Being the best LLM in the world gives us confidence as the company prepares Jivi for over a billion people," Jivi AI’s cofounder, Reddy added.
Although it currently has only 20 members, Jivi's team includes physicians, surgeons, AI engineers, and data scientists.
The startup says it focuses on developing technology that will improve healthcare accessibility, affordability, and quality globally.
Medical LLMs (Large Language Models) are advanced AI models trained on large corpora of medical data to achieve high performance on various medical tasks.
They are designed to assist healthcare professionals in medical question-answering, dialogue systems, and text-generation tasks.
Medical LLMs are particularly useful in settings where they can help extract valuable insights from electronic health records (EHRs), medical literature, and patient-generated data.
Google launched Med-PaLM 2 in March 2023. According to the Med-PaLM 2 team, the model scored 85% on medical exam questions (USMLE MedQA), which is comparable to the level of an “expert” doctor.
This is an improvement of 18% over the original Med-PaLM and surpasses similar AI models. Med-PaLM 2 was also evaluated on benchmarks such as MedMCQA and the MMLU clinical topics.
Google’s AI research lab DeepMind also released Med-Gemini, a family of AI models for medical use.
According to Google, Med-Gemini achieved an accuracy of 91.1% on the MedQA (USMLE) benchmark, surpassing the previous best by 4.6%. In multimodal tasks, the models outperformed GPT-4 by an average of 44.5%.
OpenAI's GPT-4, on the other hand, is a general-purpose language model that has been adapted for medical applications and has shown promise in various medical tasks.
Similarly, several medical LLMs have mushroomed in the past few years. These include BioGPT, a language model designed for biomedical text processing and analysis, and BioBERT, a biomedical version of the popular BERT language model, fine-tuned for medical text processing and analysis.
T5 is another general-purpose language model that has been fine-tuned for medical applications and has shown promise in various medical tasks.