Written by: Arti Ghargi
March 29, 2024
On March 3rd, 2024, Tim Brooks, research head at OpenAI's Sora, shared a video on his X (formerly known as Twitter) account.
The 1-minute video begins with an aerial shot of a building surrounded by what looks like a cityscape. The camera then moves inside the building and flies through a passage lined with paintings and art on both walls.
It moves through halls filled with sculptures and statues. To the untrained eye, the video looks like uninterrupted footage of a museum captured by a drone-mounted camera flying through the building, except it isn't.
The video is in fact a product of a one-line prompt: “fly through tour of a museum with many paintings and sculptures and beautiful works of arts in all styles”. The video went viral on the internet and left people around the world awestruck with Sora’s generative AI potential.
Imagine a task that would generally require high-end equipment, manpower, and days of work being rendered in one click and in a matter of minutes, all from a single line of text.
The potential of generative AI has taken the world by storm. From governance to healthcare, industries are racing to integrate AI capabilities into their work to improve efficiency and experience.
Generative AI tools have found multiple use cases in healthcare. Banking on these capabilities, several startups around the world are building solutions to solve healthcare problems.
There are tools that use Gen AI to talk to patients, book appointments, automatically fill prescriptions by reading the documents, and even convert voice instructions to text to ease the documentation burden on physicians.
How can Sora, a tool that turns text into AI-generated video, be embedded in healthcare delivery?
If a picture is worth a thousand words, a video is worth a million. In healthcare, where caregivers deal with patients' lives on a day-to-day basis, communication is an indispensable part of the job.
Clear communication in comprehensible language helps patients better understand their condition and treatment, clears up confusion, and improves trust.
For example, imagine a doctor telling a patient that they have an anterior cruciate ligament (ACL) tear, that their meniscus is damaged, and that they need to undergo surgery.
A layperson with no exposure to medical education can get confused when confronted with medical jargon during a consultation. They might not even know what the anterior cruciate ligament or the meniscus is.
While some doctors do explain the condition through the use of charts and pictures, a video explaining the condition, the course of treatment, and post-treatment care would give a patient more confidence and eliminate any confusion.
Take a look at another case: A patient taking physiotherapy is suggested to do some exercises at home.
How can a doctor ensure that the patient has access to instructions on how to perform these exercises? Text-to-AI video platforms like Sora can create engaging content that explains how to perform a particular exercise step by step, giving patients a video guide they can follow at home.
Through the platform, one can also create “How to use” videos for medical devices or other services.
This can empower small clinics and healthcare organizations that cannot afford a full video production setup to create patient education content in minutes.
You generally find medical students with their heads buried in books hundreds or thousands of pages thick. For those who have just entered the medical education stream, grasping a particular concept might be cumbersome.
This, however, can be made easier by AI video. For example, what if a student could see in 3D what the tegmentum is and what it does, rather than just read about it?
One can run simulations to understand complex topics in various scientific fields of biology or biochemistry.
Surgical training could also benefit immensely from visual representation, which makes the training more interactive. It would allow medical students and practicing surgeons to hone their skills in a realistic virtual setting.
This can bring a substantial change in how medicine is taught. Some startups such as MedVR are already using VR and AI to create simulated environments for medical practice and education.
Beyond these two prominent use cases, text-to-AI video tools can be used to create content in multiple medical settings, especially those that require extensive communication with patients, such as mental health therapy or instruction in self-examination techniques.
While the technology is yet to be made available for public use and leaves a lot of scope for improvement, it can definitely aid healthcare practitioners, trainees and even patients.
Although the technology is promising and revolutionary, one has to also be aware of the challenges associated with it.
The models behind generative AI platforms are trained on vast sets of data. Data is the foundation of AI models, so a generative AI tool is only as good as the data it is trained on.
This also raises questions of privacy and safety.
A recent study published in The BMJ (formerly the British Medical Journal) highlighted the potential risks of using AI assistants or chatbots like ChatGPT for medical information.
It found that the safeguard policies to block the generation of healthcare misinformation were either not robust or could be compromised easily.
With a text-to-AI video platform, the misinformation risk increases manyfold. Bad actors can easily create a hyper-realistic video of a medical professional endorsing a particular medicine or line of treatment, misleading the public.
All these aspects have to be taken into consideration when integrating text-to-AI video capabilities into healthcare.