“(DeepSeek) is the most elegant middle finger to proprietary AI”
Yann LeCun, Chief AI Scientist at Meta
DeepSeek, an AI “juggernaut,” has been incessantly in the news since the story broke a few weeks ago that this new Chinese AI company from Hangzhou (founded by hedge fund manager Liang Wenfeng in 2023) had produced a model claimed to be both robust and inexpensive. Its app became the most downloaded on the iOS App Store. This AI technological surprise, arriving at the interesting time of the Chinese Lunar New Year, sent reverberations through both Silicon Valley and Wall Street (Nvidia’s market capitalization plummeted by several hundred billion dollars). The American AI companies that led the recent large language model (LLM) surge were humbled and had to recalibrate their assessment of presumed dominance in the global AI arms race. Perhaps the term “DeepSeek Sputnik moment” is not entirely hype.
DeepSeek and Its Technological Innovations (Explained). DeepSeek’s triad of complementary AI models, V3, R1, and Janus-Pro-7B (a multimodal reasoning and text-to-image generation model), has at least matched the lofty performance of OpenAI’s latest models (o1 pro and o3-mini). DeepSeek-V3 (introduced December 2024) is a mixture-of-experts (MoE) model that maximizes efficiency without compromising performance (“the powerhouse model”), while its sister model DeepSeek-R1 (introduced a month later in January 2025) incorporates a novel reinforcement learning technique without human supervision to enhance structured reasoning and decision-making (“the reasoning specialist”). The R series model is particularly well suited for tasks that demand deep logical analysis, such as scientific research, clinical cases, and biomedical problems. Both models use a multi-head latent attention (MLA) mechanism, a game-changing alternative to the multi-head attention (MHA) of LLMs like GPT-4o: by caching a compressed latent representation of keys and values rather than the full set, MLA vastly reduces memory usage and computational cost while maintaining performance.
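As a back-of-the-envelope sketch of why MLA matters, one can compare the per-token key-value (KV) cache sizes of standard MHA and MLA. The configuration numbers below are illustrative assumptions for the sake of arithmetic, not DeepSeek’s actual hyperparameters:

```python
# Rough comparison of KV cache size: multi-head attention (MHA) vs.
# multi-head latent attention (MLA). All dimensions are illustrative
# assumptions, not DeepSeek's published configuration.

def kv_cache_elements_mha(n_layers, n_heads, head_dim, seq_len):
    # MHA caches full keys AND values for every head, every token, every layer.
    return n_layers * seq_len * n_heads * head_dim * 2  # x2 for keys + values

def kv_cache_elements_mla(n_layers, latent_dim, seq_len):
    # MLA caches one compressed latent vector per token per layer;
    # keys and values are reconstructed from it on the fly.
    return n_layers * seq_len * latent_dim

mha = kv_cache_elements_mha(n_layers=60, n_heads=128, head_dim=128, seq_len=4096)
mla = kv_cache_elements_mla(n_layers=60, latent_dim=512, seq_len=4096)
print(f"MHA cache elements: {mha:,}")
print(f"MLA cache elements: {mla:,}")
print(f"Reduction factor: {mha / mla:.0f}x")  # 64x under these assumed dimensions
```

Under these assumed dimensions the latent cache is 64 times smaller, which is the kind of saving that lets longer contexts fit on cheaper hardware.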
This impressive set of models is even more laudable given that the estimated $6 million cost of development was significantly less than that of OpenAI’s ChatGPT. The development is also impressive because the United States had restricted the sale of advanced AI chips to China. One can argue that this reduced access to the most capable chips all but forced innovations such as MLA (Ion Stoica, the Romanian-American computer scientist, once said that “constraints and scarcity fuel innovation”). Lastly, all of the DeepSeek models were purportedly “open source” but are in actuality “open weight,” in the sense that not all elements were open (the datasets and data pipeline were not). Nevertheless, a new inflection point seems to have arrived in AI to continue its momentum toward adoption and commoditization.
I have used OpenAI’s ChatGPT 4o, o1 pro, and o3-mini, as well as OpenEvidence and now the DeepSeek models, for clinical cases in pediatric cardiology during clinic sessions. Most of these models have performed well, and the clinical “reasoning” for nuanced cases is at the level of a junior pediatric cardiology attending. Overall, DeepSeek and its congeners to follow will create a new paradigm of more expedient democratization and wider adoption of AI applications in healthcare, with the powerful combination of technical sophistication and economic efficiency.
One dividend of DeepSeek is the reduced cost (purportedly $6 million for training) of deploying AI and LLMs, which will lower the entry costs of AI for health services and companies, especially those relying on low-cost application development. One promising repercussion of this DeepSeek evolution is that smaller, more nimble, and more specialized AI companies (including those in healthcare) could now do more with fewer resources in their AI innovations. Another dividend is efficiency: lower energy and computational demands for good-to-excellent performance. In addition, the open source nature (more precisely, open weight, as mentioned above) of the DeepSeek models gives many healthcare facilities access to more sophisticated AI models and promotes collaboration among healthcare institutions. Furthermore, DeepSeek, with its more sophisticated reasoning framework, is more effective than prior AI models that rely mainly on deep learning for pattern recognition. In other words, AI models that are capable of reasoning (versus inference alone) will be able to help healthcare domain experts even more.
Overall, these positive attributes of DeepSeek and its AI models can provide an economical, efficient, effective, explainable, and equitable (five “E”s for convenience) AI resource to democratize AI for all stakeholders in healthcare, including patients and families, as well as to optimize elements toward the Quintuple Aim. The DeepSeek AI models will be particularly helpful for personalized medicine, drug discovery and development, remote patient monitoring and telehealth, and healthcare administration. In China, the DeepSeek R1 model is already used by the Hangzhou digital healthcare startup ClouDr and its platform to deliver efficiency gains in hospital and pharmacy administrative tasks.
A main concern about DeepSeek use in healthcare is security, as the data one inputs to DeepSeek may be available to the company and its servers, and thereby potentially to the Chinese government. This issue has already led the US to bar DeepSeek from government devices. Another issue to address will be the regulatory aspects of not only security but also quality. The bottom line on DeepSeek and healthcare is to monitor its regulatory compliance and to proceed with caution, while not over-regulating this resource and stunting its innovation.
There will be many discussions about DeepSeek and other AI developments as they relate to healthcare at the annual AIMed meeting (AIMed25) at the Manchester Grand Hyatt in San Diego on November 10-12, 2025.