Its ability is “close to that of attending clinicians at Grade-A tertiary hospitals”, but a number of regulatory and performance hurdles still stand between it and fully replacing them.

TCS, a rare disease with a prevalence of 0.005-0.025% in newborns, was successfully diagnosed by ChatGPT, after Alex, the patient who had the misfortune of suffering from it, had already consulted 17 doctors.

The success of ChatGPT has certainly pushed large AI models into the medical world.

In fact, since May 2023, technology giants at home and abroad, including Google, Nvidia, Amazon, Tencent and JD.com, have raced to stake out positions in medical large models. According to incomplete statistics, as of August 2023, more than 40 medical-related AI models had been released in China alone. The scenarios in which these models have been deployed are similar, mainly intelligent consultation, image diagnosis and knowledge query.

On 19 September 2023, Baidu turned in its own answer on medical large models: the “Spirit Medical Big Model”.

Beyond the overall leap in large model technology, Baidu’s release of its medical large model also caught the tailwind of a policy opening. In response to the new generation of AI technology, the Center for Medical Device Evaluation of the National Medical Products Administration has successively issued documents such as “Key Points for the Review of Deep Learning-Assisted Decision-Making Software” and “Guiding Principles for the Registration Review of Artificial Intelligence Medical Devices (Draft)”. While building a regulatory mechanism, these documents also signal that the deployment of AI-assisted healthcare is being accelerated.

As for Baidu’s confidence in commercialisation, it rests on the company’s earlier customer cooperation cases.

For example, in the cooperation with the People’s Health Publishing House, the Spirit Medical Big Model makes up for the shortcomings of tag-based search, so that doctors and patients can find the relevant knowledge by describing it in natural language.

As another example, Cao Zhanqiang, Deputy Director of the Information Centre of Peking University Stomatological Hospital, mentioned that in addition to showing stronger capabilities in disease diagnosis, the large model can also handle multi-dimensional, complex hospital management data to improve the efficiency and safety of medical work. For example, if a ward has 50 patients, including patients at different stages of hypertension and heart disease, the large model can schedule in advance when doctors should intervene and operate, according to the number of patients and the severity of their conditions.

Baidu wants to max out its skill points: the model should see patients, handle triage management and train doctors

The practical experience above has, on the one hand, accumulated de-identified clinical data for the birth of the Spirit Medical Big Model.

According to the official introduction, the Spirit Medical Big Model was trained on hundreds of billions of tokens in total, mainly covering “doctor - patient - medicine” scenarios. For example, hospital-side data includes de-identified clinical data and knowledge graphs, while patient-side data comes mainly from the health Q&A data accumulated by Baidu Search.

On the other hand, these cooperation cases have also opened up more scenarios for commercialising the Spirit Medical Big Model.

Baidu calls the Spirit Medical Big Model China’s “first industrial-grade” medical large model, because it takes on, in one bite, most of the medical scenarios in which large models can be deployed, including document comprehension, medical record comprehension and medical Q&A. At the same time, Baidu’s commercialisation ambitions are not limited to hospitals; they also extend to other medical scenarios such as pharmaceutical companies, pharmacies and online hospitals.

As for deployment, because of medical insurance reforms in many places, a lot of costs have to be borne by hospital departments themselves, so an expensive medical model would “dissuade” many organisations at the outset.

In response, Baidu tailors its offering to demand, with parameter counts ranging from large to small. It has launched three model services: a flagship version (100 billion parameters), a Lite version (one billion and ten billion parameters) and a customised version (customised on customer data). Depending on how sensitive the customer’s data is, the model is deployed either privately or on the public cloud.


However, combining AI with healthcare, where life and health are at stake, is tantamount to dancing on a knife’s edge.

The resistance to deployment lies, first of all, in gaps in regulatory policy. Previously, You Mao, deputy director of the Health Development Research Centre of the National Health Commission, said in an interview with China Business News that patent data for AI medical devices is mostly concentrated in fields such as “machine learning”, “medical imaging” and “medical technology”, that research in areas such as “natural language processing” and “knowledge base” is relatively insufficient, and that research in the field of “decision rules” is almost blank.

More importantly, the question is whether the capabilities of current medical models are comparable to those of human doctors.

To gauge the capability of medical AI, a “human-machine consistency test” is often essential. The so-called human-machine consistency test has a human and a machine compete on the same task in the same scenario. For example, the diagnostic consistency between MedGPT of Medlink Group and doctors in tertiary hospitals reached 96%; in the medical Q&A test of Google’s Med-PaLM, 92.9% of the answers were comparable to those of clinicians, and 92.9% of the long-form answers were in line with the scientific consensus.
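As a rough illustration of what a diagnostic consistency figure like the 96% above measures, the simplest version is the share of cases in which the model and the doctor reach the same diagnosis. The sketch below assumes exact matching of diagnosis labels; the function name `agreement_rate` and the four sample cases are made up for illustration and do not come from MedGPT, Med-PaLM or Baidu, whose actual evaluation protocols are not described here.

```python
def agreement_rate(model_diagnoses, doctor_diagnoses):
    """Return the fraction of cases where model and doctor give the same diagnosis.

    Assumption: two diagnoses "agree" when their labels match exactly,
    ignoring case and surrounding whitespace. Real evaluations usually
    rely on expert judgement rather than string matching.
    """
    if len(model_diagnoses) != len(doctor_diagnoses):
        raise ValueError("both lists must cover the same cases")
    if not model_diagnoses:
        raise ValueError("at least one case is required")
    matches = sum(
        m.strip().lower() == d.strip().lower()
        for m, d in zip(model_diagnoses, doctor_diagnoses)
    )
    return matches / len(model_diagnoses)


# Hypothetical cases, invented purely for this example.
model = ["hypertension", "gastritis", "migraine", "asthma"]
doctor = ["Hypertension", "gastritis", "tension headache", "asthma"]

rate = agreement_rate(model, doctor)
print(f"diagnostic consistency: {rate:.0%}")  # prints "diagnostic consistency: 75%"
```

A single percentage of this kind says nothing about which cases the model gets wrong or how severe those errors are, which is one reason a high consistency score does not by itself settle whether a model is ready for clinical use.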

Even though some medical large models have reached the level of ordinary doctors in terms of consultation consistency, problems remain when they are deployed in specific diagnosis and treatment scenarios. For example, MedGPT cannot perform physical examinations during a consultation, nor can it give patients more humane care.

The ability of the Spirit Medical Big Model is “close to that of an attending clinician at a tertiary hospital”

From a launch event alone, it is difficult to gauge the true weight of the Spirit Medical Big Model.

“I have also watched a great many large model launches, and my overall impression is that everyone is competing on parameters, on performance and on rankings,” said He Mingke, Senior Vice President of Baidu Group and President of the Baidu Health Business Group, in his opening remarks, pointing out that the domestic large model track has become a “chaotic dance of models”.

So can the performance of Baidu’s Spirit Medical Big Model stand out from this crowd of models? And in the course of deployment, how receptive are medical institutions to large AI models?

The media spoke with Liu Junwei, general manager of the AI industry department of Baidu’s big health business group, and Huang Haifeng, head of research and development of the same department:

  • Q: What are the main sources of training data for the Spirit Medical Big Model? What advantages does Baidu have over other companies in terms of professional data sources?

  • Huang Haifeng: First of all, the data behind the large model covers all three aspects of “doctor, medicine and patient”. This is the advantage of Baidu’s medical large model: it can draw on data from hospitals, online patients and medicines at the same time. Baidu has Baidu Health, Smart Healthcare, and GBI, a medical information data provider, so its data dimensions are more comprehensive.

In terms of data quality, Baidu Search handles 200 million health searches per day. Through the rollout of projects, including our smart healthcare products, we have gained a fuller understanding of the data, and our data governance and data quality control technology also helps ensure that the data used for training is of better quality.

  • Q: So which part of the training data is most central to a medical large model? Do different kinds of data, such as consultation records and clinical data, differ in importance?

  • Huang Haifeng: First of all, there is no standard answer as to which part is more central; different scenarios have different core data. For example, if we want to generate medical records, the most useful data for that task is real medical record data. If the task is popular-science Q&A, the most useful data may be medical dictionaries, doctor-patient dialogues and real online consultation data.

  • Q: Which kinds of training data are scarce in China? Through what channels does Baidu obtain them?

  • Huang Haifeng: We basically have all of these types of data, so our coverage of types is relatively complete. As for access channels, there are both public and private ones. The public ones are Internet data, which we screen strictly because the quality varies. We apply a variety of strategies and use some small models to pre-process, clean and filter the data first.

Some high-quality data, such as data from electronic medical records, has to be strictly de-identified. We may even train small models inside the hospital environment so that the model learns the knowledge there, which means the data never leaves the hospital; only the model does.

In terms of acquisition, the hardest data to obtain is hospital medical records. Other knowledge data, such as the authoritative knowledge data of the People’s Health Publishing House, is actually more crucial for the large model as a whole. Drug-related data is also very important. This year we acquired GBI, which covers 95% of the world’s multinational drug companies, and the data it has accumulated over so many years is a great help to the model.

  • Q: The launch event did not give specific data on the performance of the Spirit Medical Big Model. Has Baidu carried out human-machine consistency evaluations before, for example by putting doctors and the large model in the same scenario and comparing them?

  • Zhu Dongwei (product lead of the AI industry department of Baidu’s big health business group): There are two layers. The underlying Wenxin large model is of course evaluated on general knowledge, while the medical large model is compared with humans. That comparison also has two tiers of evaluators: the first is Baidu’s internal team of doctors, and the second is external doctors from tertiary hospitals.

  • Q: How did it perform?

  • Zhu Dongwei: In our tests, the results were close to the level of attending clinicians at tertiary hospitals.

  • Q: The seriousness of medical care means it has a lower tolerance for error. Compared with applying large models in other fields, will commercialisation in medical scenarios be somewhat more difficult?

  • Liu Junwei: First of all, there is a basic understanding of healthcare: it is itself a public-welfare undertaking, with both commercial attributes and social value.

Many customers now take the initiative to approach us about cooperation. In the past, we had to build good services and good products and then go out looking for partners ourselves. Today we find that many players, whether the pharmacies or even the outpatient clinics just mentioned, are willing to approach Baidu. At the same time, our product line is open for testing, and through these tests customers have found that there is real value, so the path to commercialisation looks good.

  • Q: For the medical industry, how high is the cost of introducing a large model?

  • Liu Junwei: For us, the cost of building something like the Spirit Medical Big Model is, objectively speaking, controllable. As we just said, we already have these capabilities, and the Spirit Medical Big Model can even be trained on idle resources during the pre-training of the Wenxin large model. Other companies, by comparison, have to go from 0 to 1, or have no prior accumulation in large models or even in industry knowledge, so they face a relatively high threshold.

  • Q: How willing are medical institutions to pay for large models at present? Previously, AI image-assisted diagnosis technology was also hard to commercialise because of its high cost.

  • Liu Junwei: Medical institutions fall into several categories. The first is what we think of as traditional public hospitals, which are also exploring this now. Besides Peking University Stomatological Hospital, just mentioned, where we work through a joint project, we are also cooperating with Fudan Zhongshan Hospital through a deployment of our Lite version.

Secondly, regarding demand for medical large models: on the one hand, we see a lot of demand in scientific research scenarios, and on the other hand, we find that large models bring improvements in hospital information systems, for example in generating medical records.

In addition, we see more application space in ToB scenarios such as chain groups and pharmacies. Public hospitals are more representative of serious medical care, while ToB scenarios leave more room for imagination; Internet hospitals, for example, also need triage guidance and pre-consultation. We have also repeatedly stressed that **the large-model product capabilities accumulated in public hospitals can be quickly replicated in ToB hospital scenarios, expanding our room for commercialisation**.

  • Q: Mr. Liu also mentioned that many companies are simply chasing the hype. How can one judge whether a medical AI large model is reliable? Which aspects should we look at?

  • Liu Junwei: I think there are three dimensions for judging whether a company is reliable when it comes to large models.

The first is whether there is an evaluation dataset, and whether the results have been certified by doctors from tertiary hospitals or by authoritative institutions.

The second is whether there is an open product for people to try. Many companies have not actually shown a product and remain at the research and development stage. Today we launched products such as the Spirit Medical BOT, which people can really try out and test.

The third, and most important, is whether there are real customer cases, especially commercial cooperation, which demonstrates recognition from the industry.