Since 2019, roughly $60 billion has gone into using AI to discover drugs. About 175 AI-originated programs have entered human trials. The number approved by the FDA is zero.
A zero this early is not surprising on its own, since it matches the record of traditional drug discovery, but the timing makes it worth attention. Over the next two years, a cluster of the most advanced AI-designed drugs will read out of mid- and late-stage trials. We are reaching the final turn of the race.
We should ask a more nuanced question than “does AI work in drug discovery.” Rather, we should ask which approaches are about to be proven, and which are about to be exposed, because the field is several bets, and they are not equally sound.
Why AlphaFold worked
Start with the one result everyone agrees on. In 2020, DeepMind’s AlphaFold predicted the three-dimensional shape a protein folds into from its amino acid sequence, solving a problem that had resisted biology for fifty years. It later won a Nobel Prize. It worked for the same reason the LLM revolution did: the training data encoded the thing the model was being asked to find.
A protein is, in a precise sense, written in a language. The alphabet is finite, twenty amino acids; each protein is a sequence read in one direction; and the corpus is enormous, more than 250 million sequences. There is a hidden grammar: when a protein folds, positions along the sequence that end up touching tend to mutate together over evolutionary time, leaving a statistical fingerprint. AlphaFold’s architecture, from the same family of models that powers LLMs, was built to read exactly that kind of patterned sequence. The mechanism it was asked to recover was sitting in the data, and it found it.
Compare that to what most of the rest of the field trained on: cell images, the published literature, patient records. These models capture the appearance of a disease with fidelity, but not the mechanism, because the mechanism was never in the data.
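The "mutate together" fingerprint can be made concrete with a toy calculation. The sketch below is not AlphaFold's method; it computes plain mutual information between two columns of a tiny, made-up sequence alignment (the names `toy_msa` and `mutual_information` are illustrative assumptions). Columns that covary score high, and columns that vary independently score zero.

```python
from collections import Counter
from math import log2

def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(column(msa, i))
    count_j = Counter(column(msa, j))
    count_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Made-up alignment: positions 0 and 1 always change together (A with K, G with R),
# position 2 never changes, and position 3 varies independently of position 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

covarying = mutual_information(toy_msa, 0, 1)    # 1 bit: a perfectly coupled pair
independent = mutual_information(toy_msa, 0, 3)  # 0 bits: no coupling
print(covarying, independent)
```

Real alignments need corrections this toy ignores, such as shared ancestry between sequences, but the signal being read is of this kind.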
The horses in the AI drug discovery race
Three groups of competitors emerge when you sort them by the training data underneath.
1. Phenotypic data – The first group trains on data that records how something looks. BenevolentAI is a clear example. Its model is a knowledge graph of scientific literature. In late January 2020 its team used the knowledge graph to identify Eli Lilly’s baricitinib as a COVID-19 treatment, and in 2022 it went public through a significant SPAC merger. Then its lead drug, a topical treatment for eczema, failed a mid-stage trial; the data suggested the molecule worked as designed but the target did not translate to benefit. In March 2025, it delisted from the Euronext Amsterdam exchange. In under three years, a roughly £1.3B valuation became a company taking itself private.
Another example is Recursion’s REC-994 for cerebral cavernous malformation. Recursion used AI analysis of perturbed cell images to select a molecule to repurpose. It met its Phase 2 safety endpoint, but was discontinued in May 2025 after long-term extension data failed to show sustained improvement.
In 2025, several independent benchmarking studies tested the largest cell-based foundation models on the task the whole grouping depends on: predicting how a cell would respond to a perturbation it had not seen in training. The cleanest, published in Nature Methods, found that none of the models beat a trivial baseline that amounted to averaging the responses already in the training data. A model with a hundred million parameters did no better than arithmetic.
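To see how low that bar is, here is a minimal sketch of a mean baseline of the kind described above. The numbers are invented for illustration and are not from the benchmarking studies; `mean_baseline` and `mean_squared_error` are hypothetical helper names. The baseline ignores which perturbation it is asked about and simply returns the average training response.

```python
def mean_baseline(train_responses):
    """Predict the per-gene average of all training responses, for any new perturbation."""
    n = len(train_responses)
    n_genes = len(train_responses[0])
    return [sum(r[g] for r in train_responses) / n for g in range(n_genes)]

def mean_squared_error(pred, truth):
    """Average squared difference between predicted and observed responses."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

# Invented expression changes for three genes under three training perturbations.
train = [
    [1.0, 0.0, -1.0],
    [3.0, 0.0, -1.0],
    [2.0, 3.0, -1.0],
]
# Invented observed response to a perturbation that was held out of training.
held_out_truth = [2.0, 1.0, -2.0]

baseline_pred = mean_baseline(train)
baseline_error = mean_squared_error(baseline_pred, held_out_truth)
print(baseline_pred, baseline_error)
```

A foundation model earns its parameters only if its error on held-out perturbations is lower than `baseline_error` computed this way.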
2. Surrogate data – The second group trains on data that carries the design task itself: structure-to-property relationships are real and learnable, and a closed test-learn loop generates its own labels. Exscientia is a good example. One famous proof point was DSP-1181, designed on its platform for Sumitomo in about 12 months versus the roughly five years such work typically takes. The Phase 1 data was announced in January 2020, and the drug was discontinued in 2021 when it did not meet expected criteria. Its training data encodes “will this molecule bind and behave against the target we chose.” It says nothing about whether that target is the right one for the disease. The company was later absorbed into Recursion, expanding its approach to training data.
Insilico Medicine arguably belongs in both the surrogate and phenotypic data groups. It uses two bodies of training data: phenotypic data (for targeting) and surrogate data (for molecule design). The result, rentosertib, showed very strong safety data in a Phase 2a trial and an encouraging but small early signal from a trial not designed to prove efficacy. The molecule is safe and behaves as designed. The efficacy verdict will depend on whether the AI picked the right target, and that choice came from the weaker data. Note that Insilico’s success was achieved alongside traditional medicinal chemistry.
3. Mechanistic data – The third group of candidates trains on data that encodes the relevant mechanism as AlphaFold did. The families that followed, including ESM, RFdiffusion, and ProteinMPNN, extend the same logic from predicting existing proteins to designing new ones. Antibody design sits on the most language-like data in biology, with effectively unlimited training material from natural and engineered repertoires and a binding mechanism that is sequence-encoded.
The clinical evidence here is earlier but arriving. Generate Biomedicines has two AI-designed antibodies in the clinic: GB-0895, an anti-TSLP antibody entering Phase 3 for severe asthma, and GB-0669, a COVID-neutralizing antibody that reached a previously undruggable target with positive Phase 1 data. Isomorphic Labs, built directly on AlphaFold, has pushed its first-in-human trials to the end of 2026 and is not yet dosing patients, but it is the cleanest test of the original mechanism in a drug-design setting.
None are yet at the point where the companies named earlier sit. The comparison is only structural so far. The companies above placed drugs into trials using training data that did not carry the mechanism the models needed to learn. Ultimately, the training data must encode whether the drug works in the human body, in the context of disease. This dataset is elusive.
For this group, a designed antibody still has to behave as a drug in a human, and the next two years will test that, but the structural question is already answered correctly. That is a different starting position.
The final stretch
None of this means AI failed in drug discovery. Most of these stories are not over, and AlphaFold shows how high the ceiling goes. The field placed several bets, and so far the one that paid was where the training data carried the mechanism.
When evaluating the AI race, the instinct is to evaluate the model: its size, its benchmarks, its demos. The first question should be whether the data underneath the model encodes the thing you are asking the model to predict. If you want a tool to predict which patients respond to a therapy, ask whether the training data holds the causal link between treatment and response, or only records who got what.
The AI drugs that have made it to trials fare better than the industry standard in Phase 1, a safety gate, but no better in the later stages where efficacy is tested, though the sample is still small. Their training data did not encode that success. This phase-gate discrepancy is the difference between reversing Eroom’s Law and accelerating it. If AI simply generates candidates faster, it only speeds up the rate at which companies reach expensive mid-stage failures. To actually reverse Eroom’s Law, AI must increase the probability of success, not just the velocity of discovery.
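The arithmetic behind that distinction is short. The sketch below uses invented phase success rates, chosen only for illustration and not taken from any industry dataset; the function names are hypothetical. It counts how many Phase 2 failures a pipeline absorbs for each eventual approval.

```python
def approval_probability(p1, p2, p3):
    """Chance that a candidate entering Phase 1 clears all three phases."""
    return p1 * p2 * p3

def phase2_failures_per_approval(p1, p2, p3):
    """Expected number of Phase 2 failures absorbed for each approved drug."""
    phase2_failures = p1 * (1.0 - p2)  # reached Phase 2, then failed there
    return phase2_failures / approval_probability(p1, p2, p3)

# Invented rates: (Phase 1, Phase 2, Phase 3) success probabilities.
traditional = (0.5, 0.3, 0.6)
better_phase1_only = (0.8, 0.3, 0.6)  # safer molecules, same efficacy odds
better_phase2 = (0.5, 0.5, 0.6)       # better odds where efficacy is tested

for name, rates in [("traditional", traditional),
                    ("better Phase 1 only", better_phase1_only),
                    ("better Phase 2", better_phase2)]:
    print(name, round(phase2_failures_per_approval(*rates), 2))
```

Under these assumptions the Phase 1 rate cancels out of the ratio: passing the safety gate more often, or feeding candidates in faster, leaves the number of mid-stage failures per approval unchanged. Only a higher success rate in the efficacy phases lowers it.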
As the horses race to the finish over the next two years, the outcome will be read as a verdict on AI in medicine. It is better read as a verdict on data. Where the training data carried the mechanism, the results will hold. Where it carried only the appearance, no model will recover what was never there.
Andrew Ryscavage is the founder of Brinton Bio. He spent two decades inside life sciences organizations, with earlier work at the NIH and Monitor Deloitte. He writes about where artificial intelligence is and is not changing how medicines get made. He is based in Tampa, FL and in Bethesda, MD.
This post appears through the MedCity Influencers program.
