How we built an AI pipeline for antibiotic discovery

In this post, I explain how we connected several AI tools into one workflow for antibiotic discovery, from target selection to compound prioritization.

The problem

Antibiotic resistance is not just a scientific challenge. It also challenges the way we organize discovery. We now have many computational tools that promise to help, from protein structure prediction to generative chemistry, but we still lack clear guidance on how to connect them into a discovery process that people can actually use.

The idea

In this project, we tackled that gap directly. Instead of presenting yet another isolated model, we asked what an end-to-end AI-guided antibiotic pipeline should look like if we want to move from target selection to a realistic set of candidate compounds.

How it works

We began by looking across predicted proteomes from multiple pathogens to identify targets that are conserved, essential, and without close human counterparts. We then compared six 3D-structure-aware generative models from different families, including diffusion, autoregressive, graph neural network, and language-model approaches. From there, we kept the workflow deliberately practical: we used post-processing filters and commercial analogue searches to reduce more than 100,000 generated compounds to a much smaller set that we could realistically synthesize.
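To make the filtering step concrete, here is a minimal sketch of what a post-generation funnel can look like. The property names, thresholds, and composite ranking below are illustrative assumptions, not the actual criteria or values used in the paper:

```python
# Hypothetical post-generation filtering funnel.
# All property names and cutoffs are illustrative, not the paper's criteria.

def passes_filters(compound):
    """Apply simple drug-likeness-style cutoffs (illustrative values)."""
    return (
        150 <= compound["mol_weight"] <= 600   # avoid fragments and very large molecules
        and compound["logp"] <= 5              # rough lipophilicity limit
        and compound["synth_score"] <= 6       # ease-of-synthesis estimate; lower is easier
    )

def prioritize(compounds, top_n=50):
    """Filter generated compounds, then rank survivors by a score."""
    survivors = [c for c in compounds if passes_filters(c)]
    survivors.sort(key=lambda c: c["docking_score"])  # more negative = better
    return survivors[:top_n]

generated = [
    {"id": "cand-1", "mol_weight": 320, "logp": 2.1, "synth_score": 3.5, "docking_score": -8.2},
    {"id": "cand-2", "mol_weight": 710, "logp": 6.3, "synth_score": 7.0, "docking_score": -9.1},
    {"id": "cand-3", "mol_weight": 410, "logp": 4.2, "synth_score": 4.8, "docking_score": -7.4},
]
shortlist = prioritize(generated, top_n=2)
print([c["id"] for c in shortlist])  # → ['cand-1', 'cand-3']
```

The point of the sketch is the shape of the funnel: cheap, strict filters run first over the full generated set, and a ranking step only ever sees the survivors, which is what makes reducing 100,000+ candidates to a synthesizable shortlist tractable.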

What the paper showed

We found that DeepBlock and TamGen were the strongest overall performers across the criteria we examined, but we also found that no single model solves the whole problem. Some methods are more sophisticated but harder to use well. Others are easier to deploy but weaker on chemical or biological relevance. That makes this paper useful not just as a benchmark, but as a map of the trade-offs we face when we try to turn AI promises into decisions.

Why it matters

I think that broader view matters because early-stage antibiotic discovery often suffers from fragmentation. We can easily end up with many impressive components and very little guidance on how to combine them. In this work, we shifted the focus from isolated performance claims to pipeline design: what we should prioritize, where the bottlenecks are, and how we can move from a large virtual search space toward something that could matter in practice.

Limits

We are still describing an early-stage blueprint, not a finished discovery engine. Our conclusions depend on the quality of the target-selection logic, generative models, and filtering steps underneath the pipeline, and all of those components will keep evolving. But that is part of the point: in this paper, we tried to make the workflow itself easier to see and judge.

Read the paper

Schuh, M. G.; Hesse, J.; Sieber, S. A. arXiv 2025. DOI: 10.48550/arXiv.2504.11091
