Sequence-based drug–target interaction prediction

How we used self‑supervised learning + gradient boosting on 1D molecule and protein sequences to predict drug–target interactions—fast, interpretable, and state‑of‑the‑art.

3 min read

The one‑line problem #

Predicting drug–target interactions (DTIs) is central to early drug discovery: does a candidate molecule likely bind the protein we care about? Many modern models need high‑quality 3D structures or large bespoke datasets. In reality, those are scarce. We asked: can we get strong predictions using only one‑dimensional inputs—SMILES for the molecule and the amino‑acid sequence for the protein?

The idea in plain English #

We built BarlowDTI, a two‑step approach:

  1. Learn a shared “language” of drug–target pairs with a self‑supervised Barlow Twins objective.
  2. Predict interactions using a compact gradient‑boosted model trained on the learned embeddings.

This approach builds on the TwinBooster architecture, refined for the DTI task.

Why 1D? Because it’s cheap, fast and widely available. Sequences exist for nearly every protein, and SMILES come with every compound. We also tap into structure‑aware protein language models that encode subtle 3D hints directly from sequence.

What we actually built #

  • Molecule view: SMILES → ECFP fingerprints.
  • Protein view: Amino‑acid sequence → embeddings from a 3D‑aware protein language model.
  • Self‑supervised alignment: a Siamese setup with the Barlow Twins loss encourages invariant yet non‑redundant features across the two views.
  • Gradient‑boosted head: we extract embeddings and train XGBoost—fast, robust, frugal.
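The two‑step pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: random arrays stand in for ECFP fingerprints and protein‑language‑model embeddings, simple concatenation stands in for the learned Barlow Twins encoder, and scikit‑learn's `GradientBoostingClassifier` stands in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-ins for the two 1D views: ECFP-like bits for the molecule,
# language-model-like embeddings for the protein (random here).
n_pairs = 200
mol_fp = rng.integers(0, 2, size=(n_pairs, 64)).astype(np.float32)
prot_emb = rng.normal(size=(n_pairs, 32)).astype(np.float32)
labels = rng.integers(0, 2, size=n_pairs)  # binds / does not bind

# Step 1 (placeholder): a frozen encoder trained with the Barlow Twins
# objective would map each pair into a shared embedding space.
# Here we simply concatenate the two views.
pair_embedding = np.concatenate([mol_fp, prot_emb], axis=1)

# Step 2: a compact gradient-boosted head on the pair embeddings.
head = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
head.fit(pair_embedding, labels)
scores = head.predict_proba(pair_embedding)[:, 1]  # interaction probabilities
```

The split matters in practice: the expensive representation learning happens once, while the boosted head retrains in seconds, which keeps experimentation cheap.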

For readers who enjoy a formula, the Barlow Twins objective nudges the cross‑correlation matrix \(\mathcal C\) between the two views towards the identity, with \(\lambda\) weighting the off‑diagonal (redundancy‑reduction) term:

$$\mathcal{L}_{BT} = \sum_i (1 - \mathcal{C}_{ii})^2 + \lambda \sum_{i}\sum_{j\ne i} (\mathcal{C}_{ij})^2.$$
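The loss is short enough to write out in NumPy. This sketch standardizes each view over the batch, forms the cross‑correlation matrix \(\mathcal C\), and penalizes its deviation from the identity; the batch size, dimensions, and default \(\lambda\) below are illustrative, not the paper's settings.

```python
import numpy as np

def barlow_twins_loss(za, zb, lam=5e-3):
    """Barlow Twins loss for two batches of embeddings of shape (batch, dim).

    Diagonal of the cross-correlation matrix is pushed towards 1
    (invariance across views); off-diagonal towards 0 (redundancy reduction).
    """
    n = za.shape[0]
    za = (za - za.mean(0)) / za.std(0)       # standardize per dimension
    zb = (zb - zb.mean(0)) / zb.std(0)
    c = za.T @ zb / n                        # cross-correlation matrix C
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)            # identical views: near zero
loss_shuffled = barlow_twins_loss(z, z[rng.permutation(256)])  # misaligned: large
```

For identical views the diagonal of \(\mathcal C\) is exactly 1, so only the small off‑diagonal correlations of a random batch contribute; shuffling one view destroys the diagonal and the loss jumps.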

How we tested it #

We evaluated across common DTI benchmarks and splits (e.g. BioSNAP, BindingDB, DAVIS, including Kang et al. splits). BarlowDTI achieved state‑of‑the‑art performance using only 1D inputs.

To understand what drives predictions, we adapted an influence‑function style analysis to trace impactful training pairs. Comparing against co‑crystal structures, the model’s focus aligns with catalytically active and stabilising residues—evidence that the sequence‑based signal captures meaningful interaction biology.
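The idea behind tracing impactful training pairs can be illustrated with a brute‑force leave‑one‑out baseline: retrain without each training example and record how much a probe prediction moves. Influence functions approximate this without retraining; the sketch below is that naive baseline on synthetic data, not the authors' method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = (X[:, 0] + 0.2 * rng.normal(size=60) > 0).astype(int)
x_probe = X[:1]  # the pair whose prediction we want to explain

base = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)
base_score = base.predict_proba(x_probe)[0, 1]

# Leave-one-out "influence": how much does dropping training pair i
# change the probe prediction?
influence = np.empty(len(X))
for i in range(len(X)):
    keep = np.arange(len(X)) != i
    m = GradientBoostingClassifier(n_estimators=30, random_state=0)
    m.fit(X[keep], y[keep])
    influence[i] = base_score - m.predict_proba(x_probe)[0, 1]

top = np.argsort(-np.abs(influence))[:5]  # most influential training pairs
```

In the DTI setting, the interesting step is the one after this: checking whether the highly influential pairs (and the residues they implicate) agree with known co‑crystal structures.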

Finally, for real‑world use we trained BarlowDTI XXL on >3.6 million curated DTI pairs and showed strong early enrichment in retrospective virtual‑screening case studies.

Why this matters #

  • Accessible inputs: No crystal structures required; works from SMILES and sequence.
  • Compute‑friendly: Deep learning where it helps (representation), boosting where it counts (prediction).
  • Interpretable signals: Training‑example influence and residue‑level hints help build trust.
  • Practical tooling: There’s a web interface for quick experiments and teaching.

Who might find this useful #

  • Medicinal chemists needing a ranked shortlist before committing to assays.
  • Computational teams building rapid, scalable virtual screening workflows.
  • Method developers interested in self‑supervision and protein language models for DTI.

Limitations (and what’s next) #

  • Sequence‑only methods can miss cases where precise 3D geometry is decisive; if reliable structures exist, structure‑based approaches may edge ahead.
  • Performance relies on embedding quality and dataset curation; biases in public DTI sets can leak into models.
  • We’d love to see prospective (live) tests and broader community baselines on new targets.

Read the paper #

Schuh, M.G., Boldini, D., Bohne, A.I. et al. J Cheminform 17, 18 (2025). https://doi.org/10.1186/s13321-025-00952-2

Try it #

Web demo: bio.nat.tum.de/oc2/barlowdti


TL;DR #

  • What we did: Built BarlowDTI, a 1D DTI model that fuses SMILES and protein sequences via Barlow Twins and a gradient‑boosted predictor; scaled up to BarlowDTI XXL.
  • Why it matters: Delivers state‑of‑the‑art DTI predictions with cheap inputs, modest compute, and useful interpretability.
  • What it’s for: Rapid virtual screening, target triage, and as a strong baseline when structural data are limited.