PSI-AI in Rome: Setting the scaffolds of AI-readiness

A recap of the HUPO-PSI Spring Meeting 2026 sessions on AI-readiness

community
workshop
A recap of the HUPO-PSI Spring Meeting in Rome, where the AI-readiness Working Group discussed quantification formats, SDRF metadata, MIAPE-AI guidelines, and ethics across the proteomics data lifecycle.
Authors
Affiliations

Tine Claeys

VIB-UGent Center for Medical Biotechnology, VIB, Belgium

Department of Biomolecular Medicine, Ghent University, Belgium

Ralf Gabriels

VIB-UGent Center for Medical Biotechnology, VIB, Belgium

Department of Biomolecular Medicine, Ghent University, Belgium

Samuel Wein

University of Tübingen, Germany

OpenMS

Published

May 27, 2026

Introduction

The HUPO-PSI (Deutsch et al. 2023) AI-readiness Working Group joined the most recent Spring Meeting in Rome, where the full week was devoted to the many facets of AI in proteomics. AI is no longer an isolated corner of the conference program but has become one of the most prominent and fastest-evolving parts of our field, shaping how we process spectra, extract metadata, benchmark tools, and arrive at diagnostic conclusions. The Spring Meeting was an opportunity for the Working Group to decide how best to support and guide responsible AI development within this landscape.

Group picture of the HUPO-PSI Spring Meeting 2026, 5-8 May 2026, Rome (IT)

A recap of the PSI-AI sessions

Hands-on training in reprocessing public proteomics data

The week opened with the Education Day, a new format aimed at introducing PSI standards to newer members. The AI-readiness Working Group used the day for two things: a morning session on the core concepts of reprocessing public proteomics data, including the standards and analytical hurdles involved, followed by an afternoon of hands-on technical and biological tutorials that translated the conceptual material into actionable code. The materials will be shared openly and reused at the HUPO pre-congress workshop, which means the audience for this work extends well beyond the people who happened to be in the room.

Note: Hands-on tutorials

Try out the hands-on tutorials from the Education Day and learn how PSI standards can be used in AI workflows for peptide property prediction:

  1. From search results to a spectral library (mzSpecLib, mzPAF, ProForma, and PSI-MOD)
  2. Retention time prediction (mzSpecLib, ProForma)
  3. Fragment intensity prediction with a BiLSTM (mzSpecLib, mzPAF, USI, and PROXI)
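To give a flavor of how a PSI notation feeds a prediction model, here is a minimal sketch (not taken from the tutorials themselves) that splits a simple ProForma peptidoform string into residues and modifications, the kind of preprocessing that typically precedes sequence encoding. It assumes only residue-level bracketed modifications; terminal modifications, ranges, charge states, and the rest of the ProForma specification are deliberately not handled. The function names `tokenize_proforma` and `stripped_sequence` are illustrative, not part of any PSI library.

```python
import re
from typing import List, Optional, Tuple

# One uppercase residue, optionally followed by a single bracketed modification.
# Simplification: no terminal modifications, ranges, or charge states.
_TOKEN = re.compile(r"([A-Z])(?:\[([^\[\]]+)\])?")


def tokenize_proforma(peptidoform: str) -> List[Tuple[str, Optional[str]]]:
    """Split a simple ProForma string into (residue, modification) pairs."""
    tokens = []
    pos = 0
    while pos < len(peptidoform):
        match = _TOKEN.match(peptidoform, pos)
        if match is None:
            raise ValueError(
                f"Unsupported ProForma syntax at position {pos}: {peptidoform!r}"
            )
        tokens.append((match.group(1), match.group(2)))
        pos = match.end()
    return tokens


def stripped_sequence(tokens: List[Tuple[str, Optional[str]]]) -> str:
    """Return the unmodified amino acid sequence."""
    return "".join(residue for residue, _ in tokens)


tokens = tokenize_proforma("EM[Oxidation]EVEES[Phospho]PEK")
print(stripped_sequence(tokens))  # EMEVEESPEK
# Zero-based positions of modified residues
print([(i, mod) for i, (_, mod) in enumerate(tokens) if mod])
# [(1, 'Oxidation'), (6, 'Phospho')]
```

For real workflows, a dedicated ProForma parser is the safer choice, since the full notation covers far more cases than this regular expression.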

AI as a new layer on existing PSI standards

The meeting itself opened with a joint coordinating session, and the value of that format became clear almost immediately. AI is now present across the entire experimental workflow, which means the Working Group needs to engage closely with the other PSI Working Groups and treat AI as a new layer on top of existing standards rather than a parallel track. That principle keeps the work moving faster and preserves interoperability.

Adapting AnnData for proteomics with PSI-AI and scverse

Nowhere was this more visible than in the quantification format discussion. The proposal builds on AnnData (Virshup et al. 2024), the format the scverse community developed for single-cell omics, and adapts it for proteomics quantification. The division of labor is deliberate: PSI-AI defines the ontology and validation rules, while scverse contributes the underlying implementation. Feedback in the room was positive, particularly on the use of controlled vocabularies and on linking SDRF with AnnData so that provenance travels with the data, going beyond raw measurements to capture the statistical and biological transformations and conclusions drawn from them. LinkML emerged as a candidate for handling the unstructured section of AnnData, and the next steps focus on defining the specification, the validator, and reference implementations.
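The idea of provenance traveling with the data can be illustrated with a small, standard-library-only sketch. It is not the proposed specification (which is still to be defined) and does not use the AnnData library; it only shows the kind of check a validator might perform: that every run in a quantification matrix has a matching SDRF row and that a metadata column uses controlled terms. The SDRF rows, the allowed organism list, and the `validate` function are all hypothetical.

```python
import csv
import io

# Hypothetical SDRF excerpt: column names follow SDRF conventions, rows are invented.
SDRF_TSV = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\trun1.raw\n"
    "sample 2\tHomo sapiens\trun 2\trun2.raw\n"
)

# Hypothetical controlled vocabulary; a real validator would consult an ontology.
ALLOWED_ORGANISMS = {"Homo sapiens", "Mus musculus"}


def load_sdrf(text):
    """Parse tab-separated SDRF text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def validate(matrix, run_names, sdrf_rows):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if len(matrix) != len(run_names):
        problems.append("matrix row count does not match number of runs")
    known_runs = {row["assay name"] for row in sdrf_rows}
    for run in run_names:
        if run not in known_runs:
            problems.append(f"run {run!r} has no SDRF row")
    for row in sdrf_rows:
        organism = row["characteristics[organism]"]
        if organism not in ALLOWED_ORGANISMS:
            problems.append(
                f"organism {organism!r} is not in the controlled vocabulary"
            )
    return problems


# Rows are runs (observations), columns are proteins (variables),
# mirroring the observations-by-variables layout of AnnData.
quant = [[10.2, 8.1, 0.0], [9.8, 7.9, 1.3]]
runs = ["run 1", "run 2"]
print(validate(quant, runs, load_sdrf(SDRF_TSV)))  # []
```

Returning a list of problems rather than raising on the first one lets a submitter see every issue in a single pass, which is a common design for format validators.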

mzPeak is a faster alternative to mzML for upstream data access

If quantification was about giving downstream analysis a stable home, the mzPeak (Van Den Bossche et al. 2025) session was about making upstream access faster. This effort, a core focus of the PSI-MS group, already shows substantial speed improvements over mzML, and implementations are appearing across multiple programming languages. The next step is to formalize the standard, and vendor engagement will be actively pursued, since input from instrument manufacturers will be essential to ensure widespread adoption.

Automated metadata extraction from raw files and publications

Standards are only as useful as the metadata that travels with them, which is why the SDRF (Dai et al. 2021) session drew so much attention. HAMLET, an agentic pipeline that extracts structured metadata directly from raw files and accompanying publications, was introduced as a way to close the gap between the data that exists in public repositories and the metadata that should accompany it. The framework is tunable through prompts and ontologies, but the design choice that matters most is the emphasis on normalization rather than on the LLM itself. An essential point going forward will be a clear separation between human-annotated and AI-generated metadata files, so that downstream users always know which is which.

Toward a practical checklist for AI in proteomics

That naturally led into the benchmarking and MIAPE-AI session, where ProteoBench (Devreese et al. 2025) and the ProteomicsML (Rehfeldt et al. 2023) index of existing methods and workflows set the stage. The group identified the AI tasks that currently appear most often in proteomics and began outlining MIAPE-AI, starting from the DOME guidelines. The ambition is pragmatic: a checklist that developers and reviewers can both use, with topic-specific flavors, much like the templates SDRF has spawned for different experimental designs. Retention time prediction, de novo sequencing, and biomarker discovery each deserve their own profile, and existing MIAPE documents will be revisited to incorporate AI-specific considerations rather than being displaced by new ones.

Responsible AI across the proteomics data lifecycle

Ethics is an enormous topic, so the group approached it through the data lifecycle, since each step carries its own ethical concerns. Patient-level questions such as cohort selection, informed consent, and identifiability sit on one side, while data-level concerns such as encryption, metadata practices, and FASTA file handling sit on the other. Modeling-stage discussion centered on the DOME guidelines, and a viewpoint paper capturing this landscape is now in preparation.

HUPO-PSI Spring Meeting 2026, 5-8 May 2026, Rome (IT)

Looking ahead…

Three concrete deliverables are taking shape: the quantification format, the ethics viewpoint, and MIAPE-AI. Alongside those, the group is exploring formal collaborations with ELIXIR, the HUPO AI initiative, and other adjacent groups, with a longer-term goal of producing a PSI-powered, AI-ready proof-of-concept dataset that demonstrates the full integrated pipeline in practice. Four new members have joined the Working Group since the previous meeting, and the meeting cadence remains the same: every two weeks on Tuesdays from 17:00 to 18:00 Central European Time.

If any of this resonates, get in touch at . There is more than enough work to go around.

References

Dai, Chengxin, Anja Füllgrabe, Julianus Pfeuffer, et al. 2021. “A Proteomics Sample Metadata Representation for Multiomics Integration and Big Data Analysis.” Nature Communications 12 (1). https://doi.org/10.1038/s41467-021-26111-3.
Deutsch, Eric W., Juan Antonio Vizcaíno, Andrew R. Jones, et al. 2023. “Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work.” Journal of Proteome Research 22 (2): 287–301. https://doi.org/10.1021/acs.jproteome.2c00637.
Devreese, Robbe, Caroline Jachmann, Bart Van Puyvelde, et al. 2025. ProteoBench: The Community-Curated Platform for Comparing Proteomics Data Analysis Workflows. December. https://doi.org/10.64898/2025.12.09.692895.
Rehfeldt, Tobias G., Ralf Gabriels, Robbin Bouwmeester, et al. 2023. “ProteomicsML: An Online Platform for Community-Curated Data Sets and Tutorials for Machine Learning in Proteomics.” Journal of Proteome Research 22 (2): 632–36. https://doi.org/10.1021/acs.jproteome.2c00629.
Van Den Bossche, Tim, Theodore Alexandrov, Aivett Bilbao, et al. 2025. “mzPeak: Designing a Scalable, Interoperable, and Future-Ready Mass Spectrometry Data Format.” Journal of Proteome Research 24 (11): 5329–35. https://doi.org/10.1021/acs.jproteome.5c00435.
Virshup, Isaac, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, and F. Alexander Wolf. 2024. “Anndata: Access and Store Annotated Data Matrices.” Journal of Open Source Software 9 (101): 4371. https://doi.org/10.21105/joss.04371.

Citation

BibTeX citation:
@online{claeys2026,
  author = {Claeys, Tine and Gabriels, Ralf and Wein, Samuel},
  title = {PSI-AI in {Rome:} {Setting} the Scaffolds of {AI-readiness}},
  date = {2026-05-27},
  url = {https://www.psi-ai.org/blog/posts/2026-05-27-psi-spring-meeting/},
  langid = {en-US}
}
For attribution, please cite this work as:
Claeys, Tine, Ralf Gabriels, and Samuel Wein. 2026. “PSI-AI in Rome: Setting the Scaffolds of AI-Readiness.” May 27. https://www.psi-ai.org/blog/posts/2026-05-27-psi-spring-meeting/.