FASTA files, generated from data provided by

  1. Homo sapiens:
  2. Mus musculus:
  3. Rattus norvegicus:

These FASTA files contain the redundant list of microprotein sequences, based on the source data.

Microproteins in MHC type I experiments, generated using sequences

  1. Homo sapiens:

The zip file contains a FASTA file with the non-redundant microprotein sequences and a TSV file containing a mapping between the non-redundant sequence identifiers and the data set protein identifiers.

An example microprotein FASTA entry:

>msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00


  1. msorf|000000003| – the unique microprotein sequence identifier;
  2. sorfs: dataset=30 – the smORF RNA sequence has been identified in 30 datasets;
  3. gpmdb: dataset=29 psm=105 – the microprotein was identified in 29 gpmDB MHC type I LC/MS/MS runs by 105 PSMs; and
  4. overlap: ensembl=0.00 – the fraction of PSMs that also match ENSEMBL proteins.
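
The header fields described above can be pulled apart programmatically. The following is a minimal Python sketch, assuming every header follows the layout of the example entry (an identifier, then section labels ending in a colon, each followed by key=value pairs). The function name parse_msorf_header and the shape of the returned dictionary are illustrative choices, not part of the data set.

```python
import re


def parse_msorf_header(header: str) -> dict:
    """Parse one microprotein FASTA header line into a nested dictionary.

    Assumes the layout of the example entry: an identifier such as
    'msorf|000000003|' followed by sections like 'gpmdb: dataset=29 psm=105'.
    """
    text = header.lstrip(">").strip()

    # The identifier runs up to and including the second pipe character.
    match = re.match(r"(msorf\|\d+\|)\s*(.*)", text)
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    identifier, rest = match.groups()

    result = {"id": identifier}
    section = None
    for token in rest.split():
        if token.endswith(":"):
            # A token such as 'sorfs:' opens a new section.
            section = token[:-1]
            result[section] = {}
        elif "=" in token and section is not None:
            key, value = token.split("=", 1)
            # Values with a decimal point are fractions; the rest are counts.
            result[section][key] = float(value) if "." in value else int(value)
    return result


example = ">msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00"
parsed = parse_msorf_header(example)
print(parsed["gpmdb"]["psm"])
```

A parsed header can then be used, for example, to keep only entries whose overlap with ENSEMBL proteins is zero by checking parsed["overlap"]["ensembl"] == 0.0.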