FASTA files, generated from data provided by sorfs.org

  1. Homo sapiens: hs_sorf.zip
  2. Mus musculus: mm_sorf.zip
  3. Rattus norvegicus: rn_sorf.zip

These FASTA files contain the redundant list of microprotein sequences derived from the sorfs.org data, i.e. identical sequences from different sorfs.org entries are not merged.
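The archives can be read without unpacking them first. The following is a minimal Python sketch, not part of the distribution: it assumes each zip holds plain-text FASTA members recognisable by a .fa, .fasta or .faa extension (the member names are not documented here), and the helper names read_fasta and read_fasta_from_zip are hypothetical.

```python
import io
import zipfile
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines.

    The header is returned without the leading '>'; sequences spread over
    several lines are joined; lines before the first header are ignored.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_from_zip(path, suffixes=(".fa", ".fasta", ".faa")):
    """Yield (member_name, header, sequence) for each FASTA member of a zip.

    Assumption: FASTA members are identified purely by file extension.
    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(suffixes):
                continue  # skip non-FASTA members such as TSV files
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                for header, seq in read_fasta(text):
                    yield name, header, seq
```

For example, sum(1 for _ in read_fasta_from_zip("hs_sorf.zip")) would count the sequences in the human archive, assuming it has been downloaded to the working directory.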

Microproteins in MHC type I experiments, generated using sorfs.org sequences

  1. Homo sapiens: hs_mhc1_microproteins.zip

The zip file contains a FASTA file of the non-redundant microprotein sequences and a TSV file that maps each non-redundant sequence identifier to the corresponding sorfs.org dataset protein identifiers.
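To illustrate how a non-redundant set and its identifier mapping relate to a redundant list, here is a Python sketch that merges identical sequences and records which source identifiers share each unique sequence. It is not the procedure used to build hs_mhc1_microproteins.zip: the msorf|NNNNNNNNN| numbering only imitates the example entry in this document, the one-pair-per-row TSV layout is a hypothetical choice (the distributed TSV may be laid out differently), and collapse_redundant and mapping_to_tsv are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


def collapse_redundant(
    records: Iterable[Tuple[str, str]], prefix: str = "msorf"
) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Collapse identical sequences into a non-redundant set.

    records: (source_id, sequence) pairs, e.g. sorfs.org protein ids.
    Returns (unique, mapping):
      unique  - (new_id, sequence) per distinct sequence, in first-seen order;
      mapping - new_id -> list of source ids that share that sequence.
    """
    seq_to_id: Dict[str, str] = {}
    unique: List[Tuple[str, str]] = []
    mapping: Dict[str, List[str]] = {}
    for source_id, seq in records:
        if seq not in seq_to_id:
            # Hypothetical id format: prefix|9-digit counter|
            new_id = f"{prefix}|{len(unique) + 1:09d}|"
            seq_to_id[seq] = new_id
            unique.append((new_id, seq))
            mapping[new_id] = []
        mapping[seq_to_id[seq]].append(source_id)
    return unique, mapping


def mapping_to_tsv(mapping: Dict[str, List[str]]) -> str:
    """Render the mapping as TSV text, one (new_id, source_id) pair per row."""
    rows = [f"{new_id}\t{src}" for new_id, srcs in mapping.items() for src in srcs]
    return "\n".join(rows) + "\n"
```

A dictionary keyed on the sequence string keeps the merge linear in the number of records, and recording source ids per new identifier is what makes the reverse lookup to the original dataset entries possible.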

An example microprotein FASTA entry:

>msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00
MASEAAGTR

where:

  1. msorf|000000003| – the unique microprotein sequence identifier;
  2. sorfs: dataset=30 – the smORF RNA sequence has been identified in 30 sorfs.org datasets;
  3. gpmdb: dataset=29 psm=105 – the microprotein was identified in 29 gpmDB MHC type I LC-MS/MS runs by 105 peptide-spectrum matches (PSMs); and
  4. overlap: ensembl=0.00 – the fraction of the gpmDB PSMs that also match Ensembl proteins.
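The header fields described above can be pulled apart with a small parser. This Python sketch assumes every header follows the layout of the example entry: an identifier, then sections introduced by a token ending in a colon (sorfs:, gpmdb:, overlap:), each followed by key=value tokens. parse_header is a hypothetical helper, and headers that deviate from this layout raise ValueError rather than being guessed at.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]


def _convert(text: str) -> Value:
    """Return text as an int or float when possible, else unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_header(header: str) -> Tuple[str, Dict[str, Dict[str, Value]]]:
    """Split a microprotein FASTA header into (identifier, sections).

    sections maps a section name (e.g. 'gpmdb') to its key=value pairs,
    with numeric values converted to int or float.
    """
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty header")
    identifier = tokens[0]
    sections: Dict[str, Dict[str, Value]] = {}
    current = None
    for token in tokens[1:]:
        if token.endswith(":"):
            current = token[:-1]  # start a new section, e.g. 'sorfs'
            sections[current] = {}
        elif "=" in token and current is not None:
            key, value = token.split("=", 1)
            sections[current][key] = _convert(value)
        else:
            raise ValueError(f"unexpected token {token!r}")
    return identifier, sections
```

For instance, keeping only entries whose sections["overlap"]["ensembl"] equals 0.0 would retain microproteins none of whose gpmDB PSMs also match Ensembl proteins (at the two-decimal precision shown in the header).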