Small ORF protein sequence FASTA files

FASTA files, generated from data provided by sorfs.org

Homo sapiens: hs_sorf.zip
Mus musculus: mm_sorf.zip
Rattus norvegicus: rn_sorf.zip

These FASTA files contain the redundant list of microprotein sequences derived from the sorfs.org data.

Microproteins in MHC type I experiments, generated using sorfs.org sequences

Homo sapiens: hs_mhc1_microproteins.zip

The zip file contains a FASTA file with the non-redundant microprotein sequences and a TSV file containing a mapping between the non-redundant sequence identifiers and the sorfs.org data set protein identifiers.
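As a minimal sketch of how the FASTA records could be read straight from one of the zip archives, the Python below uses only the standard library. The names of the files inside the archives are not stated here, so the function assumes that any member ending in .fasta, .fa, or .faa is a FASTA file; read_fasta_from_zip is a hypothetical helper, not part of any published tool. The TSV column layout is not described either, so the sketch does not try to interpret it.

```python
import io
import zipfile


def read_fasta_from_zip(zip_source):
    """Yield (header, sequence) pairs from FASTA members of a zip archive.

    Assumption: member names are undocumented, so any member whose name
    ends in .fasta, .fa, or .faa is treated as a FASTA file.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".fasta", ".fa", ".faa")):
                continue  # skip the TSV mapping and anything else
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8")
                header, chunks = None, []
                for line in text:
                    line = line.strip()
                    if not line:
                        continue
                    if line.startswith(">"):
                        # A new header closes the previous record.
                        if header is not None:
                            yield header, "".join(chunks)
                        header, chunks = line[1:], []
                    elif header is not None:
                        # Sequence lines may be wrapped; join them.
                        chunks.append(line)
                if header is not None:
                    yield header, "".join(chunks)
```

A call such as list(read_fasta_from_zip("hs_mhc1_microproteins.zip")) would then return every header and sequence in the archive, provided the member naming assumption holds.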

An example microprotein FASTA entry:

>msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00
MASEAAGTR

where:

msorf|000000003| – the unique identifier of the microprotein sequence;
sorfs: dataset=30 – the smORF RNA sequence has been identified in 30 sorfs.org datasets;
gpmdb: dataset=29 psm=105 – the microprotein was identified in 29 gpmDB MHC type I LC/MS/MS runs by 105 peptide-spectrum matches (PSMs); and
overlap: ensembl=0.00 – the fraction of PSMs that also match ENSEMBL proteins.
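
The header fields described above can be pulled apart with a regular expression. The following Python is a sketch only: the pattern is inferred from the single example entry shown here, so other headers in the files may deviate from it, and parse_header and its dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Pattern inferred from the one example header; treat it as an assumption.
HEADER_RE = re.compile(
    r"^>?(?P<db>[^|\s]+)\|(?P<id>[^|\s]+)\|\s*"
    r"sorfs:\s*dataset=(?P<sorfs_datasets>\d+)\s+"
    r"gpmdb:\s*dataset=(?P<gpmdb_datasets>\d+)\s+psm=(?P<psms>\d+)\s+"
    r"overlap:\s*ensembl=(?P<ensembl_overlap>[0-9.]+)\s*$"
)


def parse_header(line):
    """Split a microprotein FASTA header line into typed fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return {
        # Unique identifier of the microprotein sequence.
        "accession": f"{match['db']}|{match['id']}|",
        # Number of sorfs.org datasets containing the smORF.
        "sorfs_datasets": int(match["sorfs_datasets"]),
        # Number of gpmDB MHC type I runs with an identification.
        "gpmdb_datasets": int(match["gpmdb_datasets"]),
        # Number of peptide-spectrum matches.
        "psms": int(match["psms"]),
        # Fraction of PSMs that also match ENSEMBL proteins.
        "ensembl_overlap": float(match["ensembl_overlap"]),
    }


example = (
    ">msorf|000000003| sorfs: dataset=30 "
    "gpmdb: dataset=29 psm=105 overlap: ensembl=0.00"
)
fields = parse_header(example)
```

Converting the counts to integers and the overlap to a float makes it straightforward to filter entries, for example keeping only microproteins with an ENSEMBL overlap of 0.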