FASTA files, generated from data provided by sorfs.org
- Homo sapiens: hs_sorf.zip
- Mus musculus: mm_sorf.zip
- Rattus norvegicus: rn_sorf.zip
These FASTA files contain the redundant list of microprotein sequences derived from the sorfs.org data.
Microproteins in MHC type I experiments, generated using sorfs.org sequences
- Homo sapiens: hs_mhc1_microproteins.zip
The zip file contains a FASTA file with the non-redundant microprotein sequences and a TSV file
containing a mapping between the non-redundant sequence identifiers and the sorfs.org data set protein identifiers.
An example microprotein FASTA entry:
>msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00
MASEAAGTR
where:
- msorf|000000003| – the unique microprotein sequence identifier;
- sorfs: dataset=30 – the smORF RNA sequence has been identified in 30 sorfs.org datasets;
- gpmdb: dataset=29 psm=105 – the microprotein was identified in 29 gpmDB MHC type I LC/MS/MS runs by 105 PSMs; and
- overlap: ensembl=0.00 – the fraction of PSMs that also match ENSEMBL proteins.
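The header layout described above can be read programmatically. The following is a minimal Python sketch, assuming every header follows the same pattern as the example entry (an identifier, then named sections ending in a colon, each followed by key=value pairs). The function name parse_msorf_header is hypothetical and is not part of any distributed tool.

```python
def parse_msorf_header(header):
    """Parse a microprotein FASTA header into a nested dict.

    Assumes the layout of the example entry: an identifier followed by
    sections such as "sorfs:", "gpmdb:" and "overlap:", each followed
    by one or more key=value pairs.
    """
    line = header.lstrip(">").strip()
    identifier, _, rest = line.partition(" ")
    record = {"id": identifier}
    section = None
    for token in rest.split():
        if token.endswith(":"):
            # A token such as "gpmdb:" opens a new section.
            section = token[:-1]
            record[section] = {}
        elif "=" in token and section is not None:
            key, value = token.split("=", 1)
            # Values with a decimal point are fractions; the rest are counts.
            record[section][key] = float(value) if "." in value else int(value)
    return record


example = ">msorf|000000003| sorfs: dataset=30 gpmdb: dataset=29 psm=105 overlap: ensembl=0.00"
parsed = parse_msorf_header(example)
print(parsed["id"], parsed["gpmdb"]["psm"], parsed["overlap"]["ensembl"])
```

Headers that deviate from the example layout (for instance, values containing spaces) would need additional handling.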