Input and parameter setting

Expected format for the PPI file

One interaction per line, TAB-separated:

ID1 ID2 PROBABILITY_OF_INTERACTION
ID3 ID2 PROBABILITY_OF_INTERACTION2
....  

where ID1, ID2 and ID3 are IDs of proteins in the species network and PROBABILITY_OF_INTERACTION represents the reliability of the interaction (from 0 to 1). The interactions are undirected, i.e., ID1-ID2 is the same as ID2-ID1. The example below shows 5 valid interactions in one protein network with varying reliability values:

# -------------BEGIN SAMPLE INTERACTION FILE------------
 
YAL001C YBR123C 0.4489
YAL002W YKR026C 0.4124
YAL002W YLR148W 0.5722
YAL002W YLR396C 0.5211
YAL002W YMR231W 0.5786
 
# --------------END SAMPLE INTERACTION FILE-------------
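The format above can be checked before upload with a short script. The following is a minimal sketch, not TORQUE's own parser: the function name parse_ppi is hypothetical, and skipping blank lines and lines starting with "#" is an assumption made for convenience. Each pair is stored as a frozenset so that ID1-ID2 and ID2-ID1 refer to the same undirected interaction.

```python
def parse_ppi(lines):
    """Parse 'ID1<TAB>ID2<TAB>probability' lines into {frozenset({ID1, ID2}): probability}."""
    edges = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' marker lines are ignored
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 TAB-separated fields, got {len(fields)}")
        id1, id2, probability = fields[0], fields[1], float(fields[2])
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"line {number}: probability {probability} is outside [0, 1]")
        # Undirected edge: the order of the two IDs does not matter.
        # If a pair appears twice, the later line wins (an assumption of this sketch).
        edges[frozenset((id1, id2))] = probability
    return edges


SAMPLE = """\
YAL001C\tYBR123C\t0.4489
YAL002W\tYKR026C\t0.4124
YAL002W\tYLR148W\t0.5722
YAL002W\tYLR396C\t0.5211
YAL002W\tYMR231W\t0.5786
"""

edges = parse_ppi(SAMPLE.splitlines())
print(len(edges), "interactions parsed")
```

Looking up frozenset(("YBR123C", "YAL001C")) returns the same probability as the pair in its original order, which mirrors the undirected semantics described above.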

 

Expected format of FASTA data

The FASTA file provided should include the sequences of the queried proteins. Sequences of proteins not specified in the query will be ignored, so a single large FASTA file can serve many different queries, provided it contains all the proteins in those queries. The expected layout is:

>ID1
Protein sequence of ID1
>ID2
Protein sequence of ID2
  

For example:

# -----------------BEGIN SAMPLE FASTA FILE------------
 
>YAL027W
MAPSIATVKIARDMVLPLRIFVNRKQILQTNDKTSNKSNATIFEAPLLSNNSIICLKSPN
TRIYLSQQDKKNLCDEIKEDLLLIVYELASPEIISSVLSKIRVGHSTDFQINVLPKLFAG
ADTDNAVTSHIQSVTRLAKFKYKLHYKHKWELDIFINSIKKIANLRHYLMFQTLTLNGFS
LNAGPKTLLARKIEKQPQVPNLLIENGDADALDTPVEEDIKPVIEFMYKPVINLGEIIDV
HVLHRPRRHKVRTQSKQPQEE*
>YAL055W
MPPPSRSRINKTRTLGIVGTAIAVLVTSYYIYQKVTSAKEDNGARPPEGDSVKENKKARK
SKCIIMSKSIQGLPIKWEEYAADEVVLLVPTSHTDGSMKQAIGDAFRKTKNEHKIIYCDS
MDGLWSCVRRLGKFQCILNSRDFTSSGGSDAAVVPEDIGRFVKFVVDSDVEDVLIDTLCN
*
>YBL038W
MFPYLTRMNLSIKMGGLTLKESSPNAFLNNTTIARRFKHEYAPRFKIVQKKQKGRVPVRT
GGSIKGSTLQFGKYGLRLKSEGIRISAQQLKEADNAIMRYVRPLNNGHLWRRLCTNVAVC
IKGNETRMGKGKGGFDHWMVRVPTGKILFEINGDDLHEKVAREAFRKAGTKLPGVYEFVS
LDSLVRVGLHSFKNPKDDPVKNFYDENAKKPSKKYLNILKSQEPQYKLFRGR* 
# ------------------END SAMPLE FASTA FILE-------------

If the sequence of a query protein is missing from the FASTA file, TORQUE will try to retrieve it from the UniProt database. FASTA files can be downloaded from, e.g., BioMart.

 

Note: Input files can be uploaded gzip-compressed; decompression is performed automatically.
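To preview which sequences a given FASTA file would contribute to a query, something like the following can be run locally. This is a sketch under stated assumptions, not the server's implementation: the names open_text and read_fasta are hypothetical, the record ID is taken to be the first whitespace-delimited token after ">", and gzip compression is detected from the file's first two bytes.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file, detected by the gzip magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(lines, wanted=None):
    """Return {id: sequence}; records whose ID is not in `wanted` are skipped.

    With wanted=None every record is kept.
    """
    chunks = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            identifier = tokens[0] if tokens else ""
            keep = wanted is None or identifier in wanted
            current = identifier if keep else None
            if keep:
                chunks[current] = []
        elif current is not None:
            chunks[current].append(line)  # sequence may span several lines
    return {identifier: "".join(parts) for identifier, parts in chunks.items()}


demo = [">YAL055W", "MPPPSRSRIN", "KTRTL*", ">YBL038W", "MFPYLTRMNL*"]
only_query = read_fasta(demo, wanted={"YAL055W"})
everything = read_fasta(demo)
print(sorted(everything), only_query)
```

For a file on disk, read_fasta(open_text(path), wanted=...) applies the same filtering to either a plain or a gzip-compressed file. The sketch keeps the trailing "*" exactly as it appears in the input.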

Expected format of query complex

TORQUE matches complexes of 4 to 25 proteins. Since no interaction information is required for the query, the complex should be given as a comma- or whitespace-separated list of protein names, entered as free text in the relevant field. If the field is left blank, TORQUE assumes that all the proteins specified in the FASTA file comprise the query. If the protein names are provided in UniProt format, TORQUE will try to retrieve their sequences automatically.

For example:

# -----------------BEGIN SAMPLE COMPLEX------------
ENSMUSG00000020471, ENSMUSG00000027342, ENSMUSG00000025395, 
ENSMUSG00000006678, ENSMUSG00000026134, ENSMUSG00000024833, 
ENSMUSG00000056394, ENSMUSG00000038644, ENSMUSG00000070544, 
ENSMUSG00000024854, ENSMUSG00000030726, ENSMUSG00000020914, 
ENSMUSG00000017485
# ------------------END SAMPLE COMPLEX-------------

(This is the DNA synthesome complex in mouse.)
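A query list can be checked against the size limits before submission. The sketch below is illustrative only: parse_complex is a hypothetical helper, removing duplicate names is an assumption, and a blank input returns an empty list to stand for the "use all proteins in the FASTA file" case.

```python
import re

MIN_SIZE, MAX_SIZE = 4, 25  # complex sizes that TORQUE matches, as stated above


def parse_complex(text):
    """Split a comma- or whitespace-separated protein list and check its size."""
    names = [token for token in re.split(r"[,\s]+", text) if token]
    if not names:
        return []  # blank field: the query is taken from the FASTA file instead
    unique = list(dict.fromkeys(names))  # drop repeated names, keep input order
    if not MIN_SIZE <= len(unique) <= MAX_SIZE:
        raise ValueError(
            f"complex has {len(unique)} proteins; expected {MIN_SIZE} to {MAX_SIZE}"
        )
    return unique


QUERY = """
ENSMUSG00000020471, ENSMUSG00000027342, ENSMUSG00000025395,
ENSMUSG00000006678, ENSMUSG00000026134, ENSMUSG00000024833,
ENSMUSG00000056394, ENSMUSG00000038644, ENSMUSG00000070544,
ENSMUSG00000024854, ENSMUSG00000030726, ENSMUSG00000020914,
ENSMUSG00000017485
"""

members = parse_complex(QUERY)
print(len(members), "proteins in the query complex")
```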

Protein complexes to test can be obtained from many sources on the web. For example, mammalian complexes can be downloaded from CORUM, and yeast complexes can be obtained from the SGD website.

 

Predefined Data

We provide a PPI network and FASTA data for three species: Saccharomyces cerevisiae (yeast), Homo sapiens (human), and Drosophila melanogaster (fly). When using this data, please provide only the query proteins and their FASTA sequences. For more information on the sources of the protein interaction data and how the networks were computed, please see our RECOMB 09 paper.

All predefined data (PPI network and FASTA) is available for download below in zip format.

Saccharomyces cerevisiae

Homo sapiens

Drosophila melanogaster

 

Parameter description

  • Interaction probability threshold: Probability cutoff used to decide whether two proteins in the PPI network are considered interacting. Each such interaction is an edge in the resulting PPI network.
  • BLAST threshold: E-value threshold used to decide whether two proteins are sequence-similar (and potential orthologs).
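The effect of the two thresholds can be illustrated with a small sketch. The function names are hypothetical, and whether TORQUE treats each cutoff as inclusive is not stated here, so the comparisons used below (probability >= threshold, E-value <= threshold) are assumptions.

```python
def apply_probability_threshold(edges, threshold):
    """Keep the protein pairs whose interaction probability reaches the threshold."""
    # Assumption: a probability equal to the threshold still counts as an edge.
    return [pair for pair, probability in edges.items() if probability >= threshold]


def is_sequence_similar(evalue, blast_threshold):
    """Treat two proteins as sequence-similar when the BLAST E-value is small enough."""
    # Assumption: an E-value equal to the threshold still counts as similar.
    return evalue <= blast_threshold


# Probabilities taken from the sample interaction file.
edges = {
    ("YAL001C", "YBR123C"): 0.4489,
    ("YAL002W", "YKR026C"): 0.4124,
    ("YAL002W", "YLR148W"): 0.5722,
    ("YAL002W", "YLR396C"): 0.5211,
    ("YAL002W", "YMR231W"): 0.5786,
}

kept = apply_probability_threshold(edges, 0.5)
print(len(kept), "of", len(edges), "interactions become edges")
```

Raising the interaction probability threshold yields a sparser network, while raising the BLAST threshold admits more pairs as sequence-similar, since larger E-values indicate weaker matches.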