BacFITBase Tutorial

BacFITBase is a manually curated database of bacterial genes that reports their relevance during host infection, as measured by transposon mutagenesis. Please note that BacFITBase relies heavily on JavaScript, so make sure it is enabled in your browser. To begin, please…

Navigate to the Search tab:

Navigate to the Search tab
Simply click the screenshot to proceed directly to the Search tab.

Searching for a Gene or Protein
To search for a gene or protein, type in its name or identifier and press the Search button. Any of the following can be used: gene symbols, gene locus identifiers, NCBI protein identifiers, UniProt protein accessions, or free text matched against the gene product's description. NCBI protein identifiers are recommended.

Searching for Host and Pathogen
To search within a particular host and/or pathogen, please select the pathogen and/or host name in the drop-down menus and press the Search button. If no gene or protein name is given, this will result in a complete list of genes, similar to the Browse view (described below).


Search results

After searching, the search results page will display a list of any bacterial genes matching the search term and species selected:

Search Results
Simply click the screenshot to proceed directly to the Search results.

The column matched by the search term is highlighted in green (if a search term was provided). The search function also supports partial matches, so free-text terms can be used (e.g. "ribonuclease"). For each pathogen species and gene, the search results page already shows a preview of the lowest fitness z-score across all available hosts, tissues, and post-infection time points, and the corresponding p-value.
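
The preview described above amounts to picking, for each pathogen and gene, the measurement with the lowest fitness z-score and reporting its p-value. A minimal Python sketch, assuming hypothetical records with `pathogen`, `gene`, `z_score`, and `p_value` fields (the field names and values are illustrative, not BacFITBase's actual schema or data):

```python
# Hypothetical measurements; field names and values are made up for illustration.
measurements = [
    {"pathogen": "Pathogen A", "gene": "geneX", "host": "mouse", "z_score": -1.2, "p_value": 0.23},
    {"pathogen": "Pathogen A", "gene": "geneX", "host": "pig", "z_score": -3.4, "p_value": 0.0007},
    {"pathogen": "Pathogen A", "gene": "geneY", "host": "mouse", "z_score": 0.5, "p_value": 0.62},
]


def lowest_score_preview(rows):
    """Return, per (pathogen, gene), the row with the lowest fitness z-score."""
    preview = {}
    for row in rows:
        key = (row["pathogen"], row["gene"])
        if key not in preview or row["z_score"] < preview[key]["z_score"]:
            preview[key] = row
    return preview


preview = lowest_score_preview(measurements)
for (pathogen, gene), row in sorted(preview.items()):
    print(pathogen, gene, row["z_score"], row["p_value"])
```

Keeping the whole row, rather than only the minimum z-score, is what lets the p-value shown in the preview belong to the same measurement.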

By default, the table is sorted alphabetically across all columns from left to right, starting with the pathogen. To sort the table as desired, simply click on any of the column headers. The Search Results table can be downloaded as a comma-separated values (CSV) file for use in spreadsheet software such as Microsoft Excel, using the "Download Table" button in the top right. A descriptive file name is generated automatically. The results can also be shared with other researchers by right-clicking and copying the "Link to these results" link at the bottom of the page.
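
A downloaded table can also be processed outside a spreadsheet. The following Python sketch parses a small inline sample with the standard `csv` module and sorts it by z-score. The column names (`Pathogen`, `Gene`, `Fitness z-Score`, `p-value`) and the rows are assumptions for illustration, so check the header of the actual download before relying on them.

```python
import csv
import io

# Inline stand-in for a downloaded table; header and rows are illustrative only.
sample_csv = """Pathogen,Gene,Fitness z-Score,p-value
Pathogen A,geneX,-3.4,0.0007
Pathogen A,geneY,0.5,0.62
Pathogen B,geneZ,-1.8,0.07
"""

# For a real download, replace io.StringIO(sample_csv) with
# open("your_file.csv", newline="") (the file name here is a placeholder).
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# CSV values are read as strings, so convert to float before sorting numerically.
rows.sort(key=lambda row: float(row["Fitness z-Score"]))

for row in rows:
    print(row["Gene"], row["Fitness z-Score"], row["p-value"])
```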

To proceed, please click on one of the genes for a detailed view.



Detailed view of infection fitness scores for a gene

After selecting a gene of interest, a view will open with all the infection fitness information available for the corresponding gene:

View Fitness Scores
Simply click the screenshot to proceed directly to the Display page.

The heading of this page provides information on the selected protein: protein and pathogen name, length, gene name and UniProt ID.

The table lists all experimental data.

A brief description of the meaning of the raw score, normalized z-score and p-value is also available as a mouse-over explanation on the column headers. For any protein in UniProt, a protein visualisation is automatically provided by ProViz from the Davey lab. The only exception is proteins larger than 5,000 amino acids (due to display speed limitations), though this limit is unlikely to be encountered. ProViz is an interactive exploration tool for investigating the structural, functional and evolutionary features of proteins, including Pfam domains and transmembrane regions. This is particularly useful for uncharacterised proteins.

Alternatively, the protein's FASTA sequence can be displayed by pressing the "Show protein sequence" button. A "Copy" link in the top right corner copies the sequence so that it can be pasted into other research tools, or into the BacFITBase BLAST Search to search for similar proteins. You can also search for similar proteins via BLAST immediately (see below for more details) by pressing the "Find similar proteins" button.

To sort the table as desired, please click on any of the column headers. The current table can be downloaded as a comma-separated values (CSV) file for use in spreadsheet software such as Microsoft Excel, using the "Download Table" button in the top right. A descriptive file name is generated automatically. The results can also be shared with other researchers by right-clicking and copying the "Link to these results" link at the bottom of the page.



Navigate to the BLAST tab:

Navigate to the BLAST tab
Simply click the screenshot to proceed directly to the BLAST search.

The BLAST Search tab provides a search by sequence similarity. When the protein of interest is not in our database, you can search for similar proteins using BLAST sequence alignment. Finding a similar protein with a low z-score (and a low p-value) is a strong indication that the query sequence may be relevant for infection.

To search for similar proteins in our database using BLAST, please paste in your protein or coding sequence in FASTA format and press the Search button. Both protein and coding sequences can be used, but please ensure that the correct format (protein or coding sequence) is specified in the drop-down menu next to the Search button (as illustrated by the examples provided).
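
Before pasting a sequence, it can help to check which format to select. The sketch below is a simple heuristic and not BacFITBase's own logic: it drops FASTA header lines and treats a sequence as a coding sequence if it consists only of nucleotide letters (A, C, G, T, U, N), and as a protein otherwise. The function name `guess_sequence_type` is hypothetical.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(fasta_text):
    """Guess 'coding' or 'protein' for a single FASTA record (heuristic only)."""
    # Drop header lines (starting with '>') and join the remaining sequence lines.
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    sequence = "".join(line for line in lines if not line.startswith(">")).upper()
    if not sequence:
        raise ValueError("no sequence found")
    # A sequence made up only of nucleotide letters is assumed to be a coding sequence.
    return "coding" if set(sequence) <= NUCLEOTIDE_LETTERS else "protein"


print(guess_sequence_type(">example_cds\nATGGCTAAAGGTTAA"))
print(guess_sequence_type(">example_protein\nMAKGLVRE"))
```

A short peptide made up only of the letters A, C, G, T, and N would be misclassified by this check, so the choice in the drop-down menu should still be confirmed by eye.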


BLAST Search results

When the BLAST alignment is ready, a search results page will open with the following information:

BLAST search results
Simply click the screenshot to proceed directly to the BLAST search results.

In this view, we display alignment performance together with a complete description of the identified hits:
Identity: The percentage of sequence identity between query and target in the successfully aligned region.
Aligned: The total number of amino acids that were successfully aligned between query and target.
Bit score: A log2-scaled, normalized version of the raw alignment score. It reflects the size of the sequence database required for the current match to be found just by chance, so each increase of one bit doubles the required database size.
E-value: The number of expected hits of similar quality (score) that could be found in the BLAST sequence database just by chance.
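
The relationship between bit score and E-value can be illustrated with the standard BLAST relation E = m × n × 2^(−S'), where S' is the bit score and m and n are the (effective) query and database lengths. The sketch below is only an illustration of that relation: the lengths are placeholders, not the size of the BacFITBase sequence database, and the function name `expected_hits` is hypothetical.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from the standard BLAST relation E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Placeholder lengths, not BacFITBase's real database size.
query_length = 300
database_length = 1_000_000

e_at_40 = expected_hits(40, query_length, database_length)
e_at_41 = expected_hits(41, query_length, database_length)

# One extra bit halves the expected number of chance hits, which is equivalent
# to doubling the database size needed to expect such a hit by chance.
print(e_at_40, e_at_41, e_at_40 / e_at_41)
```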

The meaning of the Pathogen, Locus, Protein, Gene, Product, p-value, and Fitness z-Score columns can be found in the Browse Tab section below, or via the mouse-over information symbols in the top row of any table.

By default, the BLAST matches with the highest Bit scores are shown first, and matches with 100% sequence identity are highlighted in green. To sort the table as desired, simply click on any of the column headers. As for all tables, the results table can be downloaded as a comma-separated values (CSV) file for use in spreadsheet software such as Microsoft Excel, using the "Download Table" button in the top right corner. A descriptive file name is generated automatically. The results can also be shared with other researchers by right-clicking and copying the "Link to these results" link at the bottom of the results table, provided the search sequence is shorter than ~2,000 characters.



Navigate to the Browse tab:

Navigate to the Browse tab
Simply click the screenshot to proceed directly to the Browse tab.

The Browse Tab
The Browse tab provides an overview of all entries in the BacFITBase database. A pathogenic species of interest can be chosen in the selection element at the top. The table is sorted by significance and fitness z-score: pathogen genes with a high and significant fitness impact during infection appear at the top, followed by non-significant genes ordered by increasing fitness z-score. Genes with a significant increase in pathogen fitness are listed at the very end of the table.
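
The ordering described above can be sketched as a three-tier sort key. This is an illustration rather than BacFITBase's implementation: the significance cutoff of p < 0.05, the record fields, and the ordering by increasing z-score within the two significant groups are all assumptions.

```python
ALPHA = 0.05  # assumed significance cutoff; the tutorial does not state one


def browse_sort_key(row):
    """Tier 0: significant fitness decrease, 1: non-significant, 2: significant increase."""
    significant = row["p_value"] < ALPHA
    if significant and row["z_score"] < 0:
        tier = 0
    elif significant:
        tier = 2
    else:
        tier = 1
    # Within each tier, order by increasing fitness z-score.
    return (tier, row["z_score"])


# Hypothetical rows for illustration only.
genes = [
    {"gene": "geneA", "z_score": 2.9, "p_value": 0.004},
    {"gene": "geneB", "z_score": -0.3, "p_value": 0.76},
    {"gene": "geneC", "z_score": -4.1, "p_value": 0.00004},
    {"gene": "geneD", "z_score": 0.8, "p_value": 0.42},
]

ordered = sorted(genes, key=browse_sort_key)
print([row["gene"] for row in ordered])
```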

Arrows next to each field provide links to useful external databases:
Pathogen: Links out to the NCBI Taxonomy database, a comprehensive taxonomic database.
Locus: Links out to the Ensembl Bacteria database, which provides genome annotation for many bacterial species.
Protein: Links out to the NCBI Protein database, which provides protein sequences and information.
UniProt Accession and Gene Symbol: Links out to the UniProt Knowledgebase, which provides comprehensive protein annotation.

Click the Locus, Protein, UniProt Accession or Gene Symbol entries to view details for the given protein in the external databases. This information is also available as a mouse-over explanation in the Browse tab.

As for all tables, this table can be downloaded as a comma-separated values (CSV) file for use in spreadsheet software such as Microsoft Excel, using the "Download Table" button in the top right. A descriptive file name is generated automatically. The results can also be shared with other researchers by right-clicking and copying the "Link to these results" link at the bottom of the page.



Navigate to the Download tab:

Navigate to the Download tab
Simply click the screenshot to proceed directly to the Download tab.

Downloading the Entire Database
To download the entire BacFITBase database for local analysis, please click the link available under the Download tab. Currently, BacFITBase v1 is available; it will be updated as new data become available.



See also:
The About section for background information, and please feel free to contact us!