G-quadruplexes (G4s) play crucial roles in key biological functions, including transcription, replication, telomere maintenance, and genomic instability, and have been an important component of organismal evolution.
Quadrupia is a comprehensive database of G4 sequences for a large collection of reference genomes from GenBank and RefSeq, enabling their study across the tree of life. Each genome has been annotated for G4 sequences using two distinct detection methods: regular expressions and G4Hunter. In addition, the database contains a curated list of G4 sequence clusters, created using the linclust algorithm of MMseqs2.
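As a rough illustration of how G4Hunter-style scoring works, the sketch below assigns each base a score from the length of its G or C run and averages those scores over a sliding window. The function names, the window size of 25, and the threshold of 1.5 are assumptions chosen for the example; they are not necessarily the settings used to build Quadrupia.

```python
def g4hunter_base_scores(seq):
    """Score each base: G runs get +min(run length, 4), C runs get
    -min(run length, 4), every other base gets 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        # Find the end of the run of identical bases starting at i.
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = min(j - i, 4)
        if seq[i] == "G":
            value = run
        elif seq[i] == "C":
            value = -run
        else:
            value = 0
        for k in range(i, j):
            scores[k] = value
        i = j
    return scores


def g4hunter_windows(seq, window=25, threshold=1.5):
    """Return (start, end, mean_score) for every window whose absolute
    mean score reaches the threshold. Coordinates are 0-based, end-exclusive.
    window and threshold are illustrative defaults (assumptions)."""
    scores = g4hunter_base_scores(seq)
    hits = []
    for start in range(0, len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, start + window, mean))
    return hits
```

Negative mean scores indicate C-rich windows, that is, a potential G4 on the complementary strand.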
Through Quadrupia, users can:
Quadrupia is available at https://pavlopoulos-lab.org/quadrupia.
You can explore our database using the navigation guide at the top of the page:
By clicking on "Home" (1), you'll be directed to a page containing a brief description of G-quadruplexes as a biological concept along with key information about Quadrupia. Additionally, you can view the overall statistics of the database contents and learn how to properly cite it.
By clicking on "Browse" (2), you can choose between "Genomes" or "G4 Clusters". You will be directed to a dedicated page where you can explore the G4 sequences of over 100,000 reference genomes from all organism groups (Bacteria, Archaea, Eukaryota, and Viruses). You can browse either by individual genome or by cluster, where similar G4 sequences are grouped together.
By clicking on "Advanced Search" (3), you can query the database at either the genome or the cluster level, including a curated list of 319,784 sequence clusters.
You can perform sequence-based, HMM-based, and motif-based queries against the database's contents by clicking on "Sequence Search" (4).
The "Help" page (5) will guide you on how to use and explore all the utilities of the database, whereas the "About" page (6) will provide more information about the team and how to cite the database.
Topics in the Manual are divided into separate tabs, accessible through header buttons at the top of the page. Click on any of these header buttons to navigate to its respective section.
Long sections are divided into subsections that can be scrolled up and down. At the end of each subsection there is a link labeled [Back to top]. Clicking on it will return you to the top of the section.
Browsing data in Quadrupia is available through the Browse menu in the navigation bar. By clicking on Browse, you can select between Genomes and G4 Clusters.
When you select Genomes, you will be directed to the Browse Genomes page. Here, you can explore the G4 sequences of over 100,000 reference genomes from all organism groups (Bacteria, Archaea, Eukaryota, and Viruses) presented in an interactive table with four columns.
By clicking on a specific accession ID (as shown in example (5)), you will be redirected to a dedicated page for that genome accession ID. Here you can find:
(A) Overview Table:
By scrolling down you can find,
(B) G-quadruplexes Section:
By selecting the G4 Clusters option under Browse, you can explore a curated list of G4 sequence clusters. These clusters are created using the linclust algorithm of MMseqs2, allowing users to explore similar G4 sequences grouped together.
When you click on Browse → G4 Clusters, you will be redirected to a new page containing a table with all the clusters generated from the G4 database sequences.
By clicking on a cluster name in Column 1 of the table, you will be redirected to a new page dedicated to that specific cluster (as shown in the example (7)). You can easily obtain the Multiple Sequence Alignment and HMM profile for each cluster by simply clicking on the respective button (colored in blue and green) located below the name of the cluster. By scrolling down, you can navigate through the page, which includes four sections (A-D):
Section A: Overview This section contains two tables (A1, A2):
Table A1: Cluster Contents This table provides the following information:
Table A2: Taxonomic Distribution This table provides information about the taxonomic annotation of G4 sequences in the cluster. You can find the percentage of G4 sequences annotated as Bacteria, Archaea, Eukaryota, and/or Viruses. Note that some clusters contain only G4 sequences identified by G4Hunter, some contain only sequences identified by regular expression, and some are mixed. Likewise, some clusters come from a single taxonomic domain, while others are mixed.
Section B: Sequence Contents This section contains two tables (B1, B2):
Table B1: Multiple Sequence Alignment In this table, you can find the multiple sequence alignment of the top 1,000 G-quadruplexes within the cluster, generated by MAFFT and displayed using Jalview. If you wish to view the full alignment, you can download it by clicking on the blue button at the bottom of the table. The alignment includes a column with sequence IDs, labels, and the aligned sequences. In the mauve box at the top of the alignment, you will find various options, features, and information accessible via blue buttons. These allow you to:
Table B2: Sequence logo (HMM profile)
Here, you can find an interactive sequence logo for the cluster. You can navigate through the image by scrolling left or right, and clicking on the image allows you to copy or save it. The height of each residue in the logo corresponds to its frequency, providing a visual representation of the sequence conservation within the cluster.
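To make the link between residue height and frequency concrete, the following minimal sketch computes per-column residue frequencies from a set of aligned sequences. It is a simplification for illustration only: the logo on the page is derived from an HMM profile, and the function name and gap character used here are assumptions.

```python
from collections import Counter


def column_frequencies(aligned_seqs, gap="-"):
    """Return one dict per alignment column mapping residue -> frequency.
    Gap characters are ignored; an all-gap column yields an empty dict."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned_seqs if s[col] != gap)
        total = sum(counts.values())
        if total:
            profile.append({res: n / total for res, n in counts.items()})
        else:
            profile.append({})
    return profile
```

A column where every sequence carries a G gives `{"G": 1.0}`, which would correspond to a single tall G in a frequency-based logo.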
The interactive sequence logo includes several useful features:
To explore specific columns in more detail, type the column number in the search field labeled "Column" or click on a residue. A new tab will open, providing detailed information about the selected residue/column, including occupancy, insert probability, and length, along with a table showing residue probability. To close the detailed view tab, click the "toggle column annotation" button.
Section C: Representative Sequence
In this section, you can find the representative sequence of the selected cluster, which has been extracted using the linclust algorithm. The upper part of the table provides information about the sequence, including:
The lower part of the table displays the sequence itself.
Section D: 3D structure Annotation (if it is available):
You can access the 3D structure of the representative sequence of the cluster here. These structures are sourced from other repositories of structural G-quadruplexes (PDB, OnQuadro) and you can navigate to them using the provided external links. The image of the 3D structure is interactive, allowing you to zoom in and out, rotate the structure, and select specific residues. Additionally, it offers an animation feature that enables you to control parameters such as duration or mode. You can even take screenshots of the structure.
You can perform complex data queries by selecting Advanced Search from the navigation bar. An input form with two options will be presented. You can select between Genome Assemblies (A) and G-quadruplex Clusters (B).
Here, you can search against genome assemblies using four (4) search chunks.
By clicking Submit (5), you submit your search parameters, and a new tab with your search results will be presented. Clicking "Reset" removes all filters so that a new search can be performed.
Here, you can search against genome assemblies using four (4) search chunks.
Quadrupia offers a number of options for performing sequence-based queries on its contents, accessible through the Sequence Search menu in the navigation bar:
Selecting Sequence Search → G4 Sequences will redirect you to the "G4 Sequence Search (BLAST)" page. Here, you can search by sequence in two steps:
Step 1 (A): You can either input or paste one or more DNA sequences, or upload a file by clicking the "Choose File" button in the lower part of the search box.
Step 2 (B): You can adjust BLAST search parameters by selecting:
You can also try examples by clicking on the grey button labeled Load Example. An example sequence will automatically load into the box, allowing you to experiment with different search parameters and become familiar with the search panel.
After the sequence of interest has been loaded in the search panel and the search parameters have been defined, you can submit your selections by clicking the blue Submit button at the bottom of the page, which will redirect you to the results page. If you selected the Open results in a new tab option, your results will be displayed in a new tab. By clicking the red Reset button, you can clear all your selections and start a new search.
The results will be presented on a page titled "BLAST Search Results":
Here, you can find an overview of the query parameters selected for the search, including the Reference Database, Search Method, Match/Mismatch scores, Gap open/extend penalties, and E-value cutoff.
2. Download and Navigation Options:
You can download the Input Sequence you provided for the search in FASTA format by clicking the blue button labeled "Input Sequence (FASTA)". You can download the BLAST results in XML format by clicking the green button labeled "BLAST (XML)". To return to the search panel, click the yellow button labeled "Run new search".
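Once downloaded, the BLAST XML file can be processed offline. The sketch below assumes the file follows the standard NCBI BLAST XML layout (`Iteration`, `Hit`, and `Hsp` elements); whether Quadrupia's download matches that layout exactly is an assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_blast_xml(xml_text):
    """Flatten a BLAST XML document into one dict per alignment (HSP).
    Assumes the standard NCBI BLAST XML element names."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                rows.append({
                    "query": query,
                    "hit": hit_def,
                    "bit_score": float(hsp.findtext("Hsp_bit-score")),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "align_len": int(hsp.findtext("Hsp_align-len")),
                })
    return rows
```

To use it on a downloaded file, read the file contents as text and pass them to `read_blast_xml`; the returned rows can then be sorted or filtered, for example by `evalue`.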
Scrolling down, you will find a table for each hit of each query sequence from the input, displaying all BLAST results based on the selected parameters. The upper gray line shows the genome assembly accession to which the query aligned and the specific G-quadruplex hit in this genome (genome assembly accession | chromosome | start position | end position | type – Genomic/CDS/RNA).
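The pipe-delimited hit line described above can be split into named fields for downstream analysis. This is a minimal sketch that assumes exactly five fields in the order given; the class name, function name, and the example values in the usage note are made up for illustration.

```python
from typing import NamedTuple


class G4Hit(NamedTuple):
    assembly: str    # genome assembly accession
    chromosome: str
    start: int
    end: int
    kind: str        # Genomic, CDS, or RNA


def parse_hit_header(header):
    """Split 'assembly | chromosome | start | end | type' into a G4Hit.
    Whitespace around each field is ignored."""
    parts = [p.strip() for p in header.split("|")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}")
    assembly, chromosome, start, end, kind = parts
    return G4Hit(assembly, chromosome, int(start), int(end), kind)
```

For example, `parse_hit_header("GCF_000000000.1 | chr1 | 100 | 125 | Genomic")` returns a record whose `start` and `end` are integers; the accession in this call is a placeholder, not a real database entry.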
The second line contains alignment details such as the number of the alignment (No), length of the alignment, Bit score, E-value, Query start and end positions, Hit start and end positions, identities score, positives, and gaps.
By clicking the blue button labeled "Show" under the alignment, a new table will be presented with the detailed alignment between the query and the hit. You can hide the alignment by clicking on the "Hide" button.
Selecting Sequence Search → G4 Clusters will redirect you to the "G4 Cluster Search (HMM)" page. Here, you can search by sequence in two steps:
You can also try examples by clicking on the grey button labeled Load Example. An example sequence will automatically load into the box, allowing you to experiment with different search parameters and become familiar with the search panel.
After the sequence of interest has been loaded in the search panel and the search parameters have been defined, you can submit your selections by clicking the blue Submit button at the bottom of the page, which will redirect you to the results page. If you selected the Open results in a new tab option, your results will be displayed in a new tab. By clicking the red Reset button, you can clear all your selections and start a new search.
The results appear in the same format as the G4 Sequence (BLAST) search.
By selecting Sequence Search → Motif Search, you can search for specific motifs in two steps:
Step 1: Enter the motif in the search panel using the specified format. For G-quadruplexes, this typically involves a guanine-rich region with the potential to form G-tetrads. The loops between the G residues are flexible regions that can vary in length and sequence.
Step 2: Define search parameters:
An example sequence motif, formatted as a regular expression and as an IUPAC sequence, is shown in the table below:
| | Regular Expression | IUPAC Sequence |
| Motif | G{3,}.{1,7}G{3,}.{1,7}G{3,}.{1,7}G{3,} | GGGNNNNNGGGNNGGGNGGG |
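The two motif formats in the table can be tried locally before submitting a search. The sketch below translates an IUPAC sequence into a regular expression and scans a DNA string with Python's `re` module. The helper names are hypothetical, and details such as strand handling or overlapping matches on the server are not covered here.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC nucleotide motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(pattern, sequence):
    """Return (start, end, matched_text) for each non-overlapping match.
    Coordinates are 0-based, end-exclusive; only the given strand is scanned."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(pattern, sequence.upper())]


# The regular expression from the table above.
G4_REGEX = r"G{3,}.{1,7}G{3,}.{1,7}G{3,}.{1,7}G{3,}"
```

Calling `find_motif(G4_REGEX, seq)` or `find_motif(iupac_to_regex("GGGNNNNNGGGNNGGGNGGG"), seq)` on a sequence of your own shows which regions each format would capture; the regular expression is the more permissive of the two because its loop lengths can vary.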
When you are ready, you can submit your selections by clicking the blue Submit button at the bottom of the page, which will redirect you to the results page. If you selected the Open results in a new tab option, your results will be displayed in a new tab. By clicking the red Reset button, you can clear all your selections and start a new search.
The results will be presented on a page titled "Motif Search Results":
Here, you can find an overview of the query parameters selected for the search, including the Reference Database, Search Method, Match/Mismatch scores, Gap open/extend penalties, and E-value cutoff.
You can download the results in FASTA format by clicking the blue button labeled "Result (FASTA)". To return to the search panel, click the yellow button labeled "Run new search".
Scrolling down, you will find a table for each hit of each query sequence from the input, displaying all results based on the selected parameters. The table contains a column with the sequence header, the corresponding genome and, finally, the G4 sequence containing the queried motif.
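A downloaded FASTA result file can be read with a few lines of code. The following is a minimal sketch of a generic FASTA reader; it makes no assumption about how Quadrupia formats the header line, and the function name is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.
    Sequence lines belonging to one record are joined; blank lines are skipped."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Read the downloaded file as text and pass its contents to `parse_fasta` to get each header together with its G4 sequence.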