RNApdbee

In RNA structural biology and bioinformatics, access to a correct representation of RNA secondary structure is of crucial importance. This is true for both secondary and three-dimensional RNA structure prediction studies. RNApdbee derives the secondary structure topology from the tertiary structure of an RNA and/or from a list of base pairs. The tool supports processing of large unknotted structures as well as RNAs with pseudoknots. In particular, it allows for the following operations:

extracting RNA secondary structure from a PDB or PDBx/mmCIF file,
converting CT to BPSEQ and vice versa,
converting CT and BPSEQ to dot-bracket,
encoding extracted RNA secondary structure in dot-bracket, CT and BPSEQ,
creating a visualization of RNA secondary structure,
finding multiple 2D structures from a number of 3D structure analyses.

There are four major scenarios for using the RNApdbee application:

3D → (....), in which the secondary structure is extracted from a PDB or PDBx/mmCIF file,
2D → (....), where the secondary structure topology is derived from a list of base pairs,
(....) → image, where the secondary structure is visualized based on a provided topology,
3D → multi 2D, where multiple secondary structures are found.

The first two scenarios are based on an algorithm that iteratively unknots the RNA structure, saves partial information about the knotting order, and finally merges the intermediate results to encode the RNA topology. The result is released in dot-bracket, BPSEQ and CT, together with a visualization. The third scenario visualizes an RNA structure given in dot-bracket format. The last scenario performs multiple analyses simultaneously. The four scenarios are described in detail below.

3D → (....) scenario

Step 1: In the first step of this scenario, the user uploads a PDB or PDBx/mmCIF file with the RNA tertiary structure. The file can be uploaded either directly from a local drive (use the "Browse" button to browse through the local repositories) or from the Protein Data Bank. In the latter case, the user enters the PDB identifier into the edit box and clicks the "Get" button. The associated PDB or PDBx/mmCIF file is then automatically downloaded from the Protein Data Bank and made ready for processing by the application.
There are 5 example PDB and PDBx/mmCIF files stored in the system and ready for processing. They give new users an easy start with RNApdbee. Uploaded data can be viewed in the text area after clicking the "Show file contents" button, and edited before further processing.
Step 2: In the second step, the program used to identify base pairs in the uploaded structure is selected from four available options: 3DNA/DSSR (default), RNAView, MC-Annotate and FR3D. 3DNA/DSSR can be selected with the additional option "Analyse helices". To cite the 3DNA/DSSR documentation: "a helix is defined by base-stacking interactions, regardless of bp type and backbone connectivity". Therefore, when this option is set in RNApdbee, the resulting 2D structure (both in text and in the visualization) will contain pairs that form a helix so defined in the 3D structure, even if the nucleotides from opposite strands do not constitute canonical base pairs. Additionally, to have non-canonical base pairs annotated in the output (either in text and visualization, or in visualization only), the appropriate option should be checked. By default, RNApdbee output representations contain only canonical base pairs, while non-canonical ones are included in a separate list only. In this step, one can also decide to remove isolated base pairs from the result.
Step 3: In the third step, an algorithm to resolve and encode the secondary structure topology is selected. There are five options to choose from: Hybrid Algorithm (default), Dynamic Programming, Elimination Min-Gain, Elimination Max-Conflicts and First-Come-First-Served. Each algorithm is given a list of base pairs (identified by the tool selected in the previous step) and proceeds to encode the secondary structure topology, which is a difficult task for highly complex and pseudoknotted RNA structures. The Hybrid Algorithm performs an exhaustive search for small subproblems and a random walk for larger ones. The Elimination algorithms work iteratively by selecting certain base pairs to be encoded as pseudoknots. The selection criterion differs between the Min-Gain and Max-Conflicts variants, which yields differences in results. In Min-Gain, base pairs are analysed with respect to the size of the stem they are part of and the sizes of adjacent stems. In Max-Conflicts, the number of adjacent stems is taken into account. The Dynamic Programming algorithm works on the same principle as the Elimination algorithms, but its selection criterion is designed to optimize the global solution. The First-Come-First-Served algorithm is a simple heuristic that encodes the topology from the 5' to the 3' end.
Step 4: RNApdbee automatically finds secondary structure elements within the result. In this step, the user can decide to treat pseudoknots as either paired (default) or unpaired residues. With the default option selected, residues encoded as pseudoknots are allowed to form stems and the terminating base pairs of loops. When the other option is set, the same residues become part of single strands and loop interiors.
Step 5: Next, the user decides whether a visualization is to be generated in the output, together with the textual representation of the resulting secondary structure. To get an image, one of three available procedures should be selected: VARNA-based (default), PseudoViewer-based or R-chie-based. Otherwise, the 'No image' option must be selected. The first two procedures generate a classical visualization of the secondary structure, while the last one produces an arc diagram. If the selected procedure fails to generate an image, an alternative procedure is run automatically.
Step 6: To start the secondary structure extraction, the "Run" button should be clicked. This immediately displays the results page with the secondary structure encoded in extended dot-bracket notation, BPSEQ and CT, a listing of non-canonical base pairs and other RNA tertiary contacts with their classification, the identified structural elements, and the visualization (if previously requested). The results can be saved to a local drive. Moreover, the same input data can be processed again with a different set of options.
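To illustrate the topology-encoding step (Step 3), the following Python sketch shows one possible reading of a first-come-first-served heuristic: base pairs are visited from the 5' to the 3' end, and each pair is placed at the lowest pseudoknot order at which it does not cross any pair placed earlier. It is not RNApdbee's implementation. The function names and the crossing test are assumptions made for this example, and the bracket symbols follow the text annotation of pseudoknot orders used in RNApdbee output.

```python
# Bracket symbols by order: "()" is the nested level, followed by the
# pseudoknot orders "[]", "{}", "<>", "Aa" ... "Ee".
BRACKETS = ["()", "[]", "{}", "<>", "Aa", "Bb", "Cc", "Dd", "Ee"]


def crosses(p, q):
    """Return True if base pairs p and q cross each other."""
    (i, j), (k, l) = p, q
    return i < k < j < l or k < i < l < j


def encode_first_come_first_served(length, pairs):
    """Encode 1-based base pairs as extended dot-bracket, scanning 5' to 3'.

    Hypothetical helper: a simplified illustration, not the RNApdbee code.
    """
    levels = [[] for _ in BRACKETS]  # pairs already placed at each order
    structure = ["."] * length
    for pair in sorted((min(p), max(p)) for p in pairs):
        for order, placed in enumerate(levels):
            if not any(crosses(pair, other) for other in placed):
                placed.append(pair)
                opening, closing = BRACKETS[order]
                structure[pair[0] - 1] = opening
                structure[pair[1] - 1] = closing
                break
        else:
            raise ValueError("more pseudoknot orders than bracket types")
    return "".join(structure)
```

For the pairs (1,6), (2,5) and (4,9) in a chain of 10 residues, the sketch returns ((.[))..]. because the third pair crosses the first two and is therefore moved to the first pseudoknot order.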

2D → (....) scenario

Step 1: In the first step of this scenario, the user uploads a BPSEQ or CT file from a local repository (use the "Browse" button to select a file from the required folder). There are also 3 example files in BPSEQ format and 3 example files in CT format available for upload. Uploaded data can be viewed in the text area after clicking the "Show file contents" button, and edited before further processing.
Step 2: In this step, the user can select how isolated base pairs should be treated. By default, they are included in the result just like any other pairing. An isolated base pair has no other interactions in the close vicinity of its two residues to stabilise the pairing. Therefore, the user can decide to treat it as uncertain or unstable and remove it from the result.
Step 3: In the third step, an algorithm to resolve and encode the secondary structure topology is selected. There are five options to choose from: Hybrid Algorithm (default), Dynamic Programming, Elimination Min-Gain, Elimination Max-Conflicts and First-Come-First-Served. Each algorithm is given a list of base pairs (read from the uploaded BPSEQ or CT file) and proceeds to encode the secondary structure topology, which is a difficult task for highly complex and pseudoknotted RNA structures. The Hybrid Algorithm performs an exhaustive search for small subproblems and a random walk for larger ones. The Elimination algorithms work iteratively by selecting certain base pairs to be encoded as pseudoknots. The selection criterion differs between the Min-Gain and Max-Conflicts variants, which yields differences in results. In Min-Gain, base pairs are analysed with respect to the size of the stem they are part of and the sizes of adjacent stems. In Max-Conflicts, the number of adjacent stems is taken into account. The Dynamic Programming algorithm works on the same principle as the Elimination algorithms, but its selection criterion is designed to optimize the global solution. The First-Come-First-Served algorithm is a simple heuristic that encodes the topology from the 5' to the 3' end.
Step 4: RNApdbee automatically finds secondary structure elements within the result. In this step, the user can decide to treat pseudoknots as either paired (default) or unpaired residues. With the default option selected, residues encoded as pseudoknots are allowed to form stems and the terminating base pairs of loops. When the other option is set, the same residues become part of single strands and loop interiors.
Step 5: Next, the user decides whether a visualization is to be generated in the output, together with the textual representation of the resulting secondary structure. To get an image, one of three available procedures should be selected: VARNA-based (default), PseudoViewer-based or R-chie-based. Otherwise, the 'No image' option must be selected. The first two procedures generate a classical visualization of the secondary structure, while the last one produces an arc diagram. If the selected procedure fails to generate an image, an alternative procedure is run automatically.
Step 6: To start the secondary structure processing, the "Run" button should be clicked. This immediately displays the results page with the secondary structure encoded in extended dot-bracket notation, BPSEQ and CT, the identified structural elements, and the visualization (if requested). The results can be saved to a local drive. Moreover, the same input data can be processed again with a different set of options.
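The removal of isolated base pairs offered in Step 2 can be pictured with the short Python sketch below. The text above does not give a formal criterion for "close vicinity", so the sketch assumes a simple one: a pair (i, j) is isolated when neither (i - 1, j + 1) nor (i + 1, j - 1) is also paired. Both the criterion and the function name are assumptions made for illustration and may differ from what RNApdbee applies.

```python
def remove_isolated_pairs(pairs):
    """Drop base pairs that have no directly stacked neighbour pair.

    Assumption: a pair (i, j) counts as isolated when neither
    (i - 1, j + 1) nor (i + 1, j - 1) is present in the input.
    """
    normalized = {(min(p), max(p)) for p in pairs}
    return [
        (i, j)
        for (i, j) in sorted(normalized)
        if (i - 1, j + 1) in normalized or (i + 1, j - 1) in normalized
    ]
```

With the input pairs (1,10), (2,9) and (4,7), only the first two are kept under this criterion, because (4,7) has no stacked neighbour.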

(....) → image

Step 1: In the first step of this scenario, the user uploads a dot-bracket file from a local repository (use the "Browse" button to select a file from the required folder). There are also 3 example files in dot-bracket format available for upload. Uploaded data can be viewed in the text area after clicking the "Show file contents" button, and edited before further processing.
Step 2: RNApdbee automatically finds secondary structure elements within the result. In this step, the user can decide to treat pseudoknots as either paired (default) or unpaired residues. With the default option selected, residues encoded as pseudoknots are allowed to form stems and the terminating base pairs of loops. When the other option is set, the same residues become part of single strands and loop interiors.
Step 3: Next, the user selects one of three available procedures: VARNA-based (default), PseudoViewer-based or R-chie-based. The first two procedures generate a classical visualization of the secondary structure, while the last one produces an arc diagram. If the selected procedure fails to generate an image, an alternative procedure is run automatically.
Step 4: To start the visualization of the secondary structure topology, the "Run" button should be clicked. This immediately displays the results page with the secondary structure encoded in extended dot-bracket notation, BPSEQ and CT, the identified structural elements, and the visualization. The results can be saved to a local drive. Moreover, the same input data can be processed again with a different set of options.
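The effect of treating pseudoknots as unpaired residues (Step 2) can be pictured on the dot-bracket string itself. The Python sketch below is only an illustration under that reading: every symbol other than "(", ")", "." and the minus sign is replaced by a dot, so that the affected residues fall into single strands or loops. The function name is hypothetical, and RNApdbee's own element detection is not reproduced here.

```python
def pseudoknots_as_unpaired(dot_bracket):
    """Replace every pseudoknot symbol with a dot.

    The nested brackets "(" and ")", unpaired "." and the missing-residue
    sign "-" are kept unchanged.
    """
    keep = set("().-")
    return "".join(ch if ch in keep else "." for ch in dot_bracket)
```

Applied to ((.[))..]. the sketch gives ((..)).... , in which residues 4 and 9 are no longer paired.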

3D → multi 2D

Step 1: In the first step of this scenario, the user uploads a PDB or PDBx/mmCIF file with the RNA tertiary structure. The file can be uploaded either directly from a local drive (use the "Browse" button to browse through the local repositories) or from the Protein Data Bank. In the latter case, the user enters the PDB identifier into the edit box and clicks the "Get" button. The associated PDB or PDBx/mmCIF file is then automatically downloaded from the Protein Data Bank and made ready for processing by the application.
There are 5 example PDB and PDBx/mmCIF files stored in the system and ready for processing. They give new users an easy start with RNApdbee. Uploaded data can be viewed in the text area after clicking the "Show file contents" button, and edited before further processing.
Step 2: In the second step, the user can configure the analysis parameters. The first option (off by default), if selected, allows non-canonical base pairs to be included in the secondary structure topology. The second option (off by default), if selected, removes any isolated base pairs from the result.
Step 3: Next, the user decides whether a visualization is to be generated in the output, together with the textual representation of the resulting secondary structure. To get an image, one of three available procedures should be selected: VARNA-based (default), PseudoViewer-based or R-chie-based. Otherwise, the 'No image' option must be selected. The first two procedures generate a classical visualization of the secondary structure, while the last one produces an arc diagram. If the selected procedure fails to generate an image, an alternative procedure is run automatically.
Step 4: To start the analysis, the "Run" button should be clicked. This immediately displays the results page with possibly several secondary structures encoded in BPSEQ and CT format. Each entry on the results page may be encoded by several different dot-bracket structures, each accompanied by an image (if requested). The results can be saved to a local drive.

RNApdbee integrates selected functionality of different tools for analysing and visualising RNA structures into a simple and coherent interface. Below, a short note on each of these tools is provided with respect to their roles in RNApdbee workflow.

Base-pair identification

The RNApdbee webserver offers a choice between four major base pair annotation methods: 3DNA/DSSR, RNAView, MC-Annotate and FR3D. All of them process PDB- or PDBx/mmCIF-formatted data describing the three-dimensional structure of RNAs.

RNAView

RNAView identifies and classifies the types of base pairs and basic RNA motifs, such as loops and bulges, that are formed in nucleic acid structures. It implements the classification of edge-to-edge hydrogen bonding interactions according to the Leontis/Westhof nomenclature. The program also allows for identification of tertiary interactions and visualization of 2D diagrams of RNA secondary topology in Postscript, VRML or RNAML format. RNAView can be downloaded from http://ndbserver.rutgers.edu/ndbmodule/services/download/rnaview.html and used after successful installation.

MC-Annotate

MC-Annotate provides a structural graph which encodes geometric information based on atomic coordinates and torsion angles. In general, the structural graph describes every nucleotide conformation (based on sugar puckering modes and nitrogen base orientations around the glycosyl bond), base-base interactions (based on stacking and hydrogen bonding information) and pseudoknots identified in the input structure. MC-Annotate also allows for RNA motif recognition. It is available as a webserver at http://www-lbit.iro.umontreal.ca/mcannotate-simple/.

3DNA/DSSR

The 3DNA/DSSR tool (DSSR: Dissecting the Spatial Structure of RNA) characterizes the geometric features of RNAs. It is provided as a command-line driven package designed to analyse, reconstruct and visualise three-dimensional nucleic acid structures. 3DNA/DSSR identifies base pairs in the input RNA structure, taking into account modified and unmodified residues that form canonical (Watson-Crick) base pairs, non-canonical base pairs with at least one H-bond, and non-pairing interactions (e.g. base stacking). Moreover, 3DNA/DSSR characterizes base pairs using both the Leontis/Westhof and Saenger classifications. It detects triplets, higher-order base associations and pseudoknots. Additionally, 3DNA/DSSR provides the RNA secondary structure in dot-bracket notation. It is available as a web server at http://web.x3dna.org/dssr.

FR3D

FR3D is a suite of MATLAB programs for finding recurrent 3D motifs in RNA structures. This process includes classification of base pairs and stacking interactions. Internally, FR3D performs this operation using a base-centered approach that analyses the geometries of bases with respect to a common frame. The base pairs found are classified according to the Leontis/Westhof notation. The web-accessible version of the tool is available as WebFR3D.

Handling of missing residues

The PDB and PDBx/mmCIF formats support an explicit description of missing residues. These are known nucleobases (A, C, G or U) whose atom positions are absent. For PDB entries containing missing residues, RNApdbee is able to read the full sequence, but cannot derive the secondary structure for these residues. A special character, the minus sign "-", was chosen to describe missing or unidentified residues in the dot-bracket output in such cases. In PseudoViewer- and VARNA-based visualizations, missing residues are either (1) highlighted in red (if they appear at the 5' or 3' terminal ends) or (2) not displayed, but counted (i.e. the numbers of the preceding and following nodes are adjusted).
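As a small illustration of this convention, the Python sketch below expands a dot-bracket string computed for the observed residues to the full sequence length and fills the gaps with the minus sign. The function name and its arguments are hypothetical and chosen for this example only. How RNApdbee tracks residue numbering internally is not described here.

```python
def insert_missing_residues(observed_structure, observed_positions, full_length):
    """Expand a dot-bracket string to full length, marking gaps with "-".

    observed_positions holds the 1-based sequence positions of the residues
    that have coordinates, in the same order as observed_structure.
    """
    if len(observed_structure) != len(observed_positions):
        raise ValueError("one structure symbol is needed per observed residue")
    full = ["-"] * full_length
    for symbol, position in zip(observed_structure, observed_positions):
        full[position - 1] = symbol
    return "".join(full)
```

For example, if residues 1 and 5 of an 8-residue chain are missing and the observed residues fold as ((..)), the sketch returns -((.-.)).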

Handling of non-canonical interactions

By default, RNApdbee output representations contain only canonical base pairs, while non-canonical ones are included in a separate list only. Upon user request, non-canonical base pairs can also be included in the text and graphical representations of the output secondary structure. In general, RNApdbee provides information about strong non-canonical interactions (mediated by at least 2 hydrogen bonds between bases). They are annotated in all text and graphical representations. Due to limitations of the text formats, not all multiplet-involved interactions can be encoded in dot-bracket, CT and BPSEQ. However, they are present in the visualization. In the dot-bracket representation, non-canonical base pairs are printed in bold, while in CT format they are supplied with comments. Additional functions classify non-canonical base pairs according to the Leontis-Westhof and Saenger nomenclatures. The assignment of non-canonical interactions to classes is provided in the CT file and in the supplementary table. This table also contains information about weak interactions (annotated as "base - base (1H)", "stacking", "base - sugar", "base - phosphate", "sugar - sugar", "sugar - phosphate", "phosphate - phosphate" or "other" in the 2nd column).

Secondary structure drawing

To draw the secondary structure topology, the RNApdbee webserver integrates the functionality of PseudoViewer, VARNA and R-chie, and supplements it with its own scripts that annotate the orders of pseudoknot interactions and non-canonical base pairs.

PseudoViewer

PseudoViewer allows for effective visualisation of large RNAs, including those with pseudoknots, as planar drawings. As input, it accepts the sequence and the secondary structure data in dot-bracket or paired format. As output, the URL of the generated image file is returned. The visualization can be prepared in EPS, SVG, PNG or GIF. PseudoViewer is available at http://pseudoviewer.inha.ac.kr/.

VARNA (Visualization Applet for RNA)

VARNA is primarily interactive software for drawing and editing RNA secondary structures. It supports several input file formats, including BPSEQ, CT and others. The main advantage of VARNA is its support for non-canonical base pairs (Leontis/Westhof nomenclature) and pseudoknots (the first extracted maximal planar subset of canonical base pairs serves as a scaffold for the rest of the base pairings). The output visualisation can be exported in vector and bitmap picture formats including EPS, SVG, XFIG, JPG and PNG. VARNA is available as a lightweight applet and Swing component at http://varna.lri.fr/.

R-chie

R-chie is a tool for generating different types of arc diagrams. It supports visualization of multiple sequence alignments and incorporation of covariance information into the image. R-chie can read inputs in various formats including BPSEQ, CT and dot-bracket, also those containing higher-order pseudoknots. It is highly configurable and can save its output as PNG or PDF files. The webserver is available at http://www.e-rna.org/r-chie.

Pseudoknot annotation scripts

Graphical images of RNA secondary structure generated by PseudoViewer and VARNA are post-processed by the RNApdbee scripts. Their aim is to annotate the orders of pseudoknot interactions in a graphical way. Different colours have been assigned to particular pseudoknot orders:

Pseudoknot order	1st	2nd	3rd	4th	5th	6th	7th	8th
Text annotation (brackets & letter)	[]	{}	<>	Aa	Bb	Cc	Dd	Ee
Graphical annotation (colours)

Scripts to annotate non-canonical interactions

Additional scripts post-process the VARNA and PseudoViewer images to display non-canonical interactions. The VARNA visualization supports the Leontis/Westhof classification, which is used during RNApdbee post-processing (see table below). In the case of PseudoViewer, the non-canonical interactions are shown either as gray-filled circles (for regular base pairs) or as coloured dashed lines (for pseudoknots, see the table of colours above). Gray dashed lines are used to connect multiplet-involved residues and other pairs that cannot be represented in text format.

RNA base-base classification	Visualization symbol
cis Watson-Crick Watson-Crick
trans Watson-Crick Watson-Crick
cis Watson-Crick Hoogsteen
trans Watson-Crick Hoogsteen
cis Watson-Crick Sugar
trans Watson-Crick Sugar
cis Hoogsteen Watson-Crick
trans Hoogsteen Watson-Crick
cis Hoogsteen Hoogsteen
trans Hoogsteen Hoogsteen
cis Hoogsteen Sugar
trans Hoogsteen Sugar
cis Sugar Watson-Crick
trans Sugar Watson-Crick
cis Sugar Hoogsteen
trans Sugar Hoogsteen
cis Sugar Sugar
trans Sugar Sugar

Multiple secondary structures

RNApdbee is capable of performing multiple analyses of the secondary structure information extracted from the 3D coordinates of an RNA. The analysis involves running all supported base pair analysers (3DNA/DSSR, RNAView, MC-Annotate and FR3D) on a single input 3D structure. The resulting lists of base pairs are grouped into unique sets. Each set is encoded in BPSEQ and CT format and then sent to the pipeline of secondary structure topology resolution, which currently consists of five algorithms (Hybrid Algorithm, Dynamic Programming, Elimination Min-Gain, Elimination Max-Conflicts and First-Come-First-Served). The results are again grouped into unique sets, which are displayed to the user in dot-bracket format and as visualizations.
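The grouping of base pair lists into unique sets can be sketched in Python as follows. This is an illustrative sketch, not RNApdbee code: the function name is hypothetical, and it assumes that two analysers agree when they report exactly the same set of pairs, regardless of order.

```python
from collections import defaultdict


def group_identical_results(results):
    """Group analyser names by the set of base pairs each one reported.

    results maps an analyser name to a list of 1-based (i, j) pairs.
    Returns a list of (sorted pairs, analyser names) tuples, one per
    unique set, in order of first appearance.
    """
    groups = defaultdict(list)
    for name, pairs in results.items():
        key = frozenset((min(p), max(p)) for p in pairs)
        groups[key].append(name)
    return [(sorted(key), names) for key, names in groups.items()]
```

Given invented input in which three analysers report the pairs (1,8) and (2,7) and a fourth reports only (1,8), the sketch produces two unique sets, each of which would then be passed once to the topology resolution algorithms.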

RNApdbee can process input data from files in PDB, PDBx/mmCIF, BPSEQ and CT formats. The output data are released in BPSEQ, CT, extended dot-bracket and graphical formats.

A PDB file stores various data concerning the three-dimensional structure of a molecule, the experiment used for structure determination, the authors, etc. The detailed description of this format is given here. RNApdbee uses the information about atom coordinates and about missing or modified residues.
PDBx/mmCIF is a dictionary-based storage format. The file contains well-formatted entries with the 3D atom coordinates of a molecule as well as a set of metadata related to the experiment used for structure determination, the authors, etc. The detailed description of this format is given here.
A BPSEQ file contains information about base pairs, stored in three columns: the 1st column contains the sequence position (starting at one), the 2nd column contains the base in one-letter notation, and the 3rd column contains the position of the pairing base (if the base from the 2nd column is paired) or zero (if it is unpaired).
A CT (connect) file is column-based and contains information about base pairs: the 1st column specifies the sequence index (starting at one), the 2nd column contains the base in one-letter notation, the 3rd and 4th columns specify the indices of the predecessor and successor of the base in the chain (a zero in either column marks a terminal base of the chain), the 5th column gives the index of the pairing base (if the base from the 2nd column is paired) or zero (if it is unpaired), and the 6th column corresponds to the base number. Additionally, if a 7th column appears and starts with '#', the rest of the line contains a comment.
Dot-bracket notation is used to encode RNA secondary structure topology. Standard dot-bracket encodes nested RNAs only. It is a string composed of dots and brackets, where an unpaired nucleotide is represented as a dot ".", and a base pair is represented as a pair of opening (begin) and closing (end) brackets, i.e. "(" and ")".
The extended dot-bracket notation is applied to represent knotted secondary structures: square brackets "[" and "]" are used for lower-order pseudoknots, while curly brackets "{" and "}", angle brackets "<" and ">" and consecutive letters of the alphabet ("A" and "a", "B" and "b", etc.) represent higher orders and the most complicated pseudoknots. Additionally, the minus sign "-" is used to encode an unidentified residue.
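The notation described above can be read back into a list of base pairs with one stack per bracket type. The Python sketch below is a minimal reader written for illustration. The function name is an assumption, and the letter range is limited to "A" to "E", following the eight pseudoknot orders listed earlier in this document.

```python
OPENING = "([{<ABCDE"
CLOSING = ")]}>abcde"


def dot_bracket_to_pairs(structure):
    """Return sorted 1-based base pairs encoded in extended dot-bracket."""
    stacks = {opening: [] for opening in OPENING}
    partner = dict(zip(CLOSING, OPENING))
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol in stacks:
            # Opening symbol: remember its position until it is closed.
            stacks[symbol].append(position)
        elif symbol in partner:
            # Closing symbol: pair it with the latest open one of its type.
            stack = stacks[partner[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol not in ".-":
            raise ValueError(f"unexpected {symbol!r} at position {position}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)
```

Because each bracket type has its own stack, pairs of different orders may cross freely, while pairs of the same order are matched in the usual nested way.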

Example encoding of RNA secondary structure:

BPSEQ format:
1 G 8
2 G 7
3 C 0
4 A 0
5 U 0
6 U 0
7 C 2
8 C 1

CT format:
8
1 G 0 2 8 1
2 G 1 3 7 2
3 C 2 4 0 3
4 A 3 5 0 4
5 U 4 6 0 5
6 U 5 7 0 6
7 C 6 8 2 7

Dot-bracket notation:
GGCAUUCC
((....))
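The relation between the BPSEQ and CT columns in this example can be made concrete with a short conversion sketch in Python. It is an illustration only: the function names are hypothetical, a single chain is assumed, lines starting with "#" are skipped as comments, and the successor of the last residue is written as zero, following the terminal-base convention in the CT description above.

```python
def parse_bpseq(text):
    """Parse BPSEQ text into (index, base, partner) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        index, base, partner = line.split()
        records.append((int(index), base, int(partner)))
    return records


def bpseq_to_ct(records):
    """Format parsed BPSEQ records as CT text (single chain assumed)."""
    count = len(records)
    lines = [str(count)]  # first line: number of residues
    for index, base, partner in records:
        predecessor = index - 1  # zero for the first residue
        successor = index + 1 if index < count else 0
        lines.append(f"{index} {base} {predecessor} {successor} {partner} {index}")
    return "\n".join(lines)
```

For the hairpin GGCAUUCC, the first residue is written as "1 G 0 2 8 1": no predecessor, successor 2, pairing partner 8, base number 1.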

RNApdbee is designed to work with most of the available web browsers. The latest versions of browsers are strongly recommended:

Operating system	Recommended browser
Windows	Microsoft Internet Explorer (10 and later), Mozilla Firefox (4 and later), Opera (15 and later) or Google Chrome (19 and later)
Linux	Mozilla Firefox (4 and later), Opera (15 and later) or Google Chrome (19 and later)
macOS	Mozilla Firefox (4 and later), Opera (15 and later) or Google Chrome (19 and later)

RNApdbee integrates the following external tools:

Name	Version	Citation
3DNA/DSSR	1.7.6-2018mar22	X.-J. Lu and W.K. Olson. 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res, 31:5108–5121, 2003.
RNAView	N/A	H. Yang, F. Jossinet, N.B. Leontis, L. Chen, J. Westbrook, H. Berman, and E. Westhof. Tools for the automatic identification and classification of RNA base pairs. Nucleic Acids Res, 31:3450–3460, 2003.
MC-Annotate	1.5	P. Gendron, S. Lemieux, and F. Major. Quantitative analysis of nucleic acid three-dimensional structures. J Mol Biol, 308(5):919–936, 2001.
FR3D	N/A	M. Sarver, C.L. Zirbel, J. Stombaugh, A. Mokdad, and N.B. Leontis. FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. J Math Biol, 56(1-2):215–252, 2007.
VARNA	3.93-p1	K. Darty, A. Denise, and Y. Ponty. VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinformatics, 25(15):1974–1975, 2009.
PseudoViewer	3.0	Y. Byun and K. Han. PseudoViewer: web application and web service for visualizing RNA pseudoknots and secondary structures. Nucleic Acids Res, 34(W1):W416–W422, 2006.
R-CHIE	0.1.3	D. Lai, J.R. Proctor, J.Y.A. Zhu, and I.M. Meyer. R-CHIE: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Res, 40(12):e95–e95, 2012.

RNApdbee uses various Maven-provided dependencies. The most important ones are listed below:

Name	Version
org.biojava.biojava-structure	5.0.0-alpha15
org.springframework	5.0.4.RELEASE
tiles-extras.tiles-extras	3.0.8
org.projectlombok.lombok	1.16.20
com.google.api-client.google-api-client	1.23.0

Any published work which has made use of RNApdbee should cite the following papers:

T. Zok, M. Antczak, M. Zurkowski, M. Popenda, J. Blazewicz, R.W. Adamiak, M. Szachniuk. RNApdbee 2.0: multifunctional tool for RNA structure annotation. Nucleic Acids Research 46(W1), 2018, W30-W35, (doi:10.1093/nar/gky314)
M. Antczak, T. Zok, M. Popenda, P. Lukasiak, R.W. Adamiak, J. Blazewicz, M. Szachniuk. RNApdbee – a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. Nucleic Acids Research 42(W1), 2014, W368-W372, (doi:10.1093/nar/gku330)