Real Dashboards: Adding a Component to Display a Histogram of Amino Acid Frequencies

Let’s add one more feature to our PDB dashboard: a histogram of the amino acid frequencies in the loaded PDB structure, which gives users a quick visual overview of the composition of the molecule they are viewing. To do this, we will use Dash’s dcc.Graph component to display a chart created with Plotly Express from the amino acid counts computed from the PDB file. This requires adding another row to our layout, below the molecule viewer and header information columns.

../_images/pdb-dashboard-layout-with-graph-space.png

PDB dashboard layout with added row for amino acid frequency histogram.

Updated Imports

To implement this feature, we need a few additional imports: the dcc module from Dash for the Graph component, the PDBParser class from Biopython to parse the PDB file into a structure, the plotly.express module to create the chart, and Counter from Python’s collections module to tally the residues. Let’s update our imports in the app.py file.

from collections import Counter

import plotly.express as px
from Bio.PDB import PDBList, PDBParser, parse_pdb_header
from dash import Dash, Input, Output, State, callback, dcc, html

Updated Layout

Next, we will update our layout to include a new row for the amino acid frequency histogram. The new dbc.Row contains a single dbc.Col that holds the dcc.Graph component and spans the full width of the layout (width=12). The graph starts hidden (style={'display': 'none'}) so that no empty chart appears before a structure has been loaded.

dbc.Row([
    dbc.Col([
        dcc.Graph(id='amino-acid-histogram', figure={}, style={'display': 'none'})
    ], width=12)
], className="mt-4"),

Updated Callback Function

Since we want to update the new histogram component whenever the user loads a PDB structure, we need to add two outputs to our callback function: one for the histogram’s figure and one for its style.

@callback(
    [Output('molecule-viewer', 'children'),
    Output('header-info', 'children'),
    Output('amino-acid-histogram', 'figure'),
    Output('amino-acid-histogram', 'style'),
    Output('status-message', 'children')],
    Input('load-button', 'n_clicks'),
    State('pdb-input', 'value'),
    prevent_initial_call=True
)

As you saw in the layout section, we will target the id amino-acid-histogram to update both the content and style of the histogram component.
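Dash assigns the values returned by a multi-output callback to the declared outputs by position, so every return statement must supply five values in the order listed in the decorator. The following plain-Python sketch illustrates that pairing. The helper describe_return and the placeholder values are hypothetical and not part of Dash or app.py.

```python
# Output order as declared in the @callback decorator above
OUTPUT_ORDER = [
    ("molecule-viewer", "children"),
    ("header-info", "children"),
    ("amino-acid-histogram", "figure"),
    ("amino-acid-histogram", "style"),
    ("status-message", "children"),
]


def describe_return(values):
    """Pair each returned value with the output it would update (illustration only)."""
    if len(values) != len(OUTPUT_ORDER):
        raise ValueError(f"expected {len(OUTPUT_ORDER)} values, got {len(values)}")
    return {
        f"{component_id}.{prop}": value
        for (component_id, prop), value in zip(OUTPUT_ORDER, values)
    }


# Placeholder strings stand in for the real viewer, header, and status components
mapping = describe_return(("viewer", "header", {}, {"display": "none"}, "status"))
for target, value in mapping.items():
    print(f"{target} <- {value!r}")
```

Swapping the third and fourth values, for instance, would send the style dictionary to the figure property, which is why the order matters.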

Once again we need to update the logic of our callback function, load_molecule, to calculate the amino acid frequencies from the PDB file and create a histogram figure using Plotly Express. We will add the following code to our callback function to calculate the amino acid frequencies:

# Parse PDB structure for amino acid analysis
bio_parser = PDBParser(QUIET=True)
structure = bio_parser.get_structure(pdb_id, pdb_file)
amino_acid_counts = count_amino_acids(structure)

As you can see, we use Biopython’s PDBParser to parse the PDB file into a structure object. We then call a helper function, count_amino_acids, which takes the parsed structure and returns a collections.Counter object (a dictionary subclass) holding the count of each amino acid in the structure. The count_amino_acids function will look something like this:

def count_amino_acids(structure):
    """Count amino acid frequencies in a PDB structure"""
    # Standard amino acids (3-letter codes)
    standard_aa = {
        'ALA', 'CYS', 'ASP', 'GLU', 'PHE', 'GLY', 'HIS', 'ILE', 'LYS', 'LEU',
        'MET', 'ASN', 'PRO', 'GLN', 'ARG', 'SER', 'THR', 'VAL', 'TRP', 'TYR'
    }

    amino_acids = []

    # Iterate through all residues in all chains
    for model in structure:
        for chain in model:
            for residue in chain:
                # Get residue name and check if it's a standard amino acid
                res_name = residue.get_resname().strip()
                if res_name in standard_aa:
                    amino_acids.append(res_name)

    # Count frequencies
    return Counter(amino_acids)
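To see what the returned Counter offers, here is a small standalone sketch. The residue list is made up for illustration and stands in for the names collected by the nested loops.

```python
from collections import Counter

# Hypothetical residue names, standing in for what the nested loops collect
residue_names = ["ALA", "GLY", "ALA", "LYS", "GLY", "ALA"]

counts = Counter(residue_names)

print(counts["ALA"])          # 3
print(counts["TRP"])          # 0: a missing key counts as zero instead of raising KeyError
print(counts.most_common(2))  # [('ALA', 3), ('GLY', 2)]
print(bool(Counter()))        # False: an empty Counter is falsy
```

The last line is why a check such as `if not amino_acid_counts:` can detect a structure that contains no standard amino acids.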

Then, we will create a histogram figure using Plotly Express based on the amino acid counts. We have created a helper function called create_amino_acid_histogram that takes the amino acid counts and the PDB ID as input and returns a Plotly figure object.

# Create amino acid histogram
histogram = create_amino_acid_histogram(amino_acid_counts, pdb_id)

The create_amino_acid_histogram function will look something like this:

def create_amino_acid_histogram(amino_acid_counts, pdb_id):
    """Create a Plotly histogram of amino acid frequencies"""
    if not amino_acid_counts:
        return {}

    # Convert to lists for plotting
    amino_acids = list(amino_acid_counts.keys())
    counts = list(amino_acid_counts.values())

    # Create bar chart (histogram)
    fig = px.bar(
        x=amino_acids,
        y=counts,
        labels={'x': 'Amino Acid', 'y': 'Frequency'},
        title=f'Amino Acid Composition - PDB: {pdb_id.upper()}',
        color=counts,
        color_continuous_scale='Viridis'
    )

    # Update layout
    fig.update_layout(
        xaxis_title='Amino Acid (3-letter code)',
        yaxis_title='Count',
        showlegend=False,
        height=400,
        hovermode='x'
    )

    # Sort by amino acid name for consistent display
    fig.update_xaxes(categoryorder='category ascending')

    return fig
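Amino acids that do not occur in a structure never appear in the Counter, so the set of bars can differ from one structure to the next. If a fixed x-axis with all 20 standard amino acids is preferred, one option is to fill in zeros before plotting. The helper below, fill_missing_amino_acids, is a hypothetical sketch and not part of app.py.

```python
from collections import Counter

# The same 20 three-letter codes used in count_amino_acids, in alphabetical order
STANDARD_AA = sorted([
    'ALA', 'CYS', 'ASP', 'GLU', 'PHE', 'GLY', 'HIS', 'ILE', 'LYS', 'LEU',
    'MET', 'ASN', 'PRO', 'GLN', 'ARG', 'SER', 'THR', 'VAL', 'TRP', 'TYR',
])


def fill_missing_amino_acids(counts):
    """Return (names, values) lists covering all 20 standard amino acids.

    Amino acids absent from `counts` get a value of 0.
    """
    names = list(STANDARD_AA)
    values = [counts.get(name, 0) for name in names]
    return names, values


# Made-up counts for illustration
example_counts = Counter({"GLY": 2, "ALA": 3})
names, values = fill_missing_amino_acids(example_counts)
print(len(names))                   # 20
print(values[names.index("ALA")])   # 3
print(values[names.index("TRP")])   # 0
```

The two lists could then be passed as x and y to px.bar in place of the keys and values taken directly from the Counter.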

Since we have added two new outputs to our callback function, we also need to update every return statement in the callback to include values for the histogram figure and style. For example, if no PDB ID was entered, we will return:

if not pdb_id:
    return (
        html.Div("Please enter a valid PDB ID.", className="text-center text-muted mt-5"),
        html.Div("Header information will appear here.", className="text-center text-muted mt-5"),
        {},
        {'display': 'none'},
        dbc.Alert("Please enter a PDB ID.", color="warning")
    )
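The check above only catches an empty input. A stricter check could reject malformed IDs before any download is attempted. The sketch below assumes classic four-character PDB IDs (a digit followed by three alphanumeric characters). The function name is_classic_pdb_id is hypothetical, and the check would need to be relaxed if extended PDB ID formats must be accepted.

```python
import re

# Assumption: classic PDB IDs are one digit followed by three letters or digits
CLASSIC_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_classic_pdb_id(value):
    """Return True if `value` looks like a classic four-character PDB ID."""
    if not isinstance(value, str):
        return False
    return CLASSIC_PDB_ID.fullmatch(value.strip()) is not None


print(is_classic_pdb_id("4HHB"))        # True
print(is_classic_pdb_id(" 4hhb "))      # True: surrounding whitespace is ignored
print(is_classic_pdb_id("hemoglobin"))  # False
print(is_classic_pdb_id(None))          # False
```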

If there is an error loading the molecule, we will return:

except Exception as e:
    error_msg = dbc.Alert(
        f"Error loading PDB {pdb_id.upper()}: {str(e)}",
        color="danger"
    )
    empty_viewer = html.Div(
        "Failed to load molecule. Please check the PDB ID and try again.",
        className="text-center text-muted mt-5"
    )
    empty_header = html.Div(
        "Header information will appear here.",
        className="text-center text-muted mt-5"
    )
    return empty_viewer, empty_header, {}, {'display': 'none'}, error_msg

And, finally, if the molecule loads successfully, we will return:

if not histogram:
    return viewer, header_display, histogram, {'display': 'none'}, status
else:
    return viewer, header_display, histogram, {'display': 'block'}, status
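The two branches differ only in the style dictionary, so the choice could be factored into a small helper. This is a sketch of one possible refactoring, not what app.py does. The name histogram_style is hypothetical, and the non-empty dictionary below is a stand-in for a real Plotly figure. It relies on the same truthiness test as the if/else above.

```python
def histogram_style(figure):
    """Return a style dict that hides the graph when there is no figure to show."""
    return {'display': 'block'} if figure else {'display': 'none'}


# An empty dict means "no histogram"; the second dict is a placeholder figure
print(histogram_style({}))                          # {'display': 'none'}
print(histogram_style({'data': [{'type': 'bar'}]})) # {'display': 'block'}
```

With such a helper, the success path could end in a single return statement that passes histogram_style(histogram) as the fourth value.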

Running the Updated App

Once again, putting all of these updates together, our updated app.py file should look like this:

import os
from collections import Counter

import dash_bio as dashbio
import dash_bootstrap_components as dbc
import plotly.express as px
from Bio.PDB import PDBList, PDBParser, parse_pdb_header
from dash import Dash, Input, Output, State, callback, dcc, html
from dash_bio.utils import PdbParser as DashPdbParser
from dash_bio.utils import create_mol3d_style

# Initialize the Dash app
external_stylesheets = [dbc.themes.CERULEAN]
app = Dash(__name__, external_stylesheets=external_stylesheets)

# App layout
app.layout = dbc.Container([
    dbc.Row([
        html.Div("Molecular Structure Viewer", className="text-primary text-center fs-3 mb-4")
    ]),

    dbc.Row([
        dbc.Col([
            dbc.Label("Enter PDB ID:", className="fw-bold"),
            dbc.Input(
                id='pdb-input',
                type='text',
                placeholder='e.g., 4HHB, 3AID, 2MRU, 4K8X',
                value='4HHB',
                className="mb-2"
            ),
            dbc.Button("Load Structure", id='load-button', color="primary"),
            html.Div(id='status-message', className="mt-3")
        ], width=2),

        dbc.Col([
            html.Div(id='molecule-viewer', children=[
                html.Div("Enter a PDB ID and click 'Load Structure' to view the molecule.",
                        className="text-center text-muted mt-5")
            ])
        ], width=5),

        dbc.Col([
            html.Div(id='header-info', children=[
                html.Div("Header information will appear here.",
                        className="text-center text-muted mt-5")
            ], style={'maxHeight': '600px', 'overflowY': 'auto'})
        ], width=5)
    ], className="mt-4"),

    dbc.Row([
        dbc.Col([
            dcc.Graph(id='amino-acid-histogram', figure={}, style={'display': 'none'})
        ], width=12)
    ], className="mt-4"),
], fluid=True)

# Callback to load and display molecule
@callback(
    [Output('molecule-viewer', 'children'),
    Output('header-info', 'children'),
    Output('amino-acid-histogram', 'figure'),
    Output('amino-acid-histogram', 'style'),
    Output('status-message', 'children')],
    Input('load-button', 'n_clicks'),
    State('pdb-input', 'value'),
    prevent_initial_call=True
)
def load_molecule(load_clicks, pdb_id):

    if not pdb_id:
        return (
            html.Div("Please enter a valid PDB ID.", className="text-center text-muted mt-5"),
            html.Div("Header information will appear here.", className="text-center text-muted mt-5"),
            {},
            {'display': 'none'},
            dbc.Alert("Please enter a PDB ID.", color="warning")
        )

    try:
        # Clean up PDB ID (remove whitespace, convert to lowercase)
        pdb_id = pdb_id.strip().lower()

        # Create PDB directory if it doesn't exist
        pdb_dir = './pdb_files'
        os.makedirs(pdb_dir, exist_ok=True)

        # Download PDB file using BioPython
        pdbl = PDBList()
        pdb_file = pdbl.retrieve_pdb_file(pdb_id, pdir=pdb_dir, file_format='pdb')

        # Read PDB file content for visualization
        dash_parser = DashPdbParser(pdb_file)
        pdb_data = dash_parser.mol3d_data()  # Get data in format suitable for Molecule3dViewer
        # create styles for visualization needed by Molecule3dViewer
        # atoms is a list of dictionaries obtained from parsing the PDB file with DashPdbParser
        # visualization_type can be 'cartoon', 'stick', 'sphere'
        # color_element can be 'residue', 'chain', 'element', 'partialCharge'
        styles = create_mol3d_style(
            pdb_data['atoms'], visualization_type='cartoon', color_element='residue'
        )

        # Parse PDB structure for amino acid analysis
        bio_parser = PDBParser(QUIET=True)
        structure = bio_parser.get_structure(pdb_id, pdb_file)
        amino_acid_counts = count_amino_acids(structure)

        # Parse PDB header information
        header_info = parse_pdb_header(pdb_file)

        # Create Molecule3dViewer component
        viewer = create_molecule_viewer(pdb_data, styles)

        # Create header display
        header_display = create_header_display(header_info, pdb_id)

        # Create amino acid histogram
        histogram = create_amino_acid_histogram(amino_acid_counts, pdb_id)

        status = dbc.Alert(
            f"Successfully loaded PDB ID: {pdb_id.upper()}",
            color="success"
        )

        if not histogram:
            return viewer, header_display, histogram, {'display': 'none'}, status
        else:
            return viewer, header_display, histogram, {'display': 'block'}, status

    except Exception as e:
        error_msg = dbc.Alert(
            f"Error loading PDB {pdb_id.upper()}: {str(e)}",
            color="danger"
        )
        empty_viewer = html.Div(
            "Failed to load molecule. Please check the PDB ID and try again.",
            className="text-center text-muted mt-5"
        )
        empty_header = html.Div(
            "Header information will appear here.",
            className="text-center text-muted mt-5"
        )
        return empty_viewer, empty_header, {}, {'display': 'none'}, error_msg

def create_molecule_viewer(pdb_data, styles):
    """Create a Molecule3dViewer from PDB data"""
    return dashbio.Molecule3dViewer(
        id='molecule-3d',
        modelData=pdb_data,
        styles=styles,
        selectionType='atom',
        backgroundColor='#F0F0F0',
        height=600,
        width='100%'
    )

def create_header_display(header_info, pdb_id):
    """Create a formatted display of PDB header information"""
    header_sections = []

    # Title
    if 'name' in header_info:
        header_sections.append(
            html.Div([
                html.H6("Name", className="fw-bold mt-3 mb-2"),
                html.P(header_info['name'], className="text-sm")
            ])
        )

    # Structure Classification
    if 'structure_method' in header_info:
        header_sections.append(
            html.Div([
                html.H6("Method", className="fw-bold mt-3 mb-2"),
                html.P(header_info['structure_method'], className="text-sm")
            ])
        )

    # Release Date
    if 'release_date' in header_info:
        header_sections.append(
            html.Div([
                html.H6("Release Date", className="fw-bold mt-3 mb-2"),
                html.P(header_info['release_date'], className="text-sm")
            ])
        )

    # Deposition Date
    if 'deposition_date' in header_info:
        header_sections.append(
            html.Div([
                html.H6("Deposition Date", className="fw-bold mt-3 mb-2"),
                html.P(header_info['deposition_date'], className="text-sm")
            ])
        )

    # Resolution
    if 'resolution' in header_info and header_info['resolution'] is not None:
        header_sections.append(
            html.Div([
                html.H6("Resolution (Å)", className="fw-bold mt-3 mb-2"),
                html.P(f"{header_info['resolution']:.2f}", className="text-sm")
            ])
        )

    if 'journal_reference' in header_info and header_info['journal_reference']:
        journal_text = header_info['journal_reference']
        header_sections.append(
            html.Div([
                html.H6("Journal Reference", className="fw-bold mt-3 mb-2"),
                html.P(journal_text, className="text-sm", style={'wordWrap': 'break-word'})
            ])
        )

    # Keywords
    if 'keywords' in header_info and header_info['keywords']:
        keywords_text = header_info['keywords']
        header_sections.append(
            html.Div([
                html.H6("Keywords", className="fw-bold mt-3 mb-2"),
                html.P(keywords_text, className="text-sm", style={'wordWrap': 'break-word'})
            ])
        )

    if header_sections:
        return dbc.Card([
            dbc.CardBody([
                html.H5(f"PDB: {pdb_id.upper()}", className="card-title"),
                html.Hr(),
                *header_sections
            ])
        ], style={'height': '100%'})
    else:
        return html.Div("No header information available.", className="text-center text-muted mt-5")

def count_amino_acids(structure):
    """Count amino acid frequencies in a PDB structure"""
    # Standard amino acids (3-letter codes)
    standard_aa = {
        'ALA', 'CYS', 'ASP', 'GLU', 'PHE', 'GLY', 'HIS', 'ILE', 'LYS', 'LEU',
        'MET', 'ASN', 'PRO', 'GLN', 'ARG', 'SER', 'THR', 'VAL', 'TRP', 'TYR'
    }

    amino_acids = []

    # Iterate through all residues in all chains
    for model in structure:
        for chain in model:
            for residue in chain:
                # Get residue name and check if it's a standard amino acid
                res_name = residue.get_resname().strip()
                if res_name in standard_aa:
                    amino_acids.append(res_name)

    # Count frequencies
    return Counter(amino_acids)

def create_amino_acid_histogram(amino_acid_counts, pdb_id):
    """Create a Plotly histogram of amino acid frequencies"""
    if not amino_acid_counts:
        return {}

    # Convert to lists for plotting
    amino_acids = list(amino_acid_counts.keys())
    counts = list(amino_acid_counts.values())

    # Create bar chart (histogram)
    fig = px.bar(
        x=amino_acids,
        y=counts,
        labels={'x': 'Amino Acid', 'y': 'Frequency'},
        title=f'Amino Acid Composition - PDB: {pdb_id.upper()}',
        color=counts,
        color_continuous_scale='Viridis'
    )

    # Update layout
    fig.update_layout(
        xaxis_title='Amino Acid (3-letter code)',
        yaxis_title='Count',
        showlegend=False,
        height=400,
        hovermode='x'
    )

    # Sort by amino acid name for consistent display
    fig.update_xaxes(categoryorder='category ascending')

    return fig

# Run the app
if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8050, debug=True)

Again, to run the updated app, simply execute the following command in your VS Code terminal (if it’s not already running):

(.venv) [mbs337-vm]$ python app.py
Dash is running on http://0.0.0.0:8050/

* Serving Flask app 'app'
* Debug mode: on

Now we can navigate to http://<IP_ADDRESS>:8050/ in our web browser to see the updated PDB dashboard with the new amino acid frequency histogram.

../_images/pdb-dashboard-with-histogram.png

PDB dashboard application with added amino acid frequency histogram running in a web browser.

Additional Resources