Homework 05
Due Date: Tuesday, February 17 by 11:00am CST
Unit 4 Best Practices: mmCIF Summary Script
This homework applies everything from Unit 4 (Code Organization, Documentation,
Logging, and Error Handling). You will build a single, well-structured Python
script called mmcif_summary.py that reads a mmCIF structure file, computes
per-chain residue statistics, and writes the result to a JSON file in a
specified format.
Input file: Use the hemoglobin structure 4HHB (same as in Homework 04). Download with:
wget https://files.rcsb.org/download/4HHB.cif.gz
gunzip 4HHB.cif.gz
Create a Python script named mmcif_summary.py that:
Parses a mmCIF file (e.g., 4HHB.cif) using MMCIFParser from Bio.PDB.
For each chain in the structure, computes:
total_residues — total number of residues in the chain
hetero_residue_count — number of hetero residues (waters, ligands, ions, etc.)
standard_residues — number of standard (non-hetero) residues
Writes the summary to a JSON file in the exact format shown below:
{
  "chains": [
    {
      "chain_id": "A",
      "total_residues": 198,
      "standard_residues": 141,
      "hetero_residue_count": 57
    },
    {
      "chain_id": "B",
      "total_residues": 205,
      "standard_residues": 146,
      "hetero_residue_count": 59
    },
    {
      "chain_id": "C",
      "total_residues": 201,
      "standard_residues": 141,
      "hetero_residue_count": 60
    },
    {
      "chain_id": "D",
      "total_residues": 197,
      "standard_residues": 146,
      "hetero_residue_count": 51
    }
  ]
}
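The per-chain counts can be derived from Biopython's residue identifiers. The sketch below is one possible approach, not a reference solution. It assumes that a residue whose `id[0]` (the hetero flag) is blank is a standard residue and that anything else (for example `"W"` for water or `"H_HEM"` for heme) counts as hetero. The function names `summarize_chain`, `summarize_model`, and `write_summary` are hypothetical.

```python
import json
from typing import Iterable


def summarize_chain(chain_id: str, residues: Iterable[object]) -> dict:
    """Count standard and hetero residues for one chain.

    Args:
        chain_id: Identifier of the chain (e.g., "A").
        residues: Residue objects exposing a Biopython-style ``id`` tuple
            of (hetero flag, sequence number, insertion code).

    Returns:
        Dictionary with the four keys used in the required JSON format.
    """
    total = 0
    hetero = 0
    for residue in residues:
        total += 1
        # Assumption: standard residues carry a blank hetero flag.
        if residue.id[0].strip():
            hetero += 1
    return {
        "chain_id": chain_id,
        "total_residues": total,
        "standard_residues": total - hetero,
        "hetero_residue_count": hetero,
    }


def summarize_model(model: object) -> dict:
    """Build the top-level summary for every chain in a model.

    Args:
        model: An iterable of chain objects, each with an ``id`` attribute.

    Returns:
        Dictionary with a single "chains" key holding per-chain summaries.
    """
    return {"chains": [summarize_chain(chain.id, chain) for chain in model]}


def write_summary(summary: dict, output_path: str) -> None:
    """Write the summary dictionary to a JSON file.

    Args:
        summary: Dictionary produced by summarize_model.
        output_path: Destination path for the JSON file.

    Returns:
        None
    """
    with open(output_path, "w") as handle:
        json.dump(summary, handle, indent=2)
```

With Biopython installed, the model passed to `summarize_model` could be obtained with `MMCIFParser(QUIET=True).get_structure("4HHB", "4HHB.cif")[0]`, where index `0` selects the first model.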
Requirements checklist
Script name: mmcif_summary.py
At least 3 functions plus main()
Properly formatted if __name__ == "__main__" statement
Type hints on all functions (parameters and return types)
Docstrings with description, Args, and Returns for every function
Logging at three or more levels
argparse for log level
socket used in logging
At least one try/except for error handling
Output JSON matches the required format
Use MMCIFParser from Bio.PDB; iterate over the first model and all chains
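Several checklist items concern the script's overall shape rather than the mmCIF logic. The skeleton below is a minimal sketch of how they might fit together. It assumes that "socket used in logging" means embedding the hostname from `socket.gethostname()` in the log format, and the names `parse_args`, `configure_logging`, and `input_is_readable`, the `-l/--loglevel` flag, and the hard-coded `4HHB.cif` path are all illustrative choices, not requirements.

```python
import argparse
import logging
import socket
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments.

    Args:
        argv: Arguments to parse; sys.argv[1:] is used when None.

    Returns:
        Namespace with a ``loglevel`` attribute.
    """
    parser = argparse.ArgumentParser(description="Summarize an mmCIF file")
    parser.add_argument(
        "-l", "--loglevel",
        default="WARNING",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        help="Set the logging level (default: WARNING)",
    )
    return parser.parse_args(argv)


def configure_logging(level: str) -> str:
    """Configure the root logger.

    Args:
        level: Name of the logging level (e.g., "INFO").

    Returns:
        The format string passed to logging.basicConfig.
    """
    # Assumption: the hostname is embedded in every log line via socket.
    format_str = (
        f"[%(asctime)s {socket.gethostname()}] "
        "%(filename)s:%(funcName)s:%(lineno)s - %(levelname)s: %(message)s"
    )
    logging.basicConfig(level=level, format=format_str)
    return format_str


def input_is_readable(path: str) -> bool:
    """Check that the input file can be opened.

    Args:
        path: Path to the mmCIF file.

    Returns:
        True if the file opened successfully, False otherwise.
    """
    try:
        with open(path, "r"):
            logging.info("Found input file %s", path)
            return True
    except OSError as err:
        logging.error("Could not open %s: %s", path, err)
        return False


def main() -> None:
    """Parse arguments, configure logging, and check the input file."""
    args = parse_args()
    configure_logging(args.loglevel)
    logging.debug("Arguments: %s", args)
    if not input_is_readable("4HHB.cif"):
        logging.warning("Nothing to summarize")


if __name__ == "__main__":
    main()
```

Running the sketch as `python3 mmcif_summary.py -l DEBUG` would show messages at every level, while the default of `WARNING` hides the debug and info lines.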
Type Hints
Your arguments and return values may involve Biopython objects (e.g., Structure,
Chain, Residue) from Bio.PDB. You can use object as the type hint to
indicate that the parameter is some object provided by the Biopython library.
For built-in types (str, list, dict, int, etc.) and your own data
structures, use full type hints as usual.
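As a brief illustration of this convention, the two hypothetical helpers below hint their Biopython arguments as `object` and give full hints for the built-in return types. They rely only on the fact that Biopython models and chains can be iterated and that each chain has an `id` attribute.

```python
def count_residues(chain: object) -> int:
    """Return the number of residues in a chain.

    Args:
        chain: A Biopython Chain object (hinted as ``object``).

    Returns:
        Number of residues found by iterating over the chain.
    """
    return sum(1 for _ in chain)


def chain_ids(model: object) -> list[str]:
    """Return the identifiers of all chains in a model.

    Args:
        model: A Biopython Model object (hinted as ``object``).

    Returns:
        List of chain identifiers in iteration order.
    """
    return [chain.id for chain in model]
```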
Tip
You may see PDBConstructionWarning messages when parsing some mmCIF files (e.g., “Chain D is discontinuous”). These are safe to ignore for this assignment.
What to Turn In
Create a homework05 directory in your Git repository (on your VM).
Add mmcif_summary.py to this directory.
Add your summary (e.g., 4HHB_summary.json) in an output_files directory.
Add a README.md in homework05 that:
Describes what the script does and how to run it (including example commands)
Explains where to get the input file (4HHB.cif)
Includes a section on AI usage (if applicable — see note below)
Commit and push your work to GitHub.
Expected directory layout:
my-mbs337-repo/
├── homework05/
│   ├── mmcif_summary.py
│   ├── output_files/
│   │   └── 4HHB_summary.json
│   └── README.md
Note on Using AI
The use of AI to complete this assignment is not recommended, but it is permitted with the following restrictions:
The use of LLMs (like ChatGPT, Copilot, etc.) or any other AI must be rigorously cited. Any code blocks or text generated by an AI model should be clearly marked as such with in-code comments describing what was generated, how it was generated, and why you chose to use AI in that instance. The homework README must also contain a section that summarizes where AI was used in the assignment.
Additional Resources
RCSB PDB — download mmCIF files (e.g. 4HHB)
Please find us in the class Slack channel if you have any questions!