Homework 05
===========

**Due Date: Tuesday, February 17 by 11:00am CST**

Unit 4 Best Practices: mmCIF Summary Script
-------------------------------------------

This homework applies everything from Unit 4 (Code Organization, Documentation,
Logging, and Error Handling). You will build a single, well-structured Python
script called ``mmcif_summary.py`` that reads a mmCIF structure file, computes
per-chain residue statistics, and writes the result to a JSON file in a
specified format.

**Input file:** Use the hemoglobin structure **4HHB** (same as in Homework 04).
Download with:

.. code-block:: console

   wget https://files.rcsb.org/download/4HHB.cif.gz
   gunzip 4HHB.cif.gz

**Create a Python script named** ``mmcif_summary.py`` **that:**

1. Parses a mmCIF file (e.g., ``4HHB.cif``) using ``MMCIFParser`` from ``Bio.PDB``.
2. For each chain in the structure, computes:

   * **total_residues**: total number of residues in the chain
   * **hetero_residue_count**: number of hetero residues (waters, ligands, ions, etc.)
   * **standard_residues**: number of standard (non-hetero) residues

3. Writes the summary to a JSON file in the exact format shown below:

.. code-block:: json

   {
     "chains": [
       {
         "chain_id": "A",
         "total_residues": 198,
         "standard_residues": 141,
         "hetero_residue_count": 57
       },
       {
         "chain_id": "B",
         "total_residues": 205,
         "standard_residues": 146,
         "hetero_residue_count": 59
       },
       {
         "chain_id": "C",
         "total_residues": 201,
         "standard_residues": 141,
         "hetero_residue_count": 60
       },
       {
         "chain_id": "D",
         "total_residues": 197,
         "standard_residues": 146,
         "hetero_residue_count": 51
       }
     ]
   }

Requirements checklist
----------------------

* Script name: ``mmcif_summary.py``
* At least **3 functions** plus ``main()``
* Properly formatted ``if __name__ == "__main__"`` statement
* **Type hints** on all functions (parameters and return types)
* **Docstrings** with description, Args, and Returns for every function
* **Logging** with at least **3 levels**
* **argparse** for setting the log level
* **socket** used in logging
* At least **one try/except** for error handling
* Output JSON matches the required format
* Use ``MMCIFParser`` from ``Bio.PDB``; iterate over the first model and all chains

.. admonition:: Type Hints

   Your arguments and return values may involve Biopython objects (e.g.,
   ``Structure``, ``Chain``, ``Residue``) from ``Bio.PDB``. You can use
   ``object`` as the type hint to indicate that the parameter is some object
   provided by the Biopython library. For built-in types (``str``, ``list``,
   ``dict``, ``int``, etc.) and your own data structures, use full type hints
   as usual.

.. tip::

   You may see ``PDBConstructionWarning`` messages when parsing some mmCIF
   files (e.g., "Chain D is discontinuous"). These are safe to ignore for this
   assignment.

What to Turn In
---------------

1. Create a ``homework05`` directory in your Git repository (on your VM).
2. Add ``mmcif_summary.py`` to this directory.
3. Add your summary file (e.g., ``4HHB_summary.json``) to an ``output_files``
   directory inside ``homework05``.
4. Add a ``README.md`` in ``homework05`` that:

   * Describes what the script does and how to run it (including example commands)
   * Explains where to get the input file (``4HHB.cif``)
   * Includes a section on AI usage (if applicable; see the note below)

5. Commit and push your work to GitHub.

**Expected directory layout:**

.. code-block:: text

   my-mbs337-repo/
   ├── homework05/
   │   ├── mmcif_summary.py
   │   ├── output_files/
   │   │   └── 4HHB_summary.json
   │   └── README.md

Note on Using AI
----------------

The use of AI to complete this assignment is not recommended, but it is
permitted with the following restrictions:

The use of LLMs (like ChatGPT, Copilot, etc.) or any other AI must be
rigorously cited. Any code blocks or text generated by an AI model must be
clearly marked as such with in-code comments describing what was generated,
how it was generated, and why you chose to use AI in that instance. The
homework README must also contain a section that summarizes where AI was used
in the assignment.

Additional Resources
--------------------

* `Unit 4: Code Organization <../unit04/organization.html>`_
* `Unit 4: Documentation <../unit04/documentation.html>`_
* `Unit 4: Logging <../unit04/logging.html>`_
* `Unit 4: Error Handling <../unit04/error_handling.html>`_
* `Unit 3: mmCIF <../unit03/mmCIF.html>`_
* Biopython PDB
* RCSB PDB: download mmCIF files (e.g., 4HHB)
* Please find us in the class Slack channel if you have any questions!
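
The per-chain counts the script must compute depend on telling hetero residues
apart from standard ones. The following is a minimal sketch of that counting
step, not a reference solution. It assumes the Biopython convention that a
residue id is a ``(hetfield, resseq, icode)`` tuple whose first field is blank
for standard residues. The function name ``count_residues`` and the plain-tuple
input are hypothetical, chosen so the example runs without Biopython installed:

.. code-block:: python

   from typing import Dict, List, Tuple

   # Assumption: ids are shaped like the tuples returned by Biopython's
   # residue.get_id(), i.e. (hetfield, resseq, icode). The hetfield is " "
   # for standard residues, "W" for water, and "H_<name>" for other hetero
   # residues.
   ResidueId = Tuple[str, int, str]


   def count_residues(residue_ids: List[ResidueId]) -> Dict[str, int]:
       """Count total, standard, and hetero residues for one chain.

       Args:
           residue_ids: Residue id tuples (hetfield, resseq, icode).

       Returns:
           A dict with the three counts, keyed as in the required JSON.
       """
       total = len(residue_ids)
       # A non-blank hetfield marks a hetero residue (water, ligand, ion, ...).
       hetero = sum(1 for hetfield, _resseq, _icode in residue_ids if hetfield.strip())
       return {
           "total_residues": total,
           "standard_residues": total - hetero,
           "hetero_residue_count": hetero,
       }


   # Hypothetical ids: two standard residues, one heme group, one water.
   example_ids = [(" ", 1, " "), (" ", 2, " "), ("H_HEM", 142, " "), ("W", 143, " ")]
   summary = count_residues(example_ids)

With a parsed structure, the ids could be collected with something like
``[residue.get_id() for residue in chain]`` before calling the function.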
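
The logging items in the requirements checklist (``argparse`` for the log
level, ``socket`` used in logging) could be wired together roughly as shown
here. This is a sketch under one plausible reading of "socket used in
logging", namely embedding the machine's hostname in the log format. The
``--loglevel`` flag name, its default, and the helper names ``parse_args`` and
``configure_logging`` are assumptions, not requirements of the assignment:

.. code-block:: python

   import argparse
   import logging
   import socket
   from typing import List, Optional


   def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
       """Parse command-line arguments.

       Args:
           argv: Argument list to parse; None means use sys.argv.

       Returns:
           The parsed arguments, including the chosen log level name.
       """
       parser = argparse.ArgumentParser(description="Summarize an mmCIF file")
       parser.add_argument(
           "-l", "--loglevel",  # hypothetical flag name
           default="WARNING",
           choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
           help="set the logging level (default: WARNING)",
       )
       return parser.parse_args(argv)


   def configure_logging(level_name: str) -> str:
       """Configure the root logger with a hostname-tagged format.

       Args:
           level_name: Name of a standard logging level, e.g. "INFO".

       Returns:
           The format string that was installed.
       """
       # Assumption: "socket used in logging" means adding the hostname.
       fmt = (
           f"[%(asctime)s {socket.gethostname()}] "
           "%(module)s.%(funcName)s:%(lineno)s - %(levelname)s: %(message)s"
       )
       # force=True replaces any handlers installed by earlier calls.
       logging.basicConfig(level=getattr(logging, level_name), format=fmt, force=True)
       return fmt


   # Explicit argument list so the demo does not depend on the real command line.
   args = parse_args(["--loglevel", "DEBUG"])
   log_format = configure_logging(args.loglevel)
   logging.debug("logging configured at level %s", args.loglevel)

In a real script these calls would sit inside ``main()``, with
``parse_args()`` called without arguments so that it reads the actual command
line.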