Final

Due Date: Saturday, May 2 by 8:00am CDT

Overall Objective

The objective of the final project is to complete the work you began on your midterm project. It will exercise many of the concepts used during the semester. For example, consider the DNA Sequence Classifier dashboard we showed in class:

[Figure: Sample dashboard]

This dashboard brings together many topics we covered in this class:

  • Backend code written in Python using best practices

  • Uses an ML model trained with scikit-learn

  • Front end dashboard built with Dash

  • Front end dashboard supports some graphing / visualization

  • Sequence data stored in Redis database

  • Dashboard makes API calls to database to exchange data

  • Dashboard, database, and ML code all containerized

  • Containers orchestrated together using docker compose

  • Deployed to a cloud machine and public on the web

  • All code stored in git repo on GitHub

  • Continuous integration between GitHub and Docker Hub

Bringing all of these concepts together resulted in a useful tool that is accessible to others and performs a scientific function that generally falls under the description of research computing in biology.
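To make the "dashboard exchanges data with the database" item in the list above more concrete, here is a minimal sketch in Python. It is not the code behind the class dashboard. The `seq:<id>` key scheme, the function names, and the `InMemoryStore` class are all hypothetical. The stand-in only mimics the `set` and `get` calls of a redis-py client (which returns bytes by default) so the example runs without a Redis server.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in that mimics the set/get calls of a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        # Store strings as bytes, similar to what a Redis client returns by default.
        self._data[name] = value.encode() if isinstance(value, str) else value
        return True

    def get(self, name):
        # Return None for a missing key, as a Redis client would.
        return self._data.get(name)


def save_sequence(client, seq_id, sequence, label=None):
    """Serialize one sequence record to JSON and store it under seq:<id>."""
    record = {"sequence": sequence, "label": label}
    client.set(f"seq:{seq_id}", json.dumps(record))


def load_sequence(client, seq_id):
    """Fetch one record and decode it, or return None if the key is absent."""
    raw = client.get(f"seq:{seq_id}")
    if raw is None:
        return None
    return json.loads(raw)


store = InMemoryStore()
save_sequence(store, "demo1", "ATGCGC", label="coding")
print(load_sequence(store, "demo1"))  # {'sequence': 'ATGCGC', 'label': 'coding'}
```

Because the two helper functions only rely on `set` and `get`, a real Redis client object could presumably be passed in place of `InMemoryStore` in a deployed project.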

Final Requirements

Part 1: Code Repository

We are looking for a complete, stand-alone repository that has all of the code, models, scripts, and supporting files necessary to deploy and run your project. Each project is a little bit different, but in general we would expect to find:

  • app.py: Assuming your project is a dashboard, this is where the majority of the code and logic will be

  • README.md: Use the README to introduce users to your project. Provide a high-level description of the purpose (images and figures will help). Provide instructions to deploy the project and instructions for using the actual web app

  • requirements.txt, Dockerfile, docker-compose.yml: We will be looking for files that aid in the deployment of your project, in line with the description provided in the README

  • model/: If your dashboard performs inference on user data, please provide the pickled model(s) in a subdirectory, along with any supporting scripts that were used to train the model or run inference with it
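As a rough illustration of the model/ item above, the following Python sketch saves a model to a model/ subdirectory with pickle and loads it back for inference. Everything specific here is an assumption made for the example: a plain dict with a GC-content threshold stands in for a trained estimator so the code runs without scikit-learn, and the file name `classifier.pkl` and the helper names are hypothetical. A real scikit-learn estimator can be passed through the same `pickle.dump` and `pickle.load` calls.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a pickled model; only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def classify(model, sequence):
    """Label a sequence by comparing its GC fraction to the stored threshold."""
    if not sequence:
        return model["low_label"]
    upper = sequence.upper()
    gc_fraction = (upper.count("G") + upper.count("C")) / len(upper)
    if gc_fraction >= model["threshold"]:
        return model["high_label"]
    return model["low_label"]


# Hypothetical "trained" model: a plain dict standing in for a real estimator.
toy_model = {"threshold": 0.5, "high_label": "gc_rich", "low_label": "at_rich"}

# A temporary directory is used here; in a repository this would be model/.
model_dir = Path(tempfile.mkdtemp()) / "model"
model_dir.mkdir()
model_path = model_dir / "classifier.pkl"

save_model(toy_model, model_path)
restored = load_model(model_path)
print(classify(restored, "GGCCAT"))  # GC fraction is 4/6, so this prints gc_rich
```

Keeping the training script next to the pickled file makes it possible to regenerate the model if the pickle cannot be loaded in a different environment.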

Part 2: Write Up

We are looking for a written document describing the project. The written document should be detailed and targeted towards a technically savvy layperson who has not used the project (e.g. one of your fellow biology students who is not taking this class). Here are some things we will be looking for:

  • Title page containing a descriptive title and student name(s)

  • Write up contains logical progression of sections with appropriate headers

  • High-level description that introduces the project and describes the motivation

  • Detailed but concise description of the data

  • Key technologies (e.g. Docker, Dash, Redis) are defined at a high level for people who might not know what they are

  • Usage section shows representative example code snippets - not necessarily exhaustive, but just enough

  • Citations page at the end

Part 3: Video Demo

Prepare a video demo of the application that is under 10 minutes long. We recommend using Zoom to share your screen and record your narration of the process. At a minimum, we want to see you (1) describe and show the deployment process of your project (e.g. if you are using Docker Compose, demonstrate the deployment process and describe what is going on), (2) talk about the purpose of the project and describe what scientific function it is meant to perform, and (3) demonstrate the usage of your project, being sure to highlight anything you think is interesting or unique about your application.

What to Turn In

Please send the instructors an email with a link to the repository, attach the write up as a PDF, and either attach the video or provide a Zoom or Box download link to it. If working in a group of two, only one person needs to send the email, but be sure to mention both group members' names.

Note on Using AI

The use of AI to complete this assignment is not recommended, but it is permitted with the following restrictions:

The use of LLMs (like ChatGPT, Copilot, etc.) or any other AI must be rigorously cited. Any code blocks or text that are generated by an AI model should be clearly marked as such with in-code comments describing what was generated, how it was generated, and why you chose to use AI in that instance. The project README must also contain a section that summarizes where AI was used in the assignment.
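One possible way to mark AI-generated code with in-code comments is sketched below in Python. The BEGIN/END marker style and the what/how/why layout are assumptions made for illustration, not a format prescribed by the instructors, and the angle-bracket entries are placeholders to be filled in with the actual tool, prompt, and reason.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    # --- BEGIN AI-GENERATED CODE (hypothetical marker format) ---
    # What: the body of this function (counting G and C bases, then dividing).
    # How:  <name of the AI tool and the prompt that was given to it>
    # Why:  <reason for choosing to use AI for this piece>
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
    # --- END AI-GENERATED CODE ---


print(gc_content("ATGC"))  # 0.5
```

Whatever marker style is used, applying it consistently makes it easier to write the README section that summarizes where AI was used.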

Additional Resources