Final
Due Date: Saturday, May 2 by 8:00am CDT
Overall Objective
The objective of the final project is to complete the work you began on your midterm project. It will exercise many of the concepts used during the semester. For example, consider the DNA Sequence Classifier dashboard we showed in class:
Sample dashboard
This dashboard brings together many topics we covered in this class:
Backend code written in Python using best practices
Uses an ML model trained with scikit-learn
Front end dashboard built with Dash
Front end dashboard supports some graphing / visualization
Sequence data stored in Redis database
Dashboard makes API calls to database to exchange data
Dashboard, database, and ML code all containerized
Containers orchestrated together using docker compose
Deployed to a cloud machine and public on the web
All code stored in git repo on GitHub
Continuous integration between GitHub and Docker Hub
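As a rough illustration of the "dashboard exchanges data with the database" item above, the sketch below stores and fetches one sequence record through a Redis client. The function names, the `seq:` key prefix, and the `redis-db` host name are assumptions made for this example and are not taken from the class dashboard. The third-party `redis` package is only imported when `make_client` is called.

```python
import json


def make_client(host="redis-db", port=6379):
    """Connect to Redis.

    The default host 'redis-db' is a hypothetical Docker Compose service
    name; use whatever name your own compose file defines.
    """
    import redis  # third-party package: pip install redis
    return redis.Redis(host=host, port=port, decode_responses=True)


def store_sequence(client, seq_id, sequence, label=None):
    """Save one sequence record as a JSON string under the key 'seq:<id>'."""
    record = {"sequence": sequence.upper(), "label": label}
    client.set(f"seq:{seq_id}", json.dumps(record))


def fetch_sequence(client, seq_id):
    """Return the stored record as a dict, or None if the key is missing."""
    raw = client.get(f"seq:{seq_id}")
    return json.loads(raw) if raw is not None else None
```

Because both helpers take the client as an argument, they can be exercised with any object that offers `set` and `get`, which makes them easy to check without a running database.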
Bringing all these concepts together resulted in a useful tool that is accessible to others and performs a scientific function that falls under the general description of research computing in biology.
Final Requirements
Part 1: Code Repository
We are looking for a complete, stand-alone repository that has all of the code, models, scripts, and supporting files necessary to deploy and run your project. Each project is a little bit different, but in general we would expect to find:
app.py: Assuming your project is a dashboard, this is where the majority of the code and logic will be
README.md: Use the README to introduce users to your project. Provide a high level description of the purpose - images and figures will help. Provide instructions to deploy the project, and instructions on using the actual web app
requirements.txt, Dockerfile, docker-compose.yml: We will be looking for files that aid in the deployment of your project, in line with the description provided in the README
model/: If your dashboard performs inference on user data, please provide the pickled model(s) and any supporting scripts that were used to train or facilitate inference on the model in a subdirectory
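For the model/ subdirectory, a minimal sketch of saving and loading a pickled model might look like the following. Everything here is hypothetical: a plain dict with a `gc_threshold` entry stands in for a trained model so the example runs with only the standard library, and the helper names are invented for illustration. A scikit-learn estimator can be passed to the same `save_model` and `load_model` helpers, since they use the standard `pickle` module.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Pickle a model object to disk, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled model. Only unpickle files that you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def predict(model, sequence):
    """Toy inference: compare the GC fraction against the stored threshold.

    'model' is the hypothetical stand-in dict, not a real classifier.
    """
    return "high_gc" if gc_fraction(sequence) >= model["gc_threshold"] else "low_gc"
```

In a repository, the training script would call `save_model` once, and `app.py` would call `load_model` at startup with a path such as `model/toy_model.pkl` (an assumed file name).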
Part 2: Write Up
We are looking for a written document describing the project. The written document should be verbose and targeted towards a non-user, but technically savvy layperson (e.g. one of your fellow biology students who is not taking this class). Here are some things we will be looking for:
Title page. Contains descriptive title and student name(s)
Write up contains logical progression of sections with appropriate headers
High level description with introduction to the project, describes the motivation
Detailed but concise description of the data
Key technologies (e.g. Docker, Dash, Redis) are defined at a high level for people who might not know what they are
Usage section shows representative example code snippets - not necessarily exhaustive, but just enough
Citations page at the end
Part 3: Video Demo
Prepare a video demo of the application that is less than 10 minutes long. We recommend using Zoom to share your screen and record your narration of the process. At a minimum, we want to see you (1) describe and show the deployment process of your project (e.g. if you are using Docker Compose, demonstrate the deployment process and describe what is going on), (2) talk about the purpose of the project and describe what scientific function it is meant to perform, and (3) demonstrate the usage of your project, being sure to highlight anything you think is interesting or unique about your application.
What to Turn In
Please email the instructors a link to the repository, attach the write up as a PDF, and attach or provide a Zoom or Box download link to the video. If working in a group of two, only one person needs to send that email, but be sure to mention both group members' names.
Note on Using AI
The use of AI to complete this assignment is not recommended, but it is permitted with the following restrictions:
The use of LLMs (like ChatGPT, Copilot, etc.) or any other AI must be rigorously cited. Any code blocks or text that are generated by an AI model should be clearly marked as such with in-code comments describing what was generated, how it was generated, and why you chose to use AI in that instance. The homework README must also contain a section that summarizes where AI was used in the assignment.
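One possible way to mark AI-assisted code with in-code comments is sketched below. The comment layout, the placeholder tool name, and the example prompt are all hypothetical. No particular format is prescribed here, so treat this only as an illustration of the what / how / why information requested above.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    # --- BEGIN AI-GENERATED CODE ---
    # What: the body of this function (complement lookup and reversal).
    # How:  generated by <name of AI tool> from the prompt "write a Python
    #       function that returns the reverse complement of a DNA string".
    # Why:  used to get a quick first draft; checked by hand against
    #       short test sequences before use.
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(sequence.upper()))
    # --- END AI-GENERATED CODE ---
```

The same markers would then be listed in the README section that summarizes where AI was used.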
Additional Resources
Sequence Classifier (if link is not live, ask the instructors)