Build an ML Pipeline for Short-term Rental Prices in NYC

Project Description | Install | Data | Train model | Run sanity checks | Run tests | CI/CD | Dockerize | Request API | Model Card | Code Quality

Project Description

Apply the skills acquired in this course to develop a classification model on publicly available Census Bureau data. You will create unit tests to monitor the model performance on various data slices. Then, you will deploy your model using the FastAPI package and create API tests. The slice validation and the API tests will be incorporated into a CI/CD framework using GitHub Actions.

Source code: vnk8071/deploy_ml_pipeline_in_production

tree projects/deploy_ml_pipeline_in_production -I __pycache__

projects/deploy_ml_pipeline_in_production
├── EDA.ipynb
├── README.md
├── data
│   ├── census.csv
│   ├── census.csv.dvc
│   ├── census_clean.csv
│   └── census_clean.csv.dvc
├── images
│   ├── continuous_deployment.png
│   ├── continuous_integration.png
│   ├── live_get.png
│   ├── live_post.png
│   ├── local_post.png
│   └── settings_continuous_deployment.png
├── inference.py
├── main.py
├── model
│   └── model.pkl
├── model_card.md
├── module
│   ├── data.py
│   ├── model.py
│   └── train_model.py
├── requirements.txt
├── sanitycheck.py
├── slice_output.txt
└── tests
    ├── test_api.py
    └── test_model.py

6 directories, 24 files
#    Feature                  Stack
0    Language                 Python
1    Clean code principles    Autopep8, Pylint
2    Testing                  Pytest
3    Logging                  Logging
4    Data versioning          DVC
5    Model versioning         DVC
6    Configuration            Hydra
7    Development API          FastAPI
8    Dockerize                Docker
9    Cloud computing          Render
10   CI/CD                    Github Actions

Install

pip install -r requirements.txt

Hydra

@hydra.main(config_path=".", config_name="config", version_base="1.2")
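This decorator points Hydra at a config.yaml next to the script. Below is a minimal sketch of how the training entry point might consume it; the config keys (data.path, model.max_iter, model.random_state) are illustrative assumptions, not the project's actual schema.

# Minimal Hydra entry-point sketch; the config keys below are assumptions.
import hydra
from omegaconf import DictConfig


@hydra.main(config_path=".", config_name="config", version_base="1.2")
def main(cfg: DictConfig) -> None:
    # Hydra loads ./config.yaml and passes it in as a DictConfig.
    print(f"Reading data from {cfg.data.path}")
    print(f"max_iter={cfg.model.max_iter}, random_state={cfg.model.random_state}")


if __name__ == "__main__":
    main()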

Data

1. Download data

data/census.csv

Link: https://archive.ics.uci.edu/ml/datasets/census+income

2. EDA

EDA in the Jupyter notebook: EDA.ipynb
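The versioned data/census_clean.csv is presumably produced from the raw file during EDA. A hedged sketch of such a cleaning step follows; stripping the whitespace padding found in the raw UCI file and dropping duplicates are assumptions about what the project actually does.

# Hypothetical cleaning step that produces data/census_clean.csv; the exact
# rules used in this project may differ.
import pandas as pd

df = pd.read_csv("data/census.csv")

# The raw UCI file pads column names and string values with spaces.
df.columns = df.columns.str.strip()
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.strip()

df = df.drop_duplicates()
df.to_csv("data/census_clean.csv", index=False)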

3. Data versioning

dvc init
mkdir ../local_remote
dvc remote add -d localremote ../local_remote
dvc add data/census.csv
dvc add data/census_clean.csv
git add data/.gitignore data/census.csv.dvc data/census_clean.csv.dvc
git commit -m "Add data"
dvc push

Train model

python module/train_model.py

Result

2023-08-25 20:32:56,405 - INFO - Splitting data into train and test sets...
2023-08-25 20:32:56,412 - INFO - Processing data...
2023-08-25 20:32:56,634 - INFO - Training model...
2023-08-25 20:32:57,052 - INFO - LogisticRegression(max_iter=1000, random_state=8071)
2023-08-25 20:32:57,058 - INFO - Saving model...
2023-08-25 20:32:57,059 - INFO - Model saved.
2023-08-25 20:32:57,059 - INFO - Inference model...
2023-08-25 20:32:57,060 - INFO - Calculating model metrics...
2023-08-25 20:32:57,074 - INFO - >>>Precision: 0.6551724137931034
2023-08-25 20:32:57,074 - INFO - >>>Recall: 0.24934383202099739
2023-08-25 20:32:57,075 - INFO - >>>Fbeta: 0.36121673003802285
2023-08-25 20:32:57,075 - INFO - Calculating model metrics on slices data...
2023-08-25 20:32:58,281 - INFO - >>>Metrics with slices data:
feature ... category
0 workclass ... Private
1 workclass ... ?
2 workclass ... Federal-gov
3 workclass ... Self-emp-not-inc
4 workclass ... State-gov
.. ... ... ...
96 native-country ... Nicaragua
97 native-country ... Scotland
98 native-country ... Outlying-US(Guam-USVI-etc)
99 native-country ... Ireland
100 native-country ... Hungary

[101 rows x 5 columns]
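The overall and per-slice numbers above come from standard precision/recall/F-beta computations over the test set and over each category of each categorical feature. A hedged sketch of how such slice metrics can be computed is shown below; the function and column names are illustrative, while the project's own implementation lives in module/model.py.

# Illustrative slice-metrics computation; names are assumptions, not the
# project's actual API. y_true and y_pred are NumPy arrays aligned with df.
import pandas as pd
from sklearn.metrics import fbeta_score, precision_score, recall_score


def compute_model_metrics(y_true, y_pred):
    """Precision, recall and F-beta (beta=1), guarding against empty slices."""
    precision = precision_score(y_true, y_pred, zero_division=1)
    recall = recall_score(y_true, y_pred, zero_division=1)
    fbeta = fbeta_score(y_true, y_pred, beta=1, zero_division=1)
    return precision, recall, fbeta


def metrics_on_slices(df, y_true, y_pred, feature):
    """Compute metrics separately for each category of one categorical feature."""
    rows = []
    for category in df[feature].unique():
        mask = (df[feature] == category).to_numpy()
        precision, recall, fbeta = compute_model_metrics(y_true[mask], y_pred[mask])
        rows.append({"feature": feature, "category": category,
                     "precision": precision, "recall": recall, "fbeta": fbeta})
    return pd.DataFrame(rows)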

Run sanity checks

python sanitycheck.py

Result

============= Sanity Check Report ===========
2023-08-24 23:16:57,951 - INFO - Your test cases look good!
2023-08-24 23:16:57,951 - INFO - This is a heuristic based sanity testing and cannot guarantee the correctness of your code.
2023-08-24 23:16:57,951 - INFO - You should still check your work against the rubric to ensure you meet the criteria.

Run tests

pytest tests/

Result

tests/test_api.py ....                                                         [ 33%]
tests/test_model.py ........ [100%]
=========================== 12 passed, 4 warnings in 3.65s ===========================
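For reference, a sketch of what a test in tests/test_api.py might look like; the route paths and payload fields are assumptions, and only the use of FastAPI's TestClient against the app in main.py is implied by the repository layout.

# Hypothetical API test; endpoint paths and payload fields are assumptions.
from fastapi.testclient import TestClient

from main import app

client = TestClient(app)


def test_get_root():
    response = client.get("/")
    assert response.status_code == 200


def test_post_inference():
    payload = {
        "age": 39,
        "workclass": "State-gov",
        "education": "Bachelors",
        "hours_per_week": 40,
        "native_country": "United-States",
    }
    response = client.post("/inference", json=payload)
    assert response.status_code == 200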

Dockerize

docker build -t deploy_ml_pipeline_in_production .
docker run -p 5000:5000 deploy_ml_pipeline_in_production

CI/CD

1. Github Actions

Continuous integration runs on GitHub Actions (see images/continuous_integration.png).

2. CD with Render

Settings for continuous deployment on Render (see images/settings_continuous_deployment.png).

Deployed app (see images/continuous_deployment.png).

Request API

1. Local

uvicorn main:app --reload

Result (see images/local_post.png).
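A minimal sketch of the kind of FastAPI app being served here; the real app is in main.py, and the route names and input schema below are assumptions for illustration.

# Hypothetical FastAPI app sketch; routes and fields are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CensusRecord(BaseModel):
    # A few illustrative fields from the census data; the real schema has more.
    age: int
    workclass: str
    education: str
    hours_per_week: int
    native_country: str


@app.get("/")
def root():
    return {"message": "Census income classifier"}


@app.post("/inference")
def inference(record: CensusRecord):
    # The real endpoint would load model/model.pkl, apply the training-time
    # preprocessing, and return the model's prediction for this record.
    return {"prediction": "placeholder"}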

2. Render

Check the API GET method at https://vnk8071-api-deployment.onrender.com/docs (see images/live_get.png).

Script to send a POST request to the API:

python inference.py

Result (see images/live_post.png).
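A hedged sketch of what such a POST request script could look like; the /inference path and the payload fields are assumptions.

# Hypothetical POST request to the deployed API; path and fields are assumptions.
import requests

url = "https://vnk8071-api-deployment.onrender.com/inference"
payload = {
    "age": 39,
    "workclass": "State-gov",
    "education": "Bachelors",
    "hours_per_week": 40,
    "native_country": "United-States",
}

response = requests.post(url, json=payload, timeout=30)
print(response.status_code)
print(response.json())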

Model Card

Details in projects/deploy_ml_pipeline_in_production/model_card.md

Code Quality

Style Guide - Format the code to follow the PEP 8 style guide. To help meet the guidelines, run autopep8 from the command line:

autopep8 --in-place --aggressive --aggressive .

Style Checking and Error Spotting - Use Pylint to analyze the code for programming errors and opportunities for further refactoring. Check the pylint score with the command below:

pylint -rn -sn .

Docstring - All functions should have docstrings that correctly identify their inputs, outputs, and purpose. Every file should have a docstring stating the purpose of the file, the author, and the date it was created.
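An illustrative example of that docstring style; the module, function, and placeholder values below are hypothetical, not the project's actual files.

# Illustrative docstring style only; names and dates are placeholders.
"""Train the census income classification model.

Author: <author name>
Date: <creation date>
"""


def compute_model_metrics(y_true, y_pred):
    """Compute evaluation metrics for the trained model.

    Inputs:
        y_true: ground-truth binary labels.
        y_pred: predicted binary labels.
    Outputs:
        precision, recall, fbeta as floats.
    """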