AutoPrep

AutoPrep screenshot

AutoPrep is a Python package for automated data preprocessing and analysis that generates comprehensive LaTeX reports. It handles common preprocessing tasks, creates visualizations, and documents the entire process in a professional PDF report. It focuses on tabular data and supports numerous explainable AI models. With an emphasis on interpretability and ease of use, it includes a subsection for each model that explains its strengths and weaknesses and provides usage examples.
The pipeline automatically detects the task type (binary classification, multiclass classification, or regression), generates an array of possible preprocessing pipelines, scores them, trains models, tunes hyperparameters, and generates a well-structured report.
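
The task-type detection step could look roughly like the sketch below. This is only an illustration, not AutoPrep's actual implementation: the function name `infer_task_type`, the `max_classes` threshold, and the heuristic itself are assumptions made for this example.

```python
from collections.abc import Sequence
from numbers import Real


def infer_task_type(target: Sequence, max_classes: int = 20) -> str:
    """Guess the ML task from a target column (hypothetical heuristic).

    `max_classes` is an assumed cut-off: an integer-valued target with more
    distinct values than this is treated as a regression target.
    """
    values = [v for v in target if v is not None]
    unique = set(values)
    if len(unique) < 2:
        raise ValueError("target needs at least two distinct values")

    # bool is a subclass of int, so exclude it from the numeric check
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    if numeric:
        has_fraction = any(not float(v).is_integer() for v in unique)
        if has_fraction or len(unique) > max_classes:
            return "regression"

    if len(unique) == 2:
        return "binary classification"
    return "multiclass classification"


if __name__ == "__main__":
    print(infer_task_type([0, 1, 1, 0]))
    print(infer_task_type(["cat", "dog", "bird"]))
    print(infer_task_type([0.5, 1.2, 3.3]))
```

A real pipeline would likely also inspect column dtypes and let the user override the guess, since a heuristic like this can misclassify ordinal or count targets.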

Technologies: Python, poetry, Pandas, NumPy, scikit-learn, seaborn, matplotlib

Co-authors: Kruk Julia, Pozorski Paweł, Rogalska Katarzyna

Repository
Documentation

Hyperparameter Tunability


This project focused on reproducing results from the publication:

Philipp Probst, Anne-Laure Boulesteix, and Bernd Bischl. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. Journal of Machine Learning Research, 2019.

The experiments were conducted on five datasets from OpenML with three models (XGBoost, logistic regression, and a k-nearest neighbours classifier). Two research questions were addressed:

  1. Do the AUC scores differ significantly between models optimized with Random Search and Bayes Search?
  2. Does Random Search converge significantly faster than Bayes Search?
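
A question like the first one can be checked with a paired significance test on per-dataset AUC scores. The sketch below is not the project's analysis: the page lists scipy.stats but does not name the test used, so this example substitutes an exact paired sign-flip permutation test written with the standard library only. The function name `paired_sign_flip_test` and the AUC values are illustrative placeholders, not results from the project.

```python
from itertools import product


def paired_sign_flip_test(a, b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Returns (mean difference a - b, p-value). Assumes the pairs are
    exchangeable under the null hypothesis of no difference.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty samples of equal length")

    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    # Under H0 each difference is equally likely to have either sign,
    # so enumerate all 2**n sign assignments (fine for small n).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return sum(diffs) / n, extreme / total


if __name__ == "__main__":
    # Placeholder AUC scores for five datasets (NOT the project's results)
    random_search_auc = [0.81, 0.77, 0.90, 0.68, 0.85]
    bayes_search_auc = [0.82, 0.79, 0.91, 0.70, 0.86]
    mean_diff, p_value = paired_sign_flip_test(random_search_auc, bayes_search_auc)
    print(f"mean AUC difference: {mean_diff:.3f}, p-value: {p_value:.4f}")
```

With only five paired observations the smallest achievable two-sided p-value is 2/32, so a test at the 0.05 level cannot reject on a single model; pooling across models or using more datasets would be needed for a stronger conclusion.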

Technologies: Python, Pandas, NumPy, scikit-learn, seaborn, matplotlib, scipy.stats

Co-author: Kruk Julia

Other projects

Other projects include implementations in

Explore my projects