Python Data Science Digest June 2021
A list of the most popular Python data science posts from the last month
A list of the most popular posts featured on "Python Posts you might have missed!" in May 2021: the most exciting Python resources in deep learning, machine learning, visualisation and data analysis.
Featured posts
CS 229 - Supervised Learning Cheatsheet by Afshine Amidi and Shervine Amidi
Why this is useful: Part of the CS229 class at Stanford, this set of cheatsheets covers important technical aspects of machine learning and makes a useful refresher for students and practitioners alike. Highly recommended!
Fairlearn Quickstart — A Python package to assess and improve fairness of machine learning models by Microsoft Corporation and Fairlearn contributors.
Why this is useful: From the authors: 'Fairlearn is an open-source, community-driven project to help data scientists improve fairness of AI systems.' We all know how important fairness is for decisions affecting people. Fairlearn is a toolbox for measuring and improving fairness in machine learning models, which includes a scikit-learn style API for modelling under parity constraints and lots more.
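To make the idea of a parity constraint concrete, here is a minimal sketch in plain Python of one common fairness measurement: the gap in positive-prediction rates between groups (often called demographic parity). It does not use Fairlearn, and the function names and toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Return the fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(y_pred, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Toy data (made up): model predictions and a sensitive attribute per row.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(y_pred, groups)
gap = demographic_parity_gap(y_pred, groups)
print(rates, gap)
```

A gap of 0 would mean every group receives positive predictions at the same rate. Modelling "under parity constraints" roughly means training while keeping a gap like this one below a chosen threshold.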
Navigating the MLOps tooling landscape (Part 1: The Lifecycle) by Lj Miranda.
Why this is useful: MLOps is the booming and increasingly crowded field of managing the full ML lifecycle in production systems (training, orchestration, hosting, monitoring, refitting, etc.). The software ecosystem is exploding (as is the terminology), and this post (the first in a three-part series) describes the ML lifecycle as a starting point for understanding MLOps.
Machine learning
AlphaPy • Automated Machine Learning (AutoML) with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost by Robert Scott.
homemade-machine-learning • Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained by Oleksii Trekhleb.
MLAlgorithms • Minimal and clean examples of machine learning algorithms implementations by Artem Golubin.
Introducing TensorFlow Decision Forests — The TensorFlow Blog by Mathieu Guillaume-Bert, Sebastian Bruch, Josh Gordon and Jan Pfeifer.
sklearn-evaluation • Plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis by Eduardo Blancas.
Bayesian optimization for hyperparameter tuning by Stathis Kamperis.
PyGAD: an open-source Python library for building the genetic algorithm and optimizing machine learning algorithms by PyGAD developers.
Natural Language Processing (NLP)
GPT-v2: Language Models are Unsupervised Multitask Learners by Sebastian Raschka.
How To Build a GPT-3 Chatbot with Python by John Mannelly.
Using & Mixing Hugging Face Models with Gradio 2.0 by Abubakar Abid.
Deep learning
intro-to-deep-learning • A collection of materials to help you learn about deep learning by Tyler Bettilyon.
Deep Learning (NYU CENTER FOR DATA SCIENCE) by Alfredo Canziani and Yann LeCun.
Learn TensorFlow and Deep Learning fundamentals with Python (code-first introduction) by Daniel Bourke.
reprodl2021 • Host repository for the Reproducible Deep Learning PhD course by Simone Scardapane.
Learning Python
What’s Good? An Intro into Python and Soccer Data with McKay Johns by CJ Mayes and McKay Johns.
Scikit-Learn Course - Machine Learning in Python Tutorial by freeCodeCamp.
python-training • Python training for business analysts and traders by J.P. Morgan Chase.
Statistics
Statistics in Python — Scipy lecture notes by Gael Varoquaux.
A visual explanation for regularization of linear models by Terence Parr.
Bioinformatics and other bits - Viewing the THOR dataset with Bokeh and Panel by Damien Farrell.
Optimizing k-Means in NumPy & SciPy by Nicholas Vadivelu.
K-Means clustering and similarity visualization of constitutions by Meher Béjaoui.
A PyMC3 Analysis of Tyrannosaurid Growth Curves by Austin Rochford.
A Primer on Pólya-gamma Random Variables - Part II: Bayesian Logistic Regression by Louis Tiao.
Visualisation
scattertext • Beautiful visualizations of how language differs among document types by Jason Kessler.
#269: Holoviz - a Suite of Tools for Python Visualization by Talk Python Podcast.
Matplotlib vs. Seaborn - Data analysis and visualisation in Python by Carberra Tutorials.
brokenaxes • Create matplotlib plots with broken axes by Ben Dichter.
Clustergram: visualisation of cluster analysis by Martin Fleischmann.
Finance and time series
dx • DX Analytics • Financial and Derivatives Analytics with Python by Yves Hilpisch.
Prophet: a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts by Facebook Open Source.
4 different approaches for Time Series Analysis: A ready-to-run Python code including different strategies and libraries for Time Series Analysis by Angelica Lo Duca.
Time Series Forecast Using SARIMA - Step-By-Step Process that Explains How it’s Done! by Rishabh Sharma.
Ethereum Price Prediction with Python: A short guide to time series forecasting using the Prophet library by Benedict Neo.
Time Series and Forecasting with Python code examples (II) by Jose Jorge.
Utilities
A Complete Machine Learning Project From Scratch: Setting Up by Mihail Eric.
Jupyter Book: an open source project for building beautiful, publication-quality books and documents from computational material by The Jupyter Book Community.
best-of-streamlit • A ranked gallery of awesome streamlit apps built by the community by Johannes Rieke.
PyInstaller - How to Turn Your Python Code into an Exe on Windows - Mouse Vs Python by Mike Driscoll.
Simple Multiprocessing In Python: Comparing core vs libraries by Samuel Hinton.
azure-python-labs • Labs demonstrating how to use Python with Azure, Visual Studio Code, GitHub, Windows Subsystem for Linux, and more! by Microsoft Azure.
Data manipulation
Anaconda • Why Data Preparation Should Never Be Fully Automated by Team Anaconda.
diffgram • Data Annotation, Data Labeling, Annotation Tooling, Training Data for Machine Learning by Diffgram.
Cookbook — repository for short and sweet examples and links for useful pandas recipes by The Pandas Development Team.