📊 Python Data Science Digest August 2021
A list of the most popular posts featured on 'Python Posts you might have missed!' in July 2021.
Kats: One stop shop for time series analysis in Python by Facebook Research.
A Swiss army knife for time series analysis! Time series analysis can be a bit of a specialised niche at times. Kats brings together some of the most important parts of the time series analyst’s toolbox into a single user-friendly package. It includes support for modelling, feature extraction, outlier / changepoint detection and a lot more.
Clean up your pandas data frames with a single API. From the authors: “We take a base data file as the starting point, and perform actions on it, such as removing null/empty rows, replacing them with other values, adding/renaming/removing columns of data, filtering rows and others. More formally, these steps along with their relationships and dependencies are commonly referred to as a Directed Acyclic Graph (DAG).”
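To make the kind of steps the authors describe concrete, here is a minimal sketch in plain pandas. It does not use the featured library's API; the helper functions, column names and sample data are all hypothetical, and the steps form a simple linear chain, which is the most basic case of a DAG.

```python
import pandas as pd

# Hypothetical cleaning steps; each takes a DataFrame and returns a new one.
def drop_empty_rows(df):
    # Remove rows where every column is null.
    return df.dropna(how="all")

def fill_missing_scores(df, value=0.0):
    # Replace remaining nulls in the (assumed) "score" column.
    return df.fillna({"score": value})

def rename_columns(df):
    # Give the (assumed) "nm" column a clearer name.
    return df.rename(columns={"nm": "name"})

def keep_non_negative_scores(df):
    # Filter out rows with a negative score.
    return df[df["score"] >= 0]

# Made-up sample data standing in for the "base data file".
raw = pd.DataFrame({
    "nm": ["ada", None, "bob", "cy"],
    "score": [3.0, None, None, -1.0],
})

# Each step depends only on the output of the previous one.
clean = (
    raw.pipe(drop_empty_rows)
       .pipe(fill_missing_scores)
       .pipe(rename_columns)
       .pipe(keep_non_negative_scores)
       .reset_index(drop=True)
)
print(clean)
```

Writing each step as a small function passed to `DataFrame.pipe` keeps the order and dependencies of the steps explicit, which is the property the DAG framing is meant to capture.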
Build intelligent search interfaces to your documents. Haystack lets you use bleeding-edge NLP to search and retrieve from your documents and build simple search UIs. It includes tools to process, store, read and answer natural language questions. (I’m personally excited about this one, for the potential to index and search a corpus of data science news!)
Time series analysis
Multi-step Time Series Forecasting with ARIMA, LightGBM, and Prophet by Tomonori Masui.
Style sheets reference — Matplotlib by Matplotlib.
Palmer Penguins exploration with violinplots in Matplotlib by Tuo Wang, Tomas Capretto and Yan Holtz.
Python for teachers
Python for A Level Mathematics and Beyond by Dr Stephen Lynch.
Implementing Self-Organizing Maps with Python and TensorFlow by Nikola M. Zivkovic.
[Mastering XGBoost. Hyper-parameter Tuning: Optimization](https://towardsdatascience.com/mastering-xgboost-2eb6bce6bc76) by Eric Luellen.
Permutation Importance with Multicollinear or Correlated Features by scikit-learn developers.
Beyond Grid Search: Hypercharge Hyperparameter Tuning for XGBoost by Druce Vertes.
Machine learning tutorials
Machine Learning from Scratch - Python Tutorials by Patrick Loeber.
All Python Libraries You Need For Machine Learning And Data Science by Patrick Loeber.
Scikit-learn Crash Course - Machine Learning Library for Python by Vincent D. Warmerdam.
Modeling Pipeline Optimization With scikit-learn by Machine Learning Mastery.
tuplex • Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code by Leonhard Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska.
Bayesian Factor Analysis Regression in Python with PyMC3 by Austin Rochford.
Comparing different clustering algorithms on toy datasets by scikit-learn developers.
Finally! Bayesian Hierarchical Modelling at Scale - Florian Wilhelm by Florian Wilhelm.
Data manipulation and utilities
Measuring the memory usage of a Pandas DataFrame by Itamar Turner-Trauring.
Introduction to Data Analysis Using Pandas by Stefanie Molin.
Ploomber • On writing clean Jupyter notebooks by Eduardo Blancas.
Natural language processing
stanza • Official Stanford NLP Python Library for Many Human Languages by Stanford NLP Group.
Understanding Gradient Descent with Python by Nikola M. Zivkovic.
Building a simple expected pass completion (xP) model using Keras by Paul Minogue.
Guide to Reinforcement Learning with Python and TensorFlow by Nikola M. Zivkovic.
torchdyn • A PyTorch based library for all things neural differential equations by DiffEqML Research Group.
Node Classification with Graph Neural Networks by Khalid Salama.