📊 Python Data Science Digest August 2021
A list of the most popular posts featured on 'Python Posts you might have missed!' in July 2021.
Featured posts

Kats: One stop shop for time series analysis in Python by Facebook Research.
A Swiss army knife for time series analysis! Time series analysis can be a bit of a specialised niche at times. Kats brings together some of the most important parts of the time series analyst’s toolbox into a single user-friendly package. It includes support for modelling, feature extraction, outlier / changepoint detection and a lot more.
pyjanitor • A Python implementation of the #rstats package janitor that provides a clean API for cleaning data, by PyJanitor devs.
Clean up your pandas data frames with a single API. From the authors: “We take a base data file as the starting point, and perform actions on it, such as removing null/empty rows, replacing them with other values, adding/renaming/removing columns of data, filtering rows and others. More formally, these steps along with their relationships and dependencies are commonly referred to as a Directed Acyclic Graph (DAG).”
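To make the quoted chain of steps concrete, here is a minimal sketch using plain pandas method chaining. It is not pyjanitor's own API, and the data frame, column names and threshold are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for this example only.
raw = pd.DataFrame(
    {
        "Sales Month": ["Jan", "Feb", None, "Apr"],
        "Company1": [150.0, 200.0, None, 400.0],
        "Company2": [180.0, 250.0, None, 500.0],
    }
)

cleaned = (
    raw
    # Remove rows in which every value is missing.
    .dropna(how="all")
    # Rename columns to tidy, consistent names.
    .rename(
        columns={
            "Sales Month": "month",
            "Company1": "company_a",
            "Company2": "company_b",
        }
    )
    # Add a derived column.
    .assign(total=lambda d: d["company_a"] + d["company_b"])
    # Filter rows on the derived column (the threshold is arbitrary).
    .query("total > 400")
    .reset_index(drop=True)
)

print(cleaned)
```

Each step returns a new data frame, so the pipeline reads top to bottom as a sequence of actions on the base data, which is the style the pyjanitor authors describe.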
Build intelligent search interfaces to your documents. Haystack lets you use bleeding-edge NLP to search and retrieve from your documents and build simple search UIs. It includes tools to process, store, read and answer natural language questions. (I’m personally excited about this one, for the potential to index and search a corpus of data science news!)
Time Series Analysis
Time series Forecasting in Python & R, Part 1 (EDA) • Time series forecasting using various forecasting methods in Python & R in one notebook by Sandeep Pawar.
Multi-step Time Series Forecasting with ARIMA, LightGBM, and Prophet by Tomonori Masui.
Stock-market-forecasting • Forecasting directional movements of stock prices for intraday trading using LSTM and random forest by Pushpendu Ghosh.
Visualization
Data visualization and data analysis in Python (OkCupid dataset) by Amy Birdee.
How to Plot with Python: 8 Popular Graphs Made with pandas, matplotlib, seaborn, and plotly.express by Dylan Castillo.
Style sheets reference — Matplotlib by Matplotlib.
Palmer Penguins exploration with violinplots in Matplotlib by Tuo Wang, Tomas Capretto and Yan Holtz.
panel-highcharts • The panel-highcharts package makes it really easy to use HighCharts in Python, Notebooks and with HoloViz Panel by Marc Skov Madsen.
Python for teachers
Python for A Level Mathematics and Beyond by Dr Stephen Lynch.
Machine learning
Implementing Self-Organizing Maps with Python and TensorFlow by Nikola M. Zivkovic.
[Mastering XGBoost. Hyper-parameter Tuning & Optimization](https://towardsdatascience.com/mastering-xgboost-2eb6bce6bc76) by Eric Luellen.
Permutation Importance with Multicollinear or Correlated Features by scikit-learn developers.
Beyond Grid Search: Hypercharge Hyperparameter Tuning for XGBoost by Druce Vertes.
Machine learning tutorials
Machine Learning from Scratch - Python Tutorials by Patrick Loeber.
All Python Libraries You Need For Machine Learning And Data Science by Patrick Loeber.
Scikit-learn Crash Course - Machine Learning Library for Python by Vincent D. Warmerdam.
homemade-machine-learning • 🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained by Oleksii Trekhleb.
Welcome to Reproducible Data Science — Reproducible Data Science + Python + Real-World Data by Valentin Danchev.
MLOps
evidently • Interactive reports to analyze machine learning models during validation or production monitoring by Evidently AI.
client • A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API by Weights & Biases.
Modeling Pipeline Optimization With scikit-learn by Machine Learning Mastery.
Continuous integration for data science with pytest, Github Actions, and Hypervector • Hypervector Blog by Hypervector.
tuplex • Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code by Leonhard Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska.
Statistical modelling
probability • Probabilistic reasoning and statistical analysis in TensorFlow by TensorFlow.
Bayesian Factor Analysis Regression in Python with PyMC3 by Austin Rochford.
Comparing different clustering algorithms on toy datasets by scikit-learn developers.
Multiple Linear Regression and Visualization in Python • Pythonic Excursions by Eric Kim.
Generalized Linear Mixed Effects Models in R and Python with GPBoost • An introduction and comparison with ‘lme4’ and ‘statsmodels’ by Fabio Sigrist.
all-of-statistics • Self-study on Larry Wasserman’s All of Statistics by Telmo Correa.
Finally! Bayesian Hierarchical Modelling at Scale by Florian Wilhelm.
Data manipulation and utilities
Measuring the memory usage of a Pandas DataFrame by Itamar Turner-Trauring.
Introduction to Data Analysis Using Pandas by Stefanie Molin.
Ploomber • On writing clean Jupyter notebooks by Eduardo Blancas.
JupyterLite — a JupyterLab distribution that runs entirely in the browser built from the ground-up using JupyterLab components and extensions by JupyterLite Contributors.
Natural language processing
How GPT3 Works - Visualizations and Animations by Jay Alammar.
stanza • Official Stanford NLP Python Library for Many Human Languages by Stanford NLP Group.
Deep learning
Understanding Gradient Descent with Python by Nikola M. Zivkovic.
Building a simple expected pass completion (xP) model using Keras by Paul Minogue.
Guide to Reinforcement Learning with Python and TensorFlow by Nikola M. Zivkovic.
torchdyn • A PyTorch based library for all things neural differential equations by DiffEqML Research Group.
Node Classification with Graph Neural Networks by Khalid Salama.