The Data Science Digest

📊 Python Data Science Digest August 2021

A list of the most popular posts featured on 'Python Posts you might have missed!' in July 2021.

Alastair Rushworth
Aug 22, 2021

Featured posts

From left: Haystack logo (Deepset AI), Kats project logo (Facebook Research), PyJanitor logo (PyJanitor devs).

Kats • One-stop shop for time series analysis in Python by Facebook Research.

  • A Swiss army knife for time series analysis! Time series analysis can be a specialised niche. Kats brings together some of the most important parts of the time series analyst’s toolbox in a single user-friendly package. It includes support for modelling, feature extraction, outlier and changepoint detection, and a lot more.
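To give a flavour of what outlier detection on a series involves, here is a minimal sketch in plain Python. It does not use Kats or reflect its API: it applies a generic modified z-score rule (median and median absolute deviation), and the function name, the threshold of 3.5 and the sample data are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds the threshold."""
    med = median(series)
    abs_dev = [abs(x - med) for x in series]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normal.
    return [i for i, dev in enumerate(abs_dev) if 0.6745 * dev / mad > threshold]


# Hypothetical daily counts with one obvious spike.
daily_visits = [102, 98, 101, 97, 103, 250, 99, 100, 96, 104]
outlier_idx = flag_outliers(daily_visits)
print(outlier_idx)
```

A median-based rule is used here because a single large spike barely moves the median, whereas it would inflate a mean and standard deviation.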

pyjanitor • A Python implementation of the #rstats package janitor that provides a clean API for cleaning data by PyJanitor devs.

  • Clean up your pandas data frames with a single API. From the authors: “We take a base data file as the starting point, and perform actions on it, such as removing null/empty rows, replacing them with other values, adding/renaming/removing columns of data, filtering rows and others. More formally, these steps along with their relationships and dependencies are commonly referred to as a Directed Acyclic Graph (DAG).”
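The idea of chaining cleaning steps can be sketched in plain Python. This is not pyjanitor's API and does not use pandas: rows are ordinary dictionaries, and every function name and the sample data below are hypothetical. A linear chain like this is the simplest case of the DAG the authors describe.

```python
from functools import reduce


def drop_empty_rows(rows):
    """Remove rows in which every value is missing."""
    return [r for r in rows if any(v not in (None, "") for v in r.values())]


def rename_column(old, new):
    """Build a step that renames one column in every row."""
    def step(rows):
        return [{(new if k == old else k): v for k, v in r.items()} for r in rows]
    return step


def filter_rows(predicate):
    """Build a step that keeps only the rows satisfying the predicate."""
    def step(rows):
        return [r for r in rows if predicate(r)]
    return step


def run_pipeline(rows, steps):
    """Apply each step in order, feeding its output to the next step."""
    return reduce(lambda data, step: step(data), steps, rows)


raw = [
    {"Name": "Ada", "score": 90},
    {"Name": None, "score": None},
    {"Name": "Grace", "score": 85},
]
cleaned = run_pipeline(
    raw,
    [
        drop_empty_rows,
        rename_column("Name", "name"),
        filter_rows(lambda r: r["score"] > 86),
    ],
)
print(cleaned)
```

Each step takes the data and returns a new version of it, so the whole cleaning recipe reads top to bottom in the order it runs.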

haystack • End-to-end Python framework for building natural language search interfaces to data. Leverages Transformers and the State-of-the-Art of NLP by Deepset AI.

  • Build intelligent search interfaces to your documents. Haystack lets you use bleeding-edge NLP to search and retrieve from your documents and build simple search UIs. It includes tools to process, store, read and answer natural language questions. (I’m personally excited about this one for its potential to index and search a corpus of data science news!)

Haystack document search architecture
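As a rough illustration of the retrieval step in a document search system, here is a minimal sketch in plain Python. It is not Haystack's API and involves no Transformer models: it ranks documents with a simple term-weighting score, and the function names, the scoring formula and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in text.split()]


def rank_documents(query, documents):
    """Return the documents that share terms with the query, best match first."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scored = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        # Terms that are rare across the collection contribute more to the score.
        score = sum(
            counts[term] * math.log(1 + n_docs / doc_freq[term])
            for term in tokenize(query)
            if term in counts
        )
        scored.append((score, idx))
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [documents[idx] for score, idx in ranked if score > 0]


docs = [
    "Kats is a toolkit for time series analysis.",
    "Haystack builds search interfaces over documents.",
    "pyjanitor cleans pandas data frames.",
]
results = rank_documents("search my documents", docs)
print(results)
```

A system like Haystack would typically follow a retrieval step of this kind with a model that reads the retrieved passages and extracts an answer, which this sketch leaves out.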

Time Series Analysis

  • Time series Forecasting in Python & R, Part 1 (EDA) • Time series forecasting using various forecasting methods in Python & R in one notebook by Sandeep Pawar.

  • Multi-step Time Series Forecasting with ARIMA, LightGBM, and Prophet by Tomonori Masui.

  • Stock-market-forecasting • Forecasting directional movements of stock prices for intraday trading using LSTM and random forest by Pushpendu Ghosh.

Visualization

  • Data visualization and data analysis in Python (OkCupid dataset) by Amy Birdee.

  • How to Plot with Python: 8 Popular Graphs Made with pandas, matplotlib, seaborn, and plotly.express by Dylan Castillo.

  • Style sheets reference — Matplotlib by Matplotlib.

  • Palmer Penguins exploration with violinplots in Matplotlib by Tuo Wang, Tomas Capretto and Yan Holtz.

  • panel-highcharts • The panel-highcharts package makes it really easy to use HighCharts in Python, Notebooks and with HoloViz Panel by Marc Skov Madsen.

Python for teachers

  • Python for A Level Mathematics and Beyond by Dr Stephen Lynch.

Machine learning

  • Implementing Self-Organizing Maps with Python and TensorFlow by Nikola M. Zivkovic.

  • Mastering XGBoost. Hyper-parameter Tuning: Optimization (https://towardsdatascience.com/mastering-xgboost-2eb6bce6bc76) by Eric Luellen.

  • Permutation Importance with Multicollinear or Correlated Features by scikit-learn developers.

  • Beyond Grid Search: Hypercharge Hyperparameter Tuning for XGBoost by Druce Vertes.

Machine learning tutorials

  • Machine Learning from Scratch - Python Tutorials by Patrick Loeber.

  • All Python Libraries You Need For Machine Learning And Data Science by Patrick Loeber.

  • Scikit-learn Crash Course - Machine Learning Library for Python by Vincent D. Warmerdam.

  • ML-YouTube-Courses • A repository to index and organize the latest machine learning courses found on YouTube by dair.ai.

  • homemade-machine-learning • 🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained by Oleksii Trekhleb.

  • Welcome to Reproducible Data Science — Reproducible Data Science + Python + Real-World Data by Valentin Danchev.

MLOps

  • evidently • Interactive reports to analyze machine learning models during validation or production monitoring by Evidently AI.

  • client • A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API by Weights & Biases.

  • Modeling Pipeline Optimization With scikit-learn by Machine Learning Mastery.

  • Continuous integration for data science with pytest, Github Actions, and Hypervector • Hypervector Blog by Hypervector.

  • tuplex • Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code by Leonhard Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf and Tim Kraska.

Statistical modelling

  • probability • Probabilistic reasoning and statistical analysis in TensorFlow by TensorFlow.

  • Bayesian Factor Analysis Regression in Python with PyMC3 by Austin Rochford.

  • Comparing different clustering algorithms on toy datasets by scikit-learn developers.

  • Multiple Linear Regression and Visualization in Python • Pythonic Excursions by Eric Kim.

  • Generalized Linear Mixed Effects Models in R and Python with GPBoost • An introduction and comparison with ‘lme4’ and ‘statsmodels’ by Fabio Sigrist.

  • all-of-statistics • Self-study on Larry Wasserman’s All of Statistics by Telmo Correa.

  • Finally! Bayesian Hierarchical Modelling at Scale - Florian Wilhelm by Florian Wilhelm.

Data manipulation and utilities

  • Measuring the memory usage of a Pandas DataFrame by Itamar Turner-Trauring.

  • Introduction to Data Analysis Using Pandas by Stefanie Molin.

  • Ploomber • On writing clean Jupyter notebooks by Eduardo Blancas.

  • JupyterLite • A JupyterLab distribution that runs entirely in the browser, built from the ground up using JupyterLab components and extensions by JupyterLite Contributors.

Natural language processing

  • How GPT3 Works - Visualizations and Animations – Jay Alammar – Visualizing machine learning one concept at a time by Jay Alammar.

  • stanza • Official Stanford NLP Python Library for Many Human Languages by Stanford NLP Group.

Deep learning

  • Understanding Gradient Descent with Python by Nikola M. Zivkovic.

  • Building a simple expected pass completion (xP) model using Keras by Paul Minogue.

  • Guide to Reinforcement Learning with Python and TensorFlow by Nikola M. Zivkovic.

  • torchdyn • A PyTorch based library for all things neural differential equations by DiffEqML Research Group.

  • Node Classification with Graph Neural Networks by Khalid Salama.

© 2022 Alastair Rushworth