What is Kedro
Kedro is a development workflow tool that allows you to create portable data pipelines. It applies software engineering best practices to make your data science code reproducible, modular and well-documented. For example, you can easily create a template for new projects, build a documentation site, lint your code and always have an expected structure to find your config and data.
Kedro is a lightweight pipeline library that needs no infrastructure setup.
In comparison to Airflow or Luigi, Kedro is much more lightweight. It helps you write production-ready code, and it lets data engineers and data scientists work together on the same code base. It also has good Jupyter support, so data scientists can keep using the tool they are familiar with.
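To make the idea of a "pipeline library" concrete, here is a minimal sketch in plain Python. It is not Kedro's API: `Node`, `run_pipeline`, `clean` and `average` are hypothetical names made up for illustration. It only shows the general pattern of small functions wired together by named inputs and outputs.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One pipeline step: a function plus the names of its inputs and output.

    Hypothetical helper for illustration, not a Kedro class.
    """
    func: Callable
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], data: Dict[str, object]) -> Dict[str, object]:
    """Run the nodes in list order, storing each result under its output name."""
    results = dict(data)  # copy so the caller's dict is not mutated
    for n in nodes:
        args = [results[name] for name in n.inputs]
        results[n.output] = n.func(*args)
    return results


def clean(raw):
    # Drop missing values from the raw list
    return [x for x in raw if x is not None]


def average(values):
    # Arithmetic mean of the cleaned values
    return sum(values) / len(values)


# Each step only knows the names of its inputs and output,
# so steps can be tested, reused and reordered independently.
pipeline = [
    Node(clean, ["raw"], "cleaned"),
    Node(average, ["cleaned"], "mean"),
]

results = run_pipeline(pipeline, {"raw": [1, None, 2, 3]})
print(results["mean"])
```

Because every step is an ordinary function, the same code can be called from a notebook or from a script, which is roughly the workflow a pipeline tool is meant to support.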
Why we need a pipeline tool
Data scientists often start their development in a Jupyter Notebook. As the notebook grows larger, converting it to a Python script becomes inevitable. It starts with one file, then another one, and it…