Introduction to Kedro — pipeline for data science

Nok Chan
8 min read · Dec 4, 2020

What is Kedro

Kedro is a development workflow tool that allows you to create portable data pipelines. It applies software engineering best practices to make your data science code reproducible, modular and well-documented. For example, you can easily create a template for new projects, build a documentation site, lint your code and always have an expected structure to find your config and data.

Kedro is a lightweight pipeline library that needs no infrastructure to set up.

In comparison to Airflow or Luigi, Kedro is much more lightweight. It helps you write production-ready code and lets data engineers and data scientists work together in the same code base. It also has good Jupyter support, so data scientists can keep using the tools they are familiar with.

If you don’t want to go through the Medium paywall, please visit my personal page, where the code formatting is also better. You can subscribe there via RSS; I am trying to write more, and shorter, posts about everything I code.

Why we need a pipeline tool

Data scientists often start development in a Jupyter Notebook. As the notebook grows, converting it to Python scripts becomes inevitable. It starts with one file, then another, and they accumulate quickly. Converting a notebook takes more than pasting the code into a script; it involves careful thinking and refactoring.

A pipeline library can be helpful in a few ways:

  • Pipelines are modular, so parts of a pipeline can be executed on their own.
  • Independent steps can easily run in parallel.
  • Circular dependencies between steps are detected.
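To give a feel for what a pipeline library does with the dependency graph, here is a minimal sketch using only Python's standard library (`graphlib`, Python 3.9+). It is not how Kedro is implemented; the step names and the helper functions `execution_order` and `upstream` are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical steps: each key maps to the set of steps it depends on.
steps = {
    "split_data": {"load_data"},
    "train_model": {"split_data"},
    "evaluate": {"train_model", "split_data"},
}


def execution_order(dependencies):
    """Return a valid run order, or raise ValueError on a circular dependency."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # err.args[1] holds the steps that form the cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


def upstream(dependencies, target):
    """Collect the target and everything it depends on (a partial run)."""
    needed, stack = set(), [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(dependencies.get(step, ()))
    return needed


print(execution_order(steps))
print(upstream(steps, "train_model"))
```

Passing a graph such as `{"a": {"b"}, "b": {"a"}}` to `execution_order` raises a `ValueError`, which is the kind of check you get for free from a pipeline tool. The same graph also tells the tool which steps have no dependency on each other and could therefore run in parallel.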

Functions and Pipeline

Nodes

import pandas as pd


def split_data(data: pd.DataFrame, example_test_data_ratio: float):
    """Split the data into train and test sets."""
    ...
    return dict(
        train_x=train_data_x,
        train_y=train_data_y,
        test_x=test_data_x,
        test_y=test_data_y,
    )

The node is the core component of a Kedro pipeline. For example, suppose we have a Python function, like split_data above, that splits data into train and test sets. A node takes four main arguments: func, inputs, outputs, and name. To use this function as a node, we would write something like this.
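To make the four arguments concrete, here is a toy stand-in written in plain Python. It is only a sketch of the idea, not Kedro's implementation: `ToyNode`, `split_rows` and every dataset name are hypothetical, and the splitter works on a list instead of a DataFrame so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyNode:
    """Illustrative stand-in for a pipeline node (not Kedro's class)."""

    func: Callable           # the plain Python function to wrap
    inputs: List[str]        # dataset names passed to func, in order
    outputs: Dict[str, str]  # key in func's returned dict -> dataset name
    name: str                # label used to refer to the node

    def run(self, catalog: Dict[str, object]) -> Dict[str, object]:
        """Look up inputs by name, call func, and rename its outputs."""
        result = self.func(*(catalog[key] for key in self.inputs))
        return {dataset: result[key] for key, dataset in self.outputs.items()}


def split_rows(rows, test_ratio):
    """Simplified splitter on a list, standing in for split_data."""
    cut = len(rows) - int(len(rows) * test_ratio)
    return dict(train=rows[:cut], test=rows[cut:])


split_node = ToyNode(
    func=split_rows,
    inputs=["raw_rows", "test_ratio"],
    outputs={"train": "train_rows", "test": "test_rows"},
    name="split_rows",
)

produced = split_node.run({"raw_rows": list(range(10)), "test_ratio": 0.2})
print(produced)

# With Kedro installed, the corresponding declaration would look roughly like
# this (the dataset names are assumptions for illustration):
#   from kedro.pipeline import node
#   node(
#       func=split_data,
#       inputs=["example_data", "params:example_test_data_ratio"],
#       outputs=dict(train_x="train_x", train_y="train_y",
#                    test_x="test_x", test_y="test_y"),
#       name="split_data",
#   )
```

The point of the design is that the function itself knows nothing about where data comes from or goes to. The node only holds names, and the pipeline resolves those names to actual datasets at run time.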
