Getting Started#

Introduction to Flyte#

Flyte is a workflow orchestrator that seamlessly unifies data, machine learning, and analytics stacks for building robust and reliable applications.

This introduction provides a quick overview of how to get Flyte up and running on your local machine.

Installation#

Prerequisites

Install Docker and ensure that you have the Docker daemon running.

Flyte supports any OCI-compatible container technology (like Podman, LXD, and Containerd) when running tasks on a Flyte cluster, but for the purpose of this guide, flytectl uses Docker to spin up a local Kubernetes cluster so that you can interact with it on your machine.

First, install flytekit (Flyte's Python SDK) along with scikit-learn.

pip install flytekit flytekitplugins-deck-standard scikit-learn

Then install flytectl, which is the command-line interface for interacting with a Flyte backend. On macOS, you can use Homebrew; otherwise, use the install script:

brew install flyteorg/homebrew-tap/flytectl
curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin

Creating a Workflow#

The first workflow we'll create is a simple model training workflow that consists of three steps that will:

  1. ๐Ÿท Get the classic wine dataset using sklearn.

  2. ๐Ÿ“Š Process the data that simplifies the 3-class prediction problem into a binary classification problem by consolidating class labels 1 and 2 into a single class.

  3. ๐Ÿค– Train a LogisticRegression model to learn a binary classifier.

First, we'll define a task for each of these steps. Create a file called example.py and copy the following code into it.

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

import flytekit.extras.sklearn  # enables passing sklearn models as task inputs/outputs
from flytekit import task, workflow


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame

@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))

@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(max_iter=3000, **hyperparameters).fit(features, target)
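The label consolidation in process_data relies on pandas' Series.where(cond, other), which keeps values where the condition holds and substitutes other elsewhere, so class 0 is kept and classes 1 and 2 both become 1. Here is a minimal pure-Python sketch of the same mapping, using hypothetical toy labels so it runs without pandas:

```python
# Pure-Python sketch of the consolidation done in process_data:
# class 0 is kept as-is, while classes 1 and 2 collapse into class 1.
labels = [0, 1, 2, 0, 2, 1]  # hypothetical 3-class targets
binary = [label if label == 0 else 1 for label in labels]
print(binary)  # [0, 1, 1, 0, 1, 1]
```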

As we can see in the code snippet above, we defined three tasks as Python functions: get_data, process_data, and train_model.

In Flyte, tasks are the most basic unit of compute and serve as the building blocks 🧱 for more complex applications. A task is a function that takes some inputs and produces an output. We can use these tasks to define a simple model training workflow:

@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )

Note

A task can also be an isolated piece of compute that takes no inputs and produces no outputs, but to do something useful, a task is typically written with both inputs and outputs.

A workflow is also defined as a Python function, and it specifies the flow of data between tasks and, more generally, the dependencies between tasks 🔀.
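To build intuition for what the workflow body expresses, here is a hedged, Flyte-free sketch in plain Python. The function names mirror the tasks above, but the bodies are hypothetical stand-ins; run locally, the workflow amounts to ordinary function composition, with each task's output feeding the next task's input:

```python
# Hypothetical stand-ins for the tasks above (toy data, no sklearn or flytekit):
def get_data():
    return [(5.1, 0), (6.2, 1), (7.0, 2)]  # (feature, class) pairs

def process_data(data):
    # consolidate classes 1 and 2 into a single positive class
    return [(x, y if y == 0 else 1) for x, y in data]

def train_model(data, hyperparameters):
    # stand-in "model": just record the hyperparameters and observed classes
    return {"hyperparameters": hyperparameters, "classes": sorted({y for _, y in data})}

# The workflow body is plain composition: outputs flow into inputs.
model = train_model(data=process_data(data=get_data()), hyperparameters={"C": 0.1})
print(model["classes"])  # [0, 1]
```

On a real cluster, Flyte uses this same dependency structure to schedule each task, rather than calling the functions directly.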

Running Flyte Workflows in Python#

You can run the workflow in example.py in a local Python environment by using pyflyte, the CLI that ships with flytekit.

pyflyte run example.py training_workflow \
    --hyperparameters '{"C": 0.1}'
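As the command above suggests, the value of --hyperparameters is written as JSON, which gets deserialized into the workflow's dict input. A quick standard-library sketch of that parsing step:

```python
import json

# The JSON string passed on the command line becomes a Python dict,
# which is then bound to the workflow's `hyperparameters` input.
hyperparameters = json.loads('{"C": 0.1}')
print(hyperparameters)  # {'C': 0.1}
```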

Running Workflows in a Flyte Cluster#

You can also use pyflyte run to execute workflows on a Flyte cluster. To do so, first spin up a local demo cluster. flytectl uses Docker to create a local Kubernetes cluster and minimal Flyte backend that you can use to run the example above:

Important

Before you start the local cluster, make sure that you allocate a minimum of 4 CPUs and 3 GB of memory to your Docker daemon. If you're using Docker Desktop, you can do this easily by going to:

Settings > Resources > Advanced

Then set the CPUs and Memory sliders to the appropriate levels.

flytectl demo start

Expected Output:

👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
❇️ Run the following command to export sandbox environment variables for accessing flytectl
	export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
🐋 Flyte sandbox ships with a Docker registry. Tag and push custom workflow images to localhost:30000
📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console

Important

Make sure to export the FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml environment variable in your shell.

Then, run the workflow on the Flyte cluster with pyflyte run using the --remote flag:

pyflyte run --remote example.py training_workflow \
    --hyperparameters '{"C": 0.1}'

Expected Output: A URL to the workflow execution on your demo Flyte cluster:

Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/<execution_name> to see execution in the console.

Where <execution_name> is a unique identifier for the workflow execution.

Inspect the Results#

Navigate to the URL produced by pyflyte run. This will take you to FlyteConsole, the web UI used to manage Flyte entities such as tasks, workflows, and executions.

[GIF: getting started console]

Note

There are a few features of FlyteConsole worth pointing out in the GIF above:

  • The default execution view shows the list of tasks executing in sequential order.

  • The right-hand panel shows metadata about the task execution, including logs, inputs, outputs, and task metadata.

  • The Graph view shows the execution graph of the workflow, providing visual information about the topology of the graph and the state of each node as the workflow progresses.

  • On completion, you can inspect the outputs of each task, and ultimately, the overarching workflow.

Summary#

🎉 Congratulations! In this introductory guide, you:

  1. 📜 Created a Flyte script that trains a binary classification model.

  2. 🚀 Spun up a demo Flyte cluster on your local system.

  3. 👟 Ran a workflow locally and on a demo Flyte cluster.

What's Next?#

Follow the rest of the sections in the documentation to get a better understanding of the key constructs that make Flyte such a powerful orchestration tool 💪.

Recommendation

If you're new to Flyte, we recommend that you go through the Flyte Fundamentals and Core Use Cases sections before diving into the other sections of the documentation.

🔤 Flyte Fundamentals

A brief tour of Flyte's main concepts and development lifecycle.

🌟 Core Use Cases

An overview of core use cases for data, machine learning, and analytics practitioners.

📖 User Guide

A comprehensive view of Flyte's functionality for data scientists, ML engineers, data engineers, and data analysts.

📚 Tutorials

End-to-end examples of Flyte for data/feature engineering, machine learning, bioinformatics, and more.

🚀 Deployment Guide

Guides for platform engineers to deploy a Flyte cluster on your own infrastructure.