Metadata

Propagate additional information through pipelines

Metadata is arbitrary JSON-serializable data that is exposed to all Tasks in a Pipeline. Any Task with a code source can read upstream metadata, and Python Tasks can also transform and generate metadata. Metadata is automatically propagated to downstream non-Python Tasks.
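As a rough illustration of what "JSON-serializable" means in practice, the sketch below uses Python's standard json module to check a candidate metadata dictionary. The helper name is_json_serializable and the sample values are hypothetical and are not part of the platform's API.

```python
import json
from datetime import datetime


def is_json_serializable(value):
    """Return True if `value` can be encoded by the standard json module."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# Dicts, lists, strings, numbers, booleans and None encode without trouble.
ok_metadata = {"run_id": 42, "tags": ["daily", "sales"], "threshold": 0.5}

# A datetime object cannot be encoded as-is; convert it to a string first.
bad_metadata = {"started_at": datetime(2024, 1, 1)}
fixed_metadata = {"started_at": datetime(2024, 1, 1).isoformat()}

print(is_json_serializable(ok_metadata))
print(is_json_serializable(bad_metadata))
print(is_json_serializable(fixed_metadata))
```

Converting values such as dates to strings or numbers before storing them keeps the metadata within the JSON-serializable requirement stated above.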

Metadata in SQL Tasks

In SQL-based tasks, metadata can be accessed via Jinja templating. For example, if a Pipeline has a metadata key my_key with value my_value, the following code will inject the value into the query:

SELECT * FROM {{ metadata.my_key }}
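To preview what the rendered query looks like, the substitution can be reproduced locally with the jinja2 package. This is only a sketch: it assumes jinja2 is installed and that templates are rendered with a context variable named metadata, as in the example above. The platform's actual rendering environment may differ.

```python
from jinja2 import Template

# The same template text as in the SQL Task above.
query_template = "SELECT * FROM {{ metadata.my_key }}"

# Hypothetical pipeline metadata containing the key from the example.
metadata = {"my_key": "my_value"}

# Jinja resolves `metadata.my_key` to the dictionary entry and inserts it.
rendered = Template(query_template).render(metadata=metadata)
print(rendered)  # SELECT * FROM my_value
```

The value is inserted into the query text verbatim, so in this example it acts as the table name.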

Metadata in Python Tasks

In Python-based tasks, metadata can be accessed via the metadata argument to the user-defined main() function. This argument is a dictionary. Its values can be modified and returned as part of the main() function's output dictionary. For example, the Task below reads the metadata key old_key and adds another key, new_key, with the value new_value:

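def main(data, metadata):
    # Read the upstream DataFrame.
    df = data["data"]
    # Keep only rows where "col" exceeds the value stored under old_key.
    df = df[df["col"] > metadata["old_key"]]
    # Add a new metadata entry for downstream Tasks.
    metadata["new_key"] = "new_value"
    return {
        "data": df,
        "metadata": metadata,
    }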
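To try the Task logic outside the platform, main() can be called directly with hand-built inputs. The sketch below assumes pandas is installed and that the data argument is a dictionary holding a DataFrame under the key "data", as in the example above. The sample values are made up, and how the platform itself invokes main() is not shown here.

```python
import pandas as pd


def main(data, metadata):
    df = data["data"]
    # Keep only rows where "col" exceeds the value stored under old_key.
    df = df[df["col"] > metadata["old_key"]]
    metadata["new_key"] = "new_value"
    return {
        "data": df,
        "metadata": metadata,
    }


# Hypothetical upstream inputs for a local test run.
sample_data = {"data": pd.DataFrame({"col": [1, 5, 10]})}
sample_metadata = {"old_key": 4}

result = main(sample_data, sample_metadata)

print(result["data"]["col"].tolist())  # [5, 10]
print(result["metadata"])  # {'old_key': 4, 'new_key': 'new_value'}
```

Note that main() modifies the metadata dictionary it receives in place. When testing locally, pass a copy if the original dictionary is needed afterwards.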