Metadata-Version: 2.4
Name: acryl-datahub-airflow-plugin
Version: 1.6.0.10rc1
Summary: DataHub Airflow plugin — automatically capture pipeline lineage, run history, and task metadata from Apache Airflow
Home-page: https://datahub.com/
License: Apache-2.0
Project-URL: Documentation, https://docs.datahub.com/
Project-URL: Source, https://github.com/datahub-project/datahub
Project-URL: Changelog, https://github.com/acryldata/datahub/releases
Project-URL: Releases, https://github.com/acryldata/datahub/releases
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: apache-airflow-providers-openlineage>=2.1.0
Requires-Dist: pydantic>=2.4.0
Requires-Dist: acryl-datahub[datahub-rest,sql-parser]==1.6.0.10rc1
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.10rc1
Requires-Dist: apache-airflow<4.0.0,>=3.0.0
Provides-Extra: ignore
Provides-Extra: airflow3
Provides-Extra: datahub-rest
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.10rc1; extra == "datahub-rest"
Provides-Extra: datahub-kafka
Requires-Dist: acryl-datahub[datahub-kafka]==1.6.0.10rc1; extra == "datahub-kafka"
Provides-Extra: datahub-file
Requires-Dist: acryl-datahub[sync-file-emitter]==1.6.0.10rc1; extra == "datahub-file"
Provides-Extra: dev
Requires-Dist: pytest-cov>=2.8.1; extra == "dev"
Requires-Dist: deepdiff!=8.0.0; extra == "dev"
Requires-Dist: types-tabulate; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: acryl-datahub[datahub-rest]==1.6.0.10rc1; extra == "dev"
Requires-Dist: apache-airflow<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: coverage>=5.1; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: types-dataclasses; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: mypy==1.17.1; extra == "dev"
Requires-Dist: types-six; extra == "dev"
Requires-Dist: pytest>=6.2.2; extra == "dev"
Requires-Dist: types-cachetools; extra == "dev"
Requires-Dist: types-python-dateutil; extra == "dev"
Requires-Dist: packaging; extra == "dev"
Requires-Dist: tenacity; extra == "dev"
Requires-Dist: types-toml; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: sqlalchemy-stubs; extra == "dev"
Requires-Dist: apache-airflow-providers-openlineage>=2.1.0; extra == "dev"
Requires-Dist: tox-uv; extra == "dev"
Requires-Dist: pydantic>=2.4.0; extra == "dev"
Requires-Dist: acryl-datahub[datahub-rest,sql-parser]==1.6.0.10rc1; extra == "dev"
Requires-Dist: ruff==0.11.7; extra == "dev"
Requires-Dist: types-click==0.1.12; extra == "dev"
Provides-Extra: integration-tests
Requires-Dist: apache-airflow-providers-snowflake; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-amazon; extra == "integration-tests"
Requires-Dist: acryl-datahub[testing-utils]==1.6.0.10rc1; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-google; extra == "integration-tests"
Requires-Dist: acryl-datahub[datahub-kafka]==1.6.0.10rc1; extra == "integration-tests"
Requires-Dist: acryl-datahub[sync-file-emitter]==1.6.0.10rc1; extra == "integration-tests"
Requires-Dist: virtualenv; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-teradata; extra == "integration-tests"
Requires-Dist: apache-airflow-providers-sqlite; extra == "integration-tests"
Requires-Dist: snowflake-connector-python>=2.7.10; extra == "integration-tests"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

<!-- PyPI long description. Keep concise, feature-discovery-first. -->

# DataHub Airflow Plugin

**Automatic lineage and run metadata from Apache Airflow into DataHub.** The plugin captures DAG structure, task inputs/outputs, and run history with no manual instrumentation.

## What you can do

- **Capture pipeline lineage**: automatically extract dataset-level and column-level lineage from SQL operators
- **Track run history**: record task execution status, duration, and failures in DataHub
- **Enhance OpenLineage**: patch Airflow's OpenLineage extractors with DataHub's SQL parser for richer lineage
- **Support multiple emitters**: send metadata via REST, Kafka, or file

## Version compatibility

| Airflow Version | Support                                        |
| --------------- | ---------------------------------------------- |
| 3.0+            | ✅ Fully supported                             |
| 2.x             | ❌ Use `acryl-datahub-airflow-plugin <= 1.6.0` |
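
As a rough illustration of the table above, the sketch below maps an Airflow version string to a pip requirement. The helper name `plugin_requirement` is hypothetical and not part of the plugin; it only encodes the two rows of the table.

```python
def plugin_requirement(airflow_version: str) -> str:
    """Return a pip requirement string for the given Airflow version.

    Hypothetical helper: it only encodes the compatibility table above.
    """
    major = int(airflow_version.split(".")[0])
    if major == 3:
        # Airflow 3.x: current releases of the plugin apply.
        return "acryl-datahub-airflow-plugin"
    if major == 2:
        # Airflow 2.x: stay on a release at or below 1.6.0.
        return "acryl-datahub-airflow-plugin<=1.6.0"
    raise ValueError(f"No documented plugin release for Airflow {airflow_version}")


print(plugin_requirement("2.10.4"))
```

The resulting string can be passed to `pip install`, quoted so the shell does not interpret `<`.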

## Installation

```bash
pip install acryl-datahub-airflow-plugin

# With Kafka emitter
pip install 'acryl-datahub-airflow-plugin[datahub-kafka]'
```

## Configuration

Add to `airflow.cfg`:

```ini
[datahub]
enabled = True
# Airflow connection pointing to your DataHub GMS
conn_id = datahub_rest_default
```
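
Airflow also reads config options from environment variables named `AIRFLOW__{SECTION}__{KEY}`, which can be convenient in containerized deployments. The sketch below derives those names for the two settings above; the helper `airflow_env_var` is illustrative and not part of Airflow or the plugin.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow checks for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The same settings as the airflow.cfg snippet above, expressed as env vars.
settings = {"enabled": "True", "conn_id": "datahub_rest_default"}
env = {airflow_env_var("datahub", key): value for key, value in settings.items()}

for name, value in env.items():
    print(f"export {name}={value}")
```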

Set up the Airflow connection:

```bash
airflow connections add datahub_rest_default \
  --conn-type HTTP \
  --conn-host http://localhost:8080
```

The plugin activates automatically; no changes to your DAG code are required.

## Key configuration options

| Option                               | Default | Description                                       |
| ------------------------------------ | ------- | ------------------------------------------------- |
| `enable_extractors`                  | `True`  | Enhance OpenLineage extractors                    |
| `patch_sql_parser`                   | `True`  | Use DataHub's SQL parser for column-level lineage |
| `enable_multi_statement_sql_parsing` | `False` | Resolve temp tables across multi-statement tasks  |
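
To make the defaults concrete, the sketch below reads a `[datahub]` section with Python's standard `configparser` and falls back to the defaults from the table when an option is absent. It only illustrates the semantics: Airflow has its own configuration loader, and both the sample config text and the `resolve_options` helper are made up for this example.

```python
import configparser

# Defaults copied from the table above.
DEFAULTS = {
    "enable_extractors": True,
    "patch_sql_parser": True,
    "enable_multi_statement_sql_parsing": False,
}

# Made-up config that overrides a single option.
SAMPLE_CFG = """
[datahub]
enabled = True
conn_id = datahub_rest_default
enable_multi_statement_sql_parsing = True
"""


def resolve_options(cfg_text: str) -> dict[str, bool]:
    """Return the effective value of each option, applying the table defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {
        key: parser.getboolean("datahub", key, fallback=default)
        for key, default in DEFAULTS.items()
    }


options = resolve_options(SAMPLE_CFG)
print(options)
```

Here only `enable_multi_statement_sql_parsing` is overridden; the other two options keep their documented defaults.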

## Links

- [Full documentation](https://docs.datahub.com/docs/lineage/airflow)
- [Apache Airflow](https://airflow.apache.org/)
- [GitHub](https://github.com/datahub-project/datahub)
- [Slack community](https://datahub.com/slack)
