Run subsections of a DAG for a specified date range. If the reset_dag_run option is used, backfill will first prompt users whether Airflow should clear all the previous dag runs and task instances within the backfill date range. If the rerun_failed_tasks option is used, backfill will automatically re-run the previously failed task instances within the backfill date range.
airflow backfill [-h] [-t TASK_REGEX] [-s START_DATE] [-e END_DATE] [-m] [-l]
[-x] [-i] [-I] [-sd SUBDIR] [--pool POOL]
[--delay_on_limit DELAY_ON_LIMIT] [-dr] [-v] [-c CONF]
[--reset_dagruns] [--rerun_failed_tasks] [-B]
dag_id
The id of the dag
The regex to filter specific task_ids to backfill (optional)
Override start_date YYYY-MM-DD
Override end_date YYYY-MM-DD
Mark jobs as succeeded without running them
Default: False
Run the task using the LocalExecutor
Default: False
Do not attempt to pickle the DAG object to send over to the workers; just tell the workers to run their version of the code.
Default: False
Skip upstream tasks and run only the tasks matching the regexp. Only works in conjunction with task_regex.
Default: False
Ignores depends_on_past dependencies for the first set of tasks only (subsequent executions in the backfill DO respect depends_on_past).
Default: False
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Resource pool to use
Amount of time in seconds to wait when the limit on maximum active dag runs (max_active_runs) has been reached before trying to execute a dag run again.
Default: 1.0
Perform a dry run
Default: False
Make logging output more verbose
Default: False
JSON string that gets pickled into the DagRun’s conf attribute
If set, the backfill will delete existing backfill-related DAG runs and start anew with fresh, running DAG runs
Default: False
If set, the backfill will auto-rerun all the failed tasks for the backfill date range instead of throwing exceptions
Default: False
If set, the backfill will run tasks from the most recent day first. If there are tasks that depend_on_past, this option will throw an exception
Default: False
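For example, a one-week backfill of a hypothetical DAG example_dag that re-runs previously failed task instances, and a second run restricted to tasks matching a likewise hypothetical regex, might look like this:
airflow backfill -s 2019-01-01 -e 2019-01-07 --rerun_failed_tasks example_dag
airflow backfill -t "extract_.*" -s 2019-01-01 -e 2019-01-07 example_dag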
List dag runs given a DAG id. If the state option is given, it will only search for the dag runs with the given state. If the no_backfill option is given, it will filter out all backfill dag runs for the given dag id.
airflow list_dag_runs [-h] [--no_backfill] [--state STATE] dag_id
The id of the dag
Filter out all the backfill dag runs for the given dag id
Default: False
Only list the dag runs corresponding to the state
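For example, to list only the running, non-backfill dag runs of a hypothetical DAG example_dag:
airflow list_dag_runs --state running --no_backfill example_dag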
List the tasks within a DAG
airflow list_tasks [-h] [-t] [-sd SUBDIR] dag_id
The id of the dag
Tree view
Default: False
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
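For example, to print the tasks of a hypothetical DAG example_dag as a tree:
airflow list_tasks -t example_dag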
Clear a set of task instances, as if they never ran
airflow clear [-h] [-t TASK_REGEX] [-s START_DATE] [-e END_DATE] [-sd SUBDIR]
[-u] [-d] [-c] [-f] [-r] [-x] [-xp] [-dx]
dag_id
The id of the dag
The regex to filter specific task_ids to clear (optional)
Override start_date YYYY-MM-DD
Override end_date YYYY-MM-DD
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Include upstream tasks
Default: False
Include downstream tasks
Default: False
Do not request confirmation
Default: False
Only failed jobs
Default: False
Only running jobs
Default: False
Exclude subdags
Default: False
Exclude parent DAGs if the task cleared is part of a SubDAG
Default: False
Search dag_id as regex instead of exact string
Default: False
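As a sketch, clearing only the failed task instances of a hypothetical DAG example_dag within a date range, together with their downstream tasks and without a confirmation prompt:
airflow clear -s 2019-01-01 -e 2019-01-07 -f -d -c example_dag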
Pause a DAG
airflow pause [-h] [-sd SUBDIR] dag_id
The id of the dag
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Resume a paused DAG
airflow unpause [-h] [-sd SUBDIR] dag_id
The id of the dag
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Trigger a DAG run
airflow trigger_dag [-h] [-sd SUBDIR] [-r RUN_ID] [-c CONF] [-e EXEC_DATE]
dag_id
The id of the dag
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Helps to identify this run
JSON string that gets pickled into the DagRun’s conf attribute
The execution date of the DAG
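For example, triggering a hypothetical DAG example_dag with a custom run id and a conf payload:
airflow trigger_dag -r manual__2019-06-01 -c '{"source": "cli"}' example_dag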
Delete all DB records related to the specified DAG
airflow delete_dag [-h] [-y] dag_id
The id of the dag
Do not prompt to confirm deletion. Use with care!
Default: False
CRUD operations on pools
airflow pool [-h] [-s NAME SLOT_COUNT POOL_DESCRIPTION] [-g NAME] [-x NAME]
[-i FILEPATH] [-e FILEPATH]
Set pool slot count and description, respectively
Get pool info
Delete a pool
Import pool from JSON file
Export pool to JSON file
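A short sketch of the pool subcommands, using a hypothetical pool etl_pool and file pools.json:
airflow pool -s etl_pool 8 "Pool for ETL tasks"
airflow pool -g etl_pool
airflow pool -e pools.json
airflow pool -x etl_pool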
CRUD operations on variables
airflow variables [-h] [-s KEY VAL] [-g KEY] [-j] [-d VAL] [-i FILEPATH]
[-e FILEPATH] [-x KEY]
Set a variable
Get value of a variable
Deserialize JSON variable
Default: False
Default value returned if variable does not exist
Import variables from JSON file
Export variables to JSON file
Delete a variable
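For example, setting a variable, reading it back as deserialized JSON with a fallback default, and exporting all variables (the key my_config and the file name are hypothetical):
airflow variables -s my_config '{"batch_size": 100}'
airflow variables -g my_config -j -d '{}'
airflow variables -e variables.json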
Start a kerberos ticket renewer
airflow kerberos [-h] [-kt [KEYTAB]] [--pid [PID]] [-D] [--stdout STDOUT]
[--stderr STDERR] [-l LOG_FILE]
[principal]
kerberos principal
keytab
Default: “airflow.keytab”
PID file location
Daemonize instead of running in the foreground
Default: False
Redirect stdout to this file
Redirect stderr to this file
Location of the log file
Render a task instance’s template(s)
airflow render [-h] [-sd SUBDIR] dag_id task_id execution_date
The id of the dag
The id of the task
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Run a single task instance
airflow run [-h] [-sd SUBDIR] [-m] [-f] [--pool POOL] [--cfg_path CFG_PATH]
[-l] [-A] [-i] [-I] [--ship_dag] [-p PICKLE] [-int]
dag_id task_id execution_date
The id of the dag
The id of the task
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Mark jobs as succeeded without running them
Default: False
Ignore previous task instance state, rerun regardless if task already succeeded/failed
Default: False
Resource pool to use
Path to config file to use instead of airflow.cfg
Run the task using the LocalExecutor
Default: False
Ignores all non-critical dependencies, including ignore_ti_state and ignore_task_deps
Default: False
Ignore task-specific dependencies, e.g. upstream, depends_on_past, and retry delay dependencies
Default: False
Ignore depends_on_past dependencies (but respect upstream dependencies)
Default: False
Pickles (serializes) the DAG and ships it to the worker
Default: False
Serialized pickle object of the entire dag (used internally)
Do not capture standard output and error streams (useful for interactive debugging)
Default: False
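As an illustration, forcing a single task instance of a hypothetical DAG example_dag to run locally regardless of its previous state:
airflow run -f -l example_dag extract_data 2019-06-01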
Initialize the metadata database
airflow initdb [-h]
List all the DAGs
airflow list_dags [-h] [-sd SUBDIR] [-r]
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Show DagBag loading report
Default: False
Get the status of a dag run
airflow dag_state [-h] [-sd SUBDIR] dag_id execution_date
The id of the dag
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Returns the unmet dependencies for a task instance from the perspective of the scheduler. In other words, it explains why a task instance doesn’t get scheduled, then queued by the scheduler, and then run by an executor.
airflow task_failed_deps [-h] [-sd SUBDIR] dag_id task_id execution_date
The id of the dag
The id of the task
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Get the status of a task instance
airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date
The id of the dag
The id of the task
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Serve logs generated by the workers
airflow serve_logs [-h]
Test a task instance. This will run a task without checking for dependencies or recording its state in the database.
airflow test [-h] [-sd SUBDIR] [-dr] [-tp TASK_PARAMS] [-pm]
dag_id task_id execution_date
The id of the dag
The id of the task
The execution date of the DAG
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Perform a dry run
Default: False
Sends a JSON params dict to the task
Open debugger on uncaught exception
Default: False
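For example, testing a hypothetical task extract_data of example_dag for one execution date, passing task params as JSON:
airflow test -tp '{"limit": 10}' example_dag extract_data 2019-06-01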
Start an Airflow webserver instance
airflow webserver [-h] [-p PORT] [-w WORKERS]
[-k {sync,eventlet,gevent,tornado}] [-t WORKER_TIMEOUT]
[-hn HOSTNAME] [--pid [PID]] [-D] [--stdout STDOUT]
[--stderr STDERR] [-A ACCESS_LOGFILE] [-E ERROR_LOGFILE]
[-l LOG_FILE] [--ssl_cert SSL_CERT] [--ssl_key SSL_KEY] [-d]
The port on which to run the server
Default: 8080
Number of workers to run the webserver on
Default: 1
Possible choices: sync, eventlet, gevent, tornado
The worker class to use for Gunicorn
Default: “sync”
The timeout for waiting on webserver workers
Default: 120
Set the hostname on which to run the web server
Default: “0.0.0.0”
PID file location
Daemonize instead of running in the foreground
Default: False
Redirect stdout to this file
Redirect stderr to this file
The logfile to store the webserver access log. Use ‘-’ to print to stderr.
Default: “-”
The logfile to store the webserver error log. Use ‘-’ to print to stderr.
Default: “-”
Location of the log file
Path to the SSL certificate for the webserver
Path to the key to use with the SSL certificate
Use the server that ships with Flask in debug mode
Default: False
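A typical invocation might start the webserver on port 8080 with four Gunicorn workers and daemonize it (the PID file path below is only an assumption):
airflow webserver -p 8080 -w 4 -D --pid /var/run/airflow-webserver.pid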
Burn down and rebuild the metadata database
airflow resetdb [-h] [-y]
Do not prompt to confirm reset. Use with care!
Default: False
Upgrade the metadata database to latest version
airflow upgradedb [-h]
Start a scheduler instance
airflow scheduler [-h] [-d DAG_ID] [-sd SUBDIR] [-r RUN_DURATION]
[-n NUM_RUNS] [-p] [--pid [PID]] [-D] [--stdout STDOUT]
[--stderr STDERR] [-l LOG_FILE]
The id of the dag to run
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Set number of seconds to execute before exiting
Set the number of runs to execute before exiting
Default: -1
Attempt to pickle the DAG object to send over to the workers, instead of letting workers run their version of the code.
Default: False
PID file location
Daemonize instead of running in the foreground
Default: False
Redirect stdout to this file
Redirect stderr to this file
Location of the log file
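For instance, running the scheduler as a daemon, or limiting it to a hypothetical DAG for a fixed number of runs:
airflow scheduler -D
airflow scheduler -d example_dag -n 10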
Start a Celery worker node
airflow worker [-h] [-p] [-q QUEUES] [-c CONCURRENCY] [-cn CELERY_HOSTNAME]
[--pid [PID]] [-D] [--stdout STDOUT] [--stderr STDERR]
[-l LOG_FILE] [-a AUTOSCALE]
Attempt to pickle the DAG object to send over to the workers, instead of letting workers run their version of the code.
Default: False
Comma delimited list of queues to serve
Default: “default”
The number of worker processes
Default: 4
Set the hostname of celery worker if you have multiple workers on a single machine.
PID file location
Daemonize instead of running in the foreground
Default: False
Redirect stdout to this file
Redirect stderr to this file
Location of the log file
Minimum and maximum number of workers to autoscale
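For example, a daemonized worker serving two hypothetical queues with eight worker processes:
airflow worker -q default,etl -c 8 -D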
Start a Celery Flower
airflow flower [-h] [-hn HOSTNAME] [-p PORT] [-fc FLOWER_CONF] [-u URL_PREFIX]
[-ba BASIC_AUTH] [-a BROKER_API] [--pid [PID]] [-D]
[--stdout STDOUT] [--stderr STDERR] [-l LOG_FILE]
Set the hostname on which to run the server
Default: “0.0.0.0”
The port on which to run the server
Default: 5555
Configuration file for flower
URL prefix for Flower
Secure Flower with Basic Authentication. Accepts user:password pairs separated by a comma. Example: flower_basic_auth = user1:password1,user2:password2
Broker api
PID file location
Daemonize instead of running in the foreground
Default: False
Redirect stdout to this file
Redirect stderr to this file
Location of the log file
Show the version
airflow version [-h]
List/Add/Delete connections
airflow connections [-h] [-l] [-a] [-d] [--conn_id CONN_ID]
[--conn_uri CONN_URI] [--conn_extra CONN_EXTRA]
[--conn_type CONN_TYPE] [--conn_host CONN_HOST]
[--conn_login CONN_LOGIN] [--conn_password CONN_PASSWORD]
[--conn_schema CONN_SCHEMA] [--conn_port CONN_PORT]
List all connections
Default: False
Add a connection
Default: False
Delete a connection
Default: False
Connection id, required to add/delete a connection
Connection URI, required to add a connection without conn_type
Connection Extra field, optional when adding a connection
Connection type, required to add a connection without conn_uri
Connection host, optional when adding a connection
Connection login, optional when adding a connection
Connection password, optional when adding a connection
Connection schema, optional when adding a connection
Connection port, optional when adding a connection
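As a sketch, adding, listing, and then deleting a hypothetical Postgres connection my_postgres (the URI values are placeholders):
airflow connections -a --conn_id my_postgres --conn_uri postgresql://user:pass@db.example.com:5432/mydb
airflow connections -l
airflow connections -d --conn_id my_postgres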
Create an account for the Web UI (FAB-based)
airflow create_user [-h] [-r ROLE] [-u USERNAME] [-e EMAIL] [-f FIRSTNAME]
[-l LASTNAME] [-p PASSWORD] [--use_random_password]
Role of the user. Existing roles include Admin, User, Op, Viewer, and Public
Username of the user
Email of the user
First name of the user
Last name of the user
Password of the user
Do not prompt for password. Use random string instead
Default: False
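For example, creating an Admin account (all values below are placeholders):
airflow create_user -r Admin -u jdoe -e jdoe@example.com -f Jane -l Doe -p change_me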
Delete an account for the Web UI
airflow delete_user [-h] [-u USERNAME]
Username of the user
List accounts for the Web UI
airflow list_users [-h]
Update existing role’s permissions.
airflow sync_perm [-h]
Get the next execution datetime of a DAG.
airflow next_execution [-h] [-sd SUBDIR] dag_id
The id of the dag
File location or directory from which to look for the dag. Defaults to ‘[AIRFLOW_HOME]/dags’, where [AIRFLOW_HOME] is the value you set for ‘AIRFLOW_HOME’ in ‘airflow.cfg’.
Default: “[AIRFLOW_HOME]/dags”
Rotate all encrypted connection credentials and variables; see https://airflow.readthedocs.io/en/stable/howto/secure-connections.html#rotating-encryption-keys.
airflow rotate_fernet_key [-h]