Article No. 594: FAQ

Tags: [airflow]; [apache airflow]

Why isn’t my task getting scheduled?

There are many reasons why your task might not be getting scheduled. Here are some of the common causes:

You may also want to read the Scheduler section of the docs and make sure you fully understand how the scheduler works.

How do I trigger tasks based on another task’s failure?

Check out the Trigger Rule section in the Concepts section of the documentation.
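To react to an upstream failure, a downstream task is usually given a non-default trigger rule such as one_failed or all_failed (the default is all_success), for example by passing trigger_rule="one_failed" to the operator. The sketch below is a simplified, stdlib-only model of how a few of these rules decide whether a task should run; should_run is a hypothetical helper, not Airflow's implementation, and it assumes every upstream task has already finished as either success or failed (real Airflow has more states and rules).

```python
from typing import Iterable

# Simplified upstream states; real Airflow also has skipped, upstream_failed, etc.
SUCCESS = "success"
FAILED = "failed"


def should_run(trigger_rule: str, upstream_states: Iterable[str]) -> bool:
    """Toy evaluation of a few trigger rules once all upstream tasks are done."""
    states = list(upstream_states)
    if trigger_rule == "all_success":  # Airflow's default rule
        return all(s == SUCCESS for s in states)
    if trigger_rule == "all_failed":
        return all(s == FAILED for s in states)
    if trigger_rule == "one_failed":
        return any(s == FAILED for s in states)
    if trigger_rule == "one_success":
        return any(s == SUCCESS for s in states)
    if trigger_rule == "all_done":  # run regardless of upstream outcome
        return True
    raise ValueError(f"unsupported trigger rule in this sketch: {trigger_rule}")


# An alerting task that should fire when any upstream task fails:
print(should_run("one_failed", [SUCCESS, FAILED]))   # True
print(should_run("all_success", [SUCCESS, FAILED]))  # False
```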

Why are connection passwords still not encrypted in the metadata db after I installed airflow[crypto]?

Check out the Connections section in the Configuration section of the documentation.

What’s the deal with start_date?

start_date is partly legacy from the pre-DagRun era, but it is still relevant in many ways. When creating a new DAG, you probably want to set a global start_date for your tasks using default_args. The first DagRun to be created will be based on the min(start_date) of all your tasks. From that point on, the scheduler creates new DagRuns based on your schedule_interval, and the corresponding task instances run as your dependencies are met. When introducing new tasks to your DAG, you need to pay special attention to start_date, and you may want to reactivate inactive DagRuns to get the new tasks onboarded properly.
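The scheduling rule above can be sketched with plain datetime arithmetic. planned_runs is a hypothetical helper, not part of Airflow: it anchors the first run at the earliest task start_date and steps forward by a fixed timedelta, assuming each run covers one interval and is triggered once that interval has closed.

```python
from datetime import datetime, timedelta


def planned_runs(task_start_dates, interval, now):
    """Hypothetical helper: list (logical_date, trigger_time) pairs up to `now`.

    The first run is anchored at the earliest task start_date; each run covers
    [logical_date, logical_date + interval) and is triggered when that period closes.
    """
    logical_date = min(task_start_dates)
    runs = []
    while logical_date + interval <= now:
        runs.append((logical_date, logical_date + interval))
        logical_date += interval
    return runs


# Two tasks with different start dates: the earlier one anchors the first run.
starts = [datetime(2021, 1, 2), datetime(2021, 1, 1)]
for logical, trigger in planned_runs(starts, timedelta(days=1), now=datetime(2021, 1, 3, 12)):
    print(logical.date(), "triggered at", trigger)
```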

We recommend against using dynamic values as start_date, especially datetime.now(), as it can be quite confusing. A run is triggered only once its period closes, so in theory an @hourly DAG would never reach "one hour after now", because now() keeps moving forward.
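The datetime.now() problem can be shown with a small simulation. If start_date is recomputed as "now" every time the DAG file is parsed, the end of the first @hourly period always lies one hour after the latest parse, so it is never in the past. This is a toy model of the reasoning above, not scheduler code; first_period_closed is a hypothetical helper.

```python
from datetime import datetime, timedelta

HOURLY = timedelta(hours=1)


def first_period_closed(start_date, interval, check_time):
    """True if the first period [start_date, start_date + interval) has closed."""
    return start_date + interval <= check_time


# Fixed start_date: once enough time passes, the first run becomes due.
fixed_start = datetime(2021, 1, 1, 0, 0)
print(first_period_closed(fixed_start, HOURLY, datetime(2021, 1, 1, 1, 0)))  # True

# Dynamic start_date: each parse recomputes start_date as the parse time,
# so the first period has always only just begun.
for minute in range(0, 180, 30):
    parse_time = datetime(2021, 1, 1) + timedelta(minutes=minute)
    dynamic_start = parse_time  # stands in for datetime.now() at parse time
    assert not first_period_closed(dynamic_start, HOURLY, parse_time)
print("the first hourly period never closes when start_date tracks now()")
```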

Previously, we also recommended using a rounded start_date in relation to your schedule_interval. This meant that an @hourly job would start at 00:00 minutes:seconds, a @daily job at midnight, and a @monthly job on the first of the month. This is no longer required: Airflow now automatically aligns the start_date and the schedule_interval, using the start_date as the moment to start looking.
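A rough sketch of the alignment idea: treat start_date as the point from which to start looking, and take the first schedule boundary at or after it. first_aligned and its three supported presets are illustrative assumptions; Airflow performs this alignment internally with its own cron handling, not with this code.

```python
from datetime import datetime, timedelta


def first_aligned(start_date, preset):
    """Hypothetical helper: first @hourly/@daily/@monthly boundary at or after start_date."""
    if preset == "@hourly":
        floor = start_date.replace(minute=0, second=0, microsecond=0)
        next_boundary = floor + timedelta(hours=1)
    elif preset == "@daily":
        floor = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
        next_boundary = floor + timedelta(days=1)
    elif preset == "@monthly":
        floor = start_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        # Day 28 plus 4 days always lands in the next month; then snap to day 1.
        next_boundary = (floor.replace(day=28) + timedelta(days=4)).replace(day=1)
    else:
        raise ValueError(f"preset not handled in this sketch: {preset}")
    # Already on a boundary: keep it; otherwise move up to the next one.
    return floor if floor == start_date else next_boundary


unrounded = datetime(2021, 3, 15, 10, 17)
print(first_aligned(unrounded, "@hourly"))   # 2021-03-15 11:00:00
print(first_aligned(unrounded, "@daily"))    # 2021-03-16 00:00:00
print(first_aligned(unrounded, "@monthly"))  # 2021-04-01 00:00:00
```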

You can use any sensor, or a TimeDeltaSensor, to delay the execution of tasks within the schedule interval. While schedule_interval does allow specifying a datetime.timedelta object, we recommend using the presets (such as @hourly or @daily) or cron expressions instead, as they enforce the idea of rounded schedules.
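The drift that a timedelta schedule can introduce is easy to see by listing the first few schedule points from an unrounded start. The PRESETS table lists the cron equivalents of some of Airflow's schedule presets; timedelta_points is a hypothetical helper. A timedelta schedule keeps whatever offset the start had, whereas @hourly ("0 * * * *") always fires on the hour.

```python
from datetime import datetime, timedelta

# Cron equivalents of some of Airflow's schedule presets.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def timedelta_points(start, interval, count):
    """Schedule points for a timedelta interval: each keeps the start's offset."""
    return [start + i * interval for i in range(count)]


start = datetime(2021, 1, 1, 10, 17)
print([t.strftime("%H:%M") for t in timedelta_points(start, timedelta(hours=1), 3)])
# ['10:17', '11:17', '12:17']
print(PRESETS["@hourly"])  # 0 * * * *
```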

When using depends_on_past=True, pay special attention to start_date: the past dependency is enforced for every schedule except the one at the start_date specified for the task. It's also important to keep an eye on DagRun activity status over time when introducing new tasks with depends_on_past=True, unless you plan to run a backfill for the new task(s).

Note also that, in the context of the backfill CLI command, the tasks' start_date is overridden by the backfill command's start date. This allows a backfill on tasks that have depends_on_past=True to actually start; otherwise, the backfill would never start.
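A toy model of the two rules above: with depends_on_past=True, a task instance may run only if the same task succeeded on the previous schedule, except at the first schedule, which has no predecessor to wait for. A backfill start date replaces the task's start_date as that first schedule, which is why the backfill can begin. can_run and its arguments are hypothetical, not Airflow's API.

```python
from datetime import datetime, timedelta


def can_run(logical_date, task_start_date, interval, history, backfill_start=None):
    """Toy depends_on_past=True check.

    history maps logical dates to this task's final state ("success", "failed", ...).
    backfill_start, if given, overrides the task's start_date, as a backfill does.
    """
    effective_start = task_start_date if backfill_start is None else backfill_start
    if logical_date == effective_start:
        return True  # first schedule: the past dependency is not enforced
    previous = logical_date - interval
    return history.get(previous) == "success"


daily = timedelta(days=1)
task_start = datetime(2021, 1, 1)
history = {}  # the task has never run

# Regular scheduling: the Jan 5 run waits for Jan 4, which never ran.
print(can_run(datetime(2021, 1, 5), task_start, daily, history))  # False
# Backfill from Jan 5: that date becomes the first schedule, so it can start.
print(can_run(datetime(2021, 1, 5), task_start, daily, history,
              backfill_start=datetime(2021, 1, 5)))  # True
```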

How can I create DAGs dynamically?

Airflow looks in your DAGS_FOLDER for modules that contain DAG objects in their global namespace and adds the objects it finds to the DagBag. Knowing this, all we need is a way to dynamically assign variables in the global namespace. In Python this is easy with the built-in globals() function, which returns the module's namespace as a regular dictionary.

from airflow import DAG


def create_dag(dag_id):
    """A function returning a DAG object."""

    return DAG(dag_id)

for i in range(10):
    dag_id = f'foo_{i}'
    globals()[dag_id] = DAG(dag_id)

    # or better, call a function that returns a DAG object!
    other_dag_id = f'bar_{i}'
    globals()[other_dag_id] = create_dag(other_dag_id)
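To see why assigning into globals() is enough, here is a stdlib-only imitation of the discovery step: execute a module's source and keep every object of a given class found in its global namespace. FakeDAG stands in for airflow.DAG so the sketch runs without Airflow installed, and collect_dags is a hypothetical helper; Airflow's real DagBag does considerably more than this.

```python
class FakeDAG:
    """Stand-in for airflow.DAG, so the sketch runs without Airflow installed."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


# A DAG file that registers its DAGs through globals(), as in the example above.
MODULE_SOURCE = '''
for i in range(3):
    dag_id = f"foo_{i}"
    globals()[dag_id] = DAG(dag_id)
'''


def collect_dags(source, dag_class):
    """Run `source` as a module and return {name: obj} for objects of dag_class."""
    namespace = {"DAG": dag_class}  # the module sees FakeDAG under the name DAG
    exec(source, namespace)  # inside the module, globals() is this dict
    return {name: obj for name, obj in namespace.items() if isinstance(obj, dag_class)}


print(sorted(collect_dags(MODULE_SOURCE, FakeDAG)))  # ['foo_0', 'foo_1', 'foo_2']
```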

What are all the airflow run commands in my process list?

There are several layers of airflow run commands, meaning that the command can call itself.

How can my airflow dag run faster?

There are three variables you can control to improve Airflow DAG performance:

How can we reduce the airflow UI page load time?

If your DAG's pages take a long time to load, you can reduce the value of the default_dag_run_display_number setting in airflow.cfg. This setting controls the number of DAG runs shown in the UI; its default value is 25.
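The setting can be changed in airflow.cfg or overridden with an environment variable following Airflow's AIRFLOW__{SECTION}__{KEY} convention. The sketch below imitates that lookup order with configparser; it assumes the option lives in the [webserver] section, and effective_display_number is a hypothetical helper rather than Airflow's configuration API.

```python
import configparser
import os

# Assumed airflow.cfg fragment lowering the value from the default of 25.
CFG_TEXT = """
[webserver]
default_dag_run_display_number = 10
"""

ENV_VAR = "AIRFLOW__WEBSERVER__DEFAULT_DAG_RUN_DISPLAY_NUMBER"


def effective_display_number(cfg_text, environ=os.environ, default=25):
    """Environment variable first, then the airflow.cfg text, then the default."""
    if ENV_VAR in environ:
        return int(environ[ENV_VAR])
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("webserver", "default_dag_run_display_number", fallback=default)


print(effective_display_number(CFG_TEXT, environ={}))              # 10
print(effective_display_number(CFG_TEXT, environ={ENV_VAR: "5"}))  # 5
print(effective_display_number("[webserver]\n", environ={}))       # 25
```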

How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?

This means that explicit_defaults_for_timestamp is disabled on your MySQL server, and you need to enable it by:

  1. Set explicit_defaults_for_timestamp = 1 under the [mysqld] section in your my.cnf file.

  2. Restart the MySQL server.
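Before restarting, it can help to confirm that the option is actually set in the file. The helper below is a hypothetical sketch that reads my.cnf text with configparser; it assumes a simple file without !include or !includedir directives, treats dashes and underscores in option names as equivalent (as MySQL does), and only recognizes explicit values such as 1 or ON. After the restart, SHOW GLOBAL VARIABLES LIKE 'explicit_defaults_for_timestamp'; in the MySQL client shows the live value.

```python
import configparser

TRUE_VALUES = {"1", "on", "true"}


def explicit_defaults_enabled(cnf_text):
    """True if [mysqld] sets explicit_defaults_for_timestamp to an explicit "on" value.

    A bare option name without a value is reported as not set in this sketch.
    """
    # allow_no_value: my.cnf often has bare flags such as skip-external-locking.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    enabled = False
    for key, value in parser.items("mysqld"):
        if key.replace("-", "_") == "explicit_defaults_for_timestamp":
            # A later matching line overrides an earlier one.
            enabled = value is not None and value.strip().lower() in TRUE_VALUES
    return enabled


MY_CNF = """
[client]
port = 3306

[mysqld]
skip-external-locking
explicit_defaults_for_timestamp = 1
"""
print(explicit_defaults_enabled(MY_CNF))                     # True
print(explicit_defaults_enabled("[mysqld]\nport = 3306\n"))  # False
```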

How to reduce airflow dag scheduling latency in production?

© RemiZOffAlex