Airflow High Availability cluster - 3


Part 3

Implementing the Airflow HA solution

Well, let’s continue building the HA cluster.

This chapter of the tutorial covers installing and configuring the application that synchronizes DAGs between nodes, and then installing Airflow itself.

1.8 csync2

Install and configure csync2:

                # apt-get -y install csync2
            
                root@node1:~# cat /etc/csync2.cfg
# Please read the documentation:
# http://oss.linbit.com/csync2/paper.pdf
nossl * *;
tempdir /tmp/;
lock-timeout 30;
  group DAGS
  {
     host node1;
     host node2;
     host node3;
     key /root/airflow/csync2.key_airflow_dags;
     include /root/airflow/dags;
     auto younger;
  }
            

Pre-shared keys can be generated using the following:

                # csync2 -k filename
            

Then specify the file containing your key in the line

                key /root/airflow/csync2.key_airflow_dags
            

This key is used to authenticate the nodes to each other.

The directory to be synchronized is specified by

                include /root/airflow/dags;
            

So, add a job to crontab:

                # crontab -e
* * * * * /usr/sbin/csync2 -A -x
            

Then you can create a file or directory and it will appear on all nodes (i.e., be synchronized). Note that a single key is used across all nodes.
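As a quick check of the setup (host names as in the config above; `-x` pushes dirty files immediately instead of waiting for the cron job):

```shell
# On node1: create a test file in the synchronized directory
touch /root/airflow/dags/sync_test.py

# Push the change right away instead of waiting for cron
csync2 -x -v

# On node2 or node3: the file should now be present
ls -l /root/airflow/dags/sync_test.py
```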

To start the csync2 daemon automatically with the necessary parameters, modify the systemd unit:

                root@node1:~# cat /lib/systemd/system/csync2.service

[Unit]
Description=Cluster file synchronization daemon
Documentation=man:csync2(1)
After=network.target

[Service]
ExecStart=/usr/sbin/csync2 -A -ii -l
StandardError=syslog

[Install]
WantedBy=multi-user.target
            

Start csync2:

                # systemctl start csync2
            
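It is also worth enabling the unit at boot and confirming the daemon is up (csync2 listens on TCP port 30865 by default):

```shell
systemctl enable csync2             # start automatically at boot
systemctl status csync2 --no-pager  # confirm the daemon is running
ss -tlnp | grep 30865               # csync2's default TCP port
```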

1.9 Airflow

And finally, the moment has come: all preparations are done and we can install Airflow itself:

                pip3 install 'apache-airflow[celery]'
            

Initialize the database:

                # airflow db init
            
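Before going further, it may be worth confirming that Airflow can actually reach the metadata database:

```shell
# Exits non-zero if the connection defined by sql_alchemy_conn fails
airflow db check
```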

Create systemd units for the Airflow components:

                # cat /lib/systemd/system/airflow-worker.service

[Unit]
Description=Airflow celery worker daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow celery worker
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target
            
                # cat /lib/systemd/system/airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
            
                # cat /lib/systemd/system/airflow-webserver.service

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service rabbitmq-server.service
Wants=postgresql.service rabbitmq-server.service

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
            

Don’t forget to enable them.
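Enabling them means, roughly:

```shell
systemctl daemon-reload   # pick up the new unit files
systemctl enable airflow-scheduler airflow-webserver airflow-worker
```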

Create the config file in the default directory with the following command:

                # mkdir ~/airflow
# airflow config list > ~/airflow/airflow.cfg
            

Find and modify the following settings:

in the [core] section:

                executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@10.5.0.10/airflow
            

in the [celery] section:

                broker_url = amqp://airflow:cafyilevyeHa@localhost:5672;amqp://airflow:cafyilevyeHa@node2:5672;amqp://airflow:cafyilevyeHa@node3:5672
result_backend = db+postgresql://airflow:airflow@10.5.0.10:5432/airflow
            

As for broker_url, I specified all three nodes. Strictly speaking this is redundant: Celery is expected to connect to localhost, and if the whole node is down there is nothing left on it to reconnect from. Still, I listed all three nodes for reliability; you can consider specifying only localhost.

Create a user:

                # airflow users create --username admin --firstname Denis --lastname Matveev --role Admin --email denis@example.com
            
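The command prompts for a password if none is supplied. To confirm the account was created:

```shell
# List all accounts known to Airflow; the new admin should appear here
airflow users list
```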

Restart the Airflow services:

                # systemctl restart airflow-scheduler airflow-webserver airflow-worker
            

Check whether any errors occur.

To get access to the web interface, I recommend using an SSH tunnel.
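For example (the user name is an assumption; 8080 is Airflow's default webserver port):

```shell
# Forward local port 8080 to the webserver running on node1
ssh -L 8080:localhost:8080 root@node1
# then open http://localhost:8080 in a local browser
```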

At this point the Airflow cluster is up and running. Double-check all settings to make sure you didn’t make a mistake.

Troubleshooting

  1. If you have issues with Patroni, etcd, or PostgreSQL

Try the following:

                # systemctl stop etcd patroni
# rm /var/lib/etcd/default/* -rf
            

Sometimes you may also need to remove all PostgreSQL data:

                # rm -rf /var/lib/postgresql/13/main/
            

But be careful! This should be done only in a test environment; production servers are not for experiments, and you should understand what you are doing.

And restart the services:

                # systemctl start etcd patroni
            

Issues with csync2 can often be resolved by removing its database files:

                # systemctl stop csync2
# rm /var/lib/csync2/*.db3
# systemctl start csync2
            

Then check the status of the processes with:

                # systemctl status <process>
            

Last but not least, I’d like to say, or even strongly recommend, that you consider implementing monitoring for this system. Which monitoring system to use is up to you. I use Zabbix with templates for PostgreSQL, templates for tracking whether processes are running, and so on. Any comments are welcome!
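As a minimal sketch of such a process check (service names as used throughout this series), a loop like this could back a monitoring item:

```shell
# Report any cluster service that is not currently active
for svc in patroni etcd rabbitmq-server csync2 \
           airflow-scheduler airflow-webserver airflow-worker; do
  systemctl is-active --quiet "$svc" || echo "ALERT: $svc is not running"
done
```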



Denis Matveev

sysadmin/devops, Ignitia AB

@denismatveev
I am an IT professional with 15 years of experience. I have a really strong background in system administration and programming.