Setting up Airflow on AWS Linux was not straightforward because of outdated default packages. For example, I had trouble using setuid in the Upstart config, because the AWS Linux AMI ships with Upstart 0.6.5.
AMI Version: amzn-ami-hvm-2016.09.1.20161221-x86_64-gp2 (ami-c51e3eb6)
Install build dependencies
sudo yum install gcc-c++ python-devel python-setuptools
Upgrade pip
sudo pip install --upgrade pip
Install airflow using pip
sudo /usr/local/bin/pip install "airflow[s3,hive,python]"
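To confirm the installation worked, ask Airflow for its version (with this pip the entry point should land in /usr/local/bin):
/usr/local/bin/airflow version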
Create User and Group
sudo groupadd airflow
sudo useradd airflow -g airflow
sudo passwd -d airflow
This creates a passwordless user airflow in group airflow.
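A quick sanity check that the user and group were created:
id airflow
# uid=...(airflow) gid=...(airflow) groups=...(airflow)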
Initialize Airflow
su airflow
cd ~
airflow initdb
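With no AIRFLOW_HOME set, initdb defaults to ~/airflow and creates the config file and a SQLite metadata database there; a quick look (assuming nothing was customized):
ls ~/airflow
# should include airflow.cfg and airflow.db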
Test run
su airflow
cd ~
airflow webserver
You should be able to view the Airflow UI at port 8080.
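From the instance itself, a quick curl confirms the webserver is answering; to reach the UI from a browser, the EC2 security group also has to allow inbound traffic on port 8080 (or you can tunnel over SSH):
curl -I http://localhost:8080/
# any HTTP response (often a redirect to the admin UI) means the webserver is up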
Upstart Config for Airflow Webserver
Now let’s use Upstart to manage the Airflow processes and respawn them if they die. Unfortunately, this Amazon Linux AMI comes with Upstart 0.6.5, so the setuid and setgid stanzas do not work; we have to drop privileges with su in the exec line instead.
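For comparison, on a newer Upstart (the setuid/setgid stanzas arrived around version 1.4) the job could simply declare the user, roughly like the sketch below; since 0.6.5 does not understand these stanzas, the configs that follow use the su wrapper instead.
# NOT valid on Upstart 0.6.5 -- shown only for contrast
setuid airflow
setgid airflow
exec /usr/local/bin/airflow webserver >> /var/log/airflow-webserver.log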
airflow-webserver.conf
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
description "Airflow webserver daemon"
start on runlevel [2345]
stop on runlevel [016]
respawn
respawn limit 5 30
env AIRFLOW_CONFIG=/home/airflow/airflow/airflow.cfg
env AIRFLOW_HOME=/home/airflow/airflow/
export AIRFLOW_CONFIG
export AIRFLOW_HOME
pre-start script
echo "starting airflow-webserver..." >> /var/log/airflow-webserver.log
echo $AIRFLOW_HOME >> /var/log/airflow-webserver.log
echo $AIRFLOW_CONFIG >> /var/log/airflow-webserver.log
end script
# exec su -s /bin/sh -c 'exec "$0" "$@"' username -- /path/to/command [parameters...]
exec su -s /bin/sh -c 'exec "$0" "$@"' airflow -- /usr/local/bin/airflow webserver >> /var/log/airflow-webserver.log
pre-stop script
echo "stopping airflow-webserver" >> /var/log/airflow-webserver.log
end script
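Note that the redirection in the exec line only captures stdout; if you also want stderr (for example Python tracebacks) in the same log, an untested tweak is to append 2>&1:
exec su -s /bin/sh -c 'exec "$0" "$@"' airflow -- /usr/local/bin/airflow webserver >> /var/log/airflow-webserver.log 2>&1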
Once this file is placed under /etc/init/, you should see airflow-webserver in the output of initctl list.
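A minimal install sketch (Upstart normally picks up new job files in /etc/init/ on its own; the explicit reload-configuration should be harmless if it turns out to be unnecessary):
sudo cp airflow-webserver.conf /etc/init/
sudo initctl reload-configuration
initctl list | grep airflow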
Start Airflow Webserver with upstart
sudo initctl start airflow-webserver
The process ID is written to /home/airflow/airflow/airflow-webserver.pid.
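The usual Upstart controls work as well, for example:
sudo initctl status airflow-webserver
sudo initctl restart airflow-webserver
sudo initctl stop airflow-webserver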
Upstart Config for Airflow Scheduler
airflow-scheduler.conf
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
description "Airflow scheduler daemon"
start on started networking
stop on (deconfiguring-networking or runlevel [016])
respawn
respawn limit 5 10
env AIRFLOW_CONFIG=/home/airflow/airflow/airflow.cfg
env AIRFLOW_HOME=/home/airflow/airflow/
export AIRFLOW_CONFIG
export AIRFLOW_HOME
# required setting, 0 sets it to unlimited. Scheduler will restart after every X runs
env SCHEDULER_RUNS=5
export SCHEDULER_RUNS
# exec su -s /bin/sh -c 'exec "$0" "$@"' username -- /path/to/command [parameters...]
exec su -s /bin/sh -c 'exec "$0" "$@"' airflow -- /usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS} >> /var/log/airflow-scheduler.log
Start Airflow Scheduler with upstart
sudo initctl start airflow-scheduler
This should keep Airflow Scheduler running in the background and respawn it in case of failures.
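To verify, check the job status and tail its log; you should see the scheduler exit after SCHEDULER_RUNS runs and get respawned by Upstart:
sudo initctl status airflow-scheduler
tail -f /var/log/airflow-scheduler.log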