Ansible's default execution model: every host runs every task, in parallel batches sized by the forks setting. That's fine for stateless installs but a disaster for stateful systems. Restarting all 6 MySQL replicas at once = 6 minutes of read-traffic downtime. Restarting them one at a time, with replica-lag checks between each = zero downtime.
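That batch width comes from the forks setting, which defaults to a conservative 5. A minimal ansible.cfg sketch for raising it, shown here only for context:

# ansible.cfg
[defaults]
forks = 20   # how many hosts Ansible works on simultaneously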
The serial keyword overrides Ansible's normal "all hosts at once" behaviour.
It accepts a single number, a percentage, or a list for tapered rollouts:
---
# Form 1: one host at a time (safest, slowest)
- hosts: mysql_replica
  serial: 1
  tasks:
    - name: Apply config and restart
      ansible.builtin.template:
        src: my.cnf.j2
        dest: /etc/my.cnf
      notify: restart mysql
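  # The handler the notify above refers to. With serial: 1, handlers
  # flush at the end of each batch, so each replica restarts fully
  # before the next one is touched.
  handlers:
    - name: restart mysql
      ansible.builtin.systemd:
        name: mysqld
        state: restarted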
# Form 2: percentage of fleet at a time (scales with size)
- hosts: webservers
  serial: "25%"   # 25% of fleet per batch
  tasks:
    - name: Deploy new release
      ansible.builtin.include_role:
        name: deploy_app

# Form 3: tapered rollout — 1 host first, then 3, then everyone
- hosts: api_servers
  serial:
    - 1        # canary: just one host
    - 3        # then three
    - "100%"   # then the rest (if the list runs out before all hosts
               # are processed, the last entry's size is reused)
  tasks:
    - name: Apply patch
      ansible.builtin.dnf:
        name: "*"
        state: latest
By default, a host that fails is simply dropped from the play and the remaining hosts continue into the next batch. For risky changes, set a threshold: if more than N% of a batch fails, the play aborts before the next batch starts:
---
- hosts: mysql_replica
  serial: 2
  max_fail_percentage: 25   # if >25% of a batch fails, stop
  tasks:
    - name: Restart mysql
      ansible.builtin.systemd:
        name: mysqld
        state: restarted

    - name: Wait for replication to catch up
      community.mysql.mysql_query:
        query: "SHOW SLAVE STATUS"
        login_user: root
        login_password: "{{ vault_mysql_root_password }}"
      register: status
      until: (status.query_result[0][0].Seconds_Behind_Master | int) < 5
      retries: 30   # poll every 10 s for up to 5 minutes
      delay: 10

# With serial: 2 + max_fail_percentage: 25:
#   - If 1 of 2 hosts in a batch fails, that's 50% > 25% → play aborts
#   - If 0 of 2 fail, continue to next batch
max_fail_percentage is the circuit breaker that stops a half-broken rollout from continuing into the next batch and breaking that one too.

Putting the pieces together: a zero-downtime rolling restart that drains each replica from the ProxySQL reader pool, restarts it, waits for replication to catch up, and only then re-adds it:
---
- name: Rolling MySQL restart, one replica at a time
  hosts: mysql_replica
  serial: 1
  max_fail_percentage: 0   # zero tolerance — abort on any failure
  become: true
  tasks:
    - name: Drain this host from ProxySQL
      community.proxysql.proxysql_mysql_servers:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        hostgroup_id: 20   # reader hostgroup
        hostname: "{{ ansible_default_ipv4.address }}"
        status: OFFLINE_SOFT
        state: present
      delegate_to: localhost

    - name: Wait for active queries to drain
      community.proxysql.proxysql_query:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        query: "SELECT ConnUsed FROM stats_mysql_connection_pool WHERE srv_host='{{ ansible_default_ipv4.address }}'"
      register: pool
      until: pool.query_result[0].ConnUsed | int == 0
      retries: 30
      delay: 5
      delegate_to: localhost

    - name: Restart MySQL
      ansible.builtin.systemd:
        name: mysqld
        state: restarted

    - name: Wait for MySQL to be reachable
      ansible.builtin.wait_for:
        port: 3306
        timeout: 60

    - name: Wait for replication to catch up
      community.mysql.mysql_query:
        query: "SHOW SLAVE STATUS"
        login_user: root
        login_password: "{{ vault_mysql_root_password }}"
      register: status
      until: (status.query_result[0][0].Seconds_Behind_Master | int) < 5
      retries: 30
      delay: 10

    - name: Re-add to ProxySQL
      community.proxysql.proxysql_mysql_servers:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        hostgroup_id: 20
        hostname: "{{ ansible_default_ipv4.address }}"
        status: ONLINE
        state: present
      delegate_to: localhost
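One gap worth closing: with serial: 1 and max_fail_percentage: 0, a failure in the restart or the lag check aborts the play while the host is still OFFLINE_SOFT in ProxySQL. A sketch of one way to surface that, wrapping the risky steps in a block/rescue (the inner tasks are the same ones shown above):

    - name: Restart and verify, or report the drained host
      block:
        - name: Restart MySQL
          ansible.builtin.systemd:
            name: mysqld
            state: restarted
        # ... wait_for and replication-lag tasks from above ...
      rescue:
        - name: Fail loudly, noting the host is still drained
          ansible.builtin.fail:
            msg: >-
              {{ inventory_hostname }} failed mid-restart and is still
              OFFLINE_SOFT in ProxySQL; investigate before re-adding it.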
For tasks that should fire exactly once across the whole inventory (registering a
shard, running a database migration, sending a notification), use run_once:
---
- hosts: mongo
  tasks:
    - name: Initialise the replica set (only on one host)
      community.mongodb.mongodb_replicaset:
        replica_set: rs0
        members:
          - "{{ groups['mongo'][0] }}:27017"
          - "{{ groups['mongo'][1] }}:27017"
          - "{{ groups['mongo'][2] }}:27017"
      run_once: true   # ← critical: only one host runs this
      delegate_to: "{{ groups['mongo'][0] }}"

    - name: Apply schema migration (once, delegated to the first webserver)
      ansible.builtin.command: |
        liquibase --changeLogFile=db/changelog.xml update
      run_once: true
      delegate_to: "{{ groups['webservers'][0] }}"
delegate_to redirects a single task to run on a different host while keeping the inventory host's variables and facts in scope. Common uses: cloud API calls, central log writes, load-balancer manipulation:
---
- hosts: webservers
  serial: 1
  tasks:
    # 1. Call AWS API from localhost (control node), not from each web host
    - name: Deregister from the ELB target group (run from control node)
      community.aws.elb_target:
        target_group_arn: "{{ tg_arn }}"
        target_id: "{{ ansible_ec2_instance_id }}"
        state: absent   # absent = deregister the target
      delegate_to: localhost

    # 2. Push notification to Slack (run from one notifier host)
    - name: Notify deploy started
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "Deploying {{ inventory_hostname }}"
      delegate_to: notifier.example.com

    # 3. Restart the actual host
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted
      # no delegate_to → runs on the inventory host

    # 4. Re-register with the ELB
    - name: Re-add to ELB
      community.aws.elb_target:
        target_group_arn: "{{ tg_arn }}"
        target_id: "{{ ansible_ec2_instance_id }}"
        state: present
      delegate_to: localhost
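A detail that trips people up: ansible_ec2_instance_id is not a default fact; it comes from the EC2 metadata facts module, so the play needs a task like this before the first ELB call:

    - name: Gather EC2 metadata (provides ansible_ec2_instance_id)
      amazon.aws.ec2_metadata_facts: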
By default, when one host fails, the others continue. With any_errors_fatal,
any single host's failure aborts the play for everyone:
---
- hosts: galera_cluster
  any_errors_fatal: true   # one node failing = abort entire play
  tasks:
    - name: Apply Galera config update
      ansible.builtin.template:
        src: wsrep.cnf.j2
        dest: /etc/my.cnf.d/wsrep.cnf
      notify: restart galera

# Use case: cluster ops where partial application breaks the cluster.
# If one node's config push fails, you don't want to restart only the
# other 2 — that would split-brain the cluster.
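  # The handler the notify above refers to. The unit name here is an
  # assumption (Galera commonly rides on MariaDB); adjust to yours.
  handlers:
    - name: restart galera
      ansible.builtin.systemd:
        name: mariadb
        state: restarted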
Put together, these primitives give you a canary deployment: one host first with zero failure tolerance, a soak period, then the rest in 25% batches:

---
- name: Canary phase
  hosts: webservers[0]   # pattern: just the first host in the group
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Deploy new release on canary
      ansible.builtin.include_role:
        name: deploy_app

    - name: Wait 10 minutes to soak
      ansible.builtin.pause:
        minutes: 10

- name: Roll out to everyone else
  hosts: webservers[1:]   # pattern: every host except the canary
  serial: "25%"
  max_fail_percentage: 25
  tasks:
    - name: Deploy new release
      ansible.builtin.include_role:
        name: deploy_app
You've now covered the main tools Ansible offers beyond the basics: custom Python plugins, performance tuning for a 5–10× speedup, multi-environment discipline with per-environment vault passwords, and rolling-update orchestration with circuit breakers and canary phases.

Next up is the final section, Section 8, Operations and CI/CD: Ansible Tower / AWX, integrating Ansible into GitLab CI / GitHub Actions, and testing playbooks with Molecule and ansible-lint.