Ansible OEL 8 DevOps · OEL 8 · Advanced

Ansible · Rolling Updates & Serial Execution

The keywords that turn a 50-host playbook from 'all hosts in parallel' into 'one host at a time, with health checks between each' — serial, max_fail_percentage, run_once, delegate_to, and the rolling-restart pattern for a database cluster.

The default Ansible execution: every host runs every task, in roughly parallel batches of forks. That's fine for stateless installs but a disaster for stateful systems. Restarting all 6 MySQL replicas at once = 6 minutes of read-traffic downtime. Restarting them one at a time, with replica-lag checks between each = zero downtime.
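
Within each batch, parallelism is still capped by the forks setting (default 5): serial decides how many hosts are in a batch, forks decides how many of those run a task simultaneously. A minimal ansible.cfg sketch:

INI — ansible.cfg
[defaults]
forks = 20        # run up to 20 hosts of the current batch in parallel

With serial: "25%" on a 100-host fleet, a batch is 25 hosts; forks = 20 means 20 of them execute at once and the remaining 5 wait for a free slot.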

Rolling update · three serial strategies
serial: 1 · one host at a time · safest, slowest (db1 done; db2–db6 waiting)
serial: 2 · two hosts per batch · balanced (db1–db2 done; db3–db6 waiting)
serial: "25%" · scales with fleet size (25% of 6 = 2 hosts/batch: db1–db2 batch 1, db3–db4 batch 2, db5–db6 batch 3)
max_fail_percentage: 25 · circuit-breaker (if more than 25% of a batch fails, e.g. 2 of 6 hosts, the play aborts, preventing cascading failure)

The serial keyword overrides Ansible's normal "all hosts at once" behaviour. It accepts a single number, a percentage, or a list for tapered rollouts:

YAML — serial in three forms
---
# Form 1: one host at a time (safest, slowest)
- hosts: mysql_replica
  serial: 1
  tasks:
    - name: Apply config and restart
      ansible.builtin.template:
        src: my.cnf.j2
        dest: /etc/my.cnf
      notify: restart mysql

# Form 2: percentage of fleet at a time (scales with size)
- hosts: webservers
  serial: "25%"      # 25% of fleet per batch
  tasks:
    - name: Deploy new release
      ansible.builtin.include_role:
        name: deploy_app

# Form 3: tapered rollout — 1 host first, then 3, then everyone
- hosts: api_servers
  serial:
    - 1              # canary: just one host
    - 3              # then three
    - "100%"         # then the rest
  tasks:
    - name: Apply patch
      ansible.builtin.dnf:
        name: "*"
        state: latest
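
The related play-level order keyword decides which hosts land in the first, riskiest batch; without it, inventory order applies, so the same host is always the de-facto canary. A sketch (order accepts inventory, reverse_inventory, sorted, reverse_sorted, and shuffle):

YAML — order: shuffle with serial
- hosts: webservers
  serial: 1
  order: shuffle     # randomise host order each run
  tasks:
    - name: Apply patch
      ansible.builtin.dnf:
        name: "*"
        state: latest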

By default, Ansible continues to the next batch even if the current one had failures. For risky changes, set a threshold — if more than N% of the batch fails, the play aborts before the next batch:

YAML — max_fail_percentage
---
- hosts: mysql_replica
  serial: 2
  max_fail_percentage: 25     # if >25% of a batch fails, stop
  tasks:
    - name: Restart mysql
      ansible.builtin.systemd:
        name: mysqld
        state: restarted

    - name: Wait for replication to catch up
      community.mysql.mysql_query:
        query: "SHOW SLAVE STATUS"
        login_user: root
        login_password: "{{ vault_mysql_root_password }}"
      register: status
      until: (status.query_result[0][0].Seconds_Behind_Master | int) < 5
      retries: 30
      delay: 10

# With serial: 2 + max_fail_percentage: 25:
# - If 1 host of 2 in a batch fails (50%), play aborts
# - If 0 of 2 fail, continue to next batch
💡 Tip: The default failure behaviour is fail-and-continue per host: a failed host drops out of the play while the rest carry on. max_fail_percentage is the circuit-breaker that stops a half-broken rollout from continuing into the next batch and breaking that one too (any_errors_fatal, covered below, is the stricter all-or-nothing variant).
YAML — rolling-restart-mysql.yml
---
- name: Rolling MySQL restart, one replica at a time
  hosts: mysql_replica
  serial: 1
  max_fail_percentage: 0      # zero tolerance — abort on any failure
  become: true
  tasks:
    - name: Drain this host from ProxySQL
      community.proxysql.proxysql_mysql_servers:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        hostgroup_id: 20      # reader hostgroup
        hostname: "{{ ansible_default_ipv4.address }}"
        status: OFFLINE_SOFT
        state: present
      delegate_to: localhost

    - name: Wait for active queries to drain
      community.proxysql.proxysql_query:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        query: "SELECT ConnUsed FROM stats_mysql_connection_pool WHERE srv_host='{{ ansible_default_ipv4.address }}'"
      register: pool
      until: pool.query_result[0].ConnUsed | int == 0
      retries: 30
      delay: 5
      delegate_to: localhost

    - name: Restart MySQL
      ansible.builtin.systemd:
        name: mysqld
        state: restarted

    - name: Wait for MySQL to be reachable
      ansible.builtin.wait_for:
        port: 3306
        timeout: 60

    - name: Wait for replication to catch up
      community.mysql.mysql_query:
        query: "SHOW SLAVE STATUS"
        login_user: root
        login_password: "{{ vault_mysql_root_password }}"
      register: status
      until: (status.query_result[0][0].Seconds_Behind_Master | int) < 5
      retries: 30
      delay: 10

    - name: Re-add to ProxySQL
      community.proxysql.proxysql_mysql_servers:
        login_host: "{{ proxysql_admin_host }}"
        login_user: admin
        login_password: "{{ vault_proxysql_admin_password }}"
        hostgroup_id: 20
        hostname: "{{ ansible_default_ipv4.address }}"
        status: ONLINE
        state: present
      delegate_to: localhost

For tasks that should fire exactly once across the whole inventory (registering a shard, running a database migration, sending a notification), use run_once:

YAML — run_once
---
- hosts: mongo
  tasks:
    - name: Initialise the replica set (only on one host)
      community.mongodb.mongodb_replicaset:
        replica_set: rs0
        members:
          - "{{ groups['mongo'][0] }}:27017"
          - "{{ groups['mongo'][1] }}:27017"
          - "{{ groups['mongo'][2] }}:27017"
      run_once: true              # ← critical: only one host runs this
      delegate_to: "{{ groups['mongo'][0] }}"

    - name: Apply schema migration (run once, delegated to the first webserver)
      ansible.builtin.command: liquibase --changeLogFile=db/changelog.xml update
      run_once: true
      delegate_to: "{{ groups['webservers'][0] }}"
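
One caveat when combining these keywords: under serial, run_once fires once per batch, not once per play, because Ansible re-evaluates it at the start of every batch. If a task must run exactly once during a rolling update, put it in its own non-serial play ahead of the serialised one:

YAML — exactly-once with serial (sketch)
---
# Play 1: no serial, so run_once really means once
- hosts: webservers
  tasks:
    - name: Apply schema migration exactly once
      ansible.builtin.command: liquibase --changeLogFile=db/changelog.xml update
      run_once: true

# Play 2: the serialised rollout
- hosts: webservers
  serial: "25%"
  tasks:
    - name: Deploy new release
      ansible.builtin.include_role:
        name: deploy_app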

delegate_to redirects a task to run on a specific host while keeping the play's host context. Common uses: cloud API calls, central log writes, load-balancer manipulations:

YAML — delegate_to patterns
---
- hosts: webservers
  serial: 1
  tasks:
    # 1. Call AWS API from localhost (control node), not from each web host
    - name: Update ELB target health (run from control node)
      community.aws.elb_target:
        target_group_arn: "{{ tg_arn }}"
        target_id: "{{ ansible_ec2_instance_id }}"
        deregister: true
      delegate_to: localhost

    # 2. Push notification to slack (run from one notifier host)
    - name: Notify deploy started
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "Deploying {{ inventory_hostname }}"
      delegate_to: notifier.example.com

    # 3. Restart the actual host
    - name: Restart application
      ansible.builtin.systemd:
        name: myapp
        state: restarted
      # no delegate_to → runs on the inventory host

    # 4. Re-register with ELB
    - name: Re-add to ELB
      community.aws.elb_target:
        target_group_arn: "{{ tg_arn }}"
        target_id: "{{ ansible_ec2_instance_id }}"
        deregister: false
      delegate_to: localhost
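
Note that delegation moves the task, not the variable context: inside a delegated task, inventory_hostname and gathered facts still describe the original play host, which is why the ProxySQL drain tasks earlier can template ansible_default_ipv4 while running on localhost. The related delegate_facts keyword controls where newly gathered facts are stored; a sketch (db1.example.com is a placeholder hostname):

YAML — delegate_facts
- hosts: webservers
  tasks:
    - name: Gather facts about the DB host from the web play
      ansible.builtin.setup:
      delegate_to: db1.example.com
      delegate_facts: true    # facts land in hostvars['db1.example.com'], not on the web host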

By default, when one host fails, the others continue. With any_errors_fatal, any single host's failure aborts the play for everyone:

YAML — any_errors_fatal
---
- hosts: galera_cluster
  any_errors_fatal: true       # one node failing = abort entire play
  tasks:
    - name: Apply Galera config update
      ansible.builtin.template:
        src: wsrep.cnf.j2
        dest: /etc/my.cnf.d/wsrep.cnf
      notify: restart galera

# Use case: cluster ops where partial application breaks the cluster.
# If one node's config push fails, you don't want to restart only the
# other 2 — that would split-brain the cluster.
YAML — canary deploy: one host, soak, then the rest
---
- name: Canary phase
  hosts: webservers[0]          # the first webserver acts as the canary
  max_fail_percentage: 0
  tasks:
    - name: Deploy new release on canary
      ansible.builtin.include_role:
        name: deploy_app

    - name: Wait 10 minutes to soak
      ansible.builtin.pause:
        minutes: 10

- name: Roll out to everyone else
  hosts: webservers[1:]         # every webserver except the canary
  serial: "25%"
  max_fail_percentage: 25
  tasks:
    - name: Deploy new release
      ansible.builtin.include_role:
        name: deploy_app
✅ Tip: End of Section 7. With these orchestration tools you can roll out database changes across 50-node clusters with zero downtime, run migrations exactly once, drain hosts from load balancers safely, and abort entire plays the moment something looks wrong.

You've now covered everything Ansible offers beyond the basics: custom Python plugins, performance tuning to 5–10× speedup, multi-environment discipline with per-env vault passwords, and rolling-update orchestration with circuit breakers and canary phases.

Next — Section 8: Operations and CI/CD (3 pages, 63–65), the final section. Ansible Tower / AWX, integrating Ansible into GitLab CI / GitHub Actions, and testing playbooks with Molecule + ansible-lint.