
Performance Tuning Ansible

When playbooks take 30 minutes against 50 hosts, you're leaving a 5× speedup on the table. The five tuning levers, the Mitogen plugin that's almost cheating, fact caching at scale, and the strategy: free pattern that stops one slow host from stalling the rest.

Ansible's defaults are conservative: 5 forks, no SSH multiplexing, no pipelining, gather facts on every run, sync hosts at every task. Against a small lab those defaults are fine. Against 50 production hosts running 200 tasks, you'll watch the playbook chew through 30 minutes of mostly idle SSH overhead.

5 levers that turn 30-min runs into 5-min runs:

1. pipelining — ~50% fewer SSH ops (pipelining = True)
2. ControlPersist — reuse SSH connections (ssh_args = ...)
3. forks = N — parallelism; default 5, 25–50 typical
4. fact_caching — skip the setup module (jsonfile / redis)
5. strategy: free — don't sync hosts; 2–3× speedup

Bonus: Mitogen — third-party plugin, 3–7× faster (strategy = mitogen_linear)

Expected impact (50-host fleet, 200-task playbook):

Baseline: ~28 minutes
+ pipelining + ControlPersist: ~14 min
+ forks=25: ~6 min
+ free + mitogen: ~3 min

Without pipelining, every task does THREE SSH operations: SFTP the module file, SSH to chmod it, SSH again to run it. With pipelining, the module is piped over the existing SSH connection's stdin and runs inline — one SSH op per task instead of three.

INI — ansible.cfg
[ssh_connection]
pipelining = True
⚠ Warning: Requires requiretty to be off in /etc/sudoers on managed nodes. OEL 8 ships with this disabled. RHEL 6 / CentOS 6 nodes need Defaults !requiretty added or pipelining will fail with a cryptic error.
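On those legacy nodes the fix is a single sudoers line. A sketch (the drop-in filename is illustrative; always edit sudoers through visudo so a syntax error can't lock you out):

```text
# /etc/sudoers.d/99-ansible-pipelining
# Let sudo run without a controlling tty so pipelined modules work
Defaults !requiretty
```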

OpenSSH can multiplex multiple sessions through one TCP connection. With ControlPersist, the connection stays open between tasks (configurable timeout). One TCP handshake per host for the whole playbook instead of one per task.

INI — ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=300s -o PreferredAuthentications=publickey
control_path_dir = ~/.ansible/cp

Combined with pipelining, this collapses the per-task overhead from three SSH operations, each opening its own fresh TCP connection, to a single operation over an already-established connection. For SSH-bound playbooks that's roughly a 5× speedup on its own.

Default is 5. That means at most 5 hosts get a task in parallel; the other 45 wait. Bump this to match your control node's resources:

INI — forks scaling
[defaults]
forks = 25     # default 5 → bump for any fleet >10 hosts
| Fleet size | Reasonable forks | Constraint |
|---|---|---|
| 5–20 hosts | 10–20 | Control node CPU |
| 20–100 hosts | 25–50 | SSH server MaxStartups |
| 100–500 hosts | 50–100 | Bastion bandwidth |
| 500+ hosts | Use AWX/Tower or split inventories | Single-node Ansible bottlenecks |
💡 Tip: Set ANSIBLE_FORKS as an env var instead of editing ansible.cfg if you want a per-run override. ANSIBLE_FORKS=50 ansible-playbook ...
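The effect of forks is easy to estimate: each task runs in ceil(hosts / forks) serial batches, so connection-bound wall-clock time scales with the batch count. A back-of-the-envelope illustration (numbers are hypothetical):

```shell
hosts=50
for forks in 5 25 50; do
  # each task must run ceil(hosts/forks) times before the play moves on
  batches=$(( (hosts + forks - 1) / forks ))
  echo "forks=$forks -> $batches batches per task"
done
# forks=5 -> 10 batches per task
# forks=25 -> 2 batches per task
# forks=50 -> 1 batches per task
```

Past one batch per task, raising forks buys nothing — that's why 500-host fleets hit other bottlenecks first.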

The setup module (fact gathering) runs at the start of every play. It's not free — it shells out to dozens of small commands per host. Cache facts to skip this on subsequent runs:

INI — fact caching
[defaults]
gathering = smart                          # use cache if fresh
fact_caching = jsonfile                    # or 'redis' / 'memcached'
fact_caching_connection = ./.fact_cache
fact_caching_timeout = 7200                # 2 hours

Skipping fact gathering on a 50-host playbook saves ~2 minutes. For large fleets, use fact_caching = redis with a real Redis server so all team members and CI share the same cache.
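For the Redis variant, the connection string is host:port:db. A sketch (the hostname is illustrative; assumes a reachable Redis instance with no auth):

```ini
[defaults]
gathering = smart
fact_caching = redis
fact_caching_connection = cache01.example.com:6379:0
fact_caching_timeout = 7200
```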

Default strategy is linear: every host finishes a task before any host moves to the next. This means a single slow host (network glitch, busy system) stalls the whole fleet.

With strategy: free, each host races through the play independently — host 1 can be on task 50 while host 2 is still on task 30. Wall-clock time drops to roughly one slowest host's full run, instead of paying the slowest host's delay at every single task.

YAML — strategy: free
---
- hosts: webservers
  strategy: free        # don't sync hosts
  tasks:
    - name: Apply patch
      ansible.builtin.dnf:
        name: "*"
        state: latest
    - name: Reboot
      ansible.builtin.reboot:
        reboot_timeout: 600
⚠ Warning: strategy: free doesn't work when later tasks depend on facts from earlier hosts in the same play (gathered at runtime). For most patching/upgrading work it's safe; for cluster orchestration it's not.
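An example of the unsafe pattern (a hypothetical play): under strategy: free nothing guarantees db1 has finished its fact gathering by the time a fast web host renders this template, so the hostvars lookup can race and come up undefined.

```yaml
- hosts: webservers
  strategy: free
  tasks:
    - name: Render config that reads another host's facts   # racy under 'free'
      ansible.builtin.template:
        src: app.conf.j2   # template references hostvars['db1'].ansible_default_ipv4
        dest: /etc/app/app.conf
```

Keep plays like this on the default linear strategy, or pre-gather facts for db1 in an earlier play.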

Mitogen is a third-party plugin that replaces Ansible's connection and module-execution layer with a far more efficient one. It starts the remote Python interpreter once per host and reuses it across tasks, instead of forking a fresh interpreter and tearing it down for every task.

BASH — install Mitogen
pip install mitogen

# Find install path
python3 -c "import mitogen, os; print(os.path.dirname(mitogen.__file__))"
# /home/user/.local/lib/python3.9/site-packages/mitogen
INI — enable Mitogen in ansible.cfg
[defaults]
strategy_plugins = /home/user/.local/lib/python3.9/site-packages/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

# Or for free-style + Mitogen
# strategy = mitogen_free
💡 Tip: Mitogen typically gives 3–7× additional speedup on top of pipelining + ControlPersist. The catch: it's not officially supported by Red Hat and may break on Ansible upgrades. Use it on internal tooling, not on regulated production.
INI — production ansible.cfg for fast, large-fleet playbooks
[defaults]
forks = 25
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ./.fact_cache
fact_caching_timeout = 7200
strategy = mitogen_linear            # or 'linear' if Mitogen unavailable
strategy_plugins = ~/.local/lib/python3.9/site-packages/ansible_mitogen/plugins/strategy

# nicer output
stdout_callback = yaml
callbacks_enabled = profile_tasks, timer

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=300s -o PreferredAuthentications=publickey
control_path_dir = ~/.ansible/cp
BASH — profile a playbook to find the slow tasks
# the profile_tasks / timer callbacks enabled in ansible.cfg print
# per-task timings at the end of the run
ansible-playbook site.yml

# At the end of the run:
PLAY RECAP ******************************************************
Sunday 03 May 2026  10:14:23 +0000 (0:00:00.012)
=================================================================
Install MySQL packages -------------------------------- 142.31s
mysql_upgrade ------------------------------------------- 89.45s
Configure my.cnf --------------------------------------- 24.73s
...

# The first 3 tasks above account for 70% of total runtime.
# Optimise those, ignore the rest.
✅ Tip: With pipelining + ControlPersist + forks=25 + fact caching, a 30-minute playbook routinely drops to 5–6 minutes. Add Mitogen and you're at 2–3 minutes. The control node CPU becomes the bottleneck before the network does.