Ajitabh Pandey's Soul & Syntax

Exploring systems, souls, and stories – one post at a time

Category: SysAdmin

  • Solving Ansible’s Flat Namespace Problem Efficiently

    In Ansible, the “Flat Namespace” problem is a frequent stumbling block for engineers managing multi-tier environments. It occurs because Ansible merges variables from various sources (global, group, and host) into a single pool for the current execution context.

    If you aren’t careful, trying to use a variable meant for “Group A” while executing tasks on “Group B” will cause the play to crash because that variable simply doesn’t exist in Group B’s scope.
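    One rough way to picture this "single pool" is Python's collections.ChainMap: each host sees one merged view of its host, group, and global variables, and the more specific layer wins. This is a simplified model, not Ansible's implementation (it ignores most of Ansible's real precedence levels). The host names and values are hypothetical, and the group and variable names are borrowed from the scenario that follows.

```python
from collections import ChainMap

# Simplified model of Ansible's per-host variable pool; it ignores most of the
# real precedence levels. All names and values below are hypothetical.
GLOBAL_VARS = {"ntp_server": "pool.ntp.org"}
GROUP_VARS = {
    "web_servers": {"web_custom_port": 8080},
    "db_servers": {"db_instance_port": 5432},
}
HOST_VARS = {"web01": {}, "db01": {"db_instance_port": 5433}}
HOST_GROUPS = {"web01": ["web_servers"], "db01": ["db_servers"]}


def flat_namespace(host):
    """Merge host, group and global vars into one view (earlier layers win)."""
    group_layers = [GROUP_VARS[group] for group in HOST_GROUPS[host]]
    return ChainMap(HOST_VARS[host], *group_layers, GLOBAL_VARS)


web_pool = flat_namespace("web01")
print(web_pool["web_custom_port"])                 # 8080
print("db_instance_port" in web_pool)              # False: db group vars never load here
print(flat_namespace("db01")["db_instance_port"])  # 5433: host var overrides group var
```

    The point of the model is the second print: from web01's point of view, db_instance_port simply is not in the pool.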

    The Scenario: The “Mixed Fleet” Crash

    Imagine you are managing a fleet of Web Servers (running on port 8080) and Database Servers (running on port 5432). You want a single “Security” play to validate that the application port is open in the firewall.

    The Failing Code:

    - name: Apply Security Rules
      hosts: web_servers:db_servers
      vars:
        # This is the "Flat Namespace" trap!
        # Ansible tries to resolve BOTH variables for every host.
        app_port_map:
          web_servers: "{{ web_custom_port }}"
          db_servers: "{{ db_instance_port }}"

      tasks:
        - name: Validate port is defined
          ansible.builtin.assert:
            that: app_port_map[group_names[0]] is defined

    This code fails because of how Ansible evaluates app_port_map. When Ansible runs the play for a host in web_servers, it has to template the whole dictionary, and building that dictionary means resolving db_instance_port as well. Since the host is a web server, the database group variables were never loaded into its namespace. Result: fatal: 'db_instance_port' is undefined.
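    At its core, this is eager versus lazy evaluation. The plain-Python model below is only an analogy, since Ansible templates through Jinja2 and its internals differ. Building the whole dictionary touches both variables, while an if/elif chain touches only the variable that belongs to the host's group. The variable names come from the example above. The helpers build_port_map_eagerly and resolve_port_lazily are hypothetical.

```python
# Simplified model of eager vs. lazy variable resolution; not Ansible internals.
# Variable and group names mirror the hypothetical scenario above.


def build_port_map_eagerly(pool):
    """Like templating the whole app_port_map: every entry is resolved up front."""
    return {
        "web_servers": pool["web_custom_port"],   # raises KeyError on db hosts
        "db_servers": pool["db_instance_port"],   # raises KeyError on web hosts
    }


def resolve_port_lazily(pool, group_names, default=22):
    """Like an if/elif Jinja block: only the branch for this host's group is read."""
    if "web_servers" in group_names:
        return pool["web_custom_port"]
    if "db_servers" in group_names:
        return pool["db_instance_port"]
    return default


web_pool = {"web_custom_port": 8080}   # what a web server's flat namespace holds

try:
    build_port_map_eagerly(web_pool)
except KeyError as missing:
    print(f"fatal: {missing} is undefined")   # fatal: 'db_instance_port' is undefined

print(resolve_port_lazily(web_pool, ["web_servers"]))  # 8080
```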

    Solution 1: The “Lazy” Logic

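    By placing each variable reference inside Jinja2 conditional logic, we prevent Ansible from ever evaluating a variable the host doesn't have: only the branch that matches the host's group is rendered. The whitespace control markers ({%- and -%}) simply keep stray spaces and newlines out of the resulting value.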

    - name: Apply Security Rules
      hosts: app_servers:storage_servers
      vars:
        # Use whitespace-controlled Jinja to isolate variable calls
        target_port: >-
          {%- if 'app_servers' in group_names -%}
          {{ app_service_port }}
          {%- elif 'storage_servers' in group_names -%}
          {{ storage_backend_port }}
          {%- else -%}
          22
          {%- endif -%}

      tasks:
        - name: Ensure port is allowed in firewall
          community.general.ufw:
            rule: allow
            port: "{{ target_port | int }}"

    The advantage of this approach is that it’s very explicit, prevents “Undefined Variable” errors entirely, and allows for easy defaults. However, it can become verbose and messy if you have a large number of different groups.

    Solution 2: The hostvars Lookup

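    If you don’t want a giant if/else block, you can build the variable name dynamically and look it up in the vars dictionary (inventory variables such as group_vars are also reachable through hostvars[inventory_hostname]). However, you must provide a default to keep the namespace “safe.”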

    - name: Validate ports
      hosts: all
      tasks:
        - name: Check port connectivity
          ansible.builtin.wait_for:
            port: "{{ vars[group_names[0] + '_port'] | default(22) }}"
            timeout: 5

    This approach is very compact and follows a naming convention (e.g., groupname_port). However, it’s harder to debug and relies on strict variable naming across your entire inventory.
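    A plain-Python model of the lookup shows how much depends on naming. It is an analogy, not Ansible code, and the group and variable names are hypothetical. One more subtlety is worth knowing: Ansible exposes group_names sorted alphabetically. For a host in several groups, group_names[0] is therefore just the alphabetically first group, which may not be the one you meant.

```python
# Model of: vars[group_names[0] + '_port'] | default(22)
# Ansible exposes group_names as a sorted list of the host's groups (excluding "all").
# Group and variable names are hypothetical; assumes the host is in at least one group.


def convention_port(host_vars, host_groups, default=22):
    group_names = sorted(host_groups)
    key = group_names[0] + "_port"          # builds e.g. "web_port"
    return host_vars.get(key, default)      # .get() plays the role of | default(22)


web_vars = {"web_port": 8080}
print(convention_port(web_vars, ["web"]))                # 8080
print(convention_port(web_vars, ["web", "monitoring"]))  # 22: "monitoring" sorts first
```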

    Solution 3: Group Variable Normalization

    The most “architecturally sound” way to solve the flat namespace problem is to use the same variable name across different group_vars files.

    # inventory/group_vars/web_servers.yml
    service_port: 80

    # inventory/group_vars/db_servers.yml
    service_port: 5432

    # Playbook - main.yml
    ---
    - name: Unified Firewall Play
      hosts: all
      tasks:
        - name: Open service port
          community.general.ufw:
            port: "{{ service_port }}"  # No logic needed!
            rule: allow

    This produces the cleanest playbook code and is the truly “Ansible-native” way of handling polymorphism. However, it requires refactoring your existing variable names and can be confusing if you need to see both ports at once (e.g., in a Load Balancer config).
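    Both the pay-off and the load-balancer caveat can be modeled in a few lines of Python. This is an analogy with a hypothetical inventory. In a real playbook, a load-balancer template would typically reach across hosts through Ansible's hostvars and groups magic variables, for example hostvars[host]['service_port'] for each host in groups['web_servers'].

```python
# Model of normalized group_vars: every group file defines the same key, service_port.
# Hypothetical inventory; "hostvars" here is a plain dict standing in for Ansible's.
GROUP_VARS = {
    "web_servers": {"service_port": 80},
    "db_servers": {"service_port": 5432},
}
GROUPS = {"web_servers": ["web01", "web02"], "db_servers": ["db01"]}

# Each host's own view: one variable name, a different value depending on its group.
hostvars = {
    host: dict(GROUP_VARS[group])
    for group, members in GROUPS.items()
    for host in members
}

print(hostvars["web01"]["service_port"])  # 80
print(hostvars["db01"]["service_port"])   # 5432

# A load balancer config needing "both ports at once" must reach across hosts.
backends = [f"{host}:{hostvars[host]['service_port']}" for host in GROUPS["web_servers"]]
print(backends)  # ['web01:80', 'web02:80']
```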

    The “Flat Namespace” problem is really just a symptom of Ansible’s strength: it’s trying to make sure everything you’ve defined is valid. I recently hit this problem in a multi-play playbook I wrote for DigitalOcean infrastructure provisioning and configuration, specifically in a play that set up the host-level firewall based on dynamic inventory. I solved it with the Lazy Logic approach and found it to be the best way to bridge the gap between “Group A” and “Group B” without forcing a massive inventory refactor. The example code above is a generalized version of that play.

  • From /etc/hosts to 127.0.0.53: A Sysadmin’s View on DNS Resolution

    If you’ve been managing systems since the days of AT&T Unix System V Release 3 (SVR3), you remember when networking was a manual affair. Name resolution often meant a massive, hand-curated /etc/hosts file and a prayer.

    As the Domain Name System (DNS) matured, the standard consolidated around a single, universally understood text file: /etc/resolv.conf. For decades, that file served us well. But modern, dynamic networking forced a massive architectural shift in the Linux world, most notably in the form of systemd-resolved. Its demands include laptops hopping between Wi-Fi SSIDs, complex VPN split tunnels, and DNSSEC validation.

    Let’s walk through history, with hands-on examples, to see how we got here.

    AT&T SVR3: The Pre-DNS Era

    Released around 1987-88, SVR3 was still rooted in the hosts file model. The networking stacks were primitive, and TCP/IP was available but not always bundled. I still remember that around 1996-97, I used to install AT&T SVR3 version 4.2 using multiple 5.25-inch DSDD floppy disks, then, after installation, use another set of disks to install the TCP/IP stack. DNS support was not native, and we relied on /etc/hosts for hostname resolution. By SVR3.2, AT&T started shipping optional resolver libraries, but these were not standardized.

    # Example /etc/hosts file on SVR3
    127.0.0.1 localhost
    192.168.1.10 svr3box svr3box.local
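    The hosts-file model is simple enough that a lookup is essentially a table scan. Below is a minimal Python sketch that parses the example above into a name-to-address map. It is an illustration only; real resolver libraries handle more edge cases.

```python
# Minimal /etc/hosts parser: "address canonical_name [aliases...]", '#' starts a comment.
# A sketch for illustration only.


def parse_hosts(text):
    """Map every hostname and alias to its address (first entry for a name wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)
    return table


HOSTS = """\
# Example /etc/hosts file on SVR3
127.0.0.1 localhost
192.168.1.10 svr3box svr3box.local
"""

print(parse_hosts(HOSTS)["svr3box.local"])  # 192.168.1.10
```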

    If DNS libraries were installed, /etc/resolv.conf could be used:

    # /etc/resolv.conf available when DNS libraries were installed
    nameserver 192.168.1.1
    domain corp.example.com

    dig was not available to us then, so we used nslookup.

    nslookup svr3box
    Server: 192.168.1.1
    Address: 192.168.1.1#53

    Name: svr3box.corp.example.com
    Address: 192.168.1.10

    Solaris: Bridging Classical and Modern

    When I was introduced to Sun Solaris around 2003-2005, I realized that DNS resolution was very well structured (at least compared to the SVR3 systems I had worked on earlier). Mostly, I remember working on Solaris 8 (with a few older SunOS 5.x systems). These systems required both /etc/resolv.conf and /etc/nsswitch.conf.

    # /etc/nsswitch.conf
    hosts: files dns nis

    The hosts line in /etc/nsswitch.conf had only one job: telling the C library (libc) resolver to look in /etc/hosts first, then DNS, and then NIS. Of course, you could change the sequence.
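    The ordering rule can be modeled in a few lines of Python. This is not how libc is implemented. The per-source tables are hypothetical stand-ins for the files, dns, and nis backends. Action items such as [NOTFOUND=return] are treated as unknown sources and skipped rather than modeled.

```python
# Model of how the "hosts:" line in /etc/nsswitch.conf orders lookups.
# The per-source tables are hypothetical stand-ins for libc's files/dns/nis backends.


def parse_hosts_order(nsswitch_text):
    """Return the list of sources on the hosts: line, or [] if there is none."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def resolve(name, order, sources):
    """Ask each source in order; the first one that knows the name answers."""
    for source in order:
        lookup = sources.get(source)
        address = lookup(name) if lookup else None
        if address is not None:
            return source, address
    return None


sources = {
    "files": {"localhost": "127.0.0.1"}.get,
    "dns": {"www.example.com": "192.0.2.10"}.get,   # documentation-range address
    "nis": {"fileserver": "10.0.0.5"}.get,
}
order = parse_hosts_order("hosts: files dns nis\n")
print(order)                                   # ['files', 'dns', 'nis']
print(resolve("fileserver", order, sources))   # ('nis', '10.0.0.5')
```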

    The /etc/resolv.conf file defined the nameservers:

    nameserver 8.8.8.8
    nameserver 1.1.1.1
    search corp.example.com
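    Reading this file is equally simple. Below is a minimal Python sketch that handles only the nameserver, domain, and search keywords seen in this post; options and other keywords are ignored. It is based on the resolv.conf(5) format. The three-nameserver cap mirrors glibc's MAXNS limit.

```python
# Minimal resolv.conf reader for the keywords used in this post (nameserver,
# domain, search). A sketch: "options" and other keywords are ignored.

MAXNS = 3  # glibc's resolver uses at most three nameserver lines


def parse_resolv_conf(text):
    nameservers, search = [], []
    for raw in text.splitlines():
        if not raw.strip() or raw[0] in "#;":     # comment lines start in column 1
            continue
        keyword, *values = raw.split()
        if keyword == "nameserver" and values and len(nameservers) < MAXNS:
            nameservers.append(values[0])
        elif keyword in ("domain", "search"):
            search = values                       # the last domain/search line wins
    return {"nameservers": nameservers, "search": search}


conf = """\
nameserver 8.8.8.8
nameserver 1.1.1.1
search corp.example.com
"""
print(parse_resolv_conf(conf))
# {'nameservers': ['8.8.8.8', '1.1.1.1'], 'search': ['corp.example.com']}
```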

    Solaris 10 introduced SMF (Service Management Facility), and Solaris 11 moved name-service configuration under it, so /etc/resolv.conf was auto-generated from SMF properties. Manual edits were discouraged, and we were learning to use:

    svccfg -s dns/client setprop config/nameserver = net_address: 8.8.8.8
    svcadm refresh dns/client

    For me, this marked the shift from text files to managed services, although I did not work much on these systems.

    BSD Unix: Conservatism and Security

    The BSD philosophy favors simplicity, transparency, and security first.

    FreeBSD and NetBSD still rely on the /etc/resolv.conf file, which dhclient updates automatically. That keeps debugging very straightforward:

    cat /etc/resolv.conf
    nameserver 192.168.1.2

    nslookup freebsd.org

    OpenBSD, famous for its “secure by default” stance, includes modern, secure DNS software like unbound in its base installation, yet its default system resolution behavior remains classical. Unless the OS is explicitly configured to use a local caching daemon, applications on a fresh OpenBSD install still read /etc/resolv.conf and talk directly to the configured nameservers. The project prioritizes a simple, auditable baseline over complex automated magic.

    The Modern Linux Shift

    On modern Linux distributions (Ubuntu 18.04+, Fedora, RHEL 8+, etc.), the old way of simply “echoing” a nameserver into a static /etc/resolv.conf file is effectively dead. The reason is that the old model had no way to arbitrate between competing writers. If NetworkManager, a VPN client, and a DHCP client all tried to write to that single file, the last one to write won.

    On modern Linux systems that use it, systemd-resolved acts as a local middleman, a DNS broker that dynamically merges configuration from different sources. /etc/resolv.conf is no longer a real file. It is usually a symbolic link to a file managed by systemd, and that file points applications at a local stub listener on 127.0.0.53.
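    The symlink is easy to check programmatically. Below is a minimal Python sketch that classifies /etc/resolv.conf by where it points; the function name is hypothetical. It assumes the stub file name shown in this post. Other valid systemd-resolved setups link to /run/systemd/resolve/resolv.conf instead, which lists the upstream servers directly.

```python
import os

# Assumption: on systemd-resolved systems /etc/resolv.conf usually links to
# .../systemd/resolve/stub-resolv.conf (as in the ls -l output in this post);
# other valid setups link elsewhere, e.g. /run/systemd/resolve/resolv.conf.
STUB_SUFFIX = os.path.join("systemd", "resolve", "stub-resolv.conf")


def resolv_conf_mode(path="/etc/resolv.conf"):
    """Classify how resolv.conf is managed, judging only by where it points."""
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        return "static file"
    target = os.path.realpath(path)
    if target.endswith(STUB_SUFFIX):
        return "systemd-resolved stub (127.0.0.53)"
    return "symlink to " + target


print(resolv_conf_mode())
```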

    systemd-resolved adds features like:

    • Split DNS, to route VPN domains separately.
    • Local caching, for faster repeated lookups.
    • DNS-over-TLS, for encrypted queries.

    You can see the symlink with ls:

    ls -l /etc/resolv.conf
    lrwxrwxrwx 1 root root 39 Dec 24 11:00 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

    This complexity buys us features needed for modern mobile computing: per-interface DNS settings, local caching to speed up browsing, and seamless VPN integration.

    On modern Linux systems, dig and resolvectl are the diagnostic tools of choice:

    $ dig @127.0.0.53 example.com

    ; <<>> DiG 9.16.50-Raspbian <<>> @127.0.0.53 example.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17367
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 1232
    ;; QUESTION SECTION:
    ;example.com. IN A

    ;; ANSWER SECTION:
    example.com. 268 IN A 104.18.27.120
    example.com. 268 IN A 104.18.26.120

    ;; Query time: 9 msec
    ;; SERVER: 127.0.0.53#53(127.0.0.53)
    ;; WHEN: Wed Dec 24 12:49:43 IST 2025
    ;; MSG SIZE rcvd: 72

    $ resolvectl query example.com
    example.com: 2606:4700::6812:1b78
                 2606:4700::6812:1a78
                 104.18.27.120
                 104.18.26.120

    -- Information acquired via protocol DNS in 88.0ms.
    -- Data is authenticated: no; Data was acquired via local or encrypted transport: no
    -- Data from: network
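    What dig sends to the stub listener is an ordinary DNS message over UDP port 53. Below is a minimal Python sketch that builds that message by hand, following the RFC 1035 wire format. The helper names are hypothetical. The sketch only decodes the reply header (response code and answer count), not the answer records. The network call is guarded because 127.0.0.53 only answers where systemd-resolved is running.

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Encode a minimal RFC 1035 query: 12-byte header plus one question (A/IN by default)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # flags 0x0100 = RD
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def ask_stub(name, server="127.0.0.53", timeout=2.0):
    """Send the query over UDP/53 and return the response code and answer count."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _ = sock.recvfrom(512)   # classic 512-byte UDP limit without EDNS
    _, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", reply[:12])
    return {"rcode": flags & 0x000F, "answers": ancount}


if __name__ == "__main__":
    try:
        print(ask_stub("example.com"))
    except OSError as err:   # includes timeouts and "connection refused"
        print(f"no stub resolver reachable: {err}")
```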

    Because editing the file directly no longer works reliably, we must use tools that communicate with the systemd-resolved daemon.

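    Suppose you want to force your primary ethernet interface (eth0) to bypass the DHCP-supplied DNS servers and use Google’s servers temporarily:

    sudo resolvectl dns eth0 8.8.8.8 8.8.4.4

    To check what is actually happening, that is, which DNS servers are bound to which interface scopes, run:

    resolvectl status

    and to clear the manual overrides and go back to whatever settings DHCP provided:

    sudo resolvectl revert eth0

    On older releases that predate resolvectl, such as Ubuntu 18.04, the same three operations use systemd-resolve: sudo systemd-resolve --set-dns=8.8.8.8 --set-dns=8.8.4.4 --interface=eth0, systemd-resolve --status, and sudo systemd-resolve --revert --interface=eth0.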

    We’ve come a long way from System V R3. While the simplicity of the classical text-file approach is nostalgic for those of us who grew up on it, the dynamic nature of today’s networking requires a smarter local resolver daemon. It adds complexity, but it’s the price we pay for seamless connectivity in a mobile world.