Ajitabh Pandey's Soul & Syntax

Exploring systems, souls, and stories – one post at a time

Author: Ajitabh

  • When Pi-hole + Unbound Stop Resolving: A DNSSEC Trust Anchor Fix

    I have my own private DNS setup in my home network, powered by Pi-hole running on my very first Raspberry Pi, a humble Model B Rev 2. It’s been quietly handling ad-blocking and DNS resolution for years. But today, something broke.

    I noticed that none of my devices could resolve domain names. Pi-hole’s dashboard looked fine. The DNS service was running, blocking was active, but every query failed. Even direct dig queries returned SERVFAIL. Here’s how I diagnosed and resolved the issue.

    The Setup

    My Pi-hole forwards DNS queries to Unbound, a recursive DNS resolver running locally on port 5335. This is configured in /etc/pihole/setupVars.conf.

    PIHOLE_DNS_1=127.0.0.1#5335
    PIHOLE_DNS_2=127.0.0.1#5335

And my system’s /etc/resolv.conf points to Pi-hole itself:

    nameserver 127.0.0.1

    Unbound is installed with the dns-root-data package, which provides root hints and DNSSEC trust anchors:

    $ dpkg -l dns-root-data|grep ^ii
    ii dns-root-data 2024041801~deb11u1 all DNS root hints and DNSSEC trust anchor

    The Symptoms

    Despite everything appearing normal, DNS resolution failed:

    $ dig google.com @127.0.0.1 -p 5335

    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL

    Even root-level queries failed:

    $ dig . @127.0.0.1 -p 5335

    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL

    Unbound was running and listening:

    $ netstat -tulpn | grep 5335

    tcp 0 0 127.0.0.1:5335 0.0.0.0:* LISTEN 29155/unbound

Outbound connectivity was also fine. To confirm, I pinged one of the root DNS servers directly (198.41.0.4 is a.root-servers.net):

    $ ping -c1 198.41.0.4 
    PING 198.41.0.4 (198.41.0.4) 56(84) bytes of data.
    64 bytes from 198.41.0.4: icmp_seq=1 ttl=51 time=206 ms

    --- 198.41.0.4 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 205.615/205.615/205.615/0.000 ms

    The Diagnosis

At this point, I suspected a DNSSEC validation failure. Unbound uses a trust anchor, a cryptographic key stored in root.key, to verify the authenticity of DNS responses. Think of it like a passport authority: when you travel internationally, border agents trust your passport because it was issued by a recognized authority. Similarly, DNSSEC relies on a trusted key at the root of the DNS hierarchy to validate every response down the chain. If that key is missing, expired, or corrupted, Unbound can’t verify the authenticity of DNS data — and like a border agent rejecting an unverified passport, it simply refuses to answer, returning SERVFAIL.
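
A quick way to test this theory is to repeat the query with validation switched off for that one lookup. The +cd flag sets the “checking disabled” bit, which should make Unbound return the answer without validating it:

$ dig google.com @127.0.0.1 -p 5335 +cd

If this returns NOERROR while the plain query still returns SERVFAIL, the upstream data is reachable and validation itself is the failure point.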

    Even though dns-root-data was installed, the trust anchor wasn’t working.

    The Fix

    I regenerated the trust anchor manually:

    $ sudo rm /usr/share/dns/root.key
    $ sudo unbound-anchor -a /usr/share/dns/root.key
    $ sudo systemctl restart unbound

    After this, Unbound started resolving again:

    $ dig google.com @127.0.0.1 -p 5335

    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR
    ;; ANSWER SECTION:
    google.com. 300 IN A 142.250.195.78

    Why This Happens

    Even with dns-root-data, the trust anchor could become stale — especially if the system missed a rollover event or the file was never initialized. Unbound doesn’t log this clearly, so it’s easy to miss.

    Preventing Future Failures

    To avoid this in the future, I added a weekly cron job to refresh the trust anchor:

    0 3 * * 0 /usr/sbin/unbound-anchor -a /usr/share/dns/root.key

And a watchdog check (run as root, since it may restart the service) to monitor Unbound health:

    $ dig . @127.0.0.1 -p 5335 | grep -q 'status: NOERROR' || systemctl restart unbound
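
If you prefer something a bit more defensive than a one-liner, a minimal script sketch might look like this. The address, port, and timeout values are assumptions to adapt to your own setup:

#!/usr/bin/env bash
# unbound-watchdog.sh - restart Unbound if it stops answering queries.
# Sketch only: adjust the address, port, and timeouts to match your setup.

ADDR="127.0.0.1"
PORT="5335"

# +time/+tries keep dig from hanging if Unbound is wedged
if ! dig . "@${ADDR}" -p "${PORT}" +time=3 +tries=1 | grep -q 'status: NOERROR'; then
    logger -t unbound-watchdog "Unbound not answering on ${ADDR}:${PORT}; restarting"
    systemctl restart unbound
fi

Run it from root’s crontab every few minutes; systemctl restart requires root.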

    This was a good reminder that even quiet systems need occasional maintenance. Pi-hole and Unbound are powerful together, but DNSSEC adds complexity. If you’re running a similar setup, keep an eye on your trust anchors, and don’t trust the dashboard alone.

  • Book Review: Shooting Straight: A Military Biography of Lt Gen. Rostum K. Nanavatty by Arjun Subramaniam

    Shooting Straight is more than just a recounting of battles; it is a meticulously researched biography of an exemplary soldier and leader of the Indian Army, Lt Gen. Rostum Kaikhushru Nanavatty. Written by accomplished military historian Arjun Subramaniam, the book aims to capture the essence of soldiering, command, and leadership across five turbulent decades, from the 1960s to the 2000s.

    The book’s subject, Lt Gen. Nanavatty, emerges from the pages as a paragon of the Indian Army—a figure defined by integrity, discipline, and unwavering dedication to the nation. His career was marked by action across key operational areas, including Nagaland, Sri Lanka, Siachen, and Baramulla, establishing him as a decorated and accomplished infantry officer.

    Subramaniam excels in presenting a balanced and comprehensive portrait. The biography not only covers the operational acumen and intellectual brilliance for which Gen. Nanavatty was known but also masterfully integrates his personal life with his professional ups and downs. The writing is simple, engaging, and always provides the necessary context for the reader, ensuring the narrative is accessible even to those not deeply familiar with India’s military history.

    One of the most valuable aspects of Shooting Straight is the wealth of primary source material incorporated into the narrative. Approximately 30% of the book is dedicated to various references and detailed notes taken by Gen. Nanavatty throughout his career. This inclusion offers readers an invaluable, first-hand perspective on contemporary warfare, counterinsurgency, high-altitude operations, and the overall landscape of the Indian Army from the view of one of its most respected commanders.

The biography truly shines in its depiction of a leader who was unafraid to “speak truth to power.” By offering insights into every facet of his challenges and triumphs, the book stands as a testament to his resilience and profound commitment.

    For anyone seeking a deep and engaging look at modern Indian military history, command structure, and the qualities of exceptional leadership, Shooting Straight is an essential read. It’s a compelling portrait of a life dedicated to service, captured with skill and objectivity by a masterful historian.

  • Book Review: Marshal Arjan Singh, DFC Life and Times by Group Captain Ranbir Singh (Retd)

    This brief biography is presented as a tribute to one of the most distinguished and legendary figures in the Indian Air Force (IAF). The goal is clear: to chronicle the extraordinary life, remarkable achievements, and lasting impact of Marshal Arjan Singh.

    However, the book takes an interesting narrative turn, evolving into something broader than a focused personal biography.

    While the expectation was a narrative centered on the Marshal’s personal journey from his humble beginnings to becoming the Chief of the Air Staff and the Marshal of the Indian Air Force, the book dedicates considerable time to the broader history of the IAF.

    Specifically, the narrative delves deeply into the history of the No. 1 Squadron, covering its challenges and triumphs both before and after India’s independence, and exploring the wider historical context faced by the Air Force during that formative period. While this material is historically valuable and rich in context, it results in Marshal Arjan Singh himself appearing only intermittently throughout the chapters.

    A Valuable Historical Resource

    For readers already familiar with the history of the Indian Armed Forces, the book may not unearth a wealth of completely new information. However, it does shed light on certain lesser-known details regarding the early days of the IAF and the Indian Navy, military branches whose histories are often less thoroughly documented than that of the Indian Army.

    Ultimately, the book shines as a significant historical resource.

    • For the general reader, it offers a comprehensive and engaging look at the formative years of the Indian Air Force and its subsequent evolution.
• It highlights the strategic acumen and wartime leadership that shaped the IAF’s future, delivering the historical depth the book promises.

The meticulous research undertaken by Group Captain Ranbir Singh (Retd.) pays tribute not just to Marshal Arjan Singh, but to the entire era of the early IAF. Readers looking exclusively for an in-depth, personal biography of Marshal Arjan Singh may find the focus slightly diluted. But for those interested in a detailed account of the history and challenges of the Indian Air Force as viewed through the context of the Marshal’s career, this book is an important and insightful contribution. It captures the unwavering commitment and indomitable spirit that defined both the man and the institution he led.

  • From Cloud Abstraction to Bare Metal Reality: Understanding the Foundation of Hyperscale Infrastructure

    In today’s cloud-centric world, where virtual machines and containers seem to materialize on demand, it’s easy to overlook the physical infrastructure that makes it all possible. For the new generation of engineers, a deeper understanding of what it takes to build and manage the massive fleets of physical machines that host our virtualized environments is becoming increasingly critical. While the cloud offers abstraction and on-demand scaling, the reality is that millions of physical servers, networked and orchestrated with precision, form the bedrock of these seemingly limitless resources. One of the key technologies that enables the rapid provisioning of these servers is the Preboot Execution Environment (PXE).

    Unattended Setups and Network Booting: An Introduction to PXE

    PXE provides a standardized environment for computers to boot directly from a network interface, independent of any local storage devices or operating systems. This capability is fundamental for achieving unattended installations on a massive scale. The PXE boot process is a series of network interactions that allow a bare-metal machine to discover boot servers, download an initial program into its memory, and begin the installation or recovery process.

    The Technical Details of How PXE Works

    The PXE boot process is a series of choreographed steps involving several key components and network protocols:

    Discovery

    When a PXE-enabled computer is powered on, its firmware broadcasts a special DHCPDISCOVER packet that is extended with PXE-specific options. This packet is sent to port 67/UDP, the standard DHCP server port.
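
If you want to see this on the wire, a packet capture on the provisioning network will show the client’s extended DHCPDISCOVER and everything that follows (the interface name here is an assumption; port 4011 is used by the Proxy DHCP service described below):

$ sudo tcpdump -i eth0 -n -v 'udp port 67 or udp port 68 or udp port 4011'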

    Proxy DHCP

    A PXE redirection service (or Proxy DHCP) is a key component. If a Proxy DHCP receives an extended DHCPDISCOVER, it responds with an extended DHCPOFFER packet, which is broadcast to port 68/UDP. This offer contains critical information, including:

    • A PXE Discovery Control field to determine if the client should use Multicasting, Broadcasting, or Unicasting to contact boot servers.
    • A list of IP addresses for available PXE Boot Servers.
    • A PXE Boot Menu with options for different boot server types.
    • A PXE Boot Prompt (e.g., “Press F8 for boot menu”) and a timeout.

The Proxy DHCP service can run on the same host as a standard DHCP service, but it listens on a different port (4011/UDP) to avoid conflicts.

    Boot Server Interaction

    The PXE client, now aware of its boot server options, chooses a boot server and sends an extended DHCPREQUEST packet, typically to port 4011/UDP or broadcasting to 67/UDP. This request specifies the desired PXE Boot Server Type.

    Acknowledgement

    The PXE Boot Server, if configured for the client’s requested boot type, responds with an extended DHCPACK. This packet is crucial as it contains the complete file path for the Network Bootstrap Program (NBP) to be downloaded via TFTP (Trivial File Transfer Protocol).
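
Once you know the NBP path from the DHCPACK, you can verify that the file is actually fetchable over TFTP from any machine; curl speaks TFTP. The server address and filename below are placeholders:

$ curl -sO tftp://192.168.1.10/pxelinux.0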

    Execution

    The client downloads the NBP into its RAM using TFTP. Once downloaded and verified, the PXE firmware executes the NBP. The functions of the NBP are not defined by the PXE specification, allowing it to perform various tasks, from presenting a boot menu to initiating a fully automated operating system installation.
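
To make this concrete, here is a minimal sketch of the server side using dnsmasq, which can act as both Proxy DHCP and TFTP server in a single daemon. All addresses, paths, and filenames are assumptions to adapt:

# /etc/dnsmasq.d/pxe.conf - Proxy DHCP plus TFTP for PXE (sketch)
port=0                        # disable dnsmasq's DNS component
dhcp-range=192.168.1.0,proxy  # proxy mode: the existing DHCP server still assigns addresses
pxe-service=x86PC,"Network Boot",pxelinux   # boot menu entry; serves pxelinux.0 as the NBP
enable-tftp                   # built-in TFTP server for the NBP download
tftp-root=/srv/tftp           # directory pxelinux.0 is served from

In proxy mode, dnsmasq leaves address assignment to the existing DHCP server and only supplies the PXE options, matching the redirection-service role described above.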

      The Role of PXE in Modern Hyperscale Infrastructure

      While PXE has existed for years, its importance in the era of hyperscale cloud computing is greater than ever. In environments where millions of physical machines need to be deployed and managed, PXE is the first and most critical step in an automated provisioning pipeline. It enables:

      • Rapid Provisioning: Automating the initial boot process allows cloud providers to provision thousands of new servers simultaneously, dramatically reducing deployment time.
      • Standardized Deployment: PXE ensures a consistent starting point for every machine, allowing for standardized operating system images and configurations to be applied fleet-wide.
      • Remote Management and Recovery: PXE provides a reliable way to boot machines into diagnostic or recovery environments without requiring physical access, which is essential for managing geographically distributed data centers.

Connecting the Virtual to the Physical

      For new engineers, understanding the role of technologies like PXE bridges the gap between the virtual world of cloud computing and the bare-metal reality of the hardware that supports it. This knowledge is not just historical; it is a foundation for:

      • Designing Resilient Systems: Understanding the underlying infrastructure informs the design of more scalable and fault-tolerant cloud-native applications.
      • Effective Troubleshooting: When issues arise in a virtualized environment, knowing the physical layer can be crucial for diagnosing and resolving problems.
      • Building Infrastructure as Code: The principles of automating physical infrastructure deployment are directly applicable to the modern practice of Infrastructure as Code (IaC).

      By appreciating the intricacies of building and managing the physical infrastructure, engineers can build more robust, efficient, and truly cloud-native solutions, ensuring they have a complete picture of the technology stack from the bare metal to the application layer.

• Why Systemd Timers Outshine Cron Jobs

For decades, cron has been the trusty workhorse for scheduling tasks on Linux systems. Need to run a backup script daily? cron was your go-to. But as modern systems evolve and demand more robust, flexible, and integrated solutions, systemd timers have emerged as a superior alternative. Let’s roll up our sleeves and dive into the strategic advantages of systemd timers, then walk through their design and implementation.

      Why Ditch Cron? The Strategic Imperative

      While cron is simple and widely understood, it comes with several inherent limitations that can become problematic in complex or production environments:

      • Limited Visibility and Logging: cron offers basic logging (often just mail notifications) and lacks a centralized way to check job status or output. Debugging failures can be a nightmare.
      • No Dependency Management: cron jobs are isolated. There’s no built-in way to ensure one task runs only after another has successfully completed, leading to potential race conditions or incomplete operations.
      • Missed Executions on Downtime: If a system is off during a scheduled cron run, that execution is simply missed. This is critical for tasks like backups or data synchronization.
      • Environment Inconsistencies: cron jobs run in a minimal environment, often leading to issues with PATH variables or other environmental dependencies that work fine when run manually.
      • No Event-Based Triggering: cron is purely time-based. It cannot react to system events like network availability, disk mounts, or the completion of other services.
      • Concurrency Issues: cron doesn’t inherently prevent multiple instances of the same job from running concurrently, which can lead to resource contention or data corruption.

      systemd timers, on the other hand, address these limitations by leveraging the full power of the systemd init system. (We’ll dive deeper into the intricacies of the systemd init system itself in a future post!)

      • Integrated Logging with Journalctl: All output and status information from systemd timer-triggered services are meticulously logged in the systemd journal, making debugging and monitoring significantly easier (journalctl -u your-service.service).
      • Robust Dependency Management: systemd allows you to define intricate dependencies between services. A timer can trigger a service that requires another service to be active, ensuring proper execution order.
      • Persistent Timers (Missed Job Handling): With the Persistent=true option, systemd timers will execute a missed job immediately upon system boot, ensuring critical tasks are never truly skipped.
      • Consistent Execution Environment: systemd services run in a well-defined environment, reducing surprises due to differing PATH or other variables. You can explicitly set environment variables within the service unit.
      • Flexible Triggering Mechanisms: Beyond simple calendar-based schedules (like cron), systemd timers support monotonic timers (e.g., “5 minutes after boot”) and can be combined with other systemd unit types for event-driven automation.
      • Concurrency Control: systemd inherently manages service states, preventing multiple instances of the same service from running simultaneously unless explicitly configured to do so.
• Granular Control: OnCalendar expressions support seconds, and AccuracySec (one minute by default) can be tightened as far as AccuracySec=1us, allowing much more precise control than cron’s minute-level resolution.
      • Randomized Delays: RandomizedDelaySec can be used to prevent “thundering herd” issues where many timers configured for the same time might all fire simultaneously, potentially overwhelming the system.

      Designing Your Systemd Timers: A Two-Part Harmony

      systemd timers operate in a symbiotic relationship with systemd service units. You typically create two files for each scheduled task:

      1. A Service Unit (.service file): This defines what you want to run (e.g., a script, a command).
      2. A Timer Unit (.timer file): This defines when you want the service to run.

      Both files are usually placed in /etc/systemd/system/ for system-wide timers or ~/.config/systemd/user/ for user-specific timers.

      The Service Unit (your-task.service)

      This file is a standard systemd service unit. A basic example:

      [Unit]
      Description=My Daily Backup Service
Wants=network-online.target # Optional: wait for the network before running...
After=network-online.target # ...Wants= alone does not order the units; After= is also needed
      
      [Service]
      Type=oneshot # For scripts that run and exit
      ExecStart=/usr/local/bin/backup-script.sh # The script to execute
      User=youruser # Run as a specific user (optional, but good practice)
      Group=yourgroup # Run as a specific group (optional)
      # Environment="PATH=/usr/local/bin:/usr/bin:/bin" # Example: set a custom PATH
      
      [Install]
WantedBy=multi-user.target # Optional: only matters if you enable the service on its own; the timer does not need it
      

      Strategic Design Considerations for Service Units:

      • Type=oneshot: Ideal for scripts that perform a task and then exit.
      • ExecStart: Always use absolute paths for your scripts and commands to avoid environment-related issues.
      • User and Group: Run services with the least necessary privileges. This enhances security.
• Dependencies (Wants, Requires, After, Before): Leverage systemd’s powerful dependency management. For example, Wants=network-online.target paired with After=network-online.target ensures the network is up before the service starts.
• Error Handling within Script: While systemd provides good logging, your scripts should still include robust error handling and exit with non-zero status codes on failure (see the sketch after this list).
• Output: Direct script output to stdout or stderr. journald will capture it automatically. Avoid sending emails directly from the script unless absolutely necessary; systemd’s logging is usually sufficient.
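
As a starting point, here is a minimal skeleton for a script driven by a Type=oneshot service. The paths are placeholders:

#!/usr/bin/env bash
# backup-script.sh - skeleton for a oneshot systemd service (sketch; paths are placeholders)
set -euo pipefail   # abort on errors, unset variables, and pipeline failures

SRC="/home/youruser/data"
DEST="/var/backups/data"

echo "Starting backup of ${SRC}"          # stdout goes to the journal
rsync -a --delete "${SRC}/" "${DEST}/"    # any rsync failure aborts the script
echo "Backup finished successfully"
exit 0                                    # success; a non-zero exit marks the unit as failed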

      The Timer Unit (your-task.timer)

      This file defines the schedule for your service.

      [Unit]
      Description=Timer for My Daily Backup Service

[Timer]
Unit=your-task.service # Explicit link to the service (optional; defaults to the unit with the same name)
# Note: avoid Requires=your-task.service in [Unit]; Requires= would start the
# service as soon as the timer itself is activated, not on schedule.
OnCalendar=daily # Run every day at midnight (default for 'daily')
      # OnCalendar=*-*-* 03:00:00 # Run every day at 3 AM
      # OnCalendar=Mon..Fri 18:00:00 # Run weekdays at 6 PM
      # OnBootSec=5min # Run 5 minutes after boot
      Persistent=true # If the system is off, run immediately on next boot
      RandomizedDelaySec=300 # Add up to 5 minutes of random delay to prevent stampedes
      
      [Install]
      WantedBy=timers.target # Essential for the timer to be enabled at boot
      

      Strategic Design Considerations for Timer Units:

• OnCalendar: This is your primary scheduling mechanism. systemd offers a highly flexible calendar syntax (refer to man systemd.time for full details). Use systemd-analyze calendar "your-schedule" to test your expressions, as shown after this list.
      • OnBootSec: Useful for tasks that need to run a certain duration after the system starts, regardless of the calendar date.
      • Persistent=true: Crucial for reliability! This ensures your task runs even if the system was powered off during its scheduled execution time. The task will execute once systemd comes back online.
      • RandomizedDelaySec: A best practice for production systems, especially if you have many timers. This spreads out the execution of jobs that might otherwise all start at the exact same moment.
• AccuracySec: Defaults to 1 minute, which lets systemd coalesce nearby timers to save wakeups. Lower it (e.g. AccuracySec=1s) when you need tighter firing times; 1us is the finest setting but rarely necessary.
      • Unit: This explicitly links the timer to its corresponding service unit.
      • WantedBy=timers.target: This ensures your timer is enabled and started automatically when the system boots.
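
For example, each of these prints the normalized form of the expression and the next time it would elapse:

$ systemd-analyze calendar "daily"
$ systemd-analyze calendar "Mon..Fri 18:00:00"
$ systemd-analyze calendar "*-*-* 03:00:00"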

      Implementation and Management

      1. Create the files: Place your .service and .timer files in /etc/systemd/system/.
      2. Reload systemd daemon: After creating or modifying unit files: sudo systemctl daemon-reload
      3. Enable the timer: This creates a symlink so the timer starts at boot: sudo systemctl enable your-task.timer
      4. Start the timer: This activates the timer for the current session: sudo systemctl start your-task.timer
      5. Check status: sudo systemctl status your-task.timer; sudo systemctl status your-task.service
      6. View logs: journalctl -u your-task.service
      7. Manually trigger the service (for testing): sudo systemctl start your-task.service
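
Put together, a typical first deployment looks like this (enable --now combines steps 3 and 4; unit names follow the examples above):

$ sudo cp your-task.service your-task.timer /etc/systemd/system/
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now your-task.timer   # enable at boot and start immediately
$ systemctl list-timers your-task.timer         # shows the last and next trigger times
$ journalctl -u your-task.service -n 20         # recent output from the latest runs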

      Conclusion

      While cron served its purpose admirably for many years, systemd timers offer a modern, robust, and integrated solution for scheduling tasks on Linux systems. By embracing systemd timers, you gain superior logging, dependency management, missed-job handling, and greater flexibility, leading to more reliable and maintainable automation. It’s a strategic upgrade that pays dividends in system stability and ease of troubleshooting. Make the switch and experience the power of a truly systemd-native approach to scheduled tasks.