Ajitabh Pandey's Soul & Syntax

Exploring systems, souls, and stories – one post at a time

  • Lessons from Running a Live Streaming Setup for More than 7 Years

    After seven years of managing high-traffic live streams, you learn that the biggest challenges aren’t usually the video codecs—they are the “invisible” layers: filesystem synchronization, HTTP header inheritance, and metadata consistency.

    When you scale from a single server to a cluster of distribution nodes behind a Load Balancer (LB), the margin for error disappears. Here are the core lessons learned from troubleshooting a production-scale HLS environment.

    1. The “Last-Modified” Lie and LB Skew

    In a multi-server setup (we use 5 distribution nodes), your player is constantly rotating between different IPs. If you use lsyncd or rsync to push files from a source to these nodes, you will encounter Sync Skew.

    Even with a 0-second delay, one server might receive the latest .m3u8 playlist 500ms before another. If a player hits Server A and then Server B, and Server B is slightly behind, the player sees a Last-Modified timestamp that is “older” than the previous one. This triggers Stall Detection in the player (often seen as manifestAgeMs jumping between 20s and 70s), even if the stream is technically healthy.

    The Lesson: Don’t let the player rely on the file’s “birth certificate.” Force the player to judge the stream by its actual content (the Media Sequence) by suppressing metadata headers and using aggressive cache control.

    location /livestream/ {
        alias /var/www/liveout/;

        # HLS Playlists must never be cached by the LB or the Player
        add_header Cache-Control "no-cache, no-store, must-revalidate, max-age=0" always;
        expires -1;

        # Kill the headers that cause false "Stall" detections
        add_header Last-Modified "";
        add_header ETag "";
        if_modified_since off;

        open_file_cache off;
        include cors_support;
    }
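
    To make "judge the stream by its Media Sequence" concrete, here is a minimal Python sketch of how a monitoring script might compare two playlist responses by their EXT-X-MEDIA-SEQUENCE tag instead of by Last-Modified. The function names and the sample playlists are illustrative assumptions, not part of our setup or of any player's API.

```python
import re

# Matches the EXT-X-MEDIA-SEQUENCE tag on its own line.
_MEDIA_SEQ = re.compile(r"^#EXT-X-MEDIA-SEQUENCE:(\d+)\s*$", re.MULTILINE)


def media_sequence(playlist_text: str) -> int:
    """Return the playlist's media sequence number (0 when the tag is absent)."""
    match = _MEDIA_SEQ.search(playlist_text)
    return int(match.group(1)) if match else 0


def went_backwards(previous: str, current: str) -> bool:
    """True when `current` is an older playlist than `previous`."""
    return media_sequence(current) < media_sequence(previous)


# Hypothetical responses from two distribution nodes; Node B lags by one segment.
NODE_A = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:12
#EXT-X-MEDIA-SEQUENCE:1042
#EXTINF:12.000,
seg1042.ts
"""
NODE_B = NODE_A.replace("1042", "1041")

print(media_sequence(NODE_A))          # 1042
print(went_backwards(NODE_A, NODE_B))  # True
```

    A check like this only flags real regressions in content; a timestamp that differs by a few hundred milliseconds between nodes no longer matters.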

    2. The Nginx Inheritance Trap (CORS)

    This is a silent killer. In Nginx, if you define an add_header directive in a parent location and then define any add_header in a nested child location, the child does not inherit the parent’s headers.

    If you optimize your .ts segments for caching but forget to re-include your CORS headers inside that specific block, your player will fetch the playlist successfully but then fail to download the actual media segments due to a CORS error.

    The Lesson: Always re-include your cors_support and use the always flag. The always flag ensures that even if a segment is briefly missing (404), the CORS headers are sent, allowing the player to see the 404 instead of throwing a confusing “CORS blocked” error.

    location ~* \.ts$ {
        # Re-include CORS because we are adding Cache-Control headers here
        include cors_support;

        # Segments are immutable; cache them forever
        add_header Cache-Control "max-age=31536000, public, immutable" always;
        expires 1y;

        # File handle caching is safe for segments
        open_file_cache max=1000 inactive=20s;
    }
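
    The inheritance rule is easy to misremember, so here is a small Python model of it: a sketch of nginx's default behaviour, where a level that defines any add_header discards everything inherited from the levels above it. The function name and the header values are assumptions for illustration; the real contents of cors_support are not shown here.

```python
def effective_headers(levels):
    """Return the add_header set nginx would apply by default.

    `levels` lists the add_header directives per configuration level,
    outermost first (for example server, then location). nginx inherits
    add_header from the previous level only when the current level
    defines none, so the innermost non-empty level wins outright.
    """
    for headers in reversed(levels):
        if headers:
            return dict(headers)
    return {}


# Hypothetical values standing in for the cors_support include.
CORS = {"Access-Control-Allow-Origin": "*"}
SEGMENT_CACHE = {"Cache-Control": "max-age=31536000, public, immutable"}

# Child block adds Cache-Control only: the parent's CORS header is lost.
broken = effective_headers([CORS, SEGMENT_CACHE])
# Child block re-includes CORS alongside Cache-Control.
fixed = effective_headers([CORS, {**CORS, **SEGMENT_CACHE}])

print("Access-Control-Allow-Origin" in broken)  # False
print("Access-Control-Allow-Origin" in fixed)   # True
```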

    3. The “Two Masters” Conflict in rtmp.conf

    A common mistake is trying to “help” Nginx-RTMP by giving it an application block for every stream type. In our setup, we found an application app_audio block with hls on; while a separate FFmpeg script was writing audio HLS directly to the same directory on disk. With two writers in one place, audio segment generation failed at random.

    Nginx-RTMP has a built-in “Garbage Collector” (hls_cleanup). If it sees files in its hls_path that it didn’t specifically create (because FFmpeg wrote them directly), it will delete them. To the admin, it looks like files are vanishing into thin air.

    The Lesson: If your FFmpeg script is handling the HLS generation (which is often necessary to satisfy strict Apple AVPlayer requirements for audio-only streams), remove the matching application block (here, app_audio) from Nginx-RTMP entirely.

    Correct Lean rtmp.conf Logic:

    • Application Ingest: Receives the stream and triggers the script.
    • Application Video: Receives the transcoded RTMP push for video HLS.
    • Audio: No application block. Let FFmpeg own the directory and the filesystem.
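
    One way to catch a “two masters” conflict before it bites is to audit rtmp.conf for application blocks that enable HLS in a directory FFmpeg also writes to. The Python sketch below is a rough illustration under stated assumptions: the regex only handles application blocks without nested braces, and every path and name except app_audio is hypothetical.

```python
import re

# Assumption: application blocks contain no nested { } blocks.
APP_BLOCK = re.compile(r"application\s+(\S+)\s*\{([^{}]*)\}")


def hls_paths(rtmp_conf: str) -> dict:
    """Map application name -> hls_path for blocks that set `hls on;`."""
    found = {}
    for name, body in APP_BLOCK.findall(rtmp_conf):
        if re.search(r"^\s*hls\s+on\s*;", body, re.MULTILINE):
            path = re.search(r"^\s*hls_path\s+([^;\s]+)\s*;", body, re.MULTILINE)
            if path:
                found[name] = path.group(1)
    return found


def conflicts(rtmp_conf: str, ffmpeg_dirs) -> list:
    """Names of applications whose hls_path is also written by FFmpeg."""
    owned = {d.rstrip("/") for d in ffmpeg_dirs}
    return sorted(
        name for name, path in hls_paths(rtmp_conf).items()
        if path.rstrip("/") in owned
    )


# Hypothetical config reproducing the mistake described above.
SAMPLE_CONF = """
rtmp {
    server {
        listen 1935;
        application app_video {
            live on;
            hls on;
            hls_path /var/www/liveout/video;
        }
        application app_audio {
            live on;
            hls on;
            hls_path /var/www/liveout/audio;
        }
    }
}
"""

print(conflicts(SAMPLE_CONF, ["/var/www/liveout/audio/"]))  # ['app_audio']
```

    Any name this prints is a block to delete, so that FFmpeg is the only writer in that directory.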

    4. The rsync Trap: --size-only

    When syncing HLS manifests to distribution nodes, it is tempting to use --size-only to speed up transfers. Do not do this. An HLS manifest often retains the same file size even when the content changes (e.g., by swapping one 12-second segment URL for another). rsync with --size-only will detect identical byte counts and skip the sync, leaving your distribution nodes with stale playlists.

    The Lesson: Stick to rsync’s default quick check, which compares both size and mtime (modification time). On a high-performance instance like a DigitalOcean C4 Droplet, the overhead is negligible, and reliability is everything.
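
    The size collision is easy to reproduce. This Python sketch builds two consecutive live playlists and compares them the way a size-only check and a size-plus-mtime check would. It is a simplified model of the comparison logic, not rsync itself; the helper names, segment names, and timestamps are made up for illustration.

```python
def playlist(first_seq: int) -> bytes:
    """Build a three-segment live playlist starting at `first_seq`."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        "#EXT-X-TARGETDURATION:12",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for seq in range(first_seq, first_seq + 3):
        lines += ["#EXTINF:12.000,", f"seg{seq}.ts"]
    return ("\n".join(lines) + "\n").encode()


def size_only_skips(src: bytes, dst: bytes) -> bool:
    """Model of --size-only: skip the transfer when sizes match."""
    return len(src) == len(dst)


def quick_check_skips(src: bytes, src_mtime: float,
                      dst: bytes, dst_mtime: float) -> bool:
    """Model of the default check: skip only if size AND mtime match."""
    return len(src) == len(dst) and src_mtime == dst_mtime


stale = playlist(1041)  # what the distribution node still has
fresh = playlist(1042)  # what the source just wrote, 12 seconds later

print(len(stale) == len(fresh))                      # True
print(stale != fresh)                                # True
print(size_only_skips(fresh, stale))                 # True
print(quick_check_skips(fresh, 112.0, stale, 100.0)) # False
```

    The two playlists differ in content but not in byte count, so the size-only model skips the update while the default model transfers it.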

    Summary: The Good, the Bad, and the Buffering

    1. Split your caching: Playlists get max-age=0; Segments get immutable.
    2. Explicit CORS: Nginx inheritance is not your friend. Re-include headers in nested blocks.
    3. One Master per Folder: If FFmpeg writes the HLS, Nginx-RTMP should stay out of the way.
    4. Atomic Sync: Use lsyncd with delay = 0 and compress = false for the lowest possible latency across your Load Balancer, and never pass --size-only to rsync.

    By following these principles, you ensure that strict players – especially Apple’s AVPlayer – receive a stream that is consistent, fresh, and compliant with the HLS spec.