Linux Level 3 – Advanced Operations

This course moves from administration into deeper system behaviour, control, hardening, incident handling, and recovery. It covers advanced process and memory behaviour, firewalling, kernel module management, permissions models, backup and recovery strategy, and system protection features.

Course purpose

Develop deeper Linux operational judgement so engineers can understand complex host behaviour, harden systems sensibly, handle incidents with structure, and recover services and platforms safely.

Duration

  • 2 days
  • optionally, a third day of labs

Target audience

  • senior Linux engineers
  • platform engineers
  • escalation support engineers
  • security-conscious infrastructure teams
  • administrators responsible for recovery and hardening

Prerequisites

  • Linux Engineering or equivalent real-world experience
  • confidence working at shell and service level
  • familiarity with LVM, mounts, boot and logs

Learning outcomes

  • inspect and manage kernel modules and advanced system behaviour
  • understand Linux process and memory pressure behaviour
  • configure and reason about host firewalling
  • explain watchdog and system protection mechanisms
  • work with standard permissions, ACLs, extended attributes and SELinux/AppArmor concepts
  • design sensible backup and recovery approaches
  • troubleshoot complex incidents involving processes, memory, permissions or boot-time protection features

Detailed module structure

Module 1: Kernel modules and advanced kernel interaction

Topics:

  • kernel modules vs built-in functionality
  • loading and unloading modules
  • dependency handling
  • persistent module configuration
  • useful tools:
    • lsmod
    • modprobe
    • modinfo
    • insmod
    • rmmod
  • common reasons to inspect modules
  • driver and module troubleshooting
  • kernel command line awareness
  • sysctl overview as runtime kernel tuning
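
As an illustration of module inspection, lsmod presents the contents of /proc/modules. The sketch below parses text in that format and lists modules that other modules depend on. The sample text, the Module type and the parse_proc_modules helper are hypothetical names invented for this example; on a real host the text would come from reading /proc/modules.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    size: int
    refcount: int
    used_by: list[str]


def parse_proc_modules(text: str) -> list[Module]:
    """Parse /proc/modules-style text: name, size, refcount, users, state, address."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, refcount, users = fields[:4]
        # The users field is "-" when nothing depends on the module,
        # otherwise a comma-separated list with a trailing comma.
        used_by = [] if users == "-" else [u for u in users.split(",") if u]
        # The refcount can be "-" on kernels built without module unloading.
        count = int(refcount) if refcount.isdigit() else 0
        modules.append(Module(name, int(size), count, used_by))
    return modules


# Illustrative sample only; sizes and addresses are made up.
SAMPLE = """\
ext4 1003520 1 - Live 0x0000000000000000
jbd2 167936 1 ext4, Live 0x0000000000000000
mbcache 16384 1 ext4, Live 0x0000000000000000
"""

mods = parse_proc_modules(SAMPLE)
in_use = [m.name for m in mods if m.used_by]
print(in_use)
```

A module with a non-empty used_by list cannot normally be removed with rmmod until its dependants are unloaded, which is one reason modprobe -r is usually preferred.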

Module 2: Processes, scheduling and system behaviour

Topics:

  • process lifecycle
  • parent and child relationships
  • foreground and background
  • signals
  • service processes vs user processes
  • CPU scheduling awareness
  • priorities and niceness
  • zombie and defunct process awareness
  • process inspection:
    • ps
    • top
    • htop
    • pstree
    • pgrep
    • kill

Module 3: Memory pressure, OOM and system stability

Topics:

  • virtual memory basics
  • swap behaviour
  • page cache awareness
  • OOM killer
  • how memory exhaustion presents
  • identifying the victim process
  • prevention and tuning considerations
  • cgroup awareness if relevant to your environment
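
One way to make memory pressure concrete is to look at the fields in /proc/meminfo. The sketch below parses text in that format and computes the fraction of memory the kernel estimates is available. The sample values and the helper names parse_meminfo and available_fraction are assumptions for illustration; MemTotal, MemAvailable and SwapFree are real field names.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values mostly in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def available_fraction(info: dict[str, int]) -> float:
    """Fraction of total memory the kernel estimates is available."""
    return info["MemAvailable"] / info["MemTotal"]


# Illustrative sample only; on a real host read /proc/meminfo instead.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          204800 kB
MemAvailable:     819200 kB
SwapTotal:       2097152 kB
SwapFree:         104857 kB
"""

info = parse_meminfo(SAMPLE)
fraction = available_fraction(info)
print(f"{fraction:.2%} available, {info['SwapFree']} kB swap free")
```

Low MemAvailable combined with nearly exhausted swap is a typical precursor to OOM killer activity, although the right threshold depends on the workload.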

Module 4: Firewalling and packet filtering

Topics:

  • packet filtering concepts
  • chains, tables and rules at a high level
  • iptables
  • firewalld
  • relationship between legacy and modern tooling
  • service-based policy vs direct rule management
  • persistent firewall configuration
  • safe change practices to avoid locking yourself out
  • nftables awareness at a high level
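
Rule ordering is the usual cause of self-inflicted lockouts, and a toy model shows why. The sketch below is not iptables or nftables; it is a simplified first-match evaluator with hypothetical names (Rule, evaluate), assuming only that rules in a chain are checked in order and that the chain policy applies when nothing matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                 # "ACCEPT" or "DROP"
    port: Optional[int] = None  # None matches any destination port


def evaluate(chain: list[Rule], port: int, policy: str = "DROP") -> str:
    """Return the action of the first matching rule, else the chain policy."""
    for rule in chain:
        if rule.port is None or rule.port == port:
            return rule.action
    return policy


# Same two rules, different order.
safe = [Rule("ACCEPT", 22), Rule("DROP")]
locked_out = [Rule("DROP"), Rule("ACCEPT", 22)]

print(evaluate(safe, 22), evaluate(locked_out, 22))
```

In the second chain the catch-all DROP is reached first, so the SSH rule never applies. This is why changes are best tested with an out-of-band console or a timed rollback in place.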

Module 5: Watchdogs, resilience and host protection

Topics:

  • what a watchdog is
  • hardware vs software watchdog
  • when watchdogs are useful
  • failure scenarios
  • risks of misconfiguration
  • automatic recovery concepts
  • host health monitoring awareness

Module 6: Permissions and access control

Topics:

  • standard Unix permissions
  • ownership
  • setuid, setgid and sticky bit
  • ACLs
  • extended attributes
  • SELinux concepts
  • SELinux contexts and enforcement modes
  • AppArmor awareness if relevant to Debian and Ubuntu estates
  • troubleshooting permission denied beyond simple file mode bits
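
The special mode bits are easier to recognise once decoded. This minimal sketch uses Python's standard stat module to render numeric modes as ls -l would show them and to name any setuid, setgid or sticky bits. The special_bits helper and the example modes are illustrative assumptions. ACLs, extended attributes and SELinux labels are not visible in the mode bits and need tools such as getfacl, getfattr and ls -Z.

```python
import stat


def special_bits(mode: int) -> list[str]:
    """Name the special permission bits set in a numeric file mode."""
    names = []
    if mode & stat.S_ISUID:
        names.append("setuid")
    if mode & stat.S_ISGID:
        names.append("setgid")
    if mode & stat.S_ISVTX:
        names.append("sticky")
    return names


# Illustrative modes: file type bits plus permission bits.
examples = {
    "setuid binary": 0o104755,      # regular file, rwsr-xr-x
    "shared directory": 0o042775,   # directory, rwxrwsr-x
    "tmp-style directory": 0o041777,  # directory, rwxrwxrwt
}

for label, mode in examples.items():
    print(label, stat.filemode(mode), special_bits(mode))
```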

Module 7: Backup, restore and recovery strategy

Topics:

  • file-level backups vs image-level backups
  • consistency considerations
  • what to back up:
    • config
    • application data
    • databases
    • logs where needed
  • restore testing
  • bare-metal recovery concepts
  • single-file vs full-system recovery
  • documenting recovery procedures
  • integrity checking and retention
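
Integrity checking during restore testing can be as simple as comparing checksums recorded at backup time with checksums of the restored files. The sketch below is one possible approach, assuming SHA-256 and using temporary files to stand in for a real backup and restore; the file names and the sha256_of helper are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "app.conf"
    restored = Path(tmp) / "app.conf.restored"
    original.write_text("listen_port = 8080\n")

    # Checksum recorded when the backup is taken.
    recorded = sha256_of(original)

    # The copy stands in for a real restore from backup media.
    shutil.copy2(original, restored)
    restore_ok = sha256_of(restored) == recorded

    # Any change to the restored file is detected by the same check.
    restored.write_text("listen_port = 9090\n")
    tamper_detected = sha256_of(restored) != recorded

print(restore_ok, tamper_detected)
```

Storing the recorded checksums separately from the backup itself means a corrupted or altered backup cannot silently validate against itself.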

Module 8: Advanced troubleshooting and incident workflow

Topics:

  • structured troubleshooting
  • identifying whether the issue is:
    • hardware
    • kernel
    • service
    • filesystem
    • permissions
    • network
  • collecting evidence safely
  • avoiding destructive fixes
  • handover notes and post-incident review

Capstone focus: this final module brings together logs, processes, permissions and operational judgement into a single incident workflow.

Labs

  • inspect and load kernel modules
  • diagnose a missing-driver or module problem
  • trace a runaway or stuck process
  • simulate OOM and identify what happened
  • create firewall rules safely and recover from mistakes
  • compare standard permissions, ACLs and SELinux effects
  • perform a restore exercise from backup
  • troubleshoot a composite incident using logs, processes and permissions

Assessment

Advanced operations practical

  • inspect modules, processes and memory symptoms
  • interpret permission and protection-related failures
  • review a firewall change for safety and persistence
  • justify an appropriate backup or recovery action

Incident scenario

Troubleshoot a multi-layer Linux incident involving service instability, memory pressure, permissions confusion or protection features, then document the safest recovery path.

Deeper host understanding - Better recovery decisions - Stronger Linux operational control

Built for engineers who need to harden, troubleshoot and recover Linux systems under real operational pressure