Proxmox Level 3 – Advanced

This module focuses on advanced host-level engineering, maintenance planning, upgrade strategy, disaster handling and low-level recovery tasks. It is for people who need to recover broken systems, plan changes safely and work confidently below the GUI layer.

Course purpose

Build the advanced recovery, diagnostics and change-planning skills needed to manage Proxmox VE safely at host level, especially during upgrades, incidents and lower-layer storage or boot failures.

Duration

  • 2 days

Target audience

  • senior infrastructure engineers
  • platform engineers
  • escalation support engineers
  • Proxmox administrators responsible for maintenance and recovery

Prerequisites

  • strong Linux command-line skills
  • working experience with Proxmox clusters
  • ideally, completion of Proxmox Level 2 or equivalent experience

Learning outcomes

  • explain the Linux boot process in the context of a Proxmox host
  • use journalctl, udev tools and kernel logs for fault-finding
  • plan minor and major upgrades with appropriate risk assessment
  • recover hosts entering emergency mode
  • repair or inspect VM disks offline
  • handle migration or recovery tasks involving block-backed VM storage
  • make structured operational decisions during host incidents

Detailed module structure

Unit 1: Linux boot process on a Proxmox host

Topics:

  • firmware to bootloader to kernel to init system
  • initramfs role
  • systemd target flow
  • storage and network dependencies during boot
  • how boot failures present in Proxmox environments

Lab ideas:

  • walk through boot stages on a host
  • identify where a storage or mount failure interrupts boot

Unit 2: udev, journald and practical diagnostics

Topics:

  • device discovery and naming
  • persistent device identifiers
  • how udev affects disks and NICs
  • using journalctl for historical troubleshooting
  • kernel ring buffer and boot-time messages
  • filtering logs for storage, network and service failures

Lab ideas:

  • trace a disk/device appearing in the system
  • analyse journal output from a failed service or missing mount
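
Illustrative sketch for the journal-analysis lab: journalctl can emit one JSON object per line with -o json, and the fields PRIORITY, _SYSTEMD_UNIT and MESSAGE can then be filtered in a few lines of Python. The function name filter_journal and the sample entries below are hypothetical and only serve the illustration; this is a minimal sketch, not a recommended tool.

```python
import json


def filter_journal(lines, max_priority=3, unit=None):
    """Yield (unit, message) for journal entries at or above a severity.

    Lower PRIORITY numbers are more severe (3 = err, 6 = info).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # journalctl exports PRIORITY as a string, so convert it first.
        try:
            priority = int(entry.get("PRIORITY", "6"))
        except (TypeError, ValueError):
            continue
        if priority > max_priority:
            continue
        entry_unit = entry.get("_SYSTEMD_UNIT", "")
        if unit is not None and entry_unit != unit:
            continue
        message = entry.get("MESSAGE")
        # Non-text messages are exported as something other than a string.
        if not isinstance(message, str):
            message = repr(message)
        yield entry_unit, message


# Hypothetical sample entries, shaped like `journalctl -o json` output.
SAMPLE = [
    '{"PRIORITY": "6", "_SYSTEMD_UNIT": "pvedaemon.service", "MESSAGE": "starting server"}',
    '{"PRIORITY": "3", "_SYSTEMD_UNIT": "mnt-backup.mount", "MESSAGE": "Mount process exited, code=exited, status=32/n/a"}',
]

for unit_name, text in filter_journal(SAMPLE):
    print(unit_name, "|", text)
```

On a lab host the same function could be fed from the output of journalctl -b -o json instead of the sample list.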

Unit 3: Linux kernel basics for Proxmox engineers

Topics:

  • what matters operationally about the kernel
  • modules, drivers and hardware enablement
  • kernel versions and compatibility concerns
  • interpreting kernel-related symptoms
  • when to suspect kernel issues vs service or config issues

Lab ideas:

  • inspect loaded modules
  • compare behaviour across kernel versions in a safe test case
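
Illustrative sketch for the module-inspection lab: /proc/modules lists one loaded module per line (name, size, reference count, dependent modules, state, address), which is the same data lsmod presents. The parser below is a minimal sketch; the names Module and parse_modules and the sample text with its sizes are hypothetical.

```python
from typing import List, NamedTuple


class Module(NamedTuple):
    name: str
    size: int
    refcount: int
    used_by: List[str]


def parse_modules(text):
    """Parse text in /proc/modules format into Module records."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip anything that does not look like a module line
        name, size, refcount, deps = fields[:4]
        # "-" means no other module uses this one; otherwise the field is a
        # comma-separated list with a trailing comma.
        used_by = [] if deps == "-" else [d for d in deps.split(",") if d]
        # Assumption: treat a non-numeric reference count as 0.
        count = int(refcount) if refcount.isdigit() else 0
        modules.append(Module(name, int(size), count, used_by))
    return modules


# Hypothetical sample in /proc/modules format.
SAMPLE = """\
zfs 5873664 6 - Live 0x0000000000000000
spl 143360 1 zfs, Live 0x0000000000000000
vfio_pci 16384 0 - Live 0x0000000000000000
"""

for mod in parse_modules(SAMPLE):
    print(mod.name, mod.refcount, mod.used_by)
```

On a real host the input would come from reading /proc/modules rather than from the sample string.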

Unit 4: Minor upgrades — planning and execution

Topics:

  • what counts as a minor upgrade
  • reading release notes and known issues
  • compatibility checks
  • maintenance windows
  • rollback thinking and pre-change validation
  • risk assessment and stakeholder communication

Lab ideas:

  • build a minor-upgrade checklist
  • assess a fictional environment and decide whether to proceed

Unit 5: Major upgrades — strategy, risk and rebuild-led approach

Topics:

  • what counts as a major upgrade
  • when to upgrade and when to defer
  • unsupported or high-risk scenarios
  • why clean rebuilds are often preferable
  • backup of config and operational metadata
  • reinstall and restore approach
  • validation and post-upgrade testing

Lab ideas:

  • draft a major-upgrade runbook
  • compare in-place thinking vs rebuild-and-restore thinking
  • identify critical pre-upgrade artefacts to back up

Unit 6: Emergency mode and boot recovery

Topics:

  • what emergency mode is
  • typical causes
  • mount failures and dependency failures
  • /etc/fstab errors and recovery workflow
  • safely editing mount configuration during recovery
  • verifying recovery before returning to service

Lab ideas:

  • break an fstab entry in a lab
  • recover the host from emergency mode
  • validate restored boot behaviour
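
Illustrative sketch for the fstab recovery lab: a local mount that is listed in /etc/fstab without nofail or noauto is normally required at boot, so a missing device can drop the host into emergency mode. The checker below flags such entries and lines with an unexpected field count. It is a minimal sketch under simplifying assumptions (whitespace-separated fields, no escaped spaces); the function name check_fstab and the sample entries are hypothetical.

```python
def check_fstab(text):
    """Return a list of (line_number, finding) for risky fstab entries."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumption for this sketch: device, mount point, type and options
        # are required; the dump and pass fields are optional.
        if len(fields) < 4 or len(fields) > 6:
            findings.append(
                (lineno, "malformed: expected 4 to 6 fields, got %d" % len(fields))
            )
            continue
        _spec, mountpoint, fstype, options = fields[:4]
        opts = options.split(",")
        if mountpoint == "/" or fstype == "swap":
            continue  # the root filesystem and swap are out of scope here
        if "nofail" not in opts and "noauto" not in opts:
            findings.append(
                (lineno, "%s has neither nofail nor noauto; "
                         "a missing device can block boot" % mountpoint)
            )
    return findings


# Hypothetical fstab content for a lab host.
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
/dev/pve/swap none swap sw 0 0
UUID=0000-hypothetical /mnt/backup ext4 defaults 0 2
/dev/sdb1 /mnt/scratch ext4
"""

for lineno, finding in check_fstab(SAMPLE):
    print("line %d: %s" % (lineno, finding))
```

A check like this supplements, and does not replace, verifying the edited file on the host before rebooting.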

Unit 7: Mounting and repairing VM disks offline

Topics:

  • raw vs qcow2 formats
  • loop devices and partition mapping
  • safe mounting of guest disks
  • filesystem checks and repair workflow
  • preserving evidence and minimising risk during recovery
  • common mistakes when attaching guest disks to the host

Lab ideas:

  • mount a qcow2 image read-only
  • mount a raw disk image
  • recover a file from an offline guest disk
  • perform a safe filesystem check in a lab scenario
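
Illustrative sketch for the raw vs qcow2 topic: a qcow2 image starts with the magic bytes "QFI\xfb", followed by a big-endian version number, and stores the virtual disk size at byte offset 24. A raw image has no header at all, so it cannot be positively identified this way. The function name probe_header and the sample header are hypothetical; on a real host, qemu-img info remains the tool to trust.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def probe_header(header):
    """Classify an image from its first 32 bytes."""
    if len(header) >= 32 and header[:4] == QCOW2_MAGIC:
        # All qcow2 header fields are big-endian.
        version = struct.unpack(">I", header[4:8])[0]
        virtual_size = struct.unpack(">Q", header[24:32])[0]
        return {"format": "qcow2", "version": version,
                "virtual_size": virtual_size}
    # No magic found: may be raw or another format, so stay non-committal.
    return {"format": "raw-or-unknown"}


def probe_image(path):
    """Read only the header, so the image itself is never modified."""
    with open(path, "rb") as f:
        return probe_header(f.read(32))


# Hypothetical header: version 3, cluster_bits 16, 10 GiB virtual size.
sample = QCOW2_MAGIC + struct.pack(">IQIIQ", 3, 0, 0, 16, 10 * 1024**3)
print(probe_header(sample))
print(probe_header(b"\x00" * 32))
```

Opening the file in read-only binary mode fits the evidence-preserving approach of this unit, since nothing is written to the guest disk.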

Unit 8: Working with block-backed storage during hardware failure

Topics:

  • understanding guests stored on block-backed storage
  • recovery options when physical media fails
  • copying guest disks from block-backed storage to directory-based storage
  • staging repair or migration work
  • restoring back to block-backed storage after recovery
  • operational cautions during disk swaps and storage migration

Operational framing: recovering and migrating guests on LVM/LVM-thin or other block-backed storage.

Lab ideas:

  • move a guest disk from block-backed storage to file-backed storage
  • inspect the disk offline
  • migrate it back after repair
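
Illustrative sketch for the block-to-file copy lab: one possible approach, assuming the guest disk is an LVM logical volume holding raw data, is to copy it with qemu-img convert into a qcow2 or raw file on directory-based storage. The helper below only builds and prints the command so it can be reviewed before anything runs. The function name build_copy_command, the volume group pve, the volume vm-100-disk-0 and the directory /mnt/rescue are all hypothetical.

```python
import shlex


def build_copy_command(vg, lv, dest_dir, out_format="qcow2"):
    """Build (but do not run) a qemu-img command that copies an LV to a file."""
    if out_format not in ("qcow2", "raw"):
        raise ValueError("unsupported output format: %s" % out_format)
    # Assumption: the logical volume is exposed as /dev/<vg>/<lv>.
    source = "/dev/%s/%s" % (vg, lv)
    dest = "%s/%s.%s" % (dest_dir.rstrip("/"), lv, out_format)
    # -p shows progress, -f is the source format, -O the output format.
    return ["qemu-img", "convert", "-p", "-f", "raw", "-O", out_format,
            source, dest]


cmd = build_copy_command("pve", "vm-100-disk-0", "/mnt/rescue")
print(shlex.join(cmd))
```

Printing the command first supports the staging approach of this unit: the guest should be stopped and the source and destination confirmed before any copy is started.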

Assessment

Scenario-based exercise

  • host boots to emergency mode after storage changes
  • learners diagnose, repair and document actions

Written risk assessment

  • learners decide whether a production environment is suitable for a minor or a major upgrade

Advanced recovery - Safer upgrade planning - Stronger host-level troubleshooting

Designed for engineers who need to recover, assess and operate Proxmox under pressure