Transactional Debian Upgrades with ZFS on Nexenta
The Problem explained
No software is perfect; it always has bugs. Minor, major, or security issues will always exist, and modern operating systems need to deal with this fact.
"Today, engineers can design and test something as complex as a Boeing 777 in cyberspace. But paradoxically, that's not possible with big software programs. The physical laws governing how metal behaves when shaped into a plane and hurled through the air are well known. For software, there is no such body of basic science." (www.businessweek.com)
The Solution
What if any software a user installs could be rolled back to a previously known good state, and the rollback itself took almost no time?
What if every developer or user had a tool that could checkpoint the operating system and revert changes almost instantly?
This is possible if we marry two great technologies: ZFS and Debian APT. Both are now part of the Nexenta Operating System, which is the core foundation for its derivative distributions.
Meet apt-clone(8), a tool that integrates with the NexentaCP system, keeps track of upgrade checkpoints, and lets you create, destroy, and edit checkpoints on request.
Example #1. Recovering from an unsuccessful upgrade
In this first example we will try to bring the system up to date with the official 'unstable' repository. The unstable repository might break things, but sometimes it has fixes or features that outweigh the risk of upgrading. An upgrade always involves some risk, even for well-tested software. With Nexenta's integrated ZFS capabilities and the apt-clone utility, the risk is minimal: checkpoint the system before the upgrade and roll it back if the upgrade fails or for any other reason.
Initially, a NexentaCP 1.0 system has the following ZFS layout:
root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -
where "syspool/rootfs-nmu-000" is the bootable ZFS dataset. Now, let's do an upgrade from the unstable repository using apt-clone:
root@myhost:/export/home/erast/apt# apt-clone dist-upgrade
This operation will upgrade your system using ZFS capabilities.
Proceed ? (y/n) y
Updating APT sources ...
Downloading upgrades and checking if reboot will be required.
This may take a few minutes. Please wait...
Verifying free space...
Success. Upgrade requires 279.60MB of available free space.
This upgrade will require REBOOT. Proceed? (y/n) y
Upgrade is in progress. Please DO NOT interrupt...
Creating Upgrade Checkpoint...
Upgrade Checkpoint has been created: rootfs-nmu-001
Use 'zfs list -r syspool' command to list all available upgrade/rollback checkpoints
Extracting templates from packages: 100%
Preconfiguring packages ...
(Reading database ... 40453 files and directories currently installed.)
Preparing to replace nexenta-sunw 5.11.79-1 (using .../nexenta-sunw_5.11.80-1_solaris-i386.deb) ...
Unpacking replacement nexenta-sunw ...
Setting up nexenta-sunw (5.11.80-1) ...
(Reading database ... 40453 files and directories currently installed.)
Preparing to replace nexenta-lu 5.11.79-1 (using .../nexenta-lu_5.11.80-1_solaris-i386.deb) ...
Initiating NLU protected environment ...
Unpacking replacement nexenta-lu ...
Setting up nexenta-lu (5.11.80-1) ...
...
...
Unpacking replacement sunwwbsup ...
Preparing to replace sunwesu 5.11.79-1 (using .../sunwesu_5.11.80-1_solaris-i386.deb) ...
Unpacking replacement sunwesu ...
Errors were encountered while processing:
/var/cache/apt/archives/sunwscmu_5.11.80-1_solaris-i386.deb
/var/cache/apt/archives/sunwsmbsu_5.11.80-1_solaris-i386.deb
***********************************************************************
* *
* Upgrade sequence returned an error. To enter NLU protected *
* environment please type 'source /tmp/nlubin/env.sh' *
* *
***********************************************************************
E: Sub-process /usr/bin/dpkg returned an error code (1)
Upgrade failed. Would you like to rollback changes now? (y/n) y
All upgrade changes now rolled back.
And indeed, the root filesystem did not change at all:
root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -
We are back at the starting point: a few minutes spent on the upgrade, just a few seconds on the rollback, and the system is in exactly its previous state with no reboot required. Not bad!
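The checkpoint-and-discard cycle in this example can be approximated with plain ZFS commands. The sketch below is an assumption about the general technique (snapshot, then clone), inferred from dataset names such as 'syspool/rootfs-nmu-000@nmu-001' that appear in the listings of this article; it is not apt-clone's actual implementation. The function names checkpoint_plan and discard_plan are hypothetical, and both functions only print the commands, so running the sketch changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the ZFS commands for a checkpoint.
# Assumption: a checkpoint is a snapshot of the boot dataset plus a clone of it.

# checkpoint_plan POOL FS TAG CLONE
checkpoint_plan() {
    pool=$1 fs=$2 tag=$3 clone=$4
    echo "zfs snapshot $pool/$fs@$tag"
    echo "zfs clone $pool/$fs@$tag $pool/$clone"
}

# discard_plan POOL FS TAG CLONE
# The clone must go first: a snapshot cannot be destroyed while a clone
# still depends on it.
discard_plan() {
    pool=$1 fs=$2 tag=$3 clone=$4
    echo "zfs destroy $pool/$clone"
    echo "zfs destroy $pool/$fs@$tag"
}

checkpoint_plan syspool rootfs-nmu-000 nmu-001 rootfs-nmu-001
discard_plan syspool rootfs-nmu-000 nmu-001 rootfs-nmu-001
```

A clone initially shares all of its blocks with the snapshot it was made from, so creating or destroying one copies no data. That would explain why the rollback above takes only seconds.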
Example #2. Successful upgrade, then reverting to the previous state
Sometimes we want to go back in time to a previously known state even if a software upgrade succeeded. Let's see how this can be done.
This time - a successful upgrade:
...
Setting up sunwsshdr (5.11.80-1) ...
Setting up class: sshdconfig /etc/ssh/sshd_config
Setting up class: manifest /var/svc/manifest/network/ssh.xml
Setting up sunwsshdu (5.11.80-1) ...
Setting up sunwwbsup (5.11.80-1) ...
Setting up sunwesu (5.11.80-1) ...
Creating ram disk for /tmp/upgrade-attempt.23979
updating /tmp/upgrade-attempt.23979/platform/i86pc/amd64/boot_archive
updating /tmp/upgrade-attempt.23979/platform/i86pc/boot_archive
* * *
SYSTEM NOTICE
The first phase of upgrade has completed successfully:
- created Upgrade Checkpoint 'rootfs-nmu-001'
- the system is ready to reboot into the new checkpoint
- all Zones been checkpointed and upgraded
+------------------------------------------------------------------+
| |
| At this point you have three options: |
| |
| 1. You can reboot now, make sure that system is healthy and |
| then activate the current (i.e., newly created) checkpoint. |
| |
| 2. You can activate the newly created (upgraded) checkpoint |
| right now, and then reboot. |
| |
| 3. Or, you can simply continue using the system as is and |
| do (1) or (2) later. |
| |
+------------------------------------------------------------------+
Would you like to follow the option (1) above and reboot now ? (y/n) y
Activate upgrade command: 'apt-clone -a rootfs-nmu-001'
Rollback changes command: 'apt-clone -r rootfs-nmu-001'
Operation in progress. Please wait...
After the machine has rebooted, let's see what has happened to the system pool:
root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.87G 1.67G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 1.67G 1.13G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 7.13M - 1.13G -
syspool/rootfs-nmu-001 514M 1.67G 1.17G legacy
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o   rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
  o rootfs-nmu-001 Upgrade Checkpoint [nmu-001 : Jan 17 04:09:32 2008]
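To pick the checkpoint datasets out of a listing like the one above, a small filter is enough. The sketch below assumes the 'rootfs-nmu-NNN' naming seen in this article and reads `zfs list` output on standard input; the function name list_checkpoints is hypothetical.

```shell
#!/bin/sh
# list_checkpoints: read `zfs list` output on stdin and print each
# checkpoint dataset (snapshots excluded) with its USED column.
list_checkpoints() {
    # The trailing $ skips snapshot lines such as rootfs-nmu-000@initial.
    awk '$1 ~ /\/rootfs-nmu-[0-9]+$/ { print $1, $2 }'
}

# Sample input copied from the listing above.
list_checkpoints <<'EOF'
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.87G 1.67G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 1.67G 1.13G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 7.13M - 1.13G -
syspool/rootfs-nmu-001 514M 1.67G 1.17G legacy
EOF
```

On a live system the same filter could be fed from 'zfs list -r syspool', the command that apt-clone itself suggests for listing checkpoints.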
Notice that the active checkpoint (column A) is 'rootfs-nmu-000', while 'rootfs-nmu-001' is the one currently loaded (column C). Suppose that for some reason (for example, you discovered that some software no longer behaves as it used to) we decide to roll back this upgrade. It is not possible to roll back the current checkpoint, which is understandable: we are currently using it and the dataset is locked:
root@myhost:/export/home/erast/apt# apt-clone -r rootfs-nmu-001
This will destroy clone 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y
apt-clone.WrongArguments: Can not destroy currently active system folder
So we reboot and select the previous checkpoint from the GRUB menu.
After the reboot, notice that the current bootfs is 'rootfs-nmu-000':
root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
rootfs-nmu-001 Upgrade Checkpoint [nmu-001 : Jan 17 04:09:32 2008]
Now let's revert the previous upgrade:
root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -r rootfs-nmu-001
This will destroy clone 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y
Upgrade changes for clone 'syspool/rootfs-nmu-001' now rolled back/destroyed.
root@myhost:/export/home/erast/apt/apt-0.6.46.4nexenta12# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
As you can see, the 'rootfs-nmu-001' dataset has been deleted, and we simply continue to work with the unmodified system:
root@myhost:/export/home/erast/apt# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.36G 2.18G 23K none
syspool/rootfs-nmu-000 1.36G 2.18G 1.14G legacy
syspool/rootfs-nmu-000@initial 226M - 786M -
Example #3. Installing applications under ZFS supervision
Let's assume we have an application we want to deploy, but the changes involved could be so intrusive that our system ends up unusable even after the software is removed. Think of the Windows registry or UNIX's /etc directory, for example. Whatever the system, it sometimes happens that there is no way back short of a complete OS re-installation (unless you have a manual backup).
In this example, let's install Apache:
root@myhost:/export/home/erast# apt-clone install apache2
This operation will upgrade your system using ZFS capabilities.
Proceed ? (y/n) y
Updating APT sources ...
Downloading upgrades and checking if reboot will be required.
This may take a few minutes. Please wait...
Verifying free space...
Success. Upgrade requires 4.05MB of available free space.
Upgrade is in progress. Please DO NOT interrupt...
Creating Rollback Checkpoint...
Rollback Checkpoint has been created: rootfs-nmu-001
Use 'zfs list -r syspool' command to list all available upgrade/rollback checkpoints
Preconfiguring packages ...
Selecting previously deselected package libapr0.
(Reading database ... 40408 files and directories currently installed.)
...
Setting up apache2-mpm-worker (2.0.55-4nexenta2.3) ...
Starting apache 2.0 web server....
Setting up apache2 (2.0.55-4nexenta2.3) ...
It is now installed. Let's assume that we modified the system pool to make apache2 work with a self-compiled PHP, changed some other aspects of the system, and then suddenly decided that apache2 is not what we wanted and that we had better go back to the apache1 setup...
Let's see if the rollback checkpoint was created:
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-000 Nexenta Core Platform "Elatte" [initial]
rootfs-nmu-001 Rollback Checkpoint [nmu-001 : Jan 17 17:54:45 2008]
root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.37G 2.17G 23.5K legacy
syspool/rootfs-nmu-000 1.37G 2.17G 1.14G legacy
syspool/rootfs-nmu-000@initial 234M - 786M -
syspool/rootfs-nmu-000@nmu-001 3.75M - 1.14G -
syspool/rootfs-nmu-001 77.5K 2.17G 1.14G legacy
Good, we see that checkpoint 'rootfs-nmu-001' was successfully created. Now let's activate it:
root@myhost:/export/home/erast# apt-clone -a rootfs-nmu-001
This will set default GRUB entry to 'syspool/rootfs-nmu-001'. Proceed ? (y/n) y
Upgrade changes for clone 'syspool/rootfs-nmu-001' has been activated.
Default GRUB entry '0' will boot 'syspool/rootfs-nmu-001' ZFS clone.
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o   rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]
  o rootfs-nmu-000 Upgrade Checkpoint [nmu-000 : Dec 30 09:46:19 2007]
Yes, the checkpoint 'rootfs-nmu-001' is active now; however, we need to reboot to return to our previous state. This does not take long, and in a couple of minutes the system is back in the checkpointed state:
root@myhost:/export/home/erast# dpkg -l|grep apache2
That is, there is no apache2, and the system state is restored. However, checkpoint 'rootfs-nmu-001' is now both active and current:
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]
rootfs-nmu-000 Upgrade Checkpoint [nmu-000 : Dec 30 09:46:19 2007]
We can now safely destroy our previous checkpoint 'rootfs-nmu-000', or keep it if we want to return to this setup later. Let's destroy it to reclaim the space taken by the apache2 modifications:
root@myhost:/export/home/erast# apt-clone -r rootfs-nmu-000
This will destroy clone 'syspool/rootfs-nmu-000'. Proceed ? (y/n) y
Upgrade changes for clone 'syspool/rootfs-nmu-000' now rolled back/destroyed.
root@myhost:/export/home/erast# apt-clone -l
A C BOOTFS TITLE
o o rootfs-nmu-001 Nexenta Core Platform [nmu-001 : Jan 17 17:54:45 2008]
The system state is restored, but the layout has changed slightly:
root@myhost:/export/home/erast# zfs list
NAME USED AVAIL REFER MOUNTPOINT
syspool 1.37G 2.17G 23.5K legacy
syspool/rootfs-nmu-001 1.36G 2.17G 1.14G legacy
syspool/rootfs-nmu-001@initial 234M - 786M -
You can see that 'rootfs-nmu-000' was used by apt-clone to perform an in-place upgrade; that is why the checkpoint, which was activated later, is called a 'Rollback Checkpoint'. However, it was also possible to use an 'Upgrade Checkpoint' and install the apache2 server into a ZFS-cloned filesystem in a chroot environment using the '-s' option, like this:
apt-clone -s install apache2
In that case, the rollback procedure is not much different from what was explained in Examples #1 and #2.
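The changed layout at the end of Example #3, where 'rootfs-nmu-001' ends up owning the '@initial' snapshot, is consistent with how 'zfs promote' behaves: promoting a clone reverses the clone/origin dependency and moves the origin's older snapshots to the promoted clone. Whether apt-clone actually uses 'zfs promote' is an assumption, since the article does not say. The function name retire_origin_plan is hypothetical, and it only prints the commands without running them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the ZFS commands that would let a
# clone outlive the dataset it was cloned from.
# Assumption: apt-clone does something equivalent; the article does not say.

# retire_origin_plan POOL ORIGIN TAG CLONE
retire_origin_plan() {
    pool=$1 origin=$2 tag=$3 clone=$4
    # Reverse the dependency; snapshots up to @TAG now belong to CLONE.
    echo "zfs promote $pool/$clone"
    # The old origin is now itself a clone and can be destroyed.
    echo "zfs destroy $pool/$origin"
    # The snapshot that linked the two datasets is no longer needed.
    echo "zfs destroy $pool/$clone@$tag"
}

retire_origin_plan syspool rootfs-nmu-000 nmu-001 rootfs-nmu-001
```

After such a sequence only 'syspool/rootfs-nmu-001' and 'syspool/rootfs-nmu-001@initial' would remain, which matches the final listing of Example #3.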
Good Luck and happy checkpointing!
The End
To conclude, I would like to mention NexentaStor, a NexentaCP-based derivative that extends this idea by providing production-quality software upgrades integrated with its management software. Read more details here.