Patching a live Solaris 10 system with LU, ZFS, and PCA

Sun have done some work in recent times with liveupgrade – the last time I looked at it, a few years back now, it was rubbish. I thought it was about time I took another look, since a lot of the updates in OpenSolaris were looking good.

The idea was to patch a Solaris 10 update 8 (10/09) machine to the most recent patch levels whilst the machine was still up and running, going about its daily business. Other than the standard Solaris tools, I’d be using pca (Patch Check Advanced) to do the actual patching. The system was installed with a ZFS root, since this gets us some great features in Live Upgrade (LU), namely ZFS snapshots and clones as boot environments (BEs).

First off, create a BE that will be patched:

solaris:~# lucreate -n patching
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Cloning file systems from boot environment to create boot environment .
Creating snapshot for on .
Creating clone for on .
Setting canmount=noauto for in zone on .
WARNING: split filesystem file system type cannot inherit
mount point options <-> from parent filesystem file
type <-> because the two file systems have different types.
Saving existing file in top level dataset for BE as //boot/grub/menu.lst.prev.
File propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE in GRUB menu
Population of boot environment successful.
Creation of boot environment successful.
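
A commenter below suggests giving each BE a date-stamped name (for example patched20100623) instead of a fixed one, so that every patch run leaves a distinct place to roll back to. A minimal sketch of that idea follows. The function name dated_be_name is made up for illustration, and it assumes a POSIX-style shell such as bash or ksh rather than the legacy /bin/sh.

```shell
# Build a BE name such as "patched20100623".
# With no argument, today's date (YYYYMMDD) is used.
dated_be_name() {
    stamp=${1:-$(date +%Y%m%d)}
    printf 'patched%s\n' "$stamp"
}

# Usage on a live system (not run here):
#   lucreate -n "$(dated_be_name)"
```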

If we now take a look at the ZFS filesystems we can see the ‘patching’ snapshot, and the clone created from it…

solaris:~# zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
rpool                        2.08G  5.73G    33K  /rpool
rpool/ROOT                   1.03G  5.73G    21K  legacy
rpool/ROOT/install           1.03G  5.73G  1.03G  /
rpool/ROOT/install@patching  59.5K      -  1.03G  -
rpool/ROOT/patching           120K  5.73G  1.03G  /
rpool/dump                    560M  5.73G   560M  -
rpool/export                   44K  5.73G    23K  /export
rpool/export/home              21K  5.73G    21K  /export/home
rpool/swap                    512M  6.14G   100M  -
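
One thing worth checking in a listing like this is that data which changes day to day lives in its own dataset outside rpool/ROOT, as /export and /export/home do here. Anything inside the root dataset is cloned at the moment the BE is created, and one of the comments below describes a week of /home changes not showing up in the new BE for exactly that reason. The sketch below is only an illustration: has_own_dataset is a hypothetical helper name, it assumes a POSIX-style shell (bash or ksh), and it reads the tab-separated output of zfs list -H -o name,mountpoint on stdin.

```shell
# Succeed if some dataset outside <pool>/ROOT is mounted at the given path.
# Reads "name<TAB>mountpoint" lines on stdin, as printed by:
#   zfs list -H -o name,mountpoint
has_own_dataset() {
    path=$1
    found=1
    while read -r name mountpoint; do
        case $name in
            */ROOT|*/ROOT/*) continue ;;   # BE datasets are cloned, skip them
        esac
        if [ "$mountpoint" = "$path" ]; then
            found=0
        fi
    done
    return "$found"
}

# Usage on a live system (not run here):
#   zfs list -H -o name,mountpoint | has_own_dataset /export/home \
#       || echo "/export/home is part of the root dataset"
```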

Let’s see what lustatus shows us now…

solaris:~# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
install                    yes      yes    yes       no     -
patching                   yes      no     no        yes    -

So we have two boot environments.

LU has a nice feature of letting you mount a BE to do ‘work’ on it. Let’s see what’s mounted, and then mount our newly created ‘patching’ BE…

solaris:~# lumount
install on /
solaris:~# lumount patching
/.alt.patching
solaris:~# lumount
install on /
patching on /.alt.patching
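
If you want to script this step, the lumount listing is easy to parse. Here is a small sketch that looks up where a BE is mounted, assuming the "name on mountpoint" line format shown above and a POSIX-style shell (bash or ksh). The function name be_mountpoint is my own invention, not part of LU.

```shell
# Print the mount point of a mounted BE, reading `lumount` output on stdin.
# Expects lines in the form shown above: "<be-name> on <mount-point>".
be_mountpoint() {
    wanted=$1
    while read -r name on mountpoint; do
        if [ "$name" = "$wanted" ] && [ "$on" = "on" ]; then
            printf '%s\n' "$mountpoint"
            return 0
        fi
    done
    return 1    # the BE is not mounted
}

# Usage on a live system (not run here):
#   altroot=$(lumount | be_mountpoint patching) || altroot=$(lumount patching)
```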

Now that our alternate boot environment is mounted at /.alt.patching, we can go ahead and patch it. pca supports patching an alternate root with the -R switch, much like the Solaris packaging tools…

solaris:~# pca -i -R /.alt.patching
[snip]
------------------------------------------------------------------------------
141505 04 < 07 RS- 28 SunOS 5.10_x86: ipf patch
Looking for 141505-07 (29/84)
Trying SunSolve
Trying https://sunsolve.sun.com/ (1/1)
Done
Installing 141505-07 (29/84)
Unzipping patch
Running patchadd
Done
Reboot recommended
------------------------------------------------------------------------------
[snip]
------------------------------------------------------------------------------
Download Summary: 84 total, 84 successful, 0 skipped, 0 failed
Install Summary : 84 total, 84 successful, 0 skipped, 0 failed

This could take a while.
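
Since the run is long, it is tempting to walk away and check the result later. The sketch below checks the "Install Summary" line of a saved pca log for failed patches. It assumes the summary format shown above and a POSIX-style shell (bash or ksh); pca_install_ok and the log path in the usage comment are hypothetical names.

```shell
# Succeed only if pca's "Install Summary" line reports 0 failed patches.
# Reads pca output on stdin; the line format is taken from the run above.
pca_install_ok() {
    failed=""
    while read -r line; do
        case $line in
            "Install Summary"*)
                # Keep the number that sits just before " failed".
                failed=${line% failed*}
                failed=${failed##* }
                ;;
        esac
    done
    [ "$failed" = "0" ]
}

# Usage on a live system (not run here):
#   pca -i -R /.alt.patching 2>&1 | tee /var/tmp/pca-patching.log
#   pca_install_ok < /var/tmp/pca-patching.log || echo "some patches failed"
```

A missing summary line is treated as a failure, on the basis that an interrupted run should not look like a clean one.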

When the patching is complete, unmount the BE and set it to be the active one on the next reboot…

solaris:~# luumount patching
solaris:~# lumount
install on /
solaris:~# luactivate patching
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE
Saving existing file in top level dataset for BE as //etc/bootsign.prev.
A Live Upgrade Sync operation will be performed on startup of boot environment.
Generating boot-sign for ABE
Saving existing file in top level dataset for BE as //etc/bootsign.prev.
Generating partition and slice information for ABE
Copied boot menu from top level dataset.
Generating multiboot menu entries for PBE.
Generating multiboot menu entries for ABE.
Disabling splashimage
Re-enabling splashimage
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.
**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:
1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.
2. Mount the Parent boot environment root slice to some directory (like /mnt). You can use the following command to mount:
mount -Fzfs /dev/dsk/c0d0s0 /mnt
3. Run utility with out any arguments from the Parent boot
environment root slice, as shown below:
/mnt/sbin/luactivate
4. luactivate, activates the previous working boot environment and
indicates the result.
5. Exit Single User mode and reboot the machine.
**********************************************************************
Modifying boot archive service
Propagating findroot GRUB for menu conversion.
File propagation successful
File propagation successful
File propagation successful
File propagation successful
Deleting stale GRUB loader from all BEs.
File deletion successful
File deletion successful
File deletion successful
Activation of boot environment successful.

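Notice the message about how to fall back to the old BE should the boot fail. Personally, I keep a copy of that notice to hand, just in case; I find Evernote particularly handy for this. On a ZFS root the mount command in step 2 may not work as printed; a commenter below quotes a ZFS-specific version of these steps.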
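
To make keeping that copy less of a manual job, something like the following sketch could pull the fallback section out of a saved luactivate log. It relies on the layout shown above (the section starts at "In case of a failure" and ends at the next row of asterisks) and on a POSIX-style shell such as bash or ksh. fallback_notice and the file paths in the usage comment are hypothetical.

```shell
# Print only the fallback instructions from luactivate output read on stdin:
# everything from "In case of a failure" up to the next line of asterisks.
fallback_notice() {
    state=before
    while IFS= read -r line; do
        case $state:$line in
            before:"In case of a failure"*) state=inside ;;
            inside:"****"*)                 state=after ;;
        esac
        if [ "$state" = inside ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Usage on a live system (not run here):
#   luactivate patching 2>&1 | tee /var/tmp/luactivate-patching.log
#   fallback_notice < /var/tmp/luactivate-patching.log > /var/tmp/fallback.txt
```

The function reads its input to the end instead of stopping early, so it is safe to use on a saved log or at the end of a pipeline.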

So if we now look at lustatus, we can see our patching BE is the one that will be active on reboot…

solaris:~# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
install                    yes      yes    no        no     -
patching                   yes      no     yes       no     -
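
For a scripted sanity check before rebooting, the "Active On Reboot" column can be read straight out of lustatus. This is a rough sketch that assumes the six-column layout shown above and a POSIX-style shell (bash or ksh); be_active_on_reboot is a made-up helper name.

```shell
# Print the name of the BE marked "yes" in the "Active On Reboot" column.
# Reads `lustatus` output on stdin. The header and separator lines never
# have "yes" in the fourth field, so they are skipped naturally.
be_active_on_reboot() {
    while read -r name complete now on_reboot rest; do
        if [ "$on_reboot" = "yes" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Usage on a live system (not run here):
#   [ "$(lustatus | be_active_on_reboot)" = "patching" ] && init 6
```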

So let’s go ahead and reboot at a time that suits us, using init or shutdown as the notice above instructs. When the system comes back up we can see ‘patching’ is now the active BE…

solaris:~# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
install                    yes      no     no        yes    -
patching                   yes      yes    yes       no     -

And pca shows us there are no patches to be applied, so we’re up to date…

solaris:~# pca -l
Using /var/tmp/patchdiag.xref from Mar/02/10
Host: solaris (SunOS 5.10/Generic_142901-05/i386/i86pc)
List: missing (0/0)

zfs list shows us that the patching snapshot is now using up space too…

solaris:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool                         2.82G  5.00G  36.5K  /rpool
rpool/ROOT                    1.77G  5.00G    21K  legacy
rpool/ROOT/install            31.3M  5.00G  1.05G  /
rpool/ROOT/patching           1.74G  5.00G  1.37G  /
rpool/ROOT/patching@patching   377M      -  1.03G  -
rpool/dump                     560M  5.00G   560M  -
rpool/export                    44K  5.00G    23K  /export
rpool/export/home               21K  5.00G    21K  /export/home
rpool/swap                     512M  5.40G   100M  -

Using a recent Solaris 10 with ZFS root, LU and pca gives us a very realistic way of patching systems in a working production environment, with downtime limited to a single reboot and with a workable rollback strategy.
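
Pulling the steps together, the whole run could be wrapped up roughly as follows. Treat this as a sketch under a few assumptions, not a tested tool: it assumes a POSIX-style shell (bash or ksh), that the BE mounts at /.alt.<name> as it did above, and it defaults to a dry run that only prints the commands. The names run, patch_be and activate_be are my own. How pca signals failure through its exit status is not something this post shows, so check its summary as well.

```shell
# Dry run by default: print commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create, mount, patch and unmount a new BE.
patch_be() {
    be=$1
    altroot="/.alt.$be"          # assumed mount point, as seen with lumount above
    run lucreate -n "$be" || return 1
    run lumount "$be"     || return 1
    run pca -i -R "$altroot"
    pca_status=$?                # assumption: non-zero means pca had a problem
    run luumount "$be"    || return 1
    return "$pca_status"
}

# Activate the patched BE; the reboot itself is left to the admin.
activate_be() {
    run luactivate "$1" || return 1
    printf 'Reboot with "init 6" or shutdown, not reboot or halt.\n'
}

# Usage (prints the commands only, because DRY_RUN defaults to 1):
#   patch_be patching && activate_be patching
```

Keeping activation separate from patching means a failed or partial patch run never ends up as the default boot target by accident.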

Now, if I had time to write something to centralise this for many hosts, it would make a fantastic enterprise setup :)

12 Comments.

  1. This is fantastic! My only question is, what happens next? You’re now booting into the snapshot, “patching,” forever? How about the next time you need to patch?

    -James

    • Hi James,

      You can delete the original BE and even rename this one if you choose. See man ludelete and lurename.

      Cheers! –Mark

  2. Thanks Mark!

    I think the best use of this, for me, would be to have my first step be something like this:

    lucreate -n patched20100623

    Then if I patch next month, have it be:
    lucreate -n patched20100723

    and so on, so that I have specific places to roll back to. Does this make any sense? We have plenty of disk space, so I’m not worried about having ZFS snapshots pile up. I was just a little confused about permanently booting off a snapshot. ZFS is a little new to us, and I thought I had it all figured out, but this struck me as a little funny.

    Bottom line, though, is that it’s perfectly fine to leave it booting to that “patching” snapshot? And the next time I do the same thing it will create a new patching snapshot from the one I just patched?

    -James

    • Yep, perfectly sensible way of working James. The beauty of snapshots is they don’t take up much space either.

  3. Thanks for the snappy replies, Mark! I’ll be reading through your other blog entries as you seem to be doing a lot of the same tasks as me.

    thanks again for your help!
    -james

  4. My only problem now is that I’m getting “Error 403” on 297 of the patches that I need. I have two contracts listed in the SunSolve page that I added this afternoon. One of them, I think, is expired. I added that one first. Now it’s telling me that I have access to:

    Solaris9SoftwareUpdates
    Solaris10SoftwareUpdates
    ContractRequired
    OpenSolarisProductionPackage
    Solaris8SoftwareUpdates
    HardwareUpdates
    SolarisSoftwareUpdates
    Public

    but I’m still getting 403. (I got 7, I think, successful the first time I ran it. These 297 were the ones that 403’ed the first time.) Have any idea if it takes some time for Sun to update everything? The machine I’m working on does NOT have a contract, by the way. It’s our test machine (prepping for the patching on the contracted machine tomorrow). Perhaps SunSolve knows the machine’s serial? I know this isn’t your area, but figured I’d ask in case you had any experience with it.

    -James

    • Unfortunately I can’t help you there James, sorry. I’m always working on machines with an enterprise subscription to SunSolve (lucky me).

  5. dariusz dolecki

    I am running Solaris 10 with ZFS mirrored drives. I tried to patch the drives but had some issues, so I followed the instructions here. I booted off the net into single user mode and tried to mount one of the drives to do an luactivate to get the previous BE, but I got the following error message:

    cannot open ‘/dev/dsk/c0t0d0s0’: invalid dataset name

    is this because the drives were mirrored in zfs?

  6. Someone suggested I follow the instructions they got when they did a liveupgrade:

    Hi Dariusz,

    Below are the messages I get after running luactivate, indicating how to boot back to the old BE, my old BE was called sol10u8, so substitute your BE name:

    ———————-
    In case of a failure while booting to the target BE, the following process
    needs to be followed to fallback to the currently working boot environment:

    1. Boot from the Solaris failsafe or boot in Single User mode from Solaris
    Install CD or Network.

    2. Mount the Parent boot environment root slice to some directory (like
    /mnt). You can use the following commands in sequence to mount the BE:

    zpool import rpool
    zfs inherit -r mountpoint rpool/ROOT/sol10u8
    zfs set mountpoint= rpool/ROOT/sol10u8
    zfs mount rpool/ROOT/sol10u8

    3. Run utility with out any arguments from the Parent boot
    environment root slice, as shown below:

    /sbin/luactivate

    4. luactivate, activates the previous working boot environment and
    indicates the result.

    5. Exit Single User mode and reboot the machine.

    ————————-

    Now, I tried those instructions, but got the following error message:

    cd /mnt/usr/sbin
    ./luactivate
    luactivate: The system is in single user mode.

    Welcome to the Live Upgrade out of service administration.
    Activating the previous working boot environment.

    *************************************************************
    Unable to activate the previous working boot environment. You have not
    mounted the correct root slice containing the previous working boot
    environment to be activated. Please mount the proper root slice and
    re-invoke luactivate
    *****************************************************************

    Fallback activation failed.

  7. Hi Mark

    I tried this way of patching and everything worked fine. I patched the ABE, and a week later I got downtime for my server, activated the ABE and rebooted. We lost all the files in /home that were created in those 7 days. All my file systems are under the ZFS rpool. As far as I know, /home and /export are shared file systems between the ABE and the current BE (CBE). No clue why we are missing the recently generated files in /home.

    rpool 111G 22.3G 97K /rpool
    rpool/ROOT 92.1G 22.3G 21K legacy
    rpool/ROOT/sol10u6_ZFS_BE 16.6G 22.3G 85.2G /
    rpool/ROOT/sol10u9_ZFS_BE 75.5G 22.3G 74.6G /.alt.tmp.b-ZRg.nt
    rpool/ROOT/sol10u9_ZFS_BE@sol10u9_ZFS_BE 910M - 74.1G -
    rpool/dump 2.00G 22.3G 2.00G -
    rpool/swap 16.5G 38.8G 16K -

    my OS version is
    Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC
    Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
    Assembled 11 August 2010

    Any suggestions and help would be appreciated.

    • Erm, yes, fairly obvious really. The ABE is a clone, so you made a snapshot of a point in time. You rebooted one week later, so the changes made in between are not in the ABE; they exist only in the old BE. If /home and /export were separate ZFS file systems that’d be a different matter.
