One of the many benefits of UCS is the ease with which firmware upgrades and downgrades can be applied. UCS Manager simplifies this with an Auto-Install procedure that takes much of the manual firmware application out of the equation. UCS Firmware Auto-Install was introduced in UCS version 2.1(1) and includes two stages:
- Infrastructure Firmware – Uses the UCS software bundle to upgrade the infrastructure components such as the fabric interconnects, IOMs, and UCS Manager.
- Server Firmware – Uses the B-Series software bundle to upgrade all blade servers in the UCS domain and/or the C-Series software bundle to upgrade the rack servers.
One question I routinely get is what kind of pre-checks or “homework” should be done before a UCS firmware upgrade is attempted. Oddly enough, I haven’t found any one document that details all the checks that I have been using over the years. So, I thought I would attempt to document the “sanity” checks I use prior to a UCS firmware upgrade. These are in no particular order, but they are definitely steps that I have learned to love over the years.
- How to determine which Firmware version to go to:
The version of firmware is usually dictated by trying to resolve a defect, taking advantage of a new feature, or needing to support new hardware.
A Cisco UCS customer can log into support.cisco.com with their CCO ID and go to the UCS firmware page. Here you will find a list of all available firmware. Cisco has what they call “Suggested” releases. These releases have stars by them, which means that Cisco recommends them for software quality, stability, and longevity.
- What to do after a Firmware version is selected:
Always check the release notes on Cisco’s website for the version you are upgrading to. For this example, assume we are upgrading to version 2.2.
It’s important to understand the required minimum software version, because that is the lowest version you should upgrade to. This depends on what new hardware you are adding, such as a new blade type or VIC module. The 2.2 family of firmware has several sub-versions, and the release notes include a minimum version table. Always check it to ensure you are meeting the minimum requirements. For example, the insert below is from the 2.2(6e) release notes:
You also want to check the Capability Catalog to ensure that the server components are supported. The Capability Catalog is found in the release notes. Below is an example:
Lastly, I recommend checking the UCS firmware version against the UCS Interoperability Matrix. This will tell you whether there are additional requirements based on your operating system, etc. One example I have run into is needing to upgrade my enic and fnic drivers in VMware after a UCS firmware upgrade.
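The minimum-version comparison described above is easy to get wrong by eye, because a release such as 2.2(6e) does not sort correctly as a plain string. Below is a minimal Python sketch of that comparison. It assumes release strings always look like major.minor(maintenance plus optional patch letters); the function names and sample values are hypothetical, and the real minimum should come from the table in the release notes.

```python
import re

# Assumed shape of a UCS release string: major.minor(maintenance + patch letters),
# for example "2.2(6e)". Anything else is rejected rather than guessed at.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_ucs_version(text):
    """Turn a string such as '2.2(6e)' into a sortable tuple (2, 2, 6, 'e')."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized UCS version string: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


def meets_minimum(target, minimum):
    """Return True when the target release is at or above the minimum release."""
    return parse_ucs_version(target) >= parse_ucs_version(minimum)


# Hypothetical values for illustration only.
print(meets_minimum("2.2(6e)", "2.2(3a)"))
print(meets_minimum("2.1(1)", "2.2(1b)"))
```

Comparing tuples instead of raw strings keeps 2.2(10a) above 2.2(6e), which a plain string comparison would get backwards.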
- What about those pesky bugs?
Once again, the release notes are a great place to review resolved caveats and to see whether any open caveats remain. The release notes give you the Defect ID and a brief description. I always pay attention to the SEV 1 and 2 defects to ensure each is something I am aware of and can tolerate in my environment.
- Review Faults
Always log into UCS Manager and check any faults for severity and/or criticality. It is extremely important to address and remove (if possible) any critical or major alerts before completing a UCS firmware upgrade. During the upgrade procedure, Auto-Install will notify you if you have critical or major alerts.
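If you export the fault list from UCS Manager, the critical/major filter can be scripted. The sketch below is only an illustration: it assumes each fault is a plain dictionary with a "severity" field, and the fault IDs and descriptions are made up. It does not talk to UCS Manager itself.

```python
# Severities that should be cleared, where possible, before an upgrade.
BLOCKING_SEVERITIES = {"critical", "major"}


def blocking_faults(faults):
    """Return only the faults whose severity is critical or major."""
    return [
        fault
        for fault in faults
        if str(fault.get("severity", "")).lower() in BLOCKING_SEVERITIES
    ]


# Made-up sample data; real faults would come from your own export.
sample_faults = [
    {"id": "fault-1", "severity": "critical", "description": "example: uplink down"},
    {"id": "fault-2", "severity": "warning", "description": "example: high temperature"},
    {"id": "fault-3", "severity": "major", "description": "example: PSU failed"},
    {"id": "fault-4", "severity": "info", "description": "example: login event"},
]

for fault in blocking_faults(sample_faults):
    print(f"{fault['severity']:>8}  {fault['id']}  {fault['description']}")
```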
- Open a support case with TAC
I always call TAC and open a “pro-active” support case. If any questions or issues arise during the upgrade process, TAC is then available to jump in and help. Another advantage of being proactive is that TAC can run the UCS logs through a scrubbing process and help you determine whether any bugs or defects could prevent a successful upgrade.
- Run UCS Healthcheck
I am a big fan of this. Out on http://communities.cisco.com you will find several UCS Healthcheck PowerShell scripts that users have submitted. Most are very good and will give you an HTML-based report on the health of your UCS system. For customers that require change control before any firmware is upgraded, these reports are valuable documentation when requesting the change. In most cases they will identify any critical or health-related issues in a single one-stop location.
- Check the bootflash on the Fabric Interconnect
This one is something that Cisco has documented but that is sometimes overlooked. Among other things, the bootflash on the fabric interconnect contains the firmware packages that have been downloaded. Over time this can fill up, requiring user intervention to clean up any old or unused packages. You can check the status of the bootflash by going to Equipment > Fabric InterconnectX > Local Storage Information.
Firmware packages can be checked/deleted at – Equipment > Firmware Management > Packages. I recommend that you keep an N-1 package (in case rollback is needed) + the package you are running in production.
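The keep-N-1 rule above can be sketched in Python as follows. This is a hedged illustration, not a Cisco tool: it assumes version strings shaped like 2.2(6e), the function names are hypothetical, and the version list is sample data. It only flags releases older than N-1 and never flags anything newer than the running release, since that may be the package you just downloaded for the upgrade.

```python
import re


def version_key(version):
    """Sort key for strings shaped like '2.2(6e)' (an assumed format)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized UCS version string: {version!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


def removable_versions(downloaded, running):
    """List downloaded versions older than N-1, relative to the running version.

    The running version, the newest older version (N-1), and anything newer
    than the running version are all kept.
    """
    running_key = version_key(running)
    older = sorted(
        (v for v in set(downloaded) if version_key(v) < running_key),
        key=version_key,
    )
    return older[:-1]  # everything except the newest older release (N-1)


# Sample data for illustration only.
downloaded = ["2.1(3a)", "2.2(1b)", "2.2(3a)", "2.2(6e)"]
print(removable_versions(downloaded, running="2.2(3a)"))
```

Treat the result as a list of candidates to review in Equipment > Firmware Management > Packages, not as something to delete blindly.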
- Confirm your Maintenance Policy is set to User-Ack.
Go to Servers > Maintenance Policy and confirm that either the default is set to User-Ack or that you have a User-Ack policy defined and in use by your Service Profiles. Failure to do so could cause your blades to reboot without your confirmation. After all, one of the beauties of UCS is non-disruptive firmware upgrades. (By default with 2.2, User-Ack is set, but you still need to confirm.)
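With many service profiles, this check is easier as a quick script over exported data. The sketch below assumes two plain dictionaries that you would build yourself: maintenance policy name to reboot setting, and service profile name to the policy it references. The value "user-ack", the fallback to a policy named "default" when a profile references none, and all sample names are assumptions for illustration.

```python
def profiles_without_user_ack(policies, profiles):
    """Return service profiles whose effective maintenance policy is not user-ack.

    policies: dict of maintenance policy name -> reboot setting
    profiles: dict of service profile name -> referenced policy name ('' if none)
    """
    risky = []
    for profile_name, policy_name in sorted(profiles.items()):
        # Assumption: a profile with no explicit policy uses the one named "default".
        effective_policy = policy_name or "default"
        if policies.get(effective_policy) != "user-ack":
            risky.append(profile_name)
    return risky


# Hypothetical sample data.
policies = {"default": "immediate", "user-ack-policy": "user-ack"}
profiles = {"esx-01": "user-ack-policy", "esx-02": "", "sql-01": "default"}

print(profiles_without_user_ack(policies, profiles))
```

A profile that points at a policy missing from the policies dictionary is also reported, which errs on the side of caution.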
- Disable Call Home (Optional)
When you upgrade a Cisco UCS domain, Cisco UCS Manager restarts the components to complete the upgrade process. This restart causes events that are identical to service disruptions and component failures, and these trigger Call Home alerts. If you do not disable Call Home before you begin the upgrade, you can ignore the alerts generated by the upgrade-related component restarts. If you’re using Smart Call Home, then TAC will start getting alerts and calling into your environment. Remember that once you are done, you need to turn it back on :).
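If you script your upgrades, forgetting to turn Call Home back on is easy to guard against. The sketch below uses a Python context manager so that Call Home is re-enabled even if the upgrade step raises an error. The set_call_home callable is hypothetical and stands in for whatever mechanism you actually use to toggle Call Home. The sketch also assumes Call Home was enabled to begin with.

```python
from contextlib import contextmanager


@contextmanager
def call_home_disabled(set_call_home):
    """Disable Call Home for the duration of a block, then re-enable it.

    set_call_home is a hypothetical callable taking True (enable) or
    False (disable); supply your own implementation.
    """
    set_call_home(False)
    try:
        yield
    finally:
        # Runs on success and on failure, so Call Home is never left off.
        set_call_home(True)


# Stand-in implementation that only records the calls it receives.
state_log = []


def fake_set_call_home(enabled):
    state_log.append(enabled)


with call_home_disabled(fake_set_call_home):
    pass  # the firmware upgrade steps would run here

print(state_log)
```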
- Default Host Firmware Policy
After you upgrade Cisco UCS Manager, a new host firmware policy named “default” is created, and assigned to all service profiles that did not already include a host firmware policy. The default host firmware policy is blank. It does not contain any firmware entries for any components. This default policy is also configured for an immediate reboot rather than waiting for user acknowledgment before rebooting the servers. During the upgrade of server firmware, you can add firmware for the blade and rack mount servers in the Cisco UCS domain to the default host firmware policy. To complete the upgrade, all servers must be rebooted.
- Last but not least you can find detailed Auto-Install instructions at:
UCS firmware upgrades are meant to be simple and easy to do, and most customers I have talked to find them that way. Over time, however, you learn to check certain things, adjust procedures, etc. This is all meant to make the “next time” even better. The above steps are my personal opinion, based on experience, and are not reflective of Cisco or any customer.