Applies to SUSE OpenStack Cloud 7

11 SUSE OpenStack Cloud Maintenance

11.1 Keeping the Nodes Up-to-date

Keeping the nodes in SUSE OpenStack Cloud up-to-date requires an appropriate setup of the update and pool repositories and the deployment of either the Updater barclamp or the SUSE Manager barclamp. For details, see Section 5.2, “Update and Pool Repositories”, Section 9.4.1, “Deploying Node Updates with the Updater Barclamp”, and Section 9.4.2, “Configuring Node Updates with the SUSE Manager Client Barclamp”.

If one of these barclamps is deployed, patches are installed on the nodes. Installing patches that do not require a reboot of a node does not cause any service interruption. If a patch (for example, a kernel update) requires a reboot after installation, services running on the machine that is rebooted will not be available within SUSE OpenStack Cloud. Therefore it is strongly recommended to install such patches during a maintenance window.
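
To judge whether a maintenance window is needed, you can check after patching whether running processes on a node still use files that were replaced by an update. A minimal sketch using standard SUSE Linux Enterprise tooling (run on the node in question):

    # List processes that still use deleted (replaced) files after patching.
    # If system services or core libraries are affected, schedule a reboot.
    zypper ps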

Note
Note: No Maintenance Mode

As of SUSE OpenStack Cloud 7 it is not possible to put SUSE OpenStack Cloud into Maintenance Mode.

Consequences when Rebooting Nodes
Administration Server

While the Administration Server is offline, it is not possible to deploy new nodes. However, rebooting the Administration Server has no effect on starting instances or on instances already running.

Control Nodes

The consequences of rebooting a Control Node depend on the services running on that node:

Database, Keystone, RabbitMQ, Glance, Nova:  No new instances can be started.

Swift:  No object storage data is available. If Glance uses Swift, it will not be possible to start new instances.

Cinder, Ceph:  No block storage data is available.

Neutron:  No new instances can be started. On running instances the network will be unavailable.

Horizon:  Horizon will be unavailable. Starting and managing instances can still be done with the command line tools.

Compute Nodes

Whenever a Compute Node is rebooted, all instances running on that particular node will be shut down and must be restarted manually. Therefore it is recommended to evacuate the node by migrating its instances to another node before rebooting it.
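
A possible evacuation sequence with the nova command line client is sketched below. It assumes live-migration is configured and uses node1 as a placeholder host name:

    # Prevent new instances from being scheduled to the node
    nova service-disable node1 nova-compute
    # Live-migrate all instances off the node
    nova host-evacuate-live node1
    # After the node has been rebooted, re-enable it
    nova service-enable node1 nova-compute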

11.2 Service Order on SUSE OpenStack Cloud Start-up or Shutdown

In case you need to restart your complete SUSE OpenStack Cloud (after a complete shutdown or a power outage), the nodes and services need to be started in the following order:

Service Order on Start-up
  1. Control Node/Cluster on which the Database is deployed

  2. Control Node/Cluster on which RabbitMQ is deployed

  3. Control Node/Cluster on which Keystone is deployed

  4. For Swift:

    1. Storage Node on which the swift-storage role is deployed

    2. Storage Node on which the swift-proxy role is deployed

  5. For Ceph:

    1. Storage Node on which the ceph-mon role is deployed

    2. Storage Node on which the ceph-osd role is deployed

    3. Storage Node on which the ceph-radosgw and ceph-mds roles are deployed (if deployed on different nodes: in either order)

  6. Any remaining Control Node/Cluster. The following additional rules apply:

    • The Control Node/Cluster on which the neutron-server role is deployed needs to be started before starting the node/cluster on which the neutron-l3 role is deployed.

    • The Control Node/Cluster on which the nova-controller role is deployed needs to be started before starting the node/cluster on which Heat is deployed.

  7. Compute Nodes

If multiple roles are deployed on a single Control Node, the services are automatically started in the correct order on that node. If you have more than one node with multiple roles, make sure they are started as closely as possible to the order listed above.
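
To verify that one group of services is up before starting the next, check the respective node or cluster. A short sketch (service and resource names depend on your deployment):

    # On a non-clustered Control Node, query the service directly, for example:
    systemctl is-active postgresql
    # On an HA cluster, check the resource status instead:
    crm_mon -1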

If you need to shut down SUSE OpenStack Cloud, the nodes and services need to be stopped in the reverse order of start-up (a shutdown sketch follows the list):

Service Order on Shut-down
  1. Compute Nodes

  2. Control Node/Cluster on which Heat is deployed

  3. Control Node/Cluster on which the nova-controller role is deployed

  4. Control Node/Cluster on which the neutron-l3 role is deployed

  5. All Control Node(s)/Cluster(s) on which none of the following services is deployed: Database, RabbitMQ, and Keystone.

  6. For Swift:

    1. Storage Node on which the swift-proxy role is deployed

    2. Storage Node on which the swift-storage role is deployed

  7. For Ceph:

    1. Storage Node on which the ceph-radosgw and ceph-mds roles are deployed (if deployed on different nodes: in either order)

    2. Storage Node on which the ceph-osd role is deployed

    3. Storage Node on which the ceph-mon role is deployed

  8. Control Node/Cluster on which Keystone is deployed

  9. Control Node/Cluster on which RabbitMQ is deployed

  10. Control Node/Cluster on which the Database is deployed
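
For example, the Compute Nodes could be shut down from the Administration Server as sketched below (node names are placeholders):

    # Power off all Compute Nodes first
    for node in compute1 compute2; do
      ssh $node systemctl poweroff
    done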

11.3 Upgrading from SUSE OpenStack Cloud 6 to SUSE OpenStack Cloud 7

Upgrading from SUSE OpenStack Cloud 6 to SUSE OpenStack Cloud 7 can be done either via a Web interface or from the command line. Starting with SUSE OpenStack Cloud 7, a non-disruptive upgrade is supported when the requirements listed at Non-Disruptive Upgrade Requirements are met. The non-disruptive upgrade guarantees fully functional SUSE OpenStack Cloud operation during the upgrade procedure. The only feature that is not supported during the non-disruptive upgrade procedure is the deployment of additional nodes.

If the requirements for a non-disruptive upgrade are not met, the upgrade procedure is done in normal mode. When live-migration is set up, instances will be migrated to another node before the respective Compute Node is upgraded, to ensure continuous operation. However, you will not be able to access instances during the upgrade of the Control Nodes.

11.3.1 Requirements

When starting the upgrade process, several checks are performed to determine whether SUSE OpenStack Cloud is in an upgradeable state and whether a non-disruptive upgrade is supported:

General Upgrade Requirements
  • All nodes need to have the latest SUSE OpenStack Cloud 6 updates and the latest SUSE Linux Enterprise Server 12 SP2 updates installed. If this is not the case, refer to Section 9.4.1, “Deploying Node Updates with the Updater Barclamp” for instructions on how to update.

  • All allocated nodes need to be turned on and have to be in state ready.

  • All barclamp proposals need to have been successfully deployed. In case a proposal is in state failed, the upgrade procedure will refuse to start. Fix the issue or—if possible—remove the proposal.

  • In case the pacemaker barclamp is deployed, all clusters need to be in a healthy state.

  • The dns-server role must be applied to the Administration Server.

  • The following repositories need to be available on a server that is accessible from the Administration Server. The HA repositories are only needed if you have an HA setup. It is recommended to use the same server that also hosts the respective repositories of the current version.

    SUSE-OpenStack-Cloud-7-Pool
    SUSE-OpenStack-Cloud-7-Update
    SLES12-SP2-Pool
    SLES12-SP2-Update
    SLE-HA12-SP2-Pool (for HA setups only)
    SLE-HA12-SP2-Update (for HA setups only)

    Do not add these repositories to the SUSE OpenStack Cloud repository configuration yet. This needs to be done during the upgrade procedure.

Non-Disruptive Upgrade Requirements
  • All Control Nodes need to be set up highly available.

  • Live-migration support needs to be configured and enabled for the Compute Nodes. The amount of free resources (CPU and RAM) on the Compute Nodes needs to be sufficient to evacuate the nodes one by one.

11.3.2 Upgrading from the Web Interface

TO BE DONE

11.3.3 Upgrading from the Command Line

The upgrade procedure on the command line is performed by using the program crowbarctl. For general help, run crowbarctl help. To get help on a certain subcommand, run crowbarctl COMMAND help.

To review the progress of the upgrade procedure, you may call crowbarctl upgrade status at any time. Steps may have three states: pending, running, and passed.

  1. To start the upgrade procedure from the command line, log in to the Administration Server.

  2. Perform the preliminary checks to determine whether the upgrade requirements are met:

    crowbarctl upgrade prechecks

    The command's result is shown in a table. Make sure the column Errors does not contain any entries. If it does, fix the errors and re-run the precheck command afterward. Do not proceed before all checks have passed.

    crowbarctl upgrade prechecks
    +-------------------------------+--------+----------+--------+------+
    | Check ID                      | Passed | Required | Errors | Help |
    +-------------------------------+--------+----------+--------+------+
    | network_checks                | true   | true     |        |      |
    | cloud_healthy                 | true   | true     |        |      |
    | maintenance_updates_installed | true   | true     |        |      |
    | compute_status                | true   | false    |        |      |
    | ha_configured                 | true   | false    |        |      |
    | clusters_healthy              | true   | true     |        |      |
    +-------------------------------+--------+----------+--------+------+

    Depending on the outcome of the checks, it is automatically decided whether the upgrade procedure will continue in non-disruptive or in normal mode.

  3. Prepare the nodes by transitioning them into the upgrade state and stopping the chef daemon:

    crowbarctl upgrade prepare

    Depending on the size of your SUSE OpenStack Cloud deployment, this step may take some time. Use the command crowbarctl upgrade status to monitor the value of steps.prepare.status. It needs to be in state passed before you proceed:

    crowbarctl upgrade status
    +--------------------------------+----------------+
    | Status                         | Value          |
    +--------------------------------+----------------+
    | current_step                   | backup_crowbar |
    | current_substep                |                |
    | current_node                   |                |
    | remaining_nodes                |                |
    | upgraded_nodes                 |                |
    | crowbar_backup                 |                |
    | openstack_backup               |                |
    | steps.prechecks.status         | passed         |
    | steps.prepare.status           | passed         |
    | steps.backup_crowbar.status    | pending        |
    | steps.repocheck_crowbar.status | pending        |
    | steps.admin.status             | pending        |
    | steps.database.status          | pending        |
    | steps.repocheck_nodes.status   | pending        |
    | steps.services.status          | pending        |
    | steps.backup_openstack.status  | pending        |
    | steps.nodes.status             | pending        |
    +--------------------------------+----------------+
  4. Create a backup of the existing Administration Server installation. In case something goes wrong during the upgrade procedure of the Administration Server, you can restore the original state from this backup with the command crowbarctl backup restore NAME. To create the backup, run:

    crowbarctl upgrade backup crowbar

    To list all existing backups including the one you have just created, run the following command:

    crowbarctl backup list
    +----------------------------+--------------------------+--------+---------+
    | Name                       | Created                  | Size   | Version |
    +----------------------------+--------------------------+--------+---------+
    | crowbar_upgrade_1486116507 | 2017-02-03T10:08:30.721Z | 209 KB | 3.0     |
    +----------------------------+--------------------------+--------+---------+
  5. This step prepares the upgrade of the Administration Server by checking the availability of the update and pool repositories for SUSE OpenStack Cloud 7 and SUSE Linux Enterprise Server 12 SP2. Run the following command:

    crowbarctl upgrade repocheck crowbar
    +---------------------------------+--------------------------------+
    | Status                          | Value                          |
    +---------------------------------+--------------------------------+
    | os.available                    | false                          |
    | os.repos                        | SLES12-SP2-Pool                |
    |                                 | SLES12-SP2-Updates             |
    | os.errors.x86_64.missing        | SLES12-SP2-Pool                |
    |                                 | SLES12-SP2-Updates             |
    | openstack.available             | false                          |
    | openstack.repos                 | SUSE-OpenStack-Cloud-7-Pool    |
    |                                 | SUSE-OpenStack-Cloud-7-Updates |
    | openstack.errors.x86_64.missing | SUSE-OpenStack-Cloud-7-Pool    |
    |                                 | SUSE-OpenStack-Cloud-7-Updates |
    +---------------------------------+--------------------------------+

    All four required repositories are reported as missing because they have not yet been added to the Crowbar configuration. To add them to the Administration Server, proceed as follows.

    Note that this step is for setting up the repositories for the Administration Server, not for the nodes in SUSE OpenStack Cloud (this will be done in a subsequent step).

    1. Start yast repositories and proceed with Continue. Replace the repositories SLES12-SP1-Pool and SLES12-SP1-Updates with the respective SP2 repositories.

      If you prefer zypper over YaST, you can alternatively make the change on the command line, as in the sketch after this list.

    2. Next, replace the SUSE-OpenStack-Cloud-6 update and pool repositories with the respective SUSE OpenStack Cloud 7 versions.
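
    A minimal sketch of the zypper alternative, assuming an SMT mirror at smt.example.com (repository aliases and URLs are examples and must match your local setup):

    # Remove the SP1 repositories ...
    zypper rr SLES12-SP1-Pool SLES12-SP1-Updates
    # ... and add their SP2 counterparts from your mirror server
    zypper ar http://smt.example.com/repo/SUSE/Products/SLE-SERVER/12-SP2/x86_64/product/ SLES12-SP2-Pool
    zypper ar http://smt.example.com/repo/SUSE/Updates/SLE-SERVER/12-SP2/x86_64/update/ SLES12-SP2-Updates
    # Repeat the same pattern for the SUSE-OpenStack-Cloud-6 -> 7 repositories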

    Once the repository configuration on the Administration Server has been updated, run the command to check the repositories again. If the configuration is correct, the result should look like the following:

    crowbarctl upgrade repocheck crowbar
    +---------------------+--------------------------------+
    | Status              | Value                          |
    +---------------------+--------------------------------+
    | os.available        | true                           |
    | os.repos            | SLES12-SP2-Pool                |
    |                     | SLES12-SP2-Updates             |
    | openstack.available | true                           |
    | openstack.repos     | SUSE-OpenStack-Cloud-7-Pool    |
    |                     | SUSE-OpenStack-Cloud-7-Updates |
    +---------------------+--------------------------------+
  6. Now that the repositories are available, the Administration Server itself will be upgraded. The update runs in the background using zypper dup. Once all packages have been upgraded, the Administration Server will be rebooted and you will be logged out. To start the upgrade, run:

    crowbarctl upgrade admin
  7. Starting with SUSE OpenStack Cloud 7, Crowbar uses a PostgreSQL database to store its data. With this step, the database is created on the Administration Server. Alternatively, a database on a remote host can be used.

    To create the database on the Administration Server proceed as follows:

    1. Log in to the Administration Server.

    2. To create the database on the Administration Server with the default credentials (crowbar/crowbar) for the database, run

      crowbarctl upgrade database new

      To use a different user name and password, run the following command instead:

      crowbarctl upgrade database new \
      --db-username=USERNAME --db-password=PASSWORD
    3. To connect to an existing PostgreSQL database, use the following command rather than creating a new database:

      crowbarctl upgrade database connect --db-username=USERNAME \
      --db-password=PASSWORD --database=DBNAME \
      --host=IP_or_FQDN --port=PORT
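
    To verify the result, you can check for the database on the Administration Server. A sketch assuming a local PostgreSQL instance:

    # List all databases; a Crowbar database should appear
    sudo -u postgres psql -l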
  8. After the Administration Server has been successfully upgraded, the Control Nodes and Compute Nodes will be upgraded. First, the availability of the repositories used to provide packages for the SUSE OpenStack Cloud nodes is tested.

    Note that the configuration of these repositories differs from the Administration Server configuration done in a previous step. In this step the repository locations are made available to Crowbar rather than to libzypp on the Administration Server. To check the repository configuration, run the following command:

    crowbarctl upgrade repocheck nodes
    +---------------------------------+--------------------------------+
    | Status                          | Value                          |
    +---------------------------------+--------------------------------+
    | ha.available                    | false                          |
    | ha.repos                        | SLE12-SP2-HA-Pool              |
    |                                 | SLE12-SP2-HA-Updates           |
    | ha.errors.x86_64.missing        | SLE12-SP2-HA-Pool              |
    |                                 | SLE12-SP2-HA-Updates           |
    | os.available                    | false                          |
    | os.repos                        | SLES12-SP2-Pool                |
    |                                 | SLES12-SP2-Updates             |
    | os.errors.x86_64.missing        | SLES12-SP2-Pool                |
    |                                 | SLES12-SP2-Updates             |
    | openstack.available             | false                          |
    | openstack.repos                 | SUSE-OpenStack-Cloud-7-Pool    |
    |                                 | SUSE-OpenStack-Cloud-7-Updates |
    | openstack.errors.x86_64.missing | SUSE-OpenStack-Cloud-7-Pool    |
    |                                 | SUSE-OpenStack-Cloud-7-Updates |
    +---------------------------------+--------------------------------+

    To update the locations for the listed repositories, start yast crowbar and proceed as described in Section 7.4, “Repositories”.

    Once the repository configuration for Crowbar has been updated, run the command to check the repositories again to determine whether the current configuration is correct.

    crowbarctl upgrade repocheck nodes
    +---------------------+--------------------------------+
    | Status              | Value                          |
    +---------------------+--------------------------------+
    | ha.available        | true                           |
    | ha.repos            | SLE12-SP2-HA-Pool              |
    |                     | SLE12-SP2-HA-Updates           |
    | os.available        | true                           |
    | os.repos            | SLES12-SP2-Pool                |
    |                     | SLES12-SP2-Updates             |
    | openstack.available | true                           |
    | openstack.repos     | SUSE-OpenStack-Cloud-7-Pool    |
    |                     | SUSE-OpenStack-Cloud-7-Updates |
    +---------------------+--------------------------------+
    Important
    Important: Product Media Repository Copies

    To PXE boot new nodes, an additional SUSE Linux Enterprise Server 12 SP2 repository—a copy of the installation system—is required. Although not required during the upgrade procedure, it is recommended to set up this directory now. Refer to Section 5.1, “Copying the Product Media Repositories” for details. If you had also copied the SUSE OpenStack Cloud 6 installation media (optional), you may want to provide the SUSE OpenStack Cloud 7 media the same way.

    Once the upgrade procedure has been successfully finished, you may delete the previous copies of the installation media in /srv/tftpboot/suse-12.1/x86_64/install and /srv/tftpboot/suse-12.1/x86_64/repos/Cloud.
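
    A sketch for setting up the SP2 media copy; the suse-12.2 target path mirrors the suse-12.1 layout above and the ISO file name is an example:

    # Mount the SLES 12 SP2 installation medium and copy its content
    mkdir -p /srv/tftpboot/suse-12.2/x86_64/install
    mount -o loop SLE-12-SP2-Server-DVD-x86_64-GM-DVD1.iso /mnt
    rsync -avP /mnt/ /srv/tftpboot/suse-12.2/x86_64/install/
    umount /mnt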

  9. To ensure the status of the nodes does not change during the upgrade process, the majority of the OpenStack services will be stopped on the nodes now. As a result, the OpenStack API will no longer be accessible. The instances, however, will continue to run and will also be accessible. Run the following command:

    crowbarctl upgrade services

    This step takes a while to finish. Monitor the process by running crowbarctl upgrade status. Do not proceed before steps.services.status is set to passed.

  10. The last step before upgrading the nodes is to make a backup of the OpenStack PostgreSQL database. The database dump will be stored on the Administration Server and can be used to restore the database in case something goes wrong during the upgrade.

    crowbarctl upgrade backup openstack
  11. The final step of the upgrade procedure is upgrading the nodes. To start the process, enter:

    crowbarctl upgrade nodes all

    The upgrade process runs in the background and can be queried with crowbarctl upgrade status. Depending on the size of your SUSE OpenStack Cloud, it may take several hours, especially when performing a non-disruptive upgrade. In that case, the Compute Nodes are upgraded one by one after their instances have been live-migrated to other nodes.

    Instead of upgrading all nodes you may also upgrade the Control Nodes first and individual Compute Nodes afterwards. Refer to crowbarctl upgrade nodes --help for details.
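
    A staged upgrade might look like the following sketch (the exact argument forms are documented in crowbarctl upgrade nodes --help; the node name is a placeholder):

    # Upgrade the Control Nodes first ...
    crowbarctl upgrade nodes controllers
    # ... then upgrade Compute Nodes one at a time
    crowbarctl upgrade nodes compute1.example.com
    # Monitor progress at any time
    crowbarctl upgrade status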

11.4 Upgrading to an HA Setup

There are a few issues to pay attention to when making an existing SUSE OpenStack Cloud deployment highly available (by setting up HA clusters and moving roles to these clusters). To make existing services highly available, proceed as follows. Note that moving to an HA setup cannot be done without SUSE OpenStack Cloud service interruption, because it requires OpenStack components to be restarted.

Important
Important: Teaming Network Mode is Required for HA

Teaming network mode is required for an HA setup of SUSE OpenStack Cloud. If you are planning to move your cloud to an HA setup at a later point in time, make sure to deploy SUSE OpenStack Cloud with teaming network mode from the beginning. Otherwise a migration to an HA setup is not supported.

  1. Make sure you have read Section 1.5, “HA Setup” and Section 2.6, “High Availability” of this manual and have taken appropriate action.

  2. Make the HA repositories available on the Administration Server as described in Section 5.2, “Update and Pool Repositories”. Run the command chef-client afterward.

  3. Set up your cluster(s) as described in Section 10.1, “Deploying Pacemaker (Optional, HA Setup Only)”.

  4. To move a particular role from a regular control node to a cluster, you need to stop the associated service(s) before re-deploying the role on a cluster:

    1. Log in to each node on which the role is deployed and stop its associated service(s) (a role can have multiple services). Do so by running the service's start/stop script with the stop argument, for example:

      rcopenstack-keystone stop

      See Appendix C, Roles and Services in SUSE OpenStack Cloud for a list of roles, services and start/stop scripts.

    2. The following roles need additional treatment:

      database-server (Database barclamp)
      1. Stop the database on the node on which the Database barclamp is deployed:

        rcpostgresql stop
      2. Copy /var/lib/pgsql to a temporary location on the node, for example:

        cp -ax /var/lib/pgsql /tmp
      3. Redeploy the Database barclamp to the cluster. The original node may also be part of this cluster.

      4. Log in to a cluster node and run the following command to determine which cluster node runs the postgresql service:

        crm_mon -1
      5. Log in to the cluster node running postgresql.

      6. Stop the postgresql service:

        crm resource stop postgresql
      7. Copy the data backed up earlier to the cluster node:

        rsync -av --delete NODE_WITH_BACKUP:/tmp/pgsql/ /var/lib/pgsql/
      8. Restart the postgresql service:

        crm resource start postgresql

      If the cluster uses DRBD or shared storage, copy the content of /var/lib/pgsql/data/ from the original database node to that storage instead.

      keystone-server (Keystone barclamp)

      If using Keystone with PKI tokens, the PKI keys on all nodes need to be re-generated. This can be achieved by removing the contents of /var/cache/*/keystone-signing/ on the nodes. Use a command similar to the following on the Administration Server as root:

      for NODE in NODE1 NODE2 NODE3; do
        ssh $NODE rm /var/cache/*/keystone-signing/*
      done
  5. Go to the barclamp featuring the role you want to move to the cluster. From the left side of the Deployment section, remove the node the role is currently running on. Replace it with a cluster from the Available Clusters section. Then apply the proposal and verify in the Crowbar Web interface that applying it succeeded. You can also check the cluster status via Hawk or the crm / crm_mon CLI tools.

  6. Repeat these steps for all roles you want to move to a cluster. See Section 2.6.2.1, “Control Node(s)—Avoiding Points of Failure” for a list of services with HA support.

Important
Important: SSL Certificates

Moving to an HA setup also requires creating SSL certificates for nodes in the cluster that run services using SSL. Certificates need to be issued for the generated names (see Important: Proposal Name) and for all public names you have configured in the cluster.

Important
Important: Service Management on the Cluster

After a role has been deployed on a cluster, its services are managed by the HA software. You must never manually start or stop an HA-managed service or configure it to start on boot. Services may only be started or stopped by using the cluster management tools Hawk or the crm shell. See http://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_config_basics_resources.html for more information.
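
For example, to restart an HA-managed database resource, use the cluster tools rather than systemctl (the resource name depends on your proposal):

    # Correct: let the cluster restart the resource
    crm resource restart postgresql
    # Wrong: never manage HA resources directly, for example with
    #   systemctl restart postgresql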

11.5 Backing Up and Restoring the Administration Server

Backing up and restoring the Administration Server can be done either via the Crowbar Web interface or on the Administration Server's command line via the crowbarctl backup command. Both tools provide the same functionality.

11.5.1 Backup and Restore via the Crowbar Web interface

To use the Web interface for backing up and restoring the Administration Server, go to the Crowbar Web interface on the Administration Server, for example http://192.168.124.10/. Log in as user crowbar; the password is crowbar, unless you have changed it. Go to Utilities › Backup & Restore.

Backup and Restore: Initial Page View
Figure 11.1: Backup and Restore: Initial Page View

To create a backup, click the respective button. Provide a descriptive name (allowed characters are letters, numbers, dashes and underscores) and confirm with Create Backup. Alternatively, you can upload a backup, for example from a previous installation.

Existing backups are listed with name and creation date. For each backup, three actions are available:

Download

Download a copy of the backup file. The TAR archive you receive with this download can be uploaded again via Upload Backup Image.

Restore

Restore the backup.

Delete

Delete the backup.

Backup and Restore: List of Backups
Figure 11.2: Backup and Restore: List of Backups

11.5.2 Backup and Restore via the Command Line

Backing up and restoring the Administration Server from the command line can be done with the command crowbarctl backup. For general help, run crowbarctl backup --help. Help on a subcommand is available by running crowbarctl backup SUBCOMMAND --help. The following commands for creating and managing backups exist:

crowbarctl backup create NAME

Create a new backup named NAME. It will be stored at /var/lib/crowbar/backup.

crowbarctl backup restore [--yes] NAME

Restore the backup named NAME. You will be asked for confirmation before any existing proposals get overwritten. With the option --yes, confirmation is turned off and the restore is forced.

crowbarctl backup delete NAME

Delete the backup named NAME.

crowbarctl backup download NAME [FILE]

Download the backup named NAME. If you specify the optional FILE, the download is written to the specified file. Otherwise it is saved to the current working directory with an automatically generated file name. If you specify - for FILE, the output is written to STDOUT.
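
For example, to save a backup under a specific name or stream it to another host (the backup name is taken from the listing example earlier in this chapter; the file name and target host are placeholders):

    crowbarctl backup download crowbar_upgrade_1486116507 admin-backup.tar.gz
    crowbarctl backup download crowbar_upgrade_1486116507 - | ssh backup.example.com 'cat > admin-backup.tar.gz'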

crowbarctl backup list

List existing backups. You can optionally specify different output formats and filters—refer to crowbarctl backup list --help for details.

crowbarctl backup upload FILE

Upload a backup from FILE.
