Administration and Operations Guide¶
This section provides information on the administration and operation of SUSE Containerized OpenStack.
Using run.sh¶
The primary means for running deployment, update, and cleanup actions in SUSE Containerized OpenStack is run.sh, a bash script that acts as a convenient wrapper around Ansible playbook execution. All of the commands below should be run from the root of the socok8s directory.
Deployment Actions¶
The run.sh deploy command:
- Performs all necessary setup actions
- Deploys all Airship UCP components and OpenStack services
Ensure the inventory, the extravars file, and the appropriate environment variables are configured as described in the Deployment Guide, then run:
./run.sh deploy
It may be desirable to redeploy only the OpenStack services while leaving all Airship components in the UCP untouched. In that case, run:
./run.sh update_openstack
Cleanup Actions¶
In addition to deployment, run.sh can be used to perform environment cleanup actions.
To clean up the deployment and remove SUSE Containerized OpenStack entirely, run the following command in the root of the socok8s directory:
./run.sh remove_deployment
This will delete all Helm releases, all Kubernetes resources in the ucp and openstack namespaces, and all persistent volumes that were provisioned for use in the deployment. After this operation is complete, only the original Kubernetes services deployed by the SUSE CaaS Platform will remain.
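To confirm the cleanup, a quick check (plain kubectl commands, not part of run.sh) is to verify that no Pods remain in either namespace and to review any remaining persistent volumes:
kubectl get pods -n ucp
kubectl get pods -n openstack
kubectl get pv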
Testing¶
The run.sh script also has an option to deploy and run OpenStack Tempest tests. To begin testing, review Verify OpenStack Operation in the deployment section and then run the following command:
./run.sh test
Note
Please read the Deployment Guide for further information about configuring and running OpenStack Tempest tests in SUSE Containerized OpenStack.
Scaling in/out¶
Adding or removing compute nodes¶
To add a compute node, the node must be running SUSE CaaS Platform v3.0, have been accepted into the cluster, and have been bootstrapped using the Velum dashboard. After the node is bootstrapped, add its host details to the “airship-openstack-compute-workers” group in your inventory in ${WORKSPACE}/inventory/hosts.yaml.
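A minimal sketch of such an entry is shown below; the host name and address are placeholders, so mirror the structure of the existing entries in your hosts.yaml:
airship-openstack-compute-workers:
  hosts:
    new-compute-node:
      ansible_host: 192.168.100.50
Then run the following command from the root of the socok8s directory: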
./run.sh add_openstack_compute
Note
Multiple new compute nodes can be added to the inventory at the same time.
It can take a few minutes for the new host to initialize and show in the OpenStack hypervisor list.
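To watch for the new host, the hypervisor list can be checked from any client that has the OpenStack CLI installed and admin credentials loaded (assumed here, not provided by run.sh):
openstack hypervisor list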
To remove a compute node, run the following command from the root of the socok8s directory:
./run.sh remove_openstack_compute ${NODE_HOSTNAME}
Note
NODE_HOSTNAME must be the same as the host name in the Ansible inventory.
Compute nodes must be removed individually. When a node has been successfully removed, its host details must also be manually removed from the “airship-openstack-compute-workers” group in the inventory.
Control plane horizontal scaling¶
SUSE Containerized OpenStack provides two built-in scale profiles:
- minimal, the default profile, deploys a single Pod for each service
- ha deploys a minimum of two Pods for each service. Three or more Pods are suggested for services that will be heavily utilized or require a quorum.
Change scale profiles by adding a “scale_profile” key to ${WORKSPACE}/env/extravars and specifying a profile value:
scale_profile: ha
The built-in profiles are defined in playbooks/roles/airship-deploy-ucp/files/profiles and can be modified to suit custom use cases. Additional profiles can be created and added to this directory, following its file naming convention.
We recommend using at least three controller nodes for a highly available control plane for both Airship and OpenStack services. To add new controller nodes, the nodes must:
- be running SUSE CaaS Platform v3.0
- have been accepted into the cluster
- be bootstrapped using the Velum dashboard.
After the nodes are bootstrapped, add the host entries to the “airship-ucp-workers”, “airship-openstack-control-workers”, “airship-openstack-l3-agent-workers”, and “airship-kube-system-workers” groups in your Ansible inventory in ${WORKSPACE}/inventory/hosts.yaml.
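As an illustrative sketch only (the host name is a placeholder, and connection variables such as ansible_host should match your existing entries), the same node is simply listed under each group:
airship-ucp-workers:
  hosts:
    new-controller-node:
airship-openstack-control-workers:
  hosts:
    new-controller-node:
airship-openstack-l3-agent-workers:
  hosts:
    new-controller-node:
airship-kube-system-workers:
  hosts:
    new-controller-node: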
To apply the changes, run the following command from the root of the socok8s directory:
./run.sh deploy
Updates¶
SUSE Containerized OpenStack is delivered as an RPM package. Generally, it can be updated by upgrading the RPM package to the latest version and redeploying, following the necessary steps in the Deployment Guide. This is the typical update path and incorporates all recent changes, including automatic updates to component chart and image versions.
It is also possible to update services and components directly using the procedures below.
Updating OpenStack Version¶
To make a global change to the OpenStack version used by all component images, create a key in ${WORKSPACE}/env/extravars called “suse_openstack_image_version” and set it to the desired value. For example, to use the “stein” version, add the following line to the extravars file:
suse_openstack_image_version: "stein"
It is also possible to update an individual image or subset of images to a different version rather than making a global change. To do this, it is necessary to manually edit the versions.yaml file located in socok8s/site/soc/software/config/. Locate the images to be changed in the “images” section of the file and modify the line to include the desired version. For example, to use the “stein” version for the heat_api image, change the following line in versions.yaml from
heat_api: "{{ suse_osh_registry_location }}/openstackhelm/heat:{{ suse_openstack_image_version }}"
to
heat_api: "{{ suse_osh_registry_location }}/openstackhelm/heat:stein"
Updating OpenStack Service Configuration¶
Certain use cases may require the addition or modification of OpenStack service configuration parameters. To update the configuration for a particular service, parameters can be added or modified in the ‘conf’ section of that service’s chart. For example, to change the logging level of the Keystone service to ‘debug’, locate the ‘conf’ section of the Keystone chart located at socok8s/site/soc/software/charts/osh/openstack-keystone/keystone.yaml and add the following lines, beginning with the ‘logging’ key:
conf:
  logging:
    logger_root:
      level: DEBUG
    logger_keystone:
      level: DEBUG
Note
Information about the supported configuration parameters for each service can generally be found in the OpenStack Configuration Guides for each release, but determining the correct keys and values to include in each service’s chart may require examining the OpenStack Helm chart’s values.yaml file. In the above Keystone logging example, the names and proper locations for the logging keys were determined by studying the ‘logging’ section in /opt/openstack/openstack-helm/keystone/values.yaml, then copying those keys to socok8s/site/soc/software/charts/osh/openstack-keystone/keystone.yaml and providing the desired values.
Once the desired parameters have been added to each chart requiring changes, apply the configuration updates by changing to the root of the socok8s directory and running:
./run.sh update_openstack
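One way to confirm that a change such as the Keystone DEBUG logging example has taken effect is to tail the Keystone Pod logs and look for DEBUG entries. This assumes the standard OpenStack-Helm application=keystone label is present on the Pods:
kubectl logs -n openstack -l application=keystone --tail=50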
Updating Individual Images and Helm Charts¶
The versions.yaml file can also be used for more advanced update configurations such as using a specific image or Helm chart source version.
Note
Changing the image registry location from its default value or using a custom or non-default image will void any product support by SUSE.
To specify the use of an updated or customized image, locate the appropriate image name in socok8s/site/soc/software/config/versions.yaml and modify the line to include the desired image location and tag. For example, to use a new heat_api image, modify its entry with the new image location:
heat_api: "registry_location/image_directory/image_name:tag"
Similarly, the versions.yaml file can be used to retrieve a specific version of any Helm chart being deployed. To do so, it is necessary to provide a repository location, type, and a reference. The reference can be a branch name, tag, or commit ID in the repository, and defaults to “master” if not specified. As an example, to use a specific version of the Helm chart for Heat, add the following information to the “osh” section under “charts”:
heat:
  location: https://git.openstack.org/openstack/openstack-helm
  reference: ${REFERENCE}
  subpath: heat
  type: git
Note
When specifying a particular version of a Helm chart, it may be necessary to first create the appropriate subsection under “charts”. Airship components such as Deckhand and Shipyard belong under “ucp”, OpenStack services belong under “osh”, and infrastructure components belong under “osh_infra”.
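As a rough, abbreviated sketch of that layout (the exact keys should mirror what is already present in versions.yaml), the Heat override above sits under the “osh” subsection of “charts”:
charts:
  ucp:
    # Airship components such as Deckhand and Shipyard
  osh:
    heat:
      location: https://git.openstack.org/openstack/openstack-helm
      reference: ${REFERENCE}
      subpath: heat
      type: git
  osh_infra:
    # infrastructure components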
Reboot Compute Host¶
Before rebooting a compute host, shut down all Nova VMs running on that compute host.
After the compute host reboots, the Pods may start out of order. If this happens, you might see symptoms such as Nova VMs not getting an IP address. To address this problem, run the following commands:
kubectl get pods -n openstack -o wide | grep ovs-agent | grep <compute name>
kubectl delete pod -n openstack <ovs-agent pod name>
This should restart the Neutron OVS agent pod and reconfigure the vxlan tunnel network configuration.
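Kubernetes recreates the deleted Pod automatically; to confirm the new ovs-agent Pod has reached the Running state on that host, repeat the filter used above:
kubectl get pods -n openstack -o wide | grep ovs-agent | grep <compute name>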
Troubleshooting¶
Viewing Shipyard Logs¶
The deployment of OpenStack components in SUSE Containerized OpenStack is directed by Shipyard, the Airship platform’s directed acyclic graph (DAG) controller, so Shipyard is one of the best places to begin troubleshooting deployment problems. The Shipyard CLI client authenticates with Keystone, so the following environment variables must be set before running any commands:
export OS_USERNAME=shipyard
export OS_PASSWORD=$(kubectl get secret -n ucp shipyard-keystone-user \
-o json | jq -r '.data.OS_PASSWORD' | base64 -d)
Note
The Shipyard user’s password can be obtained from the contents of ${WORKSPACE}/secrets/ucp_shipyard_keystone_password
The following commands are run from the /opt/airship/shipyard/tools directory. If no Shipyard image is found when the first command is executed, it is downloaded automatically.
To view the status of all Shipyard actions, run:
./shipyard.sh get actions
Example output:
Name               Action                               Lifecycle   Execution Time        Step Succ/Fail/Oth   Footnotes
update_software    action/01D9ZSVG70XS9ZMF4Z6QFF32A6    Complete    2019-05-03T21:33:27   13/0/1               (1)
update_software    action/01DAB3ETP69MGN7XHVVRHNPVCR    Failed      2019-05-08T06:52:58   7/0/7                (2)
To view the status of the individual steps of a particular action, copy its action ID and run the following command:
./shipyard.sh describe action/01DAB3ETP69MGN7XHVVRHNPVCR
Example output:
Name: update_software
Action: action/01DAB3ETP69MGN7XHVVRHNPVCR
Lifecycle: Failed
Parameters: {}
Datetime: 2019-05-08 06:52:55.366919+00:00
Dag Status: failed
Context Marker: 18993f2c-1cfa-4d42-9320-3fbd70e75c21
User: shipyard
Steps                                                            Index   State             Footnotes
step/01DAB3ETP69MGN7XHVVRHNPVCR/action_xcom                      1       success
step/01DAB3ETP69MGN7XHVVRHNPVCR/dag_concurrency_check            2       success
step/01DAB3ETP69MGN7XHVVRHNPVCR/deployment_configuration         3       success
step/01DAB3ETP69MGN7XHVVRHNPVCR/validate_site_design             4       success
step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_build                     5       failed
step/01DAB3ETP69MGN7XHVVRHNPVCR/decide_airflow_upgrade           6       None
step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_get_status                7       success
step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_post_apply                8       upstream_failed
step/01DAB3ETP69MGN7XHVVRHNPVCR/skip_upgrade_airflow             9       upstream_failed
step/01DAB3ETP69MGN7XHVVRHNPVCR/upgrade_airflow                  10      None
step/01DAB3ETP69MGN7XHVVRHNPVCR/deckhand_validate_site_design    11      success
step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_validate_site_design      12      upstream_failed
step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_get_releases              13      failed
step/01DAB3ETP69MGN7XHVVRHNPVCR/create_action_tag                14      None
To view the logs from a particular step such as armada_build, which has failed in the above example, run:
./shipyard.sh logs step/01DAB3ETP69MGN7XHVVRHNPVCR/armada_build
Viewing Logs From Kubernetes Pods¶
To view the logs from any Pod in the Running or Completed state, run:
kubectl logs -n ${NAMESPACE} ${POD_NAME}
To view logs from a specific container within a Pod in the Running or Completed state, run:
kubectl logs -n ${NAMESPACE} ${POD_NAME} -c ${CONTAINER_NAME}
If logs cannot be retrieved because the Pod has entered the Error or CrashLoopBackOff state, it may be necessary to use the -p option to retrieve logs from the previous container instance:
kubectl logs -n ${NAMESPACE} ${POD_NAME} -p
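If a Pod produces no logs at all, for example because it is stuck in Pending or ImagePullBackOff, its recent events usually explain why and can be viewed with:
kubectl describe pod -n ${NAMESPACE} ${POD_NAME}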
Recover Controller Host Node¶
If the deployment fails with an error that the controller host is not reachable (the host has entered maintenance mode), access the controller host console in maintenance mode and run the following commands:
mounted_snapshot=$(mount | grep snapshot | gawk 'match($6, /ro.*@\/.snapshots\/(.*)\/snapshot/ , arr1 ) { print arr1[1] }')
btrfs property set -ts /.snapshots/$mounted_snapshot/snapshot ro false
mount -o remount,rw /
mkdir /var/lib/neutron
btrfs property set -ts /.snapshots/$mounted_snapshot/snapshot ro true
reboot
Recover Compute Host Node¶
If the deployment fails with an error that the compute host is not reachable (the host has entered maintenance mode), access the compute host console in maintenance mode and run the following commands:
mounted_snapshot=$(mount | grep snapshot | gawk 'match($6, /ro.*@\/.snapshots\/(.*)\/snapshot/ , arr1 ) { print arr1[1] }')
btrfs property set -ts /.snapshots/$mounted_snapshot/snapshot ro false
mount -o remount,rw /
mkdir /var/lib/libvirt
mkdir /var/lib/nova
mkdir /var/lib/openstack-helm
mkdir /var/lib/neutron
btrfs property set -ts /.snapshots/$mounted_snapshot/snapshot ro true
reboot
Recovering from Node Failure¶
Kubernetes clusters are generally able to recover from node failures by performing a number of self-healing actions, but it may be necessary to manually intervene occasionally. Recovery actions vary depending on the type of failure. Some common scenarios and their solutions are outlined below.
Pod Status of NodeLost or Unknown¶
If a large number of Pods show a status of NodeLost or Unknown, first determine which nodes may be causing the problem by running:
kubectl get nodes
If any of the nodes show a status of NotReady but they still respond to ping and can be accessed via SSH, it may be that either the kubelet or docker service has stopped running. This can be confirmed by checking the “Conditions” section for the message “Kubelet has stopped posting node status” after running:
kubectl describe node ${NODE_NAME}
Log into the affected nodes and check the status of these services by running:
systemctl status kubelet
systemctl status docker
If either service has stopped, start it by running:
systemctl start ${SERVICE_NAME}
Note
The kubelet service requires Docker to be running, so if both services are stopped, start Docker first.
These services should start automatically each time a node boots up and should be running at all times. If either service has stopped, examine the system logs to determine the root cause of the failure. This can be done by using the journalctl command:
journalctl -u kubelet
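The same applies to the Docker service:
journalctl -u docker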
Frequent Pod Evictions¶
If Pods are frequently being evicted from a particular node, it may be a sign that the node is unhealthy and requires maintenance. Check that node’s conditions and events by running:
kubectl describe node ${NODE_NAME}
If the cause of the Pod evictions is determined to be resource exhaustion, such as NodeHasDiskPressure or NodeHasMemoryPressure, it may be necessary to remove the node from the cluster temporarily to perform maintenance. To gracefully remove all Pods from the affected node and mark it as not schedulable, run:
kubectl drain ${NODE_NAME}
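Depending on the kubectl version and on what is running on the node, the drain may refuse to proceed until additional flags are supplied, for example to skip DaemonSet-managed Pods and evict Pods using local storage (check the exact flag names supported by your kubectl release):
kubectl drain ${NODE_NAME} --ignore-daemonsets --delete-local-data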
After maintenance work is complete, the node can be brought back into the cluster by running:
kubectl uncordon ${NODE_NAME}
which will allow normal Pod scheduling operations to resume. If the node was decommissioned permanently while offline and a new node was brought into the CaaSP cluster as a replacement, it is not necessary to run the uncordon command. A new schedulable resource will be created automatically.
Kubernetes Operations¶
Kubernetes has documentation for troubleshooting typical problems with applications and clusters.
Tips and Tricks¶
Display all images used by a component¶
Using Neutron as an example:
kubectl get pods -n openstack -l application=neutron -o \
jsonpath="{.items[*].spec.containers[*].image}"|tr -s '[[:space:]]' '\n' \
| sort | uniq -c
Remove dangling Docker images¶
Useful after building local images:
docker rmi $(docker images -f "dangling=true" -q)
Setting the default context¶
To avoid having to pass “-n openstack” all the time:
kubectl config set-context $(kubectl config current-context) --namespace=openstack