vRealize Operations Manager Best Practices Supplemental Guide Version 7.x
SEPTEMBER 2018
VERSION 1.4
Table of Contents

Introduction
  Best Practices Concepts
    Areas of Best Practices
Platform Best Practices
  Sizing
    Storage Approach
    General Guidelines
  Architecture
    High Availability (HA)
    Remote Collectors
    Load Balancers
  Deployment
  Upgrade
  Cluster
  Backup & Restore
    Backup
    Restore
  Disaster Recovery
  Self-Monitoring
  API and Integration
  End Point Operations Manager
    Sizing
    Deployment
Content Best Practices
  Metrics
  Alerts & Symptoms
    Review Out-Of-The-Box (OOTB)
  Dashboards
  Views and Reports
    Views
    Reports
  Super Metrics
  Policies
  Account and Roles
  Maintenance Schedule
  Grouping
  Work Load Placement (WLP)
  Predictive Distributed Resource Scheduler (pDRS)
Operations Best Practices
  SDDC Monitoring
Additional Best Practices
  Documentation Links
Revision History

DATE             VERSION   DESCRIPTION
September 2018   1.4       Updates with vRealize Operations Manager 7.0
April 2018       1.3       Updates
March 2018       1.2       Updates
March 2017       1.1       Updates
February 2017    1.0       Initial version
Introduction

This document describes the best practices and recommendations for VMware vRealize Operations Manager. This document is not an installation guide, but a guide that supplements the vRealize Operations Manager installation and configuration documentation available in the vRealize Operations Manager Documentation Center. There are additional best practices outlined in the product documentation; therefore, existing information may not be repeated in this document. Please refer to the product documentation for additional best practices.

This information is for the following products and versions.

PRODUCT                       VERSION         DOCUMENTATION
vRealize Operations Manager   6.6, 6.7, 7.0   https://docs.vmware.com/en/vRealize-Operations-Manager/index.html

Best Practices Concepts

This document provides information based on development, test, field, and customer interaction. Each environment is unique and the way vRealize Operations Manager is used may vary; hence, this information provides general principles or techniques that, when applied, will produce results that are superior to those achieved by other means or by standard use.

In certain cases, it may not be practical to apply best practice methods, nor is there a requirement to use all best practices available. Each area of best practice should be applied appropriately based on the environment, the user, and the way that vRealize Operations Manager is being used.

Following are the advantages of applying best practices with vRealize Operations Manager:

• Proven Results
• Consistency
• Enhanced Performance
• Improved Usability
• Greater Stability

Areas of Best Practices

Applying best practices for vRealize Operations Manager focuses on three key areas:

• Platform (product)
  The technical portion of the product, which includes architecture and sizing, deployment, cluster, high availability, remote collector, API, interoperability and integration, backup & restore, and disaster recovery.
• Content (product)
  The functional part of the product, meaning the content that “sits on” the platform. Content includes policies, dashboards, alerts, reports, super metrics, groups, and actions.
• Operations
  How you use the product in your operations. This includes working with other roles in Operations (e.g. NOC, Storage, and Management). Examples of Operations are processes, roles, groups, and tenants.
Platform Best Practices

The Platform is the technical portion of the product. The best practices applied here help provide the optimal options for the platform and a stable running environment for daily operational use. Before deployment of vRealize Operations Manager, the first step is to size the environment. This section covers sizing and recommendations after deploying the product. Additional best practices are included for administration tasks such as backup & restore or disaster recovery. These best practices will help ensure that the platform, vRealize Operations Manager, is properly sized, running and able to handle the monitored load efficiently.

Sizing

Storage Approach

• Size the deployment with twelve to eighteen months of infrastructure growth
  When an environment outgrows the original deployment size, performance degradation and usability problems may appear. Planning for infrastructure growth of twelve to eighteen months will allow the system to continue functioning without the need to immediately resize or scale out the deployment. For example, if you anticipate a 10% annual growth, increase the initial size by 15% to obtain an eighteen-month sizing recommendation.
• Review the sizing guidelines frequently during the growth of the environment (resizing)
  To keep the environment running with optimal parameters, it is important to review the sizing guidelines and resize the deployment if necessary. Even with expected growth, reviewing the sizing guidelines regularly will proactively prevent performance and usability problems typically associated with undersized environments.

General Guidelines

• Validate the sizing guidelines with your actual environment
  The sizing guidelines provide general estimates and require confirmation with the actual environment. For example, the data entered into the sizing calculator may yield additional objects not captured in the actual environment or vice versa.
• Calculate only the components which will be monitored
  It is possible that some components do not need to be monitored; therefore, exclude those components from the sizing calculations.
• Size the Cluster
  There are multiple sizes for analytics nodes: extra small, small, medium, large and extra-large. It is best to use the fewest nodes possible. For example, if the recommendation is to have 10 large nodes or 4 extra-large nodes, use the 4 extra-large nodes to minimize communication across nodes.
• Size the Remote Collectors
  There are two sizes for default remote collectors, standard and large. Use the correct size remote collector based on collected data. If necessary, use multiple remote collectors to ensure proper sizing of remote collectors for the environment.
• Adjust the time series data retention to keep data only for as long as it is truly needed
  The default setting for data retention is six months. If three months is all that is needed, lower the default value. Understand what you gain when using long data retention periods; longer retention periods do not necessarily help. Depending on your deployment's needs, configure the retention period to suit your requirements.
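The growth example in the Storage Approach guidance above (10% annual growth over eighteen months giving a 15% increase) can be sketched as simple arithmetic. This is only an illustration, not part of the product or the sizing calculator: the function names and the sample object count of 10,000 are assumptions, and the optional compound-growth mode is an addition that the guide itself does not use.

```python
def growth_headroom(annual_growth: float, months: int, compound: bool = False) -> float:
    """Return the fractional increase to apply to the initial size.

    The guide's example scales the annual rate linearly over the planning
    horizon. The compound mode is an assumption added for comparison only.
    """
    years = months / 12
    if compound:
        return (1 + annual_growth) ** years - 1
    return annual_growth * years


def sized_object_count(current_objects: int, annual_growth: float, months: int = 18) -> int:
    """Return the object count to size for, using the guide's linear approach."""
    return round(current_objects * (1 + growth_headroom(annual_growth, months)))


if __name__ == "__main__":
    # Hypothetical environment with 10,000 monitored objects today.
    print(f"Linear headroom:   {growth_headroom(0.10, 18):.1%}")
    print(f"Compound headroom: {growth_headroom(0.10, 18, compound=True):.1%}")
    print(f"Objects to size for: {sized_object_count(10_000, 0.10)}")
```

The resulting object count is only an input to the sizing guidelines; as noted above, it should still be validated against the actual environment.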
• Consider additional storage and IO requirements for longer data retention
  For those times when longer data retention periods are required, consider additional storage and increased IO requirements. For example, retail businesses may need to keep more than one year of data to account for seasonal peaks.
• Leverage the additional time series retention to keep longer historical data while minimizing the time series data retention period
  The default setting for additional time series retention is thirty-six months. Adjust the default value to a necessary period and lower the time series data retention period to save on the amount of data being retained.
• Only install Management Packs that are available on the VMware Solution Exchange
  There are several management packs available for vRealize Operations Manager; however, only management packs certified and supported by VMware are available on the VMware Solution Exchange.
• Before adding Management Packs, verify the additional metrics they will provide
  The metric name may look correct but may not always mean it is what you want. Be sure that the metrics from management packs are what you really need and are used properly; otherwise, disable unnecessary metrics.

Architecture

High Availability (HA)

• Understand what HA provides (or does not provide) before enabling (or disabling)
  Enabling HA may require double the resources, as data is stored redundantly in two nodes as opposed to only on one node when HA is disabled. Since the data is being stored in two nodes, this reduces the total capacity by 50%. For example, a deployment of 6 extra-large nodes supports the following maximum number of objects:

  vRealize Operations Manager   HA Disabled   HA Enabled
  6.6                           180,000       90,000
  6.7                           240,000       120,000
  7.0                           240,000       120,000

• HA allows the cluster to remain functional after losing only one data node. It is important to understand and weigh the cost of the extra resources against the benefits that HA provides.
• Enable HA only after all nodes in the cluster have been added and are online
  Add all data nodes to the cluster before enabling HA. On new deployments, add data nodes to build the cluster to fit the appropriate sizing and then enable HA. If adding new data nodes to an existing cluster, add as many data nodes as necessary, then enable HA. The goal is to minimize the number of times HA is enabled; the process to enable HA can be very disruptive, so perform it only when necessary.
• Deploy analytics cluster nodes on separate hosts for redundancy and isolation
  If possible, establish a 1:1 mapping for nodes to hosts. This will protect the cluster if one host goes down, then
  only one node is lost and the cluster remains functional. If it is not possible to establish a 1:1 mapping for nodes to hosts, make sure to separate the master node and master replica node on different hosts. This will safeguard the cluster if one of these hosts were to go down.
• Use anti-affinity rules that keep nodes on separate hosts in the vSphere cluster
  To keep nodes on different hosts, use anti-affinity rules to prevent nodes from being grouped on the same host. The idea is to prevent multiple nodes from going down because they are hosted on one host.
• Name nodes independent of role
  Roles may change for nodes, so statically naming a node after a specific role may be confusing. For example, a node named ‘Master’ may no longer be the actual master node after promoting the replica node. Role-independent names avoid the user confusion associated with a poor naming convention.
• HA is not a Disaster Recovery (DR) strategy
  HA for vRealize Operations Manager is not a disaster recovery mechanism, so a separate DR solution must be used. See https://www.vmware.com/support/pubs/vmware-vrealize-suite-pubs.html. HA will allow the cluster to continue running if either the master node, the replica node or one data node fails. The entire cluster does not recover if multiple nodes fail at the same time.
• Hosts need to be on the same storage
  For performance and consistency, use of the same storage is required.

Remote Collectors

• Consider using Remote Collectors for local collections with larger vCenters (>7K objects)
  Using remote collectors will help to reduce bandwidth across data centers and reduce the load on the vRealize Operations Manager analytics cluster.
• Create collector groups when using multiple Remote Collectors
  When utilizing multiple remote collectors for one vCenter, create a collector group to provide high availability and redundancy.
• Deploy or update Remote Collectors to the same version of the Analytics nodes
  Do not utilize mixed versions of Remote Collectors and Analytics nodes. Not only is a cluster running mixed versions unsupported, it may also exhibit problems.
• Use Remote Collectors when using End Point Operations Manager (EPOps) agents
  Use remote collectors to isolate collection from End Point Operations Manager agents and reduce the load on the vRealize Operations Manager analytics cluster.
• Size Remote Collectors based on number of collecting objects/metrics
  Size remote collectors using the default sizing of standard and large nodes to accommodate the number of objects and metrics that they will be collecting.
• Include Remote Collectors in the backup strategy
  Include all remote collectors when taking a backup so that the entire cluster can be restored to health.

Load Balancers

• Use load balancers to provide a single UI entry for users
vRealize Operations Manager Best Practices /9 Use of a load balancer to provide multiple users a single URL for accessing the vRealize Operations Manager cluster alleviates the need for users to remember logging into separate node names and accessing specific nodes. • Use load balancers to provide high availability for remote collectors with End Point Operations Manager agents Use a load balance to group multiple remote collectors when using End Point Operations Manager agents to provide high availability and redundancy. Deployment • Use the Virtual Appliance (VA) The Windows installer is no longer an available deployment option after vRealize Operations Manager 6.4. The RHEL installer is available in vRealize Operations Manager 6.5 but is now deprecated in vRealize Operations Manager 6.6. There is no migration path from either Windows or RHEL to the Virtual Appliance. • Do not modify or install third party applications on the appliance When using the virtual appliance, installation or modifications of third party applications is unsupported and may cause problems to vRealize Operations Manager. • Deploy the VA with FQDN Register a fully qualified domain name for the vRealize Operations Manager node. Simply using hostname may not properly resolve and may experience communication problems with the node. • Use Thick Provisioning Eager Zeroed When deploying nodes, set disk provisioning to “Thick Provision Eager Zeroed” for most optimum performance. • When deploying Medium size nodes, increase the VM hardware level The default hardware is set to “7” and limits the number of vCPUs per node. To increase the number of vCPUs when scaling a medium node to a large node, as example, the HW level must be set to a higher value. • Leverage Remote Collectors Use remote collectors where possible to navigate firewalls, reduce bandwidth across data centers, connect to remote data sources, or reduce the load on the vRealize Operations Manager analytics cluster. 
Upgrade
• If upgrading to vRealize Operations Manager 6.7 or vRealize Operations Manager 7.0, run the appropriate versioned Pre-Upgrade Assessment Tool on your current vRealize Operations Manager before performing the upgrade to view the possible impact on your custom content and to plan appropriate maintenance efforts for adjusting impacted custom content. See https://www.vmware.com/products/vrealize-operations/upgrade-center.html.
• Verify existing functionality before upgrading
Ensure the environment is fully functional before starting an upgrade. It is recommended to make a list of what works (or does not work) to confirm the same functionality post upgrade.
• Back up customized content before upgrade
Customized content should be backed up and saved in case it is overwritten or lost during the upgrade.
• Snapshot VMs with cluster offline before upgrading
After verifying functionality and backing up customized content, snapshot all the analytics VMs within the cluster as a failsafe in the event of an upgrade failure.
• Check interoperability of management packs before upgrade
Some management packs may not be supported in the new product version, which would render them inoperable after the upgrade. Confirm interoperability of management packs with the new product version before upgrading.
• Perform the upgrade outside of DT / QIC / Backup process times
Perform the upgrade outside of dynamic threshold or capacity calculations and outside of backups to avoid capturing high stress states.
• Setup blackout for maintenance to avoid false alerts
When performing maintenance, such as an upgrade, schedule a maintenance window to account for the performed activity to avoid receiving false alerts and notifications.
• Examine the Validation Check recommendations before performing the upgrade
There is a pre-check upgrade validation script that runs before performing the actual upgrade. Address any failures and warnings before continuing to upgrade or the upgrade may fail.
• Enable option to reset Default Content
Select the option to reset default content and bring in new content. This will overwrite existing content with the newer version provided by the update. User modifications to DEFAULT Alert Definitions, Symptoms, Recommendations, Policy Definitions, Views, Dashboards, Widgets and Reports will be overwritten; therefore, clone or back up the content before you proceed.
• Upgrade the OS PAK prior to upgrading the virtual appliance (VA) PAK
To ensure a solid base OS, upgrade the OS of the virtual appliance before upgrading vRealize Operations Manager.
• Pre-distribute PAK files to minimize downtime during upgrade
One of the longest steps of the upgrade process is the distribution of the PAK files across all the nodes. To minimize this time, pre-distribute the PAK files to all nodes before starting the upgrade. See https://kb.vmware.com/kb/2127895.
• Upgrade in order of vRealize Operations Manager platform → EPOps agents → Management Packs
Upgrade the vRealize Operations Manager platform first, before upgrading the End Point Operations Manager agents. Upgrade the End Point Operations Manager agents from the admin UI using a PAK file. Lastly, upgrade any corresponding management packs.
• Verify functionality after upgrade
Validate that the same functionality exists after the upgrade completes as existed before the upgrade started.
• Remove VM snapshots when upgrade completed
Remove all VM snapshots after the upgrade and verification of the environment, as maintaining snapshots will cause performance problems.
• Be mindful when upgrading Remote Collectors
Remote collectors may be located far from the vRealize Operations Manager cluster, so consider potential latency and performance issues before performing an upgrade. Ensure that the remote collectors meet the latency requirement of less than 200 ms. If they do not meet latency requirements, remove those remote collectors from the cluster one-by-one