Category Archives: Operations Management

Clouds and beyond

The question on most IT professionals’ minds seems to be “What’s the next paradigm shift in the datacenter space?”

A decade ago, most organizations heavily adopted virtualization and embraced a “virtualization first” approach. This was an inflection point: instead of the customary process of evaluating each workload’s suitability for virtualization, the default became to virtualize unless there were compelling reasons not to. Application vendors did their bit by providing reference architectures and best practices to ease the transition. The benefits of this transition were obvious and plentiful – cost, flexibility, availability, and so on.

Net-net, datacenters consolidated and optimized. Moore’s law helped the cause, and hardware vendors pushed the physical boundaries to accommodate the transition.

In the initial days there was skepticism and resistance of sorts, and certain industries chose to remain physical, citing their own reasons. Eventually, though, we saw almost all verticals – from banks to defense organizations – adopt virtualization.

An important thing to note is that, as the density of hypervisor vendors increased, the value proposition was no longer “virtualization” itself but who could serve it best – i.e., the actual competition was over who could provide better-quality features on top of a virtualized platform.

One can also see that “virtualization” turned into a commodity, and how it was delivered, maintained, and managed became the deciding factors.

In the meantime, there were interesting developments above and below the virtualization layer: hyper-convergence, storage and network virtualization, and, in the application stack, the modernization of apps away from legacy models toward cloud-native models. Putting the pieces together, we have a mixed bag of workloads – some on-prem, some that can run on the public cloud, and some hybrid.

From an organizational standpoint, CIOs would build a cloud strategy with a set of policies that govern the placement of workloads.

The considerations:

Private cloud = increased capex – on-prem, but better control and compliance

Public cloud = increased opex – off-prem, but predictable expenditure and fewer IT management complexities such as datacenter costs (power, cooling, hardware maintenance, etc.)

Over time, we witnessed each layer in the datacenter (bottom-up) getting commoditized.

Gartner predicts that by 2020, a “no-cloud” policy will be rare.

An increasing emphasis continues to be on a hybrid model with shifting balances.

Feel empowered with the all-new Log Insight 3.3.x


The all-new vRealize Log Insight (vRLI) version 3.3.x comes with great new enhancements, but two key features hog the limelight:


#1 New Product Licensing

vRealize Log Insight 3.3 includes a 25-OSI license, available at no additional cost with a vCenter Server Standard installation.



This means that vRLI comes bundled with the vCenter license, giving you a good look at its advanced capabilities. More details on how the licensing works are outlined in the FAQs.



#2 Importer Utility

A new importer utility is available to support importing old logs and support bundles via the Log Insight ingestion API. The utility:

- is available as an executable for Windows and Linux
- supports a manifest file that is almost identical to an agent configuration file (the only difference is the directory option)
- can ingest messages based on their timestamp (requires authentication)
- supports compressed (zip/gzip/tar) as well as recursive directory imports
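As a rough sketch, a manifest for the importer might look like the following. It is modeled on the Log Insight agent configuration format, since the manifest is described as almost identical to it; treat the section and key names below as assumptions to be verified against the official documentation, not as the documented syntax.

```
; Hypothetical importer manifest, modeled on the agent configuration
; format; verify the exact keys against the vRLI documentation.
[filelog|offline-esxi]
; Per the post, directory is the one option that differs from the agent
; configuration (assumed here to be relative to the import location).
directory=esxi-host1
include=*.log
```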



This new feature lets you work with offline logs, i.e., you can process log bundles extracted from a product, as opposed to configuring production servers to direct logs to vRLI.
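To make the offline workflow concrete, here is a minimal sketch of staging extracted logs in a directory and packaging them as a gzip tarball, one of the compressed formats the importer accepts. All paths and file names are illustrative.

```shell
# Stage extracted product logs under a scratch directory (paths are
# illustrative).
mkdir -p /tmp/offline-logs/esxi-host1
printf '2016-03-01T10:00:00Z sample vmkernel entry\n' \
  > /tmp/offline-logs/esxi-host1/vmkernel.log

# Package the whole tree as a gzip tarball for the importer; a plain
# recursive directory import would also work per the feature list above.
tar -czf /tmp/offline-logs.tar.gz -C /tmp offline-logs
ls /tmp/offline-logs.tar.gz
```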

Hence, if there were impediments to deploying vRLI in the environment – security approvals, budget approvals, or the business justification to procure the product – the above two features help get things moving: build out an isolated setup and load the logs for offline analysis.

This can be the first line of attack for administrators before engaging VMware Support.

I will follow up this blog with some samples & guidelines on setting it up.


Happy troubleshooting…




vCOPs: Error 500 The call failed on the server; see server log for details (StatusCode: 500)

Error 500 is a rather annoying and quite generic “catch-all” error for web servers, and vCOps is no different in this context.

In this blog, I aim to resolve one failure condition in the Analytics component of vCOps. Check whether the symptoms and logs match before you go ahead with the steps.


1- Log in to the admin page of vCOps (or connect via SSH)


2- Check the status of all the services


In this case, you will observe that the vCOps Manager Service & Analytics Service have stopped

3- This issue may also manifest as below, even though you have not changed the IP address of either of the VMs (UI or Analytics):

The UI VM is running at <IP address> but cannot connect to the Analytics VM at <IP address>; make sure it is running and reachable from <IP address>. If the IP address of either VM has changed, then log in to the Administration interface, which will guide you through the steps to restore connectivity between the two VMs.


4- Check the logs at $ALIVE_BASE/user/log/analytics.log

2015-11-12 16:38:41,225 ERROR [Thread-1] com.integrien.alive.dbaccess.AnalyticsFastLoaderCache.loadActiveAlarmsNative – Error while loading active alarms org.postgresql.util.PSQLException: PANIC: checksum mismatch: disk has 0x41ea3421, should be 0xda4b0281 filename pg_tblspc/16385/PG_9.0_201106101/16386/16435, BlockNum 171234, block specifier 16385/16386/16435/0/171234

2015-11-12 16:38:42,245 INFORMATION [Thread-1] – Analytics is stopping
2015-11-12 16:38:42,265 INFORMATION [Thread-1] – AnalyticsService has been stopped  

If these symptoms match, proceed with the following steps.

Root cause: the Postgres database used by the analytics engine has encountered a block checksum error and needs manual correction.


1- Log in to the Analytics VM

2- Connect to the DB as the postgres user

su postgres

3- Execute “pg_ctl stop -m smart -D /data/pgsql/data”

** If required, or if the previous command errors out, force it with the immediate switch (-m immediate)

4- Execute the following command, replacing the quoted specifier with the actual tablespace/database/relation/fork/blockNum recorded in the analytics log

postgres --single -D $PGDATA -c fix_block_checksum="16385/16386/16435/0/171234"

This should return a message ending in “… fixed”
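The block specifier can be pulled out of the log mechanically rather than retyped by hand. A small sketch follows; the sample line is the PANIC entry quoted earlier, and in practice you would point grep at the real analytics.log instead of echoing a string.

```shell
# Extract the block specifier from a PANIC line in analytics.log.
# Sample line taken from the log excerpt in this post; in practice:
#   grep -o 'block specifier [0-9/]*' $ALIVE_BASE/user/log/analytics.log
log='PANIC: checksum mismatch: disk has 0x41ea3421, should be 0xda4b0281 filename pg_tblspc/16385/PG_9.0_201106101/16386/16435, BlockNum 171234, block specifier 16385/16386/16435/0/171234'
echo "$log" | grep -o 'block specifier [0-9/]*' | awk '{print $3}'
# prints 16385/16386/16435/0/171234
```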

5- pg_ctl start -D /data/pgsql/data

6- Go ahead and restart the vCOps services
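For reference, steps 3–5 can be written out as a single script. This is a sketch, not a tested procedure: it assumes the paths from this post ($PGDATA at /data/pgsql/data), must be run as the postgres user on the Analytics VM, and the block specifier shown is the one from this post’s log excerpt, to be replaced with the value from your own analytics.log. The block below only writes and syntax-checks the script rather than running it against a live database.

```shell
# Steps 3-5 above as one script; review and adjust before running as the
# postgres user on the Analytics VM.
cat > /tmp/fix_block_checksum.sh <<'EOF'
#!/bin/sh
set -e
PGDATA=/data/pgsql/data
SPEC="16385/16386/16435/0/171234"   # replace with the value from analytics.log
# Stop Postgres gracefully; fall back to an immediate stop if that fails.
pg_ctl stop -m smart -D "$PGDATA" || pg_ctl stop -m immediate -D "$PGDATA"
# Fix the damaged block in single-user mode, then start the database again.
postgres --single -D "$PGDATA" -c fix_block_checksum="$SPEC"
pg_ctl start -D "$PGDATA"
EOF
# Syntax-check only; do not execute here.
sh -n /tmp/fix_block_checksum.sh && echo "script syntax OK"
```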

Happy Monitoring …