Error 500 is an annoying and rather generic "catch-all" error for web servers, and vROps is no different in this respect.
In this post, I walk through resolving one specific condition affecting the Analytics component of vROps. Check that your symptoms and logs match before you go ahead with the steps.
Symptoms
1- Log in to the vROps admin page (or to the VM over SSH)
2- Check the status of all the services.
In this case, you will see that the VCOPS Manager Service and the Analytics Service are stopped.
3- The issue may also show up as the following error, even though you have not changed the IP address of either VM (UI or Analytics):
"The UI VM is running at <IP address> but cannot connect to the Analytics VM at <IP address>. Make sure it is running and reachable from <IP address>. If the IP address of either VM has changed, log in to the Administration interface, which will guide you through the steps to restore connectivity between the two VMs."
4- Check the log file $ALIVE_BASE/user/log/analytics.log for entries like these:
2015-11-12 16:38:41,225 ERROR [Thread-1] com.integrien.alive.dbaccess.AnalyticsFastLoaderCache.loadActiveAlarmsNative – Error while loading active alarms org.postgresql.util.PSQLException: PANIC: checksum mismatch: disk has 0x41ea3421, should be 0xda4b0281 filename pg_tblspc/16385/PG_9.0_201106101/16386/16435, BlockNum 171234, block specifier 16385/16386/16435/0/171234
2015-11-12 16:38:42,245 INFORMATION [Thread-1] com.integrien.analytics.AnalyticsMain.stop – Analytics is stopping…
2015-11-12 16:38:42,265 INFORMATION [Thread-1] com.integrien.analytics.AnalyticsMain.stop – AnalyticsService has been stopped
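Rather than copying the block specifier out of the log by hand, you can extract it with a few lines of shell. This is a minimal POSIX shell sketch, not a VMware tool: the function name find_bad_blocks is hypothetical, and it assumes the log format shown above, where the "checksum mismatch" message ends with "block specifier" followed by five slash-separated numbers.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct block specifier reported in
# "checksum mismatch" errors. Defaults to the analytics.log path used above;
# pass another log file as the first argument to override it.
find_bad_blocks() {
  log="${1:-$ALIVE_BASE/user/log/analytics.log}"
  # Keep only checksum-mismatch lines, pull out "block specifier a/b/c/d/e",
  # strip the label, and de-duplicate repeated errors for the same block.
  grep 'checksum mismatch' "$log" \
    | grep -o 'block specifier [0-9][0-9]*/[0-9][0-9]*/[0-9][0-9]*/[0-9][0-9]*/[0-9][0-9]*' \
    | sed 's/^block specifier //' \
    | sort -u
}
```

Source the file and run find_bad_blocks with no arguments to read $ALIVE_BASE/user/log/analytics.log, or pass a log path explicitly. Each output line is one tablespace/database/relation/fork/blockNum specifier; empty output means the log does not show this particular failure.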
If these symptoms match, proceed with the resolution below.
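Root cause: The PostgreSQL database used by the Analytics engine contains a data block whose checksum does not match what is stored on disk (the "PANIC: checksum mismatch" entry in the log). Analytics fails while loading active alarms and stops, and it will not start again until the block is repaired.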
Resolution
1- Log in to the Analytics VM.
2- Switch to the postgres user:
su postgres
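3- Stop the database:
pg_ctl stop -m smart -D /data/pgsql/data
If this command errors out or does not complete, force the shutdown with immediate mode instead (this skips a clean shutdown, so PostgreSQL runs crash recovery on its next start):
pg_ctl stop -m immediate -D /data/pgsql/data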
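4- Run the following command, replacing the block specifier (16385/16386/16435/0/171234 in this example) with the tablespace/database/relation/fork/blockNum values recorded in your analytics.log. $PGDATA must point to the same data directory used above, /data/pgsql/data:
postgres --single -D $PGDATA -c fix_block_checksum="16385/16386/16435/0/171234"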
This should return a message ending in "fixed". Exit the single-user session with Ctrl+D.
5- Start the database again:
pg_ctl start -D /data/pgsql/data
6- Restart the vCOps services.
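A typo in the specifier for step 4 is easy to make, so it can help to validate the value and generate the command before running it. This is a minimal POSIX shell sketch under stated assumptions: the function names is_block_spec and fix_block_cmd are hypothetical, the default data directory is taken from the pg_ctl commands above, and the fix_block_checksum setting is copied verbatim from step 4. The helper only prints the command; it does not run anything.

```shell
#!/bin/sh
# Hypothetical check: true only for five non-empty numeric fields separated by
# single slashes (tablespace/database/relation/fork/blockNum).
is_block_spec() {
  case $1 in
    # Reject empty input, any non-digit/non-slash character (including
    # newlines), a leading or trailing slash, and empty fields.
    ''|*[!0-9/]*|/*|*/|*//*) return 1 ;;
  esac
  # What remains is digits separated by single slashes; four slashes = five fields.
  slashes=$(printf '%s' "$1" | tr -cd '/')
  [ "${#slashes}" -eq 4 ]
}

# Hypothetical helper: print (not execute) the single-user command from step 4.
fix_block_cmd() {
  spec="$1"
  datadir="${2:-/data/pgsql/data}"
  if ! is_block_spec "$spec"; then
    echo "not a tablespace/database/relation/fork/blockNum specifier: $spec" >&2
    return 1
  fi
  printf 'postgres --single -D %s -c fix_block_checksum="%s"\n' "$datadir" "$spec"
}
```

For example, fix_block_cmd 16385/16386/16435/0/171234 prints the exact command from step 4, while a truncated value such as 16385/16386/16435/0 is rejected with an error. Printing rather than executing keeps the database stop/start sequence under your control.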
Happy Monitoring …