Monday, August 25, 2014
Maximo startup problems!
Our infrastructure uses WebSphere MQ as the queue backend for Maximo. Our automation framework (mainly Jython scripts run through wsadmin) sets up the CQIN and SEQIN queues, activation specifications and the rest with one click, so our margin for error is pretty low once it's up and running.
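To give a flavour of that setup, the queue and activation specification definitions boil down to a handful of wsadmin AdminTask calls. The sketch below is hypothetical: the cell name, JNDI names, queue manager and connection details are placeholders rather than our real configuration, and it assumes the WebSphere MQ messaging provider admin commands that ship with WAS 7 and later.

# Hypothetical wsadmin (Jython) sketch - all names are placeholders
scope = AdminConfig.getid('/Cell:myCell/')

# Queue destination for the continuous inbound queue (CQIN)
AdminTask.createWMQQueue(scope, [
    '-name', 'CQIN',
    '-jndiName', 'jms/maximo/int/queues/cqin',
    '-queueName', 'CQIN',
    '-qmgr', 'MYQMGR'])

# Activation specification driving the MDB that listens on CQIN
AdminTask.createWMQActivationSpec(scope, [
    '-name', 'intjmsact',
    '-jndiName', 'intjmsact',
    '-destinationJndiName', 'jms/maximo/int/queues/cqin',
    '-destinationType', 'javax.jms.Queue',
    '-qmgrName', 'MYQMGR',
    '-wmqTransportType', 'CLIENT',
    '-qmgrHostname', 'mq.example.com',
    '-qmgrPortNumber', '1414',
    '-qmgrSvrconnChannel', 'MAXIMO.SVRCONN'])

AdminConfig.save()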
So when this error appeared in SystemOut.log on Maximo startup, it was quite disconcerting:
[8/25/14 15:53:39:542 EST] 000003d9 SystemOut O 25 Aug 2014 15:53:39:510 [ERROR] [MXServer] []
java.lang.NullPointerException
at psdi.iface.jms.JMSContQueueProcessor.processMessage(JMSContQueueProcessor.java:253)
at psdi.iface.jms.JMSListenerBean.onMessage(JMSListenerBean.java:203)
at com.ibm.ejs.container.WASMessageEndpointHandler.invokeJMSMethod(WASMessageEndpointHandler.java:138)
at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1146)
at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invoke(MessageEndpointHandler.java:844)
at com.sun.proxy.$Proxy33.onMessage(Unknown Source)
at com.ibm.mq.connector.inbound.MessageEndpointWrapper.onMessage(MessageEndpointWrapper.java:131)
at com.ibm.mq.jms.MQSession$FacadeMessageListener.onMessage(MQSession.java:125)
at com.ibm.msg.client.jms.internal.JmsSessionImpl.run(JmsSessionImpl.java:2747)
at com.ibm.mq.jms.MQSession.run(MQSession.java:950)
at com.ibm.mq.connector.inbound.ASFWorkImpl.doDelivery(ASFWorkImpl.java:88)
at com.ibm.mq.connector.inbound.AbstractWorkImpl.run(AbstractWorkImpl.java:216)
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:668)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1862)
This error was flooding the logs every few milliseconds and causing CPU starvation!
The stack trace suggested the JMSContQueueProcessor was failing while trying to process messages (duh). The actual problem: the WebSphere MQ infrastructure (which we didn't own) didn't have the queue yet, and for some reason Maximo polls for it endlessly. So I changed the "intjmsact" activation specification to point at a queue that did exist, and voila!
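If you want to make that change through wsadmin rather than the admin console, it's roughly the following. This is a hedged sketch: the cell name and the queue JNDI name are placeholders, and it assumes the same WMQ messaging provider admin commands as above.

# Hypothetical wsadmin (Jython) sketch - repoint intjmsact at a queue that exists
scope = AdminConfig.getid('/Cell:myCell/')
for spec in AdminTask.listWMQActivationSpecs(scope).splitlines():
    if AdminConfig.showAttribute(spec, 'name') == 'intjmsact':
        AdminTask.modifyWMQActivationSpec(spec,
            ['-destinationJndiName', 'jms/maximo/int/queues/existingqueue'])
AdminConfig.save()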
But that wasn't the end of it! When the queue was finally created and "intjmsact" was configured back to point at the original queue, same error message!
This time, the problem was that there were messages already on the queue, which Maximo did not recognize. Maximo picked them up, rejected them, and put them back on the queue, causing yet another infinite cycle. Deleting the messages resolved the issue.
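For what it's worth, one scripted way to drain unwanted messages off a queue looks something like the sketch below. It assumes the pymqi library is available, and the queue manager, channel and connection details are made up for illustration.

# Hypothetical sketch using pymqi - destructively reads every message off a queue
import pymqi

qmgr = pymqi.connect('MYQMGR', 'MAXIMO.SVRCONN', 'mq.example.com(1414)')
queue = pymqi.Queue(qmgr, 'CQIN')
try:
    while True:
        queue.get()  # read and discard the next message
except pymqi.MQMIError as e:
    if e.reason != pymqi.CMQC.MQRC_NO_MSG_AVAILABLE:
        raise  # anything other than "queue is empty" is a real error
finally:
    queue.close()
    qmgr.disconnect()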
Sunday, August 17, 2014
Splunk and lookups
The client upgraded Splunk from 5.0.8 to 6.1.2, a worthwhile upgrade imho. But it messed up my query; possibly a bug.
Given this query (not exact, for commercial reasons):
index=prod sourcetype=wps.log module="PXY_*" (`transaction_filter`)
| dedup host _raw
| eval timestamps=_time
| convert timeformat="%s" ctime(_time) as TimeStamp
| search [| inputlookup outages | eval StartTime = strftime(strptime(Start,"%d/%m/%Y, %H:%M"),"%s")
| eval EndTime = strftime(strptime(End,"%d/%m/%Y, %H:%M"),"%s")
| eval search = "(TimeStamp < \""+StartTime+"\" OR TimeStamp > \""+EndTime+"\")"
| fields search | mvcombine search | eval search = "(" + mvjoin(search, " ") + ")"]
I had used this in v5 to filter out results that fell within an outage period. The prerequisite is a lookup table called 'outages'.
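The strptime calls imply the 'outages' lookup has Start and End columns stored as "%d/%m/%Y, %H:%M" strings, so a row would look something like this (values invented for illustration):

Start,End
"01/05/2014, 23:00","02/05/2014, 06:00"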
The result of the subsearch looked like this.
((TimeStamp < "1398949200" OR TimeStamp > "1398974400") (TimeStamp < "1399554000" OR TimeStamp > "1399575600") (TimeStamp < "1399726800" OR TimeStamp > "1399748400") (TimeStamp < "1399986000" OR TimeStamp > "1400011200") (TimeStamp < "1400072400" OR TimeStamp > "1400097600") (TimeStamp < "1400418000" OR TimeStamp > "1400443200") (TimeStamp < "1400504400" OR TimeStamp > "1400529600") (TimeStamp < "1400763600" OR TimeStamp > "1400788800") (TimeStamp < "1400763600" OR TimeStamp > "1400778000") (TimeStamp < "1400936400" OR TimeStamp > "1400958000") (TimeStamp < "1401282000" OR TimeStamp > "1401307200") (TimeStamp < "1401454800" OR TimeStamp > "1401516000") (TimeStamp < "1401541200" ))
Before the upgrade it just worked as it should have. After the upgrade, nada. A defect, perhaps?
Monday, August 11, 2014
WebSphere Messaging Engine not starting
While trying to automate the WebSphere installation, we ran into the problem in the title.
As with my other posts, we've got corporate DBAs whom we engage to create user accounts and databases for us. Our initial guess was that the user account created for us didn't have the right privileges, but there were no SQL exceptions in the FFDCs. When starting the messaging engine, we'd get this error:
The messaging engine "ME_name" cannot be started as there is no runtime initialized for it yet, retry the operation once it has initialized. For the runtime to successfully initialize the hosting server must be started, have its 'SIB service' already enabled, and dynamic configuration reload enabled. If this is a newly configured messaging engine and it is the first messaging engine to be hosted on this server, then it is most likely the 'SIB service' was not previously enabled and thus the server will need to be restarted. The messaging engine runtime might not be initializing because of an error while trying to start, examine the SystemOut.log of the hosting server to check for error messages indicating the problem
The node server's SystemOut.log revealed pretty much nothing, but the nodeagent had a number of FFDCs. So I thought perhaps it was a firewall problem, and I was on the right track...
We found:
- Port 9420 was new to us. We were used to WebSphere v7; looking through serverindex.xml we noticed a port called the Status Update Listener.
- netstat on the node server showed that the listening ports didn't match the ones we'd had opened through the firewall, so we changed them.
- The FFDCs contained an "UnknownHostException: *". The application server wasn't starting properly either, so this error pointed me in the right direction. The host needs to be defined for at least the SOAP_CONNECTOR_ADDRESS; the IPC connector port we set to localhost.
- I got the messaging engine running by setting the schema (under Bus > Messaging Engine > Message Store > Schema) and the user to the same value (a wsadmin sketch of the same change is below).
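If you prefer to script that last change, it can be done through wsadmin along these lines. This is a hedged sketch: it touches every messaging engine data store it finds, and the schema name and auth alias are placeholders for whatever your database user and alias actually are.

# Hypothetical wsadmin (Jython) sketch - set the schema on the ME data store(s)
for ds in AdminConfig.list('SIBDatastore').splitlines():
    AdminConfig.modify(ds, [
        ['schemaName', 'MEUSER'],           # schema name matching the DB user
        ['authAlias', 'myCell/meDbAlias']]) # alias whose user is that same MEUSER
AdminConfig.save()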
Thursday, August 7, 2014
Firewalls in the corporate
Jeez, getting a ZIP file to where I needed it today was such a pain!
I got WinSCP onto a Windows VM where a copy of Maximo was installed, so I could SCP the directory structure to a Linux box (Maximo admins: you have to do this because someone decided Maximo could only be installed on Windows). After a bit of debugging it turned out the Windows box wasn't in the same VLAN segment as the Linux boxes. So if this happens to you:
telnet to port 22 times out (do this both ways; a scripted check for this is sketched below)
WinSCP times out
tracert times out
Turning off iptables on the Linux box does nothing (/etc/init.d/iptables stop)
...then you probably have your "Windows" machine in the wrong place.
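If telnet isn't available on the Windows side, a quick Python check does the same job. The hostname below is a placeholder; run it from both directions.

# Minimal TCP reachability check - stands in for "telnet <host> 22"
import socket

def port_open(host, port, timeout=5):
    try:
        s = socket.create_connection((host, port), timeout)
        s.close()
        return True
    except (socket.timeout, socket.error):
        return False

print(port_open('linuxbox.example.com', 22))  # False after a long wait means blocked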