Console Proxy VM disconnects but is still running
Has anyone seen the Console Proxy VM randomly disconnect from the CloudPlatform management server? We are seeing it about once a week. No notification is raised for the agent disconnect and nothing else is affected; if I restart the Console Proxy VM, it comes back up and reconnects as it should.
We are on CloudPlatform 4.3.0.1 on CentOS, with XenServer 6.2 hypervisors.
In the management log (IPs removed):
2015-01-05 05:35:05,334 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-8:ctx-98238002) host (x.x.x.x) has been successfully pinged, returning that host is up
2015-01-05 05:35:05,334 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-98238002) ping from (1) to agent's host ip address (x.x.x.x) successful, returning that agent is disconnected
2015-01-05 05:35:05,335 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-98238002) PingInvestigator was able to determine host 3 is in Disconnected
2015-01-05 05:35:05,335 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) The state determined is Disconnected
2015-01-05 05:35:05,335 WARN [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) Agent is disconnected but the host is still up: 3-v-1-VM
2015-01-05 05:35:05,336 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) Host 3 is disconnecting with event AgentDisconnected
2015-01-05 05:35:05,357 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) The next status of agent 3is Alert, current status is Up
2015-01-05 05:35:05,358 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) Deregistering link for 3 with state Alert
2015-01-05 05:35:05,358 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-98238002) Remove Agent : 3
2015-01-05 05:35:05,358 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-8:ctx-98238002) Processing Disconnect.
It then proceeds to send disconnect info to the listeners.
Thanks
Yes, seeing this lately. When this happens, is the agent process on the CPVM still running? Did you check agent logs to see what information is logged during that time?
Yes. It looks like it loses its connection back to the load balancer, but I'm not sure what the "Unsupported record version" message relates to. There is no record of the load balancer itself having any issues.
2015-01-05 05:35:02,798 DEBUG [utils.nio.NioConnection] (Agent-Selector:null) Location 1: Socket Socket[addr=xxxx.com/x.x.x.x,port=8250,localport=34848] closed on read. Probably -1 returned: Unsupported record version Unknown-0.0
2015-01-05 05:35:02,798 DEBUG [utils.nio.NioConnection] (Agent-Selector:null) Closing socket Socket[addr=xxxx.com/x.x.x.x,port=8250,localport=34848]
2015-01-05 05:35:03,038 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) Clearing watch list: 1
2015-01-05 05:35:05,543 DEBUG [cloud.consoleproxy.ConsoleProxyGCThread] (Console Proxy GC Thread:null) connMap={}
2015-01-05 05:35:05,944 INFO [utils.exception.CSExceptionErrorCode] (Console Proxy GC Thread:null) Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
2015-01-05 05:35:05,945 ERROR [resource.consoleproxy.ConsoleProxyResource] (Console Proxy GC Thread:null) Unable to send out load info due to Unable to post agent control request as link is not available
com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available
at com.cloud.agent.Agent.postRequest(Agent.java:687)
at com.cloud.agent.Agent.postRequest(Agent.java:675)
at com.cloud.agent.resource.consoleproxy.ConsoleProxyResource.reportLoadInfo(ConsoleProxyResource.java:457)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at com.cloud.consoleproxy.ConsoleProxy.reportLoadInfo(ConsoleProxy.java:230)
at com.cloud.consoleproxy.ConsoleProxyGCThread.run(ConsoleProxyGCThread.java:102)
2015-01-05 05:35:05,945 DEBUG [cloud.consoleproxy.ConsoleProxyGCThread] (Console Proxy GC Thread:null) Report load change : {
"connections": []
}
It looks like the underlying socket connection is getting broken. Quite likely a network device in the path (possibly the LB) is terminating the connection because of an idle timeout.
Also, have you tried to destroy the existing CPVM and create a new one? Just making sure the CPVM has the latest scripts/systemvmiso.
It has been destroyed and recreated with the latest patches.
I also just found a Java heap OutOfMemory error a few lines above what looks like successful connections.
2015-01-05 05:34:49,967 DEBUG [cloud.consoleproxy.ConsoleProxyGCThread] (Console Proxy GC Thread:null) Report load change : {
"connections": []
}
2015-01-05 05:34:51,084 WARN [utils.nio.NioConnection] (Agent-Selector:null) Caught an exception but continuing on.
java.lang.OutOfMemoryError: Java heap space
2015-01-05 05:34:54,967 DEBUG [cloud.consoleproxy.ConsoleProxyGCThread] (Console Proxy GC Thread:null) connMap={}
2015-01-05 05:35:00,320 DEBUG [cloud.consoleproxy.ConsoleProxyGCThread] (Console Proxy GC Thread:null) connMap={}
I grepped a few more log files to see whether it had occurred previously, but the older ones have already been archived off. I want to see whether we get the same heap error again before recreating the CPVM once more.
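For anyone wanting to check the same thing across a pile of agent logs, here is a rough sketch in Python of how the correlation could be automated. It assumes the timestamp format seen in the excerpts above and treats an untimestamped `java.lang.OutOfMemoryError` line as belonging to the last timestamped line before it. The function name and the 60-second window are my own choices for illustration, not anything from CloudPlatform.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log excerpts, e.g. "2015-01-05 05:34:51,084"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if not m:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def oom_before_disconnect(lines, window_seconds=60):
    """Find socket closes that follow a heap OutOfMemoryError within the window.

    Returns a list of (oom_time, close_time) pairs. The window size is an
    arbitrary assumption; widen it if the gap in your logs is larger.
    """
    window = timedelta(seconds=window_seconds)
    last_ts = None   # timestamp of the most recent timestamped line
    last_oom = None  # timestamp attributed to the most recent OOM
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            last_ts = ts
        if "java.lang.OutOfMemoryError" in line and last_ts is not None:
            # Stack-trace lines carry no timestamp; use the preceding one.
            last_oom = last_ts
        elif "closed on read" in line and ts is not None:
            if last_oom is not None and ts - last_oom <= window:
                hits.append((last_oom, ts))
    return hits
```

Feeding it the lines of an agent log (for example `oom_before_disconnect(open(path))`, where `path` is wherever your agent log lives) lists each disconnect that came shortly after a heap error. An empty result for a disconnect would point back toward the idle-timeout theory instead.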