CloudPlatform 3.x

Erik Godin

Instance rebooted, volume gone.

I ran into a problem with CloudPlatform 3.0.6 and I'm curious to know whether anyone has experienced something similar.

Someone (who doesn't have access to the CP GUI) was logged in to an instance when he was suddenly disconnected (the RDP session timed out), and he couldn't reconnect. Someone with access to the GUI confirmed that the VM was unavailable through the console proxy (unfortunately the error message wasn't recorded), so the instance was rebooted. When the instance came back online it still wasn't reachable over RDP, so it was accessed through the web console. The web console showed a Windows mini-setup screen, which leads me to believe the VM was somehow re-attached to the template originally used to create it.

I've reviewed the management-server log for the hour of the incident and the hour before it, and I found multiple copies of the same entry, which has me worried:

There are no Consoles available to the vm : Clones of this template will automatically provision their storage when first booted and then reconfigure themselves with the optimal settings for Windows Server 2008 R2 (64-bit)



The error message was repeated several times throughout the day and appears to have stopped within the same hour the event occurred.
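To see how often an entry like the one above shows up in each hour, a short script can bucket matching lines by timestamp. This is a minimal sketch in Python, assuming each management-server log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the function name `hourly_counts` and the sample lines are made up for illustration, so check the prefix against your own log before relying on it.

```python
import re
from collections import Counter

# Assumption: each log line begins with "YYYY-MM-DD HH:MM:SS"; capture the hour part.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def hourly_counts(lines, needle):
    """Count lines containing `needle`, bucketed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:  # lines without a leading timestamp are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from a real log.
sample = [
    "2013-05-06 09:12:01,001 There are no Consoles available to the vm",
    "2013-05-06 09:47:30,512 There are no Consoles available to the vm",
    "2013-05-06 10:03:44,207 some unrelated entry",
    "2013-05-06 10:05:10,998 There are no Consoles available to the vm",
]
counts = hourly_counts(sample, "There are no Consoles available")
for hour, n in sorted(counts.items()):
    print(hour, n)
```

Against a real log, pass an open file object instead of `sample` (for example inside `with open(path, errors="replace") as log:`), since iterating over a file yields its lines one at a time.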

TIA for the help.


Pankaj Paliwal

Erik,

I haven't seen anything similar before.

I think we would need a lot more information before we could understand the situation, such as which hypervisor and storage are being used. We would also need to look thoroughly at the management server and hypervisor logs to find out exactly what happened.

I recommend engaging Citrix support to troubleshoot this further.

Regards,
Somesh


Somesh Naidu CITRIX EMPLOYEES
Pankaj Paliwal

What do you mean by "volume gone"? Was the instance's volume restored to the default template?

What is the primary storage in your environment? Is it local storage?


Prashanth Reddy Mandadi CITRIX EMPLOYEES
Erik Godin

Hi prashanthr2,

You hit the nail on the head: the instance's volume was restored to the original template. To answer your question, the backend is EMC storage providing an NFS share, which is mounted by multiple compute nodes. The data is stored in a VHD file.

I've advised my co-workers that the next time something like this happens they should not click through the Windows Mini Welcome screen, and should phone me instead so I can have a look. I theorize that one of the following might be true:

* Maybe the original volume is still attached, and for some reason a process somewhere decided to attach the original template.
* Maybe the original volume is missing, and a process (within CloudPlatform or XenServer) noticed and re-attached the original template to compensate.

It would be helpful if I could review a historical record of which instance a volume USED to be attached to.

TIA

Erik


Pankaj Paliwal

This issue (the root volume being restored to the original template) can happen if someone performed the resetVM operation from the CloudPlatform UI. To confirm this, look at the entries in the event table:

mysql> select * from event where type='VM.RESTORE';

If the above query returns any entries, a resetVM was performed on one or more VMs. Unfortunately, the entries alone don't tell you which VM the resetVM was performed on, but you can check whether the date in the created column matches when your users reported the issue.
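The same check can be narrowed to the window around the incident by filtering on the created column. The sketch below runs such a query with Python's built-in sqlite3 module against a throwaway in-memory table standing in for the cloud DB `event` table. The stand-in keeps only `id`, `type` and `created`, the function name `restore_events_near` is hypothetical, and the sample rows and dates are invented. The WHERE clause is plain SQL, so it should carry over to the mysql prompt, but treat that as an assumption to verify on your version.

```python
import sqlite3


def restore_events_near(conn, start, end):
    """Return VM.RESTORE events whose `created` value falls within [start, end]."""
    cur = conn.execute(
        "SELECT id, type, created FROM event "
        "WHERE type = 'VM.RESTORE' AND created BETWEEN ? AND ? "
        "ORDER BY created",
        (start, end),
    )
    return cur.fetchall()


# Hypothetical stand-in for the cloud DB `event` table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, type TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO event (type, created) VALUES (?, ?)",
    [
        ("VM.REBOOT", "2013-05-06 09:58:00"),
        ("VM.RESTORE", "2013-05-06 10:02:00"),  # inside the incident window
        ("VM.RESTORE", "2013-04-01 14:00:00"),  # an older, unrelated reset
    ],
)
rows = restore_events_near(conn, "2013-05-06 09:00:00", "2013-05-06 11:00:00")
print(rows)
```

Because the timestamps are stored as `YYYY-MM-DD HH:MM:SS` strings, BETWEEN compares them correctly as text in SQLite. As noted above, a match only tells you that a reset happened at that time, not which VM it was applied to.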

Erik wrote: "It would be helpful if I could review a historical record of which instance a volume USED to be attached to."

The cloud DB may still hold a historical record if the volume was replaced as part of a resetVM operation, but garbage collection (GC) on the XenServer host might have already destroyed the old volume.

You can try the following query:

select * from volumes where name='ROOT-#';

Replace # with the ID of the VM, then look at the value in the "path" column of the output: it gives you the names of the VHDs that were used as ROOT volumes. The entry whose "removed" column is NULL is the one currently in use.
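The volume history lookup can be sketched the same way. This is a hypothetical Python/sqlite3 example: the in-memory `volumes` table keeps only `id`, `name`, `path` and `removed`, the function name `root_volume_history`, the VM ID 42 and the VHD names are all made up, and ordering by `id` assumes newer rows get higher IDs.

```python
import sqlite3


def root_volume_history(conn, vm_id):
    """Return the ROOT-<vm_id> volume rows, oldest first, flagging the one in use."""
    cur = conn.execute(
        "SELECT path, removed FROM volumes WHERE name = ? ORDER BY id",
        (f"ROOT-{vm_id}",),
    )
    # A NULL `removed` value (None in Python) marks the volume currently in use.
    return [
        {"vhd": path, "removed": removed, "in_use": removed is None}
        for path, removed in cur.fetchall()
    ]


# Hypothetical stand-in for the cloud DB `volumes` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE volumes (id INTEGER PRIMARY KEY, name TEXT, path TEXT, removed TEXT)"
)
conn.executemany(
    "INSERT INTO volumes (name, path, removed) VALUES (?, ?, ?)",
    [
        ("ROOT-42", "old-vhd-uuid", "2013-05-06 10:02:00"),  # replaced by the reset
        ("ROOT-7", "other-vm-vhd-uuid", None),               # a different VM
        ("ROOT-42", "new-vhd-uuid", None),                   # current root volume
    ],
)
for entry in root_volume_history(conn, 42):
    print(entry)
```

Each `path` value whose row is not in use names a VHD that might still exist on primary storage. Whether it actually does depends on whether GC on the XenServer host has already removed it.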


Prashanth Reddy Mandadi CITRIX EMPLOYEES
Erik Godin

Thank you for your help, prashanthr2. Indeed, it looks like our helpdesk mistook "Reset VM" for "Reboot". The cloud DB confirms that a ResetVM operation occurred. I looked for the VHD, but unfortunately it's been destroyed :/

Thanks again!

Erik


Erik Godin

User error: someone clicked "ResetVM" instead of "Reboot".

