Saturday, September 1, 2012

Citrix XenServer 6, iSCSI and Compellent – Part 1

We have been using Citrix XenServer for the last two and a half years to run our Citrix infrastructure. For the most part I really like it: it works pretty well, and it is free to run since we have XenDesktop licenses. We upgraded to version 6 a while ago without giving it much thought. Everything seemed to work without a problem with our Compellent and the software iSCSI initiator.

We were having a lot of problems with our Compellent storage causing VMware lockups, XenServer lockups and replication issues. One of the things that Compellent Co-Pilot recommended was to convert from legacy ports to virtual ports. This would mark the beginning of the end of our relationship with Compellent.

You see, Compellent doesn’t take this conversion lightly. Converting to virtual ports requires a license. The license is free, but they put you through a series of questions and evaluations before they will give it to you. In our case it took several weeks to get this approved. It started with Co-Pilot recommending it, then my business partner had to put in the order for the license, then the local SE had to fill out some paperwork and approve the configuration. Even after that I had to send them pages of documentation on our system setup and configuration: things like what kind of servers we were running, firmware and BIOS versions of cards, etc. I even tried to rush the order, since this conversion on iSCSI requires downtime and I already had downtime scheduled for the upcoming weekend. I was told this would not be a problem and I would have it shortly. Well, that didn’t happen.

At this point we had had a ticket open with Co-Pilot for 5 weeks. They knew our systems, they knew the configuration and they knew we were having problems with XenServer.

Fast forward to the next weekend, when we had more downtime scheduled to convert to virtual ports and enable multipathing with dual subnets and dual switches. We started this process at 8am. We shut all the systems down, then installed, cabled and configured 2 new switches. We turned on the VMware servers and made the configuration changes for virtual ports and multipathing. We were done with this in about 3 hours. Then we started working on XenServer.

We also did not take this conversion lightly. I had called Co-Pilot multiple times to hammer out the details, ask for documentation, etc. I thought we had all our bases covered. They even sent me a XenServer 6 best practices guide which showed us how to configure everything for software or hardware iSCSI.

To make a long, long story a little shorter, it didn’t work. We called Co-Pilot 3 times that day asking for help and clarification. We thought we were doing everything right; it just wasn’t working. Finally, around 9:30pm, we called Co-Pilot again. The co-pilot, named Grant, was not very helpful. He told me over and over again that we had a networking problem. I asked how that was possible if VMware worked fine. He told me again that it was a networking problem. I asked him three times to check with someone else for ideas. He refused. I asked for the ticket to be escalated and he finally did that. After I had been on hold for some time, he came back and said multipathing wasn’t supported on XenServer with the software iSCSI initiator. They even sent me a CSTA describing the problem.

OK, fine. At this point I didn’t care. All we had to do was change it back to a single path, get the stuff up and go home. We had already been working on this for over 12 hours.

Fast forward another hour: we had made all the changes and it still didn’t work. I looked at the CSTA they sent me and actually read it. XenServer software iSCSI doesn’t work with multipathing OR virtual ports!

I called Co-Pilot again asking about the CSTA and virtual ports. I was put on hold for a while, then they came back and said yes, “It does not work with multipathing OR virtual ports.” WTF, how is this possible? They knew we had XenServer, they knew we had software iSCSI and THEY are the ones who recommended this! They even sent me a 60-page guide on how to configure this, so how does it magically not work now? I was told that it was a problem in XenServer 5.6 and Citrix told them it was resolved in 6.0. But it wasn’t. The problem is still there. So the next question I asked was “How do I go back to legacy ports?” The answer: “You can’t.” WTF again. Then they were asking me where I got the XenServer 6 best practices guide and who sent it to me. HELLO McFly, my shit still doesn’t work, how about getting it back online instead of trying to cover your asses. I was told to expect a call back in 20 minutes. I hung up, went outside and cooled down a little. I have this phone call recorded if anyone wants to hear it.

Well, 20 minutes later I got a call back from a Lead Enterprise Engineer named Chris. He was great and we started working on the problem immediately. He told me we could in fact revert to legacy ports, but it would require a lot of work. I really didn’t want to go that route, since Co-Pilot was the one that recommended converting to virtual ports to fix our issues. The only other option was to use hardware iSCSI HBAs. At this point it was after midnight, and I don’t think any place in St Louis sells $1,200 iSCSI HBAs anyway.

As part of trying to isolate the replication issues, we had also installed 2 new iSCSI HBAs in the dual controller system. It was my idea to take a controller down, take an HBA out and put it in the XenServer host. Chris agreed that was our best shot, and he worked with us for over 2 hours to get it to work. I am sure it would have gone faster, but we had never used iSCSI HBAs before and were not sure what we were doing.

Remember, we started this at 8am. We were done with the switches and VMware by 11am. It was now 3am and we had 1 of our XenServers up and running all the load. Time to go home.

Chris also dispatched another card to us from the parts depot so we could get both controllers up and running again. On Sunday afternoon one of our guys met the courier at the office and put that card in the second controller. He called Co-Pilot and they ran a post health check and it all looked OK. But it wasn’t: the controllers were out of sync and the ports would not re-balance. I had to call Co-Pilot AGAIN to figure out this problem. It looked like some of the replications were causing a problem. We deleted the replications and the ports re-balanced just fine. I then re-created the replications and all was good.

After we got the controllers back up and running, I never really heard back from Compellent regarding our XenServer issues. I even sent a detailed email to my business partner and the local Compellent team. I got a call from my business partner almost immediately, apologizing for all the problems. I never, even to this day, got a call from my local Compellent team.

Wow, I never intended to blog this much about our issues.  But now you have the background for part 2 of the story :)
