Friday, September 28, 2012

SRM 5 works! What a letdown.

This week we successfully tested SRM 5 and guess what? It worked. We failed over and everything worked without a problem. We failed back and guess what? It worked again with no problems. And not just the SRM "Test"; we did a full-on "fail over" and "fail back."

This is all very anti-climactic for me. Our road to a working DR site has been a very rough, long, stressful, frustrating experience. We dealt with so many problems along the way: staff cuts, SRA problems, professional services problems, replication problems, failed WAN upgrades, etc... I should be excited, I should be jumping for joy. A huge weight has been lifted off my shoulders. It's not fully configured yet, but the fact that we tested with no problems is a huge step in the right direction. However, I'm not super excited. I'm more aggravated that it took so long to get to this point. Maybe after we have it fully configured and have run production out of the DR site for a day or two, I'll be excited.

So what's the key to this working with no problems? Nimble Storage. Yep, I replaced my storage to get all this to work without a problem. It's all because of how they perform snapshots and, therefore, replication. With our previous vendor we were 1-2 days behind on critical data. Now with the Nimble we are 1-2 HOURS behind on critical data and 1 day behind on all data. Yes, we are replicating our entire infrastructure daily. All over the same 10Mb MPLS pipe, which is limited to 3Mb during the day for replication traffic.
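If you're wondering how an entire infrastructure fits through a pipe that small every day, the back-of-the-envelope math is below (a minimal sketch; the 3Mb daytime cap and 10Mb line rate are the numbers above, while the 12-hour day/night split is my own assumption for illustration):

```python
# Rough ceiling on how much changed data the MPLS pipe can replicate per day.
# Figures from the post: 10 Mbit/s line, throttled to 3 Mbit/s for replication
# during the day. The 12-hour day/night split is an assumption for illustration.

def gigabytes(mbit_per_sec, hours):
    """Sustained Mbit/s over a period, converted to gigabytes."""
    return mbit_per_sec / 8.0 * 3600 * hours / 1024.0  # Mbit/s -> MB/s -> MB -> GB

daytime   = gigabytes(3, 12)    # throttled daytime window
overnight = gigabytes(10, 12)   # full pipe overnight

print("Daytime (3 Mb/s x 12h): ~%.0f GB" % daytime)       # ~16 GB
print("Overnight (10 Mb/s x 12h): ~%.0f GB" % overnight)  # ~53 GB
print("Daily ceiling: ~%.0f GB of changed data" % (daytime + overnight))
```

Call it roughly 70GB of deltas a day at the absolute ceiling; the way the array takes and ships snapshots is what makes that enough.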

I would highly suggest you look at Nimble even if you're not looking for a SAN replacement. I wasn't looking to replace my SAN; in fact, I was pretty dedicated to Compellent before my latest round of problems. Nimble does everything they say it does, it does it well, and at a cost that is about the same as keeping my Compellent over the next 3 years.

If you are looking for a good deal on some Compellent gear let me know. I have three Series 30 controllers, 1 shelf of SATA, 1 shelf of Fibre Channel and 2 shelves of SAS.

Wednesday, September 12, 2012

VMware maximum VMDK file size with snapshots

Here is a problem we recently ran into with our new Nimble install.

We did our research, we asked the questions, we looked up VMware KB articles and the Nimble best practices guide. Everything we read said the maximum file size is 2TB minus 512 bytes. Even VMware support told us this.

This is all true, UNLESS you want to snapshot the machine that has that VMDK associated to it.

Then the maximum VMDK file size is 1.9845TB. To be safe we now make VMDKs 1.98TB.
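If you want to sanity-check the numbers yourself, the arithmetic is below (a minimal sketch; the 2TB-minus-512-bytes ceiling and the ~1.9845TB snapshot-safe figure are the ones quoted above, and reading the gap as headroom for the snapshot redo log is my interpretation of the KB):

```python
# Why a "2TB minus 512 bytes" VMDK can't be snapshotted: the snapshot
# delta/redo log needs its own headroom under the same datastore file-size
# ceiling. All figures below are the ones quoted in the post.

TB = 1024 ** 4  # bytes in a binary terabyte
GB = 1024 ** 3

datastore_max = 2 * TB - 512      # documented max file size on the datastore
snapshot_safe = 1.9845 * TB       # largest VMDK that would still snapshot
our_standard  = 1.98 * TB         # what we provision now, for a little margin

print(f"Implied snapshot headroom: ~{(datastore_max - snapshot_safe) / GB:.1f} GB")
print(f"Extra margin on a 1.98TB disk: ~{(snapshot_safe - our_standard) / GB:.1f} GB")
```

Works out to roughly 16GB of headroom the snapshot needs, with a few extra GB of cushion at 1.98TB.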

This was very, very frustrating for us because the error message VMware was throwing was not descriptive at all. We were getting this:

File is larger than the maximum size supported by datastore '

What was also extremely frustrating is that we called VMware several times on this issue and no one really knew the answer until it was escalated up the chain. We were told repeatedly that we should be able to take a snapshot on a 1.99TB VMDK.

Here is the VMware KB article they sent us explaining the problem.

Things like this really make me question using VMware and paying for premium support with them. In my opinion this should have been resolved in less than 30 minutes, not 6 hours.

Once the Nimble install is done, I'll start looking at Hyper-V in Server 2012 and see what it offers. XenServer has too many problems to replace VMware in my environment.

Tuesday, September 4, 2012

Compellent to Nimble Day 1 – Install and Update


Today was kind of anti-climactic. The Nimbles were supposed to arrive by 10:30; they didn’t arrive until 3pm.
[photo: pallet]
We waited all day for them to arrive, once they did it was go time.
[photo: rails]
In 1 hour we had both arrays unboxed, racked, stacked, cabled and powered up.
[photo: nimble]
Two Nimble CS240s and a dual Series 30 Compellent with 2 shelves of disks.
[photo: cabled]
The cabling looks terrible, but that’s temporary for the DR array to seed data.
Within 2 hours of these arrays arriving they were racked with an initial config and upgraded to version 1.4.
Tomorrow we will get into the VMware setup and create some volumes, protection policies, replication schedules, etc…

Monday, September 3, 2012

Citrix XenServer 6, iSCSI and Compellent – Part 2

It’s safe to say that after our last experience with Co-Pilot I was pretty disappointed. Then I got my quote for 1 year of support, the first time we would be paying for support since we purchased them, and I was shocked. I wasn’t going to pay that much money for that kind of support. I have been told over and over again that this is not typical for Co-Pilot; however, it is typical of my experience with Co-Pilot. If I get around to it I will blog about my experience with Compellent and SRM 5 last December. Around the same time that I got the quote for support, one of my friends was talking about a new SAN he bought called Nimble. It sounded interesting and the price was amazing compared to Compellent. Seriously, I started looking at the Nimble just because of the price.

I decided to start looking at other options to see if it was even possible to replace the Compellent.  I never ever thought I would replace the Compellent, even though I was very disappointed with them. I started putting out some feelers on LinkedIn to see if anyone would buy my systems and to see what my support options were. Luckily I found a couple of business partners that are interested in purchasing all my gear and I got a lot of good feedback about my support options.

I even had a regional Compellent sales rep ask why I was considering not renewing support.  He asked if I was having problems with the product or with support. He also asked if my local Compellent team was aware of the problems. I replied with “Yes, I have problems with the product and with support.”  “Yes, my business partner is aware of the problems and so is my local Compellent team and they have done nothing to resolve the problems.”  Well apparently that was the quote that set Compellent off.
Within days of that appearing on LinkedIn I got a call from Compellent Co-Pilot management asking about our issues, and they were determined to resolve them. Just a little more history on this ticket: the case was opened on June 18th, and the day we converted to virtual ports was July 21st. So a month passed between opening a case where servers were locking up and replications were failing and their suggesting a resolution we could implement. After the 21st I sent an email to my business partner and the local Compellent team. No calls, no contact, no email, nothing. Then after the LinkedIn postings I got a call on August 2nd. It’s now September 2nd.

Compellent has been all over us since that day. They have assigned a tier 3 tech to our case, and they call us almost every other day with suggestions and to check on our progress. However, they still can’t solve the main problem: XenServer 6, iSCSI and multi-pathing.

The main problem is this. Compellent says multi-pathing with XenServer 6 and iSCSI HBAs works. However, we can’t get the QLogic 4062C iSCSI HBA to display both ports in XenServer 6. It works fine in 5.6, just not 6.0. So we called our Citrix business partner and explained the problem. They told us that it is not supported and we are wasting time trying to troubleshoot it. We told this to Compellent and they insisted it works and to open a ticket with Citrix. We called our Citrix business partner again and had them open a ticket with Citrix. They called us back and said that card is not supported in XenServer 6, so they can’t help us. It turns out, NO iSCSI HBAs ARE CERTIFIED ON XENSERVER 6.0.

In the meantime we had been investigating the Nimble array quite heavily. The price is right and it all sounds great and looks awesome on paper. Compellent wasn’t making any progress at the time, and the Nimble sounded like it would solve our replication issues and improve performance. Over the next 3 years, buying two new Nimbles was about the same price as paying for Compellent support and upgrading my Series 30 controllers. We decided to participate in a Nimble proof of concept, and it starts next week. If it works, they will replace the Compellent. This was all decided and signed before we found the real problem: no iSCSI HBAs are supported on XenServer 6.0. Since virtual ports don’t work with XenServer’s software iSCSI, we are kind of screwed.

Since this problem is with XenServer and not Compellent, we have made some decisions:
  1. We are no longer trying to get multi-pathing to work in XenServer
  2. We have decided to flatten our iSCSI network to one subnet on two switches (a sketch of the simplified single-path setup follows this list)
  3. We have decided the long-term solution is to replace XenServer with more VMware licenses
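For what it’s worth, the simplified single-path setup is not much work. Here is a rough sketch of attaching a shared software-iSCSI SR from the pool master (a minimal sketch, assuming the stock lvmoiscsi SR type in XenServer 6; the target IP, IQN and SCSI ID are placeholders, not our real values):

```python
#!/usr/bin/env python
# Attach a shared software-iSCSI SR to a XenServer 6 pool: single path, one subnet.
# Run from dom0 on the pool master. Discover the real SCSIid first with:
#   xe sr-probe type=lvmoiscsi device-config:target=<ip> device-config:targetIQN=<iqn>
import subprocess
import sys

TARGET_IP  = "10.0.0.10"                                   # placeholder iSCSI portal
TARGET_IQN = "iqn.2007-11.com.nimblestorage:example-vol"   # placeholder target IQN
SCSI_ID    = "PLACEHOLDER-SCSIID-FROM-SR-PROBE"            # placeholder SCSIid

cmd = [
    "xe", "sr-create",
    "name-label=Nimble-iSCSI-01",
    "shared=true",
    "content-type=user",
    "type=lvmoiscsi",
    "device-config:target=" + TARGET_IP,
    "device-config:targetIQN=" + TARGET_IQN,
    "device-config:SCSIid=" + SCSI_ID,
]
if subprocess.call(cmd) != 0:        # sr-create prints the new SR uuid on success
    sys.exit("sr-create failed; check the target, IQN and SCSIid")
```

One subnet, one path, nothing exotic, which is exactly the point of decision #2 above.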
I plan on blogging about the Nimble install and data conversion as well.  So stay tuned for more posts on that.

Saturday, September 1, 2012

Dell Storage Forum 2012 - My long delayed recap

I had the pleasure of attending the Dell Storage Forum in Boston this year as well as participating in the customer advisory panel.

We were a happy Compellent customer and have two of their arrays, one at our HQ and one at our DR site. We had been having problems with them for months and I was pretty vocal about it on Twitter and LinkedIn. One day I got a DM on Twitter from a Dell representative asking for my email address so they could send me info about an upcoming event. I thought nothing of it and figured it was some local sales thing. A couple of days later I got an email inviting me to be part of the customer advisory panel held after DSF. I gladly accepted since it also included admission to DSF, something I was interested in attending anyways. I did attend some NDA sessions, so I hope I don't talk about anything I'm not supposed to.
 
Overall I thought the conference was great and the CAP was exceptional. I made a lot of new friends in the process and learned so many things. Here are some big takeaways.

1. Don't go to Boston for 2 nights for a work conference and expect to see anything touristy. I would have loved to have seen more of the city. Oh well, this just means I need to go back some day.

2. Dell did an awesome job at this event, the branding was great, the people were great, the sessions were amazing. My guess is attendance was around 800-1000, and it was a great size.

3. The future-of-Compellent sessions were great; they gave me a lot of insight into where Dell was taking them. Unfortunately, they gave me too much information, which ended up biting them in the ass. More on that to follow.

4. The Compellent engineers were awesome to talk to. Ask them anything and they knew the answer and were more than willing to tell you why. Kind of like going into the labs or going to "Ask the Developers" at Lotusphere.

5. I saw the most amazing keynote session on Wednesday morning. It was an interactive whiteboarding session that brought up the best and brightest people from each company and showed how they are all going to talk to each other and integrate in the future.

6. The customer advisory panel was excellent, and kudos to the entire Dell social media team that put it on. I wonder where Cappy is now?

Compellent has some awesome things coming in the next 2 years, like server-based cache, AppAssure integration, and many other things I know I am not supposed to say.

However, I learned many things that didn't make me very happy. Here are some of them.

1. Series 30 controllers will never get the 6.0 code
2. The last release of the 5.x code, 5.6, is coming later this year and will be the last major release. So no new features after this year, which means budgeting for controller upgrades next year.
3. My drive shelves will no longer be sold after the middle of this year.
4. The drives that go in those shelves will no longer be available to buy at the end of this year
5. Dell changed the SAS connector on the SC8000 controllers, so I can't use the SAS shelves I own with the new controllers.
6. AppAssure will be integrated into Compellent at some point and is application aware. Only if that app is made by Microsoft.
7. The server-based cache looks amazing, IF you have R810 servers and IF you have a 10gb network on the backend to support it. I have neither.

In general though I left the conference pretty happy and excited about the future of Compellent.  Yes, I have had my issues with them, and they were working on it, but in general I was confident in my decision to use them as our primary storage and to deploy a DR site with them.



The customer advisory panel was great, we all had a chance to voice our opinion on every question.  Greg Schulz @storageio did an awesome job driving the interaction between him and the other attendees.

Here is a list of some of the other attendees at the CAP.
@JeffHengesbach
@NerdBlurt
@shmick
@rogerlund
@petergavink

You can follow me on twitter at @dheinle

I did get to say my piece about Co-Pilot before and after the Dell acquisition. It seemed like the other attendees did not have the same issues with them that I have been having. After that I expected a call from Co-Pilot to resolve our last issue. Nope, no call, no e-mail, no nothing.

I do have Co-Pilot bending over backwards now to try and resolve a couple more problems. But that came from another post on LinkedIn. I'll leave it for another blog post.


Full disclosure: Dell paid for my hotel, airfare and conference fee.

Citrix XenServer 6, iSCSI and Compellent – Part 1

We have been using Citrix XenServer for the last 2 and a half years to run our Citrix infrastructure on. For the most part I really like it; it works pretty well and it is free to run since we have XenDesktop licenses. We upgraded to version 6 a while ago without ever thinking about it. Everything seemed to work without a problem with our Compellent and the software iSCSI initiator.

We were having a lot of problems with our Compellent storage causing VMware lockups, XenServer lockups and replication issues. One of the things that Compellent Co-Pilot recommended was to convert from legacy ports to virtual ports. This would mark the beginning of the end of our relationship with Compellent.

You see, Compellent doesn’t take this conversion lightly; converting to virtual ports requires a license. The license is free, but they put you through a series of questions and evaluations before they will give it to you. In our case it took several weeks to get this approved. It started with Co-Pilot recommending it, then my business partner had to put in the order for the license, then the local SE had to fill out some paperwork and approve the configuration. Even after that I had to send them pages of documentation on our system setup and configuration: things like what kind of servers we were running, firmware and BIOS versions of cards, etc. I even tried to rush the order since this conversion on iSCSI requires downtime and I already had downtime scheduled for the upcoming weekend. I was told this would not be a problem and I would have it shortly. Well, that didn’t happen.

At this point we have had a ticket open with Co-Pilot for 5 weeks. They knew our systems, they knew the configuration and they knew we were having problems with XenServer.

Fast forward to the next weekend when we had more downtime scheduled to convert to virtual ports and enable multi-pathing with dual subnets and dual switches. We started this process at 8am. We shut all the systems down, configured 2 new switches, and installed and cabled them as well. We turned on the VMware servers and made the configuration changes for virtual ports and multi-pathing. We were done with this in about 3 hours. Then we started working on XenServer.
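For reference, the VMware side of that change is mostly software iSCSI port binding: one vmkernel port per iSCSI subnet bound to the software adapter, plus the array's discovery address. Here is a rough sketch of those steps (assuming vSphere 5.x esxcli syntax; the adapter name, vmk interfaces and discovery address are placeholders, not our actual values):

```python
#!/usr/bin/env python
# Software iSCSI port binding on an ESXi 5.x host, run from the ESXi shell.
# A minimal sketch of the multi-pathing change described above; the adapter
# name, vmkernel ports and discovery address below are placeholders.
import subprocess

ADAPTER = "vmhba37"              # software iSCSI adapter (placeholder)
VMK_PORTS = ["vmk1", "vmk2"]     # one vmkernel port per iSCSI subnet (placeholders)
DISCOVERY = "10.10.10.10:3260"   # array discovery / virtual port address (placeholder)

def esxcli(*args):
    """Run an esxcli command and fail loudly if it errors."""
    subprocess.check_call(["esxcli"] + list(args))

esxcli("iscsi", "software", "set", "--enabled=true")          # make sure SW iSCSI is on
for vmk in VMK_PORTS:                                         # bind each vmk = one path
    esxcli("iscsi", "networkportal", "add",
           "--adapter", ADAPTER, "--nic", vmk)
esxcli("iscsi", "adapter", "discovery", "sendtarget", "add",
       "--adapter", ADAPTER, "--address", DISCOVERY)
esxcli("storage", "core", "adapter", "rescan", "--adapter", ADAPTER)
```

With a vmkernel port bound per subnet, each volume shows up with multiple paths and round-robin can spread the I/O across them, which is why this part of the weekend "just worked."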

We also did not take this conversion lightly, I had called Co-Pilot multiple times to hammer out the details, ask for documentation, etc..  I thought we had all our bases covered.  They even sent me a XenServer 6 best practices guide which showed us how to configure everything for software or hardware iSCSI.

To make a long, long story a little shorter, it didn’t work. We called Co-Pilot 3 times that day asking for help and clarification. We thought we were doing everything right; it just wasn’t working. Finally, around 9:30pm, we called Co-Pilot again. The Co-Pilot tech, named Grant, was not very helpful. He told me over and over again that we had a networking problem. I asked how that was possible if VMware worked fine. He told me again that it was a networking problem. I asked three times for him to ask someone else if they had any ideas. He refused. I asked for the ticket to be escalated and he finally did that. After being on hold for some time he came back and said multi-pathing wasn’t supported on XenServer with the software iSCSI initiator. They even sent me a CSTA describing the problem.

Ok, Fine. At this point I didn’t care. All we had to do was change it back to a single path, get the stuff up and go home.  We had already been working on this for over 12 hours.

Fast forward another hour: we made all the changes and it still doesn’t work. I looked at the CSTA they sent me and actually read it. XenServer software iSCSI doesn’t work with multi-pathing OR virtual ports!

I called Co-Pilot again asking about the CSTA and virtual ports. I was put on hold for a while; they came back and said, yes, "It does not work with multipathing OR virtual ports." WTF, how is this possible? They knew we had XenServer, they knew we had software iSCSI, and THEY are the ones who recommended this! They even sent me a 60-page guide on how to configure this; how does it magically not work now? I was told that it was a problem in XenServer 5.6 and that Citrix told them it was resolved in 6.0. But it wasn’t. The problem is still there. So the next question I asked was "How do I go back to legacy ports?" The answer: "You can’t." WTF again. Then they were asking me where I got the XenServer 6 best practices guide and who sent it to me. HELLO McFly, my shit still doesn’t work, how about getting it back online instead of trying to cover your asses? I was told to expect a call back in 20 minutes. I hung up, went outside and cooled down a little. I have this phone call recorded if anyone wants to hear it.

Well, 20 minutes later I got a call back from a Lead Enterprise Engineer, Chris. He was great and we started working on the problem immediately. He told me we could in fact revert to legacy ports, but it would require a lot of work. I really didn’t want to go that route since Co-Pilot was the one that recommended converting to virtual ports to fix our issues. The only other option was to use hardware iSCSI HBAs. At this point it was after midnight and I don’t think any place in St. Louis sells $1,200 iSCSI HBAs anyways.

As part of trying to isolate the replication issues we had also installed 2 new iSCSI HBAs in the dual controller system. It was my idea to take a controller down, take an HBA out and put it in the XenServer host. Chris agreed that was our best shot and he worked with us for over 2 hours to get it to work. I am sure it would have gone faster, but we had never used iSCSI HBAs before and were not sure what we were doing.

Remember, we started this at 8am. We were done with the switches and VMware by 11am. It was now 3am and we had 1 of our XenServers up running all the load. Time to go home.

Chris also dispatched another card to us from the parts depot so we could get both controllers up and running again. On Sunday afternoon one of our guys met the courier at the office and put that card in the second controller. He called Co-Pilot and they ran a post health check and it all looked OK. But it wasn’t: the controllers were out of sync and the ports would not re-balance. I had to call Co-Pilot AGAIN to figure out this problem. It looked like some of the replications were causing a problem. We deleted the replications and the ports re-balanced just fine. I then re-created the replications and all was good.

After we got the controllers back up and running I never really heard back from Compellent regarding our XenServer issues. I even sent a detailed email to my business partner and the local Compellent team. I got a call from my business partner almost immediately apologizing for all the problems. I never, even to this day, got a call from my local Compellent team.

Wow, I never intended to blog this much about our issues.  But now you have the background for part 2 of the story :)
