Vpshive / Gigenet Cloud VPS Down, anyone else or just me?

Posted by Krazy, 06-17-2010, 03:55 AM
Our machine at Gigenet cloud has been down for the last hour or more. Is anyone else experiencing issues with them? A week or so ago there was a maintenance window that took the machine down for 4 hours, and tonight it is down again without any notification. A support ticket has been opened; it shows they are looking at the issue.

Cloud is supposed to have higher uptime than a dedi or VPS, with its hardware failover capability. Not sure what is happening.
Posted by SaaSMX, 06-17-2010, 03:59 AM
Gigenet Cloud is down.

There is no info on their blog or from support, and their forums and client portal are not loading either.

I wonder how long they will keep experimenting with their "cloud" before it is stable and reliable.

Getting tired of it.
Posted by WireNine, 06-17-2010, 04:01 AM
Were you notified of the maintenance window last time or did it suddenly occur without any notification both times?

Tried loading their forums, but they don't load: forums.thegcloud.com
Posted by Indichosts.net, 06-17-2010, 04:02 AM
I thought they had it fixed in the last maintenance.
Posted by Krazy, 06-17-2010, 04:03 AM
Last time there was a notification; today there was none.
Posted by SaaSMX, 06-17-2010, 04:04 AM
Apparently not.

Anyway, it seems to be coming up now.
Posted by Krazy, 06-17-2010, 04:05 AM
Online now.
Posted by Winstyn, 06-17-2010, 04:15 AM
Hello,

The chassis on one of our distribution routers had a fabric failure. This failure caused certain modules to show a variety of errors. After debugging the issue and finding the fabric malfunction we quickly replaced the chassis itself with a spare. The network is now back fully on-line.

No machines were rebooted; this was purely network related. Only a certain portion of the cloud was affected by this. We have ordered a replacement chassis; this will give us another spare until we migrate to the chassis that has been ordered.

GigeNET has not had a network outage in over 2 years. It is unfortunate that our new cloud product suffered isolated network loss from this event. Network interruption is not a common event here and we plan to take every precaution to prevent this kind of event in the future.

Our emergency network response team was able to replace the chassis in record time and we thank them for resolving this outage in a very timely manner.

We thank you for your continued patronage.

Best Regards
Posted by layer0, 06-17-2010, 04:53 AM
I can attest that Gigenet has been very solid over the past few years. An outage like this is a rare occurrence, and it appears that it was handled quite well.
Posted by RSkeens, 06-17-2010, 08:37 AM
Merged the threads to keep things in one place.
Posted by nibb, 09-07-2010, 07:22 PM
That's funny. Cloud is supposed to have better uptime? Guess again.

I signed up for a test account with Gigenet about 7 days ago, and today it has been down for the last 5 hours. That's like 90% or less uptime so far. Glad it's only a test server. They said there were problems with the SAN. I would think 10 times before putting something on a cloud. It seems the story is the same with almost all providers that have some type of cloud service: people just publish virtual servers and call them cloud, which they are not. Amazon is true cloud; even when they did have downtime, it's not like a SAN failure takes you down. That's what happens with virtual servers: it's really not distributed among devices, it's just elastic.

I guess cloud is really not ready yet. People with single servers or VPSes experience better uptimes, and I can now speak from personal experience as well. But then again, some providers think SANs are miracle devices where you can put 100 servers on one. The truth is far from that: real benchmarks show that after putting 15-20 servers on a SAN you experience slower access than a single server using just RAID. Most providers that have downtime with their clouds always seem to rely on the storage back end. Even if it's NetApp or any other fancy storage, it's just a big RAID with lots of fast disks, and eventually they are oversold very fast as well. I'm not sure what kind of problem can put their SAN down for hours, and they said they would just disconnect it to improve performance, which is bizarre: putting customers down for hours to get a little more speed. OK, they are cheap, but I suppose you get what you pay for.
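
As a rough check on the uptime figure, here is a minimal Python sketch of the arithmetic. It assumes the account had existed for exactly 7 days and had been down for exactly 5 hours (both numbers are taken loosely from the post), and the function name is made up for illustration:

```python
def uptime_percent(total_hours: float, down_hours: float) -> float:
    """Return uptime as a percentage of the observed window."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= down_hours <= total_hours:
        raise ValueError("down_hours must be between 0 and total_hours")
    return 100.0 * (total_hours - down_hours) / total_hours


week_hours = 7 * 24  # 168 hours in the assumed observation window
print(f"{uptime_percent(week_hours, 5):.2f}%")  # prints 97.02%
```

Under those assumptions the figure works out to roughly 97% rather than 90%. For comparison, a 99.9% target over the same week would allow only about 10 minutes of downtime.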
Posted by bsh, 09-07-2010, 07:28 PM

Quote:

Originally Posted by nibb

That's funny. Cloud is supposed to have better uptime? Guess again.

I signed up for a test account with Gigenet about 7 days ago, and today it has been down for the last 5 hours. That's like 90% or less uptime so far. Glad it's only a test server. They said there were problems with the SAN. I would think 10 times before putting something on a cloud. It seems the story is the same with almost all providers that have some type of cloud service: people just publish virtual servers and call them cloud, which they are not. Amazon is true cloud; even when they did have downtime, it's not like a SAN failure takes you down. That's what happens with virtual servers: it's really not distributed among devices, it's just elastic.

I guess cloud is really not ready yet. People with single servers or VPSes experience better uptimes, and I can now speak from personal experience as well. But then again, some providers think SANs are miracle devices where you can put 100 servers on one. The truth is far from that: real benchmarks show that after putting 15-20 servers on a SAN you experience slower access than a single server using just RAID. Most providers that have downtime with their clouds always seem to rely on the storage back end. Even if it's NetApp or any other fancy storage, it's just a big RAID with lots of fast disks, and eventually they are oversold very fast as well. I'm not sure what kind of problem can put their SAN down for hours, and they said they would just disconnect it to improve performance, which is bizarre: putting customers down for hours to get a little more speed. OK, they are cheap, but I suppose you get what you pay for.

The cloud is more than ready. Issues, regardless of the hosting platform, are bound to happen. Because the term and the underlying technology are fairly new, depending on what's being used and/or adopted, results are bound to vary by provider and by the technology in question.
Posted by nibb, 09-07-2010, 07:33 PM

Quote:

Originally Posted by bsh

The cloud is more than ready. Issues, regardless of the hosting platform, are bound to happen. Because the term and the underlying technology are fairly new, depending on what's being used and/or adopted, results are bound to vary by provider and by the technology in question.

You mean the real cloud, or just virtual servers with metered pricing on SAN storage that can scale only to whatever the node supports? Whatever cloud means to everyone, it's as old as Xen and virtualization itself. It just seems new because Amazon and Google took Xen and deployed it on thousands of servers to be scaled.
Posted by Winstyn, 09-07-2010, 07:35 PM

Quote:

Originally Posted by bsh

The cloud is more than ready. Issues, regardless of the hosting platform, are bound to happen. Because the term and the underlying technology are fairly new, depending on what's being used and/or adopted, results are bound to vary by provider and by the technology in question.

What happened today was the result of a hardware failure, and the redundancy in place has recovered from what in a normal situation would have been a complete loss of data.

Furthermore, without redundant SANs there would have been grave data loss in a situation like this, and VMs would have needed to be restored from customer backups.

Keep in mind that while clouds offer better redundancy in several areas, they are also used more efficiently, which causes a higher hardware turnover rate, specifically with regard to hard disks. Thus, having to recover from a complete hardware failure is much more likely.

In our case today only a small portion of our cloud experienced an outage, while a good majority of our customers experienced little or no problems.

Thanks
Posted by ZKuJoe, 09-07-2010, 09:29 PM
My VM wasn't down that long. As for the redundancy in question, had the redundancy not been in place the SAN would have gone offline without warning. At least with the redundancy in place they were able to schedule the downtime for when it was convenient for them to fix it, instead of scrambling in the middle of the night to get things working.

Trust me, GigeNET is continuously improving their cloud, I have been with them for almost 4 months and the longer I'm with them the better they get.
Posted by Winstyn, 09-07-2010, 11:10 PM

Quote:

Originally Posted by ZKuJoe

My VM wasn't down that long. As for the redundancy in question, had the redundancy not been in place the SAN would have gone offline without warning. At least with the redundancy in place they were able to schedule the downtime for when it was convenient for them to fix it, instead of scrambling in the middle of the night to get things working.

Trust me, GigeNET is continuously improving their cloud, I have been with them for almost 4 months and the longer I'm with them the better they get.

At this point all but a very small group have been recovered. Without our SAN redundancy there would definitely have been a lot more extended downtime today.

We are also starting to look into SAN technology that can offer us better redundancy and fault tolerance to avoid events like this in the future. The goal is to withstand complete hardware failure with no interruptions.

Thanks
Posted by Krazy, 09-08-2010, 12:39 AM
Hi, what would be the ETA for restoration? Our system is down too.
Posted by SaaSMX, 09-08-2010, 02:02 AM
Down for 10 hours and counting.

You can talk wonders about your service/platform, but they mean nothing if reality shows otherwise.

Don't tell us, show us.

I have experience with 6 VPS providers (5 top and 1 budget) and your cloud, and you have the worst uptime so far, getting tired already. I really think your platform is still in BETA.

I'm looking elsewhere already.
Posted by ZKuJoe, 09-08-2010, 02:10 AM
Have you tried opening a ticket or calling them by chance? They may be able to provide you an update specifically relating to your VM. From the looks of it there are only a few remaining VMs affected by this so you can try contact them for more information. Not to mention Chris (VP of Operations) has already sent out an e-mail stating that all affected users are able to get a credit for the downtime they've experienced.
Posted by SaaSMX, 09-08-2010, 02:26 AM
I opened a ticket the minute it went down and they have no info; they tell me to look at their forums for updates, and there is not much there either. The last update says that they are at 65% of the process. 65% at the tenth hour!
Posted by ZKuJoe, 09-08-2010, 02:30 AM
I would definitely call them if I were you. At the very least it might bump your VM to the top of their list.
Posted by SaaSMX, 09-08-2010, 02:37 AM
This is the last update I got on my ticket:

Quote:

Hello,

Our developers are still working on this at this time, as soon as we have more information we will update you, We do apologize for any inconvenience this may cause.

They have nothing after 10 hours of downtime. NICE.
Posted by ZKuJoe, 09-08-2010, 02:41 AM
I wouldn't call bringing VMs back online 10-20 minutes after a complete SAN failure "nothing".
Posted by SaaSMX, 09-08-2010, 02:58 AM

Quote:

Originally Posted by ZKuJoe

I wouldn't call bringing VMs back online 10-20 minutes after a complete SAN failure "nothing".

What is this ? Do you work for them ?

My VM has been down almost 11 hours now. What are you talking about ?

All I care about is the facts; I don't care about them doing the Tlaloc Dance with their feet tied.
Posted by ZKuJoe, 09-08-2010, 03:03 AM
My VM was down for about 10-20 minutes before it came back online, which I find impressive considering the problem was SAN related which never usually ends well.

It seems that they are still working on a few VMs but it sounds like the majority of them never even went down and of those that did only a few remain offline.
Posted by SaaSMX, 09-08-2010, 03:08 AM

Quote:

Originally Posted by ZKuJoe

My VM was down for about 10-20 minutes before it came back online, which I find impressive considering the problem was SAN related which never usually ends well.

It seems that they are still working on a few VMs but it sounds like the majority of them never even went down and of those that did only a few remain offline.

You are just doing PR here without any facts.

They haven't updated the thread in the forum, their last update states they are at 65%, and you have read my updated ticket as well. Please show proof of your statements, because to me you have no clue what is going on.
Posted by realvaluehosting, 09-08-2010, 03:52 AM

Quote:

Originally Posted by VS-fam

What is this ? Do you work for them ?

My VM has been down almost 11 hours now. What are you talking about ?

All I care about is the facts; I don't care about them doing the Tlaloc Dance with their feet tied.

Joe, could you please let us know if you work for them?
Posted by rezilient, 09-08-2010, 05:57 AM
My VM is still down. Going on 14 hours. I woke up this morning and was upset to see no updates on the GigeNet forum or my ticket.
Posted by rezilient, 09-08-2010, 05:59 AM

Quote:

Originally Posted by ZKuJoe

I wouldn't call bringing VMs back online 10-20 minutes after a complete SAN failure "nothing".

10-20 minutes? You must be the only one bud.

This guy is either a GigeNet fan boy, or works for them.
Posted by rezilient, 09-08-2010, 06:16 AM

Quote:

Originally Posted by rezilient

10-20 minutes? You must be the only one bud.

This guy is either a GigeNet fan boy, or works for them.

I just confirmed, ZKuJoe works for GigeNet. On his site he says he is part of their forum admin staff.

http://www.jmd.cc/Thread-Woot-I-m-no...ET-forum-staff

Quote:

Woot! I'm now a part of the GigeNET forum staff!
I just wanted to add that I was promoted to forum admin a while back and was put in charge of changing the forums around which I did.

So you can obviously strike his comments from the record as being biased.
Posted by rezilient, 09-08-2010, 06:19 AM
Originally this is the notification I got after my server was already down:

"We are performing maintenance on one of our SANs to improve speed, and will be pausing affected VMs briefly while we switch back to primary storage."

Coming from an ITIL background, there is a big difference between "maintenance.. to improve speed" and an "outage". If the former, it should have been done during a scheduled change window.

At this point it's blatantly obvious the GigeNet team was trying to pass this off as maintenance, while in reality it is a serious outage, and it should have been communicated as such.
Posted by SaaSMX, 09-08-2010, 11:05 AM
I never got informed about this as a maintenance.

The time I've been with them they've had more maintenances than any other provider.

My last payment was on the 6th. I'm moving away as soon as I get access to my info, so I have already asked for a refund of that last payment. There is no credit big enough to make me (or anybody else, I think) stay with such an unreliable service.

I don't have words to describe this situation, 19 hours of no service ! And what is worse, they clearly don't have a clue of what is going on still.

Like I said before, their cloud is still in BETA. Their platform is a joke with a nice panel, that's all.
Posted by Krazy, 09-08-2010, 11:34 AM
Time to pack up.
Posted by SaaSMX, 09-08-2010, 01:20 PM
Does anyone have anything on this?

They have no information whatsoever here, their forums or tickets.

Awful.
Posted by rezilient, 09-08-2010, 01:45 PM
My machine is up and running right now, but preparing for the worst. I am currently planning to purchase another VPS with another provider as a failover. Something I was planning, but never got around to. Somehow these situations always give ya the kick in the pants to do smart things like that.
Posted by SaaSMX, 09-08-2010, 01:56 PM
I'm still down. I won't set up a failover; I will move to a provider with a solid and well-tested platform. I'm done with Gigenet's experiments on cloud.
Posted by rezilient, 09-08-2010, 02:00 PM
I don't mean to hijack the thread, but can someone PM me their VPS provider suggestions? I am looking for 512MB dedicated, preferably running Xen so I can set up my iptables the way I want.
Posted by Winstyn, 09-08-2010, 02:03 PM

Quote:

Originally Posted by VS-fam

I'm still down. I won't set up a failover; I will move to a provider with a solid and well-tested platform. I'm done with Gigenet's experiments on cloud.

Sorry to hear your experience hasn't been in line with that of many others. I would like to note that there was no experiment here. A failure caused a certain portion of the system to need manual recovery.

As stated in several places, this affected a very small portion of the cloud and there has been zero data loss. Anyone who has experienced extended downtime due to the nature of the issue, please contact us and we will do what we can for you.
Posted by ZKuJoe, 09-08-2010, 02:05 PM
For the record I do not work for GigeNET. I volunteer on their forum but I can assure you that I wouldn't stick around if I didn't think they were the best host for me and my clients. I get no special perks other than the ability to remove spam from their forum.

My VM was affected by this SAN issue and was up within 10-20 minutes. Those are the facts.
Posted by Winstyn, 09-08-2010, 02:06 PM

Quote:

Originally Posted by VS-fam

I never got informed about this as a maintenance.

The time I've been with them they've had more maintenances than any other provider.

My last payment was on the 6th. I'm moving away as soon as I get access to my info, so I have already asked for a refund of that last payment. There is no credit big enough to make me (or anybody else, I think) stay with such an unreliable service.

I don't have words to describe this situation, 19 hours of no service ! And what is worse, they clearly don't have a clue of what is going on still.

Like I said before, their cloud is still in BETA. Their platform is a joke with a nice panel, that's all.

This was not maintenance. What at first was thought to be a slowdown in the system, which we were going to attempt to fix, turned out to be a complete hardware failure on a portion of one of the SANs. We have recovered everyone's machines at this time.

Of course we are not happy about the amount of time this has taken. Any platform can experience this kind of problem, and many providers would have been restoring from backups in a case like this.

We have been working around the clock to restore what was down with this problem and will continue to do so until all remaining problems are rectified.
Posted by Winstyn, 09-08-2010, 02:09 PM

Quote:

Originally Posted by ZKuJoe

For the record I do not work for GigeNET. I volunteer on their forum but I can assure you that I wouldn't stick around if I didn't think they were the best host for me and my clients. I get no special perks other than the ability to remove spam from their forum.

My VM was affected by this SAN issue and was up within 10-20 minutes. Those are the facts.

I can confirm this for Joe, he just helps on our boards and acts as a liaison when we need.

As a testament to his facts, the downtime yesterday varied for many people and was isolated to certain parts of the cloud. Many of our customers experienced no downtime. Also, we are doing what we can to take care of anyone who ended up experiencing longer downtime than others.

We apologize for what has happened and are looking for new horizons to prevent this in the future.

Thanks
Posted by Winstyn, 09-08-2010, 02:12 PM

Quote:

Originally Posted by rezilient

Originally this is the notification I got after my server was already down:

"We are performing maintenance on one of our SANs to improve speed, and will be pausing affected VMs briefly while we switch back to primary storage."

Coming from an ITIL background, there is a big difference between "maintenance.. to improve speed" and an "outage". If the former, it should have been done during a scheduled change window.

At this point it's blatantly obvious the GigeNet team was trying to pass this off as maintenance, while in reality it is a serious outage, and it should have been communicated as such.

At first we were only trying to correct a slowdown of the system, so it was considered maintenance. During the attempt to rectify this, what had really happened became apparent. The redundancy of the SANs had allowed the system to stay up even after the failure. When we attempted to fail completely over to the backup SAN we lost the map between VMs and drives. The extra downtime was spent recreating this map.

Going forward, backups of this map will be taken, and a disaster recovery plan for this sort of event has been created.
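
A minimal sketch of what backing up such a map could look like, in Python. Everything here is hypothetical: the map format (VM name to volume ID), the JSON file layout, and the function names `backup_mapping` and `restore_mapping` are assumptions for illustration and say nothing about how GigeNET's system actually stores its mappings.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def backup_mapping(mapping: dict[str, str], path: Path) -> None:
    """Write the VM-to-volume map to `path` atomically.

    The data goes to a temporary file in the same directory first and is
    then renamed over the target, so a crash mid-write cannot leave a
    half-written backup in place of the previous good one.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(mapping, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def restore_mapping(path: Path) -> dict[str, str]:
    """Load a previously saved VM-to-volume map."""
    with path.open() as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical VM names and volume IDs, for demonstration only.
    demo = {"vm-101": "vol-a1", "vm-102": "vol-b7"}
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "vm_map.json"
        backup_mapping(demo, target)
        print(restore_mapping(target) == demo)
```

The point of the write-then-rename step is that the backup itself should never be the thing that gets corrupted during an incident; in practice a copy would also need to live somewhere other than the SAN it describes.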
Posted by SaaSMX, 09-08-2010, 02:23 PM

Quote:

Originally Posted by Winstyn

Of course we are not happy about the amount of time this has taken. Any platform can experience this kind of problem, and many providers would have been restoring from backups in a case like this.

Without having any technical details on the situation and the damages, I have never experienced something like that (so far) with any provider (restoring from user backups). And the longest outage I have experienced has been for 8 hours. Call me lucky if you want.

The only one I can think of (restoring from user backups) was a low budget shared hosting company some years ago.
Posted by ZKuJoe, 09-08-2010, 02:31 PM
VS-fam, have you been receiving the e-mail updates? The one from their VP had very detailed information about the situation. If you didn't receive the e-mails I'd be happy to post them for you.

As for the restoring from backups, there was just a high-end provider last week that lost a lot of their client data AND backups even with having an expansive backup system in place with redundancy.
Posted by SaaSMX, 09-08-2010, 02:36 PM

Quote:

Originally Posted by ZKuJoe

VS-fam, have you been receiving the e-mail updates? The one from their VP had very detailed information about the situation. If you didn't receive the e-mails I'd be happy to post them for you.

As for the restoring from backups, there was just a high-end provider last week that lost a lot of their client data AND backups even with having an expansive backup system in place with redundancy.

No, I haven't. I have two emails registered at their portal, one on the dead VM and a GMail one. I have nothing on GMail.

I think it will help everybody if you post them.

Thanks
Posted by chrisarmer, 09-08-2010, 03:33 PM
The email we had sent out was as follows:

Today we experienced a problem affecting part of our Cloud services. The problem was due to a hardware failure on one of the SANs. The failover to the secondary SAN did not function as expected and we are currently working with the vendor to ensure that this does not happen again. Luckily we are redundant on all levels so restoration is a relatively easy task. However, because the drive mappings were lost, we have to go through and manually fix the machines one by one from the pool of servers affected. There is no way to determine the order in which the VMs will be restored because of the drive mappings. All of our support staff and programming department are working to get all the machines properly restored in a timely fashion. Our estimate for recovery is anywhere between now and 1 hour from now.

In the immediate future (now) we will have the mappings backed up so if this event happens again the failover should only take a few minutes. We are also working with the vendor to ensure that the failover functions correctly with no downtime at all for all future failovers.

We apologize for the unfortunate turn of events that has caused this downtime and would gladly offer compensation to any and all customers affected by it. Please email us back with any questions and/or concerns you might have about these events.
Posted by Winstyn, 09-08-2010, 06:25 PM

Quote:

Originally Posted by chrisarmer

The email we had sent out was as follows:

Today we experienced a problem affecting part of our Cloud services. The problem was due to a hardware failure on one of the SANs. The failover to the secondary SAN did not function as expected and we are currently working with the vendor to ensure that this does not happen again. Luckily we are redundant on all levels so restoration is a relatively easy task. However, because the drive mappings were lost, we have to go through and manually fix the machines one by one from the pool of servers affected. There is no way to determine the order in which the VMs will be restored because of the drive mappings. All of our support staff and programming department are working to get all the machines properly restored in a timely fashion. Our estimate for recovery is anywhere between now and 1 hour from now.

In the immediate future (now) we will have the mappings backed up so if this event happens again the failover should only take a few minutes. We are also working with the vendor to ensure that the failover functions correctly with no downtime at all for all future failovers.

We apologize for the unfortunate turn of events that has caused this downtime and would gladly offer compensation to any and all customers affected by it. Please email us back with any questions and/or concerns you might have about these events.

I would like to point out that at the time of writing we had predicted the remap process to go much quicker than it did. Towards the end of the process the SAN became heavily loaded from startups and day-to-day operations which lengthened the time it took our staff to recover from the situation.

I am glad to say we are finally out of the rabbit hole.