JaguarPC VPS now down over 3 days.. Seriously?




Posted by milkmycow, 01-09-2013, 10:48 PM
My entire VPS has been down for DAYS.

http://forums.jaguarpc.com/server-ne...ect-com-2.html

They first posted at 8:25 central on the 6th of January. It's now 8:47pm on the 9th!!!!

Seriously?!

I was patient day one.. day two.. most of day three... but now.. this is just ridiculous

Is this normal or acceptable to anyone?

Posted by Zachary McClung, 01-09-2013, 10:55 PM
Quote:
Originally Posted by milkmycow
My entire VPS has been down for DAYS.

http://forums.jaguarpc.com/server-ne...ect-com-2.html

They first posted at 8:25 central on the 6th of January. It's now 8:47pm on the 9th!!!!

Seriously?!

I was patient day one.. day two.. most of day three... but now.. this is just ridiculous

Is this normal or acceptable to anyone?
I'm very sorry for the current outage. We are restoring from backup as quickly as possible. Thank you for your patience.

Posted by milkmycow, 01-09-2013, 10:59 PM
Quote:
Originally Posted by Zachary McClung
Thank you for your patience.
My patience is gone. There shouldn't be so much data to restore that it takes OVER THREE DAYS.

Posted by Zachary McClung, 01-09-2013, 11:04 PM
Quote:
Originally Posted by milkmycow
My patience is gone. There shouldn't be so much data to restore that it takes OVER THREE DAYS.
You have every right to be upset. I completely understand that. Thank you for the patience so far. They are huge arrays and a lot of data, even with our GigE private network. We will get this resolved for you as soon as possible. If you PM your CID, I'll make sure you receive a more than fair SLA credit.

Posted by CD Burnt, 01-09-2013, 11:25 PM
{moved to outages}

reminder: this thread is now only for the customers and the provider.

Posted by milkmycow, 01-10-2013, 06:59 PM
STILL down. unbelievable.

Posted by ModelWebHost, 01-11-2013, 07:51 AM
Very frustrated situation. Server is down for more than 6 days. Sometimes, they say we are restoring from backups and sometimes say they are rebuilding the RAID. Moreover, they said 2 days ago that they restored the 17 gb data and now after 2 days, nothing has been restored.
I have been hosting with other server providers but I have not seen a situation like JPC have ever in my 7 years webhosting business.
Moreover, I can't understand: if they keep backups, why don't they restore them? Are backups taking 6 days to restore? Last time they took more than 19 days to restore the friday node.
Zach is also not doing anything for us, as we are in dark and don't know when the server will be online?

Posted by Suldominios, 01-11-2013, 09:05 AM
19 Days?
I'll lose my job Monday!
I'm the employee responsible for my workplace's site, and the owner is inviting me to leave on Monday if the site and email system don't come up. Nice, isn't it?

Posted by ModelWebHost, 01-11-2013, 09:30 AM
Quote:
Originally Posted by Suldominios
19 Days?
I'll lose my job Monday!
I'm the employee responsible for my workplace's site, and the owner is inviting me to leave on Monday if the site and email system don't come up. Nice, isn't it?
Yes. I am serious. Check this thread and you will find the truth.
http://forums.jaguarpc.com/server-ne....nocdirect.com

I was expecting a big change and that's why I continued the services but I will not continue this time. Zach, do only one thing, just restore the VPS so that I can take my data to elsewhere.

Posted by Zachary McClung, 01-11-2013, 10:36 AM
Quote:
Originally Posted by Hostinpk-CEO
Very frustrated situation. Server is down for more than 6 days. Sometimes, they say we are restoring from backups and sometimes say they are rebuilding the RAID. Moreover, they said 2 days ago that they restored the 17 gb data and now after 2 days, nothing has been restored.
I have been hosting with other server providers but I have not seen a situation like JPC have ever in my 7 years webhosting business.
Moreover, I can't understand: if they keep backups, why don't they restore them? Are backups taking 6 days to restore? Last time they took more than 19 days to restore the friday node.
Zach is also not doing anything for us, as we are in dark and don't know when the server will be online?
As we have attempted to explain multiple times in our 30+ e-mails and our forum threads, the array on the server is a huge array. We pulled as much of the recent data off the old RAID array as possible in order to provide you the freshest data possible. We then rebuilt the server and the RAID array because the controller went bad. We then restored the backed-up data from the drive, and we are now syncing the difference between the drive data and our backup data to make sure you have all your data in the latest form possible. Once those syncs are completed, which should be soon, we will start to bring the VEs online.

In regards to starting your restore, we did start the restore of your data and it did get to 17 GB complete; however, the server killed the process in the middle of the copy due to the fact the array was still not fixed. We are doing everything possible to bring this server back online as soon as possible.

Posted by CFoxHost, 01-11-2013, 11:03 AM
This makes me nervous, since I have a VPS with them :\

Posted by ckpeter, 01-11-2013, 11:13 AM
Quote:
Originally Posted by Zachary McClung
As we have attempted to explain multiple times in our 30+ e-mails and our forum threads, the array on the server is a huge array. We pulled as much of the recent data off the old RAID array as possible in order to provide you the freshest data possible. We then rebuilt the server and the RAID array because the controller went bad. We then restored the backed-up data from the drive, and we are now syncing the difference between the drive data and our backup data to make sure you have all your data in the latest form possible. Once those syncs are completed, which should be soon, we will start to bring the VEs online.

In regards to starting your restore, we did start the restore of your data and it did get to 17 GB complete; however, the server killed the process in the middle of the copy due to the fact the array was still not fixed. We are doing everything possible to bring this server back online as soon as possible.
Zachary, can you provide the details on exactly what went wrong?

I have been patient, and I continue to be patient, with the outage. However, it is getting difficult to understand what happened that caused the extended downtime.

From the initial updates, it seemed like only a disk went bad, which in a RAID 10 setup should be trivial to fix. For the last two days I have been told that service restoration is coming "very soon", but it hasn't happened for me. There is also talk of backup, which is unexpected, because in a RAID setup there shouldn't be any need to restore from backup, just swapping out the bad disk.

Also, can you comment on data loss or corruption potentials? What exactly is the nature of the backup that is being restored, and what kind of data loss can we expect between the latest backup and the time when the server went down?

Posted by Zachary McClung, 01-11-2013, 11:56 AM
Quote:
Originally Posted by ckpeter
Zachary, can you provide the details on exactly what went wrong?

I have been patient, and I continue to be patient, with the outage. However, it is getting difficult to understand what happened that caused the extended downtime.

From the initial updates, it seemed like only a disk went bad, which in a RAID 10 setup should be trivial to fix. For the last two days I have been told that service restoration is coming "very soon", but it hasn't happened for me. There is also talk of backup, which is unexpected, because in a RAID setup there shouldn't be any need to restore from backup, just swapping out the bad disk.

Also, can you comment on data loss or corruption potentials? What exactly is the nature of the backup that is being restored, and what kind of data loss can we expect between the latest backup and the time when the server went down?
Thank you for your patience. Initially it was thought to be simply a drive failure. We were going to replace the drive, do an FSCK and be back online shortly. The FSCK started running long and ended up failing. The server was rebooted and then wouldn't come back online at all. One of the data center techs noticed that the RAID controller had gone bad, which ended up being the entire cause of the issue.

Once the new controller was installed, we attempted to FSCK again, hoping the RAID wasn't lost and it would be a quick process. That FSCK started shooting out errors, and that is when we knew it was time to start restoring. In order to provide the latest data and the quickest resolution possible, we began backing up the savable information from the array, which allowed us to do the process much quicker than R1Soft allows. Once that backup was complete and restored, we then started syncing data with the backups we have of the node, which will drastically reduce the time of the overall restore (preventing 19 days down).

As far as data loss is concerned, there could be some, as is the case with any backup/restore. Like I said above, we tried to pull as much good data off the RAID array as possible to prevent as much data loss as possible between the last backup of Dec. 18th and when the server went down.

As far as VEs coming back online, several VEs are already back online. As the R1Soft sync completes, we are turning them on. I'd love to give an ETA on each one; however, each one takes a different amount of time. Whether they are big files or small files makes a difference too.

Posted by SoftWareRevue, 01-11-2013, 12:44 PM
Quote:
Originally Posted by Zachary McClung
... We are doing everything possible to bring this server back online as soon as possible.
Quote:
Originally Posted by ckpeter
Zachary, can you provide the details on exactly what went wrong?...
Quote:
Originally Posted by Zachary McClung
Thank you for your patience. Initially it was thought to be simply a drive failure...
Sorry for jumping in when I'm not involved with this service, but when I see providers that are helpful and detailed, I have to say something.

These are very good responses, Zach. Good communication goes a long ways. Keep it up!

Posted by ModelWebHost, 01-11-2013, 01:09 PM
Quote:
Originally Posted by Zachary McClung
As we have attempted to explain multiple times in our 30+ e-mails and our forum threads, the array on the server is a huge array. We pulled as much of the recent data off the old RAID array as possible in order to provide you the freshest data possible. We then rebuilt the server and the RAID array because the controller went bad. We then restored the backed-up data from the drive, and we are now syncing the difference between the drive data and our backup data to make sure you have all your data in the latest form possible. Once those syncs are completed, which should be soon, we will start to bring the VEs online.

In regards to starting your restore, we did start the restore of your data and it did get to 17 GB complete; however, the server killed the process in the middle of the copy due to the fact the array was still not fixed. We are doing everything possible to bring this server back online as soon as possible.
OK. I can understand that a drive went bad, but check your forum and you will see several threads about the same problem. Each time, it is your drives that fail.
So why don't you make the change? As far as I remember, last time you said you were going to change the whole R1Soft backup system, but it is still the same.

I suggest you change the whole structure of your backup system, and especially your DRIVES, which fail so often, while restores take weeks and weeks.

Posted by Suldominios, 01-11-2013, 01:21 PM
Quote:
Originally Posted by SoftWareRevue
Sorry for jumping in when I'm not involved with this service, but when I see providers that are helpful and detailed, I have to say something.

These are very good responses, Zach. Good communication goes a long ways. Keep it up!
I understand all those points of view. There are at least two kinds of "response" to an issue: one is the level of detail in what the support person says, which gives us (customers) a "panorama" of the situation, and the other is the technical team's real "response" to the problem itself (putting hands to work to solve it). I believe both are necessary for good quality of service.
I'm sure all providers face problems, and that's why the guarantee must be 99% and not a fake 100%. That's OK, but if a server goes down for days, something is wrong, and a good text in the forum is not a satisfactory RESPONSE to the problem.
If this is taken as a normal thing, I and all the other customers will naturally start to think that all nodes at JaguarPC are time bombs ready to leave us with no service for days at any minute.
P.S.: You don't give me an ETA, I can't give my customer an ETA, my customer doesn't give me more money, and this breaks the chain forever.

Posted by ckpeter, 01-11-2013, 01:31 PM
Quote:
Originally Posted by Zachary McClung
Thank you for your patience. Initially it was thought to be simply a drive failure. We were going to replace the drive, do an FSCK and be back online shortly. The FSCK started running long and ended up failing. The server was rebooted and then wouldn't come back online at all. One of the data center techs noticed that the RAID controller had gone bad, which ended up being the entire cause of the issue.

Once the new controller was installed, we attempted to FSCK again, hoping the RAID wasn't lost and it would be a quick process. That FSCK started shooting out errors, and that is when we knew it was time to start restoring. In order to provide the latest data and the quickest resolution possible, we began backing up the savable information from the array, which allowed us to do the process much quicker than R1Soft allows. Once that backup was complete and restored, we then started syncing data with the backups we have of the node, which will drastically reduce the time of the overall restore (preventing 19 days down).

As far as data loss is concerned, there could be some, as is the case with any backup/restore. Like I said above, we tried to pull as much good data off the RAID array as possible to prevent as much data loss as possible between the last backup of Dec. 18th and when the server went down.

As far as VEs coming back online, several VEs are already back online. As the R1Soft sync completes, we are turning them on. I'd love to give an ETA on each one; however, each one takes a different amount of time. Whether they are big files or small files makes a difference too.
First of all, Zachary, thank you for the details.

The difficulty for me is that, without the RAID controller failure details you just provided, it was not possible for me to respond appropriately to my own clients. Initially, my thought was that this was a simple drive failure, and I quoted my client a resolution within 24 hours. This has gone on for around 100 hours now, which is understandable for a RAID controller failure with potential data loss, but I did not know about this until now.

I would have liked to know about the RAID controller failure two days earlier, where I could have scheduled my own responses earlier.

Another issue is the data loss. From experience, as soon as I saw the outage going on for more than a day with "backup" involved, I knew this was not a typical outage, and I asked in the ticket system, twice, about potentials for data loss. Both times I was told that there was no data loss or corruption expected.

But clearly, restoring a backup from two weeks ago, then needing to sync with a corrupted file system, implies real potentials for data loss.

Zachary, I have been with JPC for more than 7 years, and I trust your team is doing their very best to bring service back online.

Posted by ModelWebHost, 01-11-2013, 01:35 PM
Quote:
Originally Posted by ckpeter
First of all, Zachary, thank you for the details.

The difficulty for me is that, without the RAID controller failure details you just provided, it was not possible for me to respond appropriately to my own clients. Initially, my thought was that this was a simple drive failure, and I quoted my client a resolution within 24 hours. This has gone on for around 100 hours now, which is understandable for a RAID controller failure with potential data loss, but I did not know about this until now.

I would have liked to know about the RAID controller failure two days earlier, where I could have scheduled my own responses earlier.

Another issue is the data loss. From experience, as soon as I saw the outage going on for more than a day with "backup" involved, I knew this was not a typical outage, and I asked in the ticket system, twice, about potentials for data loss. Both times I was told that there was no data loss or corruption expected.

But clearly, restoring a backup from two weeks ago, then needing to sync with a corrupted file system, implies real potentials for data loss.

Zachary, I have been with JPC for more than 7 years, and I trust your team is doing their very best to bring service back online.
Same here. I have been with JPC for more than a year and only this is the point I think to leave JPC. They should bring changes into their backup system.

Posted by Zachary McClung, 01-11-2013, 02:11 PM
Thank you for your responses, we do appreciate the understanding.

The bad batch of drives that caused the Friday node issue has already been identified and replaced company-wide. This particular issue was a standard hardware failure, which occurs from time to time with all providers. We catch a lot of these ahead of time through our monitoring team, who check servers manually every week, and our software, which monitors them 24/7. Unfortunately, this one did not give any signs of failure.

In regards to the R1Soft backup changes, we are in the middle of upgrading all of our servers from the current version to v5. When you have the number of servers we do, upgrades do take time. We look to complete the upgrades as quickly as possible.

As far as the lack of communication by our tech support team on specifics of the issue, that is my fault and I do apologize. The general support team was briefed quickly on the issue and our special team that handles restores went to work. They have spent 24 hours per day managing this, and I have spent 18 hours a day working with them and with customers who have been reaching out directly. As I said before, we initially thought this was a simple outage that could be resolved quickly.

We will continue to work to restore the VE and turn them on as they become fully restored. Thank you for your patience.

Posted by ckpeter, 01-11-2013, 06:17 PM
Zachary, can you give an estimate (even if rough) on when the VPS will be fully restored. I have a ticket in the system asking the same question for my own VPS, but have not been replied to.

Also, I am confused on the progress and updates.

At 4 AM, status was "609GB of data has been restored so far. Since small files are being restored at this time it's much slower for small files. "

At 6 AM, the status was "612 GB of data has been restored so far. "

At 3 PM, 9 hours later, the status was "Total restored is at 587G, we are continuing to monitor the restore process. "

Seems like we are making negative progress? Also, it took 2 hours to restore 3 GB of data?

Are you guys restoring in parallel? It sounds like you may be doing the restoration one at a time. How about restoring the downed VPSes to multiple host servers and speeding the process up by using multiple host IO controllers?

Normally I am fairly patient, even for a multi-day outage, but at this point, I would appreciate some clear updates and some ETA time frame. It will be helpful to at least know if we are talking about restoration in the next few hours, over the weekend, or another week.

Thank you.

Posted by Zachary McClung, 01-11-2013, 06:47 PM
Quote:
Originally Posted by ckpeter
Zachary, can you give an estimate (even if rough) on when the VPS will be fully restored. I have a ticket in the system asking the same question for my own VPS, but have not been replied to.

Also, I am confused on the progress and updates.

At 4 AM, status was "609GB of data has been restored so far. Since small files are being restored at this time it's much slower for small files. "

At 6 AM, the status was "612 GB of data has been restored so far. "

At 3 PM, 9 hours later, the status was "Total restored is at 587G, we are continuing to monitor the restore process. "

Seems like we are making negative progress? Also, it took 2 hours to restore 3 GB of data?

Are you guys restoring in parallel? It sounds like you may be doing the restoration one at a time. How about restoring the downed VPSes to multiple host servers and speeding the process up by using multiple host IO controllers?

Normally I am fairly patient, even for a multi-day outage, but at this point, I would appreciate some clear updates and some ETA time frame. It will be helpful to at least know if we are talking about restoration in the next few hours, over the weekend, or another week.

Thank you.
There was a typo in the update. We are at 626 GB of data restored. I hate to give rough estimates simply for the fact that folks start making assumptions based on it and then start yelling when it doesn't happen on time. Based off the size of the restore we are 68 - 70% of the way complete. Customers are coming online after each VE restore. We have had VEs coming online throughout the day and a majority of customers have already come back online. Depending on the make up of the remaining files we could be looking at late tonight/early Saturday morning or late Sunday night for the very last customer. It simply depends on the mix of files and how much of the 30% remaining wasn't recovered by the backup from the old raid array.

R1Soft is doing the restore, and the version we are currently on only allows one restore at a time. The latest version, which we will be rolling out in the coming months as part of our revamp, will allow us to run multiple restores at the same time, resulting in a much faster restore.

Posted by milkmycow, 01-12-2013, 12:24 AM
4:39pm
"Total restore is currently at 626G and still progressing."

9:36pm
"Total amount of restore data has reached 627G and rest is still being done. Thank you."


1gb in 5 hours? I hope this is another typo. Now down over 5 days. There shouldn't be so many vps accounts on a single server that it takes FIVE DAYS or more to restore.

Posted by milkmycow, 01-12-2013, 12:30 AM
Also, I have yet to see one person say their vps on dalinar is functional again. So, anyone who is, please let it be known.

Posted by (Stephen), 01-12-2013, 03:00 AM
Zach, has your team given thought to changing RAID types? Even going with more raid arrays, even if on the same controller, but less risk when a drive fails, and spreading your VMs over them?

Four-plus years ago we had a number of Adaptec and LSI RAID failures that led to problems. After that, we decided to use only RAID1 or RAID10 4-drive volumes on VPS nodes, simply making multiples of them and spreading the IO across multiple arrays. This has resolved almost every issue with rebuild times and IO on the nodes, and when you spread the IO, performance is still very good.

This also means you can set up your backups in better stages, so you don't have to recover a TB+ of user data, but only the volume impacted, even in a worst-case failure.

Posted by Zachary McClung, 01-12-2013, 03:11 AM
Quote:
Originally Posted by (Stephen)
Zach, has your team given thought to changing RAID types? Even going with more raid arrays, even if on the same controller, but less risk when a drive fails, and spreading your VMs over them?

Four-plus years ago we had a number of Adaptec and LSI RAID failures that led to problems. After that, we decided to use only RAID1 or RAID10 4-drive volumes on VPS nodes, simply making multiples of them and spreading the IO across multiple arrays. This has resolved almost every issue with rebuild times and IO on the nodes, and when you spread the IO, performance is still very good.

This also means you can set up your backups in better stages, so you don't have to recover a TB+ of user data, but only the volume impacted, even in a worst-case failure.
We currently use RAID 10 on our VPS nodes and have for a long time now. We have set them up with a special sauce to spread the IO out. The issues in the past were simply due to having a bad batch of drives in every slot of the RAID. Those drives have all been replaced. This was simply a case of a RAID controller dying early. We have sweet setups for our nodes, and less than 1% of them have had issues. System-wide this isn't an issue; it is just magnified in this scenario.

Posted by zsuatt, 01-12-2013, 04:04 AM
I guess the VPSes are OpenVZ, and you're doing a file-based restore, and that's why it "depends on the mix of the files"?

Also, how are you handling the consistency? You've stated that some of it was restored from the broken RAID array, and the rest is synced from a backup. Does that mean half the files will be the "latest" version, while the rest will be from a month ago?

Posted by ckpeter, 01-12-2013, 06:26 PM
Quote:
Originally Posted by Zachary McClung
There was a typo in the update. We are at 626 GB of data restored. I hate to give rough estimates simply for the fact that folks start making assumptions based on it and then start yelling when it doesn't happen on time. Based off the size of the restore we are 68 - 70% of the way complete. Customers are coming online after each VE restore. We have had VEs coming online throughout the day and a majority of customers have already come back online. Depending on the make up of the remaining files we could be looking at late tonight/early Saturday morning or late Sunday night for the very last customer. It simply depends on the mix of files and how much of the 30% remaining wasn't recovered by the backup from the old raid array.

R1Soft is doing the restore, and the version we are currently on only allows one restore at a time. The latest version, which we will be rolling out in the coming months as part of our revamp, will allow us to run multiple restores at the same time, resulting in a much faster restore.
Zachary, I appreciate the ETA. I understand your reluctance to share an ETA for fear of being held accountable for it, but at the same time, as clients, we do need some sort of guidance on what to expect.

I have been monitoring the progress and it looks like we are doing anywhere from 0.5 GB - 2 GB per hour of data restored. This concerns me, because assuming the earlier figure of 900+ GB of total data, we are looking at another 5 - 7 days before service recovery.

Is there any contingency plan to boost the restore speed? It sounds like the bottleneck is the file sync process on a hard disk. Have you considered getting a few super-fast SSDs, bulk copying everything onto them, then doing the sync on the SSDs to speed things up?

Also, if necessary, can I request to have my VPS restored using the Dec 18 backup, and wait to get the missing data from the RAID later?
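
The estimate above can be sanity-checked with a few lines of Python. This is only a rough sketch, not anything JaguarPC uses: the function name `remaining_days` is made up for illustration, the 900 GB total is the lower bound of the "900+ GB" figure mentioned above, and 626 GB is the restored amount Zachary confirmed earlier in the thread. It also assumes the restore rate stays constant, which the status updates suggest it does not.

```python
def remaining_days(total_gb, restored_gb, rate_gb_per_hour):
    """Days left to restore the remaining data at a constant rate."""
    remaining_gb = total_gb - restored_gb
    return remaining_gb / rate_gb_per_hour / 24


TOTAL_GB = 900     # assumption: lower bound of the "900+ GB" total
RESTORED_GB = 626  # restored amount confirmed earlier in the thread

# Observed range of restore rates quoted in the post (GB per hour).
for rate in (0.5, 1.0, 2.0):
    days = remaining_days(TOTAL_GB, RESTORED_GB, rate)
    print(f"{rate:.1f} GB/h -> {days:.1f} days remaining")
```

At 2 GB per hour the remaining 274 GB takes about 5.7 days, in line with the 5 - 7 day figure; at 0.5 GB per hour it would be closer to 23 days.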

Posted by Zachary McClung, 01-12-2013, 07:18 PM
Quote:
Originally Posted by ckpeter
Zachary, I appreciate the ETA. I understand your reluctance to share an ETA for fear of being held accountable for it, but at the same time, as clients, we do need some sort of guidance on what to expect.

I have been monitoring the progress and it looks like we are doing anywhere from 0.5 GB - 2 GB per hour of data restored. This concerns me, because assuming the earlier figure of 900+ GB of total data, we are looking at another 5 - 7 days before service recovery.

Is there any contingency plan to boost the restore speed? It sounds like the bottleneck is the file sync process on a hard disk. Have you considered getting a few super-fast SSDs, bulk copying everything onto them, then doing the sync on the SSDs to speed things up?

Also, if necessary, can I request to have my VPS restored using the Dec 18 backup, and wait to get the missing data from the RAID later?
There are no contingency plans for speed. We have already restored the part from the hard drive, we are now restoring the segment from R1Soft. There is unfortunately no way to speed that process up. Also, due to the fact only one at a time can be processed there wouldn't be a way to restore your complete VPS to a different node.

We have hit a patch of 17 GB worth of small files. The e-mail files are a couple of KB each, which is killing the speed of the restore. We are getting towards the end of that and expect the transfer to pick back up soon. We went from hundreds of GB per day to tens of GB once.
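
Why a patch of small files slows a file-based restore so much can be illustrated with a simple cost model: every file carries a fixed overhead (metadata lookups, seeks, file creation) on top of the time needed to move its bytes. The sketch below is illustrative only. The 50 ms per-file overhead and the 100 MB/s throughput are invented numbers, not measurements from this server or from R1Soft, and `restore_seconds` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def restore_seconds(total_bytes, avg_file_bytes, per_file_overhead_s,
                    throughput_bytes_per_s):
    """Estimated restore time: fixed cost per file plus raw transfer time."""
    n_files = total_bytes / avg_file_bytes
    return n_files * per_file_overhead_s + total_bytes / throughput_bytes_per_s


# Hypothetical parameters, chosen only to show the shape of the problem.
OVERHEAD_S = 0.05        # assumed fixed cost per file, in seconds
THROUGHPUT = 100 * MIB   # assumed sequential throughput, bytes per second

small = restore_seconds(17 * GIB, 2 * KIB, OVERHEAD_S, THROUGHPUT)
large = restore_seconds(17 * GIB, 100 * MIB, OVERHEAD_S, THROUGHPUT)

print(f"17 GB as 2 KB files:   {small / 3600:.1f} hours")
print(f"17 GB as 100 MB files: {large / 60:.1f} minutes")
```

Under these made-up parameters the same 17 GB takes hours as tiny files and only minutes as large ones, because the per-file cost dominates once there are millions of files.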

Posted by milkmycow, 01-12-2013, 09:28 PM
Still down. This is so stupid.

Posted by Suldominios, 01-13-2013, 12:15 PM
We are going for 1 week down.
And we don't have an ETA because the ETA is about 19 Days
and the answer in the ticket system is like this:
We are restoring VPS containers from backups and there is currently no exact ETA please follow our updates on forum here :

Link
Contact us if you need any further assistance.

Posted by ModelWebHost, 01-13-2013, 12:20 PM
Quote:
Originally Posted by Suldominios
We are going for 1 week down.
And we don't have an ETA because the ETA is about 19 Days
and the answer in the ticket system is like this:
We are restoring VPS containers from backups and there is currently no exact ETA please follow our updates on forum here :

Link
Contact us if you need any further assistance.
You are right. For the past 3 days, I have been hearing from JPC that my VPS is being restored, and it is still going on. They say that I have a large number of small files, which is causing the delay.
600+ GB was restored before mine, with no small files, and it restored quickly? As I said in my initial post, they took almost 19 days last time, and from the current situation we can expect a similar period.
Its the worst time for my business and sale has been dropped to 95%.

Posted by milkmycow, 01-13-2013, 03:16 PM
If one particular backup is going insanely slower than everyone else's, it should be stopped and put at the end of the line of restores so that everyone else doesn't have to suffer through it

Posted by CFoxHost, 01-13-2013, 06:34 PM
It's a shame, but I can't risk my customers potentially having to go through this at some point. I'm in the process of moving to a different provider now.

Posted by Suldominios, 01-13-2013, 09:56 PM
where you will host your vps CFox?

Posted by milkmycow, 01-13-2013, 11:51 PM
Still down and apparently overnight techs are only ones willing to update the official forum thread on their support site because last update was 8:30 am after several overnight updates

Posted by milkmycow, 01-15-2013, 12:11 AM
over 8 days downtime... WTF....

Posted by S4Host, 01-15-2013, 12:32 AM
Yep .. again down

Posted by milkmycow, 01-15-2013, 11:09 AM
Quote:
Originally Posted by S4Host
Yep .. again down
Again? I've been down nonstop for this whole time.

Posted by Suldominios, 01-15-2013, 11:49 AM
This must mean that his VPS was restored and stopped working again, I guess.

Posted by CFoxHost, 01-16-2013, 12:57 AM
Suldominios, I'm moved now, jaguarpc acccount cancelled. I went with thoughtbug.com. I worked with Mooneer at VenturesOnline so I know he has mad tech skills and I trust his character also so I know he won't cheap out. He understands servers/vps's must be p 24/7, no excuses. I cannot imagine him ever taking a week to get one back online. But I also can't imagine him running a RAID controller so long it dies before updating it either.

In short, I know and trust him. And he got my new VPS up and running within a few hours. I can sleep well tonight knowing my VPS will be there in the morning and my customers are well served.

Posted by milkmycow, 01-16-2013, 01:03 AM
Sadly, I don't have off-server backups to switch to because I'm an idiot.

Posted by CFoxHost, 01-16-2013, 01:17 AM
You were trusting your provider when they claimed they were making backups as part of the service you were paying for. Don't be too hard on yourself
Certainly it is good to keep your own as well, but I wouldn't say you were an idiot. At most, just too trusting

Posted by ModelWebHost, 01-16-2013, 08:46 AM
Just a quick update.
Today exactly 16 days have passed and the server still has not been restored. They are restoring 1 GB of data per 12 hours. Very disappointed in Zach and his team. They even tell us a new story each time.
12 hours ago, one of their system administrators, Sam Lewis, said that my VPS restore would be completed very soon (within a few hours), and Zach confirmed it, but it is still not online. Very tired of their system; it may be my worst experience with JPC.

Posted by Jeffreyw, 01-16-2013, 09:00 AM
I experienced this kind of downtime (4 days, I think) with their Resellerzoom.

Posted by milkmycow, 01-16-2013, 09:11 AM
Yeah at the current restore rate, we won't be back online until this summer it looks like.

Posted by Suldominios, 01-16-2013, 01:30 PM
10 days! Restore velocity = 2 GB/day, or 85 MB/hour, or about 24 KB/s.

Are they hand-typing the data in binary code? 00101010100101001001010010001010101001001
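
For anyone who wants to check the unit conversion behind that restore rate, here is a minimal sketch. It assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB) and takes the 2 GB/day figure from the "1 GB per 12 hours" report earlier in the thread; the function name `restore_rate` is made up for illustration.

```python
# Convert a restore rate given in GB/day into MB/hour and KB/s.
# Assumption: binary units (1 GB = 1024 MB, 1 MB = 1024 KB).

SECONDS_PER_DAY = 24 * 60 * 60


def restore_rate(gb_per_day: float) -> dict:
    """Return the same rate expressed in MB/hour and KB/s."""
    mb_per_hour = gb_per_day * 1024 / 24
    kb_per_sec = gb_per_day * 1024 * 1024 / SECONDS_PER_DAY
    return {"mb_per_hour": mb_per_hour, "kb_per_sec": kb_per_sec}


rate = restore_rate(2.0)  # 1 GB per 12 hours, as reported in the thread
print(f"{rate['mb_per_hour']:.1f} MB/hour, {rate['kb_per_sec']:.1f} KB/s")
# prints "85.3 MB/hour, 24.3 KB/s"
```

So 2 GB/day is roughly 85 MB/hour, which is only about 24 KB/s.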

Posted by AlexandreSeo, 01-16-2013, 02:29 PM
Is Zach here? Any ETA? It's been a while now with no updates...

Posted by ModelWebHost, 01-16-2013, 10:50 PM
Quote:
Originally Posted by AlexandreSeo
Is Zach here ? Any ETA ? it's been a moment now since no updates...
No. Zach will not reply to this thread because of the fake promises made to us. Each and every day comes with new statements.
24 hours ago, I was informed that my VPS was coming online shortly after the SUCCESSFUL restore in JUST 16 DAYS, but it is still offline.
The server was online for a few minutes, and when I checked, none of the accounts had been restored properly. Now, I don't understand: if they have not restored the VPS properly, what were they doing for the last 16 days?

Posted by milkmycow, 01-17-2013, 03:04 PM
ya.. still down.. this blows

Posted by ModelWebHost, 01-18-2013, 10:18 AM
Here is the summary.
01/01/2013
Server Down

01-06-2013, 10:38 PM
identified a bad disk in the RAID

01-07-2013, 01:50 AM
manual File System Check (FSCK) started

01-08-2013, 05:09 AM
Data restoration Started

01-10-2013, 05:28 PM
Now, Restore from the backup started

2013-01-11/08:49:57
My VPS restore started

2013-01-16/13:26:33
After 5 days, they said the restoration was complete, but it was not. Not even a single account was restored properly.

2013-01-16/13:53:08
Server was online, but sites were not accessible, and they said they were fixing the permissions.

2013-01-17/01:18:48
Permissions still could not be fixed.

2013-01-18/08:43:30
They were not able to fix the permissions, and one more thing that worried me a lot: they said that the cPanel guys could not fix the issue. Amazing, the people who developed the software could not solve the issue?? This may be the worst story here on the WHT forum. They are doing nothing but delaying the process, and they don't know when the server will be online??

Posted by Jag, 01-18-2013, 11:25 AM
Quote:
Originally Posted by Hostinpk-CEO
Yes. I am serious. Check this thread and you will find the truth.
http://forums.jaguarpc.com/server-ne....nocdirect.com

I was expecting a big change and that's why I continued the services but I will not continue this time. Zach, do only one thing, just restore the VPS so that I can take my data to elsewhere.
You are quoting some fluke issue we talked about, as you can see, in that thread 5 months ago. Seriously, why bring that up? It's long since been fixed.

A large node fsck'ing for hours before a failure, followed by a move to backups, can cause some real problems. But we have a fix, and it's being deployed right now; it started weeks ago.

Quote the rest of the story, the post from just 4 days ago that discussed this on our own forum where I highlighted the fixes being deployed.
http://forums.jaguarpc.com/suggestio...tml#post179259

And I will work on turning that into a giant blog ordeal to put this to rest.

Posted by Jag, 01-18-2013, 11:28 AM
Quote:
Originally Posted by Suldominios
19 Days?
I'll loose my job monday!
im an employee responsible for the site of my working place and the owner is inviting me to leave at monday if the site and email system don't come up. Nice isn't it?
Most were restored to other nodes much faster than that. It was one node with 25 clients; most were sent to other places, but yes, 10 or so had it the worst, being at the bottom of the list with compounded problems. But that is a thread from 5 months ago. Again, it's not the same issue or case as what's being discussed in this thread.

Clients linking to 5-month-old threads for a current issue is a little misleading.

Posted by Jag, 01-18-2013, 11:32 AM
Quote:
Originally Posted by CFoxHost
This makes me nervous,since I have a VPS with them :\
You have no reason to be, my friend. A few loud apples with an unusual set of circumstances are far from the norm. You don't hear the thousands of other VPS clients on here with an issue.

And don't get me wrong, these clients aren't wrong for voicing a concern about their box. I'm just trying to separate a few things: links from 5-month-old threads that were long since resolved are not related to every time a VPS node needs a fsck or has a problem.

Let me more directly comment and help the clients that are having some issues. But rest easy.

Posted by AlexandreSeo, 01-18-2013, 11:33 AM
No more replies; everyone is left in the dark. On the JaguarPC homepage, they say: 14 years of reliable service. Do they lie often? Nothing on their status page about the issue... tech live chat support is slow and useless. Is Zachary hiding, or has he been fired? Restoration is still going on, as per their blog.

Posted by Jag, 01-18-2013, 11:38 AM
Quote:
Originally Posted by Hostinpk-CEO
Very frustrated situation. Server is down for more than 6 days. Sometimes, they say we are restoring from backups and sometimes say they are rebuilding the RAID. Moreover, they said 2 days ago that they restored the 17 gb data and now after 2 days, nothing has been restored.
I have been hosting with other server providers but I have not seen a situation like JPC have ever in my 7 years webhosting business.
Morever, I can't understand that if they keep backups then why don't they restore it? Backups are taking 6 days to be restored? Last time they took more than 19 days for restoration on friday node.
Zach is also not doing anything for us, as we are in dark and don't know when the server will be online?
6 days isn't painting the full picture. Our network providers got hit with a huge 50gb DDoS that took your site and many others, even our own, down for a brief period.

Then a few days later that system had a failure; we were already in the process of changing out all its drives. Our vendor shipped us a system without the drives we asked for, and we didn't catch that until we saw a failure and the drive we pulled wasn't what we asked for. We contacted them, and they said those drives should be solid, despite our request and 14-year track record of using only WD, and more specifically WD RE drives or SAS. Well, we gave them the benefit of the doubt, and then 4 drives failed at once. That was a definite sign.

We ordered a ton of WDs, told the manufacturer to take back the Seagates, and began swapping drives as we could. The problem is you can't drop in a good drive for the sake of it when a system is already restoring and recovering from a bad drive. It has to stay in perfect condition and then let us swap them.

We did that for many systems and moved clients around to avoid that disruption; you won't hear that part of the story. Which is why I intend to blog this all from A to Z, so you guys have a clear picture of what happened with these few cadmus nodes and why, what's been done, and what's being done while some of us are here in Atlanta for a while to clean this up.

It's my fault I didn't start blogging this sooner to inform you all, so you wouldn't feel you had to resort to a 3rd-party forum.

Posted by Zachary McClung, 01-18-2013, 11:41 AM
Quote:
Originally Posted by AlexandreSeo
No more replies, everyone is left in the dark. On the homepage of jaguarpc, they say : 14 years of reliable service. Are they lying often ? Nothing on their status page about the issue... tech live chat support is slow and useless.. Is Zachary hiding himself or he is he fired ? Restoration is still going on as per their blog.
I'm still here, and I'm not hiding. We have been updating our forums http://forums.jaguarpc.com/server-ne...ect-com-6.html frequently regarding the service outage.

Due to the network issues that were occurring our speeds had slowed; however, they are back to full speed again. We have a handful of VPSes remaining.

Posted by Jag, 01-18-2013, 11:44 AM
Quote:
Originally Posted by Hostinpk-CEO
OK. I can understand that drive went bad but check your forum and you will see several threads regarding the same problem. Each time your drives becomes failed ones.
So, why don't you bring the change and so far I remember, last time you said that we are going to bring the change in the whole system of r1soft backups but still its same.

I will suggest you to change the whole structure of your backup system and specially your DRIVES that are mostly failed and restore takes weeks and weeks.
It took me personally months negotiating with R1, well, sorry, now called Idera, but we began moving to v5 and improved capacity connections between servers and backups in Dec. I'm in Atlanta right now with other team members to take the next step in the backend, while techs have already contacted all clients that use a backup agent and have begun converting backups to Idera v5. Shared, VPS, etc. clients wouldn't see that notice because you don't pay for an agent; we do that for you.

I will create the blog story about the last few painful restores in 2012: how they started, what went wrong, and all we are doing, and have already started doing, to eliminate this problem. Our last test restore on the new setups let us either span clients out rapidly to other machines, more than one at a time (a limitation of v2 that isn't in v5), and/or do a full bare-metal restore of a full node in 2 hrs.

I just need to get you this info, and I'll work rapidly to do so.

Posted by Jag, 01-18-2013, 11:56 AM
Quote:
Originally Posted by milkmycow
4:39pm
"Total restore is currently at 626G and still progressing."

9:36pm
"Total amount of restore data has reached 627G and rest is still being done. Thank you."


1gb in 5 hours? I hope this is another typo. Now down over 5 days. There shouldn't be so many vps accounts on a single server that it takes FIVE DAYS or more to restore.
It isn't a client count issue. For example, on a node with 48gb RAM, 8x RAID 10, dual hex core, you may see anywhere from 20-35 accounts on there. It depends on the load, the client type, the revenue they generate. But we don't overload them.

It's the fact that a few clients with large VPSes will pack that thing with data. For example, we were just discussing someone that put 500 shared plans on our 2nd-tier VPS plan. While well under their disk limit, that's holding a lot of data.

Again, this will go into the blog I'm preparing in response to this thread, and backups in general, but we have ways to overcome this easily. I'll detail them on the blog. In short: a faster backplane connecting backups to systems, and more of them; v2 -> v5, with faster restore software and the ability to restore multiple accounts from one backup, or even multiples to multiple servers at a time, as opposed to old v2 where you can only pull one at a time or do a full bare metal; the backup systems themselves, which we're changing, but I'm not going to get into those details much (secret sauce in the details); and splitting clients off older vz to some new nodes that just arrived and continue to arrive, i.e. removing hardware we consider too old. Breaking 8x RAID 10s into (2) 4x RAID 10 on the same system is something we are experimenting with; should a failure occur, at least that would cut down who is affected by half, but I worry about disk performance, so we're testing it. And much, much more...

I will elaborate. Again, this is my fault for just not sharing with you all sooner what steps we have actually already taken, what we are actively doing, what's next, and the total time to completely overhaul the entire process.

What you won't hear about is that there were other VPS systems that had a drive fail here or there in those 5 months. We replaced the drive, using some new backup procedures (not including anything I have written above), and those systems went right back up in no time, with not a single complaint.

So can we ever get rid of failures? No, nobody can. Drives fail! But we can improve a lot on these "days" disasters. And we are deploying those things now. Even in a worst case, when we are done, nothing should be out more than a few hours, and I hope even that is the rare-occurrence (once in 2 yrs) type of event.

Posted by Jag, 01-18-2013, 12:04 PM
Quote:
Originally Posted by Hostinpk-CEO
Just a Quick updated.
Today exactly 16 days have been passed and still server has not been restored. They are restoring 1 gb data per 12 hours. Very much disappointed from Zach and his team. Even they tell us the new story each time.
12 hours ago, one of their system administrator Sam Lewis said that my vps restore will be completed very soon (within few hours), also zach confirmed, but still not online. Very tired of their system and It may be the worst experience with JPC.
No node is down, and nothing has been down for 16 days. You may be mixing a series of back-to-back issues, but there are definitely no 16-day outages on any nodes. You should bring this stuff to me and my management in our forums, our emails, our systems; my staff doesn't work on WHT, and it sounds like you have multiple things going on and being mixed up. Again, no node has been down or restoring for 16 days.

Please, please contact us at our place so we can help you.

Posted by ModelWebHost, 01-18-2013, 12:19 PM
Quote:
Originally Posted by Jag
no node is down, and nothing has been been down for 16 days. You may be mixing a series of back to back issues, but there are definitely no 16 day outages on any nodes. You should bring this stuff to me and my management in our forums, our emails, our systems, my staff doesnt work on wht and it sounds like you have multiple things going on and being mixed up. Again, no node has been down or restoring for 16 days.

Please Please contact us at our place so we can help you.
Hello Jag!
First of all, I appreciate all the replies you have made so far in this thread, which have shown us a clear picture of the whole matter. But I can easily remember that your support discussed the same changes 5 months ago; they have not been made so far, and that's why this happened.
However, listen: when you offer large-spec VPSes, a client can use the resources allotted to him, as I did and some others did too. So there is no reason to tell me again and again that I have a huge VE, as your support has told me so many times.
When my VPS had not been restored, was it up for me? Was it functioning for me?
Kindly put yourself in my place and think about what my clients can do to me and how the business has been ruined because of this.
One more thing: when I asked to stop the restoration process, it was not honored, but after 3 days of this I had nothing but problems, and not even a single account had been restored yet. So, what should I expect?
I tried contacting Chris, Zach, Serg and some other members, but all were passing this work to each other, and the days passed in no time.
Today, another piece of GOOD NEWS was received from your support: they have to start the /home restoration of the VPS again after the restore was completed, which is another pain for me. Moreover, I have been asking for a long time for a specific account restore, but no one listens. Kindly take a personal interest in getting me out of these problems.

Posted by Jag, 01-18-2013, 12:25 PM
Quote:
Originally Posted by Hostinpk-CEO
Here is the summary.
01/01/2013
Server Down

01-06-2013, 10:38 PM
identified a bad disk in the RAID

01-07-2013, 01:50 AM
manual File System Check (FSCK) started

01-08-2013, 05:09 AM
Data restoration Started

01-10-2013, 05:28 PM
Now, Restore from the backup started

2013-01-11/08:49:57
My VPS started to be restore

2013-01-16/13:26:33
After 5 days, they said the restoration is completed but it was not. Even a single account was not restored properly.

2013-01-16/13:53:08
Server was online but sites were not accessible and they said that we are fixing the permissions.
2013-01-17/01:18:48
Still permissions could not be fixed.

2013-01-18/08:43:30
They were not able to fix the permissions and one thing more that made me much worry, they said that cPanel guys could not fix the issue. Amazing, the persons who developed the software, could not solve the issue??This may be the worst story here at WHT forum. They are doing nothing but delaying the process and don't know that when the server be online??
Jan 1, unrelated issue.

Jan 6, server identified as problematic, troubleshooting began. It took too long, and this policy was adjusted last week.

Jan 7.
fsck attempted, ran for hours, too many hours, 5-6+

Note here: since then, just days ago, we have changed this policy while I'm here in Atlanta. If a fsck is taking too long, just restore to another node. We can now restore with the new setups in 2 hrs for a full bare metal, maybe 4 hrs if we have to split and do accounts to other systems as opposed to bare metal; either way, that beats a long fsck. Don't ask me why this policy was changed at some point and by whom. It took a while to find the right manager for support, and he's cleaning up the reversed policy changes other managers made. Yes, this is/was mostly a management issue. Techs aren't to blame. Blame me.

Then we will fsck, fix, or kill the server responsible. This backup should have been moving clients off this system Jan 6th, right after troubleshooting took more than a few hours to discover the disk.


Jan 8, 10. I don't get this part; can you send me the notices or tickets related to this? Either they started a backup or they didn't. I don't know how or why one would be announced on the 8th and not start until the 10th. My guess is it started on the 8th and you just didn't get the notification follow-ups until the 10th. Either way, I would like to see what was done and said, and let's face it, this system should have been back up on the 6th, so we're well past any time frame any of us, you, me, managers, techs, should accept.

Jan 11-16: a 5-day restore is dismal. It's something being fixed right now, but that's just sad and dismal. I have no excuse, only apologies and shame.

That should have been the end of the story. I don't know why or how you ended up with corrupt data. I hate to point a finger at R1, but it's been known to do that; ask any host. It doesn't make it any better.

This is also why we tried a big push for clients to buy backup service; the ones that did had their systems restored in hours on other nodes. We aren't hearing from them in this thread; it was a non-issue. Why in this day and age anyone would simply not do some second level of backup instead of just relying on the full system restores is beyond me. For 14 yrs our backup SLA hasn't changed: we do backups for ourselves, to restore our clients and not lose an account. And yes, if you need, you can get a file from it, but running a business and not forking over $5 for a backup to sleep at night and avoid problems is beyond me.

It doesn't excuse our failure in this long restore. I'm sure my techs cursed as loud as you did when they saw the data corrupt. If there's a ray of light to take from this, the new v5 can check for corruption better, manage it, back up and restore cPanel sides and MySQL parts, and again let you, even as a VPS client, get in there and manage the data if you want to do your own restore. We're working rapidly with Idera to complete this deployment. The sacrifice is they gave up compression for speed, which is fine. What you don't know is that the 50% more disk space now needed amounts to petabytes here, and we're covering those costs, not passing them on. But you normally won't hear about that stuff either.

2013-01-18 : I'm beyond sorry, and apologies seem rather useless at this stage. Zach or Chris will be along to comment.
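
The revised policy in this post ("if a fsck is taking too long, just restore to another node") can be sketched as a simple cutoff rule. This is only an illustration, not JaguarPC's actual tooling. The 2-hour and 4-hour restore times are the figures quoted in the post. The default cutoff of 2 hours is an assumption, since the post never says how long "too long" is, and the function names are made up.

```python
# Restore times quoted in the post (hours).
BARE_METAL_RESTORE_HOURS = 2.0
SPLIT_RESTORE_HOURS = 4.0


def next_action(fsck_elapsed_hours: float, fsck_limit_hours: float = 2.0) -> str:
    """Decide what to do with a node whose fsck is still running.

    fsck_limit_hours is a hypothetical cutoff, not a value from the post.
    """
    if fsck_elapsed_hours < 0:
        raise ValueError("elapsed time cannot be negative")
    if fsck_elapsed_hours >= fsck_limit_hours:
        return "restore_to_other_node"
    return "continue_fsck"


def worst_case_downtime_hours(fsck_limit_hours: float, split: bool = False) -> float:
    """Upper bound on downtime: fsck runs to the cutoff, then a restore follows."""
    restore = SPLIT_RESTORE_HOURS if split else BARE_METAL_RESTORE_HOURS
    return fsck_limit_hours + restore


print(next_action(1.0))   # prints "continue_fsck"
print(next_action(5.5))   # prints "restore_to_other_node"
print(worst_case_downtime_hours(2.0, split=True))  # prints 6.0
```

With a cutoff like this, the 5-6+ hour fsck described for Jan 7 would have been abandoned in favor of a restore well before it finished.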

Posted by ModelWebHost, 01-18-2013, 12:35 PM
Quote:
Originally Posted by Jag
Jan 1, unrelated issue.

Jan 6, server indentified as problematic, troubleshooting began. It took too long long, and this policy was adjusted last week.

Jan 7.
fsck attempted, ran for hours, too many hours 5-6+

Note here: since then, just days ago, we have changed this policy while Im here in Atlanta. If a fsck is taking too long, just restore to another node. We can now restore with the new setups in 2hrs full bare metal, maybe 4 hrs if we have to split and do accounts to other systems as opposed to bare metal, either way that beats a long fsck. Dont ask my why this policy was changed at some point and by whom, it took a while to find the right manager for support and he's cleaning up the reversed policy changes other managers made. Yes, this is/was mostly a management issue. techs arent to blame. blame me.

Then we will fsck, fix, or kill the server responsible. This bacup should have been moving clients off this system jan 6th, right after troubleshooting took more than a few hours to discover the disk.


Jan 8, 10. I dont get this part, can you send me the notices or tickets related to this. either they started a backup or didnt, Idont know how or why one would be announced 8th and not start til 10th, my guess is it it started on the 8th, you just didnt get the notification followups til the 10th. either way I would like to see what was done and said, and lets face it, this system should have been back up on the 6th, so we're well past any point of time frame any of us, you , me, managers, techs, should accept.

Jan 11-16: 5 days restore, is dismal. something being fixed right now, but thats just sad and dismal. I have no excuse, only apologies and shame.

that should have been the end of the story. I dont know why or how you ended up with corrupt data, I hate to sling a finger at r1, but its been known to do that, ask any host. It doesnt make it any better.

This is also why we tried a big push for clients to buy backup service, the ones that did had their systems restored in hours on other nodes. We aren't hearing from them in this thread, it was a non issue. Why in this day and age anyone would simply not do some second level of backup instead of just relying on the full system restores is beyond me. For 14yrs our backup sla hasnt changed, we do backups for ourselves, to restore our clients and not lose an account. And yes if you need youcan get a file from it, but running a business and not forking up $5 for a backup to sleep at night and avoid problems is beyond me.

It doesn't excuse our failure in this long restore. i'm sure my techs cursed as loud as you did when they saw the data corrupt. If there's a ray of light to take from this, the new v5 can check for corruption better, manage it, backup and rsotre cpanel sides, mysql parts, and again let you even as a vps client get in there and manage the data if you want to do your own restore. We're working rpaidly with idera to complete this deployment. The sacrifice is they gave up compression for speed, which is fine. what you dont know is that 50% more diskspace now needed, amounts to Petabytes here, and we're covering those costs, not passing them on. but you normally won't hear about that stuff either.

2013-01-18 : I'm beyond sorry, and apologies seem rather useless at this stage. Zach or Chris will be along to comment.
@Jag
Again, thanks for your good feelings toward me and your clients, but time is flying, and I request that you handle my issue, and not only handle it but solve it ASAP, because I don't want any more of this.
Moreover, I am sending you the longest ticket # (with 81 replies) since this happened. I will be waiting for updates.

Posted by ckpeter, 01-18-2013, 01:07 PM
Jag, thanks for stopping by with information.

At the moment, my VPS has not been restored.

I generally don't like to complain loudly, or make lots of noise. I am a service provider myself, and I know how bad things can happen to even the best host.

At this point, I have a few questions:

1) What exactly is the bottleneck for the slow restore (disk IO, restoring node one at a time, etc)? What can be done to speed it up?

Customer service has very kindly set up a spare VPS for me. However, it takes me a substantial amount of time to set a VPS up, which I don't want to do if my original VPS will come back in another day.

On the other hand, tracking the progress of restore, it may be UP TO another 2 weeks to restore the files (at the worst case of 0.5 GB/hr). If that's the case, I will need to make contingency plans (beyond what I have already done).

Can you provide the technical details on why restore is so slow, and give your best-guess on when my/everyone's VPS will be restored?

I understand you don't want to be accountable for the wrong ETA, but then again, it is the nature of a paid service provider to be accountable.

2) I don't have much new data from the last month. Can I opt to restore to the Dec 18 backup, so that I can get back online quicker?

3) Have you considered drastic measures such as using an SSD or getting R1soft to send a consultant to troubleshoot the restore process? I am happy to contribute some non-trivial amount of money, just so I can get my data back quicker.

For 2/3, I have previously asked and was told in this thread that it is not possible. However, considering that the length of the downtime has become extraordinary at this point, I would like to ask about them again.
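
As a rough illustration of how an ETA like the "another 2 weeks" worst case can be estimated from the progress updates, here is a minimal sketch. The two progress samples (626G at 4:39pm, 627G at 9:36pm) are the ones quoted earlier in this thread. The calendar date is a placeholder, because the thread gives only the times. The 168 GB remaining is not a stated figure; it is just what 2 weeks at 0.5 GB/hr implies. Function names are made up.

```python
from datetime import datetime


def rate_gb_per_hour(t0: datetime, gb0: float, t1: datetime, gb1: float) -> float:
    """Average restore rate between two progress updates."""
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("second sample must be later than the first")
    return (gb1 - gb0) / hours


def eta_days(remaining_gb: float, gb_per_hour: float) -> float:
    """Days needed to restore remaining_gb at a constant rate."""
    if gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return remaining_gb / gb_per_hour / 24


# Progress updates quoted in the thread; the date itself is a placeholder.
observed = rate_gb_per_hour(
    datetime(2013, 1, 1, 16, 39), 626.0,
    datetime(2013, 1, 1, 21, 36), 627.0,
)
print(f"{observed:.2f} GB/hr")  # prints "0.20 GB/hr"

# 168 GB is what "2 weeks at 0.5 GB/hr" implies (hypothetical remaining size).
print(eta_days(168.0, 0.5))  # prints 14.0
```

At the roughly 0.2 GB/hr observed between those two updates, the same amount of data would take well over a month, which is why the actual bottleneck matters for planning.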

Posted by Jag, 01-18-2013, 01:13 PM
Quote:
Originally Posted by Hostinpk-CEO
Hello Jag!
First of all I appreciate your all replies that you have made so far in this thread and shown us the clear picture of all the matter but I could remember easily that your support discussed the same changes 5 months ago that have not been brought so far and that's why this happened.
However, listen, when you offer large specs vps then a client can also use the resources allotted to him as I did and some others too. So, there is no question to tell me again and again that I have a huge VE as your support said me so many times.
When my vps had not been restored, so was it up for me? Was this functioning for me?
Kindly put yourself in place of me and think what clients can do with me and how the business have been ruined because of this.
One thing more, when I asked for stop restoration process, it was not honored but after 3 days of this, I had nothing but problems and even a single account was not restored yet. So, what should I expect?
I tried contacted to chris, zach, serg and some other members but all were putting this work on each other and days passed in no time.
Today, another GOOD NEWS have been received from your support that they have to start the /home restoration of VPS again after the restore have been completed that is another pain for me. Moreover, I have been asking for a long time for a specific account restore but no one listens. Kindly take your personal interest to get me out of these problems.
We were probably discussing what things we "were" going to do; as of last month, we are doing them. Some are done, some are much better, some aren't. Finishing this project series is realistically probably 2 more months. However, the server you're on will be in better shape for backups in 2 weeks. There is a plan and schedule with timelines; I will put it on the blog.

I hate that you feel ignored. Those are all great guys you listed; this screams to me of an internal software and/or policy failure that broke communication with them... and yes, I have a fix for that too. We tried to launch the new platform Dec 27; it's been 18 months in development to replace the current system. It puts a lot more information and tools at your fingertips, especially escalation and monitoring and such. Issues showed up rapidly on live that didn't in testing, despite 9 months of beta. So we reverted. Those issues are fixed, and devs are still working to improve some areas that received poor feedback. I can't say for sure when I will reattempt to launch it; I want a lot more testing and training and a few more features put back in before we try to launch it again. And yes, my point, somewhere in here: it aims to fix a lot of "server issue", "escalation" and "communication" problems, so shift leaders get alerted to open issues like that which didn't pass from one shift to another, as do their boss, and their boss, up to me and other top brass.

Posted by Jag, 01-18-2013, 01:28 PM
Quote:
Originally Posted by ckpeter
Jag, thanks for stopping by with information.

At the moment, my VPS has not been restored.

I generally don't like to complain loudly, or make lots of noise. I am a service provider myself, and I know how bad things can happen to even the best host.

At this point, I have a few questions:

1) What exactly is the bottleneck for the slow restore (disk IO, restoring node one at a time, etc)? What can be done to speed it up?

Customer service has very kindly setup a spare VPS for me. However, it takes me substantial amount of time to set a VPS up, which I don't want to do, if my original VPS will come back in another day.

On the other hand, tracking the progress of restore, it may be UP TO another 2 weeks to restore the files (at the worst case of 0.5 GB/hr). If that's the case, I will need to make contingency plans (beyond what I have already done).

Can you provide the technical details on why restore is so slow, and give your best-guess on when my/everyone's VPS will be restored?

I understand you don't want to be accountable for the wrong ETA, but then again, it is the nature of a paid service provider to be accountable.

2) I don't have much new data from the last month. Can I opt to restore to the Dec 18 backup, so that I can get back online quicker?

3) Have you considered drastic measures such as using an SSD or getting R1soft to send a consultant to troubleshoot the restore process? I am happy to contribute some non-trivial amount of money, just so I can get my data back quicker.

For 2/3, I have previously asked and was told in this thread that it is not possible. However, consider that the length of the downtime has become extraordinary at this point, I would like to ask about them again.
1) its some systems arent on updated network gear, hence the new enterprise cage blog and announcements, thats been worked on a while and we're fixing the last of them this month.

Its also r1soft v2, being 1 of 4 original investors we had a site license but dec 2012 they EOL v2, and we have been luckily testing v4 for a year before that but found way too many problems in it. We reported them and to Idera 9formerly r1soft) credit, they actually fixed them. We signed for v5 , the latest in Dec, and have begun moving things over to it. It requires all new agents on all servers, and all existing backup systems to be whiped out and started fresh with v5, and since v5 chews up 35-50% more space, we had to deploy more backups systems, which we did and will keep doing.

It may take us another month to convert them all. And now the good news, in our v5, you just buy backup space and we'll give you an agent free. youcan backup restore, set your on data and times, etc etc. And v5 now includes cpanel backsup and mysql, its all inclusive. The best however, it finally fixes the proprietary feature we built years ago that v3 broke. The ability for our vps clients to use an agent ($3) , to login and access our proviate backups for your own vps.. even if you dont buy extra space. So you could go in and grab or restore what you like outside of a failure.

You could even set up a new VPS with the new Jag dashboard in real time, load your backup from the node you were on to a new VPS, and terminate the old one, if you didn't want to wait the few hours for our new backup systems to do their thing.



I've put a lot of tech details in this thread and other threads about why it was slow. I'm going to take this to our blog tonight, where I have free rein to explain it off WHT, and post follow-ups on the completion of those v5 updates. I'll send a notice when I post the blog and update this thread, so you and other clients that share this concern can follow. Plus it's just lots of awesome information.


2) No, you can't, not from this v2. v5 removes lots of limitations, and in the future, meaning 2-6 weeks from now, you will be able to.

3) SSDs die left and right in every deployment, DC, friend and other host I know that has tried them en masse. I'm not talking about the guy with one server; I'm talking about the people that have to maintain 40k drives like us. SSDs aren't ready except for the enterprise ones, and VPS prices would have to triple to remotely make those possible, while disk space would have to be cut to 1/3. We do have an Idera (r1) consultant, in fact a team, that's helping our own team deploy and train on the new v5; they have been working together for a bit. We are so close to you clients being able to see this, but I hate slapping timelines on it. "It's in progress" is the best I can say, and I would like to see it done before April. My wish would be to have it all done next month, February, but that may not be possible. I just want to make sure it's done right first and foremost, and done without losing a backup.

Posted by ckpeter, 01-18-2013, 05:28 PM
Quote:
Originally Posted by Jag
1) Some systems aren't on updated network gear, hence the new enterprise cage blog posts and announcements. That's been worked on for a while, and we're fixing the last of them this month.
....
Jag, I appreciate the excitement that you have for upcoming enhancements.

At the moment, my focus is on getting my VPS up as soon as possible, so my questions pertain to that. I know SSDs in general are not terribly reliable. However, since it sounds like the restore is bottlenecked by hard disk seek time, I am only asking about using an SSD to speed up the restore for this outage.

I know you said that there have been many details posted about why the restore is slow, but it is still not clear to me exactly what is going on.

As a client, my wish is simple - it is best if a reliable ETA is given and attained, but if not, I appreciate a good explanation of what's happening, so I can judge for myself how long it may take (since I also have my own clients to respond to, and I can't just relay what JPC tells me if the ETA is not reliable).

Can you give me an educated ETA on when my VPS will be up, along with some factors that may affect the restore speed and your ETA?

Posted by Jag, 01-18-2013, 05:39 PM
There is no disk bottleneck, none, nada. I'm not sure where this idea came from, but it's simply not true. I think I have explained the bottlenecks, and again, I'm going to put all this info into a concise blog post so you can see the big picture, the problems, the fixes. There is no I/O bottleneck.

There are only 5 VPS in the queue; most were up in the first few days. Those remaining systems all have some sorts of data on them, and should complete tonight. The array sizes are something we are looking to reduce, along with a number of other steps. But you would have to read through this whole thread at all my replies to put it all together. I'm just going to build a blog post with that info and elaborate so you won't have to.

Posted by ckpeter, 01-18-2013, 05:50 PM
Quote:
Originally Posted by Jag
There is no disk bottleneck, none, nada. I'm not sure where this idea came from, but it's simply not true. I think I have explained the bottlenecks, and again, I'm going to put all this info into a concise blog post so you can see the big picture, the problems, the fixes. There is no I/O bottleneck.

There are only 5 VPS in the queue; most were up in the first few days. Those remaining systems all have some sorts of data on them, and should complete tonight. The array sizes are something we are looking to reduce, along with a number of other steps. But you would have to read through this whole thread at all my replies to put it all together. I'm just going to build a blog post with that info and elaborate so you won't have to.
Thank you, Jag. I assumed disk bottleneck because of the mention of great number of files slowing restore down, which I assumed may have something to do with seek time.

I look forward to a full explanation from you.

It sounds like there are only 5 more VPS still waiting to be brought back online at this point? Can you give an ETA? I am fine with an educated one or some sort of time frame, even if you can't be sure.

Posted by Jag, 01-18-2013, 05:55 PM
Quote:
Originally Posted by ckpeter
Thank you, Jag. I assumed disk bottleneck because of the mention of great number of files slowing restore down, which I assumed may have something to do with seek time.

I look forward to a full explanation from you.

It sounds like there are only 5 more VPS still waiting to be brought back online at this point? Can you give an ETA? I am fine with an educated one or some sort of time frame, even if you can't be sure.
No, there are in-depth posts in this thread that spell it out, but in short: poor backup networking + poor v2 r1 software = bottlenecks. All things we are here in Atlanta to fix this month.

Posted by milkmycow, 01-18-2013, 06:41 PM
i guess i'm one of the unlucky 5 that get to wait an extra long time...

Posted by milkmycow, 01-19-2013, 05:53 PM
got an email saying I was back up, but all my sites still down.

Posted by milkmycow, 01-19-2013, 06:56 PM
an hour and 8 minutes have passed since i was told i was restored. Responded instantly after they asked me to check it (within 5 mins or so) and no response to ticket now in 1hr 9 min.. Excellent response times here...

before telling me im online and to check, why wasnt it checked by tech?

Posted by ckpeter, 01-19-2013, 10:40 PM
Quote:
Originally Posted by milkmycow
an hour and 8 minutes have passed since i was told i was restored. Responded instantly after they asked me to check it (within 5 mins or so) and no response to ticket now in 1hr 9 min.. Excellent response times here...

before telling me im online and to check, why wasnt it checked by tech?
It looks like the reason why your sites are down may be due to permissions. If you are using Plesk, the files under /var/www/vhosts are now all owned by root, so the web server does not serve up any site.

My VPS is now up, but I was told that only data up to Dec 18 are present. I asked, but have not yet received a response, about recovering the missing data from the last few weeks.

Posted by milkmycow, 01-20-2013, 06:16 AM
Quote:
Originally Posted by ckpeter
It looks like the reason why your sites are down may be due to permissions. If you are using Plesk, the files under /var/www/vhosts are now all owned by root, so the web server does not serve up any site.

My VPS is now up, but I was told that only data up to Dec 18 are present. I asked, but have not yet received a response, about recovering the missing data from the last few weeks.
im not using plesk

Posted by milkmycow, 01-20-2013, 06:21 AM
6 hours after they said i was restored and i replied saying no i was not back online, i finally got a response.

SIX HOUR RESPONSE TIME TO AN EMERGENCY DOWNTIME TICKET....

and the response? Telling me to login to WHM and Disable BIND...

Login to the server that is down... that hasnt worked in 2 weeks now... thats the solution... for me to magically login... and fix it myself... even though my entire VPS is down... STILL

Posted by milkmycow, 01-20-2013, 06:26 AM
TWO ENTIRE WEEKS down time now. TWO [EXPLETIVE DELETED] WEEKS.

Seriously? no really, seriously?

worst. host. ever.

Posted by ckpeter, 01-20-2013, 09:54 AM
Quote:
Originally Posted by milkmycow
im not using plesk
It doesn't matter. It looks like all the files are restored as owned by root. Whatever control panel you use, this ownership issue is likely the reason why all your sites are down.
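
For anyone wanting to check the ownership theory on their own VPS, here is a minimal sketch rather than a definitive diagnostic. It walks a directory tree and lists every entry owned by a given UID (0 is root). The function name `find_owned_by` is made up for illustration, and the `/var/www/vhosts` path is only the Plesk location mentioned above; other control panels keep site files elsewhere, so treat the path as an assumption.

```python
import os


def find_owned_by(root_dir, uid=0):
    """Return paths under root_dir whose owner UID equals uid (0 is root)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat inspects the entry itself and does not follow symlinks
                if os.lstat(path).st_uid == uid:
                    matches.append(path)
            except OSError:
                # Entry vanished or is unreadable; skip it
                continue
    return matches


if __name__ == "__main__":
    # Assumed Plesk document root from the post above; adjust for your panel
    for path in find_owned_by("/var/www/vhosts", uid=0):
        print(path)
```

If the list includes site content that should belong to the site user, that would be consistent with the restore having reset ownership. Which owner each file should have depends on the control panel, so this only reports and does not change anything.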

Posted by milkmycow, 01-20-2013, 11:07 AM
They still have not replied to the ticket...

Posted by Zachary McClung, 01-20-2013, 12:25 PM
Quote:
Originally Posted by milkmycow
6 hours after they said i was restored and i replied saying no i was not back online, i finally got a response.

SIX HOUR RESPONSE TIME TO AN EMERGENCY DOWNTIME TICKET....

and the response? Telling me to login to WHM and Disable BIND...

Login to the server that is down... that hasnt worked in 2 weeks now... thats the solution... for me to magically login... and fix it myself... even though my entire VPS is down... STILL
Would you PM your ticket #, I'd like to look into this for you. There is clearly some disconnect.

Posted by milkmycow, 01-20-2013, 02:12 PM
It's the 2 week old emergency level 1 vps down ticket that has now gone 8 hours without a response....

How many of those are there? 13665273

Posted by Zachary McClung, 01-20-2013, 02:58 PM
Quote:
Originally Posted by milkmycow
It's the 2 week old emergency level 1 vps down ticket that has now gone 8 hours without a response....

How many of those are there? 13665273
I'm looking into this ticket for you now.

Posted by milkmycow, 01-20-2013, 05:03 PM
Got one update, didn't help anything. Sites still down, so I said so...

going on an hour and a half on ticket right now..

Is this an emergency or is it not?

I had gotten to point where in my mind I was decided that this TWO WEEKS of downtime wasn't totally your fault. I was going to suck it up and stick with you guys.. But taking HOURS AND HOURS to respond to tickets? No thanks..

Posted by milkmycow, 01-20-2013, 06:33 PM
3 hours with no response. Is JaguarPC closed on Sundays?

Posted by Zachary McClung, 01-20-2013, 07:08 PM
No, we are not closed on Sundays. I have asked someone to look into your ticket again.

Posted by milkmycow, 01-20-2013, 07:20 PM
why am i having to post here to get responses on level 1 server down tickets that are two weeks old without resolution?

Posted by milkmycow, 01-20-2013, 10:23 PM
Still no ticket response. Jesus.

Posted by milkmycow, 01-20-2013, 10:25 PM
7 hours now..

Posted by ModelWebHost, 01-21-2013, 09:34 AM
Jag@
I am back after 3 days but you have not done anything for me. You have neither fixed the cPanel problem nor started the restore again. Seems that you think your responsibility only to explain how things happened and nothing else.
I request you once again to work on my ticket and fix the issues.

Posted by milkmycow, 01-21-2013, 10:55 AM
after posting on here, their facebook, and finally hitting up tech support chat, i finally got someone to pay attention. They fixed the nameserver issue and cpanel/mysql/bind/exim/all the other issues there were and i am FINALLY back online..

its sad how little urgency there is for level 1 emergency tickets

Posted by Zachary McClung, 01-21-2013, 02:17 PM
Quote:
Originally Posted by Hostinpk-CEO
Jag@
I am back after 3 days but you have not done anything for me. You have neither fixed the cPanel problem nor started the restore again. Seems that you think your responsibility only to explain how things happened and nothing else.
I request you once again to work on my ticket and fix the issues.
The restore has already been started for the remainder of your VPS and an admin is working to fix your cpanel as I type this.

Posted by Zachary McClung, 01-21-2013, 02:18 PM
Quote:
Originally Posted by milkmycow
after posting on here, their facebook, and finally hitting up tech support chat, i finally got someone to pay attention. They fixed the nameserver issue and cpanel/mysql/bind/exim/all the other issues there were and i am FINALLY back online..

its sad how little urgency there is for level 1 emergency tickets
I'm glad that the issue was resolved for you. It isn't that we didn't find it urgent. In the case of a down server there are a lot of urgent tickets.

Posted by JaJae, 01-21-2013, 06:12 PM
Quote:
Originally Posted by Hostinpk-CEO
Jag@
I am back after 3 days but you have not done anything for me. You have neither fixed the cPanel problem nor started the restore again. Seems that you think your responsibility only to explain how things happened and nothing else.
I request you once again to work on my ticket and fix the issues.
I seem to remember the last time you had a problem with JaguarPC you came here complaining about extended downtime. A problem that could easily have been resolved if you had your own backups separate from your hosting provider. You were advised then to make your own backups. And now here we are again: as a so-called "CEO" of a web hosting company, it "seems that you think your responsibility only to" pay the bills and publicly lambast your provider for hardware problems.

Posted by Suldominios, 01-21-2013, 08:20 PM
Quote:
Originally Posted by JaJae
I seem to remember the last time you had a problem with JaguarPC you came here complaining about extended downtime. A problem that could easily have been resolved if you had your own backups separate from your hosting provider. You were advised then to make your own backups. And now here we are again: as a so-called "CEO" of a web hosting company, it "seems that you think your responsibility only to" pay the bills and publicly lambast your provider for hardware problems.
So you are saying that we must keep fully updated backups of our sites and customer accounts to move ourselves to another provider in case of this kind of issue?

Posted by ModelWebHost, 01-22-2013, 12:47 PM
Quote:
Originally Posted by Zachary McClung
The restore has already been started for the remainder of your VPS and an admin is working to fix your cpanel as I type this.
Your dirty joke has not yet been finished. It's the twenty-second day and the problem is still not solved. Not even a single account has been restored properly and nothing is working. Last night some sites were working, but since then the server has been down and I once again have to face a fake promise from Chris.
Chris said that he would solve the issue and restore again, but since then the sites have stopped working and nothing has changed.
I am extremely tired of you people. I suggest you improve your server structure instead of making offers in the VPS sections.

Posted by Zachary McClung, 01-22-2013, 01:01 PM
Quote:
Originally Posted by Hostinpk-CEO
Your dirty joke has not yet been finished. It's the twenty-second day and the problem is still not solved. Not even a single account has been restored properly and nothing is working. Last night some sites were working, but since then the server has been down and I once again have to face a fake promise from Chris.
Chris said that he would solve the issue and restore again, but since then the sites have stopped working and nothing has changed.
I am extremely tired of you people. I suggest you improve your server structure instead of making offers in the VPS sections.
There is no dirty joke here. You asked us to stop your restore last week, mid-restore, against our recommendation. We could see the VE was corrupt and the cPanel files were still junk. Once it was stopped, the remaining VEs were restored. Your second run through the restore is currently at 15%. No one is feeding you lies.

Posted by ModelWebHost, 01-22-2013, 01:06 PM
Quote:
Originally Posted by Zachary McClung
There is no dirty joke here. You asked us to stop your restore last week, mid-restore, against our recommendation. We could see the VE was corrupt and the cPanel files were still junk. Once it was stopped, the remaining VEs were restored. Your second run through the restore is currently at 15%. No one is feeding you lies.
I already pointed out earlier in this thread that I asked you to stop the restore, but did you do it? No, absolutely not, not even after 3 days. Check the ticket for more details.
Once again, I am going to have to wait for a month. 24 hours have passed and the restore is at 15%; at this rate, I have to wait almost 7 more days.
This is an absolutely painful situation for me. However, can you provide me the login for the accounts which have been restored so far? (Provide the new server login in the ticket.)
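
The estimate above can be checked with a small sketch. It takes the two figures from the post at face value (15% done after 24 hours) and assumes the restore keeps progressing at a constant rate, which nothing in this thread confirms; restores with many small files often do not progress linearly. The function name `remaining_hours` is hypothetical.

```python
def remaining_hours(percent_done, hours_elapsed):
    """Linear extrapolation: hours left if the restore keeps its average rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    rate = percent_done / hours_elapsed  # percent per hour so far
    return (100 - percent_done) / rate


if __name__ == "__main__":
    left = remaining_hours(15, 24)
    print(f"{left:.0f} hours left, about {left / 24:.1f} days")
```

Under that assumption the remaining time comes to roughly 136 hours, a little under 6 days, with the full run taking a little under 7 days from its start.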

Posted by Zachary McClung, 01-22-2013, 01:24 PM
Quote:
Originally Posted by Hostinpk-CEO
I already pointed out earlier in this thread that I asked you to stop the restore, but did you do it? No, absolutely not, not even after 3 days. Check the ticket for more details.
Once again, I am going to have to wait for a month. 24 hours have passed and the restore is at 15%; at this rate, I have to wait almost 7 more days.
This is an absolutely painful situation for me. However, can you provide me the login for the accounts which have been restored so far? (Provide the new server login in the ticket.)
The logins have not changed; it is a copy of what was on the server. There is still cPanel corruption which is preventing the logins. An admin is working on fixing the corruption and will update your ticket once completed.
