Stream down once again, getting "cluster host connection" error

Once again, our server has gone down and will not come back up. This time I get the error "Unable to access account for radiofreebk: Cluster host connection failure for us1: Connection timed out (110)" when I try to start the server. Even when I went to check/verify our settings, I get the error "We apologize for the inconvenience, but an internal error has occurred. Please try again later."

This pattern of downtime is troubling. Please assist and let me know a) why this is happening and b) what you are doing to prevent such failures in the future. Thanks.

Tom Tenney
Author, 'The DIY Internet Radio Cookbook'
 
Hi Tom

We're sorry about the problem. There's a fault at the Dallas data centre right now and it's being investigated. Our uptime is generally very good (http://stats.internet-radio.com/), but unfortunately these odd occasions of downtime do happen and we apologise for any inconvenience. Please bear with us while we wait for a resolution from our network providers.

In fact, it's back right now.
 
It turns out it was a DoS attack against the Dallas data centre which caused the short outage. Apologies once again.
 
I'm continuing to have issues. My stream monitor reported several stream outages overnight, and currently the stream is playing, but in the Centova panel, I'm getting the error

"Station is offline (Unable to access account for radiofreebk: failsend|auto|tcok|reconnexc:Socket error during command transmission (failsend|noauto))"

at the top of the page intermittently, and it's registering 0 listeners, which I know is incorrect (I'm quite sure, for example, that I'm listening.) The stream also does seem to be skipping and rebuffering a lot more than is normal.

My frustration with this service level is growing daily. Is there a SLA associated with these accounts? If so, I'd like to be emailed a copy. In the meantime, it's critical that these issues are fixed. Thank you.
 
We're sorry to say there have been more DDoS attacks today. We haven't known an outage like this in over 5 years. We appreciate your frustration and hope you can bear with us while we wait for our providers to sort the issue.

We don't have an SLA, no, but we've added some extra bandwidth to your account for free. Apologies once again.
 
Not back for me. Still getting: Station is offline (Unable to access account for radiofreebk: Cluster host authentication failure for us1: authenticated v=1.0.0&chroot=0&path=/usr/local/centovacast)

and cannot connect from the studio
 
Good. We can't apologise enough. DDoS attacks are notoriously difficult to deal with. We're really hoping our providers can find a permanent solution to this, as it's most unlike them. We've never known an issue go on for this long. Usually problems are sorted within minutes, not days!
 
Yes. Your server is on us1.internet-radio.com, which is hosted in their main Dallas DC, and that's unfortunately the one that has been hammered the worst. All the other DCs have been targeted at some point, including Atlanta, London and Frankfurt, although not as badly as the Dallas DC, which also hosts most of Linode's main infrastructure and our main US server. We specifically went with Linode over other cheaper hosts because of their reliability, which had been great so far. It's a real shame and we're interested to hear more details from them when things settle down. I imagine they have been on code red for the last 5 days, as have we. What a nightmare!
 
Hi folks, we are also having major problems today. Working now but has been intermittent all day, seems to come and go. Do you know what the problem is?
 
Same problem for me over the last few days, losing connection regularly. Very disappointing, especially for the listeners who I have slowly been winning. Guess I will have to start collecting them again :)
 
Now I'm having this issue where stream is up, but everything in Centova seems to be non-functional. See attached screen shot.

I agree with the above poster that this is extremely frustrating when you're new and trying to attract listeners. There's only so much wonkiness they'll take before they just don't come back. It's especially frustrating for us at this time of year when we're trying to get last minute year-end donations for our network. I don't see someone donating if they can't even connect to the stream.

It seems like there should be some kind of fall-back strategy in place for situations like this.
 

Attachments

  • Centova Cast.png (126.1 KB)

We have just received some more information on the problems so we have posted a news thread here.

I am having problems today, stream keeps going down and disconnecting users!! Any help?

We're really sorry about the issues. Most servers have been up and down these past few days. The UK ones were particularly bad yesterday. They all seem to have been back to normal since about 7pm yesterday. We hope this is the end of it.

Hi folks, we are also having major problems today. Working now but has been intermittent all day, seems to come and go. Do you know what the problem is?

Yes, see above for more info and please accept our apologies for the issues. It's most unusual.

Same problem for me over the last few days, losing connection regularly. Very disappointing, especially for the listeners who I have slowly been winning. Guess I will have to start collecting them again :)

We can totally appreciate your frustration at disappointing listeners, and now that things seem back to normal you should see them returning.

Now I'm having this issue where stream is up, but everything in Centova seems to be non-functional. See attached screen shot.

I agree with the above poster that this is extremely frustrating when you're new and trying to attract listeners. There's only so much wonkiness they'll take before they just don't come back. It's especially frustrating for us at this time of year when we're trying to get last minute year-end donations for our network. I don't see someone donating if they can't even connect to the stream.

It seems like there should be some kind of fall-back strategy in place for situations like this.

The Centova Cast issue seems fine now and was the same issue as you had before. We didn't restart it this time and it seemed to correct itself. It was probably related to packet loss between the web control panel and the server's control daemon. As the network came back online it must have regained the connection. Sorry about that.

Yes, it is frustrating indeed and we hope to see your listener figures returning to normal as the network stabilises. As for a fallback strategy, unfortunately it would be impossible to implement redundant servers for everyone and maintain the same price point. The announcement by our upstream providers mentions improved infrastructure being added to help mitigate this, so we're confident that this level of downtime should be avoidable in the future. They have an excellent reputation bar this issue, which is why we chose to move to them a couple of years ago. Their network is more expensive but superior to our last network and to those of many of our competitors.

We also purposefully spread our servers amongst multiple DCs and countries to help avoid any complete downtime, but in this case it didn't work because every country's DC (from our provider) was targeted. We could move radio servers between countries / DCs, but the IP would change and possibly the port, which would be inconvenient for most stations as they have promoted that URL in many directories / apps / websites etc., and changing it would be very difficult. Besides, it wouldn't have helped in this case anyway because most DCs were targeted simultaneously. We've been giving this some thought though, and we'll certainly see what we can come up with to help avoid or limit this in the future. Apologies once again.
 