Fun with Remote Servers
You may or may not have noticed, but this site was down on and off from late Sunday evening until right before I wrote this. I'm not exactly sure what happened, but I have some sense of it, so I thought I'd jot it down here. I also cover the hardware that runs this system as well as a little about the hosting company we are using. If you are interested, read the rest of the story.
The Physical Server
The physical server that runs this site is a 1.8GHz dual-core AMD with 2GB of RAM and three 250GB hard drives (a software RAID 1 with a spare). Two longtime Atari buddies of mine (Joe in Connecticut and Rob in Massachusetts) and I went in together and bought the parts to build the server. I only chipped in $200, and Rob and Joe footed the rest of the bill. All of the parts were ordered by Rob and shipped to him... and he put the system together. Rob got a basic Linux install going and got the system on the Internet. I ssh'ed in and did a remote, medialess install of CentOS 4.3 with the GUI installer piped to me via VNC. I rebooted, did all of the updates... and then installed OpenVZ. Shortly after that CentOS 4.4 came out and I upgraded the host along with all of the VPSes. Then Rob shipped it to the hosting company.
The Hosting Company
The hosting company that provides the physical server with its connection to the Internet is ColoPronto which is a division of ServerPronto... which I believe is owned by InfoLink. We actually have a small block of IP addresses and are currently running the Host OS and seven VPSes each with their own IP address.
I believe the physical complex where the server is located is in Florida somewhere. While I wish I could have the server within 10 miles of where I live so I could actually have physical access to it... since I don't pay for the connection... which would certainly be several times what Rob is paying... I can't really complain. I believe with the high speed connection and the extra IP addresses, the bill is about $20.93 a month... which is barely higher than dialup here in Montana.
The Problem Begins
The system and connection had been fairly stable until about 10 minutes before New Year's Day (Sunday night). I noticed the server was down (Warren sent me an IM about it too) and did a traceroute to it. The traceroute timed out at the hop just above the machine... so I submitted a trouble ticket via the hosting provider's online customer support site. About 7:30 AM (Monday) I noticed the server was back up. Yeah! I checked the trouble ticket system and didn't see a response.
Then on Monday night I noticed the system had gotten rebooted... and a trouble ticket response. They said they had rebooted the system (even though it had come back up several hours before) and that during the boot Kudzu had come up with a screen saying it had detected a new network card. They took it upon themselves to configure this new NIC and gave it the IP address assigned to the Host OS.
I could still access the server but something was weird. I did an ifconfig and it showed two network interface cards (eth0 and eth1), both configured with the same IP address. ARGH! Rob thought he had disabled the second NIC in the BIOS of the machine... but perhaps that got reset somehow... OR perhaps a newer Linux kernel could see the card despite it being turned off in the BIOS. Whatever the case, the hosting provider tech support person ended up with both interfaces configured although only one had a network cable plugged into it.
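A quick way to spot this kind of duplicate without eyeballing the ifconfig output is sketched below. It is only an illustration: the function name find_duplicate_addrs is made up for this example, and it expects "interface address" pairs on standard input rather than parsing ifconfig output directly.

```shell
# Hypothetical helper: print every IPv4 address that is assigned to more
# than one interface. Input is "interface address" pairs, one per line.
find_duplicate_addrs() {
    awk '
        {
            # Collect the interface names seen for each address.
            if ($2 in ifaces) ifaces[$2] = ifaces[$2] " " $1
            else ifaces[$2] = $1
            count[$2]++
        }
        END {
            # Report only the addresses used by two or more interfaces.
            for (addr in count)
                if (count[addr] > 1)
                    print addr ": " ifaces[addr]
        }
    '
}

# Possible usage, assuming the iproute "ip" tool is installed:
#   ip -o -4 addr show | awk '{ sub(/\/.*/, "", $4); print $2, $4 }' | find_duplicate_addrs
```

On a healthy box this should print nothing; in the situation described above it would list the Host OS address followed by eth0 and eth1.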
It is not a good thing when two NICs have the same address... and we started running into connectivity issues. I submitted another trouble ticket explaining that we did NOT want eth1 set up... and asking them to check which port actually had the network cable plugged in. It seems the technician who had configured eth1 had also moved the network cable.
The VPSes were trying to go through eth0, so they had lost connectivity. Since I could still get to the Host OS, I manually restarted each VPS (vzctl restart VPID) and they started working again, as that caused them to start routing through eth1.
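With seven VPSes, typing that command once per VPS gets old. The loop below is a rough sketch of doing it in one go; the function name restart_all_vps and the VZCTL override are inventions for this example, and it assumes vzlist can print bare IDs with -H -o veid (the field name may differ between OpenVZ releases).

```shell
# Hypothetical helper: restart every VPS whose ID arrives on standard
# input, one ID per line. Set VZCTL=echo to do a dry run that only
# prints the commands.
restart_all_vps() {
    while read -r vpsid; do
        # Skip blank lines.
        [ -n "$vpsid" ] || continue
        # Keep vzctl from swallowing the remaining IDs on stdin.
        ${VZCTL:-vzctl} restart "$vpsid" </dev/null
    done
}

# Possible usage on the Host OS (assumed vzlist options):
#   vzlist -H -o veid | restart_all_vps
```

Running it with VZCTL=echo first is a cheap way to check the list of IDs before anything actually gets restarted.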
Here is what I wrote in the ticket:
Please review the following steps and see if they make sense to you. If they do, please follow them and report any success. :)
- Log in to the console as root
- Move the network cable from the current port (eth1) to the other port (eth0)
- Restart networking
  service network restart
- Manually take down eth1
  ifconfig eth1 down
- Remove the config for eth1
- Restart networking again
  service network restart
- Verify that eth0 is configured and up and that eth1 does not show up
- Verify connectivity with the outside world
  ping something outside
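For what it's worth, the software half of those steps (everything after the cable has been moved) could be wrapped up roughly as follows. Treat this as a sketch under assumptions, not what the technician actually ran: the function name fix_network, the RUN, CFG and BACKUP variables, and the ifcfg-eth1 path (the usual CentOS location, which the ticket did not spell out) are all my own. It moves the config aside instead of deleting it, and it is meant to be run from the console, since restarting networking over ssh can cut you off.

```shell
# Hypothetical helper: the console-side steps from the ticket.
# Usage: fix_network OUTSIDE_HOST
# Set RUN=echo to print the commands instead of executing them.
fix_network() {
    target=${1:?usage: fix_network OUTSIDE_HOST}
    run=${RUN:-}
    # Assumed CentOS path for the eth1 config; not named in the ticket.
    cfg=${CFG:-/etc/sysconfig/network-scripts/ifcfg-eth1}
    # Keep a copy outside network-scripts so the network service ignores it.
    backup=${BACKUP:-/root/ifcfg-eth1.removed}

    $run service network restart    # restart networking
    $run ifconfig eth1 down         # manually take down eth1
    $run mv "$cfg" "$backup"        # remove the config for eth1
    $run service network restart    # restart networking again
    $run ifconfig                   # eth0 should be listed, eth1 should not
    $run ping -c 3 "$target"        # verify outside connectivity
}
```

A dry run such as RUN=echo fix_network example.com prints the six commands in order without touching the network.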
Working for Good?
I got a response back from them that said they had followed my instructions and that it appeared to be working again.
I ssh'ed in again, saw that eth0 was indeed set up and working, and that eth1 was no longer configured. I had to restart the VPSes again so they would switch from eth1 back to eth0... but I was expecting that. I guess I could reboot the machine and make sure it comes up as expected, but I'm just not in the mood for it quite yet.
Let's hope the system stays up for a long time to come... knock on wood.