THR is Down.

thegunguy

Administrator
Staff member
Current Status: Down, and in the process of dragging all of our data to a new server. Luckily, it's going to a server that's already built, running, and configured for the forum software we use.

Timeline: Right now I'm guessing 3 hours. I'm writing this at 1:36pm EST.

How did we get here? Honestly, I'm not sure. I think I've been misdiagnosing the problem the whole time.

Here's the timeline:
  1. Our firewall gave out. The magic smoke didn't escape, but it stopped responding. I rebooted it, and now it simply won't power up.
  2. I switched to another brand of firewall that I install for clients. Its configuration scheme is a bit more difficult than I'm used to, but it was supposed to work. It didn't. I called an expert and was told that this brand should work in this role, but that I needed different hardware.
  3. At the same time as (2): I host THR on a cluster of 3 machines, set up to restart THR on another machine if the first one fails. THR ended up on a machine that wouldn't let it respond at all. Moving it to another server fixed this.
  4. I tried moving back to the firewall I used before the one we'd been on for the last year. That kind of worked, but I saw lots of slowness and signs of memory errors, so I pulled it back out.
  5. That was the end of my planned firewall replacements, so I dug into a pile and grabbed a unit a customer had outgrown, one that, according to the advice I received in (2), should work.
  6. It didn't. After hours and hours, I gave up and ordered a newer, faster model.
  7. Installed that newer model yesterday. Guess what? Didn't work. Another few hours lost.
  8. Now we're on the failed older firewall I mentioned in (4). After replacing the battery and some simple maintenance it seems to be working like a champ. Yay!
  9. All my sites are now up and responding. Except THR. And the firewall is doing everything I've asked of it, so I'm thinking all that time spent on this list so far was wasted.
  10. So, now I'm moving THR to simple, bare-metal hardware. This should work. (I'm praying it works.)

So what's the original problem? I don't know. A weird power fluctuation that killed one power supply and made another server start acting a bit crazy? Maybe, but that doesn't explain why THR's server won't respond to web requests it was answering yesterday.

So, I'm trying to fix it rather than figure out what happened.

Progress (Red is in progress, grey is done, black is still to do):
  1. Dump THR's database to disk
  2. Update the code-base on the "new" server
  3. Build all the config files for THR.
  4. Wait for the database file to compress, because even over a gig network it's slow.
  5. Rebuild the database locally.
  6. Copy over all the old files. This is the slow part: hundreds of thousands of files don't copy quickly. (Like I said, this was the slow part... well, assuming nothing else unexpected pops up.)
  7. Configure settings on THR to make sure everything works.
  8. Test locally.
  9. Change the Cloudflare settings so you see THR instead of a domain I never use. (Yes, if you dig around on that site, you can see a photo of my grandmom circa WWII.)
  10. BACK EVERYTHING UP.
  11. Tweak, tweak, tweak. Gotta make the new server FAST.
 
Last edited:

Armorer 101

New member
I fought Chinese and Russian government hackers for years. Best of luck, and thank you for the constant, so often unappreciated, long hours it takes to keep a site like this up and running.
 
Top