
Thread: Facebook Outage

  1. #16
    breath westofyou's Avatar
    Join Date
    Oct 2000
    Location
    PDX
    Posts
    57,143

    Re: Facebook Outage

    What a nightmare, couldn't think of a company that deserved it more


  3. #17
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Facebook Outage

    Quote Originally Posted by Roy Tucker View Post
    Being in this business (albeit much smaller scope), I was curious as to how this can happen to someone like Facebook. I know that we have a very rigorous process for promoting changes up to our production environment and woe be to you if you don't follow the process. But we are all imperfect carbon based units and people skip steps and think "I can just make this one change and it won't affect anything". Believe me, it's easy to do and I've gotten caught once on it (and once is all the chances you get in our company, I've seen people walked out that day over it). It has to be taken very seriously.

    It was a Border Gateway Protocol (BGP) issue. BGP is how the networks that make up the internet tell one another which addresses they can reach. A very telling statement (found in the Facebook link below) was "During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally."

    Haha. Oh jeez. Holy moly. Oh god. My worst nightmare. There are some production changes that I have made in the past where I realize that this is a Big Change and I damn well better be scrupulously 1000% correct and really know WTF I'm doing. Like I tell our junior engineers and developers, with great power comes great responsibility. But like I said, all of our changes are made in lower environments and thoroughly tested before being promoted up. So for those big changes, I *know* what the impact is. But in this case, it was a Facebook network engineer (I assume ex-Facebook now) who went rogue and ran this command out of band without completely understanding the ramifications. And man did they screw up big time. Human error.

    So once the routes to Facebook's DNS servers are withdrawn from the internet backbone routers via BGP, you're in really deep yogurt. Your domain names stop resolving and that withdrawal is propagated out everywhere. And I mean *everywhere*. You are effed big time. You can't even access your own systems via VPN since your service provider doesn't know Facebook exists, so you have to drive into the data center to get on machines there, and everyone is running around with their hair on fire. We have had one catastrophic outage at my company and it was no fun at all. At least with us, we routinely practice our DR exercises so we knew what to do. But stuff hits the fan big time.
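
    For anyone who wants to see why a withdrawn route takes name resolution down with it, here is a toy sketch in Python. It is not how real BGP routers or Facebook's network work: the RoutingTable class, the resolve helper, the 192.0.2.0/24 documentation prefix, and the example.com name are all made up for illustration. It only shows that once the prefix covering a DNS server disappears from the table, lookups fail even though the server itself is still running.

```python
import ipaddress


class RoutingTable:
    """Toy routing table: maps announced prefixes to a next-hop label."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        # A neighbor tells us it can reach this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # The neighbor says it can no longer reach this prefix.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Longest-prefix match; returns None when no route covers the address.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def resolve(table, name, dns_server_ip, records):
    """Hypothetical resolver: it can only answer if the DNS server is routable."""
    if table.lookup(dns_server_ip) is None:
        return None  # no route to the DNS server, so the name cannot be resolved
    return records.get(name)


def demo():
    table = RoutingTable()
    table.announce("192.0.2.0/24", "example-backbone")  # documentation prefix
    records = {"www.example.com": "192.0.2.10"}

    before = resolve(table, "www.example.com", "192.0.2.53", records)
    table.withdraw("192.0.2.0/24")  # the "oops" moment
    after = resolve(table, "www.example.com", "192.0.2.53", records)
    return before, after


if __name__ == "__main__":
    print(demo())
```

    The records dictionary never changes between the two lookups. Only the route goes away, which is the point of the sketch.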

    So, tl;dr, human error. Details below, internally from Facebook and externally from Cloudflare. Interesting reads.

    https://engineering.fb.com/2021/10/0...utage-details/

    https://blog.cloudflare.com/october-...cebook-outage/
    Very interesting read, thank you.

    And I know what you mean about changes. About twenty years ago something similar happened, and I can't even mention the business because of confidentiality, but this was a business that had been running for 150 years with only two interruptions during that time. I was working on a project, and another guy was making changes unrelated to what I was working on, and it was on a Friday afternoon. Of course, protocol demands that every change, no matter how trivial, even if it is as small as changing a font, requires a testing plan.

    Well, he was from out of town and everyone else was tired, and he made one last change and said it was really nothing and it really didn't need to go through the verification process that takes a couple of hours. The IT department agreed, and he got on a plane. While he was in the air heading back to the west coast, they found that the entire plant was down because the change he made affected the core process.

    The head of IT and about a dozen others were fired before eight the next morning for skipping the test plan. I don't care how smart someone thinks they are, you gotta verify.

  4. #18
    breath westofyou's Avatar
    Join Date
    Oct 2000
    Location
    PDX
    Posts
    57,143

    Re: Facebook Outage

    Test, test, test and try not to have a deploy on a Friday, it will ruin your weekend

  5. #19
    Be the ball Roy Tucker's Avatar
    Join Date
    May 2001
    Location
    Mason, OH
    Posts
    18,373

    Re: Facebook Outage

    Quote Originally Posted by westofyou View Post
    Test, test, test and try not to have a deploy on a Friday, it will ruin your weekend
    I wish. Our PROD change windows are Friday night after 10 pm and Sunday morning 6-10 am. And I’m in IT security so we have our own changes and are involved in a lot of other groups’ changes. I’ve had a lot of lost weekends. Comes with the territory.
    She used to wake me up with coffee ever morning

  6. #20
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Facebook Outage

    Quote Originally Posted by Roy Tucker View Post
    I wish. Our PROD change windows are Friday night after 10 pm and Sunday morning 6-10 am. And I’m in IT security so we have our own changes and are involved in a lot of other groups’ changes. I’ve had a lot of lost weekends. Comes with the territory.
    Yeah, especially in production environments. I don't know how many projects I've done that are scheduled around holidays and weekends.

  7. #21
    breath westofyou's Avatar
    Join Date
    Oct 2000
    Location
    PDX
    Posts
    57,143

    Re: Facebook Outage

    Quote Originally Posted by Roy Tucker View Post
    I wish. Our PROD change windows are Friday night after 10 pm and Sunday morning 6-10 am. And I’m in IT security so we have our own changes and are involved in a lot of other groups’ changes. I’ve had a lot of lost weekends. Comes with the territory.
    Yeah, we used to have Friday night 11:30 deployments that I'd have to test, not a fan for sure.

  8. #22
    Winning is fun. RiverRat13's Avatar
    Join Date
    Feb 2010
    Posts
    1,947

    Re: Facebook Outage

    Quote Originally Posted by westofyou View Post
    What a nightmare, couldn't think of a company that deserved it more
    The St. Louis Cardinals.

  9. Likes:

    Bob Sheed (10-08-2021),Roy Tucker (10-08-2021)




RedsZone.com is a privately owned website and is not affiliated with the Cincinnati Reds or Major League Baseball

