

Border Gateway Protocol (BGP): Out of Sight, Until Things Go Horribly Wrong


Intro #

While working on some networking code to revive a project I had started in 2021, I decided to pick up Network Programming with Go by Adam Woodbeck. Re-learning basic networking concepts was refreshing. Adam also offers useful insights into major incidents caused by network outages over the years; we’ll see one of them early in this article. I recalled a few of these outages from old news stories. Network failures are common. In fact, one of the best-known fallacies of distributed computing is that “the network is reliable”. But what caught my eye was a passage he wrote about the Border Gateway Protocol. In his words, BGP usually makes news only when something goes wrong. I decided to dig into this, and the article you are reading now is the result of that digging.

A Revolution #

Since the now-ubiquitous “three-napkin protocol” was first introduced in 1989 in RFC 1105, it has become a pillar of the modern connected world. Its fourth version, BGP-4, was first published in 1994 (RFC 1654) and added support for CIDR and route aggregation. BGP-4 went through a few more revisions before its current specification, RFC 4271, was published in 2006. Along the way, multiprotocol extensions allowed it to carry IPv6 routes as well.

This is what Cisco has to say about the Border Gateway Protocol:

Border Gateway Protocol (BGP) is an Internet Engineering Task Force (IETF) standard, and the most scalable of all routing protocols. BGP is the routing protocol of the global Internet, as well as for Service Provider private networks. BGP has expanded upon its original purpose of carrying Internet reachability information, and can now carry routes for Multicast, IPv6, VPNs, and a variety of other data.

But the history of BGP hasn’t been all hunky-dory. As with every attempt to stabilize “The Network”, it has had its fair share of ups and downs. Because of incidents usually born of human error, BGP has often had to bear the blame for massive lapses in the internet’s dependability.

Let’s look at a couple of these.

YouTube is Down (2008) #

Less than two years after Google bought YouTube, while countries were still coming to terms with YouTube’s unbridled power of transparency in the hands of their citizens, an incident rocked the video giant.

In 2008, Pakistan Telecommunications Company effectively took down YouTube worldwide after the Pakistani Ministry of Communications demanded the country block youtube.com in protest of a YouTube video. Pakistan Telecom used BGP to send all requests destined for YouTube to a null route, a route that drops all data without notification to the sender. But Pakistan Telecom accidentally leaked its BGP route to the world instead of restricting it to the country. Other ISPs trusted the update and null routed YouTube requests from their clients, making youtube.com inaccessible for two hours all over the world. - Adam Woodbeck in Network Programming with Go

Presumably, this was meant to be an internal BGP (iBGP) update, contained within the autonomous system managed by PTCL (Pakistan Telecommunication Company Limited), Pakistan’s largest ISP, which instead published the bad route to the global routing table. Whatever the actual cause of the misconfiguration, what was ultimately compromised was trust.

Later that year, at DEF CON 16, two security researchers demonstrated how anyone with a BGP router could intercept traffic destined for a particular IP address.

Facebook is Down (2021) #

This was while the world was still recovering from the COVID mess, and a major portion of internet users in the developing world were increasingly dependent on Free Basics from Facebook.

On a Monday morning in October 2021, one of Facebook’s engineering teams executed a command to assess the availability of their global backbone capacity. This small command took down the mighty Facebook, WhatsApp, Messenger, Instagram, Oculus and Mapillary for about 7 hours. It also prevented users from using the “Log in with Facebook” feature, which depends on OAuth redirects and API calls to Facebook’s servers. The engineering team never revealed what the actual command was, or the exact bug that kept their audit tool from stopping it, but they did publish a post-outage article explaining to the layman what happened and why. Here’s an excerpt:

One of the jobs performed by our smaller facilities is to respond to DNS queries. DNS is the address book of the internet, enabling the simple web names we type into browsers to be translated into specific server IP addresses. Those translation queries are answered by our authoritative name servers that occupy well known IP addresses themselves, which in turn are advertised to the rest of the internet via another protocol called the border gateway protocol (BGP). To ensure reliable operation, our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection. In the recent outage the entire backbone was removed from operation, making these locations declare themselves unhealthy and withdraw those BGP advertisements. The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers.

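Note the sentence about DNS servers disabling their BGP advertisements when they cannot speak to the data centers. Perhaps hardened by past cases of BGP mistrust, Facebook’s engineers had designed their DNS servers to withdraw those advertisements whenever a location looked unhealthy. That is a sensible approach on its own, but it misfired when the entire backbone went down at once: every location declared itself unhealthy, and Facebook’s name servers vanished from the internet.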
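To see why a sensible rule misfired, here is a small Go sketch of the health-check logic the excerpt describes. Everything in it is hypothetical (the dnsSite and backbone types, the advertising helper, the site names); Facebook has not published its implementation. It simply models “keep advertising only while the data centers are reachable” and compares a single broken site with a backbone-wide failure.

```go
package main

import "fmt"

// dnsSite is a hypothetical edge location running authoritative DNS servers.
type dnsSite struct {
	name   string
	linkUp bool // the site's own connection into the backbone
}

// backbone is a hypothetical stand-in for the shared global backbone.
type backbone struct {
	up bool
}

// reachesDataCenters is the health check: a site can reach the data centers
// only if both its own link and the shared backbone are up.
func (b backbone) reachesDataCenters(s dnsSite) bool {
	return b.up && s.linkUp
}

// advertising returns the sites that keep announcing the DNS prefix over BGP.
// A site that fails the health check withdraws its announcement.
func advertising(b backbone, sites []dnsSite) []string {
	var names []string
	for _, s := range sites {
		if b.reachesDataCenters(s) {
			names = append(names, s.name)
		}
	}
	return names
}

func main() {
	sites := []dnsSite{
		{"edge-a", true},
		{"edge-b", false}, // this site's own link is broken
		{"edge-c", true},
	}
	fmt.Println("one site unhealthy:", advertising(backbone{up: true}, sites))
	fmt.Println("backbone down:     ", advertising(backbone{up: false}, sites))
}
```

In the first case only the broken site withdraws and the others keep answering queries, which is the intended behavior. In the second, every site fails the same check at once, and nobody is left advertising the name servers.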

There are many more incidents worth a brief look, but we won’t go into them here for the sake of brevity. For a well-documented list of BGP-related incidents, take a look at Doug Madory’s “A Brief History of the Internet’s Biggest BGP Incidents”.

Let’s take a moment to see how BGP actually works and where trust comes into play.

How BGP Works (Usually!) #

The diagram below shows how ISPs (Internet Service Providers) advertise their routes. All three clouds in the diagram are autonomous systems (AS).

How ISPs Advertise All BGP Routes from Cisco Press

The diagram above is easier to understand alongside the BGP peering topology below:

A Sample BGP Peering Network with Peer Sessions from Juniper Networks

This topology shows a network with BGP peer sessions. In the sample network, Device E in AS 17 has BGP peer sessions to a group of peers called external-peers. Peers A, B, and C reside in AS 22 and have IP addresses 10.10.10.2, 10.10.10.6, and 10.10.10.10. Peer D resides in AS 79, at IP address 10.21.7.2. Juniper’s documentation uses this topology to walk through the configuration of Device E.
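Whether a session is external or internal comes down to AS numbers: a neighbor in a different AS is an eBGP peer, and one in the same AS is an iBGP peer. The Go sketch below encodes Device E’s neighbors from the Juniper example. The peer type and sessionType helper are illustrative, and router F is a hypothetical internal router added only for contrast.

```go
package main

import "fmt"

// peer describes one BGP neighbor as seen from Device E.
type peer struct {
	name string
	addr string
	as   uint32
}

// sessionType returns "eBGP" when the neighbor lives in a different AS and
// "iBGP" when it shares the local AS.
func sessionType(localAS uint32, p peer) string {
	if p.as == localAS {
		return "iBGP"
	}
	return "eBGP"
}

func main() {
	const localAS = 17 // Device E sits in AS 17
	neighbors := []peer{
		{"A", "10.10.10.2", 22},
		{"B", "10.10.10.6", 22},
		{"C", "10.10.10.10", 22},
		{"D", "10.21.7.2", 79},
		{"F", "192.0.2.1", 17}, // hypothetical router inside AS 17, not part of the Juniper example
	}
	for _, p := range neighbors {
		fmt.Printf("Device E -> %s (%s, AS %d): %s\n", p.name, p.addr, p.as, sessionType(localAS, p))
	}
}
```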

A BGP peering session itself looks like this:

A BGP Peering Session from Juniper Networks

Router A is a gateway router for AS 3, and Router B is a gateway router for AS 10. For traffic internal to either AS, an interior gateway protocol (IGP) is used (OSPF, for instance). To exchange routes between peer ASs, the gateway routers run a BGP session.
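Once routes cross AS boundaries, each AS records the path a route has taken. When an AS re-advertises a route over eBGP, it prepends its own AS number to the AS_PATH attribute, and it ignores any route whose AS_PATH already contains its own number, which is how BGP avoids loops. The Go sketch below floods one announcement across a small hypothetical topology (AS numbers taken from the 64496-64511 documentation range) and keeps the shortest AS_PATH at each AS. It is a simplification: real BGP best-path selection compares several attributes, such as local preference, before it looks at path length.

```go
package main

import "fmt"

// contains reports whether as already appears in path.
func contains(path []uint32, as uint32) bool {
	for _, x := range path {
		if x == as {
			return true
		}
	}
	return false
}

// propagate floods one prefix announcement from origin across the AS graph.
// Each AS prepends its own number before re-advertising, a neighbor rejects
// any path that already contains its own number (loop prevention), and each
// AS keeps the shortest AS_PATH it has heard.
func propagate(links map[uint32][]uint32, origin uint32) map[uint32][]uint32 {
	best := map[uint32][]uint32{origin: {}} // the origin reaches itself with an empty path
	queue := []uint32{origin}
	for len(queue) > 0 {
		as := queue[0]
		queue = queue[1:]
		// The path this AS advertises: itself prepended to what it learned.
		advertised := append([]uint32{as}, best[as]...)
		for _, nbr := range links[as] {
			if contains(advertised, nbr) {
				continue // the neighbor would see its own AS: a loop, so reject
			}
			if cur, ok := best[nbr]; !ok || len(advertised) < len(cur) {
				best[nbr] = advertised
				queue = append(queue, nbr)
			}
		}
	}
	return best
}

func main() {
	// Hypothetical topology using documentation AS numbers.
	links := map[uint32][]uint32{}
	connect := func(a, b uint32) {
		links[a] = append(links[a], b)
		links[b] = append(links[b], a)
	}
	connect(64500, 64501)
	connect(64500, 64502)
	connect(64501, 64503)
	connect(64502, 64503)
	connect(64503, 64504)

	paths := propagate(links, 64500)
	for _, as := range []uint32{64501, 64502, 64503, 64504} {
		fmt.Printf("AS %d reaches the prefix via AS_PATH %v\n", as, paths[as])
	}
}
```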

Within an AS, routers use internal BGP (iBGP) sessions to share the routes they have learned with each other. To communicate with peers in other ASs, they use external BGP (eBGP).

eBGP and iBGP from Cisco Press

Because BGP routes published over eBGP are often accepted as-is by other autonomous systems, each AS benefits from its peers’ routing data but also becomes vulnerable to both their malicious intent and their mistakes. This is why trust is so critical.

The Stellar Performance of BGP Routing Tables #

The tables below show how the IPv4 and IPv6 BGP tables have grown over five years. While there is a ton of insight we could derive from this data, let’s focus on the growth in the number of prefixes in each version of the Internet Protocol.

IPv4 #

Table 1: Growth of the IPv4 BGP Network between 2017 and 2021 from APNIC

IPv6 #

Table 2: Growth of the IPv6 BGP Network between 2017 and 2021 from APNIC

Matching traffic against a table of almost one million IPv4 routes takes serious performance optimization. Route aggregation, which combines several contiguous prefixes into a single shorter one, helps keep BGP tables from growing even larger. More information on how route aggregation works can be found in Cisco’s documentation.

Mitigating DDoS Attacks With a BGP Blackhole #

A DDoS attack attempts to flood a target with a massive number of requests, so that the provider of services at that address can no longer serve genuine ones. Thankfully, once the attacked destination (or, in source-based variants, something common to the malicious requests, such as their source addresses) has been identified, a BGP blackhole can be used to drop that traffic. This technique is known as RTBH (Remote Triggered Black Hole) filtering.

FastNetMon, a network security company specializing in protection against such attacks, gives a concise description of how this works:

The high-level concept of using BGP Blackhole to detect and mitigate DDoS attacks is relatively straightforward. When a DDoS attack is detected on a router, traffic is redirected to null0 interface – the black hole in this case. Routing traffic to this null-route effectively drops it from the network, never to be seen again, much like the natural phenomenon it’s named after. BGP routing usually takes place on the /32 /128 level (hosts IPv4, Ipv6). RTBH filtering is a specific approach that uses BGP routing protocol updates to manipulate route tables at the network edge to specifically drop illegitimate traffic before it enters the service provider network. You can use iBGP or announce BlackHole routes via a specific BGP community to redistribute or blackhole this traffic on the upstream ISP side.

How DDoS Attacks are Nullified from SENKI

Conclusion #

For more than three decades now, BGP has been the glue that holds the internet together. Most people who don’t work closely with network engineering rarely have a chance to learn about it, unless something goes horribly wrong and BGP makes the mainstream news. In light of that, it made sense to write a post shining some light on this silent hero of the internet before the next big incident occurs.

Disclaimer: I am not a network engineering expert, and this article should be considered a basic technical introduction to the Border Gateway Protocol. Please use the links within the article as doorways to more in-depth material on the topic.