Fixing the Internet

Few people pay much attention to how the electrical grid works until there is an outage. The same is often true for the Internet.

Yet unlike the electrical grid, where direct attacks are infrequent, vulnerabilities and security issues with the Internet’s routing protocol have led to frequent malicious attacks that have caused widespread service outages, the interception and theft of personal data, and the use of seemingly legitimate Web sites to launch massive spam campaigns.

The Internet is an interconnected global network of autonomous systems or network operators, like Internet service providers (ISPs), corporate networks, content delivery networks (such as Hulu or Netflix), and cloud computing companies such as Google and Microsoft Cloud. The Border Gateway Protocol (BGP) is used to ensure data can be directed between networks along the most efficient path, similar to how a GPS navigation system maintains a database of street addresses and can assess distance and congestion when selecting the optimal route to a destination.

Each autonomous system connected to the Internet has an Internet Protocol (IP) address, which is its network interface, and provides the location of the host within the network; this allows other networks to establish a path to that host. BGP routers managed by an ISP control the flow of data packets containing content between networks, and maintain a standard routing table used to direct packets in transit. BGP makes routing decisions based on paths, rules, or network policies configured by each network’s administrator.

BGP was first described in a document assembled by the Internet Society’s Network Working Group in June 1989 and was first put into use in 1994. BGP is extremely scalable, allowing tens of thousands of networks around the world to be connected together, and if a router or path becomes unavailable, it can quickly adapt to send packets through another connection. However, because the protocol was designed around, and still operates on, a trust model that assumes any information exchanged by networks is always valid, it remains susceptible to issues such as information exchange failures due to improperly formatted or incorrect data. BGP can also be at the mercy of routers that are too slow to respond to updates, or that run out of memory or storage, situations that can cause network timeouts, bad routing requests, and processing problems.
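The path-based decisions and failover described above can be illustrated with a minimal sketch. It assumes a heavily simplified selection rule (highest operator-assigned preference first, then shortest AS path), which covers only the first steps of what real BGP implementations do. The `Route` class, the `best_route` function, the neighbor labels, and the AS numbers (taken from the range reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    neighbor: str          # hypothetical label for the network that sent the route
    as_path: tuple         # AS numbers the announcement passed through
    local_pref: int = 100  # operator-assigned policy value; higher is preferred


def best_route(candidates):
    """Pick a route: highest local_pref wins, ties go to the shortest AS path."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


# Two ways to reach the same destination network (AS 64500).
candidates = [
    Route("peer-a", (64501, 64500)),
    Route("peer-b", (64502, 64503, 64500)),
]

# With equal preference, the shorter path through peer-a is chosen.
primary = best_route(candidates)

# If peer-a's path is withdrawn, selection falls back to what remains.
remaining = [r for r in candidates if r.neighbor != primary.neighbor]
fallback = best_route(remaining)

# An administrator's policy can override path length entirely.
preferred = Route("peer-c", (64504, 64505, 64506, 64500), local_pref=200)
policy_choice = best_route(candidates + [preferred])
```

Nothing in this selection logic asks whether a neighbor is entitled to announce the destination, which is the trust assumption the article goes on to discuss.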

Aftab Siddiqui, senior manager of Internet technology at the Internet Society, says the initial BGP protocol was conceived by experts at research institutions, defense organizations, and equipment vendors. “When they designed [BGP], it was based on the premise that everybody trusts each other,” Siddiqui says. “Fast-forward 30 years, I’m pretty sure we cannot claim that anymore.”

As such, BGP is also vulnerable to BGP hijacks or inadvertent IP address leaks, in which route and IP address information can be deliberately intercepted, redirected, or dropped, simply by the advertisement of incorrect or corrupted routes via the BGP protocol. All a malicious actor needs to do is announce a route to IP prefixes that it doesn’t own, thereby funneling traffic to its own servers where it can do whatever it pleases with that data, including stealing personal, business, or financial information, or launching cyber-attacks from that hijacked IP address. Further, because the base BGP protocol accepts all route advertisements as legitimate, traffic to those legitimate IP prefixes can be routed through to that malicious actor’s site until someone notices and fixes the error.
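To make the mechanics concrete, here is a small sketch of one common hijack variant, in which the attacker announces a more specific prefix than the legitimate holder. It relies on the fact that routers forward traffic to the most specific matching prefix; that detail is not spelled out in the article and is added here as background. The table layout, the `origin_for` function, and all prefixes and AS numbers (drawn from ranges reserved for documentation) are hypothetical.

```python
import ipaddress

# Hypothetical routing table: announced prefix -> AS that originated it.
routes = {
    ipaddress.ip_network("203.0.113.0/24"): 64500,  # legitimate holder
}


def origin_for(table, address):
    """Return the origin AS of the most specific prefix covering the address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return table[most_specific]


before = origin_for(routes, "203.0.113.10")

# A bogus, more specific announcement is accepted with no checks at all,
# mirroring base BGP's assumption that every advertisement is legitimate.
routes[ipaddress.ip_network("203.0.113.0/25")] = 64511

after = origin_for(routes, "203.0.113.10")        # now inside the hijacked /25
unaffected = origin_for(routes, "203.0.113.200")  # outside the hijacked /25
```

In this sketch, traffic for addresses inside the bogus /25 shifts to AS 64511 as soon as the announcement is accepted, while the rest of the /24 still reaches AS 64500.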

BGP has been updated to include tools to validate the originator of these routing messages, as well as filter out known malevolent IP addresses, but not every network operator is using those tools. Just as a tiny, undetected hole can sink an entire ship, a single security lapse used in an attack can shut down entire networks. In fact, BGP hijacks such as those described here are frequent, with an average of 14 attacks per day between mid-January and June 2020, according to the Internet Society.

Leading the charge to increase the focus on routing security is the Internet Society’s Mutually Agreed Norms for Routing Security (MANRS) working group. MANRS announced in December 2020 a new task force charged with defining and publishing an updated set of actions and metrics to measure the progress of networks adopting the more-robust routing security tools and practices.

Siddiqui, who serves as the MANRS project lead, says that while the number of autonomous systems or networks has swelled to more than 70,000 worldwide, the baseline BGP protocol is still working, thanks to its ability to grow as the number of networks rises. “BGP is very scalable, so nobody wants to make any changes in the baseline of the protocol, because it works perfectly fine, except for the trust issue,” Siddiqui says. He adds that additional processes have been put into place to make routing more secure, but networks need to use them.

While MANRS has more than 500 network-operator participants, many more are simply not using the tools. “That’s why we started this initiative, to engage with the network community,” Siddiqui explains, noting that the message to all network operators is to “be sure that you implement the best practices to secure the global routing tables.”

A key tool that has been implemented is Resource Public Key Infrastructure (RPKI), instead of solely relying on Internet Routing Registries (IRRs), where network operators store information about their routing policies and routed prefixes. While filtering rules are still useful in ensuring only valid routes are accepted from neighboring networks, the need to constantly maintain and update records is both labor- and time-intensive, often requiring networks to reach out to other networks to validate information.

On the other hand, RPKI, a distributed public database of cryptographically signed records containing routing information supplied by autonomous systems or networks, is considered to be the ultimate “truth” for network information, according to Siddiqui. RPKI is put to use through a process known as route origin validation (ROV), which relies on route origin authorizations (ROAs), digitally signed objects that bind an IP prefix to the specific network or autonomous system authorized to originate it, to establish the list of prefixes a network is authorized to announce.

To validate network prefixes, third-party validation software downloads the ROAs from the various repositories, verifies their digital signatures, and then makes the results available to BGP routers over an RPKI-to-router session for use in the BGP workflow. Once validated, an ROA can be used to generate route filters, and other networks can use these records to check that the BGP announcements they receive accurately identify their origins.
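The origin check at the heart of ROV can be sketched in a few lines. This is a simplified illustration of the valid / invalid / not-found outcomes defined for BGP prefix origin validation, and it assumes the ROAs have already been downloaded and had their signatures verified. The `Roa` class, the `validate_origin` function, and the sample prefixes and AS numbers (from documentation ranges) are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: ipaddress.IPv4Network  # address block the ROA covers
    max_length: int                # longest prefix length allowed to be announced
    origin_as: int                 # AS authorized to originate the prefix


def validate_origin(roas, prefix, origin_as):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covering = [
        r for r in roas
        if r.prefix.version == route.version and route.subnet_of(r.prefix)
    ]
    if not covering:
        return "not-found"  # no ROA says anything about this address space
    for r in covering:
        if r.origin_as == origin_as and route.prefixlen <= r.max_length:
            return "valid"
    return "invalid"        # covered by a ROA, but origin or length does not match


# Already-verified ROA data (hypothetical).
roas = [Roa(ipaddress.ip_network("203.0.113.0/24"), 24, 64500)]

legitimate = validate_origin(roas, "203.0.113.0/24", 64500)
wrong_origin = validate_origin(roas, "203.0.113.0/24", 64511)
too_specific = validate_origin(roas, "203.0.113.0/25", 64500)
unregistered = validate_origin(roas, "198.51.100.0/24", 64500)
```

A network that drops every announcement classified as invalid is protected against hijacks of registered prefixes, but routes with no ROA at all still come back as not-found, which is why registration by every network matters.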

While RPKI use has grown over the past few years, most of that adoption has come from large ISPs such as AT&T, NTT, and Cogent, which have announced they are performing origin validation. But to ensure complete protection, all operational networks will need to register their routes to enable hijack protection. A key member of the initiative is Google, which to date has registered more than 99% of its routes in the RPKI and has announced plans to deploy ROV this year to ensure any invalid routes are rejected.

Google is among the large networks that have participated in MANRS-led task forces to incorporate better security practices, to ensure BGP can continue to serve as a secure routing protocol. “We can keep tackling this [within Google], but why don’t we make it easier for these other players who are now emerging and have more interdependence on the BGP infrastructure than they realize,” says Royal Hansen, vice president of security at Google. “So it was a chance to use Google’s position in the market and our experience in these kinds of problems to see what we could do to make it easier for others so that we close those final gaps in the way BGP routing works.”

Another way Google is helping to promote stronger RPKI and other BGP security controls is via its peering portal, which is a way for Google to assess how well its peers (other networks that have shared BGP routing information with Google to let traffic flow in a more efficient, stable way) are implementing BGP best practices. The peering portal provides Google with a way to share potentially invalid route information, show peers their RPKI status, and also flag peers when they are not applying the proper filtering or other safeguards to routes shared with Google.

Google’s lead-by-example approach is helpful in raising awareness of the steps needed to ensure BGP can be used securely; it may be able to coerce other large ISPs and networks to adopt these best practices simply because so much traffic flows through Google’s networks.

“If we did not participate in this initiative, the probability of its success would be pretty low,” says Bikash Koley, vice president of Google Global Networking (GGN). Koley said that in order to truly close the holes inherent in BGP, there is still a significant amount of engineering work required, which means small ISPs must dedicate resources to these types of initiatives, and that may be difficult to achieve quickly. That is one reason why Google has been working with MANRS to publish information about how it implements route filtering, which should make it easier for smaller networks that want to peer with Google to simply piggyback on an established approach.

However, it is not just Google that is working to implement better practices, such as moving to RPKI or implementing ROV. “There’s Netflix, Akamai, Microsoft, Cloudflare, and other top providers,” Siddiqui says. “So, if you want to peer with them, then [smaller] operators will need to fix their registry information, and only then will they be able to talk to [the large content delivery networks and cloud providers].”

While getting the majority of the 70,000+ global networks on board with using RPKI and other best practices for network routing is a high bar to clear (and one that likely will take years to achieve), Hansen says it is imperative to close this loophole.

“Sometimes we do say the attacker only has to get it right once, and the defender has to be right all the time,” Hansen says, noting the challenges faced by security systems and professionals. Further, the distributed ownership and management of Internet networks adds to the challenge.

“One of the challenges with the Internet is that you have ISPs of all sizes,” Koley says. “You have really tiny ones, as well as really large ones, and they are located all around the world. And I would say that this system is as strong or as weak as the weakest link.”

Further Reading