Network traffic classification is a fundamental problem in networking. Given observations of network traffic, the goal is to infer properties of interest, such as what application generated the traffic. This enables network operators to monitor and optimize performance, detect anomalies or malware, block unwanted traffic, inform capacity planning, and so on.
The problem has been studied extensively for more than 20 years, using a combination of heuristics based on domain expertise and automated methodologies. Some techniques rely on hard-coded rules, such as the use of well-known ports or servers. For example, a DNS request, the HTTP Host field, or the SNI field in TLS may each reveal the name of the server contacted (for example, server.netflix.com), which may in turn indicate the service itself. Other techniques rely on behavioral characteristics, such as flow statistics, communication patterns, or traffic volume time series. For example, voice-over-IP applications generate small, evenly spaced packets; Web applications produce bursty traffic; and smart home devices exchange occasional status updates and commands with the cloud.
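To make the two families of techniques concrete, the following is a minimal Python sketch, not a method taken from any particular system. The rule tables, the function names (`classify_by_rules`, `classify_by_behavior`), the labels, and the numeric thresholds are all illustrative assumptions. The first function applies hard-coded rules on the server name and destination port; the second uses simple flow statistics (mean packet size and the regularity of inter-arrival times) to separate evenly spaced small packets from bursty traffic.

```python
from statistics import mean, pstdev

# Hypothetical rule tables; a real deployment would maintain far larger ones.
WELL_KNOWN_PORTS = {53: "dns", 80: "http", 443: "https"}
SERVER_NAME_SUFFIXES = {"netflix.com": "netflix"}


def classify_by_rules(dst_port, server_name=None):
    """Rule-based guess from a server name (e.g. taken from DNS, the HTTP
    Host field, or TLS SNI) with a fallback to the destination port."""
    if server_name:
        name = server_name.lower().rstrip(".")
        for suffix, label in SERVER_NAME_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                return label
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


def classify_by_behavior(packets):
    """Behavioral guess from a flow given as a time-sorted list of
    (timestamp_seconds, size_bytes) tuples. Thresholds are assumptions."""
    if len(packets) < 3:
        return "unknown"
    sizes = [size for _, size in packets]
    gaps = [later[0] - earlier[0] for earlier, later in zip(packets, packets[1:])]
    mean_gap = mean(gaps)
    if mean_gap <= 0:
        return "unknown"
    # Coefficient of variation of inter-arrival times: near 0 means
    # evenly spaced packets, well above 1 means bursts separated by idle gaps.
    gap_cv = pstdev(gaps) / mean_gap
    if mean(sizes) < 300 and gap_cv < 0.2:
        return "voip-like"
    if gap_cv > 1.0:
        return "bursty"
    return "unknown"


if __name__ == "__main__":
    print(classify_by_rules(443, "server.netflix.com"))
    # 50 packets of 160 bytes, one every 20 ms
    print(classify_by_behavior([(i * 0.02, 160) for i in range(50)]))
```

In practice the two approaches are complementary: name and port rules are cheap but depend on fields that may be absent or hidden, while behavioral features need several packets before they become informative.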