Information hiding is a research domain that covers a wide spectrum of methods used to make (secret) data difficult to notice. Due to improvements in network defenses, such techniques have recently been attracting increasing attention from actors such as cybercriminals, terrorists, and state-sponsored groups, as they allow data to be stored or communication to be cloaked in a way that is not easily discoverable.22 Several real-world cases have reached the attention of the public media, including the following:23,38
- the arrest of one of al Qaeda's members in Berlin with video files containing hidden information on ongoing and future terrorist operations (2012),
- the exfiltration of confidential data from the U.S. to Moscow by Russian spies (2010),
- the transfer of child pornographic material by a group of pedophiles called "Shadowz Brotherhood" (2002), and
- the planning of a terrorist attack after the September 11, 2001 attacks. A number of articles suggested that al Qaeda members used steganography to coordinate their actions (2001).
In these cases, information-hiding techniques were used to hide confidential or illegal data in innocent-looking material, for example, digital pictures.
Steganography is a well-known subfield of information hiding whose aim is to cloak secret data in a suitable carrier. From Ancient Greece through the Middle Ages to today's world, information hiding has often been used to conceal messages on their way to a desired recipient.38 For instance, music notes were used to embed secret information that was recognizable only by a person who knew where to look for it. Another example of steganography is writing with invisible ink.26 The use of covert techniques grew significantly during the two World Wars, in which the military developed several methods to hide information in innocent-looking objects. So-called microdots, for example, hide text by shrinking it to the size of a punctuation mark that can be hidden on a sheet of paper.
Today's forms of information hiding emerged together with the digital era. Modern information-hiding techniques can be divided, based on their application, into two broad groups: covert data storage and covert data communication (Figure 1). Covert data storage applies data-hiding techniques to conceal secret information in such a way that no one besides the persons involved knows where the information is stored or how to extract it. Digital media steganography and file/file system/mass storage steganography are the most prominent classes belonging to this group. Covert data communication methods, on the other hand, focus on hiding the fact that any communication took place; the underlying channels were initially described as channels that were not intended for communication.18 This means the involved parties can participate in a covert communication and, in principle, a third-party observer would be unaware of it. The most important classes belonging to this group include out-of-band covert channels, network steganography (also known as network covert channels), and local covert channels (whose communication range is limited to a single device).
Here, we briefly describe the evolution of the classes of techniques mentioned.
Modern Information Hiding: An Evolution of Techniques
It should also be noted that between the early 1990s and about 2001, it was mostly academics who considered information hiding a relevant research discipline. The domain, however, shifted back into the focus of applied researchers, security professionals, and law enforcement agencies (LEAs) after new cases became known in which information hiding was successfully applied for malicious purposes. The biggest concern for LEAs is that covert techniques are being used to ensure stealthy communication among terrorists, criminals, and cybercriminals.
Below we briefly summarize the evolution of the data-hiding groups presented in Figure 1.
Digital media steganography incorporates techniques to hide information within digital images, audio files, and digital videos.8 Johnson and Katzenbeisser group such steganographic methods into six categories.15 Methods in the first five categories substitute redundant data in cover objects, embed data in a signal's transform space, utilize spread spectrum techniques, change statistical properties of a cover object, or represent secret information by introducing a distortion into a signal. The sixth category differs from the first five: it contains methods that create cover objects solely for carrying secret information (instead of modifying existing cover objects). For a survey of methods of this type, please refer to Cheddad.6
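The first category, substitution of redundant data, can be illustrated with least-significant-bit (LSB) replacement. The following is a minimal sketch under stated assumptions, not a method taken from the cited works: the cover object is modeled as a plain sequence of 8-bit samples (such as pixel or audio values), and the helper names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Return a copy of `cover` whose least significant bits carry `secret`."""
    # Unpack the secret into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover object is too small for the secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the secret bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, secret_len: int) -> bytes:
    """Rebuild `secret_len` bytes from the least significant bits of `stego`."""
    if secret_len * 8 > len(stego):
        raise ValueError("stego object is too small for the requested length")
    secret = bytearray()
    for i in range(secret_len):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        secret.append(value)
    return bytes(secret)


cover = bytes(range(200, 248))      # 48 made-up 8-bit samples (assumption)
stego = embed_lsb(cover, b"hi")     # 2 secret bytes occupy 16 samples
recovered = extract_lsb(stego, 2)
```

Each sample changes by at most one unit, which is why such modifications are hard to notice in a picture or a recording; the receiver only needs to know the embedding convention and the length of the secret.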
Another branch of modern information hiding is related to steganographic file systems. The first such approach was proposed by Anderson et al.1 The main idea relied on the fact that encrypted data resembles the random bits naturally present on a disk. Consequently, hidden files could be located only by someone able to extract the vectors marking the file boundaries. Later, another approach for a hidden file system was proposed in Pang.25 The authors implemented a Linux-based steganographic file system that preserves the integrity of the stored files and camouflages hidden data in the disk space with the aid of dummy hidden files and abandoned blocks. More recently, Neuner24 proposed a steganographic system that utilizes timestamps in file system metadata for covert data storage. An alternative approach to secret data storage is to utilize chosen locations (such as Master File Table entries) or unorganized storage space on mass storage media.30
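The idea of using timestamps as covert storage can be sketched as follows. This is an illustrative assumption, not the scheme of the cited work: timestamps are modeled as integer nanosecond values, one secret byte is placed in the lowest eight bits of each timestamp, and the function names are hypothetical.

```python
def hide_in_timestamps(timestamps_ns, secret: bytes):
    """Overwrite the lowest 8 bits of each nanosecond timestamp with a secret byte."""
    if len(secret) > len(timestamps_ns):
        raise ValueError("not enough timestamps to carry the secret")
    carrier = list(timestamps_ns)
    for i, byte in enumerate(secret):
        # Clear the 8 low-order bits, then store one secret byte there.
        carrier[i] = (carrier[i] & ~0xFF) | byte
    return carrier


def recover_from_timestamps(timestamps_ns, secret_len: int) -> bytes:
    """Read the secret back from the lowest 8 bits of the first timestamps."""
    return bytes(ts & 0xFF for ts in timestamps_ns[:secret_len])


# Four made-up file timestamps in nanoseconds since the epoch (assumption).
base = 1_500_000_000_000_000_000
original = [base + i * 1_000_000_007 for i in range(4)]
carrier = hide_in_timestamps(original, b"key")
recovered = recover_from_timestamps(carrier, 3)
```

Every altered timestamp moves by less than 256 nanoseconds, far below what a person inspecting file dates would see. On a real system the values could be read from `os.stat(path).st_mtime_ns` and written with `os.utime(path, ns=(atime_ns, mtime_ns))`, provided the file system actually stores timestamps at nanosecond resolution.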
Single-host (local) covert channels have been studied for several decades.18 These covert channels allow information to leak between isolated processes by utilizing shared resources, for example, local files or disk arms.10 Their major goal is to break a mandatory security policy. Applications and operating systems have also been analyzed to detect such covert channels and to prevent or limit them, for example, within the VAX security kernel.14 Recently, the most common case in which local covert channels were investigated involves the colluding applications scenario29 (first proposed in 2011). This scheme assumes the device is infected with malware composed of two processes running in separate sandboxes, so they are unable to communicate directly. They use data-hiding techniques to establish a local covert channel and thereby bypass the security framework of the infected device. Typically, the malicious processes modulate common resources such as shared notifications and file-system locks, or they alter the idle state of the CPU or the system load (for more information, see the methods surveyed in Mazurczyk21).
In the last few years there has also been increased interest in so-called out-of-band covert channels. They are quite similar to single-host covert channels but utilize a shared physical medium that can be accessed by both the sending and the receiving process.4 Out-of-band channels do not necessarily break a mandatory security policy but rather allow the stealthy transfer of secret information. By using physical media, out-of-band covert channels overcome air gaps between systems, for example, using non-audible acoustic channels,11 light, or vibration.12 Several other out-of-band covert channels are described in Carrara.4
It should be noted that secret-concealing techniques have recently spread considerably, enabling data hiding in many other areas, especially in network traffic: so-called network information hiding (and, in particular, its subfield called network steganography) constitutes a growing branch of modern information hiding.
Network steganography deals with the concealment of information within network transmissions.23 This means that network data that appears to be innocent is actually carrying hidden data. Network information hiding can be used, for example, by malware to conceal its command and control communication (instead of only encrypting it), and it is also suitable for long-term stealthy data leakage, for example, after an organization was attacked using an advanced persistent threat.22 Indeed, since the Linux Fokirtor malware (2013), which hides data in SSH traffic, and Regin (2014), which utilizes several network protocols to signal hidden information, more malware with more sophisticated features has emerged. Malware also increasingly exploits functions of online social networks to transfer hidden information, for example, embedded into exchanged messages or posted images. Fisk et al. reported in 2002 that a typical Web server could leak up to several gigabytes of data per year via network information-hiding methods.7 Now, 15 years later, Internet-based information leakage is considered to be much more powerful, resulting in rapidly increasing amounts of data leaked per unit of time.
In comparison to digital media steganography, network steganography can be used for constant data leakage.23 To this end, network information hiding methods modify either the timing or the content of network traffic, for example, by modifying unused bits, the structure of protocol headers, the rate at which traffic is sent, or the order of packets.36 The first methods were introduced as early as the 1980s and 1990s,9,28 but most hiding methods for networks were published after 2000 and focused on newer protocols such as IPv620 and LTE advanced,27 cyber-physical systems (such as smart homes/buildings),32 industrial communication protocols such as Modbus,19 and cloud computing.5 Extensive surveys discussing more than a hundred methods can be found in Zander,36 Wendzel,35 and Mazurczyk.23 While a digital image can carry only a rather limited number of bytes per file, network traffic can carry small amounts of data continuously, day and night. A comparison of the two digital carrier types, namely digital media (the most well-researched and most popular cover for information hiding methods) and network traffic, is shown in the accompanying table.
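A timing-based hiding method of the kind mentioned above can be sketched without touching a real network. The example below is a simplified simulation under stated assumptions: each secret bit is signaled by a short or long gap between consecutive packets, the delay values and the jitter range are invented for illustration, and no packets are actually sent.

```python
import random

SHORT_DELAY = 0.05   # seconds between packets for a 0 bit (assumed value)
LONG_DELAY = 0.15    # seconds between packets for a 1 bit (assumed value)
THRESHOLD = (SHORT_DELAY + LONG_DELAY) / 2


def to_bits(data: bytes):
    """Unpack bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits) -> bytes:
    """Pack complete groups of 8 bits back into bytes."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def encode_delays(secret: bytes):
    """Covert sender: choose one inter-packet delay per secret bit."""
    return [LONG_DELAY if bit else SHORT_DELAY for bit in to_bits(secret)]


def decode_delays(delays) -> bytes:
    """Covert receiver: classify each observed delay against the threshold."""
    return from_bits([1 if delay > THRESHOLD else 0 for delay in delays])


# Simulate network jitter of up to 20 ms on every observed delay.
rng = random.Random(7)
observed = [d + rng.uniform(-0.02, 0.02) for d in encode_delays(b"C2")]
decoded = decode_delays(observed)
```

The packet contents stay untouched, so inspecting payloads or headers reveals nothing; only the traffic rhythm carries the message. The price is a very low data rate (here one bit per packet), which matches the observation that network traffic carries small amounts of data, but can do so continuously.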
From this perspective, it must be emphasized that the use of information-hiding techniques on a suspect's computer will not be discovered by a forensic analysis unless it is specifically looked for. In general, all that criminals or terrorists need in order to communicate covertly is to agree upon the carrier in which the secret data will be embedded. The carrier can be a digital image, audio, video, text file, network traffic, or any other digital medium. This also requires specialized software that exploits the chosen carrier.
Moreover, a covert sender and receiver will often utilize an encryption scheme (and a password/key) that secures the content even if the hidden communication is discovered. It should also be noted that, in practice, there is a plethora of information-hiding techniques for modifying carriers and a great number of carriers that can be used for this purpose, which adds another dimension to the challenges facing forensic experts.
Current State of Information Hiding in Cybercrime and Forensics Challenges
In general, it is impossible to precisely evaluate how widespread the use of information-hiding techniques is among criminals/terrorists, cybercriminals, or state-sponsored groups. However, there are signs that the use of information hiding may be heavily underestimated, as security experts do not always correctly recognize and classify the techniques used. For instance, the way malware developers increasingly apply information-hiding techniques suggests this trend is likely to continue. Figure 2 presents data that supports this claim. It illustrates the percentage of information hiding-capable malware identified (with respect to the total number of discovered malware) between 2011 and 2016 (historical data collected by members of the Criminal Use of Information Hiding (CUIng) initiative, to be discussed later). We treated malicious software as information hiding-capable malware when it applied at least one of the data-hiding techniques defined in Figure 1. To collect this information, we relied on reports from security companies, our own continuous malware landscape analysis, and data from LEAs.
Nevertheless, the trend observed in Figure 2 may still be only the "tip of the iceberg." As a result, the discovery of data-hiding tools will become a great challenge for LEAs, counterterrorism organizations, and forensic experts.
It is also worth noting that information-hiding techniques were initially found only in highly sophisticated malware such as Regin, Duqu, or Hammertoss, which security experts believe were created by nation-states to infiltrate a wide range of international targets and eventually launch attacks if necessary.
However, it can currently be observed that not only APTs but even typical malware is turning toward increased utilization of data hiding. This is somewhat expected, as sophisticated malicious software is typically supported by an actor that is not strongly resource-constrained (in money, human resources, or time). Therefore, the takeover of advanced information hiding techniques by cybercriminals can be the result of a "trickle down" effect from "milware" (that is, state-sponsored malicious software) to "malware" (created by non-state groups), as described in Zielińska.38 It is also worth noting that, typically, cybercriminals will mostly focus on hiding as much information as necessary, whereas nation-state actors will try to conceal as much data as possible.
Based on the aim to be achieved, information-hiding techniques can be used by criminals/terrorists and other malicious actors for the following purposes (Figure 3):
- As a means of covert storage: To hide secret data in such a way that no one besides the owner is able to discover its location and retrieve it. In other words, the aim is to not reveal the stored secret to any unauthorized party. This way criminals/terrorists can store their secret data in a hidden manner (as in the case of the pedophile group "Shadowz Brotherhood" mentioned earlier).
- As a covert communication tool: To communicate messages with the aim of keeping some aspect of their exchange secret. Criminals/terrorists can use information hiding to covertly exchange their confidential data (for example, as in the case of the Russian spy ring discovered in the U.S.).
- As a data exfiltration technique: Cybercriminals/insiders can use it to steal/exfiltrate confidential data (as is the case for the Zeus/Zbot trojan).
- As a means of covert malware communication: Finally, malware can be equipped with information hiding techniques to become stealthier while residing on the infected host and/or while communicating with Command & Control (C&C) servers (for example, the Hammertoss APT).
From the perspective of forensic challenges, it must first be noted that there is a huge asymmetry between devising new information hiding techniques and detecting or eliminating them. Although research on countermeasures started early (Zander37), applying them in practice can be challenging or impossible (for example, because they were intended for the design phase of a system, not for forensic analysis). Developing a new data-hiding method usually takes much less effort than detecting it.
Additionally, if the carrier is selected properly (that is, if the carrier is popular enough that it is not an anomaly in itself), even trivial techniques can remain hidden for long periods of time. Worse still, from a cybercrime perspective, many information-hiding tools are easy to access and use, even for an inexperienced user. In April 2014, the Steganography Analysis and Research Center (SARC) claimed the latest version of its Steganography Application Fingerprint Database contained over 1,250 steganography applications.
It should also be noted that many of the commercial tools for information-hiding detection do not focus on revealing the embedded secret data but rather try to find artifacts left behind by the hiding tools. This appears to be a good approach; however, it is successful only for the list of well-known data hiding tools, or under the assumption that it was the legitimate user of the device who installed this type of software. In practice, if a proprietary data-hiding tool is utilized or the device is infected with information hiding-capable malware, revealing its artifacts will not be possible, or the true intention of the attacker will still be difficult to establish.
In contrast, the detection of hidden data, which is typically done by forensic examiners for LEAs or anti-terrorism units, is far more challenging, and the extraction/recovery of the secrets is even harder (for example, because the covert data is encrypted).
Another point is that many forensic examiners still do not routinely check a suspect's computer for information-hiding software and, even if they do, several issues arise (see the accompanying sidebar).
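The artifact-based approach, and its main limitation, can be sketched as a lookup of file hashes in a list of known fingerprints. This is a minimal illustration under stated assumptions: it is not how any particular commercial product or fingerprint database works, and the tool name, file names, and file contents are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_for_known_tools(root: Path, known_fingerprints: dict):
    """List (path, tool name) pairs for files whose hash is a known fingerprint."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = sha256_of(path)
            if digest in known_fingerprints:
                hits.append((path, known_fingerprints[digest]))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Made-up evidence: one harmless file, one known tool, one modified tool.
    (root / "notes.txt").write_bytes(b"meeting at noon")
    (root / "tool.bin").write_bytes(b"pretend stego tool binary")
    (root / "custom.bin").write_bytes(b"pretend stego tool binary v2")

    # Hypothetical fingerprint list mapping hash to tool name.
    known = {
        hashlib.sha256(b"pretend stego tool binary").hexdigest(): "ExampleStegoTool",
    }
    hit_names = [(path.name, tool) for path, tool in scan_for_known_tools(root, known)]
```

Only `tool.bin` is reported. The slightly modified `custom.bin` goes unnoticed, which mirrors the point made above: a fingerprint list helps only against tools that are already known, not against proprietary tools or information hiding-capable malware.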
When discussing forensic challenges for information hiding, the two most important aspects that should be considered are technical capabilities of the suspect and type of the crime.17
The technical capabilities of the suspect may map to the resources that he has on his computer (installed software, hardware) or which he accessed (for example, visited webpages or downloaded e-books).
The type of suspected crime can also point to the use of data hiding. For example, terrorists or child pornographers tend to hide their secrets in images and then send them through email or post them on a website. The same applies to crimes that involve the transfer of business-type records. And if a cybercrime is being investigated, the use of information hiding is always a viable option.
As mentioned, both the methods and the applications of information hiding have become significantly more sophisticated in recent years. For example, an emerging challenge for forensic experts is posed by solutions like SonicVortex Transactions (http://www.sonic-coin.com/), which can be treated as a next-generation cryptocurrency platform. It enables hiding encrypted bitcoin transactions in innocent pictures and offers a stealthy address and built-in Tor support. This can potentially pose tremendous difficulties for the investigation of financial crimes.
In addition, there have been press reports stating that criminals/terrorists are exploiting different aspects of online games and gaming consoles to enable covert communication, as these offer many possible carriers, including digital images, video, network traffic, and even elements within the virtual world that can be modified in order to conceal messages. It should be noted that this option was recognized in the academic community almost a decade ago.30
A few academic publications already deal with the transfer and storage of hidden data in the Internet of Things (IoT).23,32 It is likely only a matter of time before IoT services, such as smart homes and wearables, are subjected to information hiding-based cybercrime. IoT devices provide entirely new ways to store hidden data in their actuators and in the memory of embedded components: places where no current methods will search for hidden data and for which no tailored tools are available.
Other popular and innocent-looking online services, such as Skype, IP telephony, BitTorrent, or cloud storage systems, can also be exploited to enable covert communication.23 The network traffic and data exchanged during such transmissions can thus be utilized for information hiding purposes, often without the knowledge or consent of the overt sender and overt receiver.
Moreover, hiding tools have become increasingly adaptive. In this context, adaptiveness refers to a malware function that can automatically adjust to a changing environment. For instance, imagine an administrator discovers malware that uses network information hiding and decides to block its communication channel. In such a case, the administrator could introduce new filter rules for firewalls or traffic normalizers. However, adaptive malware would detect the blocked channel and would most likely find a way to route around this barrier. To do so, it switches to one of the several different hiding methods available, eventually building a covert overlay network with dynamic routing capabilities. These techniques were already described in academia several years ago.2,33
A New Initiative to Fight Information Hiding-Based Cybercrime
In the context of the forensic challenges mentioned previously, policymakers, governmental organizations, law enforcement, the security industry, and academia should work jointly to build novel products and methods to protect companies and citizens.
Criminal Use of Information Hiding (CUIng) is an initiative recently launched in cooperation with Europol's European Cybercrime Centre (EC3). The initiative is open to all interested parties from different backgrounds. Its current structure consists of the Steering Committee and regular members. The Steering Committee is responsible for setting the strategic direction of the initiative and for proposing, approving, and coordinating all its activities. The Steering Committee is a mix of members from academia, industry, LEAs, and institutions.
Its main objectives, which are summarized in Figure 4, are the following:
- to raise awareness for the criminal use of information hiding on all relevant levels (from IT administration to governments),
- to track progress of academic research in the domain,
- to monitor the technology's utilization by criminals,
- to share information about incidents between the relevant players,
- to provide practical advice to these players,
- to work jointly with researchers around the world, and
- to foster the education and training on the professional and academic level.
Although the CUIng initiative started only recently, its circle of involved organizations and individuals has managed to collect and categorize a vast amount of relevant information about discovered malware, incidents, and research output. Currently it consists of more than 100 experts from over 30 countries worldwide who represent different backgrounds. The initiative gathers and shares the following information:
- General background on information hiding: it provides a general overview of recent trends and techniques.
- Scientific publications: relevant papers, which present the state-of-the-art in academic research in this area.
- Information hiding-capable malware reports: analyses of real-life malicious software that uses data hiding techniques.
- Relevant tools: applications that make it possible to conceal data, as well as different approaches for countermeasures/detection.
- New categorization and didactic concepts and training course materials: new concepts on how to teach/train information hiding to make the topic more accessible and materials from the previous trainings.
The first surveyed results and trends were presented at relevant events (both academic and industry conferences). For instance, CUIng was a program partner for several industry eCRIME conferences, and it will organize a dedicated event, the CUING 2017 workshop (with ARES 2017), and a special session (with IWDW 2017) on these topics.
CUIng helped Europol's EC3 to create a CyberBit (an intelligence notification on cyber-related topics that aims to raise awareness and to trigger discussions or further actions), that is, a brief backgrounder for the Trends Series entitled "Steganography for Increased Malware Stealth." In January 2016, a dedicated training course for EC3 entitled "Training on Information Hiding Techniques and its Utilization in Modern Malware" was organized.
CUIng members are also involved in creating new tools, projects, and concepts for digital forensic purposes. The most notable examples include:
- Network Information Hiding Patterns project, which allows the large number of available hiding techniques to be reduced to only several patterns; this can help the community remain focused on core developments and better understand network hiding concepts.
- Covert Channel Educational Analysis Protocol (CCEAP) tool,34 which defines a sample protocol to teach various network hiding patterns and can be used in didactic environments. The tool is unique in that it lowers the barrier to understanding network covert channels by eliminating the requirement of understanding several network protocols in advance. It should be noted, however, that testbeds not based on hiding patterns also exist.33
- Removed Steganography Application Scanner (RSAS) tool, which makes it possible to discover artifacts of known steganographic applications even if they were previously uninstalled or run from a portable memory storage.
The experience of current CUIng members shows that cooperating and building a robust community makes it possible to take advantage of the knowledge and expertise of academia, industry, law enforcement, and institutions. This networking approach does not eliminate the problem of the criminal use of information hiding, but it can limit the problem before it becomes a much more widespread phenomenon. It should also be noted that CUIng is about to release a first set of guidelines for the protection of organizations and for forensic analysis in the coming year.
The increasing number of known cases in which modern information hiding is applied in cybercrime as well as the constantly rising number of academic publications in the field underpin the importance of the topic and the broad interest in it. It is important to foster professional and academic training on information hiding, a better understanding and the improvement of the methodology in the field, especially for forensics. Another need is to enable a better sharing of incidents and trends. The CUIng initiative presented here is a vehicle to push these processes.
More information about CUIng can be found at http://www.cuing.org.
Acknowledgments. The authors thank the anonymous reviewers for helpful and constructive comments that greatly contributed to improving this article.
Figure. Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/information-hiding
12. Hasan, R., Saxena, N., Haleviz, T., Zawoad, S. and Rinehart, D. Sensing-enabled channels for hard-to-detect command and control of mobile devices. In Proceedings of the Symp. Information, Computer and Communications Security. ACM, New York, NY, 2013, 469–480.
24. Neuner, S., Voyiatzis, A.G., Schmiedecker, M., Brunthaler, S., Katzenbeisser, S. and Weippl, E.R. Time is on my side: Steganography in file system metadata. Digital Investigation 18 (2016), S76–S86.
27. Rezaei, F., Hempel, M., Peng, D., Qian, Y. and Sharif, H. Analysis and evaluation of covert channels over LTE advanced. In Proceedings of the Wireless Communications and Networking Conference. IEEE, 2013, 1903–1908.
29. Schlegel, R., Zhang, K., Zhou, X., Intwala, X., Kapadia, A. and Wang, X. Soundcomber: A stealthy and context-aware sound trojan for smartphones. In Proceedings of the Network and Distributed System Security Symposium, 2011.
30. Thompson, I. and Monroe, M. FragFS: An advanced data hiding technique. BlackHat Federal, 2006; http://www.blackhat.com/presentations/bh-federal-06/BH-Fed-06-Thompson/BH-Fed-06-Thompson-up.pdf
34. Wendzel, S. and Mazurczyk, W. Poster: An educational network protocol for covert channel analysis using patterns. In Proceedings of the ACM Conference on Computer and Communications Security (Vienna, Austria, Oct. 24–28, 2016), 1739–1741.
©2018 ACM 0001-0782/18/1