Much computer science research is interdisciplinary, bringing together experts from multiple fields to solve challenging problems in the sciences, engineering, and medicine. One area where the interface between computer scientists and domain scientists is especially strong is wireless sensor networks, which offer the opportunity to apply computer science concepts to obtaining measurements in challenging field settings. Sensor networks have been applied to studying vibrations on the Golden Gate Bridge,1 tracking zebra movements,2 and understanding microclimates in redwood canopies.4
Our own work on sensor networks for volcano monitoring6 has taught us some valuable lessons about what's needed to make sensor networks successful for scientific campaigns. At the same time, we find a number of myths that persist in the sensor network literature, possibly leading to invalid assumptions about what field conditions are like, and what research problems fall out of working with domain scientists. We believe these lessons are of broad interest to "applied computer scientists" beyond the specific area of sensor networks.
Our group at Harvard has been collaborating with geophysicists at New Mexico Tech, UNC, and the Instituto Geofísico in Ecuador for the last five years on developing wireless sensor networks for monitoring active and hazardous volcanoes (see Figure 1). We have deployed three sensor networks on two volcanoes in Ecuador: Tungurahua and Reventador. In each case, wireless sensors measured seismic and acoustic signals generated by the volcano, and the digitized signals were collected at a central base station located at the volcano observatory. This application pushes the boundaries of conventional sensor network design in terms of the high data rates involved (100 Hz or more per channel); the need for fine-grained time synchronization to compare signals collected across different nodes; the need for reliable, complete signal collection over the lossy wireless network; and the need to discern "interesting" signals from noise.
These deployments have taught us many lessons about what works and what doesn't in the field, and what the important problems are from the perspective of the domain scientists. Interestingly, many of these problems are not the focus of much of the computer science research community. Our view is that a sensor network should be treated as a scientific instrument, and therefore subject to the same high standards of data quality applied to conventional scientific instrumentation.
First, let us dispel a few common myths about sensor network field deployments.
Myth #1: Nodes are deployed randomly. A common assumption in sensor network papers is that nodes will be randomly distributed over some spatial area (see Figure 2). An often-used idiom is that of dropping sensor nodes from an airplane. (Presumably, this implies that the packaging has been designed to survive the impact and there is a mechanism to orient the radio antennas vertically once they hit the ground.)
Such a haphazard approach to sensor siting would be unheard of in many scientific campaigns. In volcano seismology, sensor locations are typically chosen carefully to ensure good spatial coverage and the ability to reconstruct the seismic field. The resulting topologies are fairly irregular and do not exhibit the spatial uniformity often assumed in papers. Moreover, positions for each node must be carefully recorded using GPS, to facilitate later data analysis. In our case, installing each sensor node took nearly an hour (involving digging holes for the seismometer and antenna mast), not to mention the four-hour hike through the jungle just to reach the deployment site.
Myth #2: Sensor nodes are cheap and tiny. The original vision of sensor networks drew upon the idea of "smart dust" that could be literally blown onto a surface. While such technology is still an active area of research, sensor networks have evolved around off-the-shelf "mote" platforms that are substantially larger, more power hungry, and more expensive than their hypothetically aerosol counterparts ("smart rocks" is a more apt metaphor). The notion that sensor nodes are disposable has led to much research that assumes it is possible to deploy many more sensor nodes than are strictly necessary to meet scientific requirements, leveraging redundancy to extend network battery lifetime and tolerate failures.
It should be emphasized that the cost of the attached sensor can outstrip that of the mote itself. A typical mote costs approximately $100, sometimes with onboard sensors for temperature, light, and humidity. The inexpensive sensors used on many mote platforms may not be appropriate for scientific use, confounded by low resolution and the need for calibration. While the microphones used in our volcano sensor network cost pennies, seismometers cost upward of thousands of dollars. In our deployments, we use a combination of relatively inexpensive ($75 or so) geophones with limited sensitivity, and more expensive ($1,000) seismometers. The instruments used in many volcano deployments cost tens of thousands of dollars, so much that many research groups borrow (rather than buy) them.
Myth #3: The network is dense. Related to the previous myths is the idea that node locations will be spatially homogeneous and dense, with each node having on the order of 10 or more neighbors in radio range. Routing protocols, localization schemes, and failover techniques often leverage such high density through the power of many choices.
This assumption depends on how closely the spatial resolution of the desired network matches the radio range, which can be hundreds of meters with a suitably designed antenna configuration. In volcanology, the propagation speed of seismic waves (on the order of kilometers per second) dictates sensor placements hundreds of meters apart or more, which is at the practical limit of the radio range. As a result, our networks have typically featured nodes with at most two or three radio neighbors, with limited opportunities for redundancy in the routing paths. Likewise, the code-propagation protocol we used worked well in a lab setting when all of the nodes were physically close to each other; when spread across the volcano, the protocol fell over, probably due to the much higher degree of packet loss.
Working with domain scientists has taught us some valuable lessons about sensor network design. Our original intentions were to leverage the collaboration as a means of furthering our own computer science research agenda, assuming that whatever we did would be satisfactory to the geophysicists. In actuality, their data requirements ended up driving our research in several new directions, none of which we anticipated when we started the project.
Lesson #1: It's all about the data. This may seem obvious, but it's interesting how often the actual data produced by a sensor network is overlooked when designing a clever new protocol or programming abstraction. To a first approximation, scientists simply want all of the data produced by all of the sensors, all of the time.
The approach taken by such scientists is to go to the field, install instruments, collect as much data as possible, and then spend a considerable amount of time analyzing it and writing journal papers. After all, data collection is expensive and time consuming, and requires working in dirty places without a decent Internet connection. Scientists have a vested interest in getting as much "scientific value" as possible out of a field campaign, even if this requires a great deal of effort to understand the data once it has been collected. In contrast, the sensor network community has developed a wide range of techniques to perform data processing on the fly, aggregating and reducing the amount of data produced by the network to satisfy bandwidth and energy constraints. Many of these techniques are at odds with the domain scientists' view of instrumentation. No geophysicist is interested in the "average seismic signal" sampled by multiple nodes in the network.
We advocate a two-pronged approach to this problem. The first is to incorporate large flash memories onto sensor nodes: it is now possible to build multi-gigabyte SD or Compact Flash memory into every node, allowing for months of continuous sensor data to be stored locally.
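As a rough check on the "months of continuous data" figure, the following back-of-envelope sketch estimates how long a flash card would last. The 100 Hz sampling rate and the two channels (seismic and acoustic) follow the deployment described earlier; the 24-bit sample size and the 4 GB card are illustrative assumptions, not figures from our deployments.

```python
# Back-of-envelope estimate of how long a flash card can hold continuous data.
SECONDS_PER_DAY = 24 * 60 * 60


def days_of_storage(card_bytes, sample_rate_hz, channels, bytes_per_sample):
    """Return how many days of continuous sampling fit on the card."""
    bytes_per_day = sample_rate_hz * channels * bytes_per_sample * SECONDS_PER_DAY
    return card_bytes / bytes_per_day


# Assumed values: 4 GB card, 2 channels, 3 bytes (24 bits) per sample.
days = days_of_storage(
    card_bytes=4 * 10**9,
    sample_rate_hz=100,
    channels=2,
    bytes_per_sample=3,
)
print(f"{days:.0f} days of continuous data")
```

Under these assumptions a node writes about 52 MB per day, so the card holds roughly 77 days of data, or about two and a half months. Filesystem overhead and per-block metadata such as timestamps are ignored here and would reduce that figure somewhat.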
Though this converts the sensor node into a glorified data logger, it also ensures that all of the data will be available for (later) analysis and validation of the network's correct operation. It is often necessary to service nodes in the field, such as to change batteries, offering an early opportunity to retrieve the data manually by swapping flash cards. The second approach is to perform data collection with the goal of maximizing scientific value while satisfying resource constraints, such as a target battery lifetime. Our work on the Lance system5 demonstrated it is possible to drive signal downloads from a sensor network in a manner that achieves near optimal data quality subject to these constraints. Figure 3 shows the rectification of raw signals collected from the network.
Inherent in this approach is the assumption that not all data is created equal: there must be some domain-specific assignment of "value" to the signals collected by the network to drive the process. In volcano seismology, scientists are interested in signals corresponding to geophysical events (earthquakes, tremors, explosions) rather than the quiet lull that can last for hours or days between such events. Fortunately, a simple amplitude filter running on each sensor node can readily detect seismic events of interest.
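The idea can be illustrated with a minimal sketch: an amplitude filter marks runs of samples above a threshold as events, and a greedy policy then picks which events to download under a fixed budget. This is not the Lance algorithm. The names `Event`, `detect_events`, and `choose_downloads`, the use of peak amplitude as the "value," and the budget expressed in samples (a stand-in for energy or bandwidth cost) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    node: str
    start: int    # index of the first sample above the threshold
    end: int      # index one past the last sample above the threshold
    value: float  # hypothetical "scientific value": peak absolute amplitude


def detect_events(node, samples, threshold):
    """Return runs of consecutive samples whose magnitude exceeds threshold."""
    events, start, peak = [], None, 0.0
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:
                start, peak = i, 0.0
            peak = max(peak, abs(s))
        elif start is not None:
            events.append(Event(node, start, i, peak))
            start = None
    if start is not None:  # the signal ended while still above the threshold
        events.append(Event(node, start, len(samples), peak))
    return events


def choose_downloads(events, budget_samples):
    """Greedily pick events by value per sample until the budget is spent."""
    chosen, remaining = [], budget_samples
    ranked = sorted(events, key=lambda e: e.value / (e.end - e.start), reverse=True)
    for ev in ranked:
        cost = ev.end - ev.start
        if cost <= remaining:
            chosen.append(ev)
            remaining -= cost
    return chosen


# Synthetic signal: quiet background with one strong and one weak event.
quiet = [0.1, -0.2, 0.1]
signal = quiet + [3.0, -5.0, 4.0] + quiet + [1.5, -1.2] + quiet
events = detect_events("node-1", signal, threshold=1.0)
picked = choose_downloads(events, budget_samples=3)
print(picked)
```

With a budget of three samples, only the stronger of the two detected events is selected. A real value function would be supplied by the domain scientists rather than derived from amplitude alone.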
Lesson #2: Computer scientists and domain scientists need common ground. It should come as no surprise that the motivations of computer scientists and "real" scientists are not always aligned. Domain scientists are largely interested in obtaining high-quality data (see Lesson #1 above), whereas computer scientists are driven by the desire to do "cool stuff": new protocols, new algorithms, new programming models. Our field thrives on novelty whereas domain scientists have an interest in measured conservatism. Anything new we computer scientists throw into the system potentially makes it harder, not easier, for the domain scientists to publish papers based on the results.
Finding common ground is essential to making such collaborations work. Starting small can help. Our first volcano deployment involved just three nodes running for two days, but in the process we learned an incredible amount about how volcanologists do field work (and what a donkey will, and will not, carry on its back). Our second deployment focused on collecting data with the goal of making the geophysicists happy with the fidelity of the instrument. The third was largely driven by CS goals, but with an eye toward meeting the scientists' data requirements. Writing joint grant proposals can also help to get everyone on the same page.
Lesson #3: Don't forget about the base station! The base station is a critical component of any sensor network architecture: it is responsible for coordinating the network's operation, monitoring its activity, and collecting the sensor data itself. Yet it often gets short shrift, perhaps because of the false impression the base station code will be easy to write or that it is uninteresting.
The vast majority of our development efforts focused on the sensor node software, which is fairly complex and uses nonstandard programming languages and tools. The base station, in our case a laptop located at the volcano observatory, was mostly an afterthought: some slapped-together Perl scripts and a monolithic Java program acting as a combined network controller, data logger, monitor, and GUI. The base station code underwent a major overhaul in the first two days after arriving in the field, mostly to add features (such as logging) that we didn't anticipate needing during our lab testing. We paid for the slapdash nature of the base station software. One race condition in the Java code (for which the author takes full credit) led to an 8-hour outage, while everyone was asleep. (We also assumed that the electricity supply at the observatory would be fairly reliable, which turned out not to be true.)
Our redesign for the 2007 Tungurahua deployment involved modularizing the base station code, so that each component can fail independently. One program communicates with the network; another acts as a GUI; another logs the sensor data; and another runs the algorithm for scheduling downloads. Bugs can be fixed and each of these programs can be restarted at any time without disrupting the other programs.
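One way to picture this structure is a small supervisor that runs each component as a separate operating-system process and restarts only the ones that have exited. The sketch below is an illustration under stated assumptions, not our base station code: the `Supervisor` class is hypothetical, and the two commands are stand-ins for the real programs (one long-running, one that crashes immediately).

```python
import subprocess
import sys


class Supervisor:
    """Runs each base-station component as its own OS process."""

    def __init__(self, commands):
        self.commands = commands  # component name -> argv list
        self.procs = {}

    def start(self, name):
        self.procs[name] = subprocess.Popen(self.commands[name])

    def start_all(self):
        for name in self.commands:
            self.start(name)

    def restart_dead(self):
        """Restart only the components that have exited; return their names."""
        restarted = []
        for name, proc in list(self.procs.items()):
            if proc.poll() is not None:  # None means the process is still running
                self.start(name)
                restarted.append(name)
        return restarted

    def stop_all(self):
        for proc in self.procs.values():
            proc.terminate()
            proc.wait()


# Hypothetical stand-ins for the real programs.
commands = {
    "logger": [sys.executable, "-c", "import time; time.sleep(30)"],
    "gui": [sys.executable, "-c", "raise SystemExit(1)"],  # simulated crash
}
sup = Supervisor(commands)
sup.start_all()
logger_pid = sup.procs["logger"].pid
sup.procs["gui"].wait()         # wait for the simulated crash
restarted = sup.restart_dead()  # only the crashed component is relaunched
logger_untouched = sup.procs["logger"].pid == logger_pid
sup.stop_all()
print(restarted, logger_untouched)
```

Because each component lives in its own process, a crash in the GUI leaves the data logger running, and a fixed program can be relaunched without touching the others.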
Scientific discovery is increasingly driven by advances in computing technology, and sensor networks are an important tool to enhance data collection in many scientific domains. Still, there is a gap between the stereotype of a sensor network in the literature and what many scientists need to obtain good field data. Working closely with domain scientists yields tremendous opportunities for furthering a computer science research agenda driven by real-world problems.
2. Liu, T. et al. Implementing software on resource-constrained mobile sensors: Experiences with Impala and ZebraNet. In Proceedings of the Second International Conference on Mobile Systems, Applications, and Services (MobiSYS'04), June 2004.
5. Werner-Allen, G., Dawson-Haggerty, S., and Welsh, M. Lance: Optimizing high-resolution signal collection in wireless sensor networks. In Proceedings of the 6th ACM Conference on Embedded Networked Sensor Systems (SenSys'08), Nov. 2008.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.