CNNs Catch Animals in the Wild

A growing number of researchers are harnessing computer vision techniques to identify the few frames of interest among massive amounts of visual information.

Figure: Deep Meerkat identifies a fleeting image of a bird in the wild.

Ecologists rely on observations in the wild to better understand animal behavior and monitor endangered species. However, it is not an easy task. Many animal habitats are difficult to access, making it expensive and logistically challenging to travel to them and spend time there.

Furthermore, some animals can be secretive, while others cover huge distances every day, sometimes on rough terrain. "It's just not possible to get the kind of data we're interested in at the scale we want by just walking around and looking for them," says Ben Weinstein, a postdoctoral fellow at the University of Florida.

For decades, camera traps have taken the place of human observers. They consist of stationary cameras set up in multiple locations, typically activated when motion sensors detect something entering the field of view. Although such systems have given researchers more eyes on the ground, analyzing the many resulting images has been painstaking work, since it is still largely performed by humans.

Now, Weinstein is one of a growing number of researchers who are harnessing computer vision techniques to help. Weinstein realized there was a simple use for deep learning via convolutional neural networks (CNNs) while he was analyzing images captured during his doctoral research. Since he was trying to determine which species of hummingbirds visit certain plants, cameras were set up in front of the flowers and took a photo every second for up to five days. However, he found that only 1% of the images he'd captured had a hummingbird in them.

"The advantage of using a standard convolutional neural network is that we can quickly screen out the vast majority of frames which are empty," says Weinstein.

The tool he built, called Deep Meerkat, uses a CNN to differentiate between moving objects in the foreground, which are typically of interest, and those in the background, which are not. Weinstein fine-tuned an existing model that had already been trained on an image set to avoid having to train a new one from scratch, then incorporated the CNN into a less complex system he had previously built that uses background subtraction to screen images for new objects of interest.
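The screening step described above can be illustrated with a minimal sketch. It assumes a running-average background model and toy grayscale frames stored as nested lists; the article does not say which background subtraction method Deep Meerkat uses, and the function names and thresholds here are hypothetical. In a full pipeline, the frames this step keeps would then be passed to the fine-tuned CNN.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, rate: float = 0.05) -> Frame:
    """Blend the new frame into a running-average background model."""
    return [
        [(1 - rate) * b + rate * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_fraction(background: Frame, frame: Frame, threshold: float = 25.0) -> float:
    """Fraction of pixels that differ from the background by more than threshold."""
    total = sum(len(row) for row in frame)
    changed = sum(
        1
        for b_row, f_row in zip(background, frame)
        for b, f in zip(b_row, f_row)
        if abs(f - b) > threshold
    )
    return changed / total


def candidate_frames(frames: List[Frame], min_fraction: float = 0.01) -> List[int]:
    """Return indices of frames with enough foreground to pass on to a classifier."""
    background = frames[0]  # assumption: the first frame shows an empty scene
    keep = []
    for i, frame in enumerate(frames[1:], start=1):
        if foreground_fraction(background, frame) >= min_fraction:
            keep.append(i)
        else:
            # Only empty frames update the model, so a lingering animal
            # is not absorbed into the background.
            background = update_background(background, frame)
    return keep


# Toy example: four 4x4 frames, with a bright "bird" pixel in the third.
empty = [[10.0] * 4 for _ in range(4)]
bird = [row[:] for row in empty]
bird[1][1] = 200.0
kept = candidate_frames([empty, empty, bird, empty])
```

Only the third frame (index 2) differs enough from the background to be kept, which mirrors how most empty frames can be discarded before any classifier runs.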

Even with low frame rates and poor image quality, the system was able to correctly identify 95.7% of the frames featuring hummingbirds, while ignoring 76.1% of frames with other background motion. "It was just a simple application that demonstrated the power of this tool," says Weinstein. "Before, people were manually watching multiple days of footage at very high speeds, looking for incredibly fast hummingbirds, so they might miss one."
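The two percentages quoted above correspond to what are usually called the true positive rate (frames with an animal that were kept) and the true negative rate (frames without one that were ignored). The sketch below shows how such rates could be computed from hand-labeled frames; the function name and the toy labels are illustrative and are not taken from Weinstein's evaluation.

```python
from typing import Sequence, Tuple


def screening_rates(has_animal: Sequence[bool], kept: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, true negative rate) for a frame-screening run."""
    positives = sum(has_animal)
    negatives = len(has_animal) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one frame with and one without an animal")
    true_pos = sum(1 for truth, k in zip(has_animal, kept) if truth and k)
    true_neg = sum(1 for truth, k in zip(has_animal, kept) if not truth and not k)
    return true_pos / positives, true_neg / negatives


# Toy labels: two frames contain an animal, four do not.
truth = [True, True, False, False, False, False]
kept = [True, False, True, False, False, False]
tpr, tnr = screening_rates(truth, kept)
```

In this toy run, one of the two animal frames is kept and three of the four empty frames are ignored.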

Still, ecologists often need more detailed information about animals caught on camera, such as which species they belong to. Deep Meerkat, for instance, only recognizes differences between frames, and has no knowledge of what a hummingbird or other animal looks like.

Other deep learning systems are now up to the task when trained on large image data sets. One of the problems users face, however, is that these systems are better at recognizing common species, since there are usually more pictures of them available in training data. Ecologists, by contrast, are typically more interested in spotting new or rare species than in identifying common ones.

At least 50 images of an animal are typically needed for a system to learn to identify a species reasonably well, with about 75% accuracy, says Sara Beery, a doctoral student at the California Institute of Technology and a research intern at Google. If a species is easy to differentiate, 20 images may be all that's needed, she adds. In the iNaturalist dataset, for example, a widely used collection of 859,000 images covering more than 5,000 plant and animal species, many species don't meet these requirements. "Below that cut-off, you just basically can't guarantee anything," she says.

One way to compensate for a lack of images is to use synthetic data. Three-dimensional (3D) models of animals can be created using a game engine and then used to train deep learning systems. In one study, computer vision researchers used this technique to help conservationists study Grevy's zebra, one of the most endangered species in Africa. Using the few available images of the animals, the researchers were able to add realistic fur patterns to their models, which helped optimize them. The resulting 3D renderings captured the animals' poses, which are of interest to researchers since they provide information about the animals' health and behavior.

In many cases, Beery and her colleagues found that simpler types of simulated data can be just as effective. A technique called splatting, which involves cutting out animals from just a few images and pasting them on empty camera trap backgrounds, can work just as well to boost the amount of training data. In a recent paper, they showed that synthetic data improves the performance of algorithms for rare species, especially when there are lots of varied examples.
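The cut-and-paste idea behind splatting can be sketched in a few lines. This is a toy illustration, not the method from Beery's paper: images are nested lists of pixel values, `None` marks transparent pixels in the cut-out, and the `splat` and `random_splats` helpers are hypothetical names. It also assumes the cut-out fits inside the background.

```python
import random
from typing import List, Optional

Image = List[List[int]]
Cutout = List[List[Optional[int]]]  # None marks transparent pixels


def splat(background: Image, cutout: Cutout, top: int, left: int) -> Image:
    """Paste the opaque pixels of a cut-out onto a copy of the background."""
    result = [row[:] for row in background]
    for dy, row in enumerate(cutout):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            inside = 0 <= y < len(result) and 0 <= x < len(result[y])
            if pixel is not None and inside:
                result[y][x] = pixel
    return result


def random_splats(background: Image, cutout: Cutout, count: int, seed: int = 0) -> List[Image]:
    """Create several synthetic images by pasting the cut-out at random positions."""
    rng = random.Random(seed)
    max_top = len(background) - len(cutout)
    max_left = len(background[0]) - len(cutout[0])
    return [
        splat(background, cutout, rng.randint(0, max_top), rng.randint(0, max_left))
        for _ in range(count)
    ]


# Toy example: a 3x4 empty scene and a small L-shaped "animal".
scene = [[0] * 4 for _ in range(3)]
animal = [[None, 9], [9, 9]]
pasted = splat(scene, animal, top=1, left=2)
synthetic = random_splats(scene, animal, count=3)
```

Each synthetic image reuses the same empty background, so a handful of cut-outs can be turned into many varied training examples.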

Zero-shot learning is another approach that can help identify rare animals by relying on a comparison of attributes from different species. A dataset called Animals with Attributes 2, for example, consists of images of 50 types of animals, each described by a set of key attributes. By looking for shared and new attributes, trained CNNs can then attempt to classify previously unseen animals.
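One common way to frame attribute-based zero-shot classification is as a nearest-signature lookup: a model predicts attribute scores for an image, and the image is assigned to the species whose known attribute signature is closest. The sketch below assumes that framing. The attribute names, signatures, and scores are invented for illustration and are not taken from Animals with Attributes 2, and the attribute predictor itself (a CNN in practice) is left out.

```python
from typing import Dict, List

# Hypothetical attribute order: stripes, hooves, wings, lives in water.
SIGNATURES: Dict[str, List[float]] = {
    "zebra": [1, 1, 0, 0],
    "dolphin": [0, 0, 0, 1],
    "bat": [0, 0, 1, 0],
}


def classify_by_attributes(predicted: List[float], signatures: Dict[str, List[float]]) -> str:
    """Return the class whose attribute signature is closest to the predicted scores."""

    def distance(signature: List[float]) -> float:
        # Squared Euclidean distance between predicted scores and a signature.
        return sum((p - s) ** 2 for p, s in zip(predicted, signature))

    return min(signatures, key=lambda name: distance(signatures[name]))


# Scores an attribute predictor might output for an image of an unseen species.
scores = [0.9, 0.8, 0.1, 0.0]
label = classify_by_attributes(scores, SIGNATURES)
```

The species never has to appear in the training images; only its attribute signature must be known, which is why the approach is appealing for rare animals.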

However, the technique is still not accurate enough to be trusted on its own, according to Beery. She thinks humans still need to be part of the process for all machine learning approaches at the moment. The goal now is to use humans efficiently. "I don't think [humans] need to look at every single image anymore," she says. "It's a case of treating humans as a valuable resource, and using as little of their time as possible and building networks that are as accurate as possible."
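One simple way to spend human time sparingly is to route only low-confidence predictions to a reviewer. The sketch below assumes a classifier that reports a confidence score with each label; the `triage` function and the 0.9 threshold are hypothetical and do not describe Beery's own workflow.

```python
from typing import List, Tuple

Prediction = Tuple[str, str, float]  # (image id, predicted label, confidence)


def triage(predictions: List[Prediction], threshold: float = 0.9):
    """Split predictions into auto-accepted labels and images needing human review."""
    accepted, review = [], []
    for image_id, label, confidence in predictions:
        target = accepted if confidence >= threshold else review
        target.append((image_id, label))
    return accepted, review


# Toy predictions from a hypothetical classifier.
preds = [
    ("img_001", "zebra", 0.98),
    ("img_002", "empty", 0.95),
    ("img_003", "hyena", 0.55),
]
accepted, review = triage(preds)
```

Raising the threshold sends more images to people and lowers the risk of accepting a wrong label; lowering it does the opposite.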

Both Beery and Weinstein aim to create tools that can be shared more widely among ecologists. Deep Meerkat, for example, can be downloaded online, and Weinstein says he sees three to five downloads a week for a variety of projects. "One day it's great white sharks in South Africa, another day it's butterflies in Cameroon," he says.

By working with Microsoft AI for Earth and Google Research, Beery is able to apply her research to create open source tools. For example, she found that adding bounding boxes around animals in training images helps machine learning systems detect animals, rather than finding other correlations that don't generalize to images from new locations. Funding from Microsoft allowed her and her team to gather large amounts of data from different organizations, which was then annotated with these boxes. The data was used to train a CNN, which is now available online for biologists to use.

Beery is keen to continue developing models that work well globally. In recent work, she found that using contextual information from other images captured at the same location boosts CNN performance on difficult frames, for example when animals are too close to or too far from the camera, or when image quality is poor. By training a model with lots of different data from one location, such as camera trap images, weather information, and satellite data from Kenya, Beery hopes to determine how it can be leveraged to perform well in a new location.

"There's no way that I'm going to have training data covering the entire globe, so how do we take the models that we have and get them to work as well as possible at a global scale?" asks Beery. "I'm really passionate about that."

Sandrine Ceurstemont is a freelance science writer based in London, U.K.

Communications of the ACM (CACM) is now a fully Open Access publication.