JPL Creates PDF Archive to Aid Malware Research

By Jet Propulsion Laboratory

June 16, 2023

Data scientists at the U.S. National Aeronautics and Space Administration's Jet Propulsion Laboratory (JPL) have compiled 8 million PDF files into an open source archive for enhancing online security.

The corpus is part of the Defense Advanced Research Projects Agency (DARPA) Safe Documents program.

Experts can search the archive for malware that may be concealed within a file's code, which could help them predict emerging online threats and improve PDF technology.

The researchers identified the PDFs for inclusion using Common Crawl, a public repository of Web-crawl data, while specialized software re-fetched truncated files.
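As a rough illustration of how a truncated PDF might be flagged for re-fetching, the sketch below checks whether a file's final bytes contain the `%%EOF` marker that normally closes a PDF. This is not JPL's actual tooling: the function name `looks_truncated` and the 1,024-byte tail window are assumptions made for the example, and a real pipeline would likely apply more thorough checks.

```python
def looks_truncated(data: bytes, tail_window: int = 1024) -> bool:
    """Heuristic: treat a PDF as truncated if no %%EOF marker appears near its end.

    `tail_window` is an assumed search window, not a value taken from the article.
    """
    # A well-formed PDF normally ends with an %%EOF line; a crawl that cut
    # the download short will usually have lost it.
    return b"%%EOF" not in data[-tail_window:]


if __name__ == "__main__":
    complete = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\n%%EOF\n"
    cut_off = b"%PDF-1.7\n1 0 obj\n<<>>\nendo"
    print(looks_truncated(complete))
    print(looks_truncated(cut_off))
```

Files flagged this way would be candidates for downloading again from their original URLs. The check is only a heuristic, since a damaged file can still happen to contain the marker.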

The approximately 8-terabyte dataset is the largest publicly available corpus of its type.

From Jet Propulsion Laboratory


Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA