A Scalable, Ensemble Approach for Building and Visualizing Deep Code-Sharing Networks Over Millions of Malicious Binaries

by Joshua Saxe
Sept. 19, 2017 Black Hat belen_caty

In this talk, I propose an answer: an obfuscation-resilient ensemble similarity analysis approach that addresses polymorphism, packing, and obfuscation by estimating code sharing in multiple static and dynamic technical domains at once, so that it is very difficult for a malware author to defeat all of the estimation functions simultaneously. To make this algorithm scale, I use an approximate feature-counting technique and a feature-hashing trick drawn from the machine-learning domain, allowing fast feature extraction and fast retrieval of a sample's "near neighbors" even when handling millions of binaries.
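To make the idea concrete, here is a minimal Python sketch of feature hashing combined with an ensemble of per-domain similarity estimates. It is an illustration under stated assumptions, not the implementation from the talk: the domain names (`strings`, `api_calls`), the bucket count, the use of CRC32 as the hash, Jaccard similarity as the per-domain estimator, and taking the maximum across domains are all hypothetical choices made for this example. The approximate feature-counting step is not shown.

```python
import zlib

NUM_BUCKETS = 4096  # assumed size of the hashed feature space


def hash_features(features, num_buckets=NUM_BUCKETS):
    """Feature hashing: map arbitrary string features to a fixed set of bucket indices."""
    return {zlib.crc32(f.encode("utf-8")) % num_buckets for f in features}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ensemble_similarity(sample_a, sample_b):
    """Estimate similarity per domain and keep the strongest signal.

    Each sample is a dict mapping a domain name to a list of string features.
    Taking the max is an assumption of this sketch: an author must defeat
    every domain's estimator at once to hide the relationship.
    """
    shared_domains = sample_a.keys() & sample_b.keys()
    scores = [
        jaccard(hash_features(sample_a[d]), hash_features(sample_b[d]))
        for d in shared_domains
    ]
    return max(scores, default=0.0)


def near_neighbors(query, corpus, k=3):
    """Brute-force ranking of corpus samples by ensemble similarity to the query."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: ensemble_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical samples: the packed variant hides its strings but behaves the same.
original = {"strings": ["cmd.exe", "http://example.test/c2"],
            "api_calls": ["CreateFileW", "WriteFile", "connect"]}
packed = {"strings": ["UPX0", "UPX1"],
          "api_calls": ["CreateFileW", "WriteFile", "connect"]}

similarity = ensemble_similarity(original, packed)
```

In this example the static string features of the two samples differ, but the dynamic API-call features match, so the ensemble still reports a strong relationship. The brute-force `near_neighbors` ranking is only for illustration; retrieval over millions of binaries would need an index rather than a linear scan.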

https://www.blackhat.com/us-14/archives.html#a-scalable-ensemble-approach-for-building-and-visual...