att_abstract={The sheer volume of new malware found each day is enormous.
Worse, current trends show the amount of malware is doubling
each year. This exponential growth is quickly outpacing existing
malware analysis techniques. As a result, defenders in practice are
constantly faced with hard choices about which malware to analyze
and how much analysis should be done given the limited computing
resources available.
In this paper we propose efficient techniques for identifying malware
families and variants in order to provide scalable malware
triage capabilities. By providing the capability to quickly compare
and cluster malware into families, we enable defenders to make
more intelligent choices on which subsequent (potentially expensive)
analysis to perform, and on which samples. In particular, we
do not claim 100% accuracy, but instead strive for a balance that
maintains high accuracy for common variants but provides significantly
better scalability than previous approaches.
At the core of our work is an algorithm called BitShred for fast
similarity detection within binary code. We have implemented BitShred,
and show that it is several factors to several orders of magnitude
faster than previous approaches and can take advantage of distributed
resources such as Hadoop, all while identifying malware families with
accuracy similar to that of previous approaches. We
also note that our techniques are applicable in other settings, e.g.,
measuring similarity based upon dynamic traces, and automatic
code reuse detection (such as plagiarism) in binary code.},
	att_copyright_notice={(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Conference on Computer and Communications Security, 2011-10-04.},
	att_tags={malware clustering, security},
	author={Jiyong Jang AND David Brumley AND Shobha Venkataraman},
	institution={{ACM Conference on Computer and Communications Security}},
	title={{BitShred: Feature Hashing Malware for Scalable Triage and Semantic Analysis}},