att_abstract={{Identifying real-world business communities (e.g., energy, finance, defense) in Internet traffic is a challenging problem, but solving it would be valuable, for example, for constructing better intrusion detection techniques. Seed-based community detection identifies a community in a graph by iteratively adding the ‘closest’ vertices to an initial set of seed vertices that are known to belong to the community. Previous research focused on unambiguous networks, where edges describe a specific intention in a fixed domain (e.g., a ‘friend’ in a social network), and on tightly-knit communities whose members are better connected to each other (‘close’) than to the rest of the network. However, looking at a complete day of raw Internet traffic, we found that (1) the intent of a communication is ambiguous (e.g., ad downloads are indistinguishable from web-page downloads) and (2) real-world industries manifest themselves as loosely-coupled communities, i.e., with more edges to non-community members than to community members. We present a new seed-based community detection algorithm that provides higher precision and recall in our setting than the related work. Using three sample industries, we show that this enables the detection of loosely-knit communities. For instance, our solution detected 111 individual energy companies with only 6 false positives, starting from eight ISOs (Independent System Operators) and RTOs (Regional Transmission Operators) in the US.}},
	att_copyright_notice={{This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2014. {{, 2014-08-11}}}},
	att_tags={community detection},
	author={Stefan Weigert and Matti Hiltunen and Christof Fetzer},
	institution={{The 2014 IEEE/WIC/ACM International Conference on Web Intelligence}},
	title={{Finding the Needle in the Haystack: Identifying Business Communities in Internet Traffic}},