att_abstract={{Analyzing huge amounts of log-data is often a difficult task, especially if it has to be done in real-time (e.g., fraud detection) or when large amounts of stored data are required for the analysis. Graphs are a data structure often used in log analysis. Examples are clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that allow a high throughput of updates with very low latency. In this paper, we present a distributed graph mining system that is able to process around 39 million log entries per second on a 50-node cluster while providing processing latencies below 10 ms. We validate our approach by presenting two example applications, namely telephony fraud detection and internet attack detection. A thorough evaluation proves the scalability and near real-time properties of our system.}},
	att_categories={C_NSS.3, C_IIS.2, C_NSS.4},
	att_copyright_notice={{(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in SLAML: Workshop on Managing Large-Scale Systems via the Analysis of System Logs and the Application, 2011-10-23.}},
	att_tags={COI, data analysis, log analysis, netflow},
	author={Stefan Weigert and Matti Hiltunen and Christof Fetzer},
	institution={{SLAML: Workshop on Managing Large-Scale Systems via the Analysis of System Logs and the Application}},
	title={{Mining large distributed log-data in near real-time}},