Get the logs from the farm via Flume and syslog, MapReduce them in Hive by IP, requests per second, bytes and item, and compare the result with "human" profiles. Get the data on the fly via SQLstream and process it back into Oracle; from there a load balancer can pick up the IPs for a smooth redirect, and I feed the data into a graphing system (connections from each IP). Hourly I check geolocation, whois and provider, using Pig Latin. Ready for first testing in our labs. And, of course, not a really performant task (yet) ;-)
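To make the aggregation step a bit more concrete, here is a minimal Python sketch of the per-IP rollup (peak requests per second, bytes, distinct items) and the comparison against a "human" profile. It is not the Hive job itself. The log format (`epoch_second ip bytes item`), the threshold values and all function names are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical "human" profile: both thresholds are illustrative assumptions.
HUMAN_MAX_REQ_PER_SEC = 5
HUMAN_MAX_BYTES = 50_000_000


def parse_line(line):
    """Parse an assumed 'epoch_second ip bytes item' log line."""
    ts, ip, nbytes, item = line.split()
    return int(ts), ip, int(nbytes), item


def aggregate(lines):
    """Roll up peak requests/second, total bytes and distinct items per IP."""
    per_second = defaultdict(Counter)  # ip -> {second: request count}
    total_bytes = Counter()            # ip -> bytes served
    items = defaultdict(set)           # ip -> distinct items requested
    for line in lines:
        ts, ip, nbytes, item = parse_line(line)
        per_second[ip][ts] += 1
        total_bytes[ip] += nbytes
        items[ip].add(item)
    return {
        ip: {
            "peak_req_per_sec": max(per_second[ip].values()),
            "bytes": total_bytes[ip],
            "items": len(items[ip]),
        }
        for ip in per_second
    }


def suspicious_ips(stats):
    """Return the IPs whose traffic exceeds the assumed human profile."""
    return sorted(
        ip
        for ip, s in stats.items()
        if s["peak_req_per_sec"] > HUMAN_MAX_REQ_PER_SEC
        or s["bytes"] > HUMAN_MAX_BYTES
    )


# Made-up sample: 10.0.0.1 fires six requests within one second,
# 10.0.0.2 browses at a human pace.
SAMPLE = [f"100 10.0.0.1 2048 /item/{i}" for i in range(6)] + [
    "100 10.0.0.2 4096 /item/1",
    "107 10.0.0.2 1024 /item/2",
]

stats = aggregate(SAMPLE)
flagged = suspicious_ips(stats)
print(flagged)
```

In the real pipeline this grouping would run as a Hive query over the Flume-delivered logs, and the flagged IPs would be written to Oracle for the load balancer to pick up.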
Hey, I'm Alex. I founded X-Warp, Infinimesh, Infinite Devices and Scalytics, worked with Cloudera, E.On, Google and Evariant, and have had the incredible luck to build products with outstanding people across the globe.