Re: Temporal query
I am not aware of any tool that does this for you directly, but it should not be too difficult to write what you want using Pig or Hive.  I am not as familiar with Jaql, but I assume you can do it there too.  It might actually be simpler to write it in plain Map/Reduce, though, because Map/Reduce can be abused in ways that the higher-level languages disallow so that they can do their optimizations.

What I would do is have the mapper scan through each entry in its partition, looking for transitions of $value across $threshold and the times at which they occurred.  Within that partition it can then find the windows of more than 30 seconds where $value > $threshold and output them to the reducer.

The trick is that you need to pay special attention to the beginning and end of each partition, because a window can cross a partition boundary.  So the mapper also has to send the reducer the state at the beginning and end of its partition and how long it was in that state.  The reducer can then combine these pieces and check whether they meet the 30+ second criterion; if so it outputs them with the rest, otherwise it drops them.  The windows already known to exceed 30 seconds can go to any reducer, so they can have any key, but for the transitions to be stitched together correctly they must all go to a single reducer, so they need one specific key.  You could also try to split the transitions across several reducers if you have to scale very, very large, but that would be rather difficult to get right.
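To make the boundary handling concrete, here is a minimal single-process Python sketch of that mapper/reducer split. No Hadoop API is involved; the shuffle is simulated with plain lists. It assumes each partition is a contiguous, time-ordered slice of (timestamp in seconds, value) samples, that partition indices follow time order, that a run's duration is measured from its first to its last sample above the threshold, and that "more than 30 seconds" is strict. All names (map_partition, reduce_boundaries, BOUNDARY_KEY, run_job, sequential) are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)
Run = List[Sample]            # consecutive samples with value > threshold

# Hypothetical key: every boundary summary must reach the same single reducer.
BOUNDARY_KEY = "__boundary__"


def runs_above(samples: List[Sample], threshold: float) -> List[Run]:
    """Split time-ordered samples into maximal runs where value > threshold."""
    runs: List[Run] = []
    current: Run = []
    for ts, value in samples:
        if value > threshold:
            current.append((ts, value))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def duration(run: Run) -> float:
    """Seconds from the first to the last sample of a run."""
    return run[-1][0] - run[0][0]


def map_partition(part_id: int, samples: List[Sample],
                  threshold: float, min_duration: float) -> list:
    """Mapper: complete long runs under any key, edge state under BOUNDARY_KEY."""
    if not samples:
        return []
    runs = runs_above(samples, threshold)
    head = runs[0] if samples[0][1] > threshold else None   # run touching the start
    tail = runs[-1] if samples[-1][1] > threshold else None  # run touching the end
    whole = head is not None and head is tail                # above for the whole partition
    out = []
    for run in runs:
        # Runs touching neither edge are already complete: emit long ones under any key.
        if run is not head and run is not tail and duration(run) > min_duration:
            out.append((part_id, run))
    # Always report the edge state, even when both edges are below the threshold.
    out.append((BOUNDARY_KEY, (part_id, head, tail, whole)))
    return out


def reduce_boundaries(summaries: list, min_duration: float) -> List[Run]:
    """Single reducer: stitch edge fragments of adjacent partitions together."""
    results: List[Run] = []
    open_run: Optional[Run] = None  # run still above threshold where the last partition ended

    def finish(run: Optional[Run]) -> None:
        if run is not None and duration(run) > min_duration:
            results.append(run)

    for _, head, tail, whole in sorted(summaries, key=lambda s: s[0]):
        if whole:
            open_run = (open_run or []) + head  # run continues through this partition
            continue
        if head is not None:
            finish((open_run or []) + head)     # carried run ends inside this partition
        else:
            finish(open_run)                    # starts at/below threshold: carried run ended
        open_run = tail
    finish(open_run)
    return results


def run_job(partitions: List[List[Sample]], threshold: float,
            min_duration: float) -> List[Run]:
    """Simulate map -> shuffle -> reduce; partition order must follow time order."""
    complete: List[Run] = []
    summaries = []
    for part_id, samples in enumerate(partitions):
        for key, value in map_partition(part_id, samples, threshold, min_duration):
            if key == BOUNDARY_KEY:
                summaries.append(value)
            else:
                complete.append(value)
    found = complete + reduce_boundaries(summaries, min_duration)
    return sorted(found, key=lambda run: run[0][0])


def sequential(partitions: List[List[Sample]], threshold: float,
               min_duration: float) -> List[Run]:
    """Single-machine reference answer, handy for cross-checking the job."""
    samples = [s for part in partitions for s in part]
    return [r for r in runs_above(samples, threshold) if duration(r) > min_duration]


if __name__ == "__main__":
    parts = [
        [(0, 5), (10, 12), (20, 15)],        # ends above the threshold
        [(30, 11), (40, 13), (50, 14)],      # above for the whole partition
        [(60, 12), (70, 3), (80, 20), (100, 25), (125, 30), (130, 1)],
        [(140, 11), (150, 2)],
    ]
    print([(r[0][0], r[-1][0]) for r in run_job(parts, threshold=10, min_duration=30)])
    # [(10, 60), (80, 125)]
```

Each partition always emits one boundary summary, even when it starts and ends below the threshold, so the single reducer can tell a run that continues into the next partition from one that was interrupted. The sketch ships the edge fragments' full sample lists to that reducer to keep it short; if long stretches stay above the threshold, sending only each fragment's first and last timestamps and fetching the values in a later pass would keep that reducer's input small.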

--Bobby Evans
On 3/29/12 4:02 AM, "banermatt" <[EMAIL PROTECTED]> wrote:

Hello,

I'm developing a log file anomaly detection system on a Hadoop cluster.
I'm looking for a way to process queries like: "select all values when
value>threshold for a duration>30 seconds". Do you know a tool which could
help me process such a query?
I read up on the scripting languages Pig, Hive and Jaql, which seem to have
very similar applications. I tried them, but I was not able to do what I
want.

Thank you in advance,

Matthieu

--
View this message in context: http://old.nabble.com/Temporal-query-tp33544869p33544869.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.