Hadoop >> mail # user >> Temporal query

Re: Temporal query
Also check out Hadoop Rumen

On Thu, Mar 29, 2012 at 10:22 AM, Tom Deutsch <[EMAIL PROTECTED]> wrote:

> Matthieu - you are welcome to contact me off list for assistance with Jaql.
>
> ---------------------------------------
> Sent from my Blackberry so please excuse typing and spelling errors.
>
>
> ----- Original Message -----
> From: Robert Evans [[EMAIL PROTECTED]]
> Sent: 03/29/2012 10:09 AM EST
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Temporal query
>
>
>
> I am not aware of any tool that does this for you directly, but it should
> not be too difficult for you to write what you want using Pig or Hive.  I
> am not as familiar with Jaql, but I assume that you can do it there too.
> It might be simpler to write it using Map/Reduce, though, because we can
> abuse Map/Reduce in ways that the higher-level languages disallow so that
> they can do optimizations.
>
> What I would do is have the mapper scan through each entry and look for
> transitions of $value around $threshold, and the times at which they occurred.
> You can then look for 30+ second windows where $value > $threshold within
> that partition and output them to the reducer.  The trick is that
> you need to pay special attention to the beginning and end of the
> partition.  You also need to send the reducer the state at the beginning
> and end of each partition and how long it was in that state.  The reducer
> can then combine these pieces and see whether they meet the 30+ second
> criterion.  If so, output them with the rest; otherwise don't.  The windows
> already known to be > 30 seconds can be sent to any reducer, so they can have
> any key, but for the transitions to work correctly you need to send them to
> a single reducer, so they should all share one specific key.  You could also
> try to divide them up if you have to scale very large, but that would
> be rather difficult to get right.
>
> --Bobby Evans
>
>
> On 3/29/12 4:02 AM, "banermatt" <[EMAIL PROTECTED]> wrote:
>
>
>
> Hello,
>
> I'm developing a log file anomaly detection system on a Hadoop cluster.
> I'm looking for a way to process queries like: "select all values when
> value > threshold for a duration > 30 seconds". Do you know of a tool that could
> help me process such a query?
> I have read up on the scripting languages Pig, Hive and Jaql, which seem to have
> very similar applications. I tried them but was not able to do what I
> want.
>
> Thank you in advance,
>
> Matthieu
>
> --
> View this message in context:
> http://old.nabble.com/Temporal-query-tp33544869p33544869.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
>
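
For readers of the archive, the mapper/reducer approach Bobby Evans describes above can be sketched roughly as follows. This is a plain-Python simulation, not Hadoop code and not taken from the thread. The function names, the "final" and "boundary" keys, the threshold of 100, and the choice to measure a window from its first to its last above-threshold sample are all assumptions made for illustration. It also assumes that each partition is non-empty and time-ordered, and that partitions are numbered in time order.

```python
from itertools import groupby

THRESHOLD = 100.0   # hypothetical threshold, not from the thread
MIN_DURATION = 30   # seconds, as in the original question


def map_partition(index, samples, threshold=THRESHOLD, min_duration=MIN_DURATION):
    """Mapper sketch: samples is a time-ordered list of (timestamp, value)."""
    out = []
    last = len(samples) - 1
    pos = 0
    # groupby splits the partition into consecutive runs above/below threshold.
    for above, group in groupby(samples, key=lambda s: s[1] > threshold):
        run = list(group)
        first_pos, last_pos = pos, pos + len(run) - 1
        pos += len(run)
        if not above:
            continue
        start, end = run[0][0], run[-1][0]
        touches_start = first_pos == 0
        touches_end = last_pos == last
        if touches_start or touches_end:
            # The run may continue in a neighbouring partition, so every such
            # fragment goes to the single reducer that handles this one key.
            out.append(("boundary", (index, start, end, touches_start, touches_end)))
        elif end - start >= min_duration:
            # Fully contained and long enough: any reducer could take it.
            out.append(("final", (start, end)))
    return out


def reduce_boundary(fragments, min_duration=MIN_DURATION):
    """Reducer sketch for the 'boundary' key: stitch fragments, then filter."""
    merged = []  # entries are [partition_index, start, end, touches_end]
    for index, start, end, t_start, t_end in sorted(fragments):
        prev = merged[-1] if merged else None
        if prev and t_start and prev[3] and prev[0] + 1 == index:
            # Previous fragment ran to the end of the preceding partition and
            # this one begins at the start of the next: same window.
            merged[-1] = [index, prev[1], end, t_end]
        else:
            merged.append([index, start, end, t_end])
    return [(s, e) for _, s, e, _ in merged if e - s >= min_duration]


def find_windows(partitions):
    """Drive the simulated map and reduce phases over a list of partitions."""
    final, boundary = [], []
    for index, samples in enumerate(partitions):
        for key, value in map_partition(index, samples):
            (final if key == "final" else boundary).append(value)
    return sorted(final + reduce_boundary(boundary))


if __name__ == "__main__":
    partitions = [
        [(0, 50), (10, 150), (20, 150), (30, 150), (40, 150), (50, 50)],
        [(60, 50), (70, 50), (80, 50), (90, 50), (100, 150), (110, 150)],
        [(120, 150), (130, 150), (140, 50), (150, 150), (160, 50), (170, 50)],
    ]
    print(find_windows(partitions))
```

In this sample data the window from 100 to 130 is split across the second and third partitions, so neither mapper can decide on its own whether it qualifies. Only the boundary reducer sees both fragments, which is why those records have to share a single key.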