Hadoop, mail # user - Temporal query


Re: Temporal query
Ravi Prakash 2012-03-29, 16:01
Also check out Hadoop Rumen

On Thu, Mar 29, 2012 at 10:22 AM, Tom Deutsch <[EMAIL PROTECTED]> wrote:

> Matthieu - you are welcome to contact me off list for assistance with Jaql.
>
> ---------------------------------------
> Sent from my Blackberry so please excuse typing and spelling errors.
>
>
> ----- Original Message -----
> From: Robert Evans [[EMAIL PROTECTED]]
> Sent: 03/29/2012 10:09 AM EST
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "
> [EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Temporal query
>
>
>
> I am not aware of anything that does this for you directly, but it should
> not be too difficult to write what you want using Pig or Hive.  I am not as
> familiar with Jaql, but I assume you can do it there too.  It might actually
> be simpler to write in Map/Reduce, because you can abuse Map/Reduce in ways
> that the higher-level languages disallow so that they can do optimizations.
>
> What I would do is have the mapper scan through each entry, looking for
> transitions of $value around $threshold and the times at which they
> occurred.  You can then look for 30+ second windows where $value >
> $threshold within that partition and output them to the reducer.  The trick
> is that you need to pay special attention to the beginning and end of the
> partition: you also need to send the reducer the state at the beginning and
> end of each partition, and how long it was in that state.  The reducer can
> then combine these pieces and see whether they meet the 30+ second
> criterion; if so, output them with the rest, otherwise don't.  The windows
> already known to exceed 30 seconds can be sent to any reducer, so they can
> have any key, but for the boundary transitions to stitch together correctly
> they need to go to a single reducer, so they should have a very specific
> key.  You could also try to divide them up if you have to scale very, very
> large, but that would be rather difficult to get right.
>
> --Bobby Evans
>
>
> On 3/29/12 4:02 AM, "banermatt" <[EMAIL PROTECTED]> wrote:
>
>
>
> Hello,
>
> I'm developing a log file anomaly detection system on a Hadoop cluster.
> I'm looking for a way to process a query like: "select all values when
> value > threshold for a duration > 30 seconds". Do you know a tool which
> could help me process such a query?
> I read up on the scripting languages Pig, Hive, and Jaql, which seem to
> have very similar applications. I tried them but was not able to do what I
> want.
>
> Thank you in advance,
>
> Matthieu
>
> --
> View this message in context:
> http://old.nabble.com/Temporal-query-tp33544869p33544869.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
>
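The windowing logic Bobby describes can be sketched in a few lines. This is a minimal single-process sketch, not Hadoop code: it scans one ordered partition of (timestamp, value) samples and reports maximal runs where value > threshold lasting at least the minimum duration. The function name, sample data, and parameter values are illustrative; in a real job each mapper would run this over its own partition, emit complete windows plus the run state at its partition boundaries under a fixed key, and a single reducer would stitch the boundary runs together as described above.

```python
def find_windows(samples, threshold, min_duration):
    """samples: list of (timestamp_seconds, value), sorted by timestamp.
    Returns a list of (start, end) timestamps for maximal runs where
    value > threshold that last at least min_duration seconds."""
    windows = []
    run_start = None           # timestamp where the current run began
    last_ts = None             # last timestamp seen inside the run
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            last_ts = ts
        else:
            # run ended: keep it only if it was long enough
            if run_start is not None and last_ts - run_start >= min_duration:
                windows.append((run_start, last_ts))
            run_start = None
    # close a run that extends to the end of the partition; in the
    # MapReduce version this is the boundary state sent to the reducer
    if run_start is not None and last_ts - run_start >= min_duration:
        windows.append((run_start, last_ts))
    return windows


# Example: values exceed threshold 4 from t=10 through t=40 (30 seconds)
samples = [(0, 1), (10, 5), (20, 6), (30, 7), (40, 8), (50, 2), (60, 9)]
print(find_windows(samples, threshold=4, min_duration=30))  # [(10, 40)]
```

The run that starts at t=60 is cut off by the end of the data, which is exactly the partition-boundary case the reducer has to resolve: a mapper cannot know on its own whether such a run continues into the next partition.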