NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> NEED HELP in Hive Query


yogesh dhari 2012-10-14, 14:54
chyi-kwei yau 2012-10-14, 15:31
yogesh dhari 2012-10-14, 17:03
chyi-kwei yau 2012-10-14, 17:24
yogesh dhari 2012-10-14, 17:57

Re: NEED HELP in Hive Query
B = group A by (name, date, url);
-- B now has 2 fields: "group", which is a tuple of (name, date, url),
-- and "A", which is a bag of the tuples from A that share the same
-- name-date-url
-- try "illustrate B" or "describe B" to see what that looks like

counts = foreach B generate flatten(group) as (name, date, url),
COUNT_STAR(A) as num_entries;
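COUNT_STAR counts the rows in each group. The question further down asks for the hit counts to be added up instead, so the same flatten(group) pattern would apply SUM to the hit field. A minimal sketch, reusing the load statement from the quoted message; the chararray types for name, date and url, and the aliases totals and total_hits, are assumptions for illustration (the original only typed hit as INT):

```pig
-- Load the Hive output file (fields separated by Ctrl-A, '\u0001').
-- Assumption: name, date and url are chararray; the thread only typed hit.
A = load '/File/000000_0' using PigStorage('\u0001')
        as (name:chararray, date:chararray, url:chararray, hit:int);

-- One group per distinct (name, date, url) combination.
B = group A by (name, date, url);
describe B;

-- flatten(group) turns the key tuple back into three top-level columns;
-- SUM(A.hit) adds the hit counts of every row in the group.
-- totals / total_hits are hypothetical alias names.
totals = foreach B generate flatten(group) as (name, date, url),
         SUM(A.hit) as total_hits;

dump totals;
```

Unlike flatten(A.name), flatten(A.date), flatten(A.url), which produces the cross product of three bags (n^3 identical rows for a group of n rows) and therefore needs a DISTINCT afterwards, flatten(group) yields exactly one row per group.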

Dmitriy

On Sun, Oct 14, 2012 at 10:57 AM, yogesh dhari <[EMAIL PROTECTED]> wrote:
>
> Thanks Chyikwei :-)
>
> I got it now :-). Is there another method that doesn't use flatten(A.name) and so on?
>
> A = load '/File/000000_0' using PigStorage('\u0001')
>        as (name, date, url, hit:INT);
>
> B = group A by (name, date, url);
>
> C = foreach B generate flatten(A.name), flatten(A.date), flatten(A.url), SUM(A.hit);
>
> D = distinct C;
>
> Dump D;
>
> Thanks & Regards
> Yogesh Kumar Dhari
>
>> Date: Sun, 14 Oct 2012 13:24:27 -0400
>> Subject: Re: NEED HELP in Hive Query
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>> Hi yogesh,
>>
>> The result of "group by" should look like:
>> {group: (group keys),  { (instance1) , (instance2)  } }
>>
>> For example:
>>
>> If A looks like:
>> A: {name: chararray,age: int,gpa: float}
>>
>> After "B = GROUP A BY age;",
>>
>> B will become:
>> B: {group: int, A: {name: chararray,age: int,gpa: float}}
>>
>> Then you can use
>> FOREACH B Generate.....
>> To get the result you want.
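For instance, a FOREACH over that grouped relation might look like the sketch below. The input path 'student' and the output field names are hypothetical; the schema is the one from the example above:

```pig
-- Hypothetical input path; schema taken from the example above.
A = load 'student' as (name:chararray, age:int, gpa:float);

-- B has two fields: group (the age) and A (a bag of the students with that age).
B = group A by age;

-- Aggregates are applied to the bag A, or to one field of it such as A.gpa.
C = foreach B generate group as age,
                       COUNT_STAR(A) as num_students,
                       AVG(A.gpa) as avg_gpa;

dump C;
```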
>>
>> If my explanation is not clear, take a look at
>> http://pig.apache.org/docs/r0.10.0/basic.html#GROUP
>>
>> Hope this helps.
>>
>> Best,
>> Chyi-Kwei
>>
>> On Sun, Oct 14, 2012 at 1:03 PM, yogesh dhari <[EMAIL PROTECTED]> wrote:
>> >
>> > Hi Chyi-Kwei,
>> >
>> > Thanks for the help. I think I didn't explain my question clearly.
>> >
>> > The query you wrote counts the number of occurrences of the same NAME, DATE and URL,
>> > but it doesn't add up the hit counts under the same name, date and url.
>> >
>> > I want a result like this, where the last column is the sum of all hit counts
>> > for the same name, date and url:
>> >
>> >     timesascent.in, 2008-08-27, http://timesascent.in/, 98                        (37+17+17+27+....)
>> >     timesascent.in, 2008-08-27, http://timesascent.in/section/2/Interviews, 29    (15+14)
>> >     .
>> >     .
>> >     .
>> >
>> > from the file below:
>> >
>> > NAME    DATE    URL    HITCOUNT
>> > timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=tparchives    15
>> > timesascent.in    2008-08-27    http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26    20
>> > timesascent.in    2008-08-27    http://timesascent.in/    37
>> > timesascent.in    2008-08-27    http://timesascent.in/section/39/Job%20Wise    14
>> > timesascent.in    2008-08-27    http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html    20
>> > timesascent.in    2008-08-27    http://timesascent.in/    17
>> > timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    15
>> > timesascent.in    2008-08-27    http://timesascent.in/    17
>> > timesascent.in    2008-08-27    http://timesascent.in/    27
>> > timesascent.in    2008-08-27    http://timesascent.in/    37
>> > timesascent.in    2008-08-27    http://timesascent.in/    27
>> > timesascent.in    2008-08-27    http://www.timesascent.in/    16
>> > timesascent.in    2008-08-27    http://timesascent.in/section/2/Interviews    14
>> > timesascent.in    2008-08-27    http://timesascent.in/    14
>> > timesascent.in    2008-08-27    http://timesascent.in/    22
>> >
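Working the expected totals out by hand from the rows above, grouping on the exact url string (so http://www.timesascent.in/ forms its own group, separate from http://timesascent.in/):

```latex
\begin{aligned}
\texttt{http://timesascent.in/} &: 37+17+17+27+37+27+14+22 = 198\\
\texttt{http://timesascent.in/section/2/Interviews} &: 15+14 = 29\\
\texttt{http://www.timesascent.in/} &: 16
\end{aligned}
```

The first total covers all eight rows with that url; 98 is the sum of only the first four terms.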
>> >
>> > Please help me write a query for this in Hive and in Pig.
>> >
>> > Thanks & Regards
>> > Yogesh Kumar Dhari
>> >
>> >> Date: Sun, 14 Oct 2012 11:31:00 -0400
chyi-kwei yau 2012-10-14, 20:03