Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Searching for a string off a group by query


+
Tharindu Mathew 2012-07-16, 20:56
+
John Omernik 2012-07-17, 03:20
+
Tharindu Mathew 2012-07-17, 09:32
Copy link to this message
-
Re: Searching for a string off a group by query
Then I think the array_contains is your best bet.  See my example query
below.  I used a subquery (always handy) also I replaced count(activityID)
in your SQL with count(1)  The results returned are exactly the same (at
least in my data set) and for some reason, my cluster seemed to run faster
with count(1) I am not sure why.  That and it didn't make sense to me to
count the field I am grouping on, it's confusing, are you looking for the
count of that grouped field or the count of the number of grouped fields
(obviously the first is returned, but from a "reading" point of view a
person reading/viewing your query may be confused).

Original
select activityId, count(activityId), *find_in_set("CCC",
collect_set(msgBody))* from ActivityStream group by activityId;

Suggested:
select * from (
    select
        activityId,
        count(1) as activityid_count,
        collect_set(msgBody) as msg_bodies
    from
        ActivityStream
    group by activityId
) b where array_contains(msg_bodies, "CCC") > 0
On Tue, Jul 17, 2012 at 4:32 AM, Tharindu Mathew <[EMAIL PROTECTED]>wrote:

> I'm basically trying to search for a string (ex: "CCC") for a set of
> strings (ex: * *collect_set(msgBody)*)* that comes as a result of a group
> by query.
>
>
> On Tue, Jul 17, 2012 at 8:50 AM, John Omernik <[EMAIL PROTECTED]> wrote:
>
>> Not sure what you are trying to do, but you may want to check out the
>> array_contains function. Also, if you are using Hive 9 you can use the
>> concat_ws() function.  This is taken from a google search:
>>
>> select concat_ws(‘.’, array(‘www’,’apache’,’org’)) from src limit 1;
>> www.apache.org
>>
>>
>> https://cwiki.apache.org/Hive/presentations.data/WhatsNewInHive090HadoopSummit2012BoF.pdf
>>
>> On the array_contains:
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-CollectionFunctions
>>
>> boolean array_contains(Array<T>, value)Returns TRUE if the array
>> contains value
>>
>>
>>
>> On Mon, Jul 16, 2012 at 3:56 PM, Tharindu Mathew <[EMAIL PROTECTED]>wrote:
>>
>>> Hi everyone,
>>>
>>> I'd like to do $subject and was approaching it with the following query:
>>>
>>> select activityId, count(activityId), *find_in_set("CCC",
>>> collect_set(msgBody))* from ActivityStream group by activityId;
>>>
>>> But find_in_set doesn't seem to accept arrays. Is there a way to cast
>>> this string array into a string list or a string so I can conduct a search?
>>> Maybe, there's another way to do this.
>>>
>>> Thanks in advance.
>>>
>>> --
>>> Regards,
>>>
>>> Tharindu
>>>
>>> blog: http://mackiemathew.com/
>>>
>>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>