Hive, mail # user - complex datatypes filling


Re: complex datatypes filling
Stephen Sprague 2014-01-17, 16:54
@OP, that I don't know.

Personally, I'd keep it simple and just run your three jobs independently
(i.e. concurrently) and let the OS or hardware do whatever caching is possible.

That's just my 2 cents though.
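
One thing that may be worth a look, though: Hive's multi-table INSERT can
scan the source once and feed several aggregations from that single pass.
A rough sketch against the tables in the queries quoted below (untested,
assuming the same columns):

  FROM raw_data_by_epoch
  INSERT INTO TABLE temptable1
    SELECT TAG, col2, SUM(col5), SUM(col6), SUM(col7), ts
    WHERE ts=${hivevar:collectiontimestamp}
    GROUP BY TAG, col2, ts
  INSERT INTO TABLE temptable2
    SELECT TAG, col2, col3, SUM(col5), SUM(col6), SUM(col7), ts
    WHERE ts=${hivevar:collectiontimestamp}
    GROUP BY TAG, col2, col3, ts
  INSERT INTO TABLE temptable3
    SELECT TAG, col2, col3, col4, SUM(col5), SUM(col6), SUM(col7), ts
    WHERE ts=${hivevar:collectiontimestamp}
    GROUP BY TAG, col2, col3, col4, ts;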
On Fri, Jan 17, 2014 at 2:06 AM, Lefty Leverenz <[EMAIL PROTECTED]> wrote:

> Here's the wikidoc for transform: Transform/Map-Reduce Syntax
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform>.
>
> -- Lefty
>
>
> On Thu, Jan 16, 2014 at 10:44 PM, Bogala, Chandra Reddy
> <[EMAIL PROTECTED]> wrote:
>
>> Thanks for the quick reply. I will take a look at streaming jobs and the
>> transform function.
>>
>> One more question:
>>
>> I have multiple CSV files (same structure, each directory added as a
>> partition) mapped to a Hive table. I then run several different GROUP BY
>> jobs on the same data, like the ones below. These all spawn as separate
>> jobs, so multiple mappers read the same data from disk and each computes a
>> different group/aggregation result.
>>
>> Every job below fetches the same data from disk. Can this be avoided by
>> reading each split only once and computing the different GROUP BYs in the
>> same mapper? That way the number of mappers would come down drastically,
>> and, more importantly, the repeated disk seeks over the same data would be
>> avoided. Do I need to write a custom MapReduce job to do this?
>>
>>
>>
>> 1)  INSERT INTO temptable1
>>     SELECT TAG, col2, SUM(col5) AS SUM_col5, SUM(col6) AS SUM_col6,
>>            SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, ts
>>
>> 2)  INSERT INTO temptable2
>>     SELECT TAG, col2, col3, SUM(col5) AS SUM_col5, SUM(col6) AS SUM_col6,
>>            SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, col3, ts
>>
>> 3)  INSERT INTO temptable3
>>     SELECT TAG, col2, col3, col4, SUM(col5) AS SUM_col5, SUM(col6) AS SUM_col6,
>>            SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, col3, col4, ts
>>
>>
>>
>> Thanks,
>>
>> Chandra
>>
>>
>>
>> From: Stephen Sprague [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, January 17, 2014 11:39 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: complex datatypes filling
>>
>>
>>
>> Remember, you can always set up a streaming job to do any wild and crazy
>> custom thing you want; see the transform() function documentation. It's
>> really quite easy. Honest.
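>>
>> A minimal sketch of the TRANSFORM syntax (the script name and its output
>> columns are made up for illustration; the script simply reads
>> tab-separated rows on stdin and writes tab-separated rows to stdout):
>>
>>   -- ship the script to the cluster, then pipe rows through it
>>   ADD FILE my_custom_script.py;
>>   SELECT TRANSFORM (TAG, col2, col5)
>>     USING 'python my_custom_script.py'
>>     AS (TAG STRING, col2 STRING, col5_out BIGINT)
>>   FROM raw_data_by_epoch;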
>>
>>
>>
>> On Thu, Jan 16, 2014 at 9:39 PM, Bogala, Chandra Reddy
>> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>   I have found lots of examples that map JSON data into Hive complex data
>> types (map, array, struct, etc.). But I don't see anywhere how to fill
>> complex data types from a nested SQL query (i.e., group by a few columns
>> as the key, with an array of structs of several columns holding the
>> result values), so that it is easy to map the results back into an
>> embedded/nested JSON document.
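>>
>> Something like this is what I am after (a sketch only, not tested; it
>> assumes a Hive version where collect_list() accepts struct values, which
>> older releases do not):
>>
>>   -- one row per (TAG, col2) key, with the detail rows rolled up
>>   -- into an array<struct<col3:...,col5:...>> column
>>   SELECT TAG, col2,
>>          collect_list(named_struct('col3', col3, 'col5', col5)) AS vals
>>   FROM raw_data_by_epoch
>>   GROUP BY TAG, col2;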
>>
>>
>>
>> Thanks,
>>
>> Chandra
>>
>>
>>
>
>