Re: complex datatypes filling
@OP, that I don't know.

Personally, I'd keep it simple and just run your three jobs independently
(i.e. concurrently) and let the OS or hardware do any caching that's possible.

That's just my 2 cents, though.
On Fri, Jan 17, 2014 at 2:06 AM, Lefty Leverenz <[EMAIL PROTECTED]> wrote:

> Here's the wikidoc for transform: Transform/Map-Reduce Syntax
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform>.
>
> -- Lefty
>
>
> On Thu, Jan 16, 2014 at 10:44 PM, Bogala, Chandra Reddy <
> [EMAIL PROTECTED]> wrote:
>
>> Thanks for the quick reply. I will take a look at streaming jobs and the
>> transform() function.
>>
>> One more question:
>>
>> I have multiple CSV files (same structure, each directory added as a
>> partition) mapped to a Hive table. Then I run several different GROUP BY
>> jobs on the same data, like the ones below. These all spawn as separate
>> jobs, so multiple mappers read the same data from disk and then compute
>> the different group/aggregation results.
>>
>> Each of the jobs below fetches the same data from disk. Can this be
>> avoided by reading each split only once and having the mapper compute the
>> different GROUP BYs in the same pass? That way the number of mappers
>> would come down drastically, and, more importantly, the repeated disk
>> seeks over the same data would be avoided. Do I need to write a custom
>> MapReduce job to do this? (A Hive-level alternative is sketched after the
>> queries below.)
>>
>>
>>
>> 1)  INSERT INTO TABLE temptable1
>>     SELECT TAG, col2, SUM(col5) AS SUM_col5, SUM(col6) AS SUM_col6,
>>            SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, ts
>>
>> 2)  INSERT INTO TABLE temptable2
>>     SELECT TAG, col2, col3, SUM(col5) AS SUM_col5, SUM(col6) AS SUM_col6,
>>            SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, col3, ts
>>
>> 3)  INSERT INTO TABLE temptable3
>>     SELECT TAG, col2, col3, col4, SUM(col5) AS SUM_col5, SUM(col6) AS
>>            SUM_col6, SUM(col7) AS SUM_col7, ts
>>     FROM raw_data_by_epoch
>>     WHERE ts=${hivevar:collectiontimestamp}
>>     GROUP BY TAG, col2, col3, col4, ts
>>
>>
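>> (A hedged sketch of the Hive-level alternative mentioned above: Hive's
>> multi-table insert syntax scans the FROM table once and feeds several
>> inserts from that one read. Table and column names are the ones above;
>> whether all three GROUP BYs actually share a single map stage depends on
>> the Hive version and its multi-group-by optimizations:)
>>
>>     FROM raw_data_by_epoch
>>     INSERT INTO TABLE temptable1
>>       SELECT TAG, col2, SUM(col5), SUM(col6), SUM(col7), ts
>>       WHERE ts=${hivevar:collectiontimestamp}
>>       GROUP BY TAG, col2, ts
>>     INSERT INTO TABLE temptable2
>>       SELECT TAG, col2, col3, SUM(col5), SUM(col6), SUM(col7), ts
>>       WHERE ts=${hivevar:collectiontimestamp}
>>       GROUP BY TAG, col2, col3, ts
>>     INSERT INTO TABLE temptable3
>>       SELECT TAG, col2, col3, col4, SUM(col5), SUM(col6), SUM(col7), ts
>>       WHERE ts=${hivevar:collectiontimestamp}
>>       GROUP BY TAG, col2, col3, col4, ts;
>>
>> Running EXPLAIN on the statement shows whether the scan is actually shared.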
>>
>> Thanks,
>>
>> Chandra
>>
>>
>>
>> From: Stephen Sprague [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, January 17, 2014 11:39 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: complex datatypes filling
>>
>>
>>
>> Remember, you can always set up a streaming job to do any wild and crazy
>> custom thing you want. See the transform() function documentation. It's
>> really quite easy. Honest.
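>>
>> (A minimal hedged sketch of that transform() usage; the script name
>> my_script.py and its output columns are made up for illustration, and the
>> table/column names are borrowed from elsewhere in this thread. Hive
>> streams the selected columns to the script on stdin, one tab-separated
>> row per line, and reads tab-separated output rows back from stdout:)
>>
>>     ADD FILE my_script.py;
>>
>>     SELECT TRANSFORM (TAG, col2, col5)
>>       USING 'python my_script.py'
>>       AS (TAG STRING, col2 STRING, custom_metric DOUBLE)
>>     FROM raw_data_by_epoch;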
>>
>>
>>
>> On Thu, Jan 16, 2014 at 9:39 PM, Bogala, Chandra Reddy <
>> [EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>>   I found a lot of examples that map JSON data into Hive complex data
>> types (map, array, struct, etc.). But I don't see anywhere how to fill
>> complex data types from a nested SQL query (i.e. group by a few key
>> columns and produce an array of structs holding the result values for
>> each group), so that it is easy for me to map the results back into an
>> embedded/nested JSON document.
>>
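>> (A hedged sketch of one way to do this, assuming a Hive version that has
>> the collect_list() and named_struct() built-ins; the table and column
>> names reuse raw_data_by_epoch from the queries above, and the alias
>> rows_arr is made up:)
>>
>>     SELECT TAG, col2,
>>            collect_list(named_struct('col3', col3, 'col5', col5))
>>              AS rows_arr
>>     FROM raw_data_by_epoch
>>     GROUP BY TAG, col2;
>>
>> Each output row then carries an array<struct<col3:...,col5:...>> value,
>> which maps naturally onto a nested JSON array of objects.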
>>
>>
>> Thanks,
>>
>> Chandra
>>
>>
>>
>
>