Hive >> mail # user >> Hive sort by using a single reducer


Re: Hive sort by using a single reducer
Hi

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-DifferencebetweenSortByandOrderBy
SORT BY gives you only partially sorted results if you have more than
one reducer: each reducer's output is sorted, but the overall result is not.
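
To make the difference concrete, here is a small Python simulation rather than Hive itself. The names `sort_by` and `top_n` are hypothetical, and the round-robin distribution of rows to reducers is an assumption made for the sake of a deterministic example; Hive chooses its own distribution. It sketches why several reducers yield only per-reducer ordering, and how a top-N can still be recovered by taking N rows from each reducer and merging them in one final pass.

```python
import heapq
import itertools


def sort_by(rows, num_reducers):
    """Simulate SORT BY ... DESC: each reducer sorts only its own share of rows.

    Rows are dealt round-robin to reducers (an assumption for illustration).
    """
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    # Each reducer's output is sorted, but the outputs are independent.
    return [sorted(bucket, reverse=True) for bucket in buckets]


def top_n(reducer_outputs, n):
    """Keep the first n rows of every sorted reducer output, then merge them."""
    candidates = [output[:n] for output in reducer_outputs]
    # heapq.merge expects inputs already sorted in the requested direction.
    merged = heapq.merge(*candidates, reverse=True)
    return list(itertools.islice(merged, n))


if __name__ == "__main__":
    ranks = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
    outputs = sort_by(ranks, num_reducers=3)
    print(outputs)            # sorted within each reducer only
    print(top_n(outputs, 3))  # global top 3 after the final merge
```

In this sketch the final merge only sees at most N rows per reducer, so it stays small compared with sorting the whole input in one place.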

Ruslan

On Mon, Sep 3, 2012 at 1:38 AM, Binesh Gummadi <[EMAIL PROTECTED]> wrote:
> Thanks for your quick reply. Rank is a column that holds integer data. I am
> writing to a DynamoDB table, though, and I am not sure why only a single
> reducer is used. I will check the query with the EXPLAIN command again and
> report my findings. I will check your implementation too.
>
> ________________________________
> Binesh Gummadi
>
>
>
>
> On Sun, Sep 2, 2012 at 4:01 PM, Edward Capriolo <[EMAIL PROTECTED]>
> wrote:
>>
>>
>> SORT BY does not have the single-reducer restriction. I am not sure which
>> rank you are using, but any of them should let you sort and rank if the
>> query is written correctly. Our implementation on my GitHub
>> (github.com/edwardcapriolo) allows this.
>>
>> On Sunday, September 2, 2012, Binesh Gummadi <[EMAIL PROTECTED]>
>> wrote:
>> > I am trying to insert data into a table after selecting and sorting by a
>> > column. What I really want is to order by a column and select the top
>> > million rows. I am using Hive on Amazon EMR to process the data.
>> > Here is my query:
>> > INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc
>> > LIMIT 1000000;
>> > It creates two jobs. The first job runs rather quickly, but the second
>> > job's reducer runs forever because it uses a single reducer. Here is my
>> > question on Stack Overflow
>> > (http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer).
>> > According to the docs, the "order by" clause is limited to 1 reducer. Does
>> > "sort by" have the same limitation? Are there any other ways of meeting
>> > the above requirement?
>> > ________________________________
>> > Binesh Gummadi
>> >
>> >
>
>

--
Best Regards,
Ruslan Al-Fakikh