

Re: Hive sort by using a single reducer
SORT BY does not have the single-reducer restriction. I'm not sure which rank
you are using, but any of them should let you sort and rank if the query is
written correctly. Our implementation, on my GitHub at
github.com/edwardcapriolo, allows this.
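
To make the distinction concrete, here is a minimal HiveQL sketch. It reuses the table and column names from the quoted query below; the reducer count and the `category` column are hypothetical, chosen only for illustration, and the `mapred.reduce.tasks` setting assumes a MapReduce-based Hive of roughly that era.

```sql
-- Sketch only. data_dump and rank come from the quoted query;
-- the reducer count below is a hypothetical value, not a recommendation.
SET mapred.reduce.tasks=8;

-- ORDER BY guarantees a total order over the whole result,
-- so the final sort has to run on a single reducer.
SELECT * FROM data_dump ORDER BY rank DESC LIMIT 1000000;

-- SORT BY orders rows only within each reducer, so the sort can be
-- spread over several reducers. The output is sorted per reducer,
-- not globally.
SELECT * FROM data_dump SORT BY rank DESC;

-- DISTRIBUTE BY decides which reducer receives each row. Combined with
-- SORT BY it yields sorted runs per group. `category` is a hypothetical column.
SELECT * FROM data_dump DISTRIBUTE BY category SORT BY category, rank DESC;
```

Note that adding LIMIT to a SORT BY query most likely still requires a final step that combines the per-reducer results to pick the rows to keep, which would be consistent with the second, single-reducer job described in the quoted message.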
On Sunday, September 2, 2012, Binesh Gummadi <[EMAIL PROTECTED]>
wrote:
> I am trying to insert data into a table after selecting and sorting by a
> column. What I really want is to order by a column and select the top
> million rows. I am using Hive on Amazon EMR to process the data.
> Here is my query:
> INSERT INTO TABLE ddb_table SELECT * FROM data_dump SORT BY rank DESC
> LIMIT 1000000;
> It creates two jobs. The first job runs rather quickly, but the second job's
> reduce phase runs forever because it uses a single reducer. Here is my
> question on Stack Overflow:
> http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer
> According to the docs, the "order by" clause is limited to one reducer. Does
> "sort by" have the same limitation? Are there any other ways of meeting the
> above requirement?
> ________________________________
> Binesh Gummadi
>
>