Binesh Gummadi 2012-09-02, 17:53
Re: Hive sort by using a single reducer
Edward Capriolo 2012-09-02, 20:01
Sort by does not have the single-reducer restriction. Not sure which rank you are using, but any of them should allow you to sort and rank if the query is written correctly. Our implementation on my github.com/edwardcapriolo allows this.

On Sunday, September 2, 2012, Binesh Gummadi <[EMAIL PROTECTED]> wrote:
> I am trying to insert data into a table after selecting and sorting by a column. What I really want is to order by a column and select the top million rows. I am using Amazon EMR hive cloud to process data.
> Here is my query:
> INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc LIMIT 1000000;
> It creates two jobs. The first job runs rather quickly, but the second job's reducer runs forever because it is running with a single reducer. Here is my question on Stack Overflow (http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer).
> According to the docs, the "order by" clause has a limitation of 1 reducer. Does sort by have the same limitation? Are there any other ways of solving the above requirement?
> ________________________________
> Binesh Gummadi
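
For readers following the thread, here is a minimal HiveQL sketch of the behaviour being discussed. It is an illustration, not a tested fix: the reducer count of 32 is an arbitrary assumption, and the comments describe Hive's planning as I understand it, which is consistent with the two jobs reported above. The table and column names are taken from the original query.

```sql
-- Assumption: 32 is an arbitrary reducer count chosen for illustration.
SET mapred.reduce.tasks=32;

-- SORT BY on its own: each reducer sorts only the rows it receives,
-- so the output is ordered per reducer, not globally.
SELECT * FROM data_dump SORT BY rank DESC;

-- SORT BY with LIMIT: the first job lets every reducer sort its rows
-- and keep at most 1000000 of them; a second job then funnels those
-- partial results through one reducer to apply the final limit.
-- That second, single-reducer job is the slow one described above.
INSERT INTO TABLE ddb_table
SELECT * FROM data_dump SORT BY rank DESC LIMIT 1000000;
```

In other words, the single reducer seen here appears to come from the LIMIT step rather than from SORT BY itself, which matches the reply that sort by has no single-reducer restriction.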
Binesh Gummadi 2012-09-02, 21:38
Ruslan Al-Fakikh 2012-09-04, 18:55