-
How to use the reduce result in next code part
Liu, Keyan 2012-05-28, 14:49
Hi All,
I am using MapReduce to scan an HBase region to get the rowkey_list related to one query. In the map phase, the mapper outputs a partial rowkey_list. In the reduce phase, the reducer collects and sorts all the row keys. If I need to use the rowkey_list produced by the reducer, how can I get it out of the reduce?
I have tried writing the reduce output to HDFS ("/part-r-00000"), but I found this too inefficient. How can I use the reduce result in the next part of my code? Is there an API or an example I can use? Thanks.
Regards, William Liu
-
How to use the reduce result in next code part
Liu, Keyan 2012-06-11, 03:09
Hi All,
I am using MapReduce to scan an HBase region to get the rowkey_list related to one query. In the map phase, each mapper outputs a partial rowkey_list. In the reduce phase, the reducer collects and sorts all the row keys. If I need to use the rowkey_list produced by the reducer, how can I get it out of the reduce?
I have tried writing the reduce output to HDFS ("/part-r-00000") and then reading the result back from HDFS, but I found this too inefficient. How can I use the reduce result in the next part of my code? Is there an API or an example I can use? Thanks.
Regards, William Liu
-
Re: How to use the reduce result in next code part
Jagat Singh 2012-06-11, 03:24
Hello,
Just do a quick search for "chain MapReduce" and you will find examples.
Regards,
Jagat Singh
-----------
Sent from Mobile, short and crisp.
On 11-Jun-2012 8:40 AM, "Liu, Keyan (NSN - CN/Beijing)" <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am using Mapreduce to scan HBase region to get the rowkey_list that
> related with one query.
>
> In Map period, each mapper outputs partial rowkey_list. In reduce period,
> the reducer will collect and sort all rowkey.
>
> If I need to use rowkey_list result of the reduce, how can transport the
> rowkey_list outside reduce?
>
> I have tried to write one reduce output to HDFS “/part-r-00000”, then
> read the result in HDFS, but I found the efficiency is too low.
>
> How can I use the reduce result in next code part? Is there one API or
> example that can be used?
>
> Thanks.
>
> Regards,
>
> William Liu
-
Re: How to use the reduce result in next code part
Harsh J 2012-06-11, 03:22
Hi,
Can the logic that needs the rowkey_list not be implemented within a single reduce(key, <List> values) call itself? If you need the whole list before processing, and the whole list is likely to be small, then collecting cloned copies of the values in memory is also one way out.
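To illustrate why the copies need to be cloned: Hadoop reuses the same value object across iterations of the values passed to reduce(), so keeping plain references leaves you with a list that repeats the last value. The sketch below reproduces that behaviour in plain Java without any Hadoop dependency. MutableKey, reusingValues and the two collect methods are hypothetical stand-ins invented for this example, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CloneInReduce {

    /** Hypothetical stand-in for a mutable Hadoop value type such as Text. */
    static final class MutableKey {
        private String value = "";

        void set(String v) { value = v; }

        String get() { return value; }

        /** Returns an independent copy holding the current contents. */
        MutableKey copy() {
            MutableKey c = new MutableKey();
            c.set(value);
            return c;
        }
    }

    /** Mimics the reduce-side values: one shared object, refilled on every next(). */
    static Iterable<MutableKey> reusingValues(final List<String> raw) {
        final MutableKey shared = new MutableKey();
        return () -> new Iterator<MutableKey>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < raw.size(); }

            @Override
            public MutableKey next() {
                shared.set(raw.get(i++)); // overwrite the shared instance in place
                return shared;
            }
        };
    }

    /** Keeps references only: every list element ends up being the same object. */
    static List<String> collectWithoutCopy(Iterable<MutableKey> values) {
        List<MutableKey> kept = new ArrayList<>();
        for (MutableKey v : values) {
            kept.add(v);
        }
        return toStrings(kept);
    }

    /** Keeps a cloned copy of each value, so earlier entries are preserved. */
    static List<String> collectWithCopy(Iterable<MutableKey> values) {
        List<MutableKey> kept = new ArrayList<>();
        for (MutableKey v : values) {
            kept.add(v.copy());
        }
        return toStrings(kept);
    }

    private static List<String> toStrings(List<MutableKey> keys) {
        List<String> out = new ArrayList<>();
        for (MutableKey k : keys) {
            out.add(k.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("row-001", "row-002", "row-003");
        System.out.println(collectWithoutCopy(reusingValues(raw))); // [row-003, row-003, row-003]
        System.out.println(collectWithCopy(reusingValues(raw)));    // [row-001, row-002, row-003]
    }
}
```

In a real reducer the copy step would be something like new Text(value) or WritableUtils.clone(value, conf), depending on the value type. Note that a list collected this way lives only inside the reducer task, so it suits logic that runs within the reducer itself.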
On Mon, Jun 11, 2012 at 8:39 AM, Liu, Keyan (NSN - CN/Beijing) <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am using Mapreduce to scan HBase region to get the rowkey_list that
> related with one query.
>
> In Map period, each mapper outputs partial rowkey_list. In reduce period,
> the reducer will collect and sort all rowkey.
>
> If I need to use rowkey_list result of the reduce, how can transport the
> rowkey_list outside reduce?
>
> I have tried to write one reduce output to HDFS “/part-r-00000”, then read
> the result in HDFS, but I found the efficiency is too low.
>
> How can I use the reduce result in next code part? Is there one API or
> example that can be used?
>
> Thanks.
>
> Regards,
>
> William Liu
-- Harsh J
-
RE: How to use the reduce result in next code part
Liu, Keyan 2012-06-11, 03:35
Hi,
The reduce step aggregates multiple rowkey_lists into one. The complete rowkey_list is sorted by the Reducer. In the next part of my code, I would like to use the complete rowkey_list. However, the context cannot be used or passed there.
How do I collect their cloned copies in memory?
Thanks and regards,
-----Original Message-----
From: ext Harsh J [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2012 11:22 AM
To: [EMAIL PROTECTED]
Subject: Re: How to use the reduce result in next code part
Hi,
Can your rowkey_list requiring logic not be implemented within a single reduce(key, <List> values) call itself? If you require the whole list before processing, and the whole lists may be small, then collecting their cloned copies in memory is also one way out.
On Mon, Jun 11, 2012 at 8:39 AM, Liu, Keyan (NSN - CN/Beijing) <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am using Mapreduce to scan HBase region to get the rowkey_list that
> related with one query.
>
> In Map period, each mapper outputs partial rowkey_list. In reduce period,
> the reducer will collect and sort all rowkey.
>
> If I need to use rowkey_list result of the reduce, how can transport the
> rowkey_list outside reduce?
>
> I have tried to write one reduce output to HDFS "/part-r-00000", then read
> the result in HDFS, but I found the efficiency is too low.
>
> How can I use the reduce result in next code part? Is there one API or
> example that can be used?
>
> Thanks.
>
> Regards,
>
> William Liu
-- Harsh J