RE: Results from a Map/Reduce
Hey Peter,

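That System.exit line is nothing important: job.waitForCompletion(true) just blocks the main thread until the tasks finish and returns whether the job succeeded, and System.exit turns that into the process exit code before closing.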

You're interested in having the MR job return a single result?  To do that, you would need to roll up the processing done in each of your Map tasks into a single Reduce task.  With one reducer, you have a single point at which to do the final aggregation of the result.

I'm not sure exactly what kind of aggregation you are doing, but funneling everything into a single reducer can range from no problem to don't even try it, depending on how much data ends up at that reducer.  It sounds like you just want a final number or something similar, so it shouldn't be an issue.
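
To make the roll-up concrete, here is a minimal plain-Java sketch of the idea. It is a simulation only and uses no Hadoop or HBase classes; the class and method names (SingleReducerSketch, mapTask, reduce, run) are hypothetical and chosen for illustration, and a sum stands in for whatever aggregation you actually need. In a real job the single reducer is requested with job.setNumReduceTasks(1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of the roll-up; no Hadoop classes are involved.
// All names here are hypothetical and exist only for this illustration.
class SingleReducerSketch {

    // "Map" side: each task collapses its own slice of rows into one partial sum.
    static long mapTask(List<Long> rowsInSplit) {
        long partial = 0;
        for (long value : rowsInSplit) {
            partial += value;
        }
        return partial;
    }

    // "Reduce" side: the one reducer sees every partial and emits a single total.
    static long reduce(List<Long> partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    // Runs every map task, then funnels all partials into the single reducer.
    static long run(List<List<Long>> splits) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            partials.add(mapTask(split));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L),
                Collections.<Long>emptyList());
        System.out.println(run(splits)); // prints 36
    }
}
```

Because each map task emits only one partial value, the reducer handles one record per task rather than one per row, which is why this shape is cheap for a final number.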

You might also consider doing your aggregations with coprocessors if you're into experimenting on HBase Trunk :)

As for FirstKeyOnlyFilter:

/**
 * A filter that will only return the first KV from each row.
 * <p>
 * This filter can be used to more efficiently perform row count operations.
 */

That's what it does.  If you scan a table with this filter, then regardless of what you ask for in the query, it returns only the first KeyValue of each row and skips every other column/version/value in that row.

Like it says, it's generally useful for doing row counting but that's about it.
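
As a rough illustration of that behaviour, here is a plain-Java sketch that models a table as sorted rows of sorted columns and keeps only the first entry of each row. It does not use the HBase classes; FirstKeyOnlySketch, firstKeyOnly and rowCount are hypothetical names, and String ordering stands in for HBase's byte ordering. In real client code the filter is attached with scan.setFilter(new FirstKeyOnlyFilter()).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of what the filter returns; not the HBase implementation.
// All names here are hypothetical and exist only for this illustration.
class FirstKeyOnlySketch {

    // table: row key -> (column -> value), with rows and columns kept sorted.
    // Returns, for each row, only its first column/value pair.
    static Map<String, Map.Entry<String, String>> firstKeyOnly(
            TreeMap<String, TreeMap<String, String>> table) {
        Map<String, Map.Entry<String, String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            if (!row.getValue().isEmpty()) {
                // Everything after the first entry of the row is skipped.
                out.put(row.getKey(), row.getValue().firstEntry());
            }
        }
        return out;
    }

    // Row counting needs only one entry per row, so the filtered view is enough.
    static int rowCount(TreeMap<String, TreeMap<String, String>> table) {
        return firstKeyOnly(table).size();
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        TreeMap<String, String> row1 = new TreeMap<>();
        row1.put("cf:a", "1");
        row1.put("cf:b", "2");
        TreeMap<String, String> row2 = new TreeMap<>();
        row2.put("cf:b", "9");
        table.put("row1", row1);
        table.put("row2", row2);
        System.out.println(rowCount(table)); // prints 2
    }
}
```

The point of the model is that the count is the same as for a full scan, but only one entry per row ever has to be returned.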

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 10:56 AM
> To: [EMAIL PROTECTED]
> Subject: Results from a Map/Reduce
>
> Hi, dumb question again.
>   I have been using a Scan to return a result back to my client which works
> fine except when I am returning a million rows just to aggregate the results.
> The next logical step would be to do the aggregation in a Map/Reduce. I've
> been looking at what samples I could find and they seem to all do this...
>
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>
> My question: is there a way to return a result from the job, similar to
> getting a ResultScanner back and iterating through the results?
>
> Also, is there a good definition of what a 'FirstKeyOnlyFilter' does?
>
> Thanks
>
> -Pete