HBase >> mail # user >> Results from a Map/Reduce


RE: Results from a Map/Reduce
Hey Peter,

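That System.exit line is nothing important. job.waitForCompletion(true) just makes the main thread block until the job finishes and returns whether it succeeded, and System.exit turns that into the process exit code.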

You're interested in having the MR job return a single result?  To do that, you would need to roll up the processing done in each of your Map tasks into a single Reduce task.  With one reducer, you have a single point at which to do the final aggregation of the result.

I'm not sure exactly what kind of aggregation you are doing, but funneling everything into a single reducer can range from no problem to don't even try it, depending on how much data reaches that reducer.  It sounds like you just want a final number or something similar, so it shouldn't be an issue.
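
To make the roll-up concrete, here is a minimal sketch of the data flow in plain Java. It is only a model: the class and method names (SingleReducerSketch, mapTask, reduceTask, rollUp) are made up for illustration and are not Hadoop or HBase APIs, and summing is just an assumed stand-in for whatever aggregation you are doing. In a real job, the single-reducer part would typically be requested with job.setNumReduceTasks(1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK model of the data flow only; none of these names are Hadoop or HBase APIs.
class SingleReducerSketch {

    // Hypothetical "map task": aggregates its own slice of rows into one partial sum.
    static long mapTask(List<Long> rowsInSplit) {
        long partial = 0;
        for (long value : rowsInSplit) {
            partial += value;
        }
        return partial;
    }

    // Hypothetical single "reduce task": sees every partial and produces the final number.
    static long reduceTask(List<Long> partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    // Runs one map task per split, then funnels all partials into the one reducer.
    static long rollUp(List<List<Long>> splits) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            partials.add(mapTask(split));
        }
        return reduceTask(partials);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L),
                Arrays.asList(100L));
        System.out.println(rollUp(splits)); // prints 136
    }
}
```

Because each map task hands over one partial value instead of every row, the single reducer only has to combine as many values as there are map tasks, which is the "no problem" end of the range.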

You might also consider doing your aggregations with coprocessors if you're into experimenting on HBase Trunk :)

As for FirstKeyOnlyFilter:

/**
 * A filter that will only return the first KV from each row.
 * <p>
 * This filter can be used to more efficiently perform row count operations.
 */

That's what it does.  When you scan a table, regardless of which columns you ask for in the query, the filter returns only the first KeyValue of each row and skips every other column, version, and value in that row.

Like it says, it's generally useful for doing row counting but that's about it.
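
As a rough illustration of why that helps with row counting, here is a small plain-Java model of the behavior. It does not use the HBase client API: the table is assumed to be a map from row key to an ordered map of column to value, and FirstKeyOnlySketch, firstKeyOnly and countRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK model of the filter's effect; not the HBase FirstKeyOnlyFilter class itself.
class FirstKeyOnlySketch {

    // Keeps "row/column=value" for the first column of each row and skips the rest.
    static List<String> firstKeyOnly(Map<String, Map<String, String>> table) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            for (Map.Entry<String, String> kv : row.getValue().entrySet()) {
                kept.add(row.getKey() + "/" + kv.getKey() + "=" + kv.getValue());
                break; // first KV only; every other column of this row is ignored
            }
        }
        return kept;
    }

    // One surviving KV per row means the number of results equals the row count.
    static int countRows(Map<String, Map<String, String>> table) {
        return firstKeyOnly(table).size();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        Map<String, String> row1 = new LinkedHashMap<>();
        row1.put("cf:a", "1");
        row1.put("cf:b", "2");
        Map<String, String> row2 = new LinkedHashMap<>();
        row2.put("cf:b", "3");
        table.put("row1", row1);
        table.put("row2", row2);

        System.out.println(firstKeyOnly(table)); // prints [row1/cf:a=1, row2/cf:b=3]
        System.out.println(countRows(table));    // prints 2
    }
}
```

The point of the model is that only one small entry per row is ever handed back, which is all a row count needs.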

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:[EMAIL PROTECTED]]
> Sent: Friday, December 17, 2010 10:56 AM
> To: [EMAIL PROTECTED]
> Subject: Results from a Map/Reduce
>
> Hi, dumb question again.
>   I have been using a Scan to return a result back to my client which works
> fine except when I am returning a million rows just to aggregate the results.
> The next logical step would be to do the aggregation in a Map/Reduce. I've
> been looking at what samples I could find and they seem to all do this...
>
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
>
> My question: is there a way to return a result from the job, similar to
> getting a ResultScanner back and iterating through the results?
>
> Also, is there a good definition of what a 'FirstKeyOnlyFilter' does?
>
> Thanks
>
> -Pete