HBase >> mail # user >> Obtain many mappers (or regions)

Re: Obtain many mappers (or regions)
Just a simple suggestion that will make your life a bit easier...

If your data is relatively small, small enough that you can easily fit the result set into memory...
You may want to do the following...
Oozie calls your map/reduce job.
At the start of your m/r job, you connect from the client to HBase and read the result set into a list object (or something similar). You then write a custom input format class that uses that list as its input. You can then split the input as you need it; a rough sketch follows below.

Much easier than trying to pre-split temporary tables, and a lot less work and overhead.
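
Roughly, the custom input format could look like the sketch below. This is untested and only illustrative: the class names, the "list.keys"/"list.splits" property names, and the default split count are all made up for this example, not an existing API.

    import java.io.*;
    import java.util.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;

    // Sketch: an InputFormat fed from an in-memory list of row keys that
    // the client stashed in the job configuration before submission.
    public class ListInputFormat extends InputFormat<LongWritable, Text> {

      // A split is just a slice of the key list.
      public static class ListSplit extends InputSplit implements Writable {
        private String[] keys;
        public ListSplit() { }                      // needed for reflection
        public ListSplit(String[] keys) { this.keys = keys; }
        public long getLength() { return keys.length; }
        public String[] getLocations() { return new String[0]; }
        public void write(DataOutput out) throws IOException {
          out.writeInt(keys.length);
          for (String k : keys) out.writeUTF(k);
        }
        public void readFields(DataInput in) throws IOException {
          keys = new String[in.readInt()];
          for (int i = 0; i < keys.length; i++) keys[i] = in.readUTF();
        }
        String[] getKeys() { return keys; }
      }

      public List<InputSplit> getSplits(JobContext ctx) {
        String[] all = ctx.getConfiguration().getStrings("list.keys");
        int wanted = ctx.getConfiguration().getInt("list.splits", 2);
        int per = (all.length + wanted - 1) / wanted;   // ceiling division
        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (int i = 0; i < all.length; i += per) {
          splits.add(new ListSplit(Arrays.copyOfRange(
              all, i, Math.min(all.length, i + per))));
        }
        return splits;
      }

      public RecordReader<LongWritable, Text> createRecordReader(
          InputSplit split, TaskAttemptContext ctx) {
        return new RecordReader<LongWritable, Text>() {
          private String[] keys;
          private int pos = -1;
          public void initialize(InputSplit s, TaskAttemptContext c) {
            keys = ((ListSplit) s).getKeys();
          }
          public boolean nextKeyValue() { return ++pos < keys.length; }
          public LongWritable getCurrentKey() { return new LongWritable(pos); }
          public Text getCurrentValue() { return new Text(keys[pos]); }
          public float getProgress() {
            return keys.length == 0 ? 1f : (float) pos / keys.length;
          }
          public void close() { }
        };
      }
    }

The driver would then wire it in after the client-side read, something like:

    conf.setStrings("list.keys", rowKeys);   // the in-memory result set
    conf.setInt("list.splits", 2);           // e.g. one split per machine
    job.setInputFormatClass(ListInputFormat.class);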

This is something that could be part of an indexing solution. ;-P
(meaning that the classes are reusable for other solutions...)

HTH -Mike

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 27, 2011, at 7:46 AM, Florin P <[EMAIL PROTECTED]> wrote:

> Hi!
>  Thank you for your response. As I said, it is a temporary table. This table acts as metadata for long-running task processing that we would like to trigger from the cluster (as map/reduce jobs), so that all machines take on some of those tasks.
>  I have read the indicated chapter and then followed this scenario:
>   1. We have loaded the small data into the HBase table
>   2. From the HBase admin interface we triggered the split action (see the sketch after this list)
>   3. We have seen that 32 new regions were created for that table
>   4. We ran a map/reduce job that counts the number of rows
>   5. Only two mappers were created
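
For reference, the split action in step 2 corresponds roughly to this client call; a sketch only, and the table name is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class SplitTrigger {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Same effect as the "split" action in the admin UI;
        // "tasks_table" is a placeholder name.
        admin.split("tasks_table");
      }
    }
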
> What puzzles me is that only 2 mapper tasks were created, even though the indicated book states that
> (cite)"
> When TableInputFormat is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table. Thus, if there are 100 regions in the table, there will be 100 map-tasks for the job - regardless of how many column families are selected in the Scan.
> "  
>
> Can you please explain why this happens? Did we miss some property configuration?
>
> Thank you.
> regards,
>  Florin
> --- On Mon, 6/27/11, Doug Meil <[EMAIL PROTECTED]> wrote:
>
>> From: Doug Meil <[EMAIL PROTECTED]>
>> Subject: RE: Obtain many mappers (or regions)
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Monday, June 27, 2011, 8:01 AM
>> Hi there-
>>
>> If you only have 100 rows I think that HBase might be
>> overkill.
>>
>> You probably want to start with this to get a background on
>> what HBase can do...
>> http://hbase.apache.org/book.html
>> .. there is a section on MapReduce with HBase as well.
>>
>> -----Original Message-----
>> From: Florin P [mailto:[EMAIL PROTECTED]]
>>
>> Sent: Monday, June 27, 2011 4:53 AM
>> To: [EMAIL PROTECTED]
>> Subject: Obtain many mappers (or regions)
>>
>> Hello!
>> I have the following scenario:
>> 1. A temporary HBase table with a small number of rows (approx. 100)
>> 2. A cluster with 2 machines that I would like to use to crunch the data contained in the rows
>>
>> I would like to create two mappers that will crunch the data from the rows.
>> How can I achieve this?
>> A general question is:
>>   how can we obtain many mappers to crunch a small quantity of data?
>>
>> Thank you.
>>   Regards,
>>   Florin  
>>
>
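
On the general question, one lever the book's quote points at is the region count itself: TableInputFormat makes one map task per region, so a small table created pre-split into N regions should, in principle, get N mappers. A rough sketch with the client admin API; the table name, column family, and split key are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("tasks_table");
        desc.addFamily(new HColumnDescriptor("meta"));
        // One split point per extra region wanted; with ~100 rows and
        // 2 machines, a single midpoint key yields two regions.
        byte[][] splitKeys = new byte[][] { Bytes.toBytes("row050") };
        admin.createTable(desc, splitKeys);
      }
    }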