Hadoop >> mail # user >> hbase puts in map tasks don't seem to run in parallel


Jonathan Bishop    2012-06-03, 01:25
Joep Rottinghuis   2012-06-03, 19:02
Jonathan Bishop    2012-06-03, 22:45
Re: hbase puts in map tasks don't seem to run in parallel
This is probably more of an [EMAIL PROTECTED] topic than common-user.

To answer your question, you will want to pre-split the table; see http://hbase.apache.org/book/perf.writing.html
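
For a concrete starting point, here is a minimal sketch of that pre-splitting step against the 0.9x-era client API; the table name "mytable", the column family "cf", and the ten-way single-byte split are all hypothetical, not from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PreSplit {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical name
            desc.addFamily(new HColumnDescriptor("cf"));             // hypothetical family
            // 9 split points on single-byte key prefixes -> 10 regions,
            // one per region server on a 10-node cluster.
            byte[][] splitKeys = new byte[9][];
            for (int i = 0; i < 9; i++) {
                splitKeys[i] = new byte[] { (byte) ((i + 1) * 256 / 10) };
            }
            admin.createTable(desc, splitKeys);
            admin.close();
        }
    }

With row keys whose first byte is roughly uniform, each region then takes about a tenth of the writes from the start, instead of everything funneling into a single initial region.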

Cheers,

Joep

Sent from my iPhone

On Jun 3, 2012, at 3:45 PM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:

> Thanks Joep,
>
> My table is empty when I start and will consist of 18M rows when completed.
>
> So I guess I need to understand how to pick row keys such that the regions
> will be on that mapper's node. Any advice would be appreciated.
>
> BTW, I do notice that the region servers on other nodes become busy, but
> only after a large number of rows have been processed, say 10%. It would
> be better if I could deliberately control which regions/regionservers were
> going to be used, though, to prevent the network traffic of sending rows to
> regionservers on other nodes.
>
> Jon
>
> On Sun, Jun 3, 2012 at 12:02 PM, Joep Rottinghuis <[EMAIL PROTECTED]> wrote:
>
>> How large is your table?
>> If it is newly created and still almost empty then it will probably
>> consist of only one region, which will be hosted on one region server.
>>
>> Even as the table grows and gets split into multiple regions, you will
>> have to split your mappers in such a way that each writes to the key ranges
>> corresponding to the regions hosted locally on the corresponding region
>> server.
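
One way to get that alignment, sketched under the hypothetical pre-split scheme shown earlier (single-byte split points, one mapper per region; NUM_REGIONS and the helper class are made up for illustration): derive each map task's key prefix from its task ID, so all of a task's puts land in one region.

    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class KeyPrefixer {
        static final int NUM_REGIONS = 10; // must match the table's split points

        // Prefix the row key with this task's byte-range marker. Note this
        // pins each task to one region and spreads load across all region
        // servers, but the region is not guaranteed to be hosted on the same
        // node as the task, so it does not by itself guarantee locality.
        static byte[] prefixedKey(TaskAttemptContext context, byte[] originalKey) {
            int taskId = context.getTaskAttemptID().getTaskID().getId();
            byte prefix = (byte) (taskId * 256 / NUM_REGIONS);
            return Bytes.add(new byte[] { prefix }, originalKey);
        }
    }
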
>>
>> Cheers,
>>
>> Joep
>>
>> Sent from my iPhone
>>
>> On Jun 2, 2012, at 6:25 PM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I am new to hadoop and hbase, but have spent the last few weeks learning
>>> as much as I can...
>>>
>>> I am attempting to create an hbase table during a hadoop job by simply
>>> doing puts to a table from each map task. I am hoping that each map task
>>> will use the regionserver on its node so that all 10 of my nodes are
>>> putting values into the table at the same time.
>>>
>>> Here is my map class below. The Node class is a simple data structure
>>> which knows how to parse a line of input and create a Put for hbase.
>>>
>>> When I run this I see that only one region server is active for the table
>>> I am creating. I know that my input file is split among all 10 of my data
>>> nodes, and I know that if I do not do puts to the hbase table everything
>>> runs in parallel on all 10 machines. It is only when I start doing hbase
>>> puts that the run times go way up.
>>>
>>> Thanks,
>>>
>>> Jon
>>>
>>> public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {
>>>
>>>     private HTableInterface table = null;
>>>
>>>     @Override
>>>     protected void setup(Context context) throws IOException, InterruptedException {
>>>         // Open the target table once per task, reusing the job configuration.
>>>         String tableName = context.getConfiguration().get(TABLE);
>>>         table = new HTable(context.getConfiguration(), tableName);
>>>     }
>>>
>>>     @Override
>>>     public void map(Object key, Text value, Context context) throws
>>>             IOException, InterruptedException {
>>>         Node node = null;
>>>         try {
>>>             node = Node.parseNode(value.toString());
>>>         } catch (ParseException e) {
>>>             throw new IOException(e); // preserve the cause for debugging
>>>         }
>>>         table.put(node.getPut());
>>>     }
>>>
>>>     @Override
>>>     protected void cleanup(Context context) throws IOException, InterruptedException {
>>>         table.close();
>>>     }
>>> }
>>
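
One more knob, not raised in the thread but directly relevant to the slowdown described above: with the 0.9x-era client defaults, every table.put() is a blocking RPC to a region server. A common companion fix is to buffer puts client-side; a sketch, dropped into the poster's setup() (the 8 MB figure is illustrative):

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        String tableName = context.getConfiguration().get(TABLE);
        HTable htable = new HTable(context.getConfiguration(), tableName);
        htable.setAutoFlush(false);                 // buffer puts instead of one RPC per put()
        htable.setWriteBufferSize(8 * 1024 * 1024); // flush roughly every 8 MB
        table = htable;                             // close() in cleanup() flushes the remainder
    }

This reduces per-row round trips regardless of how the regions are laid out, so it complements rather than replaces the pre-splitting advice above.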