Re: hbase puts in map tasks don't seem to run in parallel
How large is your table?
If it is newly created and still almost empty, it will probably consist of only one region, and that region is hosted on a single region server, so every put goes to that one server no matter how many map tasks you run.

Even as the table grows and gets split into multiple regions, you would have to arrange your mappers so that each one writes to the key ranges of the regions hosted on its local region server. One way to avoid the single-region bottleneck is to pre-split the table at creation time, as in the sketch below.
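
A minimal sketch of pre-splitting, assuming the 0.92-era client API; the table name "mytable", family "cf", and single-digit split points are made up for illustration and should be replaced to match your real schema and key distribution:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical table name
        desc.addFamily(new HColumnDescriptor("cf"));             // hypothetical column family

        // Nine split points => ten regions. Choose boundaries that match
        // the actual distribution of your row keys.
        byte[][] splits = new byte[9][];
        for (int i = 0; i < 9; i++) {
            splits[i] = Bytes.toBytes(String.valueOf(i + 1));
        }

        // Create the table with all ten regions up front so the master can
        // assign them across the cluster before any data arrives.
        admin.createTable(desc, splits);
        admin.close();
    }
}

With ten regions created up front, the regions can be spread across your ten region servers before the job writes anything, so the mappers' puts can fan out immediately.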

Cheers,

Joep

Sent from my iPhone

On Jun 2, 2012, at 6:25 PM, Jonathan Bishop <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am new to hadoop and hbase, but have spent the last few weeks learning as
> much as I can...
>
> I am attempting to create an hbase table during a hadoop job by simply
> doing puts to a table from each map task. I am hoping that each map task
> will use the regionserver on its node so that all 10 of my nodes are
> putting values into the table at the same time.
>
> Here is my map class below. The Node class is a simple data structure which
> knows how to parse a line of input and create a Put for hbase.
>
> When I run this I see that only one region server is active for the table I
> am creating. I know that my input file is split among all 10 of my data
> nodes, and I know that if I do not do puts to the hbase table everything
> runs in parallel on all 10 machines. It is only when I start doing hbase
> puts that the run times go way up.
>
> Thanks,
>
> Jon
>
> public static class MapClass extends Mapper<Object, Text, IntWritable, Node> {
>
>     private HTableInterface table = null;
>
>     @Override
>     protected void setup(Context context) throws IOException, InterruptedException {
>         String tableName = context.getConfiguration().get(TABLE);
>         // Reuse the job's configuration so the task connects to the intended cluster.
>         table = new HTable(context.getConfiguration(), tableName);
>     }
>
>     @Override
>     public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
>         Node node;
>         try {
>             node = Node.parseNode(value.toString());
>         } catch (ParseException e) {
>             // Preserve the parse failure as the cause instead of throwing an empty IOException.
>             throw new IOException(e);
>         }
>         Put put = node.getPut();
>         table.put(put);
>     }
>
>     @Override
>     protected void cleanup(Context context) throws IOException, InterruptedException {
>         table.close();
>     }
> }
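
One more thing worth checking, since the run times jump as soon as the puts start: with autoflush on (the HTable default), every table.put() above is a separate synchronous RPC. Below is a minimal buffered variant of setup() and cleanup(), assuming the same TABLE configuration key and an HTable field (setAutoFlush lives on HTable, not HTableInterface, in this era of the API):

// Drop-in replacement for the setup()/cleanup() in MapClass above;
// the field would be declared as: private HTable table = null;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    String tableName = context.getConfiguration().get(TABLE);
    table = new HTable(context.getConfiguration(), tableName);
    table.setAutoFlush(false);                 // buffer puts client-side instead of one RPC each
    table.setWriteBufferSize(4 * 1024 * 1024); // flush in ~4 MB batches (illustrative size)
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    table.flushCommits(); // send any puts still sitting in the write buffer
    table.close();
}

Buffering does not fix the single-region hotspot by itself, but it usually cuts per-put overhead substantially once the table is spread over multiple regions.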