HBase >> mail # user >> getSplits() in TableInputFormatBase


Re: getSplits() in TableInputFormatBase
You have 1 region per table, and that's why you get 1 split when you
scan any one of those tables.

Moreover, the configured number of map tasks is ignored when you are
running in pseudo-distributed mode, since the job tracker is local.
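
For reference, the grouping rule from the getSplits() javadoc quoted further down the thread can be sketched in a few lines of Python. This is only an illustration of the documented rule, not the HBase implementation; the function name `group_regions` and the region names are made up for the example.

```python
def group_regions(regions, num_splits):
    """Group regions into splits following the rule in the quoted javadoc.

    Hypothetical helper for illustration only; it is not part of HBase.
    """
    if not regions or num_splits < 1:
        return []
    # The number of splits is the smaller of the requested count
    # and the number of regions in the table.
    real_splits = min(num_splits, len(regions))
    # Spread regions as evenly as possible; the first `extra` splits
    # get one more region, so the bigger splits come first.
    base, extra = divmod(len(regions), real_splits)
    splits = []
    start = 0
    for i in range(real_splits):
        size = base + (1 if i < extra else 0)
        splits.append(regions[start:start + size])
        start += size
    return splits


# A table with a single region yields one split, whatever count is requested.
print(group_regions(["region-1"], 100000))
# Five regions with two requested splits are grouped 3 + 2.
print(group_regions(["r1", "r2", "r3", "r4", "r5"], 2))
```

Under this rule, a table with one region gives a single split no matter how large the requested count is, which matches what you are seeing.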

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Sun, Apr 11, 2010 at 2:23 AM, john smith <[EMAIL PROTECTED]> wrote:

> Amandeep,
>
> No. I have 3 tables: A, B, C. Does the number of regions (5) include 1 region
> each from META and ROOT as well?
>
> I should get numSplits = 3 (the total number of user regions), but I am
> getting 1.
>
> Thanks
>
> On Sun, Apr 11, 2010 at 2:40 PM, Amandeep Khurana <[EMAIL PROTECTED]>
> wrote:
>
> > 3 tables? Are you counting ROOT and META also?
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Sun, Apr 11, 2010 at 1:57 AM, john smith <[EMAIL PROTECTED]>
> > wrote:
> >
> > > From the web interface...
> > >
> > >
> > > number of regions = 5
> > > number of tables = 3
> > >
> > > Thanks
> > >
> > >
> > > On Sun, Apr 11, 2010 at 2:23 PM, Amandeep Khurana <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > How many regions do you have?
> > > >
> > > >
> > > > Amandeep Khurana
> > > > Computer Science Graduate Student
> > > > University of California, Santa Cruz
> > > >
> > > >
> > > > On Sun, Apr 11, 2010 at 1:39 AM, john smith <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > > Amandeep,
> > > > > >
> > > > > > Thanks for the explanation. What is the default value for the number
> > > > > > of maps? Is it not equal to the number of regions?
> > > > > >
> > > > > > Right now I am running HBase in pseudo-distributed mode. If I set the
> > > > > > number of map tasks to 100000 (some big number), I get numSplits = 1.
> > > > > >
> > > > > > If I don't set anything, numSplits = 2.
> > > > > >
> > > > > > Can you explain this?
> > > > > >
> > > > > > Thanks
> > > > > > j.S
> > > > >
> > > > > > On Sun, Apr 11, 2010 at 1:50 PM, Amandeep Khurana <[EMAIL PROTECTED]>
> > > > > > wrote:
> > > > >
> > > > > > > If you set the number of map tasks higher than the number of
> > > > > > > regions (I generally set it to 100000 or something like that), the
> > > > > > > number of splits = number of regions. If you keep it lower, then it
> > > > > > > combines several regions into a single split.
> > > > > >
> > > > > >
> > > > > > Amandeep Khurana
> > > > > > Computer Science Graduate Student
> > > > > > University of California, Santa Cruz
> > > > > >
> > > > > >
> > > > > > > On Sun, Apr 11, 2010 at 1:15 AM, john smith <[EMAIL PROTECTED]>
> > > > > > > wrote:
> > > > > >
> > > > > > > Amandeep,
> > > > > > >
> > > > > > > > I guess that is not true. See the explanation in the docs:
> > > > > > >
> > > > > > >
> > > > > > > "Splits are created in number equal to the smallest between
> > > numSplits
> > > > > and
> > > > > > > the number of HRegion<
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html
> > > > > > > >s
> > > > > > > in the table. If the number of splits is smaller than the
> number
> > of
> > > > > > > HRegion<
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html
> > > > > > > >s
> > > > > > > then splits are spanned across multiple
> > > > > > > HRegion<
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/regionserver/HRegion.html
> > > > > > > >s
> > > > > > > and are grouped the most evenly possible. In the case splits
> are
> > > > uneven
> > > > > > the
> > > > > > > bigger splits are placed first in the InputSplit array.  "
> > > > > > >
> > > > > > >
> > > > > > > depending on whether numSplits < (or >)  num of regions .. it