Re: Block size of HBase files
> now have 731 regions (each about ~350 MB!). I checked the configuration
> in CM, and the value for hbase.hregion.max.filesize is 1 GB too!

You mentioned splits at the time of table creation. How did you create the
table?
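
For example, a pre-split table created from the HBase shell looks like this
(table, family, and split keys below are placeholders, not your actual
names):

  create 'mytable', 'cf', SPLITS => ['row100', 'row200', 'row300']

Each split key adds one more region at creation time, and regions do not
merge back on their own, so a long split list would explain a high region
count regardless of hbase.hregion.max.filesize.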

-Anoop-

On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks for the details. No, I haven't run any compaction, and I have no
> idea if one is going on in the background. I executed a major_compact on
> that table, and I now have 731 regions (each about ~350 MB!). I checked
> the configuration in CM, and the value for hbase.hregion.max.filesize is
> 1 GB too!
>
> I am not trying to access HFiles in my MR job; in fact I am just using a
> Pig script which handles this. The number 731 is close to my number of map
> tasks, which makes sense. But how can I decrease it? Shouldn't each region
> be 1 GB with that configuration value?
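>
> If regions don't merge on their own, would the offline Merge utility (run
> with the cluster stopped; all names below are placeholders) be the way to
> reduce the count?
>
>   hbase org.apache.hadoop.hbase.util.Merge <table-name> <region1-name> <region2-name>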
>
>
> On 13 May 2013 18:36, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > You can change the HFile size through the hbase.hregion.max.filesize parameter.
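> >
> > For example (a sketch with an illustrative value, not your exact config),
> > in hbase-site.xml:
> >
> >   <property>
> >     <name>hbase.hregion.max.filesize</name>
> >     <value>1073741824</value> <!-- 1 GB -->
> >   </property>
> >
> > hfile.block.cache.size is a different knob: it is the fraction of heap
> > given to the read-side block cache, not a file size.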
> >
> > On May 13, 2013, at 2:45 AM, Praveen Bysani <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > I wanted to minimize the number of map reduce tasks generated while
> > > processing a job, hence configured it to a larger value.
> > >
> > > I don't think I have configured the HFile size in the cluster. I use
> > > Cloudera Manager to manage my cluster, and the only configuration I can
> > > relate to is hfile.block.cache.size, which is set to 0.25. How do I
> > > change the HFile size?
> > >
> > > On 13 May 2013 15:03, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> > >
> > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I have the dfs.block.size value set to 1 GB in my cluster
> > >>> configuration.
> > >>
> > >>
> > >> Just out of curiosity - why do you have it set at 1 GB?
> > >>
> > >>
> > >>> I have around 250 GB of data stored in HBase over this cluster. But
> > >>> when I check the number of blocks, it doesn't correspond to the block
> > >>> size value I set. From what I understand I should only have ~250
> > >>> blocks, but instead, when I did a fsck on /hbase/<table-name>, I got
> > >>> the following:
> > >>>
> > >>> Status: HEALTHY
> > >>> Total size:    265727504820 B
> > >>> Total dirs:    1682
> > >>> Total files:   1459
> > >>> Total blocks (validated):      1459 (avg. block size 182129886 B)
> > >>> Minimally replicated blocks:   1459 (100.0 %)
> > >>> Over-replicated blocks:        0 (0.0 %)
> > >>> Under-replicated blocks:       0 (0.0 %)
> > >>> Mis-replicated blocks:         0 (0.0 %)
> > >>> Default replication factor:    3
> > >>> Average block replication:     3.0
> > >>> Corrupt blocks:                0
> > >>> Missing replicas:              0 (0.0 %)
> > >>> Number of data-nodes:          5
> > >>> Number of racks:               1
> > >>>
> > >>> Are there any other configuration parameters that need to be set?
> > >>
> > >>
> > >> What is your HFile size set to? The HFiles that get persisted would be
> > >> bound by that number. Thereafter each HFile would be split into blocks,
> > >> the size of which you configure using the dfs.block.size configuration
> > >> parameter.
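> > >>
> > >> For example (an illustrative snippet, not your exact config), in
> > >> hdfs-site.xml:
> > >>
> > >>   <property>
> > >>     <name>dfs.block.size</name>
> > >>     <value>1073741824</value> <!-- 1 GB -->
> > >>   </property>
> > >>
> > >> That would also explain your fsck numbers: a file smaller than the
> > >> block size still occupies its own block, so 1459 HFiles each under
> > >> 1 GB yield 1459 blocks averaging ~182 MB rather than ~250 one-GB
> > >> blocks.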
> > >>
> > >>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Praveen Bysani
> > >>> http://www.praveenbysani.com
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Praveen Bysani
> > > http://www.praveenbysani.com
> >
>
>
>
> --
> Regards,
> Praveen Bysani
> http://www.praveenbysani.com
>