Re: Block size of HBase files
> now have 731 regions (each about ~350 MB!). I checked the
> configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> too!

You mentioned splits at the time of table creation? How did you create the
table?

-Anoop-
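
If the table was pre-split at creation, that by itself could explain a large
region count: pre-splitting creates all the regions up front, and
hbase.hregion.max.filesize only caps when a region splits further, never how
small one may be. A minimal sketch of a pre-split create with the Java client
API of that era ("mytable", "cf", and the split keys are made-up
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("cf"));
        // Three split keys yield four regions from the start, regardless
        // of what hbase.hregion.max.filesize is set to.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("row-25"),
            Bytes.toBytes("row-50"),
            Bytes.toBytes("row-75")
        };
        admin.createTable(desc, splits);
        admin.close();
    }
}

The shell equivalent would be something like
create 'mytable', 'cf', {SPLITS => ['row-25', 'row-50', 'row-75']}.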

On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks for the details. No, I haven't run any compaction, and I have no
> idea if one is going on in the background. I executed a major_compact on
> that table and now have 731 regions (each about ~350 MB!). I checked the
> configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> too!
>
> I am not trying to access HFiles directly in my MR job; in fact I am just
> using a Pig script which handles this. This number (731) is close to my
> number of map tasks, which makes sense. But how can I decrease it?
> Shouldn't the size of each region be 1 GB with that configuration value?
>
>
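
Since hbase.hregion.max.filesize is a split threshold rather than a target
size, one way to end up with fewer, larger regions is to raise the limit for
this particular table; note that in HBase of this vintage existing regions
are not merged automatically (the offline Merge utility would be needed for
that). A rough sketch, again with a placeholder table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseRegionSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] tableName = Bytes.toBytes("mytable"); // placeholder
        admin.disableTable(tableName);
        HTableDescriptor desc = admin.getTableDescriptor(tableName);
        // Per-table override of the cluster-wide
        // hbase.hregion.max.filesize split threshold.
        desc.setMaxFileSize(10L * 1024 * 1024 * 1024); // 10 GB
        admin.modifyTable(tableName, desc);
        admin.enableTable(tableName);
        admin.close();
    }
}

The region count matters for the job because the HBase input format creates
one map task per region, which is presumably why the 731 regions line up with
the number of map tasks in the Pig job.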
> On 13 May 2013 18:36, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > You can change HFile size through hbase.hregion.max.filesize parameter.
> >
> > On May 13, 2013, at 2:45 AM, Praveen Bysani <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > I wanted to minimize the number of map reduce tasks generated while
> > > processing a job, hence configured it to a larger value.
> > >
> > > I don't think I have configured the HFile size in the cluster. I use
> > > Cloudera Manager to manage my cluster, and the only related
> > > configuration I can find is hfile.block.cache.size, which is set to
> > > 0.25. How do I change the HFile size?
> > >
> > > On 13 May 2013 15:03, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> > >
> > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <[EMAIL PROTECTED]>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I have the dfs.block.size value set to 1 GB in my cluster
> > >>> configuration.
> > >>
> > >>
> > >> Just out of curiosity - why do you have it set at 1GB?
> > >>
> > >>
> > >>> I have around 250 GB of data stored in HBase on this cluster. But
> > >>> when I check the number of blocks, it doesn't correspond to the block
> > >>> size value I set. From what I understand I should only have ~250
> > >>> blocks, but instead, when I ran fsck on /hbase/<table-name>, I got
> > >>> the following:
> > >>>
> > >>> Status: HEALTHY
> > >>> Total size:    265727504820 B
> > >>> Total dirs:    1682
> > >>> Total files:   1459
> > >>> Total blocks (validated):      1459 (avg. block size 182129886 B)
> > >>> Minimally replicated blocks:   1459 (100.0 %)
> > >>> Over-replicated blocks:        0 (0.0 %)
> > >>> Under-replicated blocks:       0 (0.0 %)
> > >>> Mis-replicated blocks:         0 (0.0 %)
> > >>> Default replication factor:    3
> > >>> Average block replication:     3.0
> > >>> Corrupt blocks:                0
> > >>> Missing replicas:              0 (0.0 %)
> > >>> Number of data-nodes:          5
> > >>> Number of racks:               1
> > >>>
> > >>> Are there any other configuration parameters that need to be set?
> > >>
> > >>
> > >> What is your HFile size set to? The HFiles that get persisted would be
> > >> bound by that number. Thereafter each HFile would be split into
> > >> blocks, the size of which you configure using the dfs.block.size
> > >> configuration parameter.
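
(Working through the fsck numbers above: 265727504820 bytes across 1459
HFiles averages roughly 182 MB per file, matching the reported average block
size. An HDFS block never spans files and every file occupies at least one
block, so with each of these files fitting in a single 1 GB block the count
comes out to exactly 1459 blocks. The ~250-block estimate would only hold if
the 250 GB were packed into files of 1 GB or more.)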
> > >>
> > >>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Praveen Bysani
> > >>> http://www.praveenbysani.com
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Praveen Bysani
> > > http://www.praveenbysani.com
> >
>
>
>
> --
> Regards,
> Praveen Bysani
> http://www.praveenbysani.com
>