HBase >> mail # user >> Block size of HBase files


Praveen Bysani 2013-05-13, 06:40
Amandeep Khurana 2013-05-13, 07:03
Praveen Bysani 2013-05-13, 09:45
Ted Yu 2013-05-13, 10:36
Praveen Bysani 2013-05-13, 11:48
Anoop John 2013-05-13, 11:54
Praveen Bysani 2013-05-13, 12:19
Re: Block size of HBase files
I mean when you created the table (using a client, I guess), did you specify
anything like splitKeys or [start, end, no#regions]?

-Anoop-
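The splitKeys / [start, end, no#regions] that Anoop refers to are the pre-split arguments of HBaseAdmin.createTable. A minimal sketch, assuming the 0.94-era Java client; table and family names below are hypothetical, and the admin calls are shown only as comments since they need a live cluster. The helper that computes evenly spaced split points over a one-byte keyspace is self-contained:

```java
public class SplitKeys {
    // Compute numRegions-1 single-byte split points dividing the 0x00..0xFF
    // keyspace into roughly equal ranges. The resulting byte[][] has the
    // shape HBaseAdmin.createTable(desc, splitKeys) expects.
    static byte[][] evenSplits(int numRegions) {
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            keys[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] k : evenSplits(4)) {
            System.out.printf("0x%02X%n", k[0]); // 0x40, 0x80, 0xC0
        }
        // With the HBase client on the classpath, the table would then be
        // pre-split at creation time (hypothetical names):
        //   HBaseAdmin admin = new HBaseAdmin(conf);
        //   HTableDescriptor desc = new HTableDescriptor("mytable");
        //   desc.addFamily(new HColumnDescriptor("cf"));
        //   admin.createTable(desc, evenSplits(16));
        // or, equivalently, createTable(desc, startKey, endKey, numRegions).
    }
}
```

A table created with either overload starts life with that many regions regardless of data volume, which is one way to end up with far more regions than the size cap alone would produce.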

On Mon, May 13, 2013 at 5:49 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:

> We insert data using the java hbase client (org.apache.hadoop.hbase.client.*).
> However we are not providing any details in the configuration object,
> except for the zookeeper quorum and port number. Should we specify explicitly
> at this stage?
>
> On 13 May 2013 19:54, Anoop John <[EMAIL PROTECTED]> wrote:
>
> > >now have 731 regions (each about ~350 mb !!). I checked the
> > >configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> > >too !!!
> >
> > You mentioned the splits at the time of table creation? How did you create
> > the table?
> >
> > -Anoop-
> >
> > On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > Thanks for the details. No, I haven't run any compaction, and I have no
> > > idea if there is one going on in the background. I executed a major_compact
> > > on that table and I now have 731 regions (each about ~350 mb !!). I checked
> > > the configuration in CM, and the value for hbase.hregion.max.filesize is
> > > 1 GB too !!!
> > >
> > > I am not trying to access HFiles in my MR job; in fact I am just using a
> > > PIG script which handles this. This number (731) is close to my number of
> > > map tasks, which makes sense. But how can I decrease this? Shouldn't the
> > > size of each region be 1 GB with that configuration value?
> > >
> > >
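As an editor's note, the numbers in the message above can be checked directly: ~250 GB under a 1 GB region cap gives a floor of ~247 regions, so 731 regions averaging ~350 MB means the regions were created well below the size limit (e.g. by splits at table creation), not forced by the cap. The arithmetic:

```java
public class RegionMath {
    public static void main(String[] args) {
        long totalBytes = 265_727_504_820L;  // "Total size" from the fsck report
        long maxFileSize = 1L << 30;         // hbase.hregion.max.filesize = 1 GB
        // Lower bound on region count if every region were filled to the cap:
        long minRegions = totalBytes / maxFileSize;
        // Average size actually observed across the 731 regions:
        long avgMb = (totalBytes / 731) / (1024 * 1024);
        System.out.println(minRegions + " regions minimum"); // 247
        System.out.println(avgMb + " MB average");           // 346
    }
}
```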
> > > On 13 May 2013 18:36, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can change the HFile size through the hbase.hregion.max.filesize
> > > > parameter.
> > > >
> > > > On May 13, 2013, at 2:45 AM, Praveen Bysani <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I wanted to minimize the number of map reduce tasks generated while
> > > > > processing a job, hence configured it to a larger value.
> > > > >
> > > > > I don't think I have configured the HFile size in the cluster. I use
> > > > > Cloudera Manager to manage my cluster, and the only configuration I can
> > > > > relate to is hfile.block.cache.size, which is set to 0.25. How do I
> > > > > change the HFile size?
> > > > >
> > > > > On 13 May 2013 15:03, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <[EMAIL PROTECTED]> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I have the dfs.block.size value set to 1 GB in my cluster
> > > > >>> configuration.
> > > > >>
> > > > >>
> > > > >> Just out of curiosity - why do you have it set at 1GB?
> > > > >>
> > > > >>
> > > > >>> I have around 250 GB of data stored in hbase over this cluster. But
> > > > >>> when I check the number of blocks, it doesn't correspond to the block
> > > > >>> size value I set. From what I understand I should only have ~250
> > > > >>> blocks. But instead, when I did a fsck on /hbase/<table-name>, I got
> > > > >>> the following:
> > > > >>>
> > > > >>> Status: HEALTHY
> > > > >>> Total size:    265727504820 B
> > > > >>> Total dirs:    1682
> > > > >>> Total files:   1459
> > > > >>> Total blocks (validated):      1459 (avg. block size 182129886 B)
> > > > >>> Minimally replicated blocks:   1459 (100.0 %)
> > > > >>> Over-replicated blocks:        0 (0.0 %)
> > > > >>> Under-replicated blocks:       0 (0.0 %)
> > > > >>> Mis-replicated blocks:         0 (0.0 %)
> > > > >>> Default replication factor:    3
> > > > >>> Average block replication:     3.0
> > > > >>> Corrupt blocks:                0
> > > > >>> Missing replicas:              0 (0.0 %)
> > > > >>> Number of data-nodes:          5
> > > > >>> Number of racks:               1
> > > > >>>
> > > > >>> Are there any other configuration parameters that need to be set?
> > > > >>
> > > > >>
> > > > >> What is your HFile size set to? The HFiles that get persisted
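For reference, the parameter Ted Yu points to earlier in the thread lives in the cluster-side configuration. A sketch of the hbase-site.xml fragment (set via the equivalent field in Cloudera Manager on CM-managed clusters); the 1 GB value shown matches what the thread reports is already in effect, and is illustrative of where the setting goes rather than a recommendation:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
```

Note this caps the size at which a region splits; it does not merge or grow regions that are already smaller than the cap, which is why lowering the region count requires addressing how the regions were created.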
Praveen Bysani 2013-05-14, 02:23
Anoop John 2013-05-13, 10:21