Re: Disk space usage of HFilev1 vs HFilev2
Anil,

  Please let us know how well this works.

On Mon, Aug 27, 2012 at 4:19 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> I was digging through the hbase-default.xml file and I found this property
> relating to HFile handling:
>   <property>
>     <name>hfile.format.version</name>
>     <value>2</value>
>     <description>
>       The HFile format version to use for new files. Set this to 1 to
>       test backwards-compatibility. The default value of this option
>       should be consistent with FixedFileTrailer.MAX_VERSION.
>     </description>
>   </property>
>
> I believe setting this to 1 would help me carry out my test. Now we know
> how to store data in HFileV1 in HBase 0.92 :). I'll post the results once
> I try this out.
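>
> To double-check which format a given store file actually ends up in, the
> HFile pretty-printer can dump a file's metadata (the trailer records the
> version). The flags below are from memory of the 0.92-era tool and the
> path is only a placeholder, so verify against your build:
>
>   # print store-file metadata, including the format version
>   hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
>     /hbase/TABLE_NAME/REGION/FAMILY/STOREFILE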
>
> Thanks,
> Anil
>
>
> On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <[EMAIL PROTECTED]> wrote:
>
> > Cool. Now we have something on the record :-)
> >
> > ./Zahoor@iPad
> >
> > On 15-Aug-2012, at 3:12 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> > > Not wanting this thread, too, to end up as a mystery-result on the
> > > web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> > > into test tables on 0.90 and 0.92 both, flushed them, major_compact'ed
> > > them (waited for completion and drop in IO write activity) and then
> > > measured them to find this:
> > >
> > > 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> > > 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> > >
> > > So… not much of a difference. It is still your data that counts. I
> > > believe what Anil may have had were merely additional, un-compacted
> > > stores?
> > >
> > > P.s. Note that my 'test' table was all defaults. That is, merely
> > > "create 'test', 'col1'", nothing else, so a block-index entry was
> > > probably created for every row, as the block size is 64 KB by default
> > > while my rows are all 100 KB each.
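> > >
> > > For the record, the sequence on each cluster was essentially the
> > > following (the 10k-row loader itself is omitted; any client writing
> > > ~100 KB random values works):
> > >
> > >   # in the hbase shell
> > >   create 'test', 'col1'
> > >   # ... load 10k rows of ~100 KB random chars each ...
> > >   flush 'test'
> > >   major_compact 'test'
> > >
> > >   # once compaction finishes and write I/O settles, e.g.:
> > >   hadoop fs -dus /hbase/test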
> > >
> > > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> > >> Hi Kevin,
> > >>
> > >> If it's not possible to store a table in HFilev1 in HBase 0.92, then
> > >> my last option will be to store the data on a pseudo-distributed or
> > >> standalone cluster for the comparison. The advantage of the current
> > >> installation is that it's a fully distributed cluster with around 33
> > >> million records in a table, so it would give me a better estimate.
> > >>
> > >> Thanks,
> > >> Anil Gupta
> > >>
> > >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Do you not have a pseudo cluster for testing anywhere?
> > >>>
> > >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Hi Jerry,
> > >>>>
> > >>>> I am willing to do that, but the problem is that I wiped off the
> > >>>> HBase 0.90 cluster. Is there a way to store a table in HFilev1 in
> > >>>> HBase 0.92? If I can store a file in HFilev1 in 0.92, then I can do
> > >>>> the comparison.
> > >>>>
> > >>>> Thanks,
> > >>>> Anil Gupta
> > >>>>
> > >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> > >>>>
> > >>>>> Hi Anil:
> > >>>>>
> > >>>>> Maybe you can try to compare the two HFile implementations
> > >>>>> directly? Say, write 1000 rows into HFile v1 format and then into
> > >>>>> HFile v2 format. You can then compare the sizes of the two directly?
> > >>>>>
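> > >>>>> One way to drive that comparison without writing HFiles through
> > >>>>> the Java API is to run the same load twice on a throwaway
> > >>>>> standalone instance, flipping the hfile.format.version property in
> > >>>>> hbase-site.xml between runs. A sketch only; the table names and
> > >>>>> the load step are placeholders:
> > >>>>>
> > >>>>>   # 1) set hfile.format.version=1 in conf/hbase-site.xml, restart,
> > >>>>>   #    load 1000 rows into test_v1, then flush + major_compact it
> > >>>>>   # 2) repeat with hfile.format.version=2, loading into test_v2
> > >>>>>   # 3) compare the on-disk sizes:
> > >>>>>   hadoop fs -dus /hbase/test_v1 /hbase/test_v2
> > >>>>>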
> > >>>>> HTH,
> > >>>>>
> > >>>>> Jerry
> > >>>>>
> > >>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> > >>>>>
> > >>>>>> Hi Zahoor,
> > >>>>>>
> > >>>>>> Then it seems like I might have missed something when doing the
> > >>>>>> HDFS usage estimation of HBase. I usually do hadoop fs -dus
> > >>>>>> /hbase/$TABLE_NAME to get the HDFS usage of a table. Is this the
> > >>>>>> right way? Since I wiped off

Kevin O'Dell
Customer Operations Engineer, Cloudera