Re: Disk space usage of HFilev1 vs HFilev2
Hi Guys,

I was digging through the hbase-default.xml file and I found this property
related to HFile handling:
  <property>
    <name>hfile.format.version</name>
    <value>2</value>
    <description>
      The HFile format version to use for new files. Set this to 1 to
      test backwards-compatibility. The default value of this option
      should be consistent with FixedFileTrailer.MAX_VERSION.
    </description>
  </property>

I believe setting this to 1 would help me carry out my test. Now we know
how to store data in HFileV1 in HBase 0.92 :). I'll post the result once
I try this out.

Thanks,
Anil
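
A minimal sketch of how that property could be applied, assuming it is
added to hbase-site.xml on the cluster (so it overrides the value in
hbase-default.xml) before any data is loaded:

  <!-- hbase-site.xml: ask HBase 0.92 to write new store files in the
       HFile v1 format instead of the default v2. A sketch, assuming
       the override takes effect for newly flushed/compacted files. -->
  <property>
    <name>hfile.format.version</name>
    <value>1</value>
  </property>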
On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <[EMAIL PROTECTED]> wrote:

> Cool. Now we have something on the record :-)
>
> ./Zahoor@iPad
>
> On 15-Aug-2012, at 3:12 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
> > Not wanting this thread to also end up as a mystery result on the
> > web, I did some tests. I loaded 10k rows (of 100 KB random chars
> > each) into test tables on both 0.90 and 0.92, flushed them,
> > major_compact'ed them (waited for completion and for the drop in IO
> > write activity), and then measured them to find this:
> >
> > 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> > 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> >
> > So… not much of a difference. It is still your data that counts. I
> > believe what Anil may have had were merely additional, un-compacted
> > stores?
> >
> > P.s. Note that my 'test' table was all defaults, i.e. merely
> > "create 'test', 'col1'", nothing else, so a block index entry was
> > probably created for every row, as the block size is 64 KB by
> > default, while my rows are all 100 KB each.
> >
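
For reference, a minimal sketch of the test procedure Harsh describes
above, assuming the HBase shell (JRuby) and the hadoop CLI; the table
name, row count, and value size follow his description, though the
values here are constant rather than random (which only matters if
compression is enabled, and it is not on a default table):

  # In the HBase shell (JRuby): default table, 10k rows of 100 KB each.
  create 'test', 'col1'
  (1..10000).each do |i|
    put 'test', "row#{i}", 'col1:q', 'x' * (100 * 1024)
  end
  flush 'test'
  major_compact 'test'

  # From the OS shell, once compaction IO settles: measure the table.
  hadoop fs -dus /hbase/test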
> > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> >> Hi Kevin,
> >>
> >> If it's not possible to store a table in HFilev1 in HBase 0.92,
> >> then my last option will be to store the data on a
> >> pseudo-distributed or standalone cluster for the comparison.
> >> The advantage of the current installation is that it's a fully
> >> distributed cluster with around 33 million records in a table.
> >> So, it would give me a better estimate.
> >>
> >> Thanks,
> >> Anil Gupta
> >>
> >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <[EMAIL PROTECTED]> wrote:
> >>
> >>> Do you not have a pseudo cluster for testing anywhere?
> >>>
> >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Hi Jerry,
> >>>>
> >>>> I am willing to do that, but the problem is that I wiped out the
> >>>> HBase 0.90 cluster. Is there a way to store a table in HFilev1 in
> >>>> HBase 0.92? If I can store a file in HFilev1 in 0.92, then I can
> >>>> do the comparison.
> >>>>
> >>>> Thanks,
> >>>> Anil Gupta
> >>>>
> >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Hi Anil:
> >>>>>
> >>>>> Maybe you can try to compare the two HFile implementations
> >>>>> directly? Let's say, write 1000 rows into HFile v1 format and
> >>>>> then into HFile v2 format. You can then compare the sizes of the
> >>>>> two directly?
> >>>>>
> >>>>> HTH,
> >>>>>
> >>>>> Jerry
> >>>>>
> >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> Hi Zahoor,
> >>>>>>
> >>>>>> Then it seems like I might have missed something when doing
> >>>>>> HDFS usage estimation of HBase. I usually do
> >>>>>> hadoop fs -dus /hbase/$TABLE_NAME for getting the HDFS usage of
> >>>>>> a table. Is this the right way? Since I wiped out the
> >>>>>> HBase 0.90 cluster, I can no longer look into its HDFS usage.
> >>>>>> Is it possible to store a table in HFileV1 instead of HFileV2
> >>>>>> in HBase 0.92? In this way I can do a fair comparison.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Anil Gupta
> >>>>>>
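
For what it's worth, a minimal sketch of that measurement, assuming the
default /hbase root directory ($TABLE_NAME is a placeholder, as in
Anil's message; -dus is the older spelling that newer Hadoop releases
replace with -du -s):

  # Summarize the HDFS footprint of one table (older -dus form,
  # as used in this thread):
  hadoop fs -dus /hbase/$TABLE_NAME

  # Equivalent on newer Hadoop releases:
  hadoop fs -du -s /hbase/$TABLE_NAME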
> >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> Hi Anil,

Thanks & Regards,
Anil Gupta