HBase, mail # dev - Fwd: File formats in Hadoop


Fwd: File formats in Hadoop
Weishung Chung 2011-03-22, 16:44
---------- Forwarded message ----------
From: Weishung Chung <[EMAIL PROTECTED]>
Date: Tue, Mar 22, 2011 at 11:31 AM
Subject: Re: File formats in Hadoop
To: Vivek Krishna <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
[EMAIL PROTECTED], Doug Cutting <[EMAIL PROTECTED]>
I also found this informative article:
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

So would the key/value pairs look like this? E.g., for column family1 with one
qualifier (qualifier1) and 2 versions:

key1:   rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2:   rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3:   rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3
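
To make the layout above concrete, here is a rough sketch in Python (not HBase's Java, and not the real on-disk byte format). The names Cell, sort_key and read_row are made up for illustration and are not part of any HBase API. It assumes the ordering HBase documents for its keys: row, column family and qualifier ascending, with timestamps descending so the newest version of a cell comes first.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    """Toy stand-in for one key/value pair: the key is every field except value."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: bytes


def sort_key(cell: Cell):
    # Row, family and qualifier sort ascending; the timestamp is negated so
    # that newer versions of the same cell sort ahead of older ones.
    return (cell.row, cell.family, cell.qualifier, -cell.timestamp)


def read_row(sorted_cells: List[Cell], row: str) -> List[Cell]:
    # A row is not stored as one record; it is the run of adjacent
    # key/value pairs that share the same row key.
    return [c for c in sorted_cells if c.row == row]


cells = [
    Cell("rowkey1", "family1", "qualifier1", 1, b"value1"),
    Cell("rowkey1", "family1", "qualifier1", 2, b"value2"),
    Cell("rowkey2", "family1", "qualifier1", 1, b"value3"),
]

store = sorted(cells, key=sort_key)
row1 = read_row(store, "rowkey1")

for c in store:
    print(f"{c.row}+{c.family}:{c.qualifier}+{c.timestamp} -> {c.value!r}")
```

In this toy model, reading rowkey1 returns two key/value pairs, one per version, which is why I think reading a row means reading several pairs. A real reader would presumably seek to the first key of the row using the block index instead of scanning everything.
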
On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <[EMAIL PROTECTED]> wrote:

> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
>
>
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
>> My fellow superb hbase experts,
>>
>> I am looking at the HFile specs and have some questions.
>> How is a particular table cell in an HBase table represented in the
>> HFile? Does the key of the key/value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, do multiple key/value pairs have to be read in order to
>> read a row?
>>
>> Thank you :)
>>
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below
>> > helps me understand more. What exciting work!
>> >
>> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what
>> >> > other file formats are available in hadoop other than SequenceFile
>> >> > and TFile? Is all data written through hadoop including those from
>> >> > hbase saved in the above formats? It seems like SequenceFile is in
>> >> > key value pair format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
>> >>
>> >
>> >
>>
>
>