Fwd: File formats in Hadoop
---------- Forwarded message ----------
From: Weishung Chung <[EMAIL PROTECTED]>
Date: Tue, Mar 22, 2011 at 11:31 AM
Subject: Re: File formats in Hadoop
To: Vivek Krishna <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
[EMAIL PROTECTED], Doug Cutting <[EMAIL PROTECTED]>
I also found this informative article:
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

Would the key/value pairs be as follows? E.g., for column family1 with one
qualifier (qualifier1) and 2 versions:

key1:   rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2:   rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3:   rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3
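
The layout asked about above can be sketched in Python. This is a simplified model for illustration, not HBase code: the names CellKey, sort_key and read_row are hypothetical, and the real on-disk key also carries length fields and a key type that are left out here. It assumes each cell version is stored as its own key/value entry, ordered by row, family, qualifier, and then timestamp with the newest version first.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Simplified cell key (hypothetical): one entry per cell version."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int


def sort_key(key: CellKey):
    # Assumed ordering: row, family, qualifier ascending;
    # timestamp descending so the newest version comes first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


# The three entries from the example above, with timestamps 1 and 2.
cells = {
    CellKey(b"rowkey1", b"family1", b"qualifier1", 1): b"value1",
    CellKey(b"rowkey1", b"family1", b"qualifier1", 2): b"value2",
    CellKey(b"rowkey2", b"family1", b"qualifier1", 1): b"value3",
}

# Entries in the order they would appear in a sorted file.
ordered = sorted(cells.items(), key=lambda kv: sort_key(kv[0]))


def read_row(entries, row: bytes):
    """Collect every key/value entry that belongs to one row."""
    return [(k, v) for k, v in entries if k.row == row]


for key, value in read_row(ordered, b"rowkey1"):
    print(key.qualifier, key.timestamp, value)
```

Under these assumptions, reading rowkey1 touches two entries, one per version, so reading a whole row does mean reading several key/value pairs.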
On Tue, Mar 22, 2011 at 10:58 AM, Vivek Krishna <[EMAIL PROTECTED]> wrote:

> http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
> might help.
>
> Viv
>
>
>
>
> On Tue, Mar 22, 2011 at 11:43 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>
>> My fellow superb hbase experts,
>>
>> Looking at the HFile specs and have some questions.
>> How is a particular table cell in a HBase table being represented in the
>> HFile? Does the key of the key value pair represent the rowkey+column
>> family:qualifier+timestamp and the value represent the corresponding cell
>> value? If so, to read a row, multiple key/value pair reads have to be
>> done?
>>
>> Thank you :)
>>
>>
>> On Tue, Mar 22, 2011 at 9:09 AM, Weishung Chung <[EMAIL PROTECTED]> wrote:
>>
>> > Thank you, I will definitely take a look. Also, the TFile spec below helps
>> > me to understand more,
>> > what an exciting work !
>> >
>> > https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
>> >
>> > On Mon, Mar 21, 2011 at 11:41 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>> >
>> >> On 03/19/2011 09:01 AM, Weishung Chung wrote:
>> >> > I am browsing through the hadoop.io package and was wondering what other
>> >> > file formats are available in hadoop other than SequenceFile and TFile?
>> >> > Is all data written through hadoop including those from hbase saved in the
>> >> > above formats? It seems like SequenceFile is in key value pair format.
>> >>
>> >> Avro includes a file format that works with Hadoop.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>> >>
>> >> Doug
>> >>
>> >
>> >
>>
>
>