Re: File formats in Hadoop (Harsh J, 2011-03-19, 16:26)
Hello,
On Sat, Mar 19, 2011 at 9:31 PM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?

Additionally, on Hadoop there are MapFiles and SetFiles (derivatives of SequenceFile, for when you need maps or sets), and IFiles (used by the map-output buffers to produce a key-value file for the Reducers to consume; internal use only).

Apache Hive uses RCFiles, which are very interesting too. Apache Avro provides Avro data files, which are designed for use with Hadoop Map/Reduce and Avro-serialized data.

I'm not sure about this one, but I believe Pig was implementing a table-file-like solution of its own a while ago. Howl?

--
Harsh J
http://harshj.com
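Roughly speaking, a MapFile pairs a sorted SequenceFile of data with a smaller index file that records every N-th key and its position (Hadoop's `io.map.index.interval`, 128 by default), so a lookup binary-searches the index and then scans forward a short distance. The Python sketch below is a hypothetical in-memory model of that lookup strategy only; `ToyMapFile` is not Hadoop's `MapFile` API, and real MapFiles keep byte offsets on disk rather than list positions.

```python
import bisect
from typing import List, Optional, Tuple


class ToyMapFile:
    """Hypothetical in-memory model of a MapFile: sorted data plus a sparse index."""

    def __init__(self, records: List[Tuple[str, str]], index_interval: int = 128):
        keys = [k for k, _ in records]
        # Like MapFile.Writer, require keys to arrive in sorted order.
        if keys != sorted(keys):
            raise ValueError("records must be sorted by key")
        self.data = list(records)
        # Sparse index: every index_interval-th key and its position in the data.
        self.index_pos = list(range(0, len(records), index_interval))
        self.index_keys = [records[i][0] for i in self.index_pos]

    def get(self, key: str) -> Optional[str]:
        # Binary-search the index for the last indexed key <= key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None  # key sorts before the first entry
        pos = self.index_pos[slot]
        # Scan forward from there until the keys pass the target.
        while pos < len(self.data) and self.data[pos][0] <= key:
            if self.data[pos][0] == key:
                return self.data[pos][1]
            pos += 1
        return None


if __name__ == "__main__":
    records = [(f"k{i:04d}", f"v{i}") for i in range(1000)]
    mf = ToyMapFile(records, index_interval=128)
    print(mf.get("k0500"))  # v500
    print(mf.get("zzz"))    # None
```

The index interval trades memory for scan length: a smaller interval means a larger index but fewer records read per lookup.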
-
Re: File formats in Hadoop (Niels Basjes, 2011-03-20, 11:04)
And then there is the matter of how you put the data in the file. I've heard that some people write the data into SequenceFiles as Protocol Buffers.

--
Kind regards,

Niels Basjes
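A SequenceFile does not care what the value bytes contain, which is why Protocol Buffers (or any other serialization) can be stored inside one. The Python sketch below models only the record framing of an uncompressed SequenceFile (record length, key length, key bytes, value bytes, as big-endian 4-byte integers); the file header, sync markers, compression and Writable serialization are left out, the helper names are hypothetical, and `struct` packing stands in for a real Protocol Buffers message.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def write_record(out: BinaryIO, key: bytes, value: bytes) -> None:
    # Record length covers key + value; the key length lets a reader split them.
    out.write(struct.pack(">ii", len(key) + len(value), len(key)))
    out.write(key)
    out.write(value)


def read_records(inp: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    while True:
        header = inp.read(8)
        if len(header) < 8:
            return  # end of stream
        record_len, key_len = struct.unpack(">ii", header)
        body = inp.read(record_len)
        yield body[:key_len], body[key_len:]


# Stand-in for a Protocol Buffers message: an (id, score) pair packed with struct.
def encode_payload(user_id: int, score: float) -> bytes:
    return struct.pack(">qd", user_id, score)


def decode_payload(data: bytes) -> Tuple[int, float]:
    return struct.unpack(">qd", data)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_record(buf, b"user-1", encode_payload(1, 0.5))
    write_record(buf, b"user-2", encode_payload(2, 0.75))
    buf.seek(0)
    for key, value in read_records(buf):
        print(key, decode_payload(value))
    # b'user-1' (1, 0.5)
    # b'user-2' (2, 0.75)
```

Swapping `encode_payload` for a real serializer changes nothing about the container, which only ever sees opaque bytes.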
-
Re: File formats in Hadoop (Weishung Chung, 2011-03-21, 14:42)
I found this interesting article about SequenceFiles; sharing it here:
http://www.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/
-
Re: File formats in Hadoop (Doug Cutting, 2011-03-21, 16:41)
On 03/19/2011 09:01 AM, Weishung Chung wrote:
> I am browsing through the hadoop.io package and was wondering what other
> file formats are available in hadoop other than SequenceFile and TFile?
> Is all data written through hadoop including those from hbase saved in the
> above formats? It seems like SequenceFile is in key value pair format.

Avro includes a file format that works with Hadoop.

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html

Doug
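Part of what makes Avro data files fit Hadoop is that they are splittable. After a header (the magic bytes `Obj\x01`, a metadata map holding the schema and codec, and a randomly generated 16-byte sync marker), data is written in blocks of (object count, byte size, serialized objects, sync marker), all counts being Avro zig-zag varint longs. A task handed an arbitrary byte offset can therefore skip to the next sync marker and decode from there. The Python sketch below is a simplified model of that layout, not the Avro library API: the metadata map is omitted, every record is treated as an Avro `bytes` value, no codec is applied, and the function names are hypothetical.

```python
import io
import os
from typing import BinaryIO, List

MAGIC = b"Obj\x01"  # Avro container files start with these four bytes


def write_long(out: BinaryIO, n: int) -> None:
    # Avro longs: zig-zag encode, then write 7 bits per byte, low bits first.
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp: BinaryIO) -> int:
    shift, z = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("truncated varint")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)  # undo zig-zag
        shift += 7


def write_container(out: BinaryIO, blocks: List[List[bytes]], sync: bytes) -> None:
    out.write(MAGIC)
    # Simplification: real Avro writes a metadata map (schema, codec) here.
    out.write(sync)
    for records in blocks:
        body = io.BytesIO()
        for rec in records:
            write_long(body, len(rec))  # Avro "bytes": length, then raw bytes
            body.write(rec)
        payload = body.getvalue()
        write_long(out, len(records))  # object count
        write_long(out, len(payload))   # serialized size in bytes
        out.write(payload)
        out.write(sync)                 # marker that split readers search for


def records_after(data: bytes, start: int, sync: bytes) -> List[bytes]:
    """Decode every block that begins after the first sync marker at or after start."""
    pos = data.find(sync, start)
    if pos < 0:
        return []
    inp = io.BytesIO(data)
    inp.seek(pos + len(sync))
    out: List[bytes] = []
    while inp.tell() < len(data):
        count, size = read_long(inp), read_long(inp)
        block = io.BytesIO(inp.read(size))
        for _ in range(count):
            out.append(block.read(read_long(block)))
        if inp.read(len(sync)) != sync:
            raise ValueError("sync marker mismatch: corrupt block")
    return out


if __name__ == "__main__":
    sync = os.urandom(16)  # one random marker per file
    buf = io.BytesIO()
    write_container(buf, [[b"r1", b"r2"], [b"r3"], [b"r4", b"r5"]], sync)
    data = buf.getvalue()
    print(records_after(data, 0, sync))
    print(records_after(data, 21, sync))
```

Starting at offset 21, which falls inside the first block, the reader skips to that block's trailing sync marker and returns only the records of the later blocks; the first block belongs to whichever split covers its start.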
-
Re: File formats in Hadoop (Weishung Chung, 2011-03-22, 14:09)
Thank you, I will definitely take a look. Also, the TFile spec below helped me understand more. What exciting work!
https://issues.apache.org/jira/secure/attachment/12396286/TFile+Specification+20081217.pdf
-
Re: File formats in Hadoop (Weishung Chung, 2011-03-22, 15:43)
My fellow superb HBase experts,

I am looking at the HFile spec and have some questions. How is a particular cell of an HBase table represented in the HFile? Does the key of the key/value pair represent rowkey + column family:qualifier + timestamp, and the value the corresponding cell value? If so, does reading a row require multiple key/value reads?

Thank you :)
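In HBase's storage model, each cell version is indeed its own key/value entry, and entries are sorted by row, then column family and qualifier, then timestamp in descending order (newest version first). A row is therefore one contiguous run of entries, so reading it is a single range scan rather than independent lookups. The Python sketch below is a hypothetical in-memory model of that ordering, not HBase's `KeyValue` comparator; key types and delete markers are ignored.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cell:
    """One cell version; in an HFile each of these is a separate key/value entry."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(c: Cell) -> Tuple[bytes, bytes, bytes, int]:
    # Row, family and qualifier ascending by bytes; timestamp descending (newest first).
    return (c.row, c.family, c.qualifier, -c.timestamp)


def read_row(sorted_cells: List[Cell], row: bytes,
             max_versions: int = 1) -> Dict[Tuple[bytes, bytes], List[bytes]]:
    """Assemble one row from cells already sorted with sort_key."""
    # Locating the run by bisecting a row list is a stand-in for an index seek.
    rows = [c.row for c in sorted_cells]
    lo, hi = bisect.bisect_left(rows, row), bisect.bisect_right(rows, row)
    result: Dict[Tuple[bytes, bytes], List[bytes]] = {}
    for c in sorted_cells[lo:hi]:  # a row is one contiguous run of entries
        versions = result.setdefault((c.family, c.qualifier), [])
        if len(versions) < max_versions:  # newest versions arrive first
            versions.append(c.value)
    return result


if __name__ == "__main__":
    cells = sorted([
        Cell(b"row1", b"cf1", b"q1", 100, b"old"),
        Cell(b"row1", b"cf1", b"q1", 200, b"new"),
        Cell(b"row2", b"cf1", b"q1", 100, b"other"),
    ], key=sort_key)
    print([c.value for c in cells])  # [b'new', b'old', b'other']
    print(read_row(cells, b"row1"))  # {(b'cf1', b'q1'): [b'new']}
```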
-
Re: File formats in Hadoop (Vivek Krishna, 2011-03-22, 15:58)
http://nosql.mypopescu.com/post/3220921756/hbase-internals-hfile-explained
might help.

Viv
-
Re: File formats in Hadoop (Weishung Chung, 2011-03-22, 16:31)
I also found this informative article:
http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html

Would the key/value pairs look like this? E.g., for column family1 with one qualifier1 and 2 versions:

key1: rowkey1+column family1:qualifier1+timestamp1
value1: corresponding cell value 1
key2: rowkey1+column family1:qualifier1+timestamp2
value2: corresponding cell value 2
key3: rowkey2+column family1:qualifier1+timestamp1
value3: corresponding cell value 3
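As far as I know, each entry in an HFile of this era is a serialized HBase `KeyValue`: key length (4 bytes), value length (4 bytes), the key, then the value. The key itself is row length (2 bytes), row, family length (1 byte), family, qualifier, timestamp (8 bytes), and a 1-byte key type such as Put or Delete; the qualifier carries no length field because it can be derived from the key length. The Python sketch below encodes and decodes that layout with `struct`; it is an illustrative model rather than HBase's `KeyValue` class, and the helper names are hypothetical.

```python
import struct
from typing import NamedTuple

PUT = 4  # HBase's KeyValue.Type code for a Put


class DecodedKV(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    key_type: int
    value: bytes


def encode_kv(row: bytes, family: bytes, qualifier: bytes, timestamp: int,
              value: bytes, key_type: int = PUT) -> bytes:
    key = (struct.pack(">H", len(row)) + row            # 2-byte row length, row
           + struct.pack(">B", len(family)) + family    # 1-byte family length, family
           + qualifier                                  # qualifier: no length field
           + struct.pack(">qB", timestamp, key_type))   # 8-byte timestamp, 1-byte type
    return struct.pack(">ii", len(key), len(value)) + key + value


def decode_kv(buf: bytes) -> DecodedKV:
    key_len, value_len = struct.unpack_from(">ii", buf, 0)
    pos = 8
    (row_len,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    row = buf[pos:pos + row_len]
    pos += row_len
    (fam_len,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    family = buf[pos:pos + fam_len]
    pos += fam_len
    # Whatever the fixed-size fields do not account for is the qualifier.
    qual_len = key_len - (2 + row_len + 1 + fam_len + 8 + 1)
    qualifier = buf[pos:pos + qual_len]
    pos += qual_len
    timestamp, key_type = struct.unpack_from(">qB", buf, pos)
    pos += 9
    return DecodedKV(row, family, qualifier, timestamp, key_type, buf[pos:pos + value_len])


if __name__ == "__main__":
    # Example timestamp in milliseconds; any long works here.
    kv = encode_kv(b"rowkey1", b"family1", b"qualifier1", 1300809600000, b"cell value 1")
    print(len(kv))  # 56
    print(decode_kv(kv))
```

Note that every version repeats the full row, family and qualifier in its key, so short row keys and family names keep HFiles smaller. Also, because timestamps sort newest first, the newer of two versions of the same cell is stored first.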
-
Re: File formats in Hadoop (Weishung Chung, 2011-03-22, 19:56)
I found this useful article that explains the internal storage of HFiles:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
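At a high level, HBase's storage path works like this: edits go to a write-ahead log and an in-memory MemStore, the MemStore is flushed to a new immutable, sorted HFile once it grows large enough, and reads merge the MemStore with the existing files, the newest version winning. The Python sketch below is a toy model of that flow under loud assumptions: the hypothetical `ToyStore` class flushes on an entry count (real HBase flushes by size, see `hbase.hregion.memstore.flush.size`), uses a counter instead of real timestamps, and leaves out the write-ahead log, compactions, caching and deletes.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyStore:
    """Hypothetical model of one store: a MemStore plus flushed, sorted files."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.memstore: Dict[str, Tuple[int, str]] = {}      # key -> (timestamp, value)
        self.files: List[List[Tuple[str, int, str]]] = []   # each file sorted by key
        self.clock = 0

    def put(self, key: str, value: str) -> None:
        self.clock += 1  # stand-in for a real timestamp
        self.memstore[key] = (self.clock, value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the MemStore out as a new immutable file, sorted by key.
        snapshot = sorted((k, ts, v) for k, (ts, v) in self.memstore.items())
        self.files.append(snapshot)
        self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        # Check the MemStore and every file; the newest timestamp wins.
        best: Optional[Tuple[int, str]] = self.memstore.get(key)
        for f in self.files:
            i = bisect.bisect_left(f, (key,))  # first entry with k >= key
            if i < len(f) and f[i][0] == key:
                if best is None or f[i][1] > best[0]:
                    best = (f[i][1], f[i][2])
        return None if best is None else best[1]


if __name__ == "__main__":
    store = ToyStore(flush_threshold=2)
    store.put("row1", "a")
    store.put("row2", "b")  # MemStore reaches 2 entries and is flushed
    store.put("row1", "c")  # newer version stays in the MemStore
    print(len(store.files), store.get("row1"), store.get("row2"))  # 1 c b
```

Because flushed files are never modified, older versions pile up across files; in HBase, compactions later merge them, which this sketch does not model.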