I have many small files (close to 1 million) and I am thinking of storing them as key-value pairs. The file name can be the key and the content can be the value.
Would it be better to base64-encode the content and load it into HBase, or to echo the content into the HBase shell?
Has anyone done something similar?
-- --- Get your facts first, then you can distort them as you please.--
Joey Echeverria 2011-09-15, 12:32
It sounds like you're planning to use the HBase shell to insert all of this data. If that's correct, I'd recommend against it. I would write a simple MapReduce program to insert the data instead. You could run a map-only job that reads in the files and writes each one as a row in HBase. With the Java APIs you can write the raw bytes pretty easily, so there is no need to base64-encode the content.
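As a rough illustration of the "file name as key, raw bytes as value" idea, here is a minimal sketch that reads a directory into a sorted in-memory map. The class name `SmallFileRows` and the method `loadDirectory` are hypothetical, and the `TreeMap` is only a stand-in for an HBase table. In an actual job each entry would be written through the HBase client instead, and that part is left out because it depends on the client version in use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: models each small file as one row (key = file name, value = raw bytes).
class SmallFileRows {

    /** Reads every regular file directly inside dir into a map sorted by file name. */
    static TreeMap<String, byte[]> loadDirectory(Path dir) throws IOException {
        TreeMap<String, byte[]> rows = new TreeMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    // The bytes are stored untouched: no base64 or text encoding step.
                    rows.put(file.getFileName().toString(), Files.readAllBytes(file));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, byte[]> row : loadDirectory(dir).entrySet()) {
            System.out.println(row.getKey() + "\t" + row.getValue().length + " bytes");
        }
    }
}
```

The sorted map loosely mirrors how rows are ordered by key, which is why the choice of key matters for later reads.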
-Joey
-- Joseph Echeverria Cloudera, Inc. 443.305.9434
Akash Ashok 2011-09-15, 13:24
Also, could you tell us how small these files are? If they are much smaller than the default 64 MB HDFS block size, you'd want to splice them together into larger files before running a MapReduce job.
Cheers, Akash A
Each file is about 6 KB to 12 KB.
Inserting won't be an issue, just the access. I would like to access them quickly.
I'm not sure what the proper key should be. The file name is OK, but I'm wondering whether there is anything more I can do to leverage HBase.
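For a sense of scale, here is a back-of-the-envelope calculation using only the numbers mentioned in this thread (about 1 million files, 6 KB to 12 KB each, 64 MB default block size). The class and method names are hypothetical, and the figures assume the files could be packed perfectly into blocks, which a real job would only approximate.

```java
// Hypothetical helper for rough sizing; not part of Hadoop or HBase.
class SmallFileMath {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** How many whole files of fileBytes fit into one block of blockBytes. */
    static long filesPerBlock(long blockBytes, long fileBytes) {
        return blockBytes / fileBytes;
    }

    /** Blocks needed if fileCount files of fileBytes each were packed end to end. */
    static long blocksNeeded(long fileCount, long fileBytes, long blockBytes) {
        long total = fileCount * fileBytes;
        return (total + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long block = 64 * MB;
        long files = 1_000_000L;
        for (long size : new long[] {6 * KB, 12 * KB}) {
            System.out.println(size / KB + " KB files: "
                    + filesPerBlock(block, size) + " per block, "
                    + blocksNeeded(files, size, block) + " blocks if packed");
        }
    }
}
```

Either way a single block could hold thousands of these files, which is why reading them one file at a time is a poor fit for MapReduce.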
-- --- Get your facts first, then you can distort them as you please.--