HBase user mailing list

Re: schema doubt
Each file is about 6 KB to 12 KB.

Inserting won't be an issue, just the access. I would like to read them
quickly.

I'm not sure what the proper row key should be. The file name works, but I'm
wondering whether there is anything more I can do to leverage HBase.
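A minimal sketch of the per-file step Joey describes below, assuming Java 8+ and only the standard library: the file name becomes the row key and the file's raw bytes become the cell value. The class and method names (`FileToRow`, `rowKey`, `cellValue`, `base64Length`) are hypothetical, and the actual HBase table write is deliberately left out rather than guessed at. It also illustrates why base64 is unnecessary: HBase row keys and values are plain byte arrays, so encoding first only inflates each value by about a third.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/**
 * Hypothetical helper: prepares the (row key, cell value) pair that a
 * map-only import job would hand to HBase for one small file.
 * The HBase write itself is intentionally not shown.
 */
public class FileToRow {

    /** Row key: the file name encoded as UTF-8 bytes. */
    static byte[] rowKey(Path file) {
        return file.getFileName().toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Cell value: the file's raw bytes, with no base64 step. */
    static byte[] cellValue(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /** Size the same content would occupy if it were base64-encoded first. */
    static int base64Length(byte[] raw) {
        return Base64.getEncoder().encode(raw).length;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("example-", ".dat");
        Files.write(tmp, new byte[12 * 1024]); // a 12 KB file of zero bytes
        byte[] value = cellValue(tmp);
        System.out.println("raw bytes:    " + value.length);
        System.out.println("base64 bytes: " + base64Length(value));
        Files.delete(tmp);
    }
}
```

For the 12 KB case, `main` prints 12288 raw bytes versus 16384 base64 bytes, so storing raw bytes saves roughly 25% per cell.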
On Thu, Sep 15, 2011 at 9:24 AM, Akash Ashok <[EMAIL PROTECTED]> wrote:

> Also, could you tell us how small these files are? If they are much
> smaller than the 64 MB default HDFS block size, you'd want to combine
> them before running a MapReduce job.
>
> Cheers,
> Akash A
>
> On Thu, Sep 15, 2011 at 6:02 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
> > It sounds like you're planning to use the HBase shell to insert all of
> > this data. If that's correct, I'd recommend against it. I would write
> > a simple MapReduce program to insert the data instead. You could run a
> > map-only job that reads in the files and writes each one as a row in
> > HBase. With the Java API you can write the raw bytes pretty easily.
> >
> > -Joey
> >
> > On Thu, Sep 15, 2011 at 7:56 AM, Rita <[EMAIL PROTECTED]> wrote:
> > > I have many small files (close to 1 million) and I was thinking of
> > > creating a key-value pair for each of them. The file name can be the
> > > key and the content can be the value.
> > >
> > > Would it be better to base64-encode the content and load it into
> > > HBase, or to echo the content into the HBase shell?
> > >
> > > Has anyone done something similar?
> > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> > >
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
> >
>
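Akash's point about combining small files can be sketched as follows. This is a toy container format invented purely for illustration (`SmallFilePacker` and its record layout of name, length, and raw bytes are hypothetical); in a real Hadoop setup you would more likely use something like a SequenceFile keyed by file name. The idea is the same: many small files become one large file that a MapReduce job can read in big sequential chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical container format, for illustration only (NOT Hadoop's
 * SequenceFile): each record is (UTF name, int length, raw bytes).
 */
public class SmallFilePacker {

    /** Packs many small files into a single byte array of records. */
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());          // file name (2-byte length prefix)
                out.writeInt(e.getValue().length); // content length
                out.write(e.getValue());           // raw content
            }
        }
        return buffer.toByteArray();
    }

    /** Reads the records back, preserving their original order. */
    static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("b.bin", new byte[] {1, 2, 3});
        byte[] packed = pack(files);
        System.out.println("packed size: " + packed.length + " bytes");
        System.out.println("names: " + unpack(packed).keySet());
    }
}
```

Running `main` should print a packed size of 30 bytes and the names `[a.txt, b.bin]`. Note that with files of only 6 KB to 12 KB each, storing them as HBase cells, as Joey suggests, is itself another way around the small-files overhead in HDFS.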

--
--- Get your facts first, then you can distort them as you please.--