Re: bulk load skipping tsv files
Jinyuan:

bq. no new data is needed; only some values will change after recalculation.

Have you considered using a coprocessor to fulfill the above task?
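
For illustration, a minimal sketch of that idea, assuming the HBase 1.x
observer API (method names differ in 0.94) and a hypothetical recalculate()
helper; the value is recomputed at read time, so no daily rewrite of the
table is needed:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class RecalcObserver extends BaseRegionObserver {

  // Runs on the region server after each Get; rewrites the returned cells
  // with freshly recalculated values instead of storing them up front.
  @Override
  public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Get get, List<Cell> results) throws IOException {
    for (int i = 0; i < results.size(); i++) {
      Cell cell = results.get(i);
      byte[] newValue = recalculate(CellUtil.cloneValue(cell)); // hypothetical
      results.set(i, CellUtil.createCell(CellUtil.cloneRow(cell),
          CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
          cell.getTimestamp(), cell.getTypeByte(), newValue));
    }
  }

  private byte[] recalculate(byte[] oldValue) {
    return oldValue; // placeholder for the real recalculation
  }
}

Whether this beats the nightly bulk rewrite depends on the read/write mix:
it trades the batch job for a small recalculation cost on every read.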

Cheers

On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:

> If I understood your use case correctly, and you don't need to maintain
> older versions of the data, why don't you set the 'max versions' parameter
> for your table to 1? I believe the growth in data, even in the case of
> updates, is due to that. Have you tried that?
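>
> A minimal illustration from the HBase shell, assuming a table 'mytable'
> with a single family 'cf' (placeholder names; depending on the HBase
> version, the table may need to be disabled first):
>
>   alter 'mytable', {NAME => 'cf', VERSIONS => 1}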
>
> Regards,
> Shahab
>
>
> On Fri, May 17, 2013 at 11:49 AM, Jinyuan Zhou <[EMAIL PROTECTED]> wrote:
>
> > Actually, I wanted to update each row of a table each day: no new data is
> > needed; only some values will change after recalculation. It looks like
> > every time I do this, the data in the table doubles, even though it is an
> > update. I believe even an update results in new HFiles, and the cluster
> > then gets very busy splitting regions and doing related work. It takes
> > about an hour to update only about 250 million rows. I only need one
> > version, so I think it might be faster to store the recalculated results
> > in HFiles, truncate the original table, and then bulk load the HFiles
> > into the empty table.
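> >
> > The final load step under that plan could be a one-liner, assuming the
> > recalculated HFiles are written to /tmp/recalc-out (a placeholder path):
> >
> >   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
> >       /tmp/recalc-out mytable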
> > Thanks,
> >
> >
> >
> > On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > bq. What I want is to read from some HBase table and create HFiles
> > > directly
> > >
> > > Can you describe your use case in more detail?
> > >
> > > Thanks
> > >
> > > On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > I wonder if there is a tool similar to
> > > > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a tsv
> > > > file and creates HFiles that are ready to be loaded into the
> > > > corresponding regions by another tool,
> > > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want is
> > > > to read from some HBase table and create HFiles directly. I think I
> > > > know how to write such a class by following the steps in the ImportTsv
> > > > class, but I wonder if someone has already done this. (A rough sketch
> > > > of this idea is appended at the end of this message.)
> > > > Thanks,
> > > > Jack
> > > >
> > > > --
> > > > -- Jinyuan (Jack) Zhou
> > > >
> > >
> >
> >
> >
> > --
> > -- Jinyuan (Jack) Zhou
> >
>
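
For reference, here is a rough sketch of the tool Jinyuan describes: a
MapReduce job that scans the source table, recalculates each value, and
writes region-aligned HFiles for LoadIncrementalHFiles. It assumes HBase
1.x-era APIs (TableMapReduceUtil, HFileOutputFormat2); the table name,
output path, and recalculate() helper are placeholders, so treat it as a
starting point rather than a tested implementation:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableToHFiles {

  // Reads each row of the source table, recalculates its values, and emits
  // Puts; HFileOutputFormat2's reducer sorts them into HFiles.
  static class RecalcMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (Cell cell : result.rawCells()) {
        // recalculate() is hypothetical -- stands in for the daily recomputation
        byte[] newValue = recalculate(CellUtil.cloneValue(cell));
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            newValue);
      }
      context.write(row, put);
    }

    private static byte[] recalculate(byte[] oldValue) {
      return oldValue; // placeholder for the real recalculation
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "table-to-hfiles");
    job.setJarByClass(TableToHFiles.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner batches for MR throughput
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    TableMapReduceUtil.initTableMapperJob("mytable", scan, RecalcMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    TableName name = TableName.valueOf("mytable");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name)) {
      // Sets the reducer, total-order partitioner, and output format so the
      // HFiles line up with the target table's current region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }

    FileOutputFormat.setOutputPath(job, new Path("/tmp/recalc-out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

One caveat with the truncate-then-load plan: the HBase shell's truncate
recreates the table with a single region, so the region boundaries baked
into the HFiles at job setup may no longer match. LoadIncrementalHFiles
will split the files to cope, but it is faster to recreate the table with
the same split points before loading.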