HBase, mail # user - bulk load skipping tsv files

Re: bulk load skipping tsv files
Ted Yu 2013-05-17, 17:05
Jinyuan:

bq. No new data is needed; only some values will be changed by recalculation.

Have you considered using a coprocessor to fulfill the above task?
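
For illustration only, here is a minimal sketch of one way a coprocessor could cover this kind of recalculation: a RegionObserver that recomputes a derived column whenever the source column is written, so no separate daily rewrite pass is needed. The class name, the column family "cf", the qualifiers "raw"/"derived", and the doubling "recalculation" are all placeholders, not anything from this thread (untested, 0.94-era API assumed):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class RecalcObserver extends BaseRegionObserver {
  // Hypothetical column names; adjust to the real schema.
  private static final byte[] CF = Bytes.toBytes("cf");
  private static final byte[] RAW = Bytes.toBytes("raw");
  private static final byte[] DERIVED = Bytes.toBytes("derived");

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // When the raw value is written, recompute the derived value in the same Put.
    List<KeyValue> kvs = put.get(CF, RAW);
    if (kvs.isEmpty()) {
      return;
    }
    long raw = Bytes.toLong(kvs.get(0).getValue());
    long recalculated = raw * 2;  // placeholder for the real recalculation
    put.add(CF, DERIVED, Bytes.toBytes(recalculated));
  }
}

This only helps if the new value can be derived from data available at write time; a full daily recalculation over every existing row would still need a scan, either client-side or via an endpoint coprocessor.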

Cheers

On Fri, May 17, 2013 at 8:57 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:

> If I understood your use case correctly: if you don't need to maintain
> older versions of the data, why don't you set the 'max versions' parameter
> for your table to 1? I believe the growth in data even in the case of
> updates is due to that (?). Have you tried that?
>
> Regards,
> Shahab
>
>
> On Fri, May 17, 2013 at 11:49 AM, Jinyuan Zhou <[EMAIL PROTECTED]> wrote:
>
> > Actually, I wanted to update each row of a table each day. No new data is
> > needed; only some values will be changed by recalculation. It looks like
> > every time I do this, the data in the table doubles, even though it is an
> > update. I believe even an update results in new HFiles, and the cluster is
> > then very busy splitting regions and related work. It needs about an hour
> > to update only about 250 million rows. I only need one version. So I think
> > it might be faster if I just store the calculated results in HFiles, then
> > truncate the original table, then bulk load the HFiles into the empty
> > table.
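> >
> > A rough sketch of what such a job might look like (not from this thread;
> > untested, written against the 0.94-era API, with the table name "mytable",
> > family "cf", qualifier "v", output path /tmp/hfiles and the recalculation
> > itself all placeholders): scan the existing table with a TableMapper, emit
> > recalculated KeyValues, let HFileOutputFormat lay out HFiles that match the
> > current region boundaries, then bulk load them after truncating the table.
> >
> > import java.io.IOException;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.KeyValue;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
> > import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.hbase.mapreduce.TableMapper;
> > import org.apache.hadoop.hbase.util.Bytes;
> > import org.apache.hadoop.mapreduce.Job;
> > import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> > public class RecalcBulkLoad {
> >
> >   // Reads each row of the source table and emits a recalculated KeyValue.
> >   static class RecalcMapper extends TableMapper<ImmutableBytesWritable, KeyValue> {
> >     @Override
> >     protected void map(ImmutableBytesWritable row, Result value, Context ctx)
> >         throws IOException, InterruptedException {
> >       byte[] old = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("v"));
> >       if (old == null) return;
> >       long recalculated = Bytes.toLong(old) + 1;   // placeholder recalculation
> >       KeyValue kv = new KeyValue(value.getRow(), Bytes.toBytes("cf"),
> >           Bytes.toBytes("v"), Bytes.toBytes(recalculated));
> >       ctx.write(row, kv);
> >     }
> >   }
> >
> >   public static void main(String[] args) throws Exception {
> >     Configuration conf = HBaseConfiguration.create();
> >     Job job = new Job(conf, "recalc-bulkload");
> >     job.setJarByClass(RecalcBulkLoad.class);
> >
> >     Scan scan = new Scan();
> >     scan.setCaching(500);
> >     scan.setCacheBlocks(false);
> >
> >     // Read directly from the existing table instead of a TSV file.
> >     TableMapReduceUtil.initTableMapperJob("mytable", scan, RecalcMapper.class,
> >         ImmutableBytesWritable.class, KeyValue.class, job);
> >
> >     // Sets up the sort/partition so HFiles match current region boundaries.
> >     HTable table = new HTable(conf, "mytable");
> >     HFileOutputFormat.configureIncrementalLoad(job, table);
> >     FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
> >
> >     if (job.waitForCompletion(true)) {
> >       // Truncate the table first if the old data should go away, then load:
> >       new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles"), table);
> >     }
> >   }
> > }
> >
> > If the table is recreated with different split points after truncation,
> > LoadIncrementalHFiles will split any HFile that spans a region boundary,
> > but pre-splitting the new table to the old boundaries avoids that extra
> > pass.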
> > Thanks,
> >
> >
> >
> > On Fri, May 17, 2013 at 7:55 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > bq. What I want is to read from some HBase table and create HFiles directly
> > >
> > > Can you describe your use case in more detail?
> > >
> > > Thanks
> > >
> > > On Fri, May 17, 2013 at 7:52 AM, Jinyuan Zhou <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > > I wonder if there is a tool similar to
> > > > org.apache.hadoop.hbase.mapreduce.ImportTsv. ImportTsv reads from a tsv
> > > > file and creates HFiles which are ready to be loaded into the
> > > > corresponding regions by another tool,
> > > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles. What I want is
> > > > to read from some HBase table and create HFiles directly. I think I
> > > > know how to write such a class by following the steps in the ImportTsv
> > > > class, but I wonder if someone already did this.
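> > > >
> > > > (For reference, the existing TSV flow being mimicked looks roughly like
> > > > this; the column spec, paths, and table name are placeholders only:
> > > >
> > > >   hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
> > > >     -Dimporttsv.columns=HBASE_ROW_KEY,cf:v \
> > > >     -Dimporttsv.bulk.output=/tmp/hfiles mytable /user/jack/input.tsv
> > > >   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable
> > > >
> > > > so the missing piece is only the HFile-producing job that scans a table
> > > > instead of parsing TSV.)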
> > > > Thanks,
> > > > Jack
> > > >
> > > > --
> > > > -- Jinyuan (Jack) Zhou
> > > >
> > >
> >
> >
> >
> > --
> > -- Jinyuan (Jack) Zhou
> >
>