HBase >> mail # user >> Hbase import Tsv performance (slow import)


Re: Hbase import Tsv performance (slow import)
Anoop: The only thing is that some mappers crashed. So the MR framework
will run that mapper again on the same data set. Then the unique ID will be
different?

Anil: Yes, even for the same data set the unique ID will be different.
The unique ID does not depend on the data.
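
To illustrate the point about retries, here is a minimal sketch (plain Python, not HBase or MapReduce code) contrasting an ID that is independent of the data with one derived from the record's position in the input. Every name in it (`JOB_NAMESPACE`, `random_row_id`, `deterministic_row_id`, `run_map_attempt`, the file name and offsets) is hypothetical and chosen only for illustration. It is not how the custom bulk loader discussed here actually builds its IDs.

```python
import uuid

# Hypothetical fixed namespace for one bulk-load job (an assumption for this sketch).
JOB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example.invalid/bulk-load-job")


def random_row_id(input_file: str, byte_offset: int) -> str:
    """ID that ignores its arguments: a re-run of the mapper yields a new value."""
    return uuid.uuid4().hex


def deterministic_row_id(input_file: str, byte_offset: int) -> str:
    """ID derived from where the record sits in the input: stable across re-runs."""
    return uuid.uuid5(JOB_NAMESPACE, f"{input_file}:{byte_offset}").hex


def run_map_attempt(records, make_id):
    """Simulate one map attempt over (input_file, byte_offset, line) records."""
    return [(make_id(name, offset), line) for name, offset, line in records]


# Two attempts over the same (made-up) input split.
records = [("part-0.tsv", 0, "a\t1"), ("part-0.tsv", 4, "b\t2")]
first = run_map_attempt(records, deterministic_row_id)
retry = run_map_attempt(records, deterministic_row_id)
print(first == retry)  # deterministic IDs match across attempts
```

With `random_row_id` the two attempts produce different IDs for the same rows, which is the behaviour described above. Whether that matters depends on whether anything outside the job needs to recompute the same ID later.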

Thanks,
Anil Gupta

On Tue, Oct 23, 2012 at 11:07 PM, Anoop John <[EMAIL PROTECTED]> wrote:

> > Is there a way that I can explicitly turn on WAL for bulk loading?
> No.
> How do you generate the unique ID? Remember that the initial steps won't
> need the HBase cluster at all. MR generates the HFiles and the o/p will be
> in files only. Mappers also write their o/p to files. The only thing is that
> some mappers crashed. So the MR framework will run that mapper again on the
> same data set. Then the unique ID will be different? I think you need not
> worry about data loss from the HBase side, so WAL is not required.
>
> -Anoop-
>
>
>
>
> On Wed, Oct 24, 2012 at 10:58 AM, anil gupta <[EMAIL PROTECTED]>
> wrote:
>
> > That's a very interesting fact. You made it clear, but my custom Bulk
> > Loader generates a unique ID for every row in the map phase. So, not all
> > my data is in CSV or text. Is there a way that I can explicitly turn on
> > WAL for bulk loading?
> >
> > On Tue, Oct 23, 2012 at 10:14 PM, Anoop John <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi Anil
> > > In case of bulk loading, data is not put into HBase one by one. The MR
> > > job will create o/p in HFile form: it will create the KVs and write
> > > them to a file in the order an HFile expects. Then the file is loaded
> > > into HBase at the end. Only for this final step will the HBase RS be
> > > used, so there is no point in WAL there. Am I making it clear for you?
> > > The data is already present as raw data in some txt or csv file :)
> > >
> > > -Anoop-
> > >
> > > On Wed, Oct 24, 2012 at 10:41 AM, Anoop John <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hi Anil
> > > >
> > > >
> > > >
> > > > On Wed, Oct 24, 2012 at 10:39 AM, anil gupta <[EMAIL PROTECTED]
> > > >wrote:
> > > >
> > > >> Hi Anoop,
> > > >>
> > > > >> As per your last email, did you mean that WAL is not used while
> > > > >> using the HBase Bulk Loader? If yes, then how do we ensure "no data
> > > > >> loss" in case of RegionServer failure?
> > > >>
> > > >> Thanks,
> > > >> Anil Gupta
> > > >>
> > > >> On Tue, Oct 23, 2012 at 9:55 PM, ramkrishna vasudevan <
> > > >> [EMAIL PROTECTED]> wrote:
> > > >>
> > > > >> > As Kevin suggested, we can make use of bulk load that goes
> > > > >> > through WAL and Memstore. Or the second option will be to use the
> > > > >> > o/p of mappers to create HFiles directly.
> > > >> >
> > > >> > Regards
> > > >> > Ram
> > > >> >
> > > >> > On Wed, Oct 24, 2012 at 8:59 AM, Anoop John <
> [EMAIL PROTECTED]>
> > > >> wrote:
> > > >> >
> > > > >> > > Hi
> > > > >> > >     Using the ImportTSV tool you are trying to bulk load your
> > > > >> > > data. Can you see and tell how many mappers and reducers there
> > > > >> > > were? Out of the total time, what is the time taken by the
> > > > >> > > mapper phase and by the reducer phase? Seems like an MR-related
> > > > >> > > issue (maybe some conf issue). In this bulk load case most of
> > > > >> > > the work is done by the MR job. It will read the raw data,
> > > > >> > > convert it into Puts and write to HFiles. The MR o/p is the
> > > > >> > > HFiles themselves. The next part in ImportTSV will just put the
> > > > >> > > HFiles under the table region store. There won't be WAL usage
> > > > >> > > in this bulk load.
> > > >> > >
> > > >> > > -Anoop-
> > > >> > >
> > > >> > > On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <
> > > >> > > [EMAIL PROTECTED]> wrote:
> > > >> > >
> > > >> > > > Hi everyone
> > > >> > > >
> > > >> > > > I'm starting with hbase and testing for our needs. I have set
> > up a
> > > >> > hadoop
> > > >> > > > cluster of Three machines and A Hbase cluster atop on the same

Thanks & Regards,
Anil Gupta