HBase, mail # user - Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
anil gupta 2012-03-09, 16:29
Hi Lakshman,

Based on your last email, updating the documentation looks like the easy
and right approach.

Thanks,
Anil Gupta

On Fri, Mar 9, 2012 at 12:20 AM, Laxman <[EMAIL PROTECTED]> wrote:

> Hi Anil,
>
> > instead of invoking "parser.parse(opts, args, true);" if somehow we can
> > invoke "parser.parse(opts, args, false);" then all will be good. I haven't
> > looked at the API to know whether that is possible.
>
> Changing to parser.parse(opts, args, false) solves this problem.
> I think we need to consider the following before making this change.
>
> This involves a behavior change in legacy Hadoop code.
> Directly changing from true to false may cause behavioral compatibility
> issues.
>
> Also, setting it to false may not always be correct.
>
> Case #1 java
> "java -Dprop1=val1 <Class> arg1 arg2" is different from "java <Class> arg1
> arg2 -Dprop1=val1".
>
> In this case it looks like parser.parse(opts, args, true) is correct
>
>
> Case #2 linux
> "ls -l /home" is the same as "ls /home -l".
>
> In this case it looks like parser.parse(opts, args, false) is correct
>
> >> This is probably too late IIRC
> I hope Stack meant the same point here.
>
> > Could you please tell me the meaning of "IIRC"?
> IIRC - If I Recall/Remember Correctly
>
> --
> Regards,
> Laxman
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> > anil gupta
> > Sent: Friday, March 09, 2012 3:12 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Bulk loading a CSV file into HBase
> >
> > Yeah after digging further into the code: Line#374 in
> > GenericOptionsParser.java "commandLine = parser.parse(opts, args,
> > true);"
> > is the culprit. Nice find, Shrijeet. That answers my question. :)
> >
> > Stack:
> > Could you please tell me the meaning of "IIRC"? Updating the document is
> > good, but as per the behavior of parse(), any other -D option will also be
> > ignored if it follows the table name.
> > Duplicating the GOP functionality does not seem to be a good idea. Maybe
> > instead of invoking "parser.parse(opts, args, true);" if somehow we can
> > invoke "parser.parse(opts, args, false);" then all will be good. I haven't
> > looked at the API to know whether that is possible. This is just food
> > for thought.
> >
> > Thanks,
> > Anil
> >
> >
> >
> > On Thu, Mar 8, 2012 at 12:06 PM, Shrijeet Paliwal
> > <[EMAIL PROTECTED]> wrote:
> >
> > > GenericOptionsParser stops parsing the arguments as soon as the first
> > > non-option argument is encountered (refer to:
> > >
> > > http://commons.apache.org/cli/api-1.2/org/apache/commons/cli/Parser.html#parse(org.apache.commons.cli.Options, java.lang.String[], boolean))
> > >
> > > So in this case, as soon as the parser sees the table name argument, it
> > > ignores all other properties specified with -D. Note that it ignores not
> > > only the separator but also the importtsv.skip.bad.lines option in your
> > > run, which failed.
> > >
> > >
> > >
> > > On Thu, Mar 8, 2012 at 11:27 AM, Stack <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Thu, Mar 8, 2012 at 11:14 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> > > > > 1. Update the HBase bulk load documentation and specify that the
> > > > > separator argument should come right after the program name.
> > > >
> > > > This would help.
> > > >
> > > > > 2. Fix the problem in the code itself by handling the separator
> > > > > argument explicitly. (Still, I am wondering why only the separator
> > > > > value is not being set in the jobconf automatically if it is not
> > > > > provided next to the program name?)
> > > > >
> > > >
> > > > This is probably too late IIRC.  I haven't looked at code but
> > > > GenericOptionsParser has probably already been run by the time the
> > > > application starts to process args.  Duplicating what GOP does in the
> > > > application is probably not the way to go either?
> > > >
> > > > St.Ack
> > > >
> > >
> >
> >
> >
Thanks & Regards,
Anil Gupta
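
For readers following the thread, here is a minimal sketch of the stop-at-non-option behavior discussed above. It is a simplified stand-in written with plain Java collections, not the real commons-cli Parser or Hadoop's GenericOptionsParser: it only recognizes the attached "-Dkey=value" form, and the class name, the table name "mytable" and the path "/input" are made up for illustration. The boolean flag plays the role of the third argument to parser.parse(opts, args, ...).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StopAtNonOptionDemo {

    /** Holds the -D properties that were recognized and everything left over. */
    static final class Parsed {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /**
     * Simplified model of stop-at-non-option parsing (an assumption, not the
     * commons-cli implementation): when the flag is true, the first argument
     * that is not a -Dkey=value pair ends option processing, and every later
     * argument is passed through untouched.
     */
    static Parsed parse(String[] args, boolean stopAtNonOption) {
        Parsed out = new Parsed();
        boolean stopped = false;
        for (String arg : args) {
            int eq = arg.indexOf('=');
            boolean isProp = arg.startsWith("-D") && eq > 2;
            if (!stopped && isProp) {
                out.props.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                out.remaining.add(arg);
                if (stopAtNonOption) {
                    stopped = true; // later -D arguments are no longer parsed
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical argument lists; only the order of the arguments differs.
        String[] propsFirst = {"-Dimporttsv.separator=,", "mytable", "/input"};
        String[] tableFirst = {"mytable", "-Dimporttsv.separator=,", "/input"};

        System.out.println(parse(propsFirst, true).props);  // {importtsv.separator=,}
        System.out.println(parse(tableFirst, true).props);  // {}
        System.out.println(parse(tableFirst, false).props); // {importtsv.separator=,}
    }
}
```

In this model, the second call mirrors the failing run: the table name comes first, so the separator never reaches the properties. The third call mirrors the proposed switch to "false", which picks up the property regardless of position but, as Laxman points out, would change behavior for existing callers.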