NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
Yeah, after digging further into the code: line 374 in
GenericOptionsParser.java, "commandLine = parser.parse(opts, args, true);",
is the culprit. Nice find, Shrijeet. That answers my question. :)

Stack:
Could you please tell me the meaning of "IIRC"? Updating the documentation is
good, but given the behavior of parse(), any other -D option that follows
the table name will be ignored as well, not just the separator.
Duplicating the GOP functionality does not seem to be a good idea. Maybe,
instead of invoking "parser.parse(opts, args, true);", we could somehow
invoke "parser.parse(opts, args, false);", and then all would be good. I
haven't looked at the API to see whether that is possible. This is just food
for thought.
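To make the effect of that third argument concrete, here is a toy model in plain Java of a parser that knows only the -Dkey=value form and honours a stopAtNonOption flag. It is a simplification written for illustration, not the commons-cli or Hadoop code; the class name, the table name "mytable" and the input path are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT the commons-cli implementation) of how a parser that only
 * understands -Dkey=value treats arguments, depending on stopAtNonOption.
 */
public class StopAtNonOptionDemo {

    /** -D properties that were applied, plus the arguments left for the application. */
    static final class Result {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remainingArgs = new ArrayList<>();
    }

    static Result parse(String[] args, boolean stopAtNonOption) {
        Result result = new Result();
        boolean eatTheRest = false;
        for (String arg : args) {
            if (eatTheRest) {
                // After the first non-option, every token is passed through unparsed.
                result.remainingArgs.add(arg);
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                // -Dkey=value: record the property.
                int eq = arg.indexOf('=');
                result.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // Anything else (e.g. the table name) is treated as a non-option here.
                result.remainingArgs.add(arg);
                if (stopAtNonOption) {
                    eatTheRest = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] argv) {
        // Hypothetical ImportTsv-style command line with a -D after the table name.
        String[] args = {
            "-Dimporttsv.columns=HBASE_ROW_KEY,cf:a",
            "mytable",
            "-Dimporttsv.separator=,",
            "/user/anil/input"
        };
        for (boolean stop : new boolean[] {true, false}) {
            Result r = parse(args, stop);
            System.out.println("stopAtNonOption=" + stop);
            System.out.println("  properties: " + r.properties);
            System.out.println("  remaining:  " + r.remainingArgs);
        }
    }
}
```

With true, importtsv.separator stays in the leftover arguments next to the table name; with false, it lands in the properties map. One caveat worth checking before changing line 374: as far as I know, commons-cli 1.2 throws UnrecognizedOptionException for dash-prefixed tokens it does not recognize when stopAtNonOption is false, so application-specific flags passed after the generic ones might be rejected instead of passed through.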

Thanks,
Anil

On Thu, Mar 8, 2012 at 12:06 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:

> GenericOptionsParser stops parsing the arguments as soon as the first
> non-option argument is specified (refer:
> http://commons.apache.org/cli/api-1.2/org/apache/commons/cli/Parser.html#parse(org.apache.commons.cli.Options, java.lang.String[], boolean))
>
> So in this case, as soon as the parser sees the table name argument, it
> ignores all subsequent properties specified with the -D option. Note that
> it not only ignores the separator; it is also ignoring the
> importtsv.skip.bad.lines option in your run that failed.
>
> On Thu, Mar 8, 2012 at 11:27 AM, Stack <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Mar 8, 2012 at 11:14 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> > > 1. Update the HBase bulk load documentation and specify that the
> > > separator argument should come right after the program name.
> >
> > This would help.
> >
> > > 2. Fix the problem in the code itself by handling the separator argument
> > > explicitly. (Still, I am wondering why only the separator value is not
> > > being set in the jobconf automatically if it is not provided right after
> > > the program name??)
> > >
> >
> > This is probably too late, IIRC. I haven't looked at the code, but
> > GenericOptionsParser has probably already been run by the time the
> > application starts to process args. Duplicating what GOP does in the
> > application is probably not the way to go either?
> >
> > St.Ack
> >
>
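The ordering advice from the thread (every -D option ahead of the table name) can also be applied by whatever script or wrapper builds the command line, without touching GOP itself. Below is a minimal sketch of such a reordering step; DefineHoister and hoistDefines are hypothetical names, not part of Hadoop or HBase, and the argument values are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical caller-side workaround (not part of Hadoop or HBase):
 * move every -D option in front of the other arguments so that a parser
 * which stops at the first non-option still sees all of them.
 */
public class DefineHoister {

    static List<String> hoistDefines(List<String> args) {
        List<String> defines = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (arg.equals("-D") && i + 1 < args.size()) {
                // "-D key=value" written as two tokens: keep the pair together.
                defines.add(arg);
                defines.add(args.get(++i));
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                // "-Dkey=value" written as a single token.
                defines.add(arg);
            } else {
                // Table name, input path, etc. keep their relative order.
                others.add(arg);
            }
        }
        List<String> reordered = new ArrayList<>(defines);
        reordered.addAll(others);
        return reordered;
    }

    public static void main(String[] argv) {
        List<String> args = Arrays.asList(
            "mytable",
            "-Dimporttsv.separator=,",
            "-Dimporttsv.skip.bad.lines=false",
            "/user/anil/input");
        System.out.println(hoistDefines(args));
    }
}
```

This only reorders tokens and leaves the actual parsing to GOP, so it sidesteps duplicating GOP's logic; the trade-off is that any positional argument that itself happens to start with "-D" would be moved too.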

--
Thanks & Regards,
Anil Gupta