Sqoop user mailing list: Sqoop - utf-8 data load issue


varun kumar gullipalli 2013-07-16, 01:27
Jarek Jarcec Cecho 2013-07-16, 01:37
varun kumar gullipalli 2013-07-16, 01:52
Venkat Ranganathan 2013-07-16, 02:19
Jarek Jarcec Cecho 2013-07-16, 18:05
varun kumar gullipalli 2013-07-16, 23:24
Jarek Jarcec Cecho 2013-07-17, 15:36
varun kumar gullipalli 2013-07-18, 00:42
Re: Sqoop - utf-8 data load issue
Hi Varun,
 
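Can you try changing the Java code like this:
 
  // Re-decode the field's bytes as UTF-8; getBytes() uses the platform default charset.
  public void set_type(String type) {
    this.type = new String(type.getBytes(), java.nio.charset.Charset.forName("UTF-8"));
  }
  public QueryResult with_type(String type) {
    this.type = new String(type.getBytes(), java.nio.charset.Charset.forName("UTF-8"));
    return this;
  }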

 
Thanks
Sumit
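
A note on the re-decoding suggested above: whether it changes anything depends on the platform default charset, since that is what the no-argument byte conversion uses. The sketch below makes that charset explicit to show both outcomes. It is only an illustration under the assumption that the stored value is UTF-8 that was misread as ISO-8859-1; the class RedecodeDemo and the method redecode are hypothetical names, not part of Sqoop or the generated QueryResult.java.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Sqoop or the generated code.
class RedecodeDemo {

    // Turn the string back into bytes using the assumed (wrong) charset,
    // then decode those bytes as UTF-8.
    static String redecode(String s, Charset assumed) {
        return new String(s.getBytes(assumed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00c3\u00a9" is what UTF-8 e-acute looks like when read as ISO-8859-1.
        String garbled = "\u00c3\u00a9";

        // With a single-byte charset the original character is recovered.
        String repaired = redecode(garbled, StandardCharsets.ISO_8859_1);
        System.out.println(repaired.equals("\u00e9"));

        // With UTF-8 as the assumed charset the round trip changes nothing.
        String unchanged = redecode(garbled, StandardCharsets.UTF_8);
        System.out.println(unchanged.equals(garbled));
    }
}
```

So on a machine whose default charset is already UTF-8, the setter change would likely be a no-op, and the explicit ISO-8859-1 variant is the one to try.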
________________________________
From: varun kumar gullipalli <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Wednesday, 17 July 2013 5:42 PM
Subject: Re: Sqoop - utf-8 data load issue

Thanks Jarcec.
The Sqoop version is 1.4.2.
 
 
I was checking the QueryResult.java file that Sqoop generates; type is the name of the column that holds multi-byte (UTF-8) data.
Does declaring type as String work for multi-byte data?
 
grep type QueryResult.java
  private String type;
  public String get_type() {
    return type;
  public void set_type(String type) {
    this.type = type;
  public QueryResult with_type(String type) {
    this.type = type;
    equal = equal && (this.type == null ? that.type == null : this.type.equals(that.type));
    this.type = JdbcWritableBridge.readString(5, __dbResults);
    JdbcWritableBridge.writeString(type, 5 + __off, 12, __dbStmt);
        this.type = null;
    this.type = Text.readString(__dataIn);
    if (null == this.type) {
    Text.writeString(__dataOut, type);
    __sb.append(FieldFormatter.escapeAndEnclose(type==null?"\\N":type, delimiters));
    if (__cur_str.equals("null")) { this.type = null; } else {
      this.type = __cur_str;
    __sqoop$field_map.put("type", this.type);
    else    if ("type".equals(__fieldName)) {
      this.type = (String) __fieldVal;
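
On the question of whether a String field can hold multi-byte data: a Java String stores UTF-16 code units, so the declaration itself should not lose anything; corruption normally happens where bytes are decoded or encoded with the wrong charset. A minimal sketch of that point follows. The class StringHoldsUnicodeDemo and the sample value are made up for illustration and have nothing to do with the actual table contents.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the sample value is invented for illustration.
class StringHoldsUnicodeDemo {

    // Encode to UTF-8 and decode back; a lossless round trip returns an equal string.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with e-acute, a space, and two CJK characters.
        String value = "caf\u00e9 \u65e5\u672c";

        System.out.println(value.length());                                // 7 UTF-16 code units
        System.out.println(value.getBytes(StandardCharsets.UTF_8).length); // 12 bytes in UTF-8
        System.out.println(roundTrip(value).equals(value));                // true
    }
}
```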

  

________________________________
From: Jarek Jarcec Cecho <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; varun kumar gullipalli <[EMAIL PROTECTED]>
Sent: Wednesday, July 17, 2013 8:36 AM
Subject: Re: Sqoop - utf-8 data load issue
Thank you Varun,
The sequence c3 83 c2 a9 indeed does not correspond to the correct character. I found a Stack Overflow entry [1] that might be relevant to your issue. I tried to reproduce this on my cluster but was not able to. Could you do a mysqldump of the table in question? If you could share it along with the Sqoop version and the exact command line, I would like to explore this a bit further.

Jarcec

Links:
1: http://stackoverflow.com/questions/8499852/xmldocument-mis-reads-utf-8-e-acute-character
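
One plausible reading of the bytes c3 83 c2 a9, in line with the e-acute case in [1], is double encoding: the UTF-8 bytes for e-acute (c3 a9) get read as ISO-8859-1 and are then written out as UTF-8 a second time. The sketch below reproduces that byte sequence. It assumes the source character was e-acute, which the thread does not confirm, and DoubleEncodingDemo, hex and doubleEncode are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Sqoop or the generated QueryResult.java.
class DoubleEncodingDemo {

    // Render bytes as lowercase hex pairs separated by spaces, similar to hexdump -C.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Simulate the suspected fault: UTF-8 bytes misread as ISO-8859-1,
    // then encoded as UTF-8 again.
    static byte[] doubleEncode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // assumed source character
        System.out.println(hex(eAcute.getBytes(StandardCharsets.UTF_8))); // c3 a9
        System.out.println(hex(doubleEncode(eAcute)));                    // c3 83 c2 a9
    }
}
```

If the hexdump of the imported file matches the second line, the conversion to ISO-8859-1 (or a similar single-byte charset) is happening somewhere between the database read and the HDFS write.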

On Tue, Jul 16, 2013 at 04:24:49PM -0700, varun kumar gullipalli wrote:
> Here is the output Jarcec...
>  
>  
>
>
> ________________________________
> From: Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; varun kumar gullipalli <[EMAIL PROTECTED]>
> Sent: Tuesday, July 16, 2013 11:05 AM
> Subject: Re: Sqoop - utf-8 data load issue
>
>
> Thank you for the additional information Varun! Would you mind doing something like the following:
>
> hadoop dfs -text THE_FILE  | hexdump -C
>
> And sharing the output? I'm trying to see the actual content of the file rather than any interpreted value.
>
> Jarcec
>
> On Mon, Jul 15, 2013 at 06:52:11PM -0700, varun kumar gullipalli wrote:
> > Hi Jarcec,
> >
> > I am validating the data by running the following command,
> >
> > hadoop fs -text <hdfs cluster>
> >
> > I think there is no issue with the shell (correct me if I am wrong) because I am connecting to the MySQL database from the same shell (command line) and can view the source data properly.
> >
> > Initially we observed that the following conf files don't have utf-8 encoding.
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > sqoop-site.xml
> > sqoop-site-template.xml
> >
> > But no luck after making the changes either.
> >
> > Thanks,
> > Varun
> >
> >
> > ________________________________
> >  From: Jarek Jarcec Cecho <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]; varun kumar gullipalli <[EMAIL PROTECTED]>
> > Sent: Monday, July 15, 2013 6:37 PM
> > Subject: Re: Sqoop - utf-8 data load issue
> > 
> >
Here is a sample command line ....
  sqoop --options-file $CONN_FILE --lines-terminated-by '\n' --verbose --query "<<QUERY>>' and  \$CONDITIONS" -m 1 --target-dir $YYYY/$MM/$DD/${TBL_NAME} --null-string '\\N' --null-non-string '\\N' >> $LOGFILE 2>&1
Jarek Jarcec Cecho 2013-07-21, 16:57
varun kumar gullipalli 2013-07-23, 04:30
Jarek Jarcec Cecho 2013-07-23, 15:30