Hive >> mail # dev >> Question about the org.apache.hadoop.hive.contrib.serde2.RegexSerDe


Question about the org.apache.hadoop.hive.contrib.serde2.RegexSerDe

Hi,
I have a question about the behavior of the class org.apache.hadoop.hive.contrib.serde2.RegexSerDe. Here is the example I tested using the Cloudera hive-0.7.1-cdh3u3 release. The class did NOT do what I expected; does anyone know the reason?
user:~/tmp> more Test.java
import java.io.*;
import java.text.*;

class Test {
    public static void main (String[] argv) throws Exception
    {
        String line = "aaa,\"bbb\",\"cc,c\"";
        String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        int i = 1;
        for(String t : tokens) {
            System.out.println(i + "> "+t);
            i++;
        }
    }
}
:~/tmp> java Test
1> aaa
2> "bbb"
3> "cc,c"
As you can see, the Java regular expression ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" does what I want when passed to String.split: it parses the string aaa,"bbb","cc,c" into 3 tokens: (aaa), ("bbb"), and ("cc,c"). So the regular expression works fine as a delimiter.
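One caveat about this test: it only exercises the pattern as a delimiter for String.split. Splitting on a pattern and matching a whole line against a pattern are different operations in java.util.regex. The sketch below compares the two on the same input. It assumes (I have not confirmed this against the source for this release) that RegexSerDe applies input.regex to the entire line and fills each column from a capture group. The class name RegexCheck and the FULL_LINE pattern are hypothetical, made up only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCheck {
    // The delimiter-style pattern from the test above: a comma that is
    // followed by an even number of double quotes up to the end of the line.
    static final String DELIM = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical whole-line pattern with one capture group per column.
    // It is written for this sample line only, not for CSV in general.
    static final String FULL_LINE = "([^,]*),(\"[^\"]*\"),(\"[^\"]*\")";

    // Uses the pattern as a delimiter, as Test.java does.
    static String[] splitTokens(String line) {
        return line.split(DELIM);
    }

    // Returns true only if the pattern matches the entire line.
    static boolean matchesWholeLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    // Returns the capture groups of a whole-line match, or null if the
    // line does not match the pattern.
    static String[] groups(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "aaa,\"bbb\",\"cc,c\"";
        System.out.println(splitTokens(line).length);               // 3
        System.out.println(matchesWholeLine(DELIM, line));          // false
        System.out.println(Arrays.toString(groups(FULL_LINE, line))); // [aaa, "bbb", "cc,c"]
    }
}
```

So the same pattern that splits the line into 3 tokens does not match the line as a whole, because the line does not start with a comma. If the assumption about whole-line matching holds, that distinction would matter for how input.regex has to be written.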
Now in Hive:
:~> more test.txt
aaa,"bbb","cc,c"
:~> hive
Hive history file=/tmp/user/hive_job_log_user_201204031242_591028210.txt
hive> create table test(
    >  c1 string,
    >  c2 string,
    >  c3 string
    > )
    > row format
    > SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES (
    > "input.regex" = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
    > )
    > STORED AS TEXTFILE;
OK
Time taken: 0.401 seconds
hive> load data local inpath 'test.txt' overwrite into table test;
Copying data from file:/home/user/test.txt
Copying file: file:/home/user/test.txt
Loading data to table dev.test
Deleted hdfs://host/user/hive/warehouse/dev.db/test
OK
Time taken: 0.282 seconds
hive> select * from test;
OK
NULL    NULL    NULL
When I query this table, I don't get what I expected. I expected the output to be the 3 strings, like this ----->        aaa        "bbb"       "cc,c"
Why does the output give me 3 NULLs instead?
Thanks for your help.

     