-
Loading json files into hive table is giving NULL as output(data is in s3 bucket)
Chunky Gupta 2013-02-18, 12:43
Hi,

I have data in an S3 bucket; it is in JSON format and is a zip file. I have added this jar file in the Hive console:

http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q

I tried the following steps to create the table and load the data:

1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde" WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION 's3://BUCKET_NAME/test_data/'

I also tried this:

CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde" WITH SERDEPROPERTIES ( "uname"="_u" ) LOCATION 's3://BUCKET_NAME/test_data/'

2. alter table table_test add partition (dt='13Feb2012') location 's3n://BUCKET_NAME/test_data/13Feb2012';

The JSON file looks like this:

{"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
{"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
{"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}

When I query:

select uname from table_test;

Output:

NULL 13Feb2012
NULL 13Feb2012
NULL 13Feb2012

Please help me and let me know how to add JSON data to a table.

Thanks,
Chunky.
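One way to rule out the data itself as the cause of the NULLs is to parse the sample records outside Hive and look up the key that the "$._u" mapping points at. The following is only a rough sketch in Python: the `lookup` and `extract_column` helpers are hypothetical names made up for illustration, they handle only top-level paths of the form `$.key`, and they are not how the SerDe is implemented.

```python
import json

# Sample records copied from the message above (one JSON object per line).
SAMPLE = """\
{"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
{"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
{"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
"""


def lookup(record, path):
    """Resolve a top-level path such as '$._u'; return None if the key is absent.

    Hypothetical helper for illustration only, not the SerDe's own logic.
    """
    if not path.startswith("$."):
        raise ValueError("expected a path like '$.key'")
    return record.get(path[2:])


def extract_column(text, path):
    """Parse every non-empty line as JSON and collect the value found at `path`."""
    return [
        lookup(json.loads(line), path)
        for line in text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    print(extract_column(SAMPLE, "$._u"))
```

If a check like this returns the expected names, the records are well-formed line-delimited JSON and the key exists, which points the investigation at the SerDe or the table definition rather than at the files.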
-
Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)
Dean Wampler 2013-02-18, 12:53
The "uname"="$._u" mapping is the correct form. We also hacked on this SerDe at Think Big Analytics. I don't know if you'll see an improvement though.

https://github.com/thinkbiganalytics/hive-json-serde

I wonder if there's a problem handling the leading underscore?

Also, I know it's just an example, but in case it was taken from a real situation, the dates in your example are for January.

dean

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
-
Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)
Chunky Gupta 2013-02-18, 15:17
Hi Dean,

I tried removing the underscore too and got the same output, which means the problem is not the underscore. Yes, it was an example.

The actual JSON file looks like this:

{"colnamec":"ColNametest","colnamets":"2013-01-14","colnameip":"10.10.10.10","colnameid":"10","colnameid2":"100","colnamep":0,"colnamecp":0,"colnamep":1,"colnameed":"31509","colnamesw":0,"colnamesu2":3,"colnameqq":"0","colnameppaa":0,"colnameqwe1":0,"colnamerty2":0,"colnameiop":"1000","colnamebnm":"23425253RFDSE","colnamefgh":2,"colnameagl":"","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui":["10","10","10","10"]}

I tried extracting only one column, as I mentioned in my last mail.

Some values are not in double quotes, some are null, and some keys have multiple values. Dean, is this JSON file correct for Hive to handle?

Thanks,
Chunky.
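On the question of whether the record is valid JSON: a quick check outside Hive can at least show what a standard parser makes of it. The sketch below uses Python's `json` module on an abridged copy of the record from the message above (`RECORD` keeps only a few of the original keys, and `duplicate_keys` and `value_types` are hypothetical helper names). It shows that unquoted numbers and arrays parse fine, and that the key `colnamep` appears twice in the same object. How a particular SerDe treats a repeated key is not something this thread establishes.

```python
import json

# Abridged copy of the record from the message above; only a few keys are kept.
RECORD = (
    '{"colnamec":"ColNametest","colnamets":"2013-01-14",'
    '"colnamep":0,"colnamecp":0,"colnamep":1,"colnameagl":"",'
    '"colnameyhgb":["1234","12345","2345","56789"]}'
)


def duplicate_keys(line):
    """Return the sorted keys that occur more than once within a single JSON object."""
    dupes = set()

    def hook(pairs):
        # `pairs` is the list of (key, value) tuples for one object, in file order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                dupes.add(key)
            seen.add(key)
        return dict(pairs)

    json.loads(line, object_pairs_hook=hook)
    return sorted(dupes)


def value_types(line):
    """Map each top-level key to the Python type name of its parsed value."""
    return {key: type(value).__name__ for key, value in json.loads(line).items()}


if __name__ == "__main__":
    print(duplicate_keys(RECORD))
    print(value_types(RECORD))
```

Python's parser accepts the record and keeps the last value for the repeated key; other parsers may behave differently, so removing the duplicate at the source is the safer option.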
-
Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)
Chunky Gupta 2013-02-18, 15:31
Hi Dean,

I was using *hive-json-serde-0.2.jar* earlier. Now I tried *hive-json-serde-0.3.jar* as you suggested and it is working fine; I am getting the output as expected.

Can you please tell me what code change from 0.2 to 0.3 could have solved this problem?

Thanks,
Chunky.
-
Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)
Dean Wampler 2013-02-18, 17:01
I don't know what changed. I didn't work on this SerDe myself. I'm glad it's working now for you, though!

dean