

Re: Select statements return null
Hello Sunita,

If you are using JSON, try adding the jar 'hive-json-serde.jar' before you
load your data into the final table. Also try defining your date
attributes as STRING first, to check whether that is the cause.
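
For example, something along these lines (just a sketch: the column names
are made up and the SerDe class name depends on which JSON SerDe jar you
actually have, so check yours):

  ADD JAR hive-json-serde.jar;

  -- Keep the date-like fields as STRING first so parsing problems show
  -- up as readable text instead of NULLs.
  CREATE EXTERNAL TABLE my_json_table (
    title     STRING,
    posted_on STRING,
    year      STRING,
    month     STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
  LOCATION '/user/sunita/json_data';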

I don't know if you are using an external table with regular expressions
(regexp) to parse your data; if so, can you send us the table definition
and the structure of one row from your data?
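
If it is a regexp-based table, the definition usually looks something like
this (only an illustration; the pattern and column names are invented):

  -- With RegexSerDe, a line that the regex does not match comes back as
  -- all NULLs, which would explain output like yours.
  CREATE EXTERNAL TABLE my_regex_table (
    ts   STRING,
    body STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^\\t]*)\\t(.*)"
  )
  LOCATION '/user/sunita/raw_data';
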
The last thing I can suggest is to run a MapReduce operation over the
table (select count(1) from your_table) and then look at the JobTracker
logs to debug the issue.
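
A plain select * is normally answered by a simple fetch with no MapReduce
job at all, so something like the following is needed to get task attempts
whose logs you can open from the JobTracker web UI:

  -- Any aggregate works; the result itself does not matter here.
  SELECT COUNT(1) FROM your_table;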

hope this can help you ;)
2013/7/30 Sunita Arvind <[EMAIL PROTECTED]>

> Hi,
>
> I have written a script which generates JSON files, loads them into a
> dictionary, adds a few attributes, and uploads the modified files to HDFS.
> After the files are generated, if I perform a select * from..; on the table
> which points to this location, I get "null, null...." as the result. I also
> tried without the added attributes and it did not make a difference. I
> strongly suspect the data.
> Currently I am using strip() to eliminate leading and trailing whitespace
> and newlines. I am wondering whether embedded "\n", that is, JSON string
> objects containing "\n" in the value, causes such issues.
> There are no parsing errors, so I am not able to debug this issue. Are
> there any flags that I can set to figure out what is happening within the
> parser code?
>
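
One quick way to check that suspicion from the Hive shell (a sketch; the
table name is invented and the LOCATION has to match wherever the script
writes the files):

  -- Hive's text input splits records on newlines, so a literal newline
  -- inside a JSON value breaks one object across two lines, and neither
  -- half parses as JSON. Point a one-column table at the same directory
  -- and eyeball the raw lines to see whether each line is a complete
  -- JSON object.
  CREATE EXTERNAL TABLE raw_lines (line STRING)
  LOCATION '/user/sunita/json_data';

  SELECT line FROM raw_lines LIMIT 20;
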
> I set this:
> hive -hiveconf hive.root.logger=DEBUG,console
>
> But the output is not really useful:
>
> blocks=[LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
> getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
> 192.168.1.66:50010, 192.168.1.63:50010]}]
>
> lastLocatedBlock=LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;
> getBlockSize()=20635; corrupt=false; offset=0; locs=[192.168.1.61:50010,
> 192.168.1.66:50010, 192.168.1.63:50010]}
>   isLastBlockComplete=true}
> 13/07/30 11:49:41 DEBUG hdfs.DFSClient: Connecting to datanode
> 192.168.1.61:50010
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> 13/07/30 11:49:41 INFO exec.
>
> Also, the attributes I am adding are the current year, month, day and
> time, so they are not null for any record. I even moved existing files
> which did not have these fields set, so that there are no records with
> these fields as null. However, I don't think this is an issue, as the
> advantage of the JSON/Hive JSON serde is that it allows the object
> structure to be dynamic. Right?
>
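As far as I know, the common Hive JSON SerDes simply return NULL for keys
that are missing from a record rather than rejecting the whole row, so one
way to narrow this down is to select a mix of original and newly added
columns (made-up column names below):

  -- If only the new columns are NULL, the field names/schema are the
  -- likely problem; if every column is NULL, the SerDe is failing to
  -- parse whole lines.
  SELECT some_original_column, year, month, day FROM your_table LIMIT 10;
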
> Any suggestion regarding debugging would be very helpful.
>
> thanks
> Sunita
>