Hive, mail # user - Select statements return null


Re: Select statements return null
Matouk IFTISSEN 2013-08-02, 08:18
Hello Sunita,
I looked into your case today, and if I have time I will test it. I just
took a look at the json-serde.jar source code of the class JSONSerDe and
found (I'm not a Java developer ;) ) something that I suspect could produce
the wrong results, given your data:

// Get a list of the table's column names.
String colNamesStr = tbl.getProperty(serdeConstants.LIST_COLUMNS);
colNames = Arrays.asList(colNamesStr.split(","));

Note the comma "," here: your field values also contain commas (in addition
to your column separators), which I suspect could confuse this kind of
splitting.
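
To illustrate (a minimal, self-contained sketch; the class name and sample
values are made up):

import java.util.Arrays;

public class CommaSplitDemo {
    public static void main(String[] args) {
        // Column names never contain commas, so this split is safe.
        String colNamesStr = "profileid,fetch_year,fetch_month";
        System.out.println(Arrays.asList(colNamesStr.split(",")).size()); // 3

        // A value with an embedded comma breaks a naive split: 4 fields
        // instead of 3, shifting values into the wrong columns.
        String row = "42,\"Doe, John\",2013";
        System.out.println(Arrays.asList(row.split(",")).size()); // 4
    }
}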

What I suggest is to start debugging step by step, like this:
First, keep just one or two columns in your entity structure (the struct in
your table definition) and adjust your data accordingly, so that your table
can parse it (10 records with one or two attributes).

Then test select *; if that does not work, run select count(1) and look at
the JobTracker log to see what happened.
If it works, that is a good sign; then add the rest of your attributes
progressively ('au fur et à mesure', as we say in French ;) ) and test them.
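
If you want to script those two probes, something along these lines should
work through the HiveServer2 JDBC driver (a rough sketch; the connection URL
and table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveProbe {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = conn.createStatement()) {

            // select * with a limit is a plain fetch (no MapReduce),
            // so it exercises only the SerDe's parsing.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT * FROM my_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }

            // count(1) launches a MapReduce job; if it fails, the
            // JobTracker log carries the full stack trace.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(1) FROM my_table")) {
                if (rs.next()) {
                    System.out.println("rows = " + rs.getLong(1));
                }
            }
        }
    }
}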
Hope this can help you.
Matouk

2013/8/1 Sunita Arvind <[EMAIL PROTECTED]>

> Thanks for your help Matouk,
> I am using a JSON serde -
> http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
> (mentioned on this page - https://github.com/cloudera/cdh-twitter-example)
>
> Attached is the table definition. I have tried with the fetch_xx fields in
> the input file and table, and also without them. The logs are usually
> useful when there is a parsing error; when there are no errors, the logs
> don't show me anything. Or am I missing something here?
>
> I am also attaching 2 output samples. One of them (xxx20130729_0) has the
> profileid, fetch_year, fetch_month, fetch_day and fetch_time fields added;
> this is the version I need to get working. For debugging purposes, I also
> tried without these fields (xx09July2013_20), loading just the response
> JSON without manipulations; for that run I used a create table statement
> without the corresponding fields in the definition as well. The results
> are not consistent with this, either.
>
> Is there a way (a debug flag) to make the Jackson parser emit the current
> token?
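>
> (A standalone probe may be easier than a flag: replay one input line
> through Jackson's streaming API and print every token it emits. A rough
> sketch; the class name and sample line are made up:)
>
> import org.codehaus.jackson.JsonFactory;
> import org.codehaus.jackson.JsonParser;
> import org.codehaus.jackson.JsonToken;
>
> public class TokenDump {
>     public static void main(String[] args) throws Exception {
>         String line = "{\"profileid\": 42, \"fetch_year\": 2013}";
>         JsonParser p = new JsonFactory().createJsonParser(line);
>         // Walk the token stream the way the SerDe would, printing each
>         // token type, the field it belongs to, and its text.
>         for (JsonToken t = p.nextToken(); t != null; t = p.nextToken()) {
>             System.out.println(t + " name=" + p.getCurrentName()
>                     + " text=" + p.getText());
>         }
>     }
> }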
>
> Sunita
>
> On Wed, Jul 31, 2013 at 4:06 AM, Matouk IFTISSEN <
> [EMAIL PROTECTED]> wrote:
>
>> Hello Sunita,
>>
>> If you are using JSON, try adding the jar 'hive-json-serde.jar' before you
>> load your data into the final table. Also, to debug, try declaring your
>> date attributes as strings first (in case they are the cause).
>>
>> I don't know whether you are using an external table with regular
>> expressions (regexp) to parse your data; if so, can you send us the table
>> definition and the structure of a row from your data?
>> The last thing I can suggest is to run a MapReduce operation over the
>> table (select count(1) from your_table) and then check the JobTracker log
>> to debug the issue.
>>
>> hope this can help you ;)
>>
>>
>>
>>
>> 2013/7/30 Sunita Arvind <[EMAIL PROTECTED]>
>>
>>> Hi,
>>>
>>> I have written a script which generates JSON files, loads them into a
>>> dictionary, adds a few attributes, and uploads the modified files to HDFS.
>>> After the files are generated, if I perform a select * from ...; on the
>>> table which points to this location, I get "null, null, ..." as the
>>> result. I also tried without the added attributes and it did not make a
>>> difference. I strongly suspect the data.
>>> Currently I am using strip() to eliminate trailing and leading whitespace
>>> and newlines. I am wondering if embedded "\n" characters, that is, JSON
>>> string values containing "\n", cause such issues.
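>>>
>>> (A quick way to see why a raw newline inside a value would matter: Hive's
>>> text input hands the SerDe one physical line at a time, so a record is
>>> only parseable if the whole JSON object sits on a single line. A made-up
>>> sketch:)
>>>
>>> public class NewlineDemo {
>>>     public static void main(String[] args) {
>>>         // Escaped newline (backslash + n in the JSON text): one physical line.
>>>         String escaped = "{\"summary\":\"line1\\nline2\"}";
>>>         // Raw newline embedded in the value: the record spans two lines.
>>>         String raw = "{\"summary\":\"line1\nline2\"}";
>>>
>>>         System.out.println(escaped.split("\n").length); // 1 -> one complete record
>>>         System.out.println(raw.split("\n").length);     // 2 -> two fragments, each
>>>                                                         //      unparseable -> NULL row
>>>     }
>>> }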
>>> There are no parsing errors, so I am not able to debug this issue. Are
>>> there any flags that I can set to figure out what is happening within the
>>> parser code?
>>>
>>> I set this:
>>> hive -hiveconf hive.root.logger=DEBUG,console
>>>
>>> But the output is not really useful:
>>>
>>> blocks=[LocatedBlock{BP-330966259-192.168.1.61-1351349834344:blk_-6076570611719758877_116734;