Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> unable to partition the table


Copy link to this message
-
Re: unable to partition the table
sorry for the spam. but one correction.  json_pp appears to be part of the
perl core so if you have a modern perl you may already have it installed.
On Mon, Jul 1, 2013 at 1:31 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:

> ok. so i just learned of a perl script called "json_pp"  (for json
> pretty_prrint) that is a binary included in the distro for the perl JSON
> module.  You gotta figure there are analogous tools in other languages as
> well but given this is a binary it doesn't matter much what language its
> written in - it just works.
>
> so doing this:  $ hive -e 'select <your_json_column> from <your_table>' |
> json_pp
>
> you'd actually be able to see what your json looks like in a human
> readable fashion.
>
> get the binary (and the module) here: https://metacpan.org/module/JSON
>
>
> On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind <[EMAIL PROTECTED]>wrote:
>
>> Ok. Thanks Stephen. I will try that out.
>> Will update the group if I am able to get this to work. For now, I will
>> continue with non-partitioned table.
>>
>> regards
>> Sunita
>>
>>
>> On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague <[EMAIL PROTECTED]>wrote:
>>
>>> it would appear to be that you may partition only by non-nested
>>> columns.  I would recommend transforming your original dataset into one
>>> where the first column is YYYYMM and the rest is your json object.  During
>>> this transformation you may also wish to make further optimizations as well
>>> since you'll be scanning every record.
>>>
>>> as always my 2 cents only.
>>>
>>>
>>> On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind <[EMAIL PROTECTED]>wrote:
>>>
>>>> Hi,
>>>>
>>>> I am unable to create a partitioned table.
>>>> The error I get is:
>>>> FAILED: ParseException line 37:16 mismatched input
>>>> '"jobs.values.postingDate.year"' expecting Identifier near '(' in column
>>>> specification
>>>>
>>>> I tried referring to the columns in various ways,
>>>> S.jobs.values.postingDate.year, with quotes, without quotes, get the same
>>>> error. Also tried creating a partition by year alone. Still get the same
>>>> error.
>>>>
>>>> Here is the create table statement:
>>>>
>>>> create external table linkedin_JobSearch (
>>>> jobs STRUCT<
>>>> values : ARRAY<STRUCT<
>>>> company : STRUCT<
>>>> id : STRING,
>>>> name : STRING>,
>>>> postingDate : STRUCT<
>>>> day : STRING>,
>>>> descriptionSnippet : STRING,
>>>> expirationDate : STRUCT<
>>>> ......
>>>> .......
>>>> locationDescription : STRING>>>
>>>> )
>>>> PARTITIONED BY ("jobs.values.postingDate.year" STRING,
>>>> "jobs.values.postingDate.month" STRING)
>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>> WITH SERDEPROPERTIES (
>>>> "company"="$.jobs.values.company.name",
>>>>  "position"="$.jobs.values.position.title",
>>>> "customerJobCode"="$.jobs.values.customerJobCode",
>>>> "locationDescription"="$.jobs.values.locationDescription",
>>>> "jobPoster"="$.jobs.values.jobposter.headline"
>>>> )
>>>> LOCATION '/user/sunita/Linkedin/JobSearch';
>>>>
>>>> I need to be able to partition this information. Please help.
>>>>
>>>> regards
>>>> Sunita
>>>>
>>>
>>>
>>
>