Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> unable to partition the table


Copy link to this message
-
Re: unable to partition the table
sorry for the spam. but one correction.  json_pp appears to be part of the
perl core so if you have a modern perl you may already have it installed.
On Mon, Jul 1, 2013 at 1:31 PM, Stephen Sprague <[EMAIL PROTECTED]> wrote:

> ok. so i just learned of a perl script called "json_pp"  (for json
> pretty_prrint) that is a binary included in the distro for the perl JSON
> module.  You gotta figure there are analogous tools in other languages as
> well but given this is a binary it doesn't matter much what language its
> written in - it just works.
>
> so doing this:  $ hive -e 'select <your_json_column> from <your_table>' |
> json_pp
>
> you'd actually be able to see what your json looks like in a human
> readable fashion.
>
> get the binary (and the module) here: https://metacpan.org/module/JSON
>
>
> On Wed, Jun 26, 2013 at 4:38 PM, Sunita Arvind <[EMAIL PROTECTED]>wrote:
>
>> Ok. Thanks Stephen. I will try that out.
>> Will update the group if I am able to get this to work. For now, I will
>> continue with non-partitioned table.
>>
>> regards
>> Sunita
>>
>>
>> On Wed, Jun 26, 2013 at 7:11 PM, Stephen Sprague <[EMAIL PROTECTED]>wrote:
>>
>>> it would appear to be that you may partition only by non-nested
>>> columns.  I would recommend transforming your original dataset into one
>>> where the first column is YYYYMM and the rest is your json object.  During
>>> this transformation you may also wish to make further optimizations as well
>>> since you'll be scanning every record.
>>>
>>> as always my 2 cents only.
>>>
>>>
>>> On Wed, Jun 26, 2013 at 3:47 PM, Sunita Arvind <[EMAIL PROTECTED]>wrote:
>>>
>>>> Hi,
>>>>
>>>> I am unable to create a partitioned table.
>>>> The error I get is:
>>>> FAILED: ParseException line 37:16 mismatched input
>>>> '"jobs.values.postingDate.year"' expecting Identifier near '(' in column
>>>> specification
>>>>
>>>> I tried referring to the columns in various ways,
>>>> S.jobs.values.postingDate.year, with quotes, without quotes, get the same
>>>> error. Also tried creating a partition by year alone. Still get the same
>>>> error.
>>>>
>>>> Here is the create table statement:
>>>>
>>>> create external table linkedin_JobSearch (
>>>> jobs STRUCT<
>>>> values : ARRAY<STRUCT<
>>>> company : STRUCT<
>>>> id : STRING,
>>>> name : STRING>,
>>>> postingDate : STRUCT<
>>>> day : STRING>,
>>>> descriptionSnippet : STRING,
>>>> expirationDate : STRUCT<
>>>> ......
>>>> .......
>>>> locationDescription : STRING>>>
>>>> )
>>>> PARTITIONED BY ("jobs.values.postingDate.year" STRING,
>>>> "jobs.values.postingDate.month" STRING)
>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>> WITH SERDEPROPERTIES (
>>>> "company"="$.jobs.values.company.name",
>>>>  "position"="$.jobs.values.position.title",
>>>> "customerJobCode"="$.jobs.values.customerJobCode",
>>>> "locationDescription"="$.jobs.values.locationDescription",
>>>> "jobPoster"="$.jobs.values.jobposter.headline"
>>>> )
>>>> LOCATION '/user/sunita/Linkedin/JobSearch';
>>>>
>>>> I need to be able to partition this information. Please help.
>>>>
>>>> regards
>>>> Sunita
>>>>
>>>
>>>
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB