Pig, mail # user - accessing the schema within a LoadFunc


Costin Leau 2013-12-19, 14:08
Costin Leau 2013-12-19, 14:09
Cheolsoo Park 2013-12-23, 23:05
Costin Leau 2013-12-24, 09:41
Re: accessing the schema within a LoadFunc
Cheolsoo Park 2013-12-29, 04:40
As Alan said in the thread you're referring to, the user-defined schema in
the AS clause is not available within a LoadFunc. HBaseStorage is different
because its schema is passed via a constructor parameter. As far as I know,
most popular Pig storages do not require users to define a schema in the
load statement. For example, HCatLoader gets it from the Hive metastore,
AvroStorage gets it from the Avro file, etc.

But it shouldn't be hard to change this, and contributions are welcome! Feel
free to file a JIRA. Thanks!
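
To illustrate the constructor-parameter route mentioned above, here is a minimal sketch in plain Java. It assumes the script passes the schema as a string argument, e.g. USING MyStorage('name:chararray, links:(url:chararray, picture:chararray)'). The class SchemaArg and its tiny grammar are hypothetical and not part of Pig: only name:type fields and parenthesized tuples are handled (no bags, maps, or the tuple keyword). Pig ships its own schema parser, which a real loader would likely prefer. The point is only that a constructor-supplied schema keeps the nesting, so "links.url" can be told apart from the top-level "name".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): wraps a schema string that the
 * script hands to the loader as a constructor argument.
 */
final class SchemaArg {
    private final String text;
    private int pos;

    SchemaArg(String schema) {
        this.text = schema;
    }

    /** Returns dotted paths of all leaf fields, e.g. "name", "links.url". */
    List<String> fieldPaths() {
        pos = 0;
        List<String> out = new ArrayList<>();
        parseFields("", out);
        skipSpaces();
        if (pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at " + pos);
        }
        return out;
    }

    // fields := field (',' field)*
    private void parseFields(String prefix, List<String> out) {
        do {
            parseField(prefix, out);
        } while (consume(','));
    }

    // field := name ':' ( simpleType | '(' fields ')' )
    private void parseField(String prefix, List<String> out) {
        String name = readWord();
        expect(':');
        if (consume('(')) {
            // Nested tuple: recurse with the tuple name as path prefix.
            parseFields(prefix + name + ".", out);
            expect(')');
        } else {
            readWord(); // simple type such as chararray; not needed here
            out.add(prefix + name);
        }
    }

    private String readWord() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && isWordChar(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a name at " + pos);
        }
        return text.substring(start, pos);
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    private boolean consume(char c) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!consume(c)) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        String schema = "name:chararray, links:(url:chararray, picture:chararray)";
        // Prints [name, links.url, links.picture]
        System.out.println(new SchemaArg(schema).fieldPaths());
    }
}
```

A loader following this pattern would keep the string it received in its constructor and consult it when reading records, instead of relying on the AS clause.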

On Tue, Dec 24, 2013 at 1:41 AM, Costin Leau <[EMAIL PROTECTED]> wrote:

> Thanks for the pointers regarding 1).
>
> Any ideas on 2) - namely why only the dereferenced schema is available and
> how to get a hold of the actual user declaration?
>
> Cheers and Merry Christmas!
>
>
> On 24/12/2013 1:05 AM, Cheolsoo Park wrote:
>
>> As for #1, pushProjection() is called only if it's applicable:
>> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/newplan/optimizer/PlanOptimizer.java#L108
>>
>> Set a breakpoint in ColumnMapKeyPrune.java and see whether check() returns
>> true or false:
>> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/newplan/logical/rules/ColumnMapKeyPrune.java#L85
>>
>> It probably returns false in your case, and that's why your
>> pushProjection() is never called.
>>
>>
>> On Thu, Dec 19, 2013 at 6:09 AM, Costin Leau <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Forgot to specify the aforementioned thread [1]
>>>
>>> [1] http://www.mail-archive.com/[EMAIL PROTECTED]/msg06285.html
>>>
>>>
>>> On 19/12/2013 4:08 PM, Costin Leau wrote:
>>>
>>>  Hi,
>>>>
>>>> I'm trying to get hold of the schema specified for a loader through
>>>> 'AS' using Apache Pig 0.12:
>>>>
>>>> A = LOAD 'pig/tupleartists' USING MyStorage() AS (name: chararray,
>>>>     links: (url:chararray, picture:chararray));
>>>> B = FOREACH A GENERATE name, links.url;
>>>> DUMP B;
>>>>
>>>> 1.
>>>> My loader implements LoadPushDown#pushProjection(), which does not seem to
>>>> be called at all (tried breakpoints, System.out - nothing). The API docs
>>>> and this thread [1] suggest it should be called, yet in my tests (using a
>>>> local PigServer) this does not happen. Am I missing something?
>>>>
>>>> 2.
>>>> As an alternative, I'm loading the POStore objects (from pig.map.store
>>>> and pig.reduce.store) but the schema that I'm getting is incorrect, namely:
>>>> "(name: chararray, url: chararray)" without any mention of the "links"
>>>> field. Is there any way to recreate/retrieve the actual schema defined by
>>>> the user or at least determine which fields are nested ("links.url") as
>>>> opposed to the top-level ones ("name")?
>>>>
>>>> Thanks,
>>>>
>>>>
>>> --
>>> Costin
>>>
>>>
>>
> --
> Costin
>
Costin Leau 2013-12-29, 13:31