Re: Tuples in UDF and null
So I did this. I took your example, put it in a file, and ran some Pig
commands through grunt, but I am getting the same results from the bag and
from generating from the tuple. I might be doing something wrong here.

grunt> A = LOAD '/user/apuser/test/data1' AS b:bag{t:tuple(a:chararray, b:chararray)};
grunt> dump A;
2013-03-07 14:55:25,125 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
({(1,)})
({(3,)})
({(5,10)})
({(7,)})

grunt> b = foreach A generate b;
grunt> dump b;
2013-03-07 14:57:59,509 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
({(1,)})
({(3,)})
({(5,10)})
({(7,)})
grunt>

I get the same output again.
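
For reference, here is a minimal stand-alone sketch of why a null field shows up as an empty slot, as in (1,) above, rather than as the text NULL. It deliberately does not use Pig's classes: the tuple is modeled as a plain Java list, and NullFieldSketch and render are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Stand-in for illustration only: this is NOT Pig's Tuple class.
// A tuple is modeled as a plain list, and render() mimics the way the
// dump output in this thread shows a null field as an empty slot.
class NullFieldSketch {

    // Render fields as "(f0,f1,...)", printing nothing for a null field.
    static String render(List<?> fields) {
        StringJoiner joiner = new StringJoiner(",", "(", ")");
        for (Object field : fields) {
            joiner.add(field == null ? "" : field.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Mirrors setting field 0 to "Hello" and field 1 to null.
        System.out.println(render(Arrays.asList("Hello", null))); // (Hello,)
        System.out.println(render(Arrays.asList("5", "10")));     // (5,10)
    }
}
```

In this sketch the null never becomes a string; the slot is simply left empty, which matches the shape of the rows dumped above.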
On Thu, Mar 7, 2013 at 11:40 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Good suggestion. Let me try that.
>
>
> On Thu, Mar 7, 2013 at 11:27 AM, Harsha <[EMAIL PROTECTED]> wrote:
>
>> It will be easier if you take some sample data and run it through the
>> grunt shell.
>> Let's say you have a dataset like this:
>> ({(1,)})
>> ({(3,)})
>> ({(5,10)})
>> ({(7,)})
>>
>> Some rows have a null for "b" and some rows have a value for "b".
>> If you do a "generate" on the above, it will run through each row
>> and try to fetch the value of b; if there is none, it will emit (),
>> something like this:
>>
>> ({()})
>> ({()})
>> ({(10)})
>> ({()})
>>
>>
>>
>>
>> --
>> Harsha
>>
>>
>> On Thursday, March 7, 2013 at 11:15 AM, Mohit Anchlia wrote:
>>
>> > Sorry, yes, my question was about accessing b, not $1. What's the effect
>> > of writing an empty () to a file? Say I did store b into temp: should I
>> > expect a line, or does nothing get written to the file at all?
>> >
>> > On Thu, Mar 7, 2013 at 10:53 AM, Harsha <[EMAIL PROTECTED]> wrote:
>> >
>> > > From your schema b:bag{t:tuple(a:chararray, b:chararray)},
>> > > your tuple is inside a bag, so if on the next line you try to access
>> > > it through $1, Pig will throw an error saying the column does not exist.
>> > > But if your question is about accessing b, then it will print an empty ()
>> > > if there is no value present (as you are setting it to null).
>> > >
>> > > --
>> > > Harsha
>> > >
>> > >
>> > > On Thursday, March 7, 2013 at 10:35 AM, Mohit Anchlia wrote:
>> > >
>> > > > Thanks! Does "generate" skip over that? If I did b = foreach B generate $1,
>> > > > what should be the expected outcome of alias "b"?
>> > > >
>> > > > On Thu, Mar 7, 2013 at 10:31 AM, Harsha <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > > > Hi Mohit,
>> > > > > It won't be converted into the string literal 'NULL', since it's a tuple.
>> > > > > You'll see results like
>> > > > > ('Hello',)
>> > > > >
>> > > > > --
>> > > > > Harsha
>> > > > >
>> > > > >
>> > > > > On Thursday, March 7, 2013 at 10:10 AM, Mohit Anchlia wrote:
>> > > > >
>> > > > > > Any help would be appreciated. I'll also write something shortly
>> > > > > > and see what happens.
>> > > > > >
>> > > > > > On Wed, Mar 6, 2013 at 4:58 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > > > > >
>> > > > > > > If I define and set tuple like this:
>> > > > > > >
>> > > > > > > Tuple t1 = mTupleFactory.newTuple(2);
>> > > > > > > t1.set(0, "Hello");
>> > > > > > > t1.set(1, null);
>> > > > > > >
>> > > > > > > and have schema like:
>> > > > > > >
>> > > > > > > b:bag{t:tuple(a:chararray, b:chararray)}
>> > > > > > >
>> > > > > > > and then in the pig script if I do:
>> > > > > > >
>> > > > > > > page = foreach B generate b;
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > What should be the expected outcome? Would "generate" convert NULL
>> > > > > > > into the literal string 'NULL', or does it skip over that NULL?
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >