Pig, mail # user - Read Hive LazySimpleSerde with Pig


Shawn Hermans 2013-03-12, 18:17
Dmitriy Ryaboy 2013-03-12, 21:53
Shawn Hermans 2013-03-12, 23:35
Re: Read Hive LazySimpleSerde with Pig
Shawn Hermans 2013-03-13, 15:39
Solved the issue with a Jython UDF.

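-- Pig script
REGISTER 'lazysimpleserde.py' USING jython AS myfuncs;
-- ^A (\u001) is LazySimpleSerDe's default field delimiter
A = LOAD '000000_0' USING PigStorage('\\u001') AS (params:chararray);
B = FOREACH A GENERATE myfuncs.extractMap(params);

# lazysimpleserde.py
@outputSchema("params:map[]")
def extractMap(lazy_map):
    # Pig passes null fields to Jython UDFs as None
    if lazy_map is None:
        return None
    extracted = {}
    # Map entries are separated by ^B (\x02)
    entries = lazy_map.split('\x02')

    for entry in entries:
        # Each entry is key ^C value (\x03 between key and value)
        split_entry = entry.split('\x03')

        if len(split_entry) == 2:
            extracted[split_entry[0]] = split_entry[1]

    return extracted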
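Because the UDF body is plain Python, its parsing logic can be checked outside Pig before registering it. A minimal sketch, assuming a hypothetical no-op stand-in for Pig's `outputSchema` decorator (only needed when running the file with an ordinary Python interpreter instead of through Pig):

```python
# Hypothetical stand-in for Pig's outputSchema decorator, so the UDF
# can be imported and exercised without Pig. It only records the schema.
def outputSchema(schema):
    def wrap(func):
        func.output_schema = schema
        return func
    return wrap


@outputSchema("params:map[]")
def extractMap(lazy_map):
    """Parse a LazySimpleSerDe map: entries split on ^B, key/value on ^C."""
    if lazy_map is None:
        return None
    extracted = {}
    for entry in lazy_map.split('\x02'):
        split_entry = entry.split('\x03')
        # Skip malformed entries that lack exactly one key/value separator
        if len(split_entry) == 2:
            extracted[split_entry[0]] = split_entry[1]
    return extracted


if __name__ == "__main__":
    sample = "color\x03red\x02size\x03L"
    print(extractMap(sample))  # {'color': 'red', 'size': 'L'}
```

An empty string yields an empty map, and a null field yields None, which Pig treats as a null map.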

On Tue, Mar 12, 2013 at 4:35 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:

> It uses ^A as the field separator.  That would be easy enough, as I could
> just use PigStorage('\\u001') to pull in the records.  The only issue is how
> to extract maps.  It uses ^B to separate entries within the map and ^C to
> separate the key from the value in each entry.  It wouldn't be too difficult
> to write a UDF to parse the map entries; I was just wondering if there was a
> built-in way of doing that.
>
> Thanks,
> Shawn
>
>
> On Tue, Mar 12, 2013 at 2:53 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> How does LazySimpleSerde store data?
>>
>>
>> On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
>>
>> > All,
>> > Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
>> > did some research and found support for Hive's columnar format and for
>> > SequenceFiles, but did not see anything for LazySimpleSerde.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
>
>
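By default, Hive's LazySimpleSerDe writes one row per line, with ^A (\x01) between fields, ^B (\x02) between collection items and map entries, ^C (\x03) between a map key and its value, and \N for NULL. The sketch below round-trips one row under those defaults, assuming a hypothetical table declared as (name STRING, attrs MAP<STRING,STRING>). The function names are illustrative only, and escaping and custom delimiters set in the table's ROW FORMAT are not handled.

```python
FIELD_DELIM = '\x01'   # ^A: separates columns
ENTRY_DELIM = '\x02'   # ^B: separates map entries (and array items)
KEY_DELIM = '\x03'     # ^C: separates a map key from its value
NULL_MARKER = '\\N'    # Hive's default text for NULL (backslash, N)


def parse_map(text):
    """Decode one map<string,string> column."""
    if text == NULL_MARKER:
        return None
    result = {}
    if text == '':
        return result  # an empty map is written as an empty string
    for entry in text.split(ENTRY_DELIM):
        key, _, value = entry.partition(KEY_DELIM)
        result[key] = value
    return result


def parse_row(line):
    """Split one line of the hypothetical (name, attrs) table."""
    name, attrs = line.rstrip('\n').split(FIELD_DELIM)
    return (None if name == NULL_MARKER else name, parse_map(attrs))


def serialize_row(name, attrs):
    """Inverse of parse_row; handy for building test input."""
    name_text = NULL_MARKER if name is None else name
    if attrs is None:
        attrs_text = NULL_MARKER
    else:
        attrs_text = ENTRY_DELIM.join(
            k + KEY_DELIM + v for k, v in attrs.items())
    return name_text + FIELD_DELIM + attrs_text + '\n'


if __name__ == "__main__":
    line = serialize_row("shirt", {"color": "red", "size": "L"})
    print(repr(line))       # 'shirt\x01color\x03red\x02size\x03L\n'
    print(parse_row(line))  # ('shirt', {'color': 'red', 'size': 'L'})
```

This is the same split order the Jython UDF relies on: PigStorage handles the ^A level, and the UDF handles the ^B and ^C levels inside the map column.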