Pig >> mail # user >> Read Hive LazySimpleSerde with Pig


Re: Read Hive LazySimpleSerde with Pig
Solved the issue with a Jython UDF.

-- Pig script: load the first ^A-delimited field as a chararray, then parse it with the UDF.
REGISTER 'lazysimpleserde.py' USING jython AS myfuncs;
A = LOAD '000000_0' USING PigStorage('\\u001') AS (params:chararray);
B = FOREACH A GENERATE myfuncs.extractMap(params);

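# lazysimpleserde.py
@outputSchema("params:map[]")
def extractMap(lazy_map):
    # Pig passes null fields as None; return null rather than fail on split().
    if lazy_map is None:
        return None

    extracted = {}
    # Map entries are separated by ^B (\x02).
    entries = lazy_map.split('\x02')

    for entry in entries:
        # Key and value within an entry are separated by ^C (\x03).
        split_entry = entry.split('\x03')

        # Skip anything that is not exactly one key and one value.
        if len(split_entry) == 2:
            extracted[split_entry[0]] = split_entry[1]

    return extracted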
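
To check the parsing logic outside Pig, here is a minimal sketch in plain Python. It assumes the delimiters used above (^B between map entries, ^C between key and value). The outputSchema stand-in and the sample string are made up for illustration; Pig supplies the real decorator when the script is registered with USING jython.

```python
def outputSchema(schema):
    """Stand-in (assumption) for the decorator Pig provides to Jython UDFs."""
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("params:map[]")
def extractMap(lazy_map):
    # Null input yields null output instead of raising on split().
    if lazy_map is None:
        return None
    extracted = {}
    # ^B (\x02) separates entries; ^C (\x03) separates key from value.
    for entry in lazy_map.split('\x02'):
        split_entry = entry.split('\x03')
        if len(split_entry) == 2:
            extracted[split_entry[0]] = split_entry[1]
    return extracted


# Hypothetical sample: two well-formed entries and one without a value.
sample = 'color\x03red\x02size\x03large\x02malformed'
print(extractMap(sample))
```

The entry without a ^C is dropped, so only the two well-formed pairs end up in the map.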
On Tue, Mar 12, 2013 at 4:35 PM, Shawn Hermans <[EMAIL PROTECTED]> wrote:

> It uses ^A as the field separator.  That would be easy enough as I could
> just use PigStorage("\001") to pull in the records.  The only issue is how
> to extract maps.  It uses ^B to separate entries within the map and ^C to
> separate keys from values in the map.  It wouldn't be too difficult to write
> a UDF to parse the map entries, I was just wondering if there was a
> built-in way of doing that.
>
> Thanks,
> Shawn
>
>
> On Tue, Mar 12, 2013 at 2:53 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> How does LazySimpleSerde store data?
>>
>>
>> On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans <[EMAIL PROTECTED]> wrote:
>>
>> > All,
>> > Is there an easy way to read Hive LazySimpleSerde encoded files in Pig?
>> > I did some research and found support for Hive's columnar format and for
>> > SequenceFiles, but did not see anything for LazySimpleSerde.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
>
>