Re: PIG script - PIGStorage
The default loader (PigStorage) can't handle this format. You would need a
custom InputFormat, which isn't too hard to write.
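
As a rough illustration of the parsing such an InputFormat's record reader would have to do, here is a minimal plain-Java sketch. It is not the Pig or Hadoop API: the class name `AngleBracketPairs` and the method `extract` are hypothetical, and the Hadoop/Pig wiring (a LoadFunc returning the InputFormat from getInputFormat()) is left out. It assumes that only innermost `<key,value>` groups matter and that a wrapper such as `<y ... >` can be dropped, which the original question does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Pig or Hadoop.
class AngleBracketPairs {
    // Matches an innermost "<key,value>" group: no '<' or '>' inside it,
    // and no comma inside the key. Text outside such groups is skipped.
    private static final Pattern PAIR = Pattern.compile("<([^<>,]+),([^<>]*)>");

    // Returns every pair found in the text as a trimmed {key, value} array.
    static List<String[]> extract(String text) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.add(new String[] { m.group(1).trim(), m.group(2).trim() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample data copied from the question below.
        String sample = "<x1,value1 ><y <x2,values> <x3,value3> > <x4, value 4> <x5, value5>\n"
                + "     abxcd xyxc\n"
                + "<x6, value6>\n"
                + "<x7,value7>";
        // Print one tab-separated key/value pair per line.
        for (String[] p : extract(sample)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

In a real loader, each extracted pair would presumably become one tuple with two fields, so the script could name them in the AS clause of the LOAD statement.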
2012/12/9 L N <[EMAIL PROTECTED]>

> Hi,
>
>
>
> > I have an unstructured file format. Assume below is the data in a file
> >
> > <x1,value1 ><y <x2,values> <x3,value3> > <x4, value 4> <x5, value5>
> >
>      abxcd xyxc
>
> > <x6, value6>
> > <x7,value7>
> >
> > I need to process only the data between < > and ignore all other
> > characters.
>
> >
> > How do I write a Pig script like the one below to load the data in the above file?
> > log = LOAD 'input' USING PigStorage(' ');
> >
> > What delimiter should be used for PigStorage here, and
> > how should I specify variable names? What else do I need to take care of?
> >
> >
> > Thanks
> >
>