Pig >> mail # user >> Loading Files with Comment Lines


RE: Loading Files with Comment Lines
I do that kind of streaming on HDFS files using Hadoop streaming, outside of Pig. I assume you could do it from inside Pig too, but I haven't tested that.
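For a file that already lives in HDFS, the simplest version of this idea is a plain client-side pipe. The sketch below is not a Hadoop Streaming job: it pulls the file through the machine running the command, which is fine for a quick cleanup but not distributed. The function name and both paths are hypothetical, and it assumes a configured `hadoop` client on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: copy an HDFS file to a new HDFS path, dropping
# every line whose first character is '#'.
# Assumption: the `hadoop` client is installed and configured.
strip_comments_hdfs() {
    src="$1"   # source path in HDFS (placeholder)
    dst="$2"   # destination path in HDFS (placeholder, must not exist yet)
    # `hadoop fs -cat` writes the file to stdout; `hadoop fs -put - DST`
    # reads stdin and writes it to DST.
    hadoop fs -cat "$src" | grep -v '^#' | hadoop fs -put - "$dst"
}

# Usage (paths are made up for illustration):
#   strip_comments_hdfs /data/raw/edges.txt /data/clean/edges.txt
```

The cleaned copy can then be loaded with PigStorage as usual. All the data flows through one client here, so for a very large file a proper streaming job would likely be the better fit.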

 

William F Dowling

Sr Technical Specialist, Software Engineering

Thomson Reuters

0 +1 215 823 3853

 

From: Moore, Michael A. [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 07, 2011 3:14 PM
To: [EMAIL PROTECTED]
Subject: Re: Loading Files with Comment Lines

 

Possibly.  Can I do that if the file is already in HDFS?

______________________________________

Michael Moore :: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>

The Johns Hopkins University Applied Physics Laboratory

0B7B17EE1AE2A80B pgp

BC31 A861 9726 8211 F79F 7E21 0B7B 17EE 1AE2 A80B pgp fingerprint

 

 

On Jun 7, 2011, at 3:12 PM, <[EMAIL PROTECTED]> wrote:

Can you stream it through

 grep -v '^#'

?
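To see what that filter does, here is a small local sketch that runs it over the sample data from the original question. The temp file and variable names are just for illustration; nothing here touches HDFS or Pig.

```shell
#!/bin/sh
# Recreate the sample data from the original question in a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Data Source: Project A
# Contact MMoore with Questions
# SenderId      RecipientId
1          2
3          5
6          7
#2        1
3          6
11        7
EOF

# -v inverts the match; '^#' matches lines whose first character is '#'.
filtered=$(grep -v '^#' "$sample")
printf '%s\n' "$filtered"

rm -f "$sample"
```

Only the five data rows should come out. Note the pattern is anchored at the start of the line, so a comment line with leading whitespace before the `#` would not be dropped.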

William F Dowling

Sr Technical Specialist, Software Engineering

Thomson Reuters

0 +1 215 823 3853

From: Moore, Michael A. [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 07, 2011 3:04 PM
To: [EMAIL PROTECTED]
Subject: Loading Files with Comment Lines

Hello all-

I've got a quick question and Google isn't proving to be much help.

I've got a big file that has a few lines in it prefixed with a pound sign (#) to indicate they are to be ignored.  I would like to LOAD this file using PigStorage.  Is there a way to do this, or is it handled automatically?

The data might look something like this:

# Data Source: Project A
# Contact MMoore with Questions
# SenderId      RecipientId
1          2
3          5
6          7
#2        1
3          6
11        7

Thanks!

-Michael

______________________________________

Michael Moore :: [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>

The Johns Hopkins University Applied Physics Laboratory

0B7B17EE1AE2A80B pgp

BC31 A861 9726 8211 F79F 7E21 0B7B 17EE 1AE2 A80B pgp fingerprint
 
