Pig >> mail # user >> Reading from local and writing to HDFS?


Re: Reading from local and writing to HDFS?
You're pretty much stuck with options 1 and 2, with option 1 being the accepted
solution. The whole idea of MapReduce is that a single machine can't compute
your answers, so the input has to live where the cluster can read it. You can
put an 'fs -put' command in your script to stage the input on HDFS first,
before the rest of the script runs in MR mode.
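
A minimal sketch of that approach as a single Pig script, assuming it is launched from the machine that holds the Squid logs (fs -put reads from the local disk of the machine running Pig). The file paths, the jar name, and the output directory are hypothetical placeholders, and MyRegexLoader is the original poster's own loader, so its constructor arguments (if any) are not shown:

```pig
-- Sketch only: every path and the jar name below are hypothetical placeholders.
-- In MapReduce mode, relative paths resolve against the user's HDFS home directory.

-- Hypothetical jar containing the poster's own MyRegexLoader UDF.
-- Use the fully qualified class name if MyRegexLoader lives in a package.
REGISTER /path/to/myregexloader.jar;

-- Stage the raw log on HDFS. fs -put reads from the local disk of the
-- machine launching Pig and fails if the HDFS target already exists.
fs -put /var/log/squid/access.log squid_access.log

-- Parse the staged copy on the cluster and store the result back to HDFS.
raw_logs = LOAD 'squid_access.log' USING MyRegexLoader();
STORE raw_logs INTO 'squid_parsed' USING PigStorage();
```

Saved as, say, parse_squid.pig (a hypothetical name), it would be run with "pig -x mapreduce parse_squid.pig". With "-x local" every path, including the fs -put target, would instead resolve on the local filesystem, which is exactly the situation the question is trying to avoid.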

Local mode is mainly there for testing purposes, not for production use.
On Thu, Nov 7, 2013 at 5:47 AM, Carl-Daniel Hailfinger <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm processing Squid log files with Pig courtesy of MyRegexLoader. After
> a first processing step (saving with PigStorage) there's quite a lot of
> data processing to do.
>
> There's a catch, though: both workflows I can see involve a superfluous
> copy operation:
> Variant 1: Copy the original Squid logs manually to HDFS with "hdfs dfs
> -copyFromLocal", then read them in Pig (distributed mode) from HDFS with
> MyRegexLoader, then store them in HDFS with PigStorage.
> Variant 2: Read the original logs from the local filesystem in Pig (local
> mode) with MyRegexLoader, store them on the local filesystem with
> PigStorage, then copy the result to HDFS with "hdfs dfs -copyFromLocal".
>
> Is there a way to have Pig read files from local fs, but store the
> result in HDFS? Given that reading files from local fs can't be done in
> distributed mode, I'd be totally happy to have that operation only run
> on the local node as long as the stored file is accessible via HDFS
> afterwards.
> I tried various ways to specify file locations as hdfs:// and file://,
> but that didn't work out. AFAICS the documentation is pretty silent on
> this.
>
> Any ideas or hints about what to do?
>
> Regards,
> Carl-Daniel
> --
> http://www.hailfinger.org/
>