Alex WANG
2010-09-21, 20:48
Alex Wang
2010-09-21, 20:50
hc busy
2010-09-21, 21:48
Alex Wang
2010-09-21, 21:53
hc busy
2010-09-21, 22:21
Thejas M Nair
2010-09-22, 22:53
-
Problem with Pig Store command
Alex WANG, 2010-09-21, 20:48
Hi,
I am using pig 0.7.0 in hadoop mapreduce mode.

The problem I have is that I simply can't use

    STORE INTO alias USING PigStorage();

I can load dataset in, write UDFs to manipulate the dataset, but I can't store it. The output is a directory in HDFS with 0 bytes.

As an example, I've been testing with a simple script:

    W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray, type:chararray);
    store W into 'wordtesting' using PigStorage(' ');

I run the code in grunt, and the output of hadoop fs -ls is:

    drwxr-xr-x - awang supergroup 0 2010-09-21 13:45 /user/awang/wordtesting

The grunt messages are:

    grunt> store filteredW into 'wordtesting' using PigStorage(' ');
    2010-09-21 13:45:35,210 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for W
    2010-09-21 13:45:35,210 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for W
    2010-09-21 13:45:35,440 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://pineal:9000/user/awang/wordtesting:PigStorage(' ')) - 1-46 Operator Key: 1-46)
    2010-09-21 13:45:35,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2010-09-21 13:45:35,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2010-09-21 13:45:35,549 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-09-21 13:45:38,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-09-21 13:45:38,166 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-09-21 13:45:38,173 [Thread-15] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-09-21 13:45:38,307 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-09-21 13:45:38,307 [Thread-15] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-09-21 13:45:38,670 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009211320_0002
    2010-09-21 13:45:38,670 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://pineal:50030/jobdetails.jsp?jobid=job_201009211320_0002
    2010-09-21 13:45:38,673 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2010-09-21 13:45:48,755 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2010-09-21 13:45:53,835 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2010-09-21 13:45:53,835 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://pineal:9000/user/awang/wordtesting"
    2010-09-21 13:45:53,846 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 1
    2010-09-21 13:45:53,846 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 20
    2010-09-21 13:45:53,846 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
    2010-09-21 13:45:53,847 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
    2010-09-21 13:45:53,847 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

I've been struggling with this for a long time.... It works if I have a one bytearray in my tuple, but once I defined my schema, it no longer works.

Anyone has any idea? Please help!!

Thanks!

Best regards,
Alex
-
Re: Problem with Pig Store command
hc busy, 2010-09-21, 21:48
Probably because the load failed. Running

    W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray, type:chararray);
    T = group W all;
    U = foreach T generate COUNT(W);
    dump U;

will probably say that the wordbag contained nothing. Debug the loading portion to fix this problem.
-
Re: Problem with Pig Store command
Alex Wang, 2010-09-21, 21:53
Hi hc,
Sorry that I didn't mention it, but load works OK. Here is a portion of the output of dump W:

    (2162,4111,yellow,a)
    (4652,1317,yep,interjection)
    (157,60592,yes,interjection)
    (533,19459,yesterday,adv)
    (265,35058,yet,adv)
    (4040,1626,yield,n)
    (3339,2139,yield,v)

Only the store command is not working...

Alex
-
Re: Problem with Pig Store command
hc busy, 2010-09-21, 22:21
I'm not sure then. Maybe ask other people for suggestions.

The fact that the output path is not absolute seems suspicious. Also try using ',' instead of space. Did you try

    store W into '/tmp/wordtesting' using PigStorage(',');

to see if that does the trick?

Err, let's see... maybe you're looking at the wrong hadoop cluster? Within the same grunt where you do the above store, try

    ls /tmp/wordtesting

and see if that results in something. If so, your hadoop and pig are pointing to different hadoop clusters, imo.
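The check suggested above can be put together as one grunt session. This is only a sketch: the /tmp/wordtesting path and the ',' delimiter come from the suggestion itself, and it assumes the store, ls and cat are all run from the same grunt session so that they hit the same cluster. Note that a directory entry in an HDFS listing shows a size of 0 regardless of what it contains, so the 0 in the original hadoop fs -ls output does not by itself mean nothing was written; the log in the first post reports "Records written : 1" and "Bytes written : 20".

```pig
-- Sketch only: run everything in the same grunt session that executes the store.
W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray, type:chararray);
store W into '/tmp/wordtesting' using PigStorage(',');

-- List the files inside the output directory, not the directory entry itself.
ls /tmp/wordtesting

-- Print whatever the store actually wrote into that directory.
cat /tmp/wordtesting
```

If ls shows files here while hadoop fs -ls from the shell shows none, pig and the hadoop command line are likely configured against different clusters.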
-
Re: Problem with Pig Store command
Thejas M Nair, 2010-09-22, 22:53
On 9/21/10 1:50 PM, "Alex Wang" <[EMAIL PROTECTED]> wrote:

> It works if I have a one bytearray in my tuple, but once I defined my
> schema, it no longer works.

I didn't understand what you meant here. Can you give an example of a query where the store works? That might help in understanding the cause of what you are seeing.

-Thejas
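A side-by-side pair of queries is one way to answer this. The sketch below is a guess at what the quoted sentence might mean, not something the thread confirms: it assumes the "working" case is a load with no declared schema, where Pig treats fields as bytearray by default. The output names wordtesting_noschema and wordtesting_schema are hypothetical.

```pig
-- Assumed "working" variant: no schema declared, so fields default to bytearray.
A = load 'wordbag' using PigStorage(' ');
store A into 'wordtesting_noschema' using PigStorage(' ');

-- Variant from the original post: the same input with a typed schema.
W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray, type:chararray);
store W into 'wordtesting_schema' using PigStorage(' ');

-- Compare what each store wrote (output directory names are hypothetical).
cat wordtesting_noschema
cat wordtesting_schema
```

Posting both outputs along with the two queries would show whether the difference really comes from the schema.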