Results 41 to 50 of 64.
Re: Project the last field of a tuple - Pig - [mail # user]
...Hi Fabian,  I don't know whether there is a built-in feature for this, but here is the idea: try to load the whole line as one field (ignoring the delimiter at this step) and then try t...
   Author: Ruslan Al-Fakikh, 2012-08-23, 14:09
Re: Loading data from a SQL database? - Pig - [mail # user]
...Hi,  In my project, we had to develop our own Loader for that purpose.  Thanks  On Fri, Aug 10, 2012 at 2:28 PM, Vincent Barat wrote: ...  Best Regards,...
   Author: Ruslan Al-Fakikh, 2012-08-10, 13:39
Re: how can I delete a file in pig only after checking if the file exists? - Pig - [mail # user]
...hi Sheng,  Try something like sh bash -c 'if hadoop fs -test -e $LOOKUP_HDFS_TEMP; then echo Deleting old local file lookup; hadoop fs -rm $LOOKUP_HDFS_TEMP; else echo Local file lookup...
   Author: Ruslan Al-Fakikh, 2012-07-23, 12:50
Re: Best Practice: store depending on data content - Pig - [mail # user]
...That is a very interesting offtopic:) I think I will reinvestigate HCatalog some day and come up with specific questions.  Thanks a lot for explaining  On Wed, Jul 4, 2012 at 4:37 ...
   Author: Ruslan Al-Fakikh, 2012-07-05, 15:01
Re: Using average function is really slow - Pig - [mail # user]
...Hi James,  AVG is Algebraic, which means that it will use the combiner when it can. It seems that your job is not using the combiner. Can you give the full script? Also check the job config of t...
   Author: Ruslan Al-Fakikh, 2012-07-04, 21:05
Re: Does pig support in clause? - Pig - [mail # user]
...Hi Johannes,  Try this C = LOAD 'in.dat' AS (A1); A = LOAD 'in2.dat' AS (A1);  joined = JOIN A BY A1 LEFT OUTER, C BY A1;  DESCRIBE joined;  newEntries = FILTER joined BY...
   Author: Ruslan Al-Fakikh, 2012-07-04, 13:53
Re: What is the best way to do counting in pig? - Pig - [mail # user]
...Hi,  As was said, COUNT is algebraic and should be fast, because it lets Pig use the combiner. You should make sure that the combiner is really used here. It can be disabled in some situations. I'...
   Author: Ruslan Al-Fakikh, 2012-07-03, 10:03
Re: Best Practice: store depending on data content - Pig - [mail # user]
...Dmitriy,  In our organization we use file paths for this purpose like this: /incoming/datasetA /incoming/datasetB /reports/datasetC etc  On Mon, Jul 2, 2012 at 9:37 PM, Dmitriy Rya...
   Author: Ruslan Al-Fakikh, 2012-07-03, 09:56
Re: Best Practice: store depending on data content - Pig - [mail # user]
...Hey Alan,  I am not familiar with Apache processes, so I could be wrong in my point 1, I am sorry. Basically my impression was that Cloudera is pushing Avro format for intercommunicati...
   Author: Ruslan Al-Fakikh, 2012-07-02, 12:57
Re: suggestion - Pig - [mail # user]
...Hey Yang,  For debugging you may want the local mode, try pig -x local  Also there are some useful commands like DESCRIBE and ILLUSTRATE  Ruslan  On Fri, Jun 29, 2012 at 7:...
   Author: Ruslan Al-Fakikh, 2012-06-29, 12:02
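
A note on the AVG and COUNT replies above: an algebraic function is one whose result can be built from partial results, which is what lets a combiner do part of the work before the reduce step. The Python sketch below only illustrates that idea for an average. It is not Pig's implementation, and the function names (initial, intermediate, final) and the sample chunks are made up for the example.

```python
from functools import reduce

def initial(values):
    """Map side: reduce one chunk of raw values to a partial (sum, count)."""
    return (sum(values), len(values))

def intermediate(a, b):
    """Combiner side: merge two partial (sum, count) pairs into one."""
    return (a[0] + b[0], a[1] + b[1])

def final(partial):
    """Reduce side: turn the merged (sum, count) into the average."""
    total, count = partial
    return total / count if count else None

# Hypothetical input, already split into chunks as if by separate map tasks.
chunks = [[1, 2, 3], [4, 5], [6]]

partials = [initial(c) for c in chunks]          # one small pair per chunk
merged = reduce(intermediate, partials, (0, 0))  # pairs merge in any order
average = final(merged)
```

Because only the small (sum, count) pairs travel between stages, much less data is shuffled than when every raw value is sent to the reducer, which is the usual reason a job that skips the combiner runs slowly.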