Results 1 to 10 of 146 (0.187s).
Re: How to disable input split - Spark - [mail # user]
...I am not sure if this works, but SparkContext seems to have hadoopFile and hadoopRDD methods which can accept Hadoop Input formats. If you have an InputFormat with isSplittable false, maybe it...
   Author: Sonal Goyal, 2014-10-18, 10:17
Re: Optimizing pairwise similarity computation or how to avoid RDD.cartesian operation ? - Spark - [mail # user]
...Cartesian joins of large datasets are usually going to be slow. If there is a way you can reduce the problem space to make sure you only join subsets with each other, that may be helpful. Mayb...
   Author: Sonal Goyal, 2014-10-17, 12:03
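
One way to read the advice above about joining only subsets is "blocking": group records by a cheap key and compare only within each group. The following is a minimal sketch in plain Python, not Spark code. The `blocked_pairs` helper and the first-letter block key are hypothetical names chosen for illustration and do not come from the thread.

```python
from collections import defaultdict
from itertools import combinations


def blocked_pairs(records, block_key):
    """Yield candidate pairs that share a block key.

    This replaces the full n*(n-1)/2 pairwise comparison with
    comparisons inside each block only.
    """
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for members in blocks.values():
        # Pairs are formed only among records in the same block.
        yield from combinations(members, 2)


# Hypothetical data: block on the first character of each string.
records = ["apple", "apricot", "banana", "blueberry", "cherry"]
pairs = list(blocked_pairs(records, block_key=lambda s: s[0]))
all_pairs = list(combinations(records, 2))
print(len(pairs), "candidate pairs instead of", len(all_pairs))
```

The trade-off is recall: two similar records that land in different blocks are never compared, so the block key has to be chosen with the similarity measure in mind.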
Re: key class requirement for PairedRDD ? - Spark - [mail # user]
...We use our custom classes which are Serializable and have well defined hashcode and equals methods through the Java API. What's the issue you are getting? Best Regards, Sonal Nube Technologies On ...
   Author: Sonal Goyal, 2014-10-17, 07:04
Re: Join with large data set - Spark - [mail # user]
...Hi Ankur, If your RDDs have common keys, you can look at partitioning both your datasets using a custom partitioner based on keys so that you can avoid shuffling and optimize join performance. H...
   Author: Sonal Goyal, 2014-10-17, 06:06
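
The idea behind partitioning both datasets by key can be illustrated without Spark. This is a sketch in plain Python under stated assumptions: `partition_by_key` and `copartitioned_join` are hypothetical helpers, and the hash-modulo rule stands in for whatever custom partitioner would be used. When both sides use the same partitioning function, matching keys always land in the same partition index, so each partition can be joined locally.

```python
from collections import defaultdict


def partition_by_key(pairs, num_partitions):
    """Split (key, value) pairs into partitions using hash(key) % num_partitions."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions][key].append(value)
    return parts


def copartitioned_join(left, right, num_partitions=4):
    """Inner-join two keyed datasets partition by partition.

    Because both sides use the same partitioning function, partition i
    of the left side only needs to look at partition i of the right side.
    """
    left_parts = partition_by_key(left, num_partitions)
    right_parts = partition_by_key(right, num_partitions)
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        for key, left_values in lp.items():
            for lv in left_values:
                for rv in rp.get(key, []):
                    joined.append((key, (lv, rv)))
    return joined


# Hypothetical data with integer keys.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
result = sorted(copartitioned_join(left, right))
print(result)
```

In a distributed setting the benefit comes from not moving data between nodes at join time; this single-process sketch only shows why matching keys never need to cross partition boundaries.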
Re: Dedup - Spark - [mail # user]
...What is your data like? Are you looking at exact matching or are you interested in nearly same records? Do you need to merge similar records to get a canonical value? Best Regards, Sonal Nube Tec...
   Author: Sonal Goyal, 2014-10-09, 05:00
Re: Running time is significantly unbalanced - Spark - [mail # user]
...Is your data balanced across each partition or do some keys have far more records than others? Best Regards, Sonal Nube Technologies On Tue, Aug 19, 2014 at 8:00 PM, Bin wrote: ...
   Author: Sonal Goyal, 2014-08-21, 04:48
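
A quick way to answer the question above is to count records per key on a sample and compare the largest counts with the average. The sketch below is plain Python rather than Spark code; the `key_skew` helper and the sample data are hypothetical.

```python
from collections import Counter


def key_skew(pairs, top=3):
    """Return (key, count, count / mean_count) for the most frequent keys.

    A ratio well above 1 suggests that a few keys carry far more
    records than the rest.
    """
    counts = Counter(key for key, _ in pairs)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return [(key, count, count / mean) for key, count in counts.most_common(top)]


# Hypothetical sample: key "a" holds 8 of the 10 records.
sample = [("a", i) for i in range(8)] + [("b", 0), ("c", 0)]
for key, count, ratio in key_skew(sample, top=1):
    print(f"{key}: {count} records, {ratio:.1f}x the mean")
```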
Re: Running GraphX through Java - Spark - [mail # user]
...Hi All, Sorry reposting this again in the hope to get some clues. Best Regards, Sonal Nube Technologies On Wed, Aug 13, 2014 at 3:53 PM, Sonal Goyal wrote: ...
   Author: Sonal Goyal, 2014-08-14, 05:55
Re: Unit Testing (JUnit) with Spark - Spark - [mail # user]
...You can take a look at https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/JavaAPISuite.java and model your junits based on it. Best Regards, Sonal Nube Technologies O...
   Author: Sonal Goyal, 2014-07-29, 16:57
Re: Simple record matching using Spark SQL - Spark - [mail # user]
...Hi Sarath, Are you explicitly stopping the context? sc.stop() Best Regards, Sonal Nube Technologies On Thu, Jul 17, 2014 at 12:51 PM, Sarath Chandra <[EMAIL PROTECTED]> wrote: ...
   Author: Sonal Goyal, 2014-07-17, 07:36
Re: Can Spark stack scale to petabyte scale without performance degradation? - Spark - [mail # user]
...Hi Rohit, I think the 3rd question on the FAQ may help you. https://spark.apache.org/faq.html Some other links that talk about building bigger clusters and processing more data: http://spark-summ...
   Author: Sonal Goyal, 2014-07-16, 04:17