search-hadoop.com (Sematext): results 1 to 10 of 485
Re: paging through an RDD that's too large to collect() all at once - Spark - [mail # user]
...Hey Dave, try out RDD.toLocalIterator -- it gives you an iterator that reads one RDD partition at a time. Scala iterators also have methods like grouped() that let you get fixed-size groups....
   Author: Matei Zaharia, 2014-09-19, 03:26
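The suggestion above can be modeled without a cluster. The snippet below is a hypothetical stand-in, not Spark code: each inner `Seq` plays the role of one RDD partition, `localIterator` (an illustrative name) walks the partitions in order the way the reply describes `RDD.toLocalIterator` doing, and the standard-library `Iterator.grouped` cuts the stream into fixed-size pages. With a real RDD, the equivalent loop would presumably be `rdd.toLocalIterator.grouped(1000).foreach { page => ... }`, where the page size of 1000 is an arbitrary assumption.

```scala
// Hypothetical model of paging through a large dataset on the driver.
// Each inner Seq stands in for one RDD partition; no Spark classes are used.
object LocalPaging {
  // Walks partitions in order, one at a time, similar in spirit to RDD.toLocalIterator.
  // Iterator.flatMap is lazy, so partition i + 1 is not touched until partition i is exhausted.
  def localIterator[T](partitions: Seq[Seq[T]]): Iterator[T] =
    partitions.iterator.flatMap(_.iterator)

  // Cuts the element stream into fixed-size pages with Scala's Iterator.grouped;
  // pages may span partition boundaries, and the last page may be shorter.
  def pageIterator[T](partitions: Seq[Seq[T]], pageSize: Int): Iterator[Seq[T]] = {
    require(pageSize > 0, "pageSize must be positive")
    localIterator(partitions).grouped(pageSize)
  }

  def main(args: Array[String]): Unit = {
    val fakeRdd = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6, 7, 8, 9))
    // Pages are produced one at a time as the loop advances.
    pageIterator(fakeRdd, 4).foreach(page => println(page.mkString(",")))
  }
}
```

Running `main` prints the pages `1,2,3,4`, `5,6,7,8` and `9`. Assuming `toLocalIterator` behaves as described in the reply, driver memory is bounded by the largest single partition rather than by the whole RDD.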
Re: Short Circuit Local Reads - Spark - [mail # user]
...I'm pretty sure it does help, though I don't have any numbers for it. In any case, Spark will automatically benefit from this if you link it to a version of HDFS that contains this. Matei. On S...
   Author: Matei Zaharia, 2014-09-17, 18:19
Re: Spark as a Library - Spark - [mail # user]
...If you want to run the computation on just one machine (using Spark's local mode), it can probably run in a container. Otherwise you can create a SparkContext there and connect it to a clust...
   Author: Matei Zaharia, 2014-09-16, 17:31
Re: Complexity/Efficiency of SortByKey - Spark - [mail # user]
...sortByKey is indeed O(n log n), it's a first pass to figure out even-sized partitions (by sampling the RDD), then a second pass to do a distributed merge-sort (first partition the data on ea...
   Author: Matei Zaharia, 2014-09-16, 05:56
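The two-pass scheme described in the reply can be modeled in plain Scala. This is a simplified, hypothetical sketch, not Spark's `RangePartitioner`: it samples keys with replacement (the names `rangeBounds`, `partitionFor`, and the budget of 20 sampled keys per partition are illustrative assumptions), picks evenly spaced boundaries from the sorted sample, routes each record to its key range, and sorts every range locally. Sampling is cheap relative to the data, and the local sorts together cost O(n log n), consistent with the complexity quoted above.

```scala
import scala.util.Random

// Simplified, hypothetical model of a two-pass distributed sort on Int keys.
// Partitions are modeled as Vectors; no Spark classes are used.
object SortByKeySketch {
  // Pass 1: sample keys (with replacement, for brevity) and choose
  // numPartitions - 1 evenly spaced boundaries from the sorted sample.
  def rangeBounds(keys: Seq[Int], numPartitions: Int, sampleSize: Int, seed: Long): Vector[Int] = {
    val ks = keys.toVector
    if (ks.isEmpty || sampleSize <= 0) Vector.empty
    else {
      val rnd = new Random(seed)
      val sample = Vector.fill(sampleSize)(ks(rnd.nextInt(ks.length))).sorted
      (1 until numPartitions).map(i => sample(i * sample.length / numPartitions)).toVector
    }
  }

  // Returns the first partition whose upper bound is >= key; keys above
  // every bound go to the last partition. A linear scan keeps the sketch short;
  // a binary search would scale better with many partitions.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val i = bounds.indexWhere(bound => key <= bound)
    if (i < 0) bounds.length else i
  }

  // Pass 2: route each record to its key range, then sort each range locally.
  // Reading the partitions in index order yields a globally sorted sequence.
  def sortByKey[V](data: Seq[(Int, V)], numPartitions: Int, seed: Long = 42L): Vector[Vector[(Int, V)]] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    val sampleSize = math.min(data.length, 20 * numPartitions) // arbitrary sampling budget
    val bounds = rangeBounds(data.map(_._1), numPartitions, sampleSize, seed)
    val buckets = data.groupBy { case (k, _) => partitionFor(k, bounds) }
    Vector.tabulate(bounds.length + 1) { p =>
      buckets.getOrElse(p, Seq.empty[(Int, V)]).sortBy(_._1).toVector
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(5 -> "e", 1 -> "a", 9 -> "i", 3 -> "c", 7 -> "g", 2 -> "b")
    sortByKey(data, 3).zipWithIndex.foreach { case (part, i) => println(s"partition $i: $part") }
  }
}
```

Because each partition holds a contiguous key range, no final merge across partitions is needed in this model; uneven boundaries only affect balance, not correctness.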
Re: NullWritable not serializable - Spark - [mail # dev]
...Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if we had some operations that were computed locally on the ...
   Author: Matei Zaharia, 2014-09-16, 05:52
Re: Does Spark always wait for stragglers to finish running? - Spark - [mail # user]
...It's true that it does not send a kill command right now -- we should probably add that. This code was written before tasks were killable AFAIK. However, the *job* should still finish while ...
   Author: Matei Zaharia, 2014-09-16, 01:10
Re: scala 2.11? - Spark - [mail # user]
...I think the current plan is to put it in 1.2.0, so that's what I meant by "soon". It might be possible to backport it too, but I'd be hesitant to do that as a maintenance release on 1.1.x an...
   Author: Matei Zaharia, 2014-09-16, 00:20
Re: compiling spark source code - Spark - [mail # user]
...I've seen the "file name too long" error when compiling on an encrypted Linux file system -- some of them have a limit on file name lengths. If you're on Linux, can you try compiling inside ...
   Author: Matei Zaharia, 2014-09-14, 19:49
Re: Announcing Spark 1.1.0! - Spark - [mail # user]
...Thanks to everyone who contributed to implementing and testing this release! Matei. On September 11, 2014 at 11:52:43 PM, Tim Smith ([EMAIL PROTECTED]) wrote: Thanks for all the good work. Very ...
   Author: Matei Zaharia, 2014-09-12, 04:09
Re: Mapping Hadoop Reduce to Spark - Spark - [mail # user]
...BTW you can also use rdd.partitions() to get a list of Partition objects and see how many there are. On September 4, 2014 at 5:18:30 PM, Matei Zaharia ([EMAIL PROTECTED]) wrote: Partitioners a...
   Author: Matei Zaharia, 2014-09-05, 00:21
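To connect partitioning back to Hadoop's reduce phase, here is a hypothetical hash-partitioning model in plain Scala: each key goes to bucket `hashCode mod numPartitions` (shifted to be non-negative), and each bucket then sums the values per key the way a single reducer would. It is a sketch in the spirit of Spark's `HashPartitioner`, not its source; `partitionFor` and `reduceByKeyModel` are illustrative names. On a real pair RDD, the analogous steps would be `val reduced = rdd.reduceByKey(new HashPartitioner(4), _ + _)` followed by `reduced.partitions.length` in Scala, or `partitions().size()` through the Java API, to check how many partitions there are.

```scala
// Hypothetical model of hash partitioning followed by a per-partition reduce,
// in the spirit of Spark's HashPartitioner and a Hadoop reduce task.
// Partitions are modeled as plain Scala collections; no Spark classes are used.
object HashPartitionSketch {
  // Maps a key to a bucket in [0, numPartitions). hashCode can be negative,
  // so a negative remainder is shifted back into range. Null keys go to bucket 0.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  // Shuffle step: route each record to its bucket. Reduce step: within each
  // bucket, sum the values for every key, as one reducer would.
  def reduceByKeyModel(records: Seq[(String, Int)], numPartitions: Int): Vector[Map[String, Int]] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    val buckets = records.groupBy { case (k, _) => partitionFor(k, numPartitions) }
    Vector.tabulate(numPartitions) { p =>
      buckets.getOrElse(p, Seq.empty[(String, Int)])
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2).sum }
    }
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)
    reduceByKeyModel(records, 3).zipWithIndex.foreach { case (part, i) => println(s"partition $i: $part") }
  }
}
```

Every occurrence of a key lands in the same bucket, so each bucket can be reduced independently, which is the property a Hadoop reducer relies on.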