Results 1 to 10 of 488 (0.251s).
Re: Spark Code to read RCFiles - Spark - [mail # user]
...Is your file managed by Hive (and thus present in a Hive metastore)? In that case, Spark SQL (https://spark.apache.org/docs/latest/sql-programming-guide.html) is the easiest way. Matei On Sept...
   Author: Matei Zaharia, 2014-09-24, 00:24
Re: Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset? - Spark - [mail # user]
...File takes a filename to write to, while Dataset takes only a JobConf. This means that Dataset is more general (it can also save to storage systems that are not file systems, such as key-val...
   Author: Matei Zaharia, 2014-09-22, 08:12
Re: A couple questions about shared variables - Spark - [mail # dev]
...Hmm, good point, this seems to have been broken by refactorings of the scheduler, but it worked in the past. Basically the solution is simple -- in a result stage, we should not apply the up...
   Author: Matei Zaharia, 2014-09-21, 22:36
Re: paging through an RDD that's too large to collect() all at once - Spark - [mail # user]
...Hey Dave, try out RDD.toLocalIterator -- it gives you an iterator that reads one RDD partition at a time. Scala iterators also have methods like grouped() that let you get fixed-size groups....
   Author: Matei Zaharia, 2014-09-19, 03:26
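The paging idea in the snippet above can be sketched without a cluster. This is only an illustration in plain Python, not Spark code: the nested list stands in for an RDD's partitions, and the helper names `local_iterator` and `grouped` are hypothetical, chosen to mirror RDD.toLocalIterator and Scala's grouped().

```python
from itertools import islice


def local_iterator(partitions):
    """Yield elements partition by partition (stand-in for RDD.toLocalIterator)."""
    for part in partitions:
        # The consumer only ever walks one partition at a time.
        yield from part


def grouped(iterator, size):
    """Yield lists of up to `size` items, similar to Scala's Iterator.grouped."""
    iterator = iter(iterator)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


# Hypothetical data: three "partitions" of uneven size.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
pages = list(grouped(local_iterator(partitions), 4))
```

Pages are cut by element count, so a page can span a partition boundary; the last page may be shorter than the requested size.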
Re: Short Circuit Local Reads - Spark - [mail # user]
...I'm pretty sure it does help, though I don't have any numbers for it. In any case, Spark will automatically benefit from this if you link it to a version of HDFS that contains this. Matei On S...
   Author: Matei Zaharia, 2014-09-17, 18:19
Re: Spark as a Library - Spark - [mail # user]
...If you want to run the computation on just one machine (using Spark's local mode), it can probably run in a container. Otherwise you can create a SparkContext there and connect it to a clust...
   Author: Matei Zaharia, 2014-09-16, 17:31
Re: Complexity/Efficiency of SortByKey - Spark - [mail # user]
...sortByKey is indeed O(n log n): it does a first pass to figure out even-sized partitions (by sampling the RDD), then a second pass to do a distributed merge-sort (first partition the data on ea...
   Author: Matei Zaharia, 2014-09-16, 05:56
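The two passes described in the snippet above can be sketched roughly as follows. This is a single-process illustration in plain Python, not Spark's implementation: the snippet is truncated, so the details of the second pass are assumed, and the function names, sample size, and partition count are all hypothetical.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size, rng):
    """First pass: sample the keys and pick evenly spaced split points."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition_sort(keys, num_partitions=3, sample_size=20, seed=0):
    """Second pass: route each key to its range bucket, then sort each bucket."""
    rng = random.Random(seed)
    bounds = sample_boundaries(keys, num_partitions, sample_size, rng)
    buckets = [[] for _ in range(num_partitions)]
    for k in keys:
        # bisect_right gives the index of the range that contains k.
        buckets[bisect.bisect_right(bounds, k)].append(k)
    # Buckets cover disjoint, increasing key ranges, so sorting each one
    # locally yields a globally sorted result when read in bucket order.
    return [sorted(b) for b in buckets]


# Hypothetical data: the integers 0..99 in shuffled order.
data = random.Random(1).sample(range(100), 100)
parts = range_partition_sort(data)
flat = [k for p in parts for k in p]
```

Because the split points come from a sample, bucket sizes are only approximately even; a larger sample generally gives a better balance.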
Re: NullWritable not serializable - Spark - [mail # dev]
...Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if we had some operations that were computed locally on the ...
   Author: Matei Zaharia, 2014-09-16, 05:52
Re: Does Spark always wait for stragglers to finish running? - Spark - [mail # user]
...It's true that it does not send a kill command right now -- we should probably add that. This code was written before tasks were killable AFAIK. However, the *job* should still finish while ...
   Author: Matei Zaharia, 2014-09-16, 01:10
Re: scala 2.11? - Spark - [mail # user]
...I think the current plan is to put it in 1.2.0, so that's what I meant by "soon". It might be possible to backport it too, but I'd be hesitant to do that as a maintenance release on 1.1.x an...
   Author: Matei Zaharia, 2014-09-16, 00:20