search-hadoop.com search results: 1 to 10 of 30 (0.25s)
Re: Rdd of Rdds - Spark - [mail # user]
...On Wednesday, October 22, 2014 9:06 AM, Sean Owen wrote: Depending on one's needs, one could also consider the matrix (RDD[Vector]) operations provided by MLlib, such as https://spark.ap...
   Author: Michael Malak, 2014-10-22, 21:38
Re: UpdateStateByKey - How to improve performance? - Spark - [mail # user]
...Depending on the density of your keys, the alternative signature def updateStateByKey[S](updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)], partitioner: Partitioner, rememberP...
   Author: Michael Malak, 2014-08-06, 21:07
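The iterator-based overload named in the snippet above can be sketched as follows (a hypothetical running word count; the StreamingContext, the `pairs` DStream, and the partition count are assumptions, not from the thread):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// Assumed input: a DStream of (word, 1) pairs named `pairs`.
def runningCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] = {
  // The Iterator form hands the update function a whole partition at once,
  // which can be cheaper than the per-key overload when keys are sparse.
  val updateFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) =>
    it.map { case (word, newValues, state) =>
      (word, newValues.sum + state.getOrElse(0))
    }
  pairs.updateStateByKey(
    updateFunc,
    new HashPartitioner(4),    // example partitioner; tune for your data
    rememberPartitioner = true)
}
```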
Re: relationship of RDD[Array[String]] to Array[Array[String]] - Spark - [mail # user]
...It's really more of a Scala question than a Spark question, but the standard OO (not Scala-specific) way is to create your own custom supertype (e.g. MyCollectionTrait), inherited/implemente...
   Author: Michael Malak, 2014-07-21, 17:21
15 new MLlib algorithms - Spark - [mail # dev]
...At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would "roughly double" in 1.1 from the current approx. 15. http://spark-summit.org/wp-content/uploads/2014/07/Future-...
   Author: Michael Malak, 2014-07-09, 18:43
Re: parallel Reduce within a key - Spark - [mail # user]
...How about a treeReduceByKey? :-) On Friday, June 20, 2014 11:55 AM, DB Tsai wrote: Currently, the reduce operation combines the result from mappers sequentially, so it's O(n). Xiangru...
   Author: Michael Malak, 2014-06-20, 18:09
[SPARK-1817] RDD zip erroneous when partitions do not divide RDD count - Spark - [issue]
...Example: scala> sc.parallelize(1L to 2L, 4).zip(sc.parallelize(11 to 12, 4)).collect res1: Array[(Long, Int)] = Array((2,11)) But more generally, it's whenever the number of partitions...
http://issues.apache.org/jira/browse/SPARK-1817    Author: Michael Malak, 2014-06-04, 17:16
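Expanding the compressed REPL session in the issue snippet (a sketch, assuming a local SparkContext `sc` on an affected pre-fix version):

```scala
// zip requires both RDDs to have the same number of partitions AND the
// same element count per partition. Here 2 elements are sliced across
// 4 partitions, and the Long range and Int range could be sliced
// differently, silently misaligning and dropping pairs:
val left  = sc.parallelize(1L to 2L, 4)   // RDD[Long]
val right = sc.parallelize(11 to 12, 4)   // RDD[Int]
left.zip(right).collect()
// SPARK-1817 reports Array((2,11)) instead of Array((1,11), (2,12))
```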
GraphX triplets on 5-node graph - Spark - [mail # dev]
...Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I missing something fundamental? val nodes = sc.parallelize(Array((1L, "N1"), (2L, "N2"), (3L, "N3"), (4L, "N4"),...
   Author: Michael Malak, 2014-05-29, 06:48
[SPARK-1836] REPL $outer type mismatch causes lookup() and equals() problems - Spark - [issue]
...Anand Avati partially traced the cause to the REPL wrapping classes in $outer classes. There are at least two major symptoms: 1. equals(): In the REPL, equals() (required in custom classes used...
http://issues.apache.org/jira/browse/SPARK-1836    Author: Michael Malak, 2014-05-28, 20:40
Re: rdd ordering gets scrambled - Spark - [mail # user]
...Mohit Jaggi: A workaround is to use zipWithIndex (to appear in Spark 1.0, but if you're still on 0.9.x you can swipe the code from https://github.com/apache/spark/blob/master/core/src/main/sca...
   Author: Michael Malak, 2014-05-28, 15:44
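The zipWithIndex workaround mentioned above can be sketched like this (assumes a SparkContext `sc` and Spark 1.0+; the sample data is illustrative only):

```scala
// Capture each element's original position before any operation that
// may scramble ordering, then sort on that index to restore it.
val rdd = sc.parallelize(Seq("a", "b", "c", "d"))
val indexed = rdd.zipWithIndex()   // RDD[(String, Long)]

// ... transformations that lose ordering happen here ...

val restored = indexed
  .map(_.swap)                     // key by the original index
  .sortByKey()                     // reorder by position
  .values                          // back to RDD[String], original order
```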
[SPARK-1857] map() with lookup() causes exception - Spark - [issue]
...Using map() and lookup() in conjunction throws an exception: val a = sc.parallelize(Array(11)) val m = sc.parallelize(Array((11,21))) a.map(m.lookup(_)(0)).collect 14/05/14 15:03:35 ERROR Executo...
http://issues.apache.org/jira/browse/SPARK-1857    Author: Michael Malak, 2014-05-25, 22:01
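The snippet's map()-over-lookup() nests an RDD action inside a task, which Spark does not support. One common alternative (a sketch, not from the issue itself) is to collect the small side on the driver and broadcast it:

```scala
val a = sc.parallelize(Array(11))
val m = sc.parallelize(Array((11, 21)))

// a.map(x => m.lookup(x)(0)) fails: lookup() is an action and cannot
// run inside an executor task. Instead, materialize the small RDD on
// the driver and broadcast the resulting map to all executors:
val mMap = sc.broadcast(m.collectAsMap())
val joined = a.map(x => mMap.value(x)).collect()   // Array(21)
```

For a large right-hand side, a.map(x => (x, ())).join(m) would be the shuffle-based alternative.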