Re: CUDA in spark, especially in MLlib? - Spark - [mail # user]
...Thank you Debasish. I am fine with either Scala or Java. I would like to get a quick evaluation on the performance gain, e.g., ALS on GPU. I would like to try whichever library does the busin...
   Author: Wei Tan, 2014-08-28, 18:34
Re: MLLib: implementing ALS with distributed matrix - Spark - [mail # user]
...Hi Deb, thanks for sharing your result. Please find my comments inline in blue. Best regards, Wei From: Debasish Das To: Wei Tan/Watson/IBM@IBMUS, Cc: Xiangru...
   Author: Wei Tan, 2014-08-18, 02:38
RE: executor-cores vs. num-executors - Spark - [mail # user]
...Thanks for sharing your experience. I got the same experience -- multiple moderate JVMs beat a single huge JVM. Besides the minor JVM starting overhead, is it always better to have multiple J...
   Author: Wei Tan, 2014-07-16, 18:31
Re: parallel stages? - Spark - [mail # user]
...Thanks Sean. In Oozie you can use fork-join, however using Oozie to drive Spark jobs, jobs will not be able to share RDD (Am I right? I think multiple jobs submitted by Oozie will have diffe...
   Author: Wei Tan, 2014-07-16, 04:01
Re: Recommended pipeline automation tool? Oozie? - Spark - [mail # user]
...Just curious: how about using scala to drive the workflow? I guess if you use other tools (oozie, etc) you lose the advantage of reading from RDD -- you have to read from HDFS. Best regards, W...
   Author: Wei Tan, 2014-07-11, 19:07
Re: rdd.cache() is not faster? - Spark - [mail # user]
...Hi Gaurav, thanks for your pointer. The observation in the link is (at least qualitatively) similar to mine. Now the question is, if I do have big data (40GB, cached size is 60GB) and even bi...
   Author: Wei Tan, 2014-06-18, 14:40
Re: long GC pause during file.cache() - Spark - [mail # user]
...BTW: nowadays a single machine with huge RAM (200G to 1T) is really common. With virtualization you lose some performance. It would be ideal to see some "best practice" on how to use Spark i...
   Author: Wei Tan, 2014-06-16, 14:56
Re: How to compile a Spark project in Scala IDE for Eclipse? - Spark - [mail # user]
...This will make the compilation pass but you may not be able to run it correctly. I used maven adding these two jars (I use Hadoop 1), maven added their dependent jars (a lot) for me. ...
   Author: Wei Tan, 2014-06-08, 16:02
Re: best practice: write and debug Spark application in scala-ide and maven - Spark - [mail # user]
...Thank you all, Madhu, Gerard and Ryan. All your suggestions work. Personally I prefer running Spark locally in Eclipse for debugging purpose. Best regards, Wei Wei Tan, PhD Research Staff Member...
   Author: Wei Tan, 2014-06-08, 06:03
Re: reuse hadoop code in Spark - Spark - [mail # user]
...Thanks Matei. Using your pointers I can import data from HDFS, what I want to do now is something like this in Spark: import myown.mapper; rdd.map(mapper.map) The reason why I want this: myown....
   Author: Wei Tan, 2014-06-05, 15:14