Big Data / Search / DevOps
[CRUNCH-340] Create HCatSource and HCatTarget - Crunch - [issue]
...This patch adds HCatSource, which enables a Crunch pipeline to read from Hive tables. This is the very first version, leaving a few TODOs in the code. It adds a new dependency from crunch-core to hca...
http://issues.apache.org/jira/browse/CRUNCH-340    Author: Chao Shi , 2017-12-10, 16:57
  
Update Crunch Team List - Crunch - [mail # dev]
...Hi Micah, I would prefer to officially retire. Thank you.  2015-12-07 10:24 GMT+08:00 Micah Whitacre:  > Site is now updated with the exception of removing Chao.  Asked if he...
   Author: Chao Shi , 2015-12-07, 06:16
  
Update Crunch Team List - Crunch - [mail # dev]
...I have been working at Alibaba for a year now. My working area has changed from computing to storage. As you may have noticed, I have not been active in this project for a long time. I'd like to remove ...
   Author: Chao Shi , 2015-12-06, 13:25
[CRUNCH-408] HFileSource does not estimate the size of input correctly when there is a wildcard in path - Crunch - [issue]
...The cause is that it calls FileSystem#listStatus rather than FileSystem#globStatus to retrieve the list of files under the given path. So the fix is straightforward....
http://issues.apache.org/jira/browse/CRUNCH-408    Author: Chao Shi , 2015-04-24, 20:18
  
Question about HBaseSourceTarget#getSize() - Crunch - [mail # user]
...Hi Nithin,  Because HBaseSourceTarget supports custom Scan criteria (i.e. you can apply filters), I think it can hardly make a guess on the resulting data size. Even HBase itself, becau...
   Author: Chao Shi , 2015-03-28, 04:27
  
[CRUNCH-341] Move test resources used across multiple modules to crunch-test - Crunch - [issue]
...There are duplicated test resource files in multiple modules. This patch moves them into crunch-test, which is accessible on the classpath during unit testing. chaoshi@vm3 ~/projects/crunch (mas...
http://issues.apache.org/jira/browse/CRUNCH-341    Author: Chao Shi , 2014-09-19, 00:07
  
[CRUNCH-351] Improve performance of Shard#shard on large records - Crunch - [issue]
...This avoids sorting on the input data, which may be long and make the shuffle phase slow. The improvement is to sort on pseudo-random numbers....
http://issues.apache.org/jira/browse/CRUNCH-351    Author: Chao Shi , 2014-06-20, 03:49
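
The idea in the CRUNCH-351 snippet, distributing records by a pseudo-random key rather than by the records themselves, can be sketched in plain Java. This is an in-memory illustration only, not the Crunch patch: the class RandomKeyShard, its shard method, and the seed parameter are hypothetical names introduced here, and a real pipeline would let the shuffle group on the key instead of filling lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not code from the CRUNCH-351 patch.
public class RandomKeyShard {

    /**
     * Spreads records over numPartitions buckets using a pseudo-random
     * integer key. The records themselves are never compared or hashed,
     * so their size does not affect the cost of choosing a bucket.
     */
    public static <T> List<List<T>> shard(List<T> records, int numPartitions, long seed) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        Random random = new Random(seed);
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T record : records) {
            // The small integer key stands in for the shuffle key.
            int key = random.nextInt(numPartitions);
            partitions.get(key).add(record);
        }
        return partitions;
    }
}
```

Every input record lands in exactly one bucket, so the output has the same contents as the input; only the key, a small integer, would need to be sorted during a shuffle.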
  
[CRUNCH-355] Rename jobs to show how many stages have done before job submission - Crunch - [issue]
...The naming mechanism introduced in CRUNCH-262 has a flaw. It adds (m/n) to the end of the job name, where m is the current stage number at planning time and n is the total number of stages. Suppo...
http://issues.apache.org/jira/browse/CRUNCH-355    Author: Chao Shi , 2014-06-20, 03:49
  
[CRUNCH-364] Fix failure on mvn dependency:tree - Crunch - [issue]
...I got the "NoClassDefFoundError: org/sonatype/aether/graph/DependencyNode" when running "mvn dependency:tree". According to [1], this can be fixed by simply upgrading maven-dependenc...
http://issues.apache.org/jira/browse/CRUNCH-364    Author: Chao Shi , 2014-06-20, 03:49
  
[CRUNCH-315] Empty collection - Crunch - [issue]
...As discussed in the mailing list [1] and [2], I'd like to add an empty collection feature. On the API side, I think we can add a new method in Pipeline to create an empty col...
http://issues.apache.org/jira/browse/CRUNCH-315    Author: Chao Shi , 2014-06-20, 03:49
  
[CRUNCH-368] TupleWritable.Comparator - Crunch - [issue]
...This patch should improve comparison performance on TupleWritables. It saves the deserialization overhead. It is particularly useful when the input tuples are large, e.g. contain long string...
http://issues.apache.org/jira/browse/CRUNCH-368    Author: Chao Shi , 2014-06-20, 03:49
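
To illustrate what "saving the deserialization overhead" can mean in general, the sketch below compares two serialized values byte by byte without rebuilding the objects. It is a generic raw-comparison example under stated assumptions, not the TupleWritable.Comparator from the patch: the class RawBytesComparison and the method compareRaw are hypothetical, and the actual comparator may work field by field.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not code from the CRUNCH-368 patch.
public class RawBytesComparison {

    /**
     * Lexicographic comparison of two byte arrays, treating each byte as
     * unsigned. No object is deserialized; only the raw bytes are read.
     */
    public static int compareRaw(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xff;
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal, so the shorter array sorts first.
        return left.length - right.length;
    }

    public static void main(String[] args) {
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareRaw(x, y) < 0);
    }
}
```

The comparison stops at the first differing byte, which is where the saving comes from when values are long strings that differ early.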
  