HBase, mail # user - MapReduce job with mixed data sources: HBase table and HDFS files


Re: MapReduce job with mixed data sources: HBase table and HDFS files
S. Zhou 2013-07-12, 15:49
Ted & Azuryy, after some investigation of the log files, I figured out why this happens. In the code, I used the same path "inputPath1" for both inputs (see below), because I thought the input path had no effect for an HBase table. It turns out that the input path registered for the HBase table can override the input path registered for the HDFS file. I changed the input path for the HBase table to a different value, and it works!
Ted & Azuryy, I really appreciate your help!
MultipleInputs.addInputPath(job1, inputPath1, TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job1, inputPath1,  TableInputFormat.class, TableMap.class);
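For reference, a minimal sketch of the corrected calls, building on the thread's job1 variable; the concrete path names are hypothetical, only the use-distinct-paths detail comes from the thread:

Path hdfsInputPath  = new Path("/user/szhou/input");       // real HDFS input
Path tableInputPath = new Path("/tmp/hbase-dummy-input");  // placeholder; must differ from the HDFS path
// Each source now registers under its own path. TableInputFormat takes the
// table name from the job configuration, not from this path, so any unique
// placeholder path works here.
MultipleInputs.addInputPath(job1, hdfsInputPath, TextInputFormat.class, Map.class);
MultipleInputs.addInputPath(job1, tableInputPath, TableInputFormat.class, TableMap.class);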
________________________________
 From: S. Zhou <[EMAIL PROTECTED]>
To: Ted Yu <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, July 11, 2013 10:19 PM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
 

I use org.apache.hadoop.mapreduce.lib.input.MultipleInputs.

I run pseudo-distributed Hadoop (1.2.0) and pseudo-distributed HBase (0.95.1-hadoop1).
________________________________
From: Ted Yu <[EMAIL PROTECTED]>
To: S. Zhou <[EMAIL PROTECTED]>
Cc: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, July 11, 2013 9:54 PM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files

Did you use org.apache.hadoop.mapreduce.lib.input.MultipleInputs or the one from org.apache.hadoop.mapred.lib?

Which Hadoop version do you use?

Cheers
On Thu, Jul 11, 2013 at 9:49 PM, S. Zhou <[EMAIL PROTECTED]> wrote:

Thanks Ted & Azuryy. Your hint helped me solve that particular issue.
>
>But now I run into a new problem with MultipleInputs. This time I add an HBase table and an HDFS file as inputs (see the new code below). The problem is that whichever data source is added later overrides the one added earlier: if I add the HBase table to MultipleInputs first and the HDFS file second, the final result from the reducer contains only the HDFS file's records; if I add the HDFS file first and the HBase table second, the final result contains only the HBase table's records.
>
>import java.io.IOException;
>
>import org.apache.hadoop.conf.Configuration;
>import org.apache.hadoop.fs.Path;
>import org.apache.hadoop.hbase.HBaseConfiguration;
>import org.apache.hadoop.hbase.client.Result;
>import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
>import org.apache.hadoop.hbase.mapreduce.TableMapper;
>import org.apache.hadoop.hbase.util.Bytes;
>import org.apache.hadoop.io.Text;
>import org.apache.hadoop.mapreduce.Job;
>import org.apache.hadoop.mapreduce.Mapper;
>import org.apache.hadoop.mapreduce.Reducer;
>import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
>import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>
>public class MixMR {
>    // Mapper for the HDFS text files: parses "key,value" lines
>    public static class Map extends Mapper<Object, Text, Text, Text> {
>        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
>            String s = value.toString();
>            String[] sa = s.split(",");
>            if (sa.length == 2) {
>                context.write(new Text(sa[0]), new Text(sa[1]));
>            }
>        }
>    }
>
>    // Mapper for the HBase table: emits (row key, value of cf:c1)
>    public static class TableMap extends TableMapper<Text, Text> {
>        public static final byte[] CF = "cf".getBytes();
>        public static final byte[] ATTR1 = "c1".getBytes();
>
>        public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
>            String key = Bytes.toString(row.get());
>            String val = new String(value.getValue(CF, ATTR1));
>            context.write(new Text(key), new Text(val));
>        }
>    }
>
>    public static class Reduce extends Reducer<Object, Text, Object, Text> {
>        public void reduce(Object key, Iterable<Text> values, Context context)
>                throws IOException, InterruptedException {
>            String ks = key.toString();
>            for (Text val : values) {
>                context.write(new Text(ks), val);
>            }
>        }
>    }
>
>    public static void main(String[] args) throws Exception {
>        Path inputPath1 = new Path(args[0]);
>        Path outputPath = new Path(args[1]);
>
>        String tableName1 = "sezhou-test";
>
>        Configuration config1 = HBaseConfiguration.create();
>        Job job1 = new Job(config1, "ExampleRead");
>        job1.setJarByClass(com.shopping.test.dealstore.MixMR.class); // class that contains mapper
>        // ... (rest of the job setup is truncated in the archive)
>        MultipleInputs.addInputPath(job1, inputPath1, TextInputFormat.class, Map.class);
>        MultipleInputs.addInputPath(job1, inputPath1, TableInputFormat.class, TableMap.class); // inputPath1 here has no effect for HBase table
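Why the later call wins when both sources use the same path: org.apache.hadoop.mapreduce.lib.input.MultipleInputs keeps one (InputFormat, Mapper) pair per input path, so a second registration under the same path replaces the first. A rough sketch of the effect, using the thread's inputPath1 and hypothetical variable names, not code from the thread:

// MultipleInputs conceptually maintains a map keyed by path; a second
// put() with the same key silently replaces the first entry.
java.util.Map<Path, Class<?>> formatsByPath = new java.util.HashMap<Path, Class<?>>();
formatsByPath.put(inputPath1, TextInputFormat.class);   // added first
formatsByPath.put(inputPath1, TableInputFormat.class);  // same key: replaces the TextInputFormat entry
// Only the TableInputFormat entry survives, matching the "later data
// source overrides the earlier one" behavior described above.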