HBase, mail # user - hdfs data into HBase


kun yan 2013-09-09, 09:06
Re: hdfs data into HBase
Shahab Yunus 2013-09-09, 12:52
Some quick thoughts: your size is bound to increase, because recall that
the rowkey is stored in every cell. So if your CSV has, let us say, 5
columns and you import them into HBase using the first column as the key,
you end up with essentially 9 stored 'columns': 1 for the rowkey, then 2
each (rowkey plus column) for the remaining 4 columns. I know, a very
crude and high-level estimate.
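
As a very rough back-of-the-envelope sketch of that overhead (every byte
count below is an illustrative assumption, and the exact KeyValue layout
depends on your HBase version):

public class CellSizeEstimate {
    public static void main(String[] args) {
        // Approximate fixed per-cell KeyValue bookkeeping (0.94-era layout):
        // key length (4) + value length (4) + row length (2)
        // + family length (1) + timestamp (8) + key type (1) = 20 bytes
        int fixed = 20;
        int rowKeyLen = 10; // hypothetical rowkey size
        int familyLen = 1;  // a one-byte family such as "s"
        int qualLen = 5;    // average qualifier name length
        int valueLen = 5;   // hypothetical value size
        int cells = 6;      // one cell per non-key CSV column

        int csvRow = rowKeyLen + cells * (valueLen + 1); // values plus commas
        int hbaseRow = cells * (fixed + rowKeyLen + familyLen + qualLen + valueLen);
        System.out.println("~" + csvRow + " bytes per row in CSV");     // ~46
        System.out.println("~" + hbaseRow + " bytes per row in HBase"); // ~246
    }
}

With numbers like these that is already a ~5x blow-up, before replication
and any store-file overhead are counted.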

Also, how are you measuring the size in HDFS after the import into HBase?
Are you excluding the replication of the data?
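
For reference, a minimal sketch of one way to tell the two apart with the
standard FileSystem API (the "/hbase" path is just an assumption,
substitute your actual hbase.rootdir):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUsage {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // "/hbase" is the default HBase root dir; adjust to your setup
        ContentSummary cs = fs.getContentSummary(new Path("/hbase"));
        System.out.println("logical size (no replication):      " + cs.getLength());
        System.out.println("space consumed (incl. replication): " + cs.getSpaceConsumed());
    }
}

With the default replication factor of 3, getSpaceConsumed() will report
roughly three times getLength().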

Regards,
Shahab
On Mon, Sep 9, 2013 at 5:06 AM, kun yan <[EMAIL PROTECTED]> wrote:

> Hello everyone, I wrote a MapReduce program to import data from HDFS into
> HBase, but after the import the size increased a lot: my original data is
> 69MB in HDFS, and after importing it into HBase my HDFS usage grew by 3GB.
> What is wrong with my program?
>
> thanks
>
> import java.io.IOException;
> import java.util.ArrayList;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
>
> import com.google.common.base.Splitter;
> import com.google.common.collect.Lists;
>
> public class MRImportHBaseCsv {
>
>     public static void main(String[] args) throws IOException,
>             InterruptedException, ClassNotFoundException {
>         Configuration conf = new Configuration();
>         conf.set("fs.defaultFS", "hdfs://hydra0001:8020");
>         conf.set("yarn.resourcemanager.address", "hydra0001:8032");
>         Job job = createSubmitTableJob(conf, args);
>         job.submit();
>     }
>
>     public static Job createSubmitTableJob(Configuration conf, String[] args)
>             throws IOException {
>         String tableName = args[0];
>         Path inputDir = new Path(args[1]);
>         Job job = new Job(conf, "HDFS_TO_HBase");
>         job.setJarByClass(HourlyImporter.class);
>         FileInputFormat.setInputPaths(job, inputDir);
>         job.setInputFormatClass(TextInputFormat.class);
>         job.setMapperClass(HourlyImporter.class);
>         // Insert into the table directly using TableOutputFormat;
>         // this is a map-only job, so no reducer class is set.
>         TableMapReduceUtil.initTableReducerJob(tableName, null, job);
>         job.setNumReduceTasks(0);
>         TableMapReduceUtil.addDependencyJars(job);
>         return job;
>     }
>
>     static class HourlyImporter extends
>             Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>
>         // Column family; kept to one byte ("s") since the family name is
>         // repeated in every cell.
>         private static final byte[] family = Bytes.toBytes("s");
>
>         // Qualifier names, parsed once instead of on every map() call.
>         private static final String columns =
>                 "HBASE_ROW_KEY,STATION,YEAR,MONTH,DAY,HOUR,MINUTE";
>         private static final ArrayList<String> columnsList = Lists
>                 .newArrayList(Splitter.on(',').trimResults().split(columns));
>
>         private long ts;
>
>         @Override
>         protected void setup(Context context) {
>             // Take the timestamp once before mapping starts. (The original
>             // code set ts in cleanup(), after all the Puts had been written,
>             // so it was never used.)
>             ts = System.currentTimeMillis();
>         }
>
>         @Override
>         protected void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>             String line = value.toString();
>             ArrayList<String> columnValues = Lists.newArrayList(Splitter
>                     .on(',').trimResults().split(line));
>
>             // The first CSV column becomes the rowkey; HBase repeats the
>             // rowkey in every cell of the row.
>             byte[] bRowKey = Bytes.toBytes(columnValues.get(0));
>             ImmutableBytesWritable rowKey = new ImmutableBytesWritable(bRowKey);
>
>             Put p = new Put(bRowKey);
>             for (int i = 1; i < columnValues.size(); i++) {
>                 // One cell per remaining column; the qualifier string
>                 // (e.g. "STATION") is stored in every cell as well.
>                 p.add(family, Bytes.toBytes(columnsList.get(i)), ts,
>                         Bytes.toBytes(columnValues.get(i)));
>             }
>             context.write(rowKey, p);
>         }
>     }
> }
>
>
> --
>
> In the Hadoop world I am just a novice, exploring the entire Hadoop
> ecosystem. I hope one day I can contribute my own code.
>
> YanBit
> [EMAIL PROTECTED]
>
kun yan 2013-09-10, 01:27