I have a dataset which is several terabytes in size. I would like to query this data using HBase (SQL). Would I need to set up MapReduce to use HBase? Currently the data is stored in HDFS and I am using `hdfs -cat ` to get the data and pipe it into stdin.
--
--- Get your facts first, then you can distort them as you please.--
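The pipe-into-stdin approach described above can be sketched as a small line filter. This is only a minimal sketch under assumptions the post does not state: records are taken to be tab-separated text lines, and the function name, column index, and sample rows are hypothetical.

```python
from typing import Iterable, Iterator


def filter_records(lines: Iterable[str], column: int, wanted: str) -> Iterator[str]:
    """Yield lines whose tab-separated field at `column` equals `wanted`.

    Assumes tab-separated records (an assumption, not stated in the post).
    Lines with too few fields are skipped.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        fields = stripped.split("\t")
        if len(fields) > column and fields[column] == wanted:
            yield stripped


# Hypothetical sample rows standing in for the piped data.
# To filter real piped input, pass sys.stdin instead of `sample`.
sample = [
    "2011-07-11\tERROR\tdisk full\n",
    "2011-07-11\tINFO\tstarted\n",
]
for match in filter_records(sample, 1, "ERROR"):
    print(match)
```

Every query written this way reads the whole dataset once, which is the cost a keyed store is usually meant to avoid.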
-
Re: large data and hbase
Robert Evans 2011-07-11, 14:54
Rita,

My understanding is that you do not need to set up MapReduce to use HBase, but I am not an expert on it. Contacting the HBase mailing list would probably be the best way to get your questions answered: [EMAIL PROTECTED]. Their setup page might be able to help you out too: http://hbase.apache.org/book/notsoquick.html

I don't believe that HBase supports SQL, though. You can use Hive (http://hive.apache.org/). It supports a lot of SQL, but it runs queries as batch jobs and requires you to set up MapReduce.

--Bobby Evans

On 7/11/11 6:31 AM, "Rita" <[EMAIL PROTECTED]> wrote:
> I have a dataset which is several terabytes in size. I would like to query this data using hbase (sql). Would I need to setup mapreduce to use hbase? Currently the data is stored in hdfs and I am using `hdfs -cat ` to get the data and pipe it into stdin.
-
Re: large data and hbase
Bharath Mundlapudi 2011-07-11, 17:40
Another option to look at is Pig or Hive. Both need MapReduce.
-Bharath
-
Re: large data and hbase
Rita 2011-07-12, 10:01
This is encouraging.
"Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-hdfs.sh over in the HADOOP_HOME directory. You can ensure it started properly by testing the *put* and *get* of files into the Hadoop filesystem. HBase does not normally use the mapreduce daemons. These do not need to be started."
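The put/get check mentioned in the quoted passage could be scripted roughly as follows. This is a sketch, assuming the `hadoop` launcher is on the PATH and that `hadoop fs -put` and `hadoop fs -get` are the commands used; the function names and any paths passed in are hypothetical.

```python
import filecmp
import subprocess
from typing import Callable, List, Sequence


def roundtrip_commands(local_src: str, hdfs_path: str, local_copy: str) -> List[List[str]]:
    """Build the two commands: upload a file, then download it again."""
    return [
        ["hadoop", "fs", "-put", local_src, hdfs_path],
        ["hadoop", "fs", "-get", hdfs_path, local_copy],
    ]


def check_roundtrip(
    local_src: str,
    hdfs_path: str,
    local_copy: str,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> bool:
    """Run put then get, and compare the downloaded copy byte for byte.

    `run` raises if a command fails (the default, subprocess.check_call, does).
    Returns True only when the fetched file matches the original.
    """
    for cmd in roundtrip_commands(local_src, hdfs_path, local_copy):
        run(cmd)
    return filecmp.cmp(local_src, local_copy, shallow=False)
```

Nothing here touches the MapReduce daemons; only HDFS has to be up for the check to pass.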
-
Re: large data and hbase
Harsh J 2011-07-12, 13:01
For a query to work in a fully distributed manner, MapReduce may still be required (atop HBase, that is). There has been ongoing work on the HBase side to help with this as well, but you're guaranteed better responses on their mailing lists.
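To illustrate what "fully distributed" means here: an aggregate query has to visit every partition of the table, compute a partial result per partition, and then combine the partials. A minimal sketch in plain Python, with no HBase or Hadoop API involved; the partitions, row keys, and function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (row key, value)


def map_partition(rows: Iterable[Row]) -> Counter:
    """Map step: count values within a single partition."""
    return Counter(value for _key, value in rows)


def reduce_counts(partials: Iterable[Counter]) -> Dict[str, int]:
    """Reduce step: merge the per-partition counts into one total."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Two hypothetical partitions of one table.
partitions: List[List[Row]] = [
    [("row1", "ERROR"), ("row2", "INFO")],
    [("row3", "ERROR")],
]
result = reduce_counts(map_partition(p) for p in partitions)
print(result)
```

In a real cluster the map step would run next to each partition's data in parallel, which is the part a framework such as MapReduce provides.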
-- Harsh J
-
Re: large data and hbase
Rita 2011-07-13, 10:29
Thanks.
If you mean asking the MapReduce list, they will naturally recommend it :)
I suppose I will look into it eventually but we invested a lot of time into Torque.
-
Re: large data and hbase
Harsh J 2011-07-13, 12:26
I meant asking the [EMAIL PROTECTED] list; it's pretty active, just as these are :)
-- Harsh J