HBase >> mail # user >> Help in designing row key


Flavio Pompermaier 2013-07-02, 16:13
Ted Yu 2013-07-02, 16:25
Flavio Pompermaier 2013-07-02, 17:35
Re: Help in designing row key
For #1, yes - the client receives less data after filtering.

For #2, please take a look at TestMultiVersions
(./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
for time range:

    Scan scan = new Scan();
    // Return only cells whose timestamp falls in [1000, Long.MAX_VALUE)
    scan.setTimeRange(1000L, Long.MAX_VALUE);

For row key selection, you need a filter. Take a look at FuzzyRowFilter.java.
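
A rough illustration of what "row key prefix plus time range" selects, using only the JDK so it runs without a cluster. The names below (PrefixTimeRangeSketch, Row, stopRowForPrefix, select) are hypothetical and not part of HBase, and the in-memory list merely stands in for a table. The stop row is the prefix with its last byte incremented, which is one common way to bound a prefix range; the time check mirrors the half-open [min, max) range used by setTimeRange.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** JDK-only model of "key prefix + minimum timestamp" selection (hypothetical names). */
final class PrefixTimeRangeSketch {

    /** A stored row reduced to its key and one cell timestamp (illustrative only). */
    static final class Row {
        final String key;
        final long timestamp;
        Row(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
    }

    /**
     * Smallest byte string that sorts after every key starting with prefix:
     * drop trailing 0xFF bytes, then add one to the last remaining byte.
     * An empty result means "no upper bound".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        if (end > 0) {
            stop[end - 1]++;
        }
        return stop;
    }

    /** Unsigned, byte-wise lexicographic comparison (the order row keys are sorted in). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Keys in [prefix, stopRow) whose timestamp lies in [minStamp, maxStamp). */
    static List<String> select(List<Row> rows, String prefix, long minStamp, long maxStamp) {
        byte[] start = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            byte[] key = r.key.getBytes(StandardCharsets.UTF_8);
            boolean inKeyRange = compareBytes(key, start) >= 0
                    && (stop.length == 0 || compareBytes(key, stop) < 0);
            boolean inTimeRange = r.timestamp >= minStamp && r.timestamp < maxStamp;
            if (inKeyRange && inTimeRange) {
                out.add(r.key);
            }
        }
        return out;
    }
}
```

With a real table, the same two bounds would presumably be passed as the start and stop rows of the Scan, together with the setTimeRange call shown earlier. For a plain prefix such as 'someid-' that may be enough without FuzzyRowFilter, though that is an assumption about the use case rather than something stated in the thread.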

Cheers

On Tue, Jul 2, 2013 at 10:35 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:

> Thanks for the reply! I have two more questions:
>
> 1) Is it true that filtering on timestamps doesn't affect performance?
> 2) Could you send me a short snippet showing how you would do such a filter
> (by row key + timestamp)? For example, get all rows whose key starts with
> 'someid-' and whose timestamp is greater than some given timestamp?
>
> Best,
> Flavio
>
>
> On Tue, Jul 2, 2013 at 6:25 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. Using timestamp in row-keys is discouraged
> >
> > The above is true.
> > Prefixing the row key with a timestamp would create a hot region, since
> > monotonically increasing keys send all new writes to the same region.
> >
> > bq. should I filter by a simpler row-key plus a filter on timestamp?
> >
> > You can do the above.
> >
> > On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:
> >
> > > Hi everybody,
> > >
> > > In my use case I have to perform batch analysis while skipping old data.
> > > For example, I want to process all rows created after a certain timestamp,
> > > passed as a parameter.
> > >
> > > What is the most effective way to do this?
> > > Should I design my row key to embed the timestamp?
> > > Or is just filtering by the timestamp of the row fast as well? Or
> > > something else?
> > >
> > > Initially I was thinking of composing my key as:
> > > timestamp|source|title|type
> > >
> > > but:
> > >
> > > 1) Using timestamp in row-keys is discouraged
> > > 2) If this design is OK, I still have problems filtering by timestamp
> > > with this approach, because I cannot find a way to filter numerically
> > > (instead of alphanumerically/by string). Example:
> > > 1372776400441|something has a timestamp less
> > > than 1372778470913|somethingelse, but I cannot filter all rows whose key
> > > is "numerically" greater than 1372776400441. Is it possible to overcome
> > > this issue?
> > > 3) If this design is not OK, should I filter by a simpler row-key plus a
> > > filter on timestamp? Or something else?
> > >
> > > Best,
> > > Flavio
> > >
> >
>
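
On point 2 of the original question (numeric versus string ordering): byte-wise comparison agrees with numeric comparison once every timestamp is encoded at a fixed width. The two example keys in the question are both 13 digits long, so they already compare correctly as strings; the mismatch appears only when the digit counts differ (for instance "9" sorts after "10"). Below is a small JDK-only sketch with hypothetical names (TimestampKeyOrdering, encode, zeroPad). It assumes non-negative timestamps, and it does nothing about the hot-region concern raised in the thread for keys that lead with a timestamp.

```java
import java.nio.ByteBuffer;

/** Fixed-width timestamp encodings whose byte order matches numeric order (hypothetical names). */
final class TimestampKeyOrdering {

    /** 8-byte big-endian encoding; for non-negative values, byte order equals numeric order. */
    static byte[] encode(long timestamp) {
        return ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
    }

    /** Zero-padded decimal form, 19 digits wide (enough for any non-negative long). */
    static String zeroPad(long timestamp) {
        return String.format("%019d", timestamp);
    }

    /** Unsigned, byte-wise lexicographic comparison (the order row keys are sorted in). */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain decimal strings: "9" sorts after "10", which is numerically wrong.
        System.out.println("string order wrong:   " + ("9".compareTo("10") > 0));
        // Fixed-width forms: 9 sorts before 10 in both encodings.
        System.out.println("binary order correct: " + (compareBytes(encode(9L), encode(10L)) < 0));
        System.out.println("padded order correct: " + (zeroPad(9L).compareTo(zeroPad(10L)) < 0));
    }
}
```

The binary form uses the big-endian layout that HBase's Bytes.toBytes(long) is understood to produce, so a key built that way could be range-scanned numerically; the padded form keeps keys human-readable at the cost of more bytes per key.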
Flavio Pompermaier 2013-07-03, 08:05
Mike Axiak 2013-07-03, 08:12
Flavio Pompermaier 2013-07-03, 09:14
Anoop John 2013-07-03, 09:58
James Taylor 2013-07-03, 10:33
Flavio Pompermaier 2013-07-03, 11:25
James Taylor 2013-07-03, 11:42
Flavio Pompermaier 2013-07-03, 10:20
Ted Yu 2013-07-03, 11:35
Asaf Mesika 2013-07-03, 21:23
Flavio Pompermaier 2013-07-04, 09:46