HBase >> mail # user >> Help in designing row key


Flavio Pompermaier 2013-07-02, 16:13
Ted Yu 2013-07-02, 16:25
Flavio Pompermaier 2013-07-02, 17:35
Re: Help in designing row key
For #1, yes - the client receives less data after filtering.

For #2, please take a look at TestMultiVersions
(./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
for time range:

    Scan scan = new Scan();
    // Return only cells whose timestamp falls in [1000, Long.MAX_VALUE)
    scan.setTimeRange(1000L, Long.MAX_VALUE);

For row key selection, you need a filter. Take a look at FuzzyRowFilter.java
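
For the specific case in the question (all keys starting with 'someid-'), a plain start/stop row range is another option next to a filter. The sketch below only computes the two range bounds in plain Java; the class and method names are hypothetical and not part of HBase. The assumption is that the resulting arrays would then be passed to the Scan(byte[] startRow, byte[] stopRow) constructor and combined with setTimeRange as above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not an HBase class.
class PrefixRange {
    /**
     * Returns the smallest key that sorts after every key starting with
     * the given prefix. An empty array means "no upper bound", which is
     * how an empty stop row is interpreted by a scan.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "someid-".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // '-' (0x2D) becomes '.' (0x2E), so this prints "someid."
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

The start row is inclusive and the stop row exclusive, so the range [someid-, someid.) covers exactly the keys with that prefix.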

Cheers

On Tue, Jul 2, 2013 at 10:35 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:

> Thanks for the reply! I have two more questions:
>
> 1) Is it true that filtering on timestamps doesn't affect performance?
> 2) Could you send me a small snippet showing how you would do such a filter
> (by row key + timestamp)? For example, get all rows whose key starts with
> 'someid-' and whose timestamp is greater than some given timestamp?
>
> Best,
> Flavio
>
>
> On Tue, Jul 2, 2013 at 6:25 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. Using timestamp in row-keys is discouraged
> >
> > The above is true.
> > Prefixing the row key with a timestamp would create a hot region, because
> > monotonically increasing keys send all new writes to the same region.
> >
> > bq. should I filter by a simpler row-key plus a filter on timestamp?
> >
> > You can do the above.
> >
> > On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier <[EMAIL PROTECTED]> wrote:
> >
> > > Hi everybody,
> > >
> > > In my use case I have to perform batch analysis, skipping old data.
> > > For example, I want to process all rows created after a certain
> > > timestamp, passed as a parameter.
> > >
> > > What is the most effective way to do this?
> > > Should I design my row key to embed the timestamp?
> > > Or is just filtering by the timestamp of the row fast as well? Or
> > > something else?
> > >
> > > Initially I was thinking of composing my key as:
> > > timestamp|source|title|type
> > >
> > > but:
> > >
> > > 1) Using timestamp in row-keys is discouraged
> > > 2) If this design is OK, I still have problems filtering by
> > > timestamp, because I cannot find a way to filter numerically
> > > (instead of alphanumerically/by string). Example:
> > > 1372776400441|something has a timestamp less than
> > > 1372778470913|somethingelse, but I cannot filter all rows whose key
> > > is "numerically" greater than 1372776400441. Is it possible to
> > > overcome this issue?
> > > 3) If this design is not OK, should I filter by a simpler row-key
> > > plus a filter on timestamp? Or something else?
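
On the numeric-versus-string ordering raised in point 2: row keys are compared byte by byte, so a number only sorts numerically if it is stored at a fixed width. A minimal sketch in plain Java, assuming non-negative timestamps; the class and method names are hypothetical and not part of HBase. As far as I know, Bytes.toBytes(long) produces the same 8-byte big-endian layout. This does not remove the hot-region concern about leading a key with a timestamp.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not an HBase class.
class TimestampKey {
    /** Encodes a non-negative timestamp as 8 big-endian bytes. */
    static byte[] encode(long ts) {
        if (ts < 0) {
            throw new IllegalArgumentException("negative timestamp");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(ts).array();
    }

    /** Unsigned lexicographic comparison, the order used for row keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As decimal strings of different length, "9" sorts after "10".
        System.out.println("9".compareTo("10") > 0);                    // true
        // As fixed-width bytes, 9 sorts before 10.
        System.out.println(compareBytes(encode(9L), encode(10L)) < 0);  // true
        // The two timestamps from the example keep their numeric order.
        System.out.println(compareBytes(encode(1372776400441L),
                                        encode(1372778470913L)) < 0);   // true
    }
}
```

Zero-padding the decimal string to a fixed number of digits would give the same ordering while keeping the key human-readable.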
> > >
> > > Best,
> > > Flavio
> > >
> >
Flavio Pompermaier 2013-07-03, 08:05
Mike Axiak 2013-07-03, 08:12
Flavio Pompermaier 2013-07-03, 09:14
Anoop John 2013-07-03, 09:58
James Taylor 2013-07-03, 10:33
Flavio Pompermaier 2013-07-03, 11:25
James Taylor 2013-07-03, 11:42
Flavio Pompermaier 2013-07-03, 10:20
Ted Yu 2013-07-03, 11:35
Asaf Mesika 2013-07-03, 21:23
Flavio Pompermaier 2013-07-04, 09:46