Hi Mandy,
From what I recall, we discussed some scenarios in which we felt tag
propagation would be useful. I think the use cases we have in mind are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any
important use cases where BOTH would be useful for a relationship, so we
were thinking of removing that option. Do you have some relationships in
the open types that require BOTH? It would be useful for me to understand
why those relationships need BOTH.
Many thanks, David.
From: Mandy Chessell/UK/IBM
To: [EMAIL PROTECTED]
Cc: David Radley <[EMAIL PROTECTED]>, atlas
<[EMAIL PROTECTED]>, Sarath Subramanian <[EMAIL PROTECTED]>
Date: 14/01/2018 13:25
Subject: Re: Tag propagation
Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both
directions. Most metadata relationships are not hierarchical. They are
two-way, and different situations will cause different classifications
to flow in each direction. I do not remember the discussion on removing
the BOTH option, but if I missed it I apologise. What is the
justification?
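To make the direction question concrete, here is a minimal sketch of how a per-relationship setting could decide which end receives a tag. It is an illustration only: the thread names BOTH explicitly, while the other enum values (NONE, ONE_TO_TWO, TWO_TO_ONE) and the helper propagation_targets are assumptions made for this example.

```python
from enum import Enum


class PropagateTags(Enum):
    # Only BOTH is named in the thread; the other values are assumed here.
    NONE = "NONE"
    ONE_TO_TWO = "ONE_TO_TWO"
    TWO_TO_ONE = "TWO_TO_ONE"
    BOTH = "BOTH"


def propagation_targets(setting, end1, end2, tagged_end):
    """Return the relationship ends that receive a tag placed on tagged_end."""
    targets = []
    if tagged_end == end1 and setting in (PropagateTags.ONE_TO_TWO, PropagateTags.BOTH):
        targets.append(end2)
    if tagged_end == end2 and setting in (PropagateTags.TWO_TO_ONE, PropagateTags.BOTH):
        targets.append(end1)
    return targets
```

With BOTH, a tag placed on either end reaches the other end; with a one-way setting, a tag placed on the receiving end goes nowhere.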
The enforcement of the classification's entity types should not prevent
the propagation of the tag through an entity just because that entity
does not support the tag. Downstream entities may support the tag and need
it to be propagated to them. We need to work through more scenarios because
we also need a way to bound tag propagation :)
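The pass-through behaviour described above could be sketched roughly as follows. This is not how Atlas implements it; the function name propagate, the dictionary-based graph, and the max_hops bound are all hypothetical, chosen only to show a tag skipping an unsupported entity while still reaching supported entities further downstream.

```python
from collections import deque


def propagate(graph, entity_types, start, applicable_types, max_hops=None):
    """Breadth-first propagation of one classification from `start`.

    graph:            dict mapping an entity to its downstream entities
    entity_types:     dict mapping an entity to its type name
    applicable_types: type names the classification may be attached to
    max_hops:         hypothetical bound on how far the tag may travel
    Returns the set of downstream entities that receive the tag.
    """
    tagged = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # bound reached: do not walk past this entity
        for nxt in graph.get(entity, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if entity_types[nxt] in applicable_types:
                tagged.add(nxt)
            # Keep walking even when nxt itself cannot carry the tag.
            queue.append((nxt, hops + 1))
    return tagged
```

For a chain table -> process -> column where only the Table and Column types support the tag, the column is tagged even though the process in between is not. Setting max_hops=1 stops the walk at the process, so nothing downstream is tagged.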
As an FYI, the OMRS API for classifications includes an origin attribute
that lets us indicate whether each classification returned with an entity
was explicitly assigned or propagated to that entity. Most callers will not
care, but some might.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: [EMAIL PROTECTED]
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - [EMAIL PROTECTED]
From: Madhan Neethiraj <[EMAIL PROTECTED]>
To: David Radley <[EMAIL PROTECTED]>, Sarath Subramanian
<[EMAIL PROTECTED]>
Cc: atlas <[EMAIL PROTECTED]>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation
David,
Sarath was working on tag propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming the tag propagation work next
week; this feature would be part of the Atlas-1.0.0 release.
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
Agree.
- should honour the classification entityTypes - so that we do not get
classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a down-stream entity.
- There is the question about how the propagated classifications would
look in the get entity rest API - I suggest that they appear in the
entities classification with a field indicating that they are derived (and
hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity on which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
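A minimal sketch of this idea, assuming a simplified stand-in for AtlasClassification: the Classification dataclass and both helper functions below are hypothetical and only model the proposed sourceEntityGuid field, where a null value means the classification is attached directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    # Simplified stand-in for AtlasClassification (illustration only).
    type_name: str
    # GUID of the entity the classification was originally attached to;
    # None means it is attached directly to the current entity.
    source_entity_guid: Optional[str] = None


def is_propagated(classification):
    """A classification is propagated when it names a source entity."""
    return classification.source_entity_guid is not None


def removable_by_entity_update(classifications):
    """Names of classifications an entity update may remove (direct ones only)."""
    return [c.type_name for c in classifications if not is_propagated(c)]
```

Callers that do not care about the origin can keep iterating over the classification list unchanged, while callers that do care can check the one extra field.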
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassifications property in the entity, which would contain the
list of derived classifications. Or we could store them as a new type of
edge, "propagated classification" edges, to the real classification. I like
the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it will be performant if each entity vertex has data about the propagated
classifications as well, similar to entities having data on
classifications directly associated with the entity currently. However,
all the entities should directly reference a single instance of a
classification, so that it will be easier to manage changes to
classification attribute values. Sarath will send an update on the design
choices later next week.
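Pending that design update, one way to picture the two requirements above is an in-memory sketch: each entity carries references to its classifications (direct or propagated) so lookups by name stay cheap, and every reference points at one shared classification instance. The ClassificationStore class and its method names are hypothetical and say nothing about how the graph store would actually be laid out.

```python
class ClassificationStore:
    """Toy model: one shared instance per classification, referenced by entities."""

    def __init__(self):
        self.instances = {}    # classification id -> {"name", "attributes"}
        self.entity_refs = {}  # entity guid -> set of classification ids
        self.by_name = {}      # classification name -> set of entity guids

    def define(self, cid, name, **attributes):
        self.instances[cid] = {"name": name, "attributes": dict(attributes)}

    def attach(self, guid, cid):
        """Record a direct or propagated association on the entity."""
        self.entity_refs.setdefault(guid, set()).add(cid)
        name = self.instances[cid]["name"]
        self.by_name.setdefault(name, set()).add(guid)

    def entities_classified_as(self, name):
        """Answer 'which entities are classified as X' from the index."""
        return sorted(self.by_name.get(name, set()))

    def attributes_for(self, guid, cid):
        """Attributes seen through an entity's reference, or None if absent."""
        if cid not in self.entity_refs.get(guid, set()):
            return None
        return self.instances[cid]["attributes"]
```

Because both entities reference the same instance, changing an attribute value once is visible from every entity that carries the classification.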
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
From: David Radley <[EMAIL PROTECTED]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[EMAIL PROTECTED]>
Cc: atlas <[EMAIL PROTECTED]>
Subject: Tag propagation
Hi Madhan,
I had a look at the code - I was surprised that the tag propagation was
not in. Is this something you are looking at in the near future? If not, I
may need to look into it. I suggest that phase 1 of the tag propagation
implementation should:
- lose BOTH - this is still in the code - I think we