[WIP] refactor access to dataset through events #1918
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Related to #1894 , #1907
Context PR:
Since we created Events app, We now want to provide temporary access to the participants to the datasets that have been linked to event. We havenot implemented this access yet, because we wanted to first figure out the right way to do it.
In this PR, i started refactoring the
has_accessmethod fromPublishedProjectManagerand added a similar methodhas_accesson theEventDataset.Here is the explanation of the the current methods/functions
PublishedProjects
on physionet, access to
PublishedProjectsis determined byhas_accessmethod on thePublishedProjectsmodel. thishas_accessmethod is used byviewsonprojectapp which is used check if the current user should have access to this project.on healthdatanexus, we use
PublishedProjects.objects.accessible_byto load projects, that the current user should have access to.Events
On physionet, we haven't implemented logic that lets participants of event access a dataset temporarily.
On healthdatanexus, the same
PublishedProjects.objects.accessible_byis responsible to provide temporary access to the participants of eventsOn Event app, we also have
EventDataset.has_accessmethod which is identical to thePublishedProjects.has_accessmethod in the sense that this method can be used to determine if the current user should have access to this particular event dataset. This method is used in template when a user visits an Event page to determine if they should have access to the Datasets linked to Event.( this only does checks in context of event)This is my thoughts
EventDataset.has_accessbe as it is as if we were to remove this, we would still need some kind of similar logic to use on the Event detail page to determine if this particular event dataset should be accessible by the current user.This methods does checks such as if event is active, event dataset is active, if the current user is a participant or a host, and if the current user has signed the Event Agreement.
For controlling temporary access through event and Normal access
2. The
EventDatasetmodel has two access mechanism a) Google BigQuery(GBQ) b)Research Environment(RE) for physionet and healthdatanexus respectively. Based on this when a user has access to the project/dataset from Event, they should not be able to access the project files/content like usual.For example in case of physionet, they should not see other access mechanism like wget, be able to see files and navigate folders, etc.
Which means we need to be able to differentiate if the user's access is temporary through
Eventor its a regular access. So we cant simply edit thePublishedProject.has_accessmethod and add bunch of logic for event.To my point/suggestions, maybe we could have refactor the access in the following way
a.
PublishedProject.has_access_basic- we add a new method that checks for common stuff like if download is paused/depreciated, or on embargob.
PublishedProject.has_access- uses thePublishedProject.has_access_basicand does the regular checks like it does now,c.
PublishedProject.has_access_temporary- uses thePublishedProject.has_access_basicand then check if the access is temporary(currently through Event)on project view, we use both
has_accessandhas_access_temporaryand use these to determine what the users sees or how they access data on project template.For example: If the user visits the project though a link like this http://localhost:8000/content/demowave/1.0.0, the current logic(from the
published_project.html) executes and normal access check is done.and if the user visits the project like http://localhost:8000/content/demowave/1.0.0?temporary_access=event-slug
then the
has_access_temporaryis used in the template to check if the user has access to the project and display/give them access to the dataset accordingly.Now for healthdatanexus, we use the
PublishedProjects.objects.accessible_byto check the access, so it already should be good. for HDN, we dont allow file download or direct data access, so its a bit simpler(although in future, we still need to be able to automatically close the research environment once the event is over, which i think we could do by adding logic on thehdn-research-environmentapp.[WIP]
Finally the last thing remaining is to implement/enforce the access type GBQ and RE.(for example for physionet, if the dataset was linked to event as RE, the physionet participants should not be able to access the project/dataset)
I dont think it makes sense to hardcode the logic as in future physionet/ or other org might want to implement the Research Environment. Here maybe we can add a .env variable
RESEARCH_ENVIRONMENT_ENABLEDand use that.So it would be like
by default
RESEARCH_ENVIRONMENT_ENABLED=falseand this will be used with thePublishedProject.has_access_temporaryto implement the event related access.(which solves the problem for non RE access)For RE access, maybe we can have a additional parameter on
PublishedProjects.objects.accessible_bylikefilter_research_environmentand use that to get datasets linked to event which haveaccess_type=Research EnvironmentI apologize in advance if the above doesn't sound like a good idea or even worse(doesnot make sense at all :D)