We do most of our development on AWS. Our data lives on S3 at varying levels of security: prod, integration, development, etc.

When working with data, we often have to run ad-hoc analysis. We may not know what the final directory structure will be when we start. We may also run jobs on large datasets that fail at an intermediate step, so we have to resume halfway through.

However, there is nowhere that we can delete files or folders. We can only overwrite the existing data (we have replace, not delete) or write each run to a new location, something like

s3://some/analysis/<date1>, s3://some/analysis/<date2>, s3://some/analysis/<date3>
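To make the new-location-per-run workaround concrete, here is a minimal sketch of a helper that builds a unique output prefix for each attempt. The function name `run_prefix` and the `attempt-NN` subfolder are assumptions for illustration; only the date-based layout comes from the example above.

```python
from datetime import date


def run_prefix(base: str, run_date: date, attempt: int) -> str:
    """Build a unique output prefix for one attempt of an analysis run.

    The attempt suffix is a hypothetical convention, not something S3 requires.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Normalise the base so we never emit a double slash.
    return f"{base.rstrip('/')}/{run_date.isoformat()}/attempt-{attempt:02d}/"


# Second attempt of a run dated 2020-01-15 (an arbitrary example date).
print(run_prefix("s3://some/analysis", date(2020, 1, 15), 2))
```

Because every attempt gets its own prefix, nothing ever has to be deleted or overwritten, at the cost of more directories to keep track of.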

While this does technically satisfy the requirement of letting us do our work, it makes the work more challenging, especially in the ad-hoc situation.

For example, I may run something that worked on 1 week of data locally but has memory issues on 1 year's worth of data, so it errors 1/10 of the way through. Now I have to start another run in another directory, because the software we are using doesn't allow replace (for it, a replace is a delete followed by a write). It can error at any step along the way, and I may need to change something halfway through, so by the end my data is scattered across many directories.

Yes, I did my work, but the process is error-prone: I have to track the versions of my analysis and put everything together at the end.
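The bookkeeping this implies can be sketched roughly as follows: keep a chronological record of which step wrote to which directory and whether it succeeded, then pick the last successful output per step when assembling the final result. The `StepResult` type, the step names, and the history below are all hypothetical; the paths reuse the placeholders from the example above.

```python
from typing import Dict, Iterable, NamedTuple


class StepResult(NamedTuple):
    step: str    # logical step name (hypothetical, e.g. "aggregate")
    prefix: str  # directory that this attempt wrote to
    ok: bool     # whether the attempt finished successfully


def latest_good_outputs(results: Iterable[StepResult]) -> Dict[str, str]:
    """Return, for each step, the prefix of its last successful attempt.

    Assumes `results` is in chronological order.
    """
    chosen: Dict[str, str] = {}
    for result in results:
        if result.ok:
            # A later success replaces an earlier one for the same step.
            chosen[result.step] = result.prefix
    return chosen


history = [
    StepResult("load", "s3://some/analysis/<date1>", True),
    StepResult("aggregate", "s3://some/analysis/<date1>", False),  # ran out of memory
    StepResult("aggregate", "s3://some/analysis/<date2>", True),
    StepResult("report", "s3://some/analysis/<date3>", True),
]

for step, prefix in latest_good_outputs(history).items():
    print(step, prefix)
```

This does not remove the scattering, it only makes the final assembly less dependent on memory and manual notes.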

I've never been in an environment where I couldn't delete any files, even in my own personal work directory. (We have no audit log requirement for ad-hoc analysis, or for anything else.) The security admin says this is standard practice and that we should look at best practices for security.

It sounds like he is strong-arming the least-privilege principle because, as I said, I can technically do my work, just in a very error-prone, inefficient, against-the-grain kind of way.

Obviously, security and convenience are often at odds, but my question is: assuming there is no outright requirement to save intermediate datasets from analysis, is it standard practice to not allow deleting files anywhere?

Edit: I work for an advertising company. None of the data I'm referring to here has PII. In addition, the data in these folders is deleted 30 days after creation.
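How the 30-day deletion is implemented is not stated here. If it were done with an S3 lifecycle rule, which is an assumption, the configuration might look roughly like the sketch below. The helper name `expiration_rule` and the `analysis/` key prefix are hypothetical.

```python
def expiration_rule(prefix: str, days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    The returned dict has the shape boto3 expects for the
    `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},   # key prefix inside the bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},   # days since object creation
            }
        ]
    }


# Hypothetical prefix; the real bucket layout is not given in the question.
config = expiration_rule("analysis/")
print(config)
```

A rule like this removes data automatically without granting any user delete permission, which may be how both constraints coexist in this setup.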

jyo kam

1 Answer

Whether or not this is standard is going to depend on your specific industry.

Not knowing anything about yours or the nature of your projects, I can say this is standard practice for data under legal hold, where a company anticipates litigation and doesn't want to be accused of destroying evidence, or where it wants to be able to demonstrate prior art in the event of patent litigation.

Ivan