I currently have a couple of scripts and Android apps that together do the following for a set of member devices (smartphones, PCs, digital cameras):
- all pictures taken by all member devices are automatically synced with Dropbox
- for smartphones, this is done using the Dropsync app
- for digital cameras, a script is run as soon as the camera connects via USB
- once a week, all pictures from all members synced this way are moved to a different directory on Dropbox, for long-term storage.
- After the move, a deduplication takes place: this archive directory (let's call it Dropbox/PicsArchive/) is scanned, and all duplicates are detected and removed. Currently, I use fdupes to detect the duplicates, but to my knowledge, this only detects exact duplicates, i.e., files that have identical checksums.
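To make the limitation concrete, here is a minimal sketch of checksum-based duplicate detection. It is not how fdupes is implemented; the function names `file_digest` and `find_exact_duplicates` and the choice of SHA-256 are assumptions made for illustration. Files end up in the same group only when their bytes are identical, so a resized or recompressed copy of a picture never matches its original.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content digest.

    Returns a list of groups (sorted lists of paths); only groups with
    more than one file are returned.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("Dropbox/PicsArchive")` would report byte-identical files only; changing a single byte in a file gives it a different digest.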
The problem
There is however nontrivial linkage between all the members.
For example, when connecting a specific kind of digital camera to the USB of the PC running these scripts, the pictures on its memory card are moved to Dropbox, and downsized copies are generated and then sent to a subset of the smartphone members (using the brilliant Autoremote app). These resized copies can very easily end up in a location on the smartphone that is also being synced by Dropsync. Therefore, the camera's pictures, as well as these resized copies, are then both eventually synced into Dropbox/PicsArchive/.
Another example is taking a picture with the smartphone's camera (high resolution) and sharing it with a WhatsApp contact; often, WhatsApp reduces the resolution of that image. But I want both locations synced (the camera and the relevant WhatsApp media directory), meaning Dropsync will sync two pictures (one with high and the other with lower resolution) to Dropbox, and both will eventually end up in Dropbox/PicsArchive/.
Obviously, I wish to keep only the highest resolution/quality images. Perhaps what is needed here is a better backup strategy, not a more generic tool to clean up a mess that is preventable in the first place.
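To illustrate what "keep only the highest resolution" could mean in practice, here is a rough sketch of one common idea: a difference hash (dHash) that stays the same or nearly the same when a picture is downsized, combined with a rule that keeps the largest image in each group of near-matches. Everything here is an assumption for illustration: the function names, the Hamming-distance threshold of 5, and the input format (each image is already decoded into a 2D grid of grayscale values, which an imaging library would have to supply). It is not taken from any of the tools mentioned on this page.

```python
def shrink(pixels, width, height):
    """Box-average a 2D grayscale grid (list of rows) down to width x height."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, size=8):
    """Difference hash: one bit per horizontally adjacent pair of cells
    in a (size + 1) x size thumbnail, packed into an integer."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def keep_largest(images, max_distance=5):
    """images: dict mapping a name to its 2D grayscale grid.

    Greedily compares each image against the ones kept so far; when the
    hashes are within max_distance bits, only the image with more pixels
    is kept. Returns the sorted names of the images to keep.
    """
    kept = []  # entries are (hash, name, pixel_count)
    for name, pixels in images.items():
        h = dhash(pixels)
        area = len(pixels) * len(pixels[0])
        for i, (kept_hash, _kept_name, kept_area) in enumerate(kept):
            if hamming(h, kept_hash) <= max_distance:
                if area > kept_area:
                    kept[i] = (h, name, area)
                break
        else:
            kept.append((h, name, area))
    return sorted(name for _h, name, _area in kept)
```

The greedy comparison is quadratic in the number of pictures and the threshold would need tuning on real photos, so this is only meant to show the shape of the approach, not a finished deduplicator.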
Here are a couple of crude pictures of the current setup. Here's the original use case:
I then implemented linkage, to encourage the users to take higher-quality pictures with the digital camera, while still being able to easily send those pics to WhatsApp users:
Note also that the path Phone camera → WhatsApp creates a duplicate on the phone (both the phone camera directory and the WhatsApp media directory are synced, which is of course necessary to allow pictures not meant for WhatsApp to be synced as well).
So, how can I deduplicate all these pictures?
Let us continue this discussion in chat. – Rody Oldenhuis – 2015-02-10T18:06:51.320
Feel free to re-ask this on [softwarerecs.SE]. As it stands, this is just a request to list software that does XYZ. A valid alternative would be to just include whatever you've tried and describe the actual problem you're trying to solve. You'll find lots of people who are up for hacking something together (even based on something you started), but I have to agree with Jake here that this looks like a wishlist for a magic program that may or may not exist. Those kinds of questions are not encouraged here (and on most SE sites). – slhck – 2015-02-10T18:32:10.053
@slhck OK, how about this? – Rody Oldenhuis – 2015-02-10T19:03:13.913
I tried to remove the (now irrelevant) part where you're asking for a tool. For me the question is fine that way, you may however 1) try to further reduce it to the bare essentials needed to provide an answer and 2) notify those who already answered that their answers are now no longer valid. Generally it's not so nice to change a question that radically, but given that the answers you have now don't look like a solution to your original question either (and aren't upvoted), I'd let the rewrite pass here. – slhck – 2015-02-10T19:18:08.090
@slhck is it perhaps better to roll it back to what it was and just ask a new question? – Rody Oldenhuis – 2015-02-10T20:01:11.717
Like I said, it's not necessary in that case, in my opinion. – slhck – 2015-02-10T20:07:13.177
Says this deduplicator runs on Linux: http://www.hardcoded.net/dupeguru_pe/ – Sun – 2015-02-11T20:11:09.147
@sunk818: the question is tagged linux... and in the original question I did mention a strong preference for a linux solution. – Rody Oldenhuis – 2015-02-12T05:47:42.877
@sunk818 dupeguru however sounds nice; that definitely did not turn up on my Google searches. How'd you find it? Do you have any experience with it? – Rody Oldenhuis – 2015-02-12T05:50:40.847
@sunk818 Sadly does not seem to have a command line interface... – Rody Oldenhuis – 2015-02-12T05:59:50.670
And this: http://www.jhnc.org/findimagedupes/manpage.html – Sun – 2015-02-12T16:41:35.110