How can I clean up duplicates in a Zotfile-managed directory?


I want to keep a local copy of every PDF file linked to Zotero, so I use Zotfile to move newly downloaded PDFs and link them to entries in Zotero. Recently, I moved Zotfile's storage location, and this created two kinds of duplicates.

With 1000+ entries in my Zotero library, I ended up with 2000+ PDFs in the Zotfile-managed storage folder, even though most entries should have only one PDF. Investigating, I observed the following:

  • For 516 of the entries, I find a pair of files like this:
    de_Paula_et_al_Identifying_Preferences_in_Networks_with_Bounded_Degree2.pdf
    de_Paula_et_al_Identifying_Preferences_in_Networks_with_Bounded_Degree.pdf
    
    Upon checking, the Zotero entry is linked to ..._Degree2.pdf. To clean things up, I need to:
    1. In the storage folder, delete ..._Degree.pdf, and
    2. In Zotero, run Rename Attachments from the Manage Attachments context menu.
  • To make things worse, some of these 516 entries genuinely have two distinct file attachments.

    Question 1: can I identify entries in Zotero that have more than one attachment?

    Question 2: after bulk-deleting the duplicates, can I rename attachments in Zotero only for entries that have exactly one attachment?

  • Lastly, there is a simpler case where spaces were not replaced by _, like:
    Abeler et al_2011_Reference Points and Effort Provision.pdf
    Abeler_et_al_2011_Reference_Points_and_Effort_Provision.pdf
    
    Here, the second file is the one properly attached in Zotero, so the cleanup is simpler: delete the files whose names contain spaces anywhere other than within the string "et al".
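
Before deleting anything, the two patterns above could be listed for review. The following is a minimal sketch in Python, not a Zotfile feature: the function names are made up for illustration, and it only reports candidates. It assumes that a numbered duplicate differs from its twin by exactly one trailing digit, and it uses a stricter rule than the one stated above for the second case: a name with spaces is flagged only when its underscore-only twin exists in the same folder.

```python
import re
from pathlib import Path


def find_candidates(names):
    """Classify PDF file names into the two duplicate patterns.

    Returns (numbered_pairs, spaced_pairs). Each pair is
    (file_to_review_for_deletion, counterpart_that_stays).
    """
    pdfs = {n for n in names if Path(n).suffix.lower() == ".pdf"}
    numbered_pairs, spaced_pairs = [], []
    for name in sorted(pdfs):
        stem, suffix = Path(name).stem, Path(name).suffix

        # Pattern 1: "X2.pdf" sits next to "X.pdf". Requiring a non-digit
        # before the final digit keeps trailing years such as "_2011"
        # from being mistaken for a counter.
        m = re.fullmatch(r"(.*\D)(\d)", stem)
        if m and m.group(1) + suffix in pdfs:
            numbered_pairs.append((m.group(1) + suffix, name))

        # Pattern 2: same name, but with spaces instead of underscores.
        normalized = re.sub(r"\s+", "_", stem)
        if normalized != stem and normalized + suffix in pdfs:
            spaced_pairs.append((name, normalized + suffix))
    return numbered_pairs, spaced_pairs


def scan_folder(folder):
    """Run find_candidates over the files directly inside `folder`."""
    return find_candidates(p.name for p in Path(folder).iterdir() if p.is_file())
```

Calling `scan_folder` on the storage folder returns two lists to inspect by hand. Entries that really have two distinct attachments can also show up in the first list, so it should be checked against Zotero before anything is removed.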

I wonder whether any of this can be automated. In particular, I am looking for tools that can automate some of the Zotero operations, such as:

  1. Selecting a set of entries at the same time. The simple search box can only find one thing at a time. (Well, if the query text is not exact, multiple entries are returned.)
  2. Identifying entries in Zotero that have more than one attachment.
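
For the second item, one possible route is to query a copy of Zotero's SQLite database directly. The sketch below is an assumption-laden illustration, not an official Zotero interface: it assumes the database has an `itemAttachments` table with `itemID`, `parentItemID`, and `contentType` columns and a `deletedItems` table for items in the trash, which should be verified against the actual file. The function name is made up for illustration.

```python
import sqlite3

# Assumed schema: itemAttachments(itemID, parentItemID, contentType, ...)
# and deletedItems(itemID). Check both against your own database first.
QUERY = """
SELECT ia.parentItemID, COUNT(*) AS n
FROM itemAttachments AS ia
WHERE ia.parentItemID IS NOT NULL
  AND ia.contentType = 'application/pdf'
  AND ia.itemID NOT IN (SELECT itemID FROM deletedItems)
GROUP BY ia.parentItemID
HAVING COUNT(*) > 1
ORDER BY ia.parentItemID
"""


def parents_with_multiple_attachments(conn):
    """Return (parentItemID, pdf_count) rows for entries with 2+ PDFs."""
    return conn.execute(QUERY).fetchall()


# Usage (hypothetical path; run against a COPY made while Zotero is closed):
#   conn = sqlite3.connect("zotero-copy.sqlite")
#   for parent_id, count in parents_with_multiple_attachments(conn):
#       print(parent_id, count)
```

The query only reads data. Working on a copy avoids interfering with the database while Zotero has it open.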

Also, on the Zotfile side, bulk-renaming all entries in Zotero causes trouble: for Zotero entries that have more than one attachment, I tend to add suffixes such as _WP, _APPENDIX, or _Supp. Bulk-renaming would wipe out all of these individual markers.
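
To see which files carry such markers before any bulk rename, a small check over the file names might help. This is only a sketch: the marker list holds just the three examples named above and would need extending, the helper name is hypothetical, and it assumes the marker sits at the very end of the file name, optionally followed by one digit.

```python
import re
from pathlib import Path

# Only the markers mentioned above; extend as needed.
MARKERS = ("WP", "APPENDIX", "Supp")


def marker_of(name, markers=MARKERS):
    """Return the trailing marker found in a file name, or None.

    Matching is case-insensitive; the marker is returned as it
    appears in the file name.
    """
    stem = Path(name).stem
    pattern = r"_(%s)\d?$" % "|".join(re.escape(m) for m in markers)
    match = re.search(pattern, stem, flags=re.IGNORECASE)
    return match.group(1) if match else None
```

Files for which `marker_of` returns a value could then be left out of the bulk rename and handled by hand.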


How I got myself into this trouble: over the years, I have kept all my Zotero-linked PDFs in a dedicated folder synced by Nutstore. Recently, I had trouble syncing with their clients and decided to spend money on Dropbox instead.

llinfeng

Posted 2019-07-18T15:19:31.950

No answers