What's a good solution for file-tagging in linux?

74

40

I've been looking for a way to tag my files and search/filter them based on those tags.

Here are my (updated) requirements :

  • any file readable by the user can be tagged freely
  • a user can search for files matching one or several tags
  • files can be moved around without losing the previously associated tags
  • the system could be backed up easily
  • no dependencies on any desktop environment
  • if any gui is involved, there must be a cli fallback

I've been hoping for some basic filesystem & coreutils hackery to handle this, but I haven't thought about this hard enough yet.
Meanwhile I'll review beagle and metatracker, which have been mentioned here, and see how they perform.


Ok so beagle has huge gnome dependencies, and tracker is okish, but still has some dependencies I don't like...

Been doing some more research, and the way to go could very well be extended file attributes.
That's a native solution for most recent filesystems, but they aren't very well supported yet (most coreutils destroy them by default; cp, for example, needs the -a flag to preserve them). I'd like to hear some thoughts on using them while I try my hand at some hacks myself, even though this might warrant a new question.

julien

Posted 2009-12-10T20:46:15.300

Reputation: 1 276

Question was closed 2017-12-07T13:59:48.207

In PC-BSD Forums, with reference to the 2010 edition of this question: PC-BSD, extended attributes and tagging; OpenMeta and Apple's approach

– Graham Perrin – 2015-08-04T04:12:09.303

1

Unsurprisingly, Reddit has much better and newer answers for this question.

– Dan Dascalescu – 2016-08-05T22:35:58.093

PytagsFS http://superuser.com/a/89140/129520

– n611x007 – 2012-11-21T10:41:52.323

2

Issues with extended file attributes: (i) In my experience, they are a nuisance when you want to back up. (ii) You can't use them when you move between filesystems. Apart from that, they would be the Right Thing. – Charles Stewart – 2010-01-19T09:28:00.383

Answers

13

It's not clear what kind of searching you want. If you want it to work anywhere in Unix, rather than just your home directory, and you only want to do pathname-based searches, the following scheme is workable, with a little bit of shell hackery, and using the standard locatedb:

  1. Each directory that contains at least one tagged file needs a standard subdirectory, say .path-tags;
  2. Each file $FILE in the directory with tag $TAG (which should not contain the character _) has a symlink .path-tags/$TAG_$FILE -> ../$FILE

I leave the details of the locate-tag script to you; it should be a two- or three-liner, using only the locate command and shell hackery. (If you're interested, I could write one).
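A minimal sketch of the scheme above, for illustration only. The function names `tag_file` and `find_tag` are made up here, and `find` is used instead of `locate` only so that the sketch does not depend on the locate database being current; a locate-based version would match the same `.path-tags/TAG_` path pattern against the database instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the .path-tags scheme; the names are illustrative.

# tag_file TAG FILE: create DIR/.path-tags/TAG_NAME -> ../NAME next to FILE.
tag_file() {
    local tag=$1 file=$2 dir base
    case $tag in
        *_*) echo "tag must not contain '_': $tag" >&2; return 1 ;;
    esac
    dir=$(dirname -- "$file")
    base=$(basename -- "$file")
    mkdir -p -- "$dir/.path-tags" &&
        ln -sfn -- "../$base" "$dir/.path-tags/${tag}_$base"
}

# find_tag TAG [ROOT]: print the paths of files tagged TAG below ROOT.
find_tag() {
    local tag=$1 root=${2:-.} link base
    find "$root" -type l -path "*/.path-tags/${tag}_*" -print |
    while IFS= read -r link; do
        # Skip dangling links (the tagged file was moved or deleted).
        [ -e "$link" ] || continue
        base=${link##*/}                      # TAG_NAME
        # Parent of .path-tags plus NAME gives the tagged file's path.
        printf '%s/%s\n' "$(dirname -- "$(dirname -- "$link")")" "${base#"${tag}_"}"
    done
}
```

Usage would look like `tag_file work ~/docs/report.odt` followed by `find_tag work ~`. Because dangling links are skipped, a file moved out of its directory silently drops out of the results, which is the limitation discussed further down.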

Some of the KDE chaps talked about this sort of scheme for metadata, although I don't recall the details.

It should also be possible to do more sophisticated, content-examining tests based on this scheme with a similar script wrapped around find.

Thoughts on updated requirements

  1. any file readable by the user can be tagged freely - Yes, should be no problem
  2. a user can search for files matching one or several tags - Likewise
  3. files can be moved around without losing the previously associated tags - The directories they inhabit can be freely moved about, but if the file is moved from the directory, we are in trouble. If the tags took the form $TAG_$INODE_$FILE and we have an efficient way to find which paths have a given inode, then we can do this, losing tags only if we move out of filesystems. Copying files might make some trouble, and this is clearly more complicated than my original suggestion.
  4. the system could be backed up easily - not essentially difficult.
  5. no dependencies on any desktop environment - none
  6. if any gui is involved, there must be a cli fallback - that's where we live!
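The inode idea in point 3 can be sketched as follows. This assumes GNU `stat` and `find`; the helper names are hypothetical. Note that `find -inum` walks the whole tree, so it is not the "efficient way" the scheme would ideally have, only a way to show the lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; GNU stat is assumed (BSD stat uses a different flag).

# inode_of FILE: print FILE's inode number.
inode_of() {
    stat -c %i -- "$1"
}

# inode_tag_name TAG FILE: build a link name of the form TAG_INODE_NAME.
inode_tag_name() {
    printf '%s_%s_%s\n' "$1" "$(inode_of "$2")" "$(basename -- "$2")"
}

# paths_for_inode INODE ROOT: print every path under ROOT with that inode.
# -xdev stays on one filesystem, since inode numbers are only unique per filesystem.
paths_for_inode() {
    find "$2" -xdev -inum "$1" -print
}
```

A rename within the same filesystem keeps the inode, so `paths_for_inode` can rediscover a moved file; hard links to the same file would all be listed.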

Postscript: The "reverse-inode-lookup" file described by the link (2) you showed me in your answer to (1) can be used to give some additional infrastructure. We can run a service on the reverse lookup file, which checks that each inode given in the filename of a tag matches the inode of the file (if any) the tag points to. If there is no match, then the required surgery can be performed (does the inode still exist? where is it?): the reverse lookup file is either mutated or regenerated, and the tag symlinks are updated.

I anticipate one tricky case: what if the tagged file is not where the tags say it should be, the reverse lookup file says it still exists, but the prodigal file is not where the lookup file says it is, the lookup file being out of date? There are a few ways to handle this case, none obviously ideal. Apart from this, this whole task seems to be the kind of thing Perl is well-suited to...
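The consistency check itself could look roughly like this. It is a sketch under assumptions: tag links are named TAG_INODE_NAME and live in `.path-tags` directories, GNU `stat` is available, and `stale_tags` is a made-up name. It only reports mismatches; the repair step and the reverse lookup file are left out.

```shell
#!/usr/bin/env bash
# stale_tags ROOT: list tag links whose target is missing or no longer has
# the inode recorded in the link name. Output: link, recorded inode, actual inode
# (tab-separated; "missing" when the target does not exist).
stale_tags() {
    local link base rest recorded actual
    find "$1" -type l -path '*/.path-tags/*' -print |
    while IFS= read -r link; do
        base=${link##*/}        # TAG_INODE_NAME
        rest=${base#*_}         # INODE_NAME (tags contain no underscore)
        recorded=${rest%%_*}    # INODE
        # stat -L follows the symlink; failure means the target is gone.
        actual=$(stat -L -c %i -- "$link" 2>/dev/null) || actual=missing
        if [ "$recorded" != "$actual" ]; then
            printf '%s\t%s\t%s\n' "$link" "$recorded" "$actual"
        fi
    done
}
```

Each reported line is a candidate for the "surgery" described above, for example by searching the filesystem for the recorded inode and re-pointing the link.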

Charles Stewart

Posted 2009-12-10T20:46:15.300

Reputation: 2 624

This sounds fun. I have quite a few things on my plate right now and can't tackle this at the moment, but I will be experimenting with this too; let's keep in touch. – julien – 2010-01-21T09:27:01.467

1

I missed the punchline: what software can accommodate this? I was hoping for something I can use casually without writing my own infrastructure. (But plain, so that I can extend it myself when desired.) – ThorSummoner – 2015-08-21T03:43:02.990

@ThorSummoner - There is no freely available software written that uses the scheme I outline: it is a suggestion for how to write one's own tagging system. But I have written some code for tagging for personal use that does use something like this scheme. – Charles Stewart – 2015-08-28T09:02:33.980

1

This is nice, and I've been thinking about using symlinks too. The problem is, a file can't be moved around without losing its tags. Ideally, tags would be path agnostic, and searching for a tag should return the actual file, rather than a dead symlink... PS: I'm all for a shell-based solution, but I think the problem domain would make it pretty painful to maintain through shell scripts alone. I hope someone proves me wrong. – julien – 2010-01-18T14:40:18.447

I've edited my question to (hopefully) make it clearer what kind of solution I'm after. cheers – julien – 2010-01-18T14:54:15.997

Damn, I had never realized that inodes were like persistent guids for files, that's food for thought! – julien – 2010-01-18T17:29:19.853

1

inodes are uids, but they are tied to a given fs, so they are not guids. This is not a bad thing, since copying, backups, archiving, &c, mean that files get duplicated and stored within other files, and you want the fs state to give you enough info to disentangle the results. – Charles Stewart – 2010-01-19T11:32:16.867

25

I've just released an alpha of my new program that attempts to provide this functionality. It currently meets some, but not all, of your requirements. It may be of interest to you anyway. It provides a command-line tool for tagging and a virtual file-system for browsing (where tags are represented by directories).

http://www.tmsu.org/

any file readable by the user can be tagged freely

Yes.

a user can search for files matching one or several tags

Yes. Either via the command-line tool or by browsing the tag directories in the virtual file-system.

files can be moved around without losing the previously associated tags

No. However the application stores fingerprints of the files tagged which are used to help identify moved files. A 'repair' command is provided that will update the paths of moved files. (Obviously this mechanism breaks down if a file is both moved and modified.)
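To illustrate the fingerprint idea in general terms (this is not TMSU's implementation, which the answer does not show): a content hash recorded at tagging time can be used to find a file again after it has been moved. The sketch assumes `sha256sum` as the fingerprint and uses made-up function names.

```shell
#!/usr/bin/env bash
# Generic illustration of fingerprint-based move detection; names are hypothetical.

# fingerprint FILE: content hash used to recognise a file after it moves.
fingerprint() {
    sha256sum < "$1" | cut -d' ' -f1
}

# locate_moved HASH ROOT: print the first regular file under ROOT whose
# content hash equals HASH, or nothing if no file matches.
locate_moved() {
    local want=$1 root=$2 f
    find "$root" -type f -print |
    while IFS= read -r f; do
        if [ "$(fingerprint "$f")" = "$want" ]; then
            printf '%s\n' "$f"
            break
        fi
    done
}
```

Since the hash depends only on content, a file that is both moved and modified no longer matches, which is the breakdown case mentioned in the answer.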

the system could be backed up easily

Yes. It's a simple Sqlite 3 database file.

no dependencies on any desktop environment

Yes. No dependencies and as it can be run as a virtual file-system it is available to peruse as a file-system in any program that supports symbolic links.

if any gui is involved, there must be a cli fallback

No GUI at present.

Paul Ruane

Posted 2009-12-10T20:46:15.300

Reputation: 533

@student TMSU now includes some scripts that perform filesystem operations whilst keeping the database up to date: tmsu-fs-mv, tmsu-fs-rm and tmsu-fs-merge. – Paul Ruane – 2015-05-19T10:52:00.073

Excuse my question, but why not simply clone the tags automatically when a file is moved? Do I need to manually update files when moving? – m3nda – 2015-07-31T14:41:40.600

@erm3nda. Not sure I 100% understand your question. TMSU currently has no filesystem 'watcher' so it is not aware when you move files. There is a 'repair' subcommand to identify file moves after the event or you can use the supplied script which both moves the file in the filesystem and updates the TMSU database. – Paul Ruane – 2015-07-31T15:47:40.590

Can you please compare this program to TagSpaces and Linux Desktop Search? It sounds very promising if it can detect typos in tagnames. - - Unfortunately not in Debian 9 apt. – Léo Léopold Hertz 준영 – 2017-07-09T07:25:42.723

Looks very interesting. Do you have any idea how to implement the possibility to move files around without losing the associated tags? – student – 2012-08-06T18:02:00.900

@student: currently there is a 'repair' command which deals with moved and modified files. (If you both move and modify a file, however, this won't be detected.) – Paul Ruane – 2012-08-07T19:10:11.940

Perhaps one could write variants of mv, cp and rm which handle your tags as well (call them for example tmv, tcp and trm) then one wouldn't lose tags at least if one uses the commandline to move files around... – student – 2012-08-09T20:57:35.553

7

Nobody has mentioned them, but you should definitely look at extended file system attributes. ext4, for example, has them. The tools getfattr and setfattr deal with them. Of course, you will have to write some shell scripts to search for files tagged with sometag. Regarding the requirements in the question, all the answers are "Yes". You should only take into account that it depends on the file system.
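A rough sketch of what such scripts might look like. It assumes the `attr` package (`setfattr`, `getfattr`) is installed and the filesystem allows `user.*` attributes; keeping all tags as a comma-separated list in a single attribute called `user.tags` is a convention invented for this example, as are the function names.

```shell
#!/usr/bin/env bash
# Hypothetical xattr-based tagging helpers; "user.tags" is an assumed convention.

# set_tags FILE TAGS: store a comma-separated tag list on FILE.
set_tags() {
    setfattr -n user.tags -v "$2" "$1"
}

# get_tags FILE: print FILE's tag list (nothing if the attribute is absent).
get_tags() {
    getfattr --only-values -n user.tags "$1" 2>/dev/null
}

# files_with_tag TAG ROOT: print regular files under ROOT carrying TAG.
files_with_tag() {
    local tag=$1 root=$2 f
    find "$root" -type f -print |
    while IFS= read -r f; do
        # Wrap the list in commas so only whole tags match.
        case ",$(get_tags "$f")," in
            *",$tag,"*) printf '%s\n' "$f" ;;
        esac
    done
}
```

For example, `set_tags notes.txt "work,hr"` followed by `files_with_tag hr .` would list `./notes.txt`. The search reads the attribute of every file, so it is slow on large trees, and the tags survive a copy only if the copying tool preserves extended attributes (such as `cp -a`, as the question notes).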

alik

Posted 2009-12-10T20:46:15.300

Reputation: 71

Inode data of the file should definitely be the correct way to do that on an ext4 fs, but it will not offer any backward compatibility. Right? – m3nda – 2015-07-31T14:45:07.060

6

Surprised that nobody has mentioned TagSpaces. It meets all your requirements because tags are stored in the filename and TagSpaces is cross-platform.

TagSpaces

Dan Dascalescu

Posted 2009-12-10T20:46:15.300

Reputation: 3 406

1

TagSpaces doesn't have a CLI fallback, so it doesn't meet all the requirements. Or does it have a CLI? If it does, please let me know! – TomCho – 2017-01-06T07:22:15.747

There is no support for the application in Debian 9 apt. Anything coming? - - You can install the app by these instructions https://www.tagspaces.org/products/

– Léo Léopold Hertz 준영 – 2017-07-09T07:15:15.163

Can you please compare your proposal to Linux Desktop Search Tools? – Léo Léopold Hertz 준영 – 2017-07-09T07:23:28.903

6

I think this might meet all your requirements. In any case, it is a cool piece of code:

http://pages.stern.nyu.edu/~marriaga/software/oyepa

The GUI requires Qt, but there is a command-line application for searching and the fact that all tags are actually in the filename makes it trivial to manipulate tags|files from the cli.

laramichaels

Posted 2009-12-10T20:46:15.300

Reputation: 679

@laramichaels I know this is pretty old, but I found the approach very interesting. If it weren't for the lack of documentation (nowhere is it explained how the file-naming works), I would adopt it. If you have any news on such tools, please do let me know. – TomCho – 2017-01-06T07:19:02.880

1

From the page: "Tag information is stored in the filename" - so what do the tagged filenames look like? BTW, the links on that page are very interesting: +1. – Charles Stewart – 2010-01-19T09:28:30.910

report-for-bill[work stuff,hr,produced by me].odt – laramichaels – 2010-01-19T16:22:49.367
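Going only by the example filename above, tags embedded this way can be pulled out with plain shell parameter expansion. The format assumed here (one bracketed, comma-separated list before the extension) is inferred from that single example and may not match every rule the tool applies; `tags_in_name` is a made-up name.

```shell
#!/usr/bin/env bash
# tags_in_name FILENAME: print one tag per line from NAME[tag1,tag2,...].ext
# Prints nothing when the name has no bracketed tag list.
tags_in_name() {
    local name=${1##*/} inner
    case $name in
        *\[*\]*) ;;            # has a [...] section
        *) return 0 ;;         # no tags in this name
    esac
    inner=${name#*\[}          # drop everything up to the first "["
    inner=${inner%%\]*}        # drop everything from the first "]" on
    if [ -n "$inner" ]; then
        printf '%s\n' "$inner" | tr ',' '\n'
    fi
}
```

With names in this form, a search for a tag can also be done with an ordinary filename match, since the tags travel with the file wherever it is moved or copied.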

5

You probably don't need to install the entire KDE desktop for their tagging library, Nepomuk. You would still have to install the KDE base libraries, though...

anon

Posted 2009-12-10T20:46:15.300

Reputation:

1

yeah well I was hoping to find an alternative to this, but it doesn't look so... – julien – 2009-12-11T10:54:43.913

2

Some other alternatives might be Tagsistant, tagfs or dantalian.

student

Posted 2009-12-10T20:46:15.300

Reputation: 455

2

This recent article on Linux Desktop Search Tools mentions that Tracker supports tagging. Unfortunately it's supposed to be half-broken in the old version they tested. Maybe it's fixed now?

  1. Not system wide.
  2. You can back it up.
  3. It's bundled with Gnome.

Iain

Posted 2009-12-10T20:46:15.300

Reputation: 4 399

2

Try Beagle. I find it is pretty good.

It may not meet all the requirements, and I'm not sure what could. For example, do FIFO files support extended attributes? If they don't, Beagle has a fallback database.

pcapademic

Posted 2009-12-10T20:46:15.300

Reputation: 3 283

Can you please compare to TagSpaces? – Léo Léopold Hertz 준영 – 2017-07-09T07:22:29.903

That link does not refer to a project about document organisation. – detly – 2014-05-15T03:58:35.510

Can beagle handle non-regular files? – Charles Stewart – 2010-01-18T10:46:39.887

@Charles Stewart - do you mean non-text files? – pcapademic – 2010-01-18T17:13:04.927

No, I mean device files, symlinks, FIFOs, &c – Charles Stewart – 2010-01-18T18:55:57.577

1

So you won't find Nepomuk integration in gnome, at the command line, or elsewhere in Linux.

Conversely, with Tracker you won't find kde integration AFAIK. Not sure on CLI.

So unfortunately, the answer appears to be "no".

Even more unfortunately, this doesn't mean there's a good opportunity here for building one either. Linux command-line utilities don't have much in common with the GUI file manager, for example, so architecturally there's no common componentry which could be extended to support the concept.

pbr

Posted 2009-12-10T20:46:15.300

Reputation: 1 285

0

TMSU

TMSU is a tool for tagging your files. It provides a simple command-line utility for applying tags and a virtual filesystem to give you a tag-based view of your files from any other program.

TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount where you like, based upon the tags you set up.

Surprised no one has mentioned it.

justsomeguy

Posted 2009-12-10T20:46:15.300

Reputation: 1

2

you missed it... it's the highest voted answer – pufferfish – 2018-02-16T15:00:47.467

0

I made a little program that uses SQLite for this purpose. It solved my need, but maybe it helps you too:

https://github.com/alvatar/dfym

The only issue with this approach is that it does not synchronize with moves and deletions, but it solves the problem for relatively static files.

alvatar

Posted 2009-12-10T20:46:15.300

Reputation: 109

-1

I suggest taking a look at a version control system such as Subversion for these kinds of features above and beyond the file system. Some may be a better fit for you than others but generally:

  • Many support tagging (certainly subversion).
  • Many are cross platform; Windows, Mac, Linux, pretty much all Unixes.
  • Many have both GUI front ends and command line clients.
  • Many already have bindings for your favourite programming/scripting language.
  • Many are easily backed up.
  • Many are designed to be very easily shareable in one way or another.
  • Many allow you to control access.
  • You don't have to re-invent the wheel.
    • You learn and use standard commands/tools already used by millions.
  • You can install it today from your favourite OS repo: apt-get install, yum install
  • You also get version management "for free".

A CLI example with Subversion:

    ~/svn/atestrepository: $ svn propset mytag "something" dir1
    property 'mytag' set on 'dir1'
    $ svn propset myothertag "nothing" dir1/file1
    property 'myothertag' set on 'dir1/file1'
    $ svn propset anemptytag "" dir1/file2
    property 'anemptytag' set on 'dir1/file2'

    $ svn propget -R mytag
    dir1 - something
    ~/svn/atestrepository: $ svn propget -R myothertag
    dir1/file1 - nothing
    $ svn propget -R anemptytag
    dir1/file2 -
    $ svn proplist dir1/file2
    Properties on 'dir1/file2':
      anemptytag
      svn:keywords

I wouldn't recommend these tools for large (gigabyte-sized), regularly changing binary files, but for everything else they are already well proven and scale to very large sizes.

Colin

Posted 2009-12-10T20:46:15.300

Reputation: 32