
I have a Samba-based fileserver with lots of gigs of data on it, mostly Word, Excel, OpenOffice and PDF documents.

I've set up a simple web-based search interface (Apache, PHP, mlocate) that just searches on file paths and mtimes. It works, as far as that goes, but it would be great to have all the documents indexed by Apache Solr, which by all accounts is blazingly fast and can cope with all these different document types.

But it's a fileserver, not a website, so I'd need something to crawl all files, and keep crawling and re-indexing the updated ones; people aren't "POST"ing documents, they're just pressing Save.
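To make the goal concrete: each saved or changed file would need to be pushed into Solr somehow, roughly like the sketch below. This assumes a local Solr instance with its extracting request handler (Solr Cell, which uses Tika to parse Word/Excel/OpenOffice/PDF documents) enabled; the URL, ID scheme and file path are just placeholders.

```python
# Sketch only: push one changed file to Solr's extracting handler.
# SOLR_EXTRACT_URL and the ID scheme are assumptions, not a working setup.
import requests

SOLR_EXTRACT_URL = 'http://localhost:8983/solr/update/extract'

def index_file(path):
    params = {
        'literal.id': path,  # use the file path as the Solr document ID
        'commit': 'true',    # commit per file here; batch commits in practice
    }
    with open(path, 'rb') as f:
        resp = requests.post(SOLR_EXTRACT_URL, params=params, files={'file': f})
    resp.raise_for_status()

index_file('/srv/samba/share/reports/example.pdf')  # hypothetical path
```

What I'm missing is the part that notices the saves and calls something like this automatically.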

Is there a project out there that does this?

artfulrobot

2 Answers


Check out inotify. It will notify you about file system events instantaneously.

belteshazzar
  • Useful, thanks. As the case is so general ("I want people to be able to search for files on the fileserver"), do you know of any projects that do this? I want to avoid having to write my own stuff if there's a good project already written. – artfulrobot Aug 29 '12 at 10:29
  • Not that I know of. If I had to do it though, I would write a daemon using [pyinotify](http://pyinotify.sourceforge.net/) that would store the info in a MySQL database, which can then be searched by the webapp. – belteshazzar Aug 29 '12 at 17:51
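A minimal sketch of what the comment above describes: a daemon that watches the share with pyinotify and keeps a table of paths and mtimes up to date for the web app to query. SQLite is used here purely for brevity where the comment suggests MySQL; the share path, database location and schema are assumptions.

```python
# Sketch of a pyinotify-based indexing daemon, per the comment above.
# SHARE, DB and the table layout are placeholders for illustration.
import os
import sqlite3
import pyinotify

SHARE = '/srv/samba/share'           # assumed location of the Samba share
DB = '/var/lib/fileindex/index.db'   # assumed location of the index database

db = sqlite3.connect(DB)
db.execute('CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)')

class Indexer(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):   # file written and closed (Save)
        self._upsert(event.pathname)

    def process_IN_MOVED_TO(self, event):      # file moved/renamed into the share
        self._upsert(event.pathname)

    def process_IN_DELETE(self, event):        # file removed
        db.execute('DELETE FROM files WHERE path = ?', (event.pathname,))
        db.commit()

    def _upsert(self, path):
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            return  # file vanished between the event and the stat
        db.execute('INSERT OR REPLACE INTO files (path, mtime) VALUES (?, ?)',
                   (path, mtime))
        db.commit()

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO | pyinotify.IN_DELETE
wm.add_watch(SHARE, mask, rec=True, auto_add=True)  # watch the tree recursively
pyinotify.Notifier(wm, Indexer()).loop()
```

The same event handler could just as well push each changed file straight to Solr's extracting handler instead of (or as well as) updating a database table.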

I'm not sure if this is what the asker wants, but for others looking for a web interface to mlocate, have a look at this:

https://github.com/kaazoo/weblocate