I have a Samba-based fileserver with many gigabytes of data on it, mostly Word, Excel, OpenOffice and PDF documents.
I've set up a simple web-based search interface (Apache, PHP, mlocate) that just searches on file paths and mtimes. It works, as far as it goes, but it would be great to have the documents' contents indexed by Apache Solr, which by all accounts is blazingly fast and can cope with all these different document types.
But it's a fileserver, not a website, so I'd need something to crawl all the files, and to keep re-crawling and re-indexing the ones that change; people aren't POSTing documents, they're just pressing Save.
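To make the requirement concrete, here's the sort of thing I picture having to write myself if nothing exists: a rough Python sketch (the core name "docs", the share path, and the field names are all made up for illustration) that walks the share, compares mtimes against the previous run, and pushes changed files through Solr's ExtractingRequestHandler.

```python
#!/usr/bin/env python3
# Rough sketch only. Assumes a Solr core named "docs" with the
# ExtractingRequestHandler (Solr Cell / Tika) enabled at /update/extract;
# the paths and field names below are invented for illustration.
import json
import os

import requests  # third-party: pip install requests

SOLR_BASE = "http://localhost:8983/solr/docs"   # assumed core name
SHARE_ROOT = "/srv/samba/share"                 # the Samba share (assumed path)
STATE_FILE = "/var/tmp/solr-crawl-state.json"   # remembered mtimes between runs
EXTS = {".doc", ".docx", ".xls", ".xlsx", ".odt", ".ods", ".pdf"}


def index_file(path, mtime):
    """Push one file through Solr's extract handler; Tika does the parsing."""
    with open(path, "rb") as fh:
        resp = requests.post(
            SOLR_BASE + "/update/extract",
            # literal.* params become stored fields on the indexed document;
            # "id" and "mtime_l" are guesses at a schema, not gospel.
            params={"literal.id": path, "literal.mtime_l": int(mtime)},
            files={"file": (os.path.basename(path), fh)},
            timeout=60,
        )
    resp.raise_for_status()


def main():
    try:
        with open(STATE_FILE) as fh:
            seen = json.load(fh)
    except (FileNotFoundError, ValueError):
        seen = {}

    for dirpath, _dirs, filenames in os.walk(SHARE_ROOT):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in EXTS:
                continue
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if seen.get(path) == mtime:  # unchanged since last crawl: skip
                continue
            index_file(path, mtime)
            seen[path] = mtime

    # Commit once at the end of the crawl rather than per document.
    requests.get(SOLR_BASE + "/update", params={"commit": "true"}, timeout=60)
    with open(STATE_FILE, "w") as fh:
        json.dump(seen, fh)


if __name__ == "__main__":
    main()
```

Something like that, run from cron, would probably limp along, but it ignores deletions and renames, and re-walking the whole tree instead of watching for changes (inotify territory) feels like reinventing a wheel somebody must already have built.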
Is there a project out there that does this?