This question got me thinking about another project I've got on my 'to do' list. It's a 'multimedia' (blarg) piece with dynamically-generated video projection. I've run an early version of it, but I'd like the video to be partially generated by files on the performer's computer— the idea being that it's going to find your NSFW pics and show them (in mangled, blurred, BGRA-shifted form) buried within the piece. Blah blah, art, edgy, whatever.
Can anyone think of attributes those files would have that, say, vacation snapshots wouldn't? So far it's just image size: super-small images are probably internal support files for applications, clip-art or icons. Super-big images probably came straight off a camera (although maybe not!). But I think there's a way to get smarter than this. I'm thinking of batches of files with non-identical but within-a-short-span creation dates (indicating download) and a "last opened" later than "last modified".
Surely there are some others, or reasons why the ones I list wouldn't work?
This will be a C++ binary but can call a shell script if that's a better way of doing what I need.
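To make the heuristics above concrete, here is a minimal C++ sketch of the three checks: size buckets, bursts of close-but-not-identical creation times, and "last opened" later than "last modified". All names and thresholds (`classify_by_pixels`, `longest_download_burst`, the 128x128 and 6-megapixel cutoffs) are made up for illustration, and it assumes the caller has already read pixel dimensions and timestamps from somewhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical size buckets; the thresholds are guesses, not measured values.
enum class SizeClass { TooSmall, Candidate, TooBig };

SizeClass classify_by_pixels(std::uint32_t width, std::uint32_t height) {
    const std::uint64_t pixels = static_cast<std::uint64_t>(width) * height;
    if (pixels < 128ULL * 128ULL) return SizeClass::TooSmall;  // icons, clip-art
    if (pixels > 6000000ULL) return SizeClass::TooBig;         // probably off a camera
    return SizeClass::Candidate;
}

// Length of the longest run of creation times (seconds since epoch) in which
// each file was created after the previous one, but no more than
// max_gap_seconds later. Identical timestamps reset the run, following the
// "non-identical but within a short span" idea.
std::size_t longest_download_burst(std::vector<std::int64_t> created,
                                   std::int64_t max_gap_seconds) {
    if (created.empty()) return 0;
    std::sort(created.begin(), created.end());
    std::size_t best = 1;
    std::size_t run = 1;
    for (std::size_t i = 1; i < created.size(); ++i) {
        const std::int64_t gap = created[i] - created[i - 1];
        run = (gap > 0 && gap <= max_gap_seconds) ? run + 1 : 1;
        best = std::max(best, run);
    }
    return best;
}

// True when the file was opened after it was last changed.
bool opened_after_modified(std::int64_t last_opened, std::int64_t last_modified) {
    return last_opened > last_modified;
}
```

One caveat with the last check: access times are not always reliable, since some systems mount filesystems with options such as `noatime` or `relatime` that skip or coarsen those updates.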
Your method might, possibly, distinguish between "downloaded-from-web" and "copied-from-camera"; are you operating under the assumption that "99% of the Net is porn, so statistically speaking, downloaded image == porn"? (Y'know, people do download other things - e.g., I have many vacation photos on my computer; those are SFW, even though they got there through that naughty Internet ;)) – Piskvor left the building – 2011-09-05T10:15:43.927
Is there a way to do this? I mean, files named mEk202Wa3ptvozmqWeABYP7No1_500.jpeg are a natural hit (though it will take some thinking to have software tell the difference between that and MSC1601.jpg, which would be a camera file). But there's nothing in the file metadata that says where it came from, is there? – buildsucceeded – 2011-09-05T10:21:38.060
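For what it's worth, telling those two name shapes apart could be roughed out with a pair of regular expressions. This is only a sketch: `NameKind`, `classify_filename` and both patterns are hypothetical, tuned to the two example names here, and real camera and website naming schemes vary a lot more than this.

```cpp
#include <regex>
#include <string>

enum class NameKind { CameraLike, WebLike, Unknown };

// Guess the origin of an image from its file name alone (no directory part).
NameKind classify_filename(const std::string& name) {
    const auto flags = std::regex::ECMAScript | std::regex::icase;
    // Camera-style: short alphabetic prefix, optional underscore, 3-6 digit counter.
    static const std::regex camera(
        R"(^[A-Za-z]{2,8}_?[0-9]{3,6}\.(jpe?g|png)$)", flags);
    // Web-style: one long letters-and-digits token, optionally followed by _<number>.
    static const std::regex web(
        R"(^[A-Za-z0-9]{16,}(_[0-9]+)?\.(jpe?g|png|gif)$)", flags);

    if (std::regex_match(name, camera)) return NameKind::CameraLike;
    if (std::regex_match(name, web)) return NameKind::WebLike;
    return NameKind::Unknown;
}
```

The camera pattern is checked first because it is the stricter of the two; anything matching neither stays `Unknown` rather than being forced into a category.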
I have yet to see the Evil Bit (or the complementing Naughty Bit) implemented in actual software. There's EXIF data, but that 1) is optional (and non-verifiable), and 2) usually contains data such as "camera make and model, flash used, GPS coordinates foo N bar W, date this-and-that". Note also that "vacation snapshots" and "spicy" could overlap (see the average Facebook profile); moreover, the eternal "what is spicy?" question - are bikinis spicy? – Piskvor left the building – 2011-09-05T10:29:24.197
Perhaps a scoring mechanism which (among other things) gives higher weight to directory paths including names like "stuff", "pers", "old", or "faxes"? – buildsucceeded – 2011-09-05T11:14:08.700
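A directory-name score like that might look roughly like the following. It is a sketch under assumptions: `directory_name_score` is a made-up name, every hint counts for one point (no real weighting), and only whole directory names are matched so that "old" does not fire on "folder".

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Count directory components of `path` that equal one of the hint names,
// case-insensitively. Both '/' and '\\' are treated as separators.
int directory_name_score(const std::string& path) {
    static const std::vector<std::string> hints = {"stuff", "pers", "old", "faxes"};
    int score = 0;
    std::string component;
    for (char c : path) {
        if (c == '/' || c == '\\') {
            // A separator closes a directory component; check it against the hints.
            if (std::find(hints.begin(), hints.end(), component) != hints.end()) {
                ++score;
            }
            component.clear();
        } else {
            component += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
    }
    // The trailing component (the file name itself) is deliberately not scored.
    return score;
}
```

Such a score would presumably be just one input among several, added to whatever the size, timestamp and file-name checks contribute.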
Is that supposed to be "spicy" or "SFW"? Looks perfectly neutral to me - what is the rationale for these (random-looking) names? What makes them special one way or another? – Piskvor left the building – 2011-09-05T12:16:58.113
@Piskvor The long one is what a photo from a content-managed website looks like (Facebook, etc) when you drag it to your desktop. The shorter one is a format similar to how pictures are named when you take them off a camera's SD card. – buildsucceeded – 2011-09-05T15:16:03.267
I was reacting to the "stuff", "faxes" etc.; also, how does "this is from a website" indicate "therefore it's spicy"? Note also that I've had all sorts of weird picture names come off cameras: Camera0001.jpg, DCIM001.JPEG, Andr0001.jpg, etc. – Piskvor left the building – 2011-09-05T16:36:17.313
This sounds vaguely similar to JWZ's WebCollage script - http://www.jwz.org/webcollage/ - aside from your idea pulling images from the hard drive, while WebCollage gets them from the net. – Dave Sherohman – 2011-09-06T08:19:36.157
I have to admit I'm a bit disappointed to see this closed. I think there's an interesting discussion to be had about maximizing the success of such a search, even though it's, yes, obviously not possible to separate these categories with anything like 100% success. Are there any system flags, sizes, filetypes (jpeg more than png?) or other things such a search could look for? Maybe this is better asked on StackOverflow? – buildsucceeded – 2011-09-06T20:24:16.647