Are there attributes that 'spicy' files tend to have which 'clean' files don't?

This question got me thinking about another project I've got on my 'to do' list. It's a 'multimedia' (blarg) piece with dynamically-generated video projection. I've run an early version of it, but I'd like the video to be partially generated by files on the performer's computer— the idea being that it's going to find your NSFW pics and show them (in mangled, blurred, BGRA-shifted form) buried within the piece. Blah blah, art, edgy, whatever.

Can anyone think of attributes those files would have that, say, vacation snapshots wouldn't? So far it's just image size: super-small images are probably internal support files for applications, clip-art or icons. Super-big images probably came straight off a camera (although maybe not!). But I think there's a way to get smarter than this. I guess I'm thinking of batches of files with non-identical but within-a-short-span creation dates (indicating download) and a "last opened" date later than "last modified".
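A minimal sketch of the three heuristics above (size window, "opened after modified", and short-span creation batches), assuming the timestamps and pixel dimensions have already been read into a plain struct. `FileInfo`, the function names, and every threshold are hypothetical and untuned. Note also that "last opened" times are often unreliable, since many systems update access times lazily or not at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record. In a real program these fields would be filled from
// the filesystem and an image decoder, which this sketch does not show.
struct FileInfo {
    std::string path;
    std::int64_t created;   // seconds since epoch
    std::int64_t modified;  // seconds since epoch
    std::int64_t accessed;  // seconds since epoch ("last opened")
    int width;              // pixels
    int height;             // pixels
};

// Size filter: skip icon-sized and straight-off-the-camera-sized images.
// The default thresholds are guesses, not measured values.
bool plausibleSize(const FileInfo& f, int minSide = 200, int maxSide = 2500) {
    const int longest = std::max(f.width, f.height);
    return longest >= minSide && longest <= maxSide;
}

// "Last opened" later than "last modified".
bool openedAfterModified(const FileInfo& f) {
    return f.accessed > f.modified;
}

// Group files whose creation times are non-identical but each within
// maxGap seconds of the previous one. Runs shorter than minBatch are dropped.
std::vector<std::vector<FileInfo>> downloadBatches(std::vector<FileInfo> files,
                                                   std::int64_t maxGap = 120,
                                                   std::size_t minBatch = 3) {
    std::sort(files.begin(), files.end(),
              [](const FileInfo& a, const FileInfo& b) {
                  return a.created < b.created;
              });

    std::vector<std::vector<FileInfo>> batches;
    std::vector<FileInfo> current;
    for (const FileInfo& f : files) {
        if (!current.empty()) {
            const std::int64_t gap = f.created - current.back().created;
            // A gap of zero (identical times) or a gap longer than maxGap
            // ends the current run.
            if (gap <= 0 || gap > maxGap) {
                if (current.size() >= minBatch) batches.push_back(current);
                current.clear();
            }
        }
        current.push_back(f);
    }
    if (current.size() >= minBatch) batches.push_back(current);
    return batches;
}
```

A batch found this way only suggests "these files arrived together"; it says nothing about what is in them, so it would probably work better as one input to a score than as a filter on its own.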

Surely there are some others, or reasons why the ones I list wouldn't work?

This will be a C++ binary but can call a shell script if that's a better way of doing what I need.

buildsucceeded

Posted 2011-09-05T09:59:15.440

Reputation: 159

Question was closed 2011-09-05T17:48:25.903

Your method might, possibly, distinguish between "downloaded-from-web" and "copied-from-camera"; are you operating under the assumption that "99% of Net is porn, so statistically speaking, downloaded image == porn"? (Y'know, people do download other things - e.g., I have many vacation photos on my computer; those are SFW, even though they got there through that naughty Internet ;)) – Piskvor left the building – 2011-09-05T10:15:43.927

Is there a way to do this? I mean, files named mEk202Wa3ptvozmqWeABYP7No1_500.jpeg are a natural hit (though it will take some thinking to have software tell the difference between that and MSC1601.jpg which would be a camera file). But, there's nothing in the file metadata that says where it came from, is there? – buildsucceeded – 2011-09-05T10:21:38.060

4

I have yet to see the Evil Bit (or the complementing Naughty Bit) implemented in actual software. There's EXIF data, but that 1) is optional (and non-verifiable), and 2) usually contains data such as "camera make and model, flash used, GPS coordinates foo N bar W, date this-and-that". Note also that "vacation snapshots" and "spicy" could overlap (see the average Facebook profile); moreover, there is the eternal "what is spicy?" question: are bikinis spicy?

– Piskvor left the building – 2011-09-05T10:29:24.197

Perhaps a scoring mechanism which (among other things) gives higher weight to directory paths including names like "stuff", "pers", "old", or "faxes"? – buildsucceeded – 2011-09-05T11:14:08.700
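A rough sketch of such a path-based score, assuming an exact, case-insensitive match against each directory component. The keywords are the ones from the comment above; the weights, the `KeywordWeight` struct, and the function names are invented for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical keyword/weight pair. The weights used below are made up.
struct KeywordWeight {
    std::string keyword;
    double weight;
};

std::vector<KeywordWeight> defaultWeights() {
    return {{"stuff", 1.0}, {"pers", 1.0}, {"old", 0.5}, {"faxes", 1.0}};
}

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Split a path on '/' or '\\' and return only the directory components.
// Whatever follows the last separator is treated as the file name and dropped.
std::vector<std::string> directoryComponents(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/' || c == '\\') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    return parts;
}

// Sum the weight of every directory component that equals a keyword.
// Exact matching is used so that "old" does not also fire on "folder".
double directoryScore(const std::string& path,
                      const std::vector<KeywordWeight>& weights) {
    double score = 0.0;
    for (const std::string& dir : directoryComponents(path)) {
        const std::string lowered = toLower(dir);
        for (const KeywordWeight& kw : weights) {
            if (lowered == kw.keyword) score += kw.weight;
        }
    }
    return score;
}
```

Called as `directoryScore("/home/me/old/stuff/img.jpg", defaultWeights())`, this adds the weights for "old" and "stuff". Whether those directory names mean anything at all is exactly what the following comment questions.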

Is that supposed to be "spicy" or "SFW"? Looks perfectly neutral to me - what is the rationale for these (random-looking) names? What makes them special one way or another? – Piskvor left the building – 2011-09-05T12:16:58.113

@Piskvor The long one is what a photo from a content-managed website looks like (Facebook, etc) when you drag it to your desktop. The shorter one is a format similar to how pictures are named when you take them off a camera's SD card. – buildsucceeded – 2011-09-05T15:16:03.267

I was reacting to the "stuff", "faxes" etc.; also, how does "this is from a website" indicate "therefore it's spicy"? Note also that I've had all sorts of weird picture names come off cameras: Camera0001.jpg, DCIM001.JPEG, Andr0001.jpg, etc etc. – Piskvor left the building – 2011-09-05T16:36:17.313
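For what it's worth, the two filename shapes discussed in these comments can be told apart with a crude check like the one below. It is only a sketch: the length and digit-count limits are assumptions, `NameKind` and the function names are hypothetical, and at best it hints at where a file came from, not at what it shows.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class NameKind { CameraLike, CmsLike, Unknown };

// File name without its extension (everything before the last '.').
std::string stemOf(const std::string& filename) {
    const std::size_t dot = filename.find_last_of('.');
    return dot == std::string::npos ? filename : filename.substr(0, dot);
}

// Camera-style: a short run of letters, an optional underscore, then a
// counter of 3 to 6 digits, and nothing else (e.g. a prefix plus 0001).
bool isCameraLike(const std::string& stem) {
    std::size_t i = 0;
    while (i < stem.size() && std::isalpha(static_cast<unsigned char>(stem[i]))) ++i;
    const std::size_t letters = i;
    if (i < stem.size() && stem[i] == '_') ++i;
    const std::size_t digitStart = i;
    while (i < stem.size() && std::isdigit(static_cast<unsigned char>(stem[i]))) ++i;
    const std::size_t digits = i - digitStart;
    return i == stem.size() && letters >= 2 && letters <= 8 &&
           digits >= 3 && digits <= 6;
}

// CMS-style: a long stem that mixes upper case, lower case and digits,
// which is what a generated identifier tends to look like.
bool isCmsLike(const std::string& stem) {
    bool upper = false, lower = false, digit = false;
    for (unsigned char c : stem) {
        if (std::isupper(c)) upper = true;
        else if (std::islower(c)) lower = true;
        else if (std::isdigit(c)) digit = true;
    }
    return stem.size() >= 20 && upper && lower && digit;
}

NameKind classifyName(const std::string& filename) {
    const std::string stem = stemOf(filename);
    if (isCameraLike(stem)) return NameKind::CameraLike;
    if (isCmsLike(stem)) return NameKind::CmsLike;
    return NameKind::Unknown;
}
```

With these limits, the long name quoted earlier comes out as `CmsLike` and the camera names listed in the comments come out as `CameraLike`; anything a person renamed by hand falls into `Unknown`.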

This sounds vaguely similar to JWZ's WebCollage script - http://www.jwz.org/webcollage/ - aside from your idea pulling images from the hard drive, while WebCollage gets them from the net.

– Dave Sherohman – 2011-09-06T08:19:36.157

I have to admit I'm a bit disappointed to see this closed. I think there's an interesting discussion to be had about maximizing the success of such a search, even though it's, yes, obviously not possible to separate these categories with anything like 100% success. Are there any system flags, sizes, filetypes (jpeg more than png?) or other things such a search could look for? Maybe this is better asked on StackOverflow? – buildsucceeded – 2011-09-06T20:24:16.647

Answers

0

Fellow, the only relatively safe way to do that is to save EVERY spicy file on another hard drive that is not mounted on system startup.

Better yet, in an encrypted partition, so you do not "forget it mounted" by mistake.

But never, NEVER let clean and spicy files be mixed up.

Also: some sites' favicons, although thumbnail-sized, may be VERY embarrassing.

heltonbiker

Posted 2011-09-05T09:59:15.440

Reputation: 129

I think you're misunderstanding the intent: The idea is that the performer is intentionally giving the software access to these files. So we don't want a "safe" way, we want an "unsafe" one. Good point about the favicons, though. – buildsucceeded – 2011-09-05T13:27:18.967

I know, but I actually believe there is no way, whatsoever, to be 100% sure that any slideshow software, as smart as it might be, could detect and block EVERY 'spicy' content from being shown, especially if the spicy and normal contents are mixed. If only one spicy photo is shown at the wrong time, the damage is done. Now, if it is not about spicy content the OP is talking about, the question's title should be rewritten. – heltonbiker – 2011-09-05T14:02:46.777

That might be true, but it's not what the OP is asking. – buildsucceeded – 2011-09-05T15:14:52.270

Well, in that case (unfortunately) I think the answer is a solid "NO", because jpegs are jpegs. Only humans with their ethic and aesthetic senses could judge these images. – heltonbiker – 2011-09-05T17:44:20.673