Questions tagged [web-crawler]

For questions about how to use or defend against web-crawlers or web-spiders.

17 questions
85
votes
10 answers

How and why is my site being abused?

I own a popular website that allows people to enter a phone number and get information back about that phone number, such as the name of the phone carrier. It's a free service, but it costs us money for each query so we show ads on the site to help…
Marc
  • 699
  • 1
  • 4
  • 4
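
A common first mitigation for this kind of per-query cost abuse is per-client rate limiting. Below is a minimal sketch of an in-memory fixed-window limiter in Python; the window size and threshold are illustrative assumptions, not values from the question.

```python
import time
from collections import defaultdict

# Minimal fixed-window rate limiter keyed by client IP.
# WINDOW and LIMIT are illustrative values, not taken from the question.
WINDOW = 60          # seconds per window
LIMIT = 10           # lookups allowed per window per IP

_hits = defaultdict(list)   # ip -> list of recent request timestamps

def allow_request(ip: str) -> bool:
    """Return True if this IP is still under its per-window quota."""
    now = time.time()
    recent = [t for t in _hits[ip] if now - t < WINDOW]
    if len(recent) >= LIMIT:
        _hits[ip] = recent
        return False
    recent.append(now)
    _hits[ip] = recent
    return True

if __name__ == "__main__":
    for i in range(12):
        print(i, allow_request("203.0.113.5"))
```

In practice this lives behind the query endpoint (or in the web server / CDN layer) and is combined with CAPTCHAs or API keys, since a single counter per IP is easy to spread across many addresses.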
3
votes
1 answer

How to crawl a web site if content is only visible to registered accounts?

I am reading about the attack and defense strategies of web spiders. Assume I have sensitive information on my website, which should be protected from 3rd-party web spiders. Use case #1: Me: I set the sensitive data only visible to registered user…
TJCLK
  • 818
  • 8
  • 23
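
For context on use case #1: a crawler typically just authenticates like a normal user and reuses the session cookie for every subsequent request. A minimal sketch with Python's requests library, assuming a hypothetical form-based login endpoint and field names:

```python
import requests

BASE = "https://example.com"          # hypothetical site

session = requests.Session()

# Log in once; the Session object keeps the returned cookies
# for all subsequent requests. Endpoint and field names are assumptions.
resp = session.post(f"{BASE}/login",
                    data={"username": "crawler-account",
                          "password": "s3cret"},
                    timeout=10)
resp.raise_for_status()

# Any page that is only visible to registered users can now be
# fetched with the authenticated session.
page = session.get(f"{BASE}/members/secret-report", timeout=10)
print(page.status_code, len(page.text))
```

This is why registration alone only raises the bar: the defense then shifts to per-account rate limits, monitoring, and terms of service rather than to the login wall itself.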
1
vote
1 answer

How do I prove that robots.txt was not provided?

I want to scrape our university's learning platform website, to let myself know via notifications when a new entry is added to any lesson. But I'm scared that they'll add a robots.txt afterwards and sue me or something, I don't know. I just don't have…
Kenan
  • 13
  • 2
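
One way to document the state of robots.txt (or its absence) at crawl time is simply to fetch it on every run and archive the raw response with a timestamp, while also honouring whatever it says. A minimal sketch, assuming a hypothetical platform URL; note that a self-archived copy is weak evidence on its own, it only records what the crawler saw:

```python
import datetime
import urllib.robotparser
import requests

SITE = "https://learn.example.edu"        # hypothetical platform URL

def snapshot_robots(site: str) -> None:
    """Fetch /robots.txt and archive the response as dated evidence."""
    resp = requests.get(f"{site}/robots.txt", timeout=10)
    stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    with open(f"robots-{stamp}.txt", "w", encoding="utf-8") as fh:
        fh.write(f"# fetched {stamp}, HTTP {resp.status_code}\n")
        fh.write(resp.text if resp.ok else "(no robots.txt served)\n")

def may_fetch(site: str, path: str, agent: str = "my-notifier-bot") -> bool:
    """Honour robots.txt if one exists; an absent file allows everything."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()
    return rp.can_fetch(agent, f"{site}{path}")

if __name__ == "__main__":
    snapshot_robots(SITE)
    print(may_fetch(SITE, "/courses/list"))
```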
1
vote
1 answer

Why is my web site being scanned for license.txt, and should I be worried?

Lately I am seeing multiple daily 404s for variations of "license.txt", e.g., "wordpress/license.txt", "blog/license.txt", "old/license.txt", "new/license.txt". Here's a little snippet of slightly redacted logfile to illustrate: 5.189.164.217 - -…
C8H10N4O2
  • 113
  • 4
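
Requests like these are typically automated scanners guessing common WordPress install paths, since license.txt reveals the installed version. A quick way to gauge the volume and the sources is to count the probing IPs in the access log; a minimal sketch, assuming a common combined-log layout where the first field is the client IP:

```python
import re
from collections import Counter

LOGFILE = "access.log"                       # path is an assumption
PROBE = re.compile(r'"GET [^"]*license\.txt[^"]*" 404')

ips = Counter()
with open(LOGFILE, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        if PROBE.search(line):
            ips[line.split()[0]] += 1        # first field is the client IP

for ip, hits in ips.most_common(10):
    print(f"{ip:15} {hits} license.txt probes")
```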
1
vote
1 answer

Questions about SOCKS5 security

I'm planning to start a distributed crawler in order to avoid common limitations imposed by servers/CDN like rate limit, region filter, and others. My idea is to have a central server and multiple agents that will run on different networks. These…
fenugurod
  • 13
  • 2
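
For context on the setup being described: routing individual requests through an agent is usually done by exposing the agent as a SOCKS5 proxy and pointing the crawler at it. A minimal sketch with requests (needs the PySocks extra, `pip install requests[socks]`); host, port, and credentials are placeholders:

```python
import requests

# Hypothetical agent exposing a SOCKS5 proxy; credentials are placeholders.
AGENT = "socks5h://crawler:s3cret@10.0.0.12:1080"   # socks5h = DNS via proxy

proxies = {"http": AGENT, "https": AGENT}

resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=15)
print(resp.json())          # shows the agent's egress IP, not the central server's
```

Using the `socks5h` scheme keeps DNS resolution on the agent side, which matters if the goal is to avoid region filters tied to the resolver as well as to the client IP.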
1
vote
1 answer

Why fingerprint a browser if a fingerprint can be replayed?

I'm facing an issue with rampant scraping and abuse on a website that costs me a good chunk of money to maintain. So, I have been looking to implement a few solutions, and apparently these solutions fingerprint the client in some form. However, the…
user22260
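
One reason fingerprints remain useful despite being replayable is that they are rarely treated as bearer tokens on their own; they are bound to other session attributes, so a replayed value arriving from a different context stands out. A minimal sketch of that idea in Python (the particular attribute mix is an illustrative assumption, not a description of any specific product):

```python
import hashlib
import ipaddress

def session_binding(fingerprint: str, client_ip: str, user_agent: str) -> str:
    """Bind the reported fingerprint to the client's /24 network and User-Agent.

    A fingerprint replayed from a different network or browser then
    produces a different binding value and can be flagged for review.
    """
    net = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    material = f"{fingerprint}|{net}|{user_agent}"
    return hashlib.sha256(material.encode()).hexdigest()

original = session_binding("abc123", "198.51.100.7", "Mozilla/5.0 ...")
replayed = session_binding("abc123", "203.0.113.9", "python-requests/2.31")
print(original == replayed)   # False: same fingerprint, different context
```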
1
vote
1 answer

Does a searchable public database exist of (hostname, IP) mappings?

This question is not about the trivial usage of forward/reverse DNS. Getting the IP of a hostname is trivial (DNS), and with reverse DNS we can (typically) get a single hostname for an IP. However, particularly for massive http…
peterh
  • 2,938
  • 6
  • 25
  • 31
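
For reference, the "trivial" forward and reverse lookups the question sets aside look like this with the Python standard library; the harder problem being asked about is the many-hostnames-per-IP direction, which ordinary reverse DNS does not provide:

```python
import socket

host = "example.com"

# Forward lookup: hostname -> one or more IP addresses.
ips = sorted({info[4][0] for info in socket.getaddrinfo(host, None)})
print(host, "->", ips)

# Reverse lookup: IP -> (typically) a single PTR hostname.
for ip in ips:
    try:
        ptr, _, _ = socket.gethostbyaddr(ip)
        print(ip, "->", ptr)
    except socket.herror:
        print(ip, "-> no PTR record")
```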
1
vote
0 answers

I run a web crawler on my local computer, can my ISP detect that?

I'm using an Internet plan with 100 GB of monthly bandwidth from my ISP, and I made a simple web crawler for fun that runs on my personal computer 24/7. The crawler is consuming all of the bandwidth, and I configured it to skip downloading media files…
AccountantM
  • 296
  • 1
  • 6
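
As a side note on the "skip downloading media files" part: one common way to do that is to issue a HEAD request first and only GET responses whose Content-Type looks like text or HTML. A minimal sketch:

```python
import requests

def fetch_if_textual(url: str):
    """GET the URL only when a preliminary HEAD says it is text/HTML."""
    head = requests.head(url, allow_redirects=True, timeout=10)
    ctype = head.headers.get("Content-Type", "")
    if not ctype.startswith(("text/", "application/xhtml")):
        return None                      # skip images, video, PDFs, ...
    return requests.get(url, timeout=10).text

page = fetch_if_textual("https://example.com/")
print("skipped" if page is None else f"fetched {len(page)} bytes")
```

As for the visibility question itself: the ISP sees the volume and the destinations of the traffic regardless of what the crawler chooses to download.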
0
votes
0 answers

Threats that JavaScript poses to a web crawler

I'm writing a simple crawler with node.js, which searches for web pages and conditionally executes any JavaScript present. The problem is that in doing so, I execute code from untrusted sources in my node.js environment. Can running untrusted code…
0
votes
1 answer

How does msnbot keep finding my unpublished admin url?

I am a website developer (mainly using MVC.NET). Recently, we have been contacted by a hacker. He claimed that he knows our admin URL. The problem is we do not publish or put the admin URL anywhere on our webpage. The only place where the URL is…
Sam
  • 109
  • 1
0
votes
0 answers

Risks of web crawlers on public buckets

So I have some data that isn't overly sensitive, but I'm still on the fence about whether we should invest the additional time in managing it as a private resource, vs. just leaving it publicly available. The data (images & PDFs) are to be hosted on AWS'…
Francky_V
  • 103
  • 3
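
If the decision ends up being to keep the bucket private, enforcing that on S3 is a single API call; a minimal boto3 sketch, with the bucket name as a placeholder:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-image-and-pdf-bucket"      # placeholder name

# Enable all four "Block Public Access" settings for the bucket,
# so objects cannot be exposed via ACLs or bucket policies.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```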
0
votes
1 answer

Why can't you give special security cookies to a specific crawler so that it could securely crawl the website?

In the current day and age we have the problem of malicious/spam crawlers and similar concerns. My suggestion would be to implement cookie support for crawling, by which I mean issuing specific cookies with a crawler ID (at best refreshed using…
Munchkin
  • 212
  • 2
  • 10
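
A related mechanism that is already deployed in practice is verifying a claimed crawler by reverse-resolving its IP and then forward-resolving the result, rather than handing out a shared secret that could leak or be replayed. A minimal sketch of that check, using Googlebot's documented hostnames as the example:

```python
import socket

def is_verified_googlebot(client_ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm forward."""
    try:
        ptr, _, _ = socket.gethostbyaddr(client_ip)
    except socket.herror:
        return False
    if not ptr.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the PTR name must resolve back to the same IP.
    try:
        return client_ip in {info[4][0] for info in socket.getaddrinfo(ptr, None)}
    except socket.gaierror:
        return False

print(is_verified_googlebot("66.249.66.1"))   # an address in Googlebot's range
```

A cookie-based scheme would have the same weakness as any shared secret: whoever obtains the cookie inherits the crawler's privileges, whereas the DNS check above ties trust to the network the request actually came from.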
0
votes
0 answers

How to Spoof JA3 Signature?

I am using the Python requests library to make HTTP calls. However, the website's bot detection is using JA3 fingerprint verification and blocking me. Is there any way I can spoof the JA3 signature?
Ditti
  • 1
  • 1
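
requests does not expose the full TLS ClientHello, but part of the JA3 string (the cipher list) can be changed by mounting an adapter with a custom SSLContext; matching the rest of a browser's fingerprint generally requires a client that imitates a browser TLS stack. A minimal sketch of the adapter approach, with an illustrative cipher string:

```python
import ssl
import requests
from requests.adapters import HTTPAdapter

class CipherAdapter(HTTPAdapter):
    """Mountable adapter that sends a non-default TLS cipher list,
    which alters the cipher portion of the resulting JA3 hash."""

    def __init__(self, ciphers: str, **kwargs):
        self._ciphers = ciphers
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.set_ciphers(self._ciphers)          # illustrative cipher selection
        kwargs["ssl_context"] = ctx
        return super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount("https://", CipherAdapter("ECDHE+AESGCM:ECDHE+CHACHA20"))
print(session.get("https://httpbin.org/get", timeout=15).status_code)
```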
0
votes
1 answer

How do attackers hit a website with thousands of similar but distinct IP addresses?

I have a website that is being hit with invalid URL requests by thousands of distinct IP addresses, never the same one used twice. Most of them are in a few ranges of IP addresses and often just go up sequentially. Could this be a zombie botnet of…
Pat James
  • 141
  • 1
  • 6
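
A useful first step when triaging this kind of traffic is to aggregate the addresses by network prefix instead of looking at them individually; sequential addresses then collapse into a handful of ranges that can be looked up or blocked. A minimal sketch using the standard library (the log format is an assumption):

```python
import ipaddress
from collections import Counter

LOGFILE = "access.log"                  # assumed combined-format log

prefixes = Counter()
with open(LOGFILE, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        ip = line.split()[0]            # first field is the client IP
        try:
            net = ipaddress.ip_network(f"{ip}/24", strict=False)
        except ValueError:
            continue                    # skip malformed lines
        prefixes[str(net)] += 1

for net, hits in prefixes.most_common(15):
    print(f"{net:20} {hits} requests")
```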
0
votes
1 answer

Are AWS signed URLs crawled by Google?

I have used an Amazon pre-signed URL to share content. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html Is Google able to crawl this URL? I'm sharing this URL with just one client. What about other services? There…
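
For reference, a pre-signed URL is just an ordinary object URL with a signature and expiry in the query string, so anything that sees it (the client, proxies, link scanners) can fetch it until it expires; crawlers can only reach it if the link is exposed somewhere they crawl, and a short expiry limits that window. A minimal boto3 sketch with placeholder bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Bucket and key are placeholders; expiry is kept deliberately short.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-bucket", "Key": "report.pdf"},
    ExpiresIn=300,          # seconds the link stays valid
)
print(url)   # share only with the intended client; treat it like a secret
```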