What software is needed for membership websites and how can they still be indexed by Google

2

1

I notice that in some cases paywalled news articles seem to have been indexed by Google, because excerpts from the story appear in the search hit.

However, when I go to these websites using a Googlebot (robot) identity, the article content is not there to crawl. This would seem to suggest that the publisher is somehow submitting their paywalled articles (and associated URLs) to Google directly rather than having them crawled. Obviously such a submission would be non-trivial, because it would have to include both the content of the article and various metadata about it, such as the URL where it is located and its expiration date.

Does such a mechanism exist? If so, can an ordinary webmaster such as myself use it?

Tyler Durden

Posted 2016-03-29T17:42:39.800

Reputation: 4 710

Question was closed 2018-05-21T19:32:52.423

3

Have you tried Google to get the answer? It is a simple process, and even if you word it partially wrong, Google knows what you're getting at. https://support.google.com/webmasters/answer/6259634?hl=en

– acejavelin – 2016-03-29T18:14:49.477

@acejavelin That would be too meta. – Tyler Durden – 2016-03-29T18:29:37.917

No, that is appropriate. Superuser isn't your personal information database; it is expected that users do their own research prior to asking questions here. – acejavelin – 2016-03-29T18:31:44.633

I did do research, which I described in my post. I could not find any such service that I describe, yet I concluded it might still exist based on my robots research. That's why I am asking. – Tyler Durden – 2016-03-29T18:36:32.077

well, I did give you a link to Google's official answer. – acejavelin – 2016-03-29T18:39:42.127

Was this in Google Search or Google News? – unor – 2016-03-31T14:58:58.947

@DavidPostill: I think this post should not have been closed so fast, having both a bounty and upvotes. Membership websites, membership software and webbots are certainly related to software. See my answer if unconvinced. I'm a bit miffed to find the post was closed after attempting to give a good answer. – harrymc – 2018-05-21T20:16:28.240

@harrymc It wasn't particularly fast. I'm not sure what brought this to the front page yesterday, but the question is more than 2 years old. Also, it is very clearly not about computer hardware or software within the definition of the help center. If the question were fresh and had good answers, an argument could be better made. But a moldy oldy that's off topic and has no answers? I agree with Mod Postill on this. – music2myear – 2018-05-22T21:18:27.550

@music2myear: Membership websites are as relevant today as 2 years ago, and somebody cared enough just now to put up a bounty. Bounty posts were supposed to be protected, so it should only have been closed for an excellent reason. It was closed soon after I put up my answer, in which I tried to cover the subject comprehensively, so there were answers. I thought that closing a bounty post is against SU rules, even if a moderator can override them. – harrymc – 2018-05-23T06:08:23.053

I don't recall that rule but will look for it. But on its face I disagree with the idea. A question asking about how to install an app on an iPhone is off-topic, whether it has a bounty or not. A question about wiring your house for electricity is off-topic, bounty or not. Stating that having a bounty is a guaranteed protection doesn't make sense to me at this moment. – music2myear – 2018-05-23T15:22:56.440

Did some Meta trawling and found the relevant posts on closing Bounties. The consensus appears to be that closing bounty questions is OK if they otherwise do not fit the rules of the site, but that because the bounty must be removed (due to the design of the site) prior to closing, the procedure is to flag it for Mod attention so that they can remove the bounty and then close the question. – music2myear – 2018-05-23T15:35:22.883

@music2myear: I must say I'm at a loss to understand why Googlebot, a software program, is not for SU. Help: "Super User is for computer enthusiasts and power users. If you have a question about … - computer hardware, - computer software, or, - personal and home computer networking". Googlebot applies to at least 2 of these. – harrymc – 2018-05-24T06:44:25.663

WordPress is a software program, and so are Amazon ECS and Azure. The key difference in this case is that these primarily exist outside of the user's desktop computer and instead function and "live" entirely on that thing we call the Web, and are interacted with via a web browser or possibly via a local application that approximates the web interface. – music2myear – 2018-05-24T15:37:58.790

1

What is on-topic[1]: If you have a question about <snip> and it is not about <snip> websites or web services like Facebook, Twitter, ... Obviously, search engine is a web service and this question should be redirected to WebApps or WebMasters. Actually, OP's previous attempt to bump this question had been clearly declined. Starting a bounty to prevent question from closing is an act of gaming the system and should not be honored.

– guest-vm – 2018-05-24T16:28:52.863

@music2myear: Google is a tool we use daily on the desktop, and the poster wanted explanations of results that he had from his desktop. Googlebot is the subject of several other posts here that were not closed - the boundaries between SU and other SO sites are often a gray area - usually the specialized sites migrate generalized software queries to SU, being more concerned with programming and configuring. – harrymc – 2018-05-25T06:15:32.603

@guest-vm: Starting a bounty to prevent a question from closing is often done here and I see nothing wrong with it - not all closing votes are justified. Sometimes a good answer serves to recenter a post within the bounds of SU, and sometimes the members of the forum reword it to conform. Usually a moderator advertises his intention to close a post if not reworded, leaving some time for the poster and answerers to correct it. Closing a bounty post this way is a bit extreme. – harrymc – 2018-05-25T06:22:29.297

Answers

2

Yes, it is possible

Google has a page called Get your content on Google, which, as of today, 21 May 2018, is a comprehensive reference for how to get your content indexed by Google. It has various links which you might want to try, including:

  • Add your URL
  • App crawling
  • Search Console
  • Search Engine Optimization (SEO) Starter Guide

This answer was posted by @acejavelin two years and one month ago as a comment. Perhaps the page he/she linked to was not as comprehensive as it is today; otherwise I don't see why he/she didn't post it as a full answer. Also, I see the OP deemed this page "too meta" at the time, but today it is exactly what he/she wants.

Websites can detect bogus Googlebots

Websites sometimes prevent their web content from being crawled by web browsers that use bogus Googlebot user agent strings. You can find more information about this subject on the Panopticlick website of the Electronic Frontier Foundation. In short, Googlebot can be identified by more than just its user agent string.

user477799

Posted 2016-03-29T17:42:39.800

Reputation:

1

The fact that the company's webserver has returned the infamous HTTP error 404 for a URL does not mean that the resource does not exist. It only means that the webserver has decided that, for you, this resource does not exist.

The webserver can identify you as a paying customer by many methods, chief among them an identifying HTTP cookie stored in your browser. When the cookie is not found, the webserver will usually ask you to log in, and if the login succeeds it will then return that cookie.

The question is then: why is Googlebot allowed access, but you are not?

Googlebot will eventually discover almost any website, but the webmaster can request an early visit by using the tools contained in Get your content on Google. He can also control which folders the bot may crawl by using a robots.txt file.
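
As a rough illustration of the cookie check described above, here is a minimal Python sketch. The cookie name `session_id`, the in-memory `ACTIVE_SESSIONS` store and both function names are assumptions made up for this example; a real membership site would keep sessions in a database and issue the cookie after a successful login.

```python
from http.cookies import SimpleCookie

# Hypothetical session store: session id -> member name.
# A real site would look this up in a database.
ACTIVE_SESSIONS = {"abc123": "alice"}


def member_for_request(cookie_header, sessions=ACTIVE_SESSIONS):
    """Return the member name for a raw Cookie header, or None if unknown."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("session_id")  # cookie name is an assumption
    if morsel is None:
        return None
    return sessions.get(morsel.value)


def status_for_article(cookie_header):
    """Return 200 (serve the article) for members, 302 (go to login) otherwise."""
    if member_for_request(cookie_header) is not None:
        return 200
    return 302
```

A server could just as well answer a non-member with 404 or with a teaser page instead of a redirect; the point is only that the decision depends on who is asking.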

An example of such a file is:

User-agent: googlebot
User-agent: google
User-agent: bingbot
User-agent: bing
Disallow: /bedven/bedrijf/
Crawl-delay: 10

User-agent: *
Disallow: /
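
To see how a crawler might read these rules, the example file can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the URL paths below are made up, and Python's matching rules approximate, rather than reproduce, what Google's own crawler does.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, held in a string.
ROBOTS_TXT = """\
User-agent: googlebot
User-agent: google
User-agent: bingbot
User-agent: bing
Disallow: /bedven/bedrijf/
Crawl-delay: 10

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named bots may crawl everything except the disallowed folder.
print(parser.can_fetch("Googlebot", "/articles/story.html"))
print(parser.can_fetch("Googlebot", "/bedven/bedrijf/page.html"))
# Every other bot falls under "User-agent: *" and is shut out entirely.
print(parser.can_fetch("SomeOtherBot", "/articles/story.html"))
```

Note that robots.txt is advisory: it tells well-behaved bots what they may crawl, but it does not by itself stop anyone from requesting a page.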

The bot identifies itself through the User-Agent header of its HTTP request, for example with a value containing googlebot.

However, assuming the identity of Googlebot is not an easy matter. The website can easily verify the bot's identity by doing a reverse DNS lookup on the accessing IP address. The returned host name must in that case belong to either googlebot.com or google.com, and a forward lookup of that host name should return the same IP address, which is something that you yourself cannot fake.
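
A minimal Python sketch of such a check follows, using only the standard `socket` module. The function name and the injectable `reverse_lookup` / `forward_lookup` parameters are assumptions introduced for illustration (they make the function testable without network access); the forward lookup used here, `socket.gethostbyname_ex`, only covers IPv4.

```python
import socket

# Domains the reverse lookup is expected to end in.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check its domain, then confirm with a forward lookup."""
    # Default to real DNS lookups; callers may pass stand-ins for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The host name must resolve back to the address that contacted us.
        return ip in forward_lookup(host)
    except OSError:
        # Failed lookups (socket.herror, socket.gaierror) count as "not verified".
        return False
```

A server would call `is_real_googlebot(client_ip)` only for requests whose User-Agent claims to be Googlebot, and would probably cache the result, since DNS lookups are slow.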

If you fully control your webserver, for example via PHP, you can duplicate this mechanism and create what is called a "membership website". Software that does this is called Membership Software.

If you are not a PHP programmer, or are unwilling to invest that much of your time, there are many open-source alternatives, as well as lots of commercial products that will compete for your business. Be very critical when choosing one, and check it thoroughly on the web for reviews.

For more information see these resources that I found via a search (not necessarily the best ones, and some are quite commercial in nature, but they will get you started):

harrymc

Posted 2016-03-29T17:42:39.800

Reputation: 306 093