What Search Engine Spiders Do...

Posted on: 31st Oct 2015
Search engine spiders are among the most useful tools to emerge
in the last 10 years of the internet. They are useful not only
to the companies that run them (Google and many others), but
also to people searching for a particular site and to people who
run web sites. Spiders allow your site to be seen by the
millions of people who use search engines every day. In this
newsletter, we will discuss what search engine spiders do, how
they work, and how to set up a robots.txt file and upload it to
your site to keep spiders from visiting all or part of it.

What are spiders and what purpose do they serve?

Spiders are essentially programs that "crawl" sites and report
their findings back to the search engine they were created for
(Google, for example). Their purpose is to make it easy for
sites to get listed in search engines.

You might be wondering what it means to "crawl" a site. It
means to visit a site and copy its information.

How do spiders work?

Spiders work by finding links to web sites, visiting those web
sites, going through their content, and then reporting that
content back to the database of the search engine they work
for. Google's spiders, for example, crawl sites and report the
information back to Google's database. From there, the
information is added to Google's search index, and the site can
then show up in Google search results. Much the same process
happens with any other search engine spider.
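
The "finding links" step can be sketched in a few lines of
Python. This is only an illustration of the idea, not how any
real search engine's spider is built: the LinkCollector class,
the extract_links helper, and the sample page are all made up
for this example, and the page is a hard-coded string so that
nothing is actually downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into full URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the list of links found in html (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# A made-up page standing in for one the spider has just visited.
SAMPLE_PAGE = (
    '<p>Read the <a href="/news/">news</a> or visit '
    '<a href="http://example.com/">a partner</a>.</p>'
)

links = extract_links("http://yoursite.com/index.html", SAMPLE_PAGE)
print(links)
```

A spider would then visit each of those links in turn, copy what
it finds, and repeat.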

How can I keep spiders from visiting my site?

You might be thinking, "Why would I want to keep such a useful
thing from visiting my site?" The short answer is that sometimes
site owners don't want spiders to crawl a particular part of
their site, and some don't want spiders to crawl their site at
all. The reasons vary. Common ones are pages that aren't meant
to appear in search results, such as private or unfinished
sections, and pages with duplicate or low-quality content.

If you're one of those site owners, then you'll want to create
and upload something called a robots.txt file. We will briefly
go over how to do this.

A robots.txt file

The whole purpose of a robots.txt file is to tell search engine
spiders which parts of a site, if any, they should not crawl.
Well-behaved spiders read the file before crawling and follow
its rules, but it is a request rather than a lock.

Creating the file

Creating a robots.txt file that blocks out spiders is easy.
First, open up Notepad. Then, copy and paste the following:

User-agent: *
Disallow: /

Once you've done that, save the file as robots.txt. The name
must be exactly that, in lowercase, and the file must be plain
text.
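
If you have Python installed, you can check how a spider is
likely to read those two lines before uploading anything. The
sketch below uses the urllib.robotparser module from Python's
standard library; the is_allowed helper and the spider name
"ExampleBot" are made up for this example, and real spiders may
interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the text, as a list of lines.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_allowed(robots_lines, user_agent, url):
    """Return True if robots_lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up spider name used only for illustration.
home_allowed = is_allowed(BLOCK_ALL, "ExampleBot", "http://yoursite.com/")
news_allowed = is_allowed(BLOCK_ALL, "ExampleBot", "http://yoursite.com/news/")
print(home_allowed, news_allowed)
```

Both checks come back False, meaning the spider is asked to stay
away from every page.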

Uploading the file

Next, upload the file to the top-level folder of your site, the
same folder that holds your index file, so that it can be
reached at yoursite.com/robots.txt. Spiders only look for
robots.txt there; a copy placed in a subfolder is ignored. With
the file above in place, spiders are asked to stay away from the
whole site. If you only want to keep them out of
yoursite.com/news/, change the second line to Disallow: /news/
and leave the file in the top-level folder. That's all there is
to it.
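
Spiders look for robots.txt only at the top level of a site, so
keeping them out of a single folder is done with a rule such as
Disallow: /news/ inside that one top-level file. As a rough
check, the sketch below feeds such a file to Python's standard
urllib.robotparser module. The page names today.html and
about.html and the spider name "ExampleBot" are made up for
this example.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every spider to skip only the news folder.
BLOCK_NEWS = ["User-agent: *", "Disallow: /news/"]

parser = RobotFileParser()
parser.parse(BLOCK_NEWS)

# "ExampleBot" and both page names are hypothetical.
news_ok = parser.can_fetch("ExampleBot", "http://yoursite.com/news/today.html")
about_ok = parser.can_fetch("ExampleBot", "http://yoursite.com/about.html")
print(news_ok, about_ok)
```

The page inside /news/ is off limits, while the rest of the site
remains open to crawling.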

Using the robots.txt file to tell search engine spiders they MAY
visit your site

Believe it or not, the robots.txt file can be used both to
disallow and to allow search engine spiders to crawl your site.
Here's how to create and upload such a file.

Creating the file

Open up Notepad and copy and paste in the following:

User-agent: *
Disallow:

You'll notice that the only difference between this and the
earlier example is that Disallow: is not followed by /. If it
were, that would tell spiders to go away. An empty Disallow:
line blocks nothing, so spiders may crawl the whole site. Once
again, save the file as robots.txt.
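
If you would rather generate the file than type it, a small
script can do it. The sketch below is one possible approach, not
a standard tool: build_robots_txt is a made-up helper that
writes an empty Disallow: line when given no folders to block,
and the result is checked with Python's standard
urllib.robotparser module using the made-up spider name
"ExampleBot".

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths):
    """Return robots.txt text for all spiders (hypothetical helper).

    An empty list produces a bare "Disallow:" line, which blocks nothing.
    """
    lines = ["User-agent: *"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


allow_all_text = build_robots_txt([])

parser = RobotFileParser()
parser.parse(allow_all_text.splitlines())

# "ExampleBot" is a made-up spider name used only for illustration.
home_ok = parser.can_fetch("ExampleBot", "http://yoursite.com/")
print(allow_all_text)
print(home_ok)
```

Passing a list such as ["/news/"] instead of the empty list
produces the blocking form of the file.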

Uploading the file

As before, upload the robots.txt file to the top-level folder
of your site, right alongside the index file, so that it is
reachable at yoursite.com/robots.txt. That one file covers the
whole site. And you're done.

Creating and uploading a robots.txt file to tell spiders where
they may and may not go is fast and easy. So what are you
waiting for? Create and upload that file now!

