robots.txt is redirecting to default page
Hullo,
Typically, if I type into my address bar, "oneofmysites.com/robots.txt", any browser will display the content of robots.txt. As you can see, this is pretty standard behaviour.
I have just one web server which does not. Instead, robots.txt redirects to the default web page (i.e. "thesiteinquestion.com/"). This notable difference (only one of seven sites) worries me.
Questions: Is this something to be concerned about? If so, what is the likely error that I am missing?
Notes:
- This site is the only one with a separate service provider that I use.
- CentOS release 6.10 (Final)
- Webmin
- robots.txt file permissions are 644
Tags: redirect, robots.txt
asked 4 hours ago by Parapluie
2 Answers
It depends on the server configuration; .txt files may not be allowed. There may be a rule somewhere in the config or in an .htaccess file that redirects any URL not matching a certain pattern (say .html, .php, .htm, etc.) to the index page of the web root.
answered 4 hours ago by Serge Rivest
Well, blue blistering barnacles! You are right. And I did it to myself with this rewrite: RewriteRule .(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]. I did this to prevent direct access, but I forgot that I had added txt files as well. Comment it out, and it works in a trice. Question: is there any way to conditionally exclude files (this robots.txt file, in particular) from a rewrite?
– Parapluie
1 hour ago
Wishing I could upvote this twice!
– Parapluie
1 hour ago
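On the follow-up question about excluding robots.txt from the rewrite: one common approach is to put a RewriteCond in front of the rule. The following is a minimal sketch, not a tested configuration for this server. It assumes Apache mod_rewrite (which the RewriteRule quoted in the comment above suggests), and it escapes the leading dot as \. on the assumption that a literal dot before the extension was intended.

```apache
RewriteEngine On

# Assumption: this sits wherever the original rule lives
# (virtual host config or .htaccess; the thread does not say which).

# Only apply the next rule when the request is NOT for /robots.txt.
RewriteCond %{REQUEST_URI} !^/robots\.txt$

# The rule from the comment above, with the dot escaped (assumption).
RewriteRule \.(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]
```

A RewriteCond applies only to the RewriteRule that immediately follows it, so any other rules are unaffected. REQUEST_URI includes the leading slash, so the same condition should behave the same way in a virtual host and in an .htaccess file.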
To add a bit of information: the web provider is not obliged to respect the robots.txt standard, so it can do whatever it wants with the file; as Serge said, it can be redirected anywhere.
answered 4 hours ago by yagmoth555♦
The "web provider" is not forced to respect the standard? Am I misunderstanding? Do you mean the crawler?
– Parapluie
1 hour ago
@Parapluie I mean the host is not forced to follow the robots.txt standard, so the crawler must adapt to such a case.
– yagmoth555♦
1 hour ago
That is interesting and germane. Thankfully, I have full access to the config in this case (even though my having access was the problem in the first place, at least I can fix it!) Thanks!
– Parapluie
1 hour ago