robots.txt is redirecting to default page

Hullo,

Typically, if I type "oneofmysites.com/robots.txt" into my address bar, any browser will display the content of robots.txt. This is pretty standard behaviour.

I have just one web server which does not. Instead, robots.txt redirects to the default web page (i.e. "thesiteinquestion.com/"). This notable difference (only one of seven sites) worries me.

Questions: Is this something to be concerned about? If so, what is the likely error that I am missing?

Notes:

  • This site is the only one with a separate service provider that I use.
  • CentOS release 6.10 (Final)
  • Webmin
  • robots.txt file permissions are 644


Tags: redirect, robots.txt

asked 4 hours ago by Parapluie

2 Answers

It depends on the server configuration; .txt files may not be allowed. There may be a rule somewhere in the config, or in some .htaccess file, specifying that any URL which doesn't match a certain pattern (say .html, .php, .htm, etc.) is redirected to the index page of the web root.

answered 4 hours ago by Serge Rivest
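
As an illustration of the kind of rule this answer describes, here is a minimal sketch for an Apache .htaccess file. It is hypothetical and not taken from the asker's server: the extension list and the /index.php target are assumptions chosen for the example.

```apache
RewriteEngine On

# Hypothetical catch-all: any request whose path does not end in
# .html, .htm or .php is redirected to the index page of the web root.
# A request for /robots.txt fails the extension test, so it is redirected too.
RewriteCond %{REQUEST_URI} !\.(html?|php)$ [NC]
RewriteRule ^ /index.php [R=302,L]
```

With a rule of this shape in place, robots.txt can exist on disk with correct permissions and still never be served.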


• Well, blue blistering barnacles! You are right. And I did it to myself with this rewrite: RewriteRule .(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]. I did this to prevent direct access, but I forgot that I had added txt files as well. Comment it out, and it works in a trice. Question: is there any way to conditionally exclude files (this robots.txt file, in particular) from a rewrite?

  – Parapluie
  1 hour ago

• Wishing I could upvote this twice!

  – Parapluie
  1 hour ago
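
On the follow-up question in the comment above about excluding robots.txt from the rewrite: one common approach with mod_rewrite is to put a RewriteCond in front of the rule. The following is a sketch, assuming the rule lives in the site's .htaccess file and that the dot in the quoted pattern was originally escaped (the backslash is an assumption; the comment shows the dot unescaped).

```apache
RewriteEngine On

# Skip the rule below when the request is exactly /robots.txt.
# A RewriteCond applies only to the RewriteRule that immediately follows it.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule \.(gif|jpg|js|txt)$ https://www.thesiteinquestion.com/index.php [L]
```

If no other .txt files need protecting, simply dropping txt from the extension list would likely have the same effect with less configuration.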


To add a bit of information: the web provider is under no obligation to respect the robots.txt standard, so it can do whatever it wants with that URL and, as Serge said, it can be redirected anywhere.

answered 4 hours ago by yagmoth555

• The "web provider" is not forced to respect the standard? Am I misunderstanding? Do you mean the crawler?

  – Parapluie
  1 hour ago

• @Parapluie I mean the host is not forced to follow the robots.txt standard, so the crawler must adapt to such a case.

  – yagmoth555
  1 hour ago

• That is interesting and germane. Thankfully, I have full access to the config in this case (even though my having access was the problem in the first place, at least I can fix it!). Thanks!

  – Parapluie
  1 hour ago
