However, some sites do not return sitemap.xml to every request. In this article, we will check how to work around this problem.
Using Postman
Postman is a convenient tool for checking how an API behaves, but this time we will use it to check whether sitemap.xml can be retrieved from www.sitecore.com.
First of all, what does it look like when accessed with a browser? The XML data will be displayed on the screen as shown below.
Next, check whether this sitemap.xml can be obtained with a plain HTTP GET. Start Postman, enter the URL of the file, and send a GET request. You will see the following screen.
In this case, the XML file was not returned. To fix this, open Headers and add an entry with User-Agent as the Key and sitecorebot as the Value. This sets the User-Agent that the crawler sends when it accesses the server.
With this header set, the XML data was retrieved.
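The same check can be scripted outside Postman. The following is only a minimal sketch using Python's standard library, not part of the original procedure. The exact sitemap path (`https://www.sitecore.com/sitemap.xml`) and the helper names `build_request` and `fetch_status` are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Assumed location of the file; the article only names the host www.sitecore.com.
SITEMAP_URL = "https://www.sitecore.com/sitemap.xml"
# User-Agent value entered in Postman in the steps above.
CRAWLER_USER_AGENT = "sitecorebot"


def build_request(url, user_agent=None):
    """Build a GET request, adding a User-Agent header only when one is given.

    (Hypothetical helper.) If user_agent is None, urllib sends its own
    default User-Agent ("Python-urllib/<version>") when the request is opened.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(url, user_agent=None, timeout=10):
    """Send the request and return (HTTP status, first 200 bytes of the body).

    (Hypothetical helper.) HTTP error responses such as 403 are returned as a
    status code instead of being raised; network failures still raise URLError.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()[:200]
    except urllib.error.HTTPError as err:
        return err.code, b""
```

Calling `fetch_status(SITEMAP_URL)` and then `fetch_status(SITEMAP_URL, CRAWLER_USER_AGENT)` lets you compare the two responses in the same way as the two Postman requests. How the server responds to each request depends on its configuration at the time you run it.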
By using this technique, you can confirm in advance whether the Sitecore Search crawler can retrieve the sitemap.xml file. Of course, some sites return the file regardless of the User-Agent, but in this case the User-Agent is required, so checking beforehand showed us what to watch for when retrieving the data.
Summary
This tip is not specific to Sitecore Search, but we confirmed that setting the User-Agent when retrieving the data allows the request to succeed.