Tailwind Logo

Sitecore Search - Retrieve site content

Search

Published: 2023-08-09

Sitecore Search is a crawl-based search engine. In this article, we will use the source to register data to be retrieved for search by Search.

Content Update

Please check the latest information on the following page


Adding Sources

The plug icon on the left side of the administration screen is the settings screen where you can enter information for crawling.

SearchAddSource00.png

When you click on the button, you will initially see a blank source screen. Click on the Add Source button in the upper right corner.

SearchAddSource01.png

We will set up resources related to the sources to be added. In this case, we will include the following data

Name

Value

Source name

haramizucom

Description

haramizu.com

Connector

Web Crawler ( Advanced )

SearchAddSource02.png

Click Save to save the new source. You will be asked to set what kind of data you want to import for this source.

SearchAddSource03.png

Web Crawler Settings

We will configure the crawler behavior. First, it is important to note that when crawling by URL, there is an "Allowed Domain" field. If this field is left blank, the crawler will crawl external links on the crawled page. In this case, we will set "haramizu.com" since it is only within the site.

SearchAddSource04.png

Next, configure the settings for the crawler's agent. This is set in the Header field, this time with the name "sitecorebot".

SearchAddSource09.png

Available Locales

Since this blog is only in Japanese, set the crawl language as Japanese.

SearchAddSource05.png

Triggers

For triggers, we will use RSS feeds, but you can also choose to use sitemap.xml, etc. Please look at your site's data to determine what trigger to set if necessary.

SearchAddSource06.png

Document Extractors

The next setting is Document Extractors, which translates to document extraction tool settings. In other words, we will set which values the crawler will retrieve.

In this case, we will use the Xpath items that are deployed by default.

SearchAddSource07.png

If you go to Tagger.

SearchAddSource08.png

Launch Xpath Helper, see our previous blog post on Xpath Helper.

The Tagger set up is as follows

Attribute

Rule type

Rule value

description

Expression

//meta[@property='og:description']/@content

image_url

Expression

//meta[@property='og:image']/@content

name

Expression

//meta[@property='og:title']/@content

type

Expression

//meta[@property='og:type']/@content

url

Expression

//meta[@property='og:url']/@content

Check if the Rule Value can be retrieved on the page using the Xpath Finder. The screen below shows the screen where the data is checked using the Description.

SearchAddSource10.png

Start of crawl

After completing the above settings, click the Publish button. A dialog box will appear as shown below.

SearchAddSource11.png

Wait for a while for the data to be added to the crawler's queue.

SearchAddSource12.png

After a short time, the crawl is completed and the data is stored in Search.

SearchAddSource13.png

Summary

In this issue, we have proceeded to the crawler setup. In the next issue, we would like to check how the imported contents are stored.

Tags