In the AI Chatbot creation process, click the "Import" button on the "Select Your Own Information" page, then open the URL Crawler tab. There you can insert a single root URL (or main URL), and Captivate Chat will use a web crawler to gather all URLs located inside that website, so you no longer need to find them manually.
Using the URL Crawler
To use the URL Crawler feature, insert the root URL (or main URL) you want to use as the source for the crawl in the box provided and click the "Submit" button. Captivate Chat will deploy a web crawler to look for all URLs nested under this source.
After you click "Submit" under "Import Your Own Information" > Import > URL Crawler, Captivate Chat lists all URLs nested under your root URL in the "Start deep crawl" pop-up window. You can then select the ones you want your AI Chatbot to ingest and click "Import Selected" to proceed.
You can scroll down to choose which URLs you want your AI Chatbot to ingest, but you can also just click the checkbox beside the "URL" column name to select all the URLs listed in the crawl.
Wildcard Character
If you want to make more precise searches with the URL Crawler, you can add a wildcard character, an asterisk (*), after the last forward slash of your root URL. This makes the Captivate Chat web crawler look for all relevant URLs within that website layer.
This tells the Captivate Chat web crawler to retrieve only the URLs of that specific website layer. The Captivate Chat URL Crawler can retrieve from up to two (2) layers within a website.
If you use only one wildcard character or asterisk (*) in the format url.com/abc/* in the URL Crawler, it retrieves only the URLs one layer beneath your root URL.
As a more technical example, if your website has this kind of arrangement:
/abc/def/123
/abc/def/123/456
/abc/ghi/123/456
/abc/jkl/
Then, inputting /abc/* in the URL Crawler will only retrieve:
/abc/def/
/abc/ghi/
/abc/jkl/
This is because we only crawled one layer downward.
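The single-wildcard behavior can be illustrated with a short Python sketch. This is only a model of the rule described here, not Captivate Chat's actual crawler; the function name one_layer_below and the site list are hypothetical and exist purely for illustration.

```python
def one_layer_below(pattern: str, paths: list[str]) -> list[str]:
    """Model of the single-wildcard rule: return the distinct
    paths exactly one layer beneath the root in `pattern`.

    Hypothetical helper, not part of Captivate Chat.
    """
    prefix = pattern.rstrip("*")  # "/abc/*" -> "/abc/"
    if not prefix.endswith("/"):
        prefix += "/"

    found: list[str] = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # page is not under the root URL
        # Keep only the first path segment after the root.
        first_segment = path[len(prefix):].split("/", 1)[0]
        if first_segment:
            layer = prefix + first_segment + "/"
            if layer not in found:
                found.append(layer)
    return found


# The example arrangement used in this article.
site = [
    "/abc/def/123",
    "/abc/def/123/456",
    "/abc/ghi/123/456",
    "/abc/jkl/",
]

print(one_layer_below("/abc/*", site))
# ['/abc/def/', '/abc/ghi/', '/abc/jkl/']
```

Deeper pages such as /abc/def/123/456 collapse into their first layer, /abc/def/, which matches the result listed for /abc/*.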
If you use two wildcard characters or asterisks (**) in the format url.com/abc/** in the URL Crawler, it retrieves all the URLs under your root URL, including those in lower layers.
As a more technical example, if your website has this kind of arrangement:
/abc/def/123
/abc/def/123/456
/abc/ghi/123/456
/abc/jkl/
Then, inputting /abc/def/** in the URL Crawler will only retrieve:
/abc/def/123
/abc/def/123/456
This is because we allowed the URL Crawler to look for all pages under /abc/def/, including lower layers.
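The double-wildcard behavior can be modeled in the same spirit. Again, this is a minimal sketch of the rule as described in this article, not Captivate Chat's implementation; all_layers_below and the site list are hypothetical names used only for illustration.

```python
def all_layers_below(pattern: str, paths: list[str]) -> list[str]:
    """Model of the double-wildcard rule: return every path
    beneath the root in `pattern`, at any depth.

    Hypothetical helper, not part of Captivate Chat.
    """
    prefix = pattern.rstrip("*")  # "/abc/def/**" -> "/abc/def/"
    if not prefix.endswith("/"):
        prefix += "/"
    # Keep pages strictly under the root, excluding the root itself.
    return [p for p in paths if p.startswith(prefix) and len(p) > len(prefix)]


# The example arrangement used in this article.
site = [
    "/abc/def/123",
    "/abc/def/123/456",
    "/abc/ghi/123/456",
    "/abc/jkl/",
]

print(all_layers_below("/abc/def/**", site))
# ['/abc/def/123', '/abc/def/123/456']
```

Unlike the single-wildcard case, nothing is collapsed to one layer: every page under /abc/def/ is returned as-is.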
After choosing the URLs you want to ingest, click "Import Selected" to proceed.