URL Crawler
With the URL Crawler, you only need to enter one root URL (or main URL). Captivate Chat then uses a web crawler to gather all URLs located inside that website, so there is no need to find them manually.
Use Direct Input if you have a specific set of URLs to ingest
If you want your AI Chatbot to ingest and learn from all available URLs within a website, the URL Crawler is a convenient tool to use!
However, you should use Import > Direct Input if you only have a specific set of URLs you want your AI Chatbot to ingest.
The Start deep crawl option appears after you enter your root URL. It lists all the URLs available under that main link.
You can scroll down to choose which URLs you want your AI Chatbot to ingest, but you can also just click the checkbox beside the "URL" column name to select all the URLs listed in the crawl.
If you want to make more precise searches with the URL Crawler, you can add a wildcard character, the asterisk (*), at the end of your root URL.
This tells the Captivate Chat web crawler to retrieve only the URLs at that specific website layer. The Captivate Chat URL Crawler can retrieve from up to two (2) layers within a website. Using url.com/abc as the example root URL:
url.com/abc/*
If you use only one wildcard character, or asterisk (*), in the format url.com/abc/* in the URL Crawler, then you should be able to retrieve all URLs one layer below your root URL.
As a more technical example, if your website has this kind of arrangement:
/abc/def/123
/abc/def/123/456
/abc/ghi/123/456
/abc/jkl/
Then, inputting /abc/* in the URL Crawler will only retrieve:
/abc/def/
/abc/ghi/
/abc/jkl/
These are the only results because the crawl went just one layer downward.
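The one-layer rule above can be sketched in Python. This is only an illustration of the matching behaviour as this page describes it, not Captivate Chat's actual crawler; the function name `one_layer_below` and the `site` list are hypothetical and exist only for this example.

```python
def one_layer_below(pattern: str, paths: list[str]) -> list[str]:
    """Return the distinct sections exactly one layer below the root.

    Hypothetical helper: mimics the single-wildcard behaviour described
    on this page (root URL followed by one asterisk).
    """
    if not pattern.endswith("/*"):
        raise ValueError("pattern must end with a single '/*'")
    root = pattern[:-1]  # "/abc/*" -> "/abc/"
    found: list[str] = []
    for path in paths:
        if not path.startswith(root):
            continue  # not under the root URL
        # Keep only the first path segment after the root.
        first_segment = path[len(root):].split("/", 1)[0]
        if not first_segment:
            continue  # the path is the root itself
        layer = root + first_segment + "/"
        if layer not in found:
            found.append(layer)  # preserve first-seen order, no duplicates
    return found


# Example site layout taken from this page.
site = [
    "/abc/def/123",
    "/abc/def/123/456",
    "/abc/ghi/123/456",
    "/abc/jkl/",
]
print(one_layer_below("/abc/*", site))
```

With the example layout, deeper pages such as /abc/def/123/456 are collapsed into their first-level section, which is why only three entries come back.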
This is useful for retrieving all pages within a single website section, such as the specific subcategories within a category of products.
For instance, the URL Crawler can give you all brands (subcategories) of laptops (category) in an electronics store.
url.com/abc/**
If you use two wildcard characters, or asterisks (**), in the format url.com/abc/** in the URL Crawler, then you should be able to retrieve all the URLs under your root URL.
As a more technical example, if your website has this kind of arrangement:
/abc/def/123
/abc/def/123/456
/abc/ghi/123/456
/abc/jkl/
Then, inputting /abc/def/** in the URL Crawler will only retrieve:
/abc/def/123
/abc/def/123/456
These are returned because the URL Crawler was allowed to look for all pages under /abc/def/, including lower layers.
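As a rough sketch, the double-wildcard rule can be modelled in Python as a simple prefix match. This illustrates the behaviour described on this page under that assumption and is not Captivate Chat's actual implementation; the function name `all_layers_below` and the `site` list are hypothetical.

```python
def all_layers_below(pattern: str, paths: list[str]) -> list[str]:
    """Return every path under the root, at any depth.

    Hypothetical helper: mimics the double-wildcard behaviour described
    on this page (root URL followed by two asterisks).
    """
    if not pattern.endswith("/**"):
        raise ValueError("pattern must end with '/**'")
    root = pattern[:-2]  # "/abc/def/**" -> "/abc/def/"
    # Keep anything that starts with the root and is longer than it,
    # so the root page itself is not reported as a result.
    return [p for p in paths if p.startswith(root) and len(p) > len(root)]


# Example site layout taken from this page.
site = [
    "/abc/def/123",
    "/abc/def/123/456",
    "/abc/ghi/123/456",
    "/abc/jkl/",
]
print(all_layers_below("/abc/def/**", site))
```

Unlike the single-wildcard case, nothing is collapsed here: every page below /abc/def/ is kept, however deep it sits.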
This is useful for retrieving specific sub-types of a subcategory within a category of products.
For instance, the URL Crawler can give you all models (sub-type), or even models of a specific year (deeper sub-type), of a brand (subcategory) of laptop (category) in an electronics store.
To use the URL Crawler feature, simply place a root URL in the box provided and click the button.
After choosing the URLs you want to ingest, click to proceed.