URL Crawler

With the URL Crawler, you can enter a single root URL (or main URL) and Captivate Chat will use a web crawler to gather all URLs located within that website - no need to find them manually.

Use Direct Input if you have a specific set of URLs to ingest

If you want your AI Chatbot to ingest and learn from all available URLs within a website, the URL Crawler is a convenient tool to use!

However, you should use Import > Direct Input if you only have a specific set of URLs you want your AI Chatbot to ingest.

Using the URL Crawler

To use the URL Crawler feature, insert the root URL or main URL you want to use as a source for the crawl. Captivate Chat will deploy a web crawler to look for all URLs nested under this source.

To use the URL Crawler feature, place a root URL in the box provided and click "Submit".

After you click "Submit" under "Import Your Own Information" > Import > URL Crawler, Captivate Chat lists all URLs nested under your root URL in the "Start deep crawl" pop-up window. You can then select the ones you want your AI Chatbot to ingest and click "Import Selected" to proceed.

The "Start deep crawl" window appears after you submit your root URL. It lists all the URLs available under that main link.

You can scroll down to choose which URLs you want your AI Chatbot to ingest, but you can also just click the checkbox beside the "URL" column name to select all the URLs listed in the crawl.

Wildcard Character

If you want to make more precise searches with the URL Crawler, you can add a wildcard character, the asterisk (*), to the end of your root URL.

This tells the Captivate Chat web crawler to retrieve only the URLs of a specific website layer. The Captivate Chat URL Crawler can retrieve from up to two (2) layers within a website. The two wildcard formats work as follows:

1 Wildcard Character: `url.com/abc/*`

Using a single wildcard character or asterisk (*) in the Captivate Chat URL Crawler retrieves only the URLs one layer beneath the specified root URL.

If you use only 1 Wildcard Character or asterisk (*) in the format url.com/abc/* in the URL Crawler, the crawler retrieves all URLs one layer below your root URL.

As a more technical example, if your website has this kind of arrangement:

/abc/def/123

/abc/def/123/456

/abc/ghi/123/456

/abc/jkl/

Then, inputting /abc/* in the URL Crawler retrieves only:

/abc/def/

/abc/ghi/

/abc/jkl/

This is because the crawl went only one layer downward.
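The single-wildcard behaviour described above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not Captivate Chat's actual crawler: the function name `one_layer_below` and the variable `site` are hypothetical, and the sketch filters a fixed list of paths instead of fetching a live website.

```python
def one_layer_below(pattern: str, paths: list[str]) -> list[str]:
    """Return the distinct URLs exactly one layer beneath the root in `pattern`.

    Hypothetical helper: assumes the pattern ends with "/*", as in "/abc/*".
    """
    if not pattern.endswith("/*"):
        raise ValueError("expected a pattern ending in a single wildcard, e.g. /abc/*")
    prefix = pattern[:-1]  # "/abc/*" -> "/abc/"
    found: list[str] = []
    for path in paths:
        if not path.startswith(prefix):
            continue  # page is outside the root URL
        first_segment = path[len(prefix):].split("/", 1)[0]
        if not first_segment:
            continue  # the path is the root itself
        candidate = prefix + first_segment + "/"
        if candidate not in found:
            found.append(candidate)  # keep each section once, in order
    return found


# The example arrangement from the text above.
site = ["/abc/def/123", "/abc/def/123/456", "/abc/ghi/123/456", "/abc/jkl/"]
print(one_layer_below("/abc/*", site))
# → ['/abc/def/', '/abc/ghi/', '/abc/jkl/']
```

Deeper pages such as /abc/def/123/456 are collapsed into their first-layer section, which matches the three results listed above.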

This is useful for retrieving all pages within a single website section, such as the specific subcategories within a category of products.

For instance, the URL Crawler can give you all brands (subcategories) of laptop (category) in an electronics store.

2 Wildcard Characters: `url.com/abc/**`

If you use 2 Wildcard Characters or asterisks (**) in the format url.com/abc/** in the URL Crawler, the crawler retrieves all the URLs under your root URL, including those in lower layers.

As a more technical example, if your website has this kind of arrangement:

/abc/def/123

/abc/def/123/456

/abc/ghi/123/456

/abc/jkl/

Then, inputting /abc/def/** in the URL Crawler retrieves only:

/abc/def/123

/abc/def/123/456

This is because the URL Crawler was allowed to look for all pages under /abc/def/, including lower layers.
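The double-wildcard rule can be sketched the same way. Again, this is a minimal illustration under stated assumptions, not the product's implementation: `everything_below` and `site` are hypothetical names, and the code filters a fixed list of paths instead of crawling a website.

```python
def everything_below(pattern: str, paths: list[str]) -> list[str]:
    """Return every path nested under the root in `pattern`, at any depth.

    Hypothetical helper: assumes the pattern ends with "/**", as in "/abc/def/**".
    """
    if not pattern.endswith("/**"):
        raise ValueError("expected a pattern ending in two wildcards, e.g. /abc/def/**")
    prefix = pattern[:-2]  # "/abc/def/**" -> "/abc/def/"
    # Keep pages that sit under the prefix, excluding the root page itself.
    return [p for p in paths if p.startswith(prefix) and len(p) > len(prefix)]


# The example arrangement from the text above.
site = ["/abc/def/123", "/abc/def/123/456", "/abc/ghi/123/456", "/abc/jkl/"]
print(everything_below("/abc/def/**", site))
# → ['/abc/def/123', '/abc/def/123/456']
```

Unlike the single wildcard, no depth limit is applied here, so both /abc/def/123 and the deeper /abc/def/123/456 are returned.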

This is useful for retrieving specific sub-types of a subcategory within a category of products.

For instance, the URL Crawler can give you all models (sub-type), or even the models of a specific year (deeper sub-type), of a brand (subcategory) of laptop (category) in an electronics store.

After choosing the URLs you want to ingest, click "Import Selected" to proceed.


