What Is Scraping Camel AI?#

Scraping Camel AI is a feature of the Scraping Camel extension that uses artificial intelligence to process and analyse website content automatically. Instead of requiring manually defined rules for each element, it evaluates the page text on its own and writes the results into the built-in AI elements. The feature is configured automatically; only the parameters of the keyword-generating elements can be adjusted.

Overview of AI elements in Scraping Camel#

Scraping Camel has six AI elements. To turn their generation on or off for a specific project (website), open the Elements tab, click Edit elements, and check or uncheck the individual elements.

SC_NUMBER_OF_WORDS (Page word count)#

The SC_NUMBER_OF_WORDS element contains a numeric value representing the number of words the algorithm found on the page after removing HTML code and non-essential content. It helps identify pages with a very low word count (e.g. under 50 words), which may be broken or could be considered low-quality or suspicious by search engines. It also reveals pages with an unusually high word count, where a faulty text import or processing error may have occurred.
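
As a rough illustration of how this value might be checked once it has been exported, the sketch below flags pages whose word count falls outside a chosen range. The row layout (a `URL` key next to `SC_NUMBER_OF_WORDS`), the helper name `flag_word_counts`, and the upper limit of 5000 words are assumptions made for the example, not part of Scraping Camel; only the 50-word lower bound comes from the text above.

```python
def flag_word_counts(rows, low=50, high=5000):
    """Split pages into suspiciously short and suspiciously long ones.

    `rows` is an iterable of dicts, one per page. The "URL" key and the
    `high` default are illustrative assumptions.
    """
    too_short, too_long = [], []
    for row in rows:
        try:
            words = int(row["SC_NUMBER_OF_WORDS"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric value
        if words < low:
            too_short.append(row["URL"])
        elif words > high:
            too_long.append(row["URL"])
    return too_short, too_long


# Hypothetical sample data standing in for rows of the exported feed.
pages = [
    {"URL": "https://example.com/a", "SC_NUMBER_OF_WORDS": "12"},
    {"URL": "https://example.com/b", "SC_NUMBER_OF_WORDS": "640"},
    {"URL": "https://example.com/c", "SC_NUMBER_OF_WORDS": "48210"},
]
short_pages, long_pages = flag_word_counts(pages)
```

A sensible upper limit depends on the site, so `high` is best tuned against pages that are known to be correct.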

SC_MAIN_TITLE (Machine-generated title)#

The SC_MAIN_TITLE element contains a short text description of the page generated based on multiple signals from the content. It is similar to the H1 and TITLE elements, but is written in lowercase, should not contain the website’s brand name, should be descriptive of the specific page, and should always be present. Compared to H1, it is more resistant to technical errors and invalid content. It should be unique to the page and not contain text that is common across the entire website.
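
The expectations listed above (always present, lowercase, no brand name, unique per page) can be checked mechanically on exported data. The following is a minimal sketch under stated assumptions: the rows are dicts with a hypothetical `URL` key, `check_main_titles` is an illustrative helper name, and the brand name is supplied by the caller.

```python
from collections import Counter


def check_main_titles(rows, brand):
    """Return {url: [problems]} for titles that break the expectations.

    `rows` is a list of dicts; the "URL" key is an illustrative assumption.
    """
    titles = [(row.get("SC_MAIN_TITLE") or "").strip() for row in rows]
    seen = Counter(t for t in titles if t)  # how often each title occurs
    report = {}
    for row, title in zip(rows, titles):
        problems = []
        if not title:
            problems.append("missing")
        else:
            if title != title.lower():
                problems.append("not lowercase")
            if brand.lower() in title.lower():
                problems.append("contains brand")
            if seen[title] > 1:
                problems.append("duplicate")
        if problems:
            report[row["URL"]] = problems
    return report


# Hypothetical sample rows and brand name.
pages = [
    {"URL": "/a", "SC_MAIN_TITLE": "red running shoes"},
    {"URL": "/b", "SC_MAIN_TITLE": "Acme blue jacket"},
    {"URL": "/c", "SC_MAIN_TITLE": ""},
    {"URL": "/d", "SC_MAIN_TITLE": "red running shoes"},
]
report = check_main_titles(pages, brand="Acme")
```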

SC_DETECTED_LANGUAGE (Detected language)#

The SC_DETECTED_LANGUAGE element contains the language code (e.g. cs, sk, pl) of the language in which the page text is written. It helps detect localisation errors, such as untranslated products from foreign suppliers or errors following the launch of a new language version of a website.
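
One way such localisation errors might be surfaced is to compare the detected code with the language each URL is expected to be in. The sketch below assumes that language versions live under path prefixes such as `/sk/` and `/pl/`; that URL scheme, the `URL` key, and both helper names are hypothetical and need adapting to the real site structure.

```python
from urllib.parse import urlparse


def expected_language(url, prefix_map, default):
    """Guess the intended language from the URL path (assumed convention)."""
    path = urlparse(url).path
    for prefix, lang in prefix_map.items():
        if path.startswith(prefix):
            return lang
    return default


def find_language_mismatches(rows, prefix_map, default):
    """Return (url, expected, detected) for pages in the wrong language."""
    mismatches = []
    for row in rows:
        detected = (row.get("SC_DETECTED_LANGUAGE") or "").strip().lower()
        expected = expected_language(row["URL"], prefix_map, default)
        if detected and detected != expected:
            mismatches.append((row["URL"], expected, detected))
    return mismatches


# Hypothetical sample rows; the second page sits in the Polish section
# but its text was detected as Czech.
pages = [
    {"URL": "https://example.com/sk/topanky", "SC_DETECTED_LANGUAGE": "sk"},
    {"URL": "https://example.com/pl/buty", "SC_DETECTED_LANGUAGE": "cs"},
    {"URL": "https://example.com/boty", "SC_DETECTED_LANGUAGE": "cs"},
]
mismatches = find_language_mismatches(
    pages, {"/sk/": "sk", "/pl/": "pl"}, default="cs"
)
```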

Keyword-generating elements#

The parameters of keyword-generating elements can be adjusted on the Settings → AI Settings page. There you can set, for example, the maximum number of generated keywords, a score threshold (the minimum percentage relevance a word must reach to be included in the keyword selection), the minimum keyword length, and rules for processing headings and numbers. You can also define stop words (terms that the AI should ignore completely when generating keywords) and phrases (multi-word expressions that the algorithm should treat as a single keyword).
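
To make the interplay of these parameters more concrete, here is a simplified sketch of a keyword selection that honours a maximum count, a score threshold, a minimum length, stop words, and phrases. It is not Scraping Camel's actual algorithm: the function name `select_keywords`, the parameter names, and the scoring rule (frequency as a percentage of the most frequent term) are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def select_keywords(text, max_keywords=10, score_threshold=5.0,
                    min_length=3, stop_words=(), phrases=()):
    """Pick keywords by relative frequency (illustrative scoring only)."""
    lowered = text.lower()
    # Join each configured phrase with underscores so it counts as one token.
    for phrase in phrases:
        p = phrase.lower()
        lowered = lowered.replace(p, p.replace(" ", "_"))
    tokens = re.findall(r"\w+", lowered)
    stop = {w.lower() for w in stop_words}
    candidates = [t for t in tokens if len(t) >= min_length and t not in stop]
    counts = Counter(candidates)
    if not counts:
        return []
    top = max(counts.values())
    # Score = frequency as a percentage of the most frequent candidate.
    scored = [(tok.replace("_", " "), 100.0 * n / top)
              for tok, n in counts.most_common()]
    kept = [kw for kw, score in scored if score >= score_threshold]
    return kept[:max_keywords]


text = ("PPC advertising guide. PPC advertising helps shops. "
        "The guide covers PPC advertising and the budget.")
keywords = select_keywords(
    text,
    max_keywords=3,
    score_threshold=50,
    min_length=3,
    stop_words=["the", "and"],
    phrases=["PPC advertising"],
)
```

In this toy input the phrase occurs three times and "guide" twice, so only those two pass the 50 % threshold; every other word appears once and is dropped.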

SC_WORDS_COUNT#

A list of single-word keywords separated by commas. They give a quick indication of what the page is about.

SC_WORDS_TUPLES_COUNT#

A list of two-word keywords (e.g. “PPC advertising”) separated by commas. They capture the specific topic of a page more accurately.
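
Because both elements hold comma-separated lists, their values usually need to be split before further processing. A minimal sketch, assuming the values arrive as plain strings; the sample values and the helper name `parse_keywords` are made up for the example:

```python
def parse_keywords(value):
    """Turn a comma-separated element value into a clean list of keywords."""
    return [kw.strip() for kw in (value or "").split(",") if kw.strip()]


# Hypothetical values for one page.
row = {
    "SC_WORDS_COUNT": "advertising, ppc, campaign",
    "SC_WORDS_TUPLES_COUNT": "ppc advertising,campaign budget, ",
}
single = parse_keywords(row["SC_WORDS_COUNT"])
pairs = parse_keywords(row["SC_WORDS_TUPLES_COUNT"])
# Defensive check: keep only entries that really consist of two words.
pairs = [p for p in pairs if len(p.split()) == 2]
```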

SC_WORDS_AGG_MIN_FREQ_3#

The most advanced and accurate of the three keyword elements. It combines the results of the previous analyses with several other internal methods to generate the most relevant single-word and multi-word expressions.

Using data from AI elements#

The values of AI elements are available in the output CSV feed, which can be further processed in other tools. In Mergado, you can browse the data in detail, create queries from it, use it for data-driven SEO and reporting, or integrate it directly into PPC campaigns. Via the API, the data is also available for tools such as Mergado Marketing Buddy.
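
For processing outside Mergado, the CSV feed can be read with any standard CSV library. The sketch below uses Python's `csv` module on an in-memory sample. The column names (element names used as headers, plus a `URL` column) and the comma delimiter are assumptions; check them against the actual feed before relying on this.

```python
import csv
import io
from collections import Counter


def load_feed(handle, delimiter=","):
    """Read the feed into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# In-memory stand-in for the exported feed (hypothetical columns and data).
sample = io.StringIO(
    "URL,SC_DETECTED_LANGUAGE,SC_NUMBER_OF_WORDS\n"
    "https://example.com/a,cs,320\n"
    "https://example.com/b,sk,45\n"
    "https://example.com/c,cs,1280\n"
)
rows = load_feed(sample)

# Example report: number of pages per detected language.
pages_per_language = Counter(r["SC_DETECTED_LANGUAGE"] for r in rows)
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.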

FAQ#

What is Scraping Camel AI?#

It is a set of Scraping Camel extension features that use artificial intelligence to analyse website content automatically and write the results into predefined AI elements, without the need to set rules manually or define what the extension should look for.

What can Scraping Camel AI detect from a website?#

It automatically detects the language of a page, counts the words on it, and produces a machine-generated title describing the page content. It also extracts keywords: single-word, two-word, and a combined selection of the most relevant expressions.

How do I enable or disable AI elements?#

On the Elements tab within a specific project, click Edit elements and activate or deactivate the individual AI elements by checking or unchecking them.

What is the SC_NUMBER_OF_WORDS element for?#

It contains the word count on a page after HTML code and non-essential content have been removed. It helps identify pages with too few words, which may be considered low-quality by search engines, as well as pages with an unusually high word count, where a faulty text import may have occurred.

What is SC_MAIN_TITLE and how does it differ from H1 or TITLE?#

It is a machine-generated page title based on multiple signals from the content. Unlike H1 and TITLE, it is always written in lowercase, does not include the website’s brand name, and is more resistant to technical errors and invalid content. It should be unique to each page.

What is the SC_DETECTED_LANGUAGE element for?#

It automatically detects the language of the text on a page and returns its code (e.g. cs, sk, pl). It helps detect localisation errors, such as untranslated products from a foreign supplier or pages that remained in the original language after a new language version of a website was launched.

What is the difference between SC_WORDS_COUNT, SC_WORDS_TUPLES_COUNT, and SC_WORDS_AGG_MIN_FREQ_3?#

SC_WORDS_COUNT generates a list of single-word keywords. SC_WORDS_TUPLES_COUNT generates a list of two-word expressions, which capture the specific topic of a page more accurately. SC_WORDS_AGG_MIN_FREQ_3 is the most advanced method: it combines the results of the previous analyses with other internal methods and selects only the most relevant single-word and multi-word expressions.

Which keyword element should I use?#

For most use cases, we recommend SC_WORDS_AGG_MIN_FREQ_3, as it provides the cleanest and most relevant results. The other two elements are suitable when you need to work separately with single-word or two-word expressions.

Can I adjust the parameters of AI elements?#

Parameters can be adjusted for keyword-generating elements in the Settings → AI Settings section. Here you can set, for example, the maximum number of generated keywords, the score threshold, the minimum word length, stop words, or phrases. The elements SC_NUMBER_OF_WORDS, SC_MAIN_TITLE, and SC_DETECTED_LANGUAGE are configured automatically and their parameters cannot be changed.

Where do I find data from AI elements and how can I use it?#

Data is available in the output CSV feed. In Mergado, you can browse it in detail, create queries from it, use it for data-driven SEO and reporting, or integrate it into PPC campaigns. Via the API, the data is also available for tools such as Mergado Marketing Buddy.
