Cyware Threat Intelligence eXchange

Add New Web Scraper URL

Add a new web scraper URL in Intel Exchange to extract threat intel from web pages.

Before You Start

  • Ensure that you have the View Web Scraper permission to access Web Scraper.

  • Ensure that you have View Feed Source, Create Feed Source, and Update Feed Source permissions.

Steps

To add a new Web Scraper URL for threat data extraction, follow these steps:

  1. Go to Administration > Integrations Management > FEED SOURCES > Web Scraper.

  2. Click Add Web Scraper.

  3. Enter the URL of the web page to scrape. For example, https://www.somedomain.com/.

  4. Select one of the following content types based on the type of URL you entered in Web URL:

    • Text: Fetches data from a .txt file-based URL.

    • CSV: Fetches data from a CSV file-based URL. Choose a parsing delimiter to split the data, such as a comma, hash, colon, or semicolon.

    • JSON: Fetches data from a JSON file-based URL. Enter a custom object name to assign to the unidentified objects received from the source URL. A custom object name can contain only lowercase letters, numbers, and hyphens.

      Note

      Intel Exchange does not support extracting intel from JSON URLs that include nested objects.

    These content types automatically map the attributes of STIX objects while creating intel from the received threat data.

  5. Click Add Attribute(s) to select the IOCs fetched from the URL.

    • Choose attributes from the available list.

    • Click Add.

    • Select an indicator type from the drop-down for the selected attribute.

      Adding an attribute and indicator type for the web scraper URL maps the required attributes to the STIX packages. You can add multiple indicator types.

  6. Set a Polling Cron Schedule so that the web scraper automatically polls data from the source at the configured date and time. Polling can run once, daily, weekly, or monthly.

  7. Select a Start Date & Time to begin polling the data.

  8. Select a TLP to assign to the web scraper URL source.

  9. Select a URL Confidence value to rate the reliability and importance of the URL. Confidence values are HIGH, MEDIUM, LOW, and None.

  10. Enter the default values for the custom scores you have configured in Administration > Configuration > Custom Scores.

  11. Enter a Name for the web scraper URL.

  12. Click Add Web Scraper.
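
Before you add a JSON source, it can help to check the two constraints from step 4 locally. The following is a minimal Python sketch, not part of Intel Exchange. The function names are hypothetical, the sample records are made up, and the sketch assumes that "nested" means a record value that is itself a JSON object or array. The product's own validation may differ.

```python
import json
import re

# Assumption: this pattern mirrors the rule stated in step 4 (lowercase
# letters, numbers, and hyphens); the product's own check may differ.
CUSTOM_OBJECT_NAME = re.compile(r"[a-z0-9-]+")


def is_valid_custom_object_name(name: str) -> bool:
    """Return True if the name has only lowercase letters, digits, and hyphens."""
    return CUSTOM_OBJECT_NAME.fullmatch(name) is not None


def has_nested_objects(payload) -> bool:
    """Return True if any record holds a JSON object or array as a value.

    Assumption: the payload is either one JSON object or a list of objects.
    """
    records = payload if isinstance(payload, list) else [payload]
    for record in records:
        if not isinstance(record, dict):
            continue
        if any(isinstance(value, (dict, list)) for value in record.values()):
            return True
    return False


# Made-up sample payloads for illustration only.
flat = json.loads('[{"ip": "198.51.100.7", "type": "ipv4"}]')
nested = json.loads('[{"ip": "198.51.100.7", "meta": {"source": "example"}}]')

print(is_valid_custom_object_name("my-feed-01"))  # True
print(is_valid_custom_object_name("My_Feed"))     # False
print(has_nested_objects(flat))                   # False
print(has_nested_objects(nested))                 # True
```

A payload for which has_nested_objects returns True is likely to fall under the limitation described in the note in step 4.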

You can view the URL, title, content type, last polling done, and status of the new web scraper URL in Web Scraper.
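
If you are unsure which parsing delimiter to choose for a CSV source, you can inspect a sample of the file first. The following is a rough Python sketch that uses the standard library's csv.Sniffer. The function name, the candidate delimiter set, and the sample data are assumptions for illustration, and Intel Exchange's own parsing may behave differently.

```python
import csv

# Made-up sample of a semicolon-separated feed, for illustration only.
SAMPLE = (
    "indicator;type\n"
    "198.51.100.7;ipv4\n"
    "example.org;domain\n"
    "203.0.113.9;ipv4\n"
)


def guess_delimiter(sample: str, candidates: str = ",;:#|\t") -> str:
    """Guess the delimiter of a CSV sample from a set of candidate characters.

    Assumption: the candidate set is loosely based on the delimiters named
    in step 4. csv.Sniffer raises csv.Error if it cannot decide.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


print(guess_delimiter(SAMPLE))
```

The sniffer is a heuristic. Data that itself contains candidate characters, such as colons in URLs, can mislead it, so confirm the result against the file before you select the delimiter.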