Scrape

It’s simple to use: you only need to submit your access_key and the url of a webpage. The API will return the content of the webpage.

Getting started

REST

The Scrape API, like all of ScreenshotMAX’s APIs, is organized around REST. It uses predictable, resource-oriented URLs and HTTP status codes to indicate errors.

HTTPS

The Scrape API requires all communications to be secured with TLS 1.2 or greater.

API Versions

All of ScreenshotMAX’s APIs are versioned. The Scrape API is currently on Version 1.

Your Access Key

Your access key is your unique authentication key for the ScreenshotMAX APIs. Include it in the JSON request body as access_key, or pass it in the X-Access-Key header instead. You can find your access key in your account dashboard.

Base URL

https://api.screenshotmax.com/v1/scrape

Validation endpoint

ScreenshotMAX’s Scrape API only requires your unique access key and the url of the target page, passed in the JSON body of a POST request. The API returns the content of the webpage.

POST https://api.screenshotmax.com/v1/scrape
{
"access_key": "YOUR_ACCESS_KEY",
"url": "https://example.com"
}

When the request succeeds, the API returns a 200 OK response with the content of the webpage in the response body.
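
The request above can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_scrape_request and scrape are hypothetical, and it assumes the response body is UTF-8 text. It shows both ways of passing the access key described above (a body field or the X-Access-Key header).

```python
import json
import urllib.request

SCRAPE_ENDPOINT = "https://api.screenshotmax.com/v1/scrape"


def build_scrape_request(access_key: str, url: str, use_header: bool = False,
                         **options) -> urllib.request.Request:
    """Build (without sending) a POST request to the Scrape API.

    Extra keyword arguments become body parameters, e.g. format="md".
    """
    body = {"url": url, **options}
    headers = {"Content-Type": "application/json"}
    if use_header:
        # Pass the key in the X-Access-Key header instead of the body.
        headers["X-Access-Key"] = access_key
    else:
        body["access_key"] = access_key
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scrape(access_key: str, url: str, **options) -> str:
    """Send the request and return the page content as text (assumes UTF-8)."""
    request = build_scrape_request(access_key, url, **options)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

For example, scrape("YOUR_ACCESS_KEY", "https://example.com", format="md") would request the Markdown version of the page. Separating request construction from sending keeps the body easy to inspect before any network call is made.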

Request parameters

access_key (string, Body, Required)

Your unique access key. You can find your access key in your account dashboard.

url (string, Body, Required)

The URL of the webpage you want to render. It must be a valid URL that is accessible from the internet. If the URL contains a query string, it must be URL-encoded.

For example, https://example.com/test?param=1 should be passed as https%3A%2F%2Fexample.com%2Ftest%3Fparam%3D1.
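
The encoding in this example is what Python's urllib.parse.quote produces when no characters are treated as safe. The helper name encode_target_url is hypothetical.

```python
from urllib.parse import quote


def encode_target_url(url: str) -> str:
    """Percent-encode a URL so it can be passed as the url parameter."""
    # safe="" makes quote() escape "/" too (by default it leaves "/" as-is).
    return quote(url, safe="")


print(encode_target_url("https://example.com/test?param=1"))
# https%3A%2F%2Fexample.com%2Ftest%3Fparam%3D1
```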

format (string, Body, Default: html)

The format of the returned content. Available formats are html and md: html returns the HTML content of the webpage, while md returns the content in Markdown format.

js_enabled (bool, Body, Default: true)

Whether to enable JavaScript on the page. If set to false, the API will return the HTML content of the page without executing any JavaScript.

gpu_rendering (bool, Body, Default: false)

Whether to use GPU rendering. Only available on the Scale paid plan.

capture_beyond_viewport (bool, Body, Default: false)

Whether to capture content beyond the viewport.

viewport_device (string, Body)

The device type for the viewport.

viewport_width (number, Body, Default: 1280)

The width of the viewport in pixels.

viewport_height (number, Body, Default: 1080)

The height of the viewport in pixels.

viewport_landscape (bool, Body)

Whether the viewport should be in landscape mode.

viewport_has_touch (bool, Body)

Whether the viewport has touch capabilities.

viewport_mobile (bool, Body)

Whether the viewport is a mobile device.

device_scale_factor (number, Body)

The device scale factor for the viewport.
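
Emulating a phone usually means combining several viewport parameters. The sketch below assumes these parameters can be freely combined in one request. The class name Viewport and the 390 x 844 phone dimensions are illustrative only. viewport_device is left out because its accepted values are not listed on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Viewport:
    """Hypothetical helper mapping viewport settings to Scrape API body fields."""
    width: int = 1280   # documented default for viewport_width
    height: int = 1080  # documented default for viewport_height
    mobile: Optional[bool] = None
    has_touch: Optional[bool] = None
    landscape: Optional[bool] = None
    device_scale_factor: Optional[float] = None

    def to_body(self) -> dict:
        body = {"viewport_width": self.width, "viewport_height": self.height}
        optional = {
            "viewport_mobile": self.mobile,
            "viewport_has_touch": self.has_touch,
            "viewport_landscape": self.landscape,
            "device_scale_factor": self.device_scale_factor,
        }
        # Omit unset options so the API applies its own defaults.
        body.update({k: v for k, v in optional.items() if v is not None})
        return body


# Illustrative phone-sized values, not an official device preset.
phone = Viewport(width=390, height=844, mobile=True, has_touch=True,
                 device_scale_factor=3)
print(phone.to_body())
```

Unset options are left out of the body rather than sent with guessed defaults, because this page documents defaults only for the width and height.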

block_annoyance (string, Body, Default: cookies_banner)

The annoyance to block. Options include none, cookies_banner, ads, tracking.

block_ressources (string, Body)

The resources to block. Options include document, stylesheet, image, media, font, script, texttrack, xhr, fetch, eventsource, websocket, manifest and other.

media_type (string, Body, Default: screen)

The media type for the rendering. Options include screen and print.

vision_deficiency (string, Body)

The vision deficiency for the rendering. Options include reduced_contrast, blurred_vision, deuteranopia, achromatopsia.

dark_mode (bool, Body, Default: false)

Whether to use dark mode for the rendering.

reduced_motion (bool, Body, Default: false)

Whether to reduce motion for the rendering.

geolocation_accuracy (number, Body)

The accuracy of the geolocation in meters. Minimum is 0. Maximum is 1000.

geolocation_latitude (number, Body)

The latitude of the geolocation. Minimum is -90. Maximum is 90.

geolocation_longitude (number, Body)

The longitude of the geolocation. Minimum is -180. Maximum is 180.
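
The ranges above can be checked on the client before the request is sent. This is a minimal sketch. The helper name geolocation_params is hypothetical, and the sample coordinates (roughly central Paris) are only an example.

```python
from typing import Optional


def geolocation_params(latitude: float, longitude: float,
                       accuracy: Optional[float] = None) -> dict:
    """Validate coordinates against the documented ranges and build body fields."""
    if not -90 <= latitude <= 90:
        raise ValueError("geolocation_latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("geolocation_longitude must be between -180 and 180")
    params = {"geolocation_latitude": latitude,
              "geolocation_longitude": longitude}
    if accuracy is not None:
        if not 0 <= accuracy <= 1000:
            raise ValueError("geolocation_accuracy must be between 0 and 1000 meters")
        params["geolocation_accuracy"] = accuracy
    return params


# Approximate coordinates of Paris, accurate to 100 m.
print(geolocation_params(48.8566, 2.3522, accuracy=100))
```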

attachment_name (string, Body)

The name of the attachment, without the file extension. This name is used when the response is downloaded, and the extension is added automatically based on the format parameter.

timezone (string, Body)

The time zone for the request. This allows you to simulate different time zones. Accepts time zone names from the IANA Time Zone Database, such as Europe/Paris.

authorization (string, Body)

The authorization header to use for the request. This should be a base64-encoded string (e.g., for Basic Auth, encode "username:password" using base64). This allows you to authenticate with the webpage before capturing the content.
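
For Basic Auth, the encoding step looks like this in Python. The page does not say whether the value should include the "Basic " scheme prefix. This sketch assumes it takes the encoded credentials alone, so check that assumption against the API before relying on it. The helper name basic_auth_value is hypothetical.

```python
import base64


def basic_auth_value(username: str, password: str) -> str:
    """Base64-encode "username:password" for the authorization parameter."""
    credentials = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(credentials).decode("ascii")


print(basic_auth_value("username", "password"))  # dXNlcm5hbWU6cGFzc3dvcmQ=
```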

user_agent (string, Body)

The user agent to use for the request. This allows you to simulate different browsers and devices.

cookies (string[], Body)

The cookies to use for the request. This allows you to simulate different sessions and states. Example: cookies=name=value; name2=value2.

headers (string[], Body)

The HTTP headers to send with the request. This allows you to customize the request made to the webpage. Example: headers=header1:value1; header2:value2.
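
Both cookies and headers are typed as string[], while the examples use a semicolon-separated notation. The sketch below assumes each array element in the JSON body holds a single name=value cookie or header:value pair, which this page does not state explicitly. The helper names are hypothetical.

```python
from typing import Dict, List


def to_cookie_list(cookies: Dict[str, str]) -> List[str]:
    """One "name=value" string per cookie (assumed element format)."""
    return [f"{name}={value}" for name, value in cookies.items()]


def to_header_list(headers: Dict[str, str]) -> List[str]:
    """One "header:value" string per header (assumed element format)."""
    return [f"{name}:{value}" for name, value in headers.items()]


body = {
    "url": "https://example.com",
    "cookies": to_cookie_list({"name": "value", "name2": "value2"}),
    "headers": to_header_list({"header1": "value1", "header2": "value2"}),
}
print(body["cookies"])  # ['name=value', 'name2=value2']
```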

ip_location (string, Body)

The IP location to use for the request. This allows you to simulate requests from different countries by routing them through proxy servers with corresponding IP addresses. This feature is only available on the Scale paid plan.

Supported locations:

  • United States (us)
  • China (cn)
  • Europe (eu) (random EU country)
  • Canada (ca)
  • Mexico (mx)
  • United Kingdom (gb)
  • Germany (de)
  • France (fr)
  • Switzerland (ch)
  • India (in)
  • Japan (jp)
  • South Korea (kr)
  • Russia (ru)
  • Brazil (br)
  • Australia (au)

proxy (string, Body)

The proxy to use for the request. This allows you to route the request through a different IP address. The proxy must be in the format http://username:password@host:port or https://username:password@host:port.
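
A proxy string can be checked against the documented format with urllib.parse.urlsplit. This sketch follows the documented format strictly: it rejects proxies without credentials or without a port, although the API itself may be more lenient. The helper name is_valid_proxy is hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_proxy(proxy: str) -> bool:
    """Check a proxy string against http(s)://username:password@host:port."""
    parts = urlsplit(proxy)
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return (
        parts.scheme in ("http", "https")
        and bool(parts.username)
        and parts.password is not None
        and bool(parts.hostname)
        and port is not None
    )


print(is_valid_proxy("http://user:secret@proxy.example.com:8080"))  # True
print(is_valid_proxy("http://proxy.example.com:8080"))              # False
```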

bypass_csp (bool, Body, Default: false)

Whether to bypass the Content Security Policy (CSP) of the webpage. This allows you to capture content of webpages with strict CSPs.

delay (number, Body, Default: 0)

The delay in seconds before rendering. This allows you to wait for specific elements to load before capturing the content. Maximum is 30.

timeout (number, Body, Default: 30)

The timeout in seconds for the rendering. This allows you to set a maximum time for the request to complete. Maximum is 30.

wait_until (string[], Body, Default: ['domcontentloaded'])

The conditions to wait for before rendering. This allows you to ensure that specific elements are loaded before capturing the content. Available options include:

  • load: Wait for the load event to be fired.
  • domcontentloaded: Wait for the DOMContentLoaded event to be fired.
  • networkidle0: Wait until there have been no network connections for at least 500 ms.
  • networkidle2: Wait until there have been no more than 2 active network connections for at least 500 ms.
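
The timing parameters above can be validated together before a request is sent. This is a minimal sketch. The upper limits (30 seconds) and the wait_until options come from this page. The lower bounds (delay of at least 0, timeout greater than 0) are assumptions, and the helper name timing_params is hypothetical.

```python
from typing import List, Optional

# wait_until conditions listed above.
WAIT_UNTIL_OPTIONS = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def timing_params(delay: float = 0, timeout: float = 30,
                  wait_until: Optional[List[str]] = None) -> dict:
    """Check delay, timeout and wait_until against the documented limits."""
    if not 0 <= delay <= 30:
        raise ValueError("delay must be between 0 and 30 seconds")
    if not 0 < timeout <= 30:
        raise ValueError("timeout must be greater than 0 and at most 30 seconds")
    conditions = wait_until if wait_until is not None else ["domcontentloaded"]
    unknown = set(conditions) - WAIT_UNTIL_OPTIONS
    if unknown:
        raise ValueError(f"unsupported wait_until values: {sorted(unknown)}")
    return {"delay": delay, "timeout": timeout, "wait_until": conditions}


# Wait for network idle, plus a 2-second delay before rendering.
print(timing_params(delay=2, wait_until=["networkidle0"]))
```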

metadata_icon (bool, Body, Default: false)

Whether to include the metadata icon in the response. This allows you to capture the favicon of the webpage. The URL of the icon is returned in the X-Screenshotmax-Metadata-Icon response header.

metadata_title (bool, Body, Default: false)

Whether to include the metadata title in the response. This allows you to capture the title of the webpage. The title will be included in the header X-Screenshotmax-Metadata-Title.

metadata_fonts (bool, Body, Default: false)

Whether to include the metadata fonts in the response. This allows you to capture the fonts used on the webpage. The fonts will be included in the header X-Screenshotmax-Metadata-Fonts.

metadata_hash (bool, Body, Default: false)

Whether to include the metadata hash in the response. This allows you to capture the hash of the webpage. The hash will be included in the header X-Screenshotmax-Metadata-Hash.

metadata_status (bool, Body, Default: false)

Whether to include the metadata status in the response. This allows you to capture the HTTP status code of the webpage. The status code will be included in the header X-Screenshotmax-Metadata-Status.

metadata_headers (bool, Body, Default: false)

Whether to include the metadata headers in the response. This allows you to capture the headers of the webpage. The headers will be included in the header X-Screenshotmax-Metadata-Headers.
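
The metadata values arrive as response headers, so they can be collected after the request completes. This sketch works with a plain dict or with the headers object of a urllib response, which also provides items(). The values are returned as raw strings because the page does not describe how fonts or headers are serialized. The helper name read_metadata is hypothetical.

```python
from typing import Dict, Mapping

# Response headers documented for the metadata_* parameters.
METADATA_HEADERS = {
    "icon": "X-Screenshotmax-Metadata-Icon",
    "title": "X-Screenshotmax-Metadata-Title",
    "fonts": "X-Screenshotmax-Metadata-Fonts",
    "hash": "X-Screenshotmax-Metadata-Hash",
    "status": "X-Screenshotmax-Metadata-Status",
    "headers": "X-Screenshotmax-Metadata-Headers",
}


def read_metadata(response_headers: Mapping[str, str]) -> Dict[str, str]:
    """Collect whichever metadata headers are present, as raw strings."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return {
        key: lowered[header.lower()]
        for key, header in METADATA_HEADERS.items()
        if header.lower() in lowered
    }
```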

cache (bool, Body, Default: false)

Whether to store the rendered content in the cache. Cached content is kept for the time-to-live (TTL) period set by cache_ttl.

cache_ttl (number, Body, Default: 604800)

The time-to-live (TTL) of the cached content, in seconds. The default of 604800 seconds is 7 days. The maximum is 30 days (2592000 seconds).
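
The default and maximum TTL values follow directly from 86400 seconds per day. The hypothetical helper below converts a number of days into cache parameters. It assumes a minimum TTL of 1 second, which this page does not specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86400
DEFAULT_CACHE_TTL = 7 * SECONDS_PER_DAY     # 604800, the documented default
MAX_CACHE_TTL = 30 * SECONDS_PER_DAY        # 2592000, the documented maximum


def cache_params(days: float) -> dict:
    """Enable caching for the given number of days (hypothetical helper)."""
    ttl = int(days * SECONDS_PER_DAY)
    if not 0 < ttl <= MAX_CACHE_TTL:
        raise ValueError(f"cache_ttl must be between 1 and {MAX_CACHE_TTL} seconds")
    return {"cache": True, "cache_ttl": ttl}


print(cache_params(1))  # {'cache': True, 'cache_ttl': 86400}
```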

async (bool, Body, Default: false)

Whether to process the request asynchronously. This allows you to request content without waiting for the rendering to complete.

webhook_url (string, Body)

The callback URL for asynchronous processing. The webhook is triggered with the response once it is ready. The URL must be a valid HTTPS URL, accessible from the internet, and must accept POST requests. More information about webhooks can be found in the async & webhook documentation.

webhook_signed (bool, Body, Default: true)

Indicates whether the webhook request should be signed. Enabling this option allows you to verify the authenticity of incoming webhook requests. For more details, refer to the async & webhook documentation.
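
An asynchronous request adds three body fields. This sketch checks locally only what can be checked: that the URL is absolute and uses HTTPS. Public reachability and POST support cannot be verified on the client. The helper name async_params and the example webhook URL are hypothetical. Signature verification is not shown because its scheme is described in the async & webhook documentation rather than here.

```python
from urllib.parse import urlsplit


def async_params(webhook_url: str, signed: bool = True) -> dict:
    """Build the body fields for an asynchronous request (hypothetical helper)."""
    parts = urlsplit(webhook_url)
    # The webhook must be HTTPS; public reachability can't be checked locally.
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("webhook_url must be an absolute HTTPS URL")
    return {"async": True, "webhook_url": webhook_url, "webhook_signed": signed}


print(async_params("https://hooks.example.com/screenshotmax"))
```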

Response and error codes

Error Codes

When a request fails, the API returns an error in JSON format. Each error includes an error code and a description, listed below.

Code | Type | Details
200 | OK | The request was successful.
400 | Bad request | The request was malformed or invalid.
401 | Unauthorized | The request was rejected due to an invalid access key or missing signature when signed requests are enabled.
403 | Forbidden | The signature provided is invalid. Occurs when signed requests are enabled.
402 | Payment Required | Access denied due to an unpaid invoice. Applies to paid plans.
423 | Locked | The request was denied due to insufficient quota.
429 | Too Many Requests | The rate limit has been exceeded (too many requests per minute).
500 | Internal server error | The request failed due to an internal server error.
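
A client can map these codes to a decision about retrying. The split between retryable and non-retryable codes in this sketch is a client-side judgement, not something the API specifies. It treats 429 (per-minute rate limit) and 500 as transient, while 423 (insufficient quota) and the authentication and payment errors need action on the account side. The helper name classify_status is hypothetical.

```python
from typing import Tuple

# Status codes from the table above.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad request",
    401: "Unauthorized",
    402: "Payment Required",
    403: "Forbidden",
    423: "Locked",
    429: "Too Many Requests",
    500: "Internal server error",
}
# Assumed transient: rate limit or server-side failure.
RETRYABLE = {429, 500}


def classify_status(code: int) -> Tuple[str, bool]:
    """Return (meaning, worth_retrying) for a Scrape API status code."""
    meaning = STATUS_MEANINGS.get(code, f"Unexpected status {code}")
    return meaning, code in RETRYABLE
```

With urllib, this would be called as classify_status(err.code) inside an except urllib.error.HTTPError block. For a 429, waiting before the next attempt is sensible, since the limit is counted per minute.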