mirror of https://github.com/mendableai/firecrawl.git synced 2025-06-27 00:41:33 +00:00


Playwright Scrape API

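This is a simple web scraping service built with Express and Playwright. It exposes a POST /scrape endpoint that loads the requested URL in a Playwright-controlled browser and scrapes its HTML content.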

Features

Scrapes HTML content from specified URLs.
Blocks requests to known ad-serving domains.
Blocks media files to reduce bandwidth usage.
Uses random user-agent strings to avoid detection.
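Waits for the page to finish rendering before capturing its HTML. Each request can add a fixed delay (wait_after_load) and a CSS selector to wait for (check_selector).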
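The two blocking features above could be implemented as a per-request decision function. The sketch below illustrates that idea under stated assumptions and is not the service's actual code. The ad-domain list is a short hypothetical sample. Treating Playwright's `image` and `media` resource types, plus common image, audio, and video file extensions, as "media" is also an assumption about what the service blocks.

```typescript
// Hypothetical sample of ad-serving domains; the service's real blocklist is not shown here.
const AD_DOMAINS: readonly string[] = ["doubleclick.net", "googlesyndication.com", "adnxs.com"];

// Assumed definition of "media": Playwright resource types and file extensions to skip.
const MEDIA_RESOURCE_TYPES: ReadonlySet<string> = new Set(["image", "media"]);
const MEDIA_EXTENSIONS = /\.(png|jpe?g|gif|webp|svg|mp3|mp4|wav|ogg|webm|avi|flac)$/i;

// True when the host is an ad domain itself or any subdomain of one.
function isAdHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  return AD_DOMAINS.some((domain) => host === domain || host.endsWith("." + domain));
}

// Decides whether a browser request should be aborted instead of sent.
function shouldBlockRequest(requestUrl: string, resourceType: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(requestUrl);
  } catch {
    return false; // Unparseable URL: leave the decision to the browser.
  }
  if (isAdHost(parsed.hostname)) return true;
  if (MEDIA_RESOURCE_TYPES.has(resourceType)) return true;
  // Match the extension on the path only, so "photo.jpg?v=2" is still caught.
  return MEDIA_EXTENSIONS.test(parsed.pathname);
}
```

In Playwright, a function like this could be hooked in through `page.route("**/*", ...)`. The route handler would take its inputs from `route.request().url()` and `route.request().resourceType()`, then call `route.abort()` when the function returns true and `route.continue()` otherwise.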

Install

npm install
npx playwright install

Run

npm run build
npm start

Or, for development:

npm run dev

Use

Send the scrape options as JSON in a POST request to /scrape:

curl -X POST http://localhost:3000/scrape \
-H "Content-Type: application/json" \
-d '{
  "url": "https://example.com",
  "wait_after_load": 1000,
  "timeout": 15000,
  "headers": {
    "Custom-Header": "value"
  },
  "check_selector": "#content"
}'
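The curl request above can also be sent from TypeScript. This is a minimal client sketch. The field comments, including the assumption that `wait_after_load` and `timeout` are in milliseconds, are inferred from the example values. This README does not document the response body, so `scrape` returns the parsed JSON untyped. The fetch function is passed in so the request can be checked without a running service. On Node 18 or later, the global `fetch` can be passed directly. `ScrapeRequest`, `FetchLike`, `buildScrapeRequest`, and `scrape` are hypothetical names.

```typescript
// Request body for POST /scrape, mirroring the curl example.
// Units and meanings are assumptions inferred from the example values.
interface ScrapeRequest {
  url: string;
  wait_after_load?: number; // assumed: extra delay in ms after the page loads
  timeout?: number; // assumed: overall timeout in ms
  headers?: Record<string, string>; // extra HTTP headers, e.g. {"Custom-Header": "value"}
  check_selector?: string; // CSS selector to wait for before capturing HTML
}

// Minimal shape of the fetch function this sketch needs; Node 18+'s global fetch satisfies it.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the same request that the curl command sends.
function buildScrapeRequest(endpoint: string, req: ScrapeRequest) {
  if (!req.url) throw new Error("url is required");
  return {
    input: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    },
  };
}

// Sends the request and returns the parsed JSON; rejects on non-2xx responses.
async function scrape(endpoint: string, req: ScrapeRequest, fetchImpl: FetchLike): Promise<unknown> {
  const { input, init } = buildScrapeRequest(endpoint, req);
  const res = await fetchImpl(input, init);
  if (!res.ok) throw new Error(`scrape failed with HTTP ${res.status}`);
  return res.json();
}
```

For example: `await scrape("http://localhost:3000/scrape", { url: "https://example.com", check_selector: "#content" }, fetch)`.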

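Using with Firecrawl

To configure the Firecrawl API to use this Playwright microservice for scraping, add PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3003/scrape to apps/api/.env (relative to the repository root). The port in this URL must match the port the service is actually listening on. The curl example above uses 3000, so adjust whichever value does not match your setup.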
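A misconfigured URL is easy to miss until the first scrape fails. As a hypothetical sanity check (not part of Firecrawl), the value could be parsed up front. The check confirms that the URL uses HTTP(S) and points at the `/scrape` path, and it reports which host and port the API will call. `parseMicroserviceUrl` and `MicroserviceTarget` are illustrative names, and requiring the exact path `/scrape` is an assumption based on the value shown above.

```typescript
// Hypothetical helper, not part of Firecrawl: validates a PLAYWRIGHT_MICROSERVICE_URL value.
interface MicroserviceTarget {
  host: string;
  port: number;
  path: string;
}

function parseMicroserviceUrl(value: string | undefined): MicroserviceTarget {
  if (!value) throw new Error("PLAYWRIGHT_MICROSERVICE_URL is not set");
  const u = new URL(value); // throws TypeError on malformed values
  if (u.protocol !== "http:" && u.protocol !== "https:") {
    throw new Error(`unsupported protocol ${u.protocol}`);
  }
  // Assumption: the API expects the endpoint path to be exactly /scrape.
  if (u.pathname !== "/scrape") throw new Error(`expected path /scrape, got ${u.pathname}`);
  // URL.port is "" when the URL uses the scheme's default port.
  const port = u.port ? Number(u.port) : u.protocol === "https:" ? 443 : 80;
  return { host: u.hostname, port, path: u.pathname };
}
```

For instance, `parseMicroserviceUrl(process.env.PLAYWRIGHT_MICROSERVICE_URL)` would return `{ host: "localhost", port: 3003, path: "/scrape" }` for the value shown above. The returned port can then be compared with the one the service was started on.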