2271 Commits

Author SHA1 Message Date
Nicolas
5fcf3fa97e Merge branch 'main' into mog/mineru 2024-12-27 19:53:09 -03:00
Nicolas
65cf4cd74e
Merge pull request #1013 from yujunhui/main
fix: merge mock success data
2024-12-27 19:04:04 -03:00
Nicolas
05d5f84d87
Merge pull request #1018 from mendableai/feat/add-favicon-metadata
[FIR-37] feat: extract and return favicon URL during scraping
2024-12-27 17:44:03 -03:00
Nicolas
eba5fda9a1
Merge pull request #955 from mendableai/rafa/fix-default-on-schema-llm-extract
fixed optional+default bug on llm schema
2024-12-27 16:33:04 -03:00
Ademílson F. Tonato
a4cf814f70 feat: return favicon url when scraping 2024-12-27 19:18:53 +00:00
Gergő Móricz
0421f81020
Sitemap fixes (#1010)
* sitemap fixes iter 1

* feat(sitemap): dedupe improvements

---------

Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2024-12-27 19:59:26 +01:00
Nicolas
c5b6495e48
Merge pull request #1015 from mendableai/nsc/improves-sitemap-fetching
Improves sitemap fetching
2024-12-27 14:41:04 -03:00
Nicolas
e8f0a22ebe Update v1-openapi.json 2024-12-27 13:59:43 -03:00
Nicolas
f7cfbba651 Merge branch 'main' into pr/1003 2024-12-27 13:59:24 -03:00
Nicolas
1abb544e3e Update index.test.ts 2024-12-27 13:59:09 -03:00
Gergő Móricz
4772951313 feat(scrapeURL/fire-engine): explicitly delete job after scrape 2024-12-27 16:44:41 +01:00
Gergő Móricz
0b55fb836b feat(scrapeURL/pdf): switch to MinerU 2024-12-27 16:37:32 +01:00
Nicolas
ece95e97f4 Merge branch 'main' into nsc/extract-url-trace 2024-12-26 21:28:51 -03:00
Gergő Móricz
c543f4f76c feat(scrapeURL/pdf): update mock Blob implementation to pass TypeScript 2024-12-26 20:31:51 +01:00
Gergő Móricz
f15ef0e758 feat(scrapeURL/fire-engine/chrome-cdp): handle file downloads 2024-12-26 20:29:09 +01:00
Nicolas
4451c4f671 Nick: 2024-12-26 13:51:20 -03:00
Nicolas
4332f18a8f Nick: making it optional for the user 2024-12-26 12:43:58 -03:00
Nicolas
233f347f5e Nick: refactor 2024-12-26 12:41:37 -03:00
Nicolas
f467a3ae6c Nick: init 2024-12-26 12:21:46 -03:00
yujunhui
2f39bdddd9 fix: merge mock success data 2024-12-26 17:56:30 +08:00
Nicolas
18ceaf10a5 Update .gitignore 2024-12-23 18:42:05 -03:00
RutamBhagat
ca2d3dc6d2 docs(credit-usage-api): add new endpoint documentation for credit usage 2024-12-21 06:24:53 -08:00
Nicolas
d1f3e26f9e Nick: blocklist string 2024-12-20 18:09:49 -03:00
Nicolas
ba95df96b1 Update rate-limiter.ts 2024-12-20 15:45:44 -03:00
Nicolas
6222152249 Nick: credit usage endpoint 2024-12-20 15:44:17 -03:00
Nicolas
ed24853ca6
Merge pull request #996 from mendableai/fix/title-extra-info
[BUG] fixed title extra info
2024-12-19 16:05:49 -03:00
Gergő Móricz
071b9a01c3 fix(scrapeURL/fire-engine): pass geolocation 2024-12-19 18:23:21 +01:00
rafaelmmiller
cf2ec77131 fixed title extra info 2024-12-19 08:32:10 -03:00
Nicolas
066071cd54 Update llmExtract.ts 2024-12-18 23:45:43 -03:00
Nicolas
05605112bb Update extract.ts 2024-12-18 23:34:07 -03:00
Nicolas
2d37dca9dc Nick: introduced system prompt to /extract 2024-12-18 22:10:41 -03:00
Nicolas
a759a7ab7a Nick: small improvements 2024-12-18 21:45:06 -03:00
Nicolas
e899ecbe44 Update llmExtract.ts 2024-12-18 16:52:05 -03:00
Móricz Gergő
bd36c441d3 feat(queue-worker): improve team-based logging 2024-12-17 22:06:36 +01:00
Móricz Gergő
780442d73b feat: improve billing logging 2024-12-17 22:02:31 +01:00
Nicolas
ac187452c3 Nick: better filtering for urls that should be scraped 2024-12-17 17:34:55 -03:00
Nicolas
3b6edef9fa chore: formatting 2024-12-17 16:58:57 -03:00
Nicolas
b9f621bed5 Nick: extract fixes 2024-12-17 16:58:35 -03:00
Nicolas
79e335636a Nick: fixed extract issues 2024-12-17 16:40:45 -03:00
Nicolas
6d77879d68 Update extract.ts 2024-12-17 15:22:25 -03:00
Nicolas
e26a0a65a7 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-12-17 15:19:53 -03:00
Nicolas
0f8b8a717d Update map.ts 2024-12-17 15:19:52 -03:00
Eric Ciarla
a20a003c74 revert to pdf parse 2024-12-17 12:12:22 -05:00
Eric Ciarla
194353af0d Remove pdf parse 2024-12-17 10:04:20 -05:00
Eric Ciarla
1402831a0a Replace pdf parse with pdf to md 2024-12-17 09:59:52 -05:00
Eric Ciarla
ed7d15d2af Update index.ts 2024-12-17 09:50:29 -05:00
Gergő Móricz
654d6c6e0b fix(scrapeURL): increase timeToRun 2024-12-17 13:21:24 +01:00
Gergő Móricz
47b968fede fix(scrapeURL/fire-engine): timeout calculation issues 2024-12-17 13:17:55 +01:00
Gergő Móricz
7f57c868be Revert "fix(scrapeURL): better timeToRun distribution"
This reverts commit 284a6ccedd1baede825571ee933eb7e4f773e2de.
2024-12-16 23:08:20 +01:00
Gergő Móricz
284a6ccedd fix(scrapeURL): better timeToRun distribution 2024-12-16 23:01:34 +01:00