Nicolas | 033e9bbf29 | Nick: __experimental_streamSteps | 2025-01-14 01:45:50 -03:00
Nicolas | 558a7f4c08 | Update package.json | 2025-01-14 01:35:29 -03:00
Nicolas | 9759f18725 | Nick: temp file fixes | 2025-01-13 23:56:53 -03:00
Nicolas | ac6650e488 | Update requests.http | 2025-01-13 22:31:54 -03:00
Nicolas | 5e5b5ee0e2 | (feat/extract) New re-ranker + multi entity extraction (#1061) | 2025-01-13 22:30:15 -03:00
    * agent that decides if splits schema or not
    * split and merge properties done
    * wip
    * wip
    * changes
    * ch
    * array merge working!
    * comment
    * wip
    * dereferentiate schema
    * dereference schemas
    * Nick: new re-ranker
    * Create llm-links.txt
    * Nick: format
    * Update extraction-service.ts
    * wip: cooking schema mix and spread functions
    * wip
    * wip getting there!!!
    * nick:
    * moved functions to helpers
    * nick:
    * cant reproduce the error anymore
    * error handling all scrapes failed
    * fix
    * Nick: added the sitemap index
    * Update sitemap-index.ts
    * Update map.ts
    * deduplicate and merge arrays
    * added error handler for object transformations
    * Update url-processor.ts
    * Nick:
    * Nick: fixes
    * Nick: big improvements to rerank of multi-entity
    * Nick: working
    * Update reranker.ts
    * fixed transformations for nested objs
    * fix merge nulls
    * Nick: fixed error piping
    * Update queue-worker.ts
    * Update extraction-service.ts
    * Nick: format
    * Update queue-worker.ts
    * Update pnpm-lock.yaml
    * Update queue-worker.ts
    ---------
    Co-authored-by: rafaelmmiller <150964962+rafaelsideguide@users.noreply.github.com>
    Co-authored-by: Thomas Kosmas <thomas510111@gmail.com>
Gergő Móricz | 5c62bb1195 | feat: new snips test framework (FIR-414) (#1033) | 2025-01-13 20:50:47 +01:00
    * feat: new snips test framework
    * Update mock.ts
    ---------
    Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Nicolas | 9a13c1dede | Nick: fixes to extract rephrase prompt | 2025-01-11 20:22:36 -03:00
Nicolas | a82160a630 | Update crawl-redis.ts | 2025-01-10 21:31:23 -03:00
Nicolas | f4d10c5031 | Nick: formatting fixes | 2025-01-10 18:35:10 -03:00
Gergő Móricz | d1f3b96388 | feat: add scrapeId in document.metadata (v1.2.1) | 2025-01-09 20:52:12 +01:00
Gergő Móricz | 29c1f126ab | feat(scrape-status): adapt | 2025-01-09 19:14:00 +01:00
Gergő Móricz | 2849ce2f13 | fix(queue-worker): errored job logging | 2025-01-09 18:48:47 +01:00
Gergő Móricz | 97bf54214f | fix(scrapeURL/loop): re-add is long enough check with lt 0 | 2025-01-09 18:43:50 +01:00
Gergő Móricz | 0da386914d | fix(queue-worker): graceful shutdown | 2025-01-09 16:04:59 +01:00
Móricz Gergő | 3c614a2e5c | fix(scrapeURL/engines/pdf,docx): support authorization | 2025-01-09 10:03:27 +01:00
Móricz Gergő | 49e584f8e1 | fix(queue-worker/crawl): use SCARD to generate num_docs field | 2025-01-09 09:51:34 +01:00
Móricz Gergő | 9e8c629ff4 | fix(log_job): don't redact with auth header | 2025-01-09 09:51:34 +01:00
Nicolas | 14f696805c | Update auth.ts | 2025-01-08 17:04:57 -03:00
Nicolas | 51cb4b1615 | Nick: temp rl for /extract | 2025-01-08 15:24:38 -03:00
Nicolas | a199208e21 | Update rate-limiter.ts | 2025-01-08 15:15:21 -03:00
Nicolas | aa31508ccd | Nick: links-billed update (temp) | 2025-01-08 15:13:33 -03:00
Móricz Gergő | 363021ea78 | feat(crawl): ensure url trimming | 2025-01-08 12:35:42 +01:00
Móricz Gergő | 977a3e13c5 | fix(scrapeURL): remove short content check | 2025-01-08 11:23:25 +01:00
Nicolas | 0a41fdd35d | Merge branch 'nsc/extract-queue' | 2025-01-07 18:21:57 -03:00
Nicolas | 7918d0e1c9 | Nick: bump 1.12.0 | 2025-01-07 18:20:56 -03:00
Nicolas | f82a742cd1 | Merge pull request #1044 from mendableai/nsc/extract-queue | 2025-01-07 18:10:46 -03:00
    (feat/extract) Move extract to a queue system
Nicolas | b98e289f03 | Nick: | 2025-01-07 17:49:21 -03:00
Nicolas | a185c05a5c | Nick: sdk async and get status | 2025-01-07 17:27:40 -03:00
Nicolas | 9ec08d7020 | Nick: fixed the sdks | 2025-01-07 17:20:49 -03:00
Nicolas | dd14744850 | Update types.ts | 2025-01-07 16:55:55 -03:00
Nicolas | 9fdcfb9314 | Update index.ts | 2025-01-07 16:24:46 -03:00
Nicolas | 51636352a6 | Merge branch 'nsc/extract-queue' of https://github.com/mendableai/firecrawl into nsc/extract-queue | 2025-01-07 16:21:58 -03:00
Nicolas | 11af214db1 | Nick: update extract in case there is an error | 2025-01-07 16:21:51 -03:00
Gergő Móricz | 1f2a76fc23 | Update apps/api/src/lib/extract/extraction-service.ts | 2025-01-07 20:18:10 +01:00
Nicolas | eb254547e5 | Nick: | 2025-01-07 16:16:01 -03:00
Gergő Móricz | c6a63793bb | crawl incomplete issues | 2025-01-07 19:38:17 +01:00
Gergő Móricz | ccfada98ca | various queue fixes | 2025-01-07 19:15:23 +01:00
Nicolas | 86e34d7c6c | Nick: wip | 2025-01-07 12:13:12 -03:00
Móricz Gergő | 7a03275575 | add comment | 2025-01-07 13:57:47 +01:00
Móricz Gergő | 7d73ebdbf1 | fix(crawl): never invalidate first crawl scrape if redirects | 2025-01-07 13:57:23 +01:00
Móricz Gergő | b96b97ed72 | fix(crawl): don't push rawhtml to db unless requested | 2025-01-07 10:09:15 +01:00
Móricz Gergő | 35d1d85978 | fix(crawler): also take the hostname of the base url when determining isInternalLink | 2025-01-07 09:29:58 +01:00
Nicolas | bb27594443 | Merge branch 'main' into nsc/extract-queue | 2025-01-06 13:01:15 -03:00
Nicolas | b82cfa8540 | Merge pull request #1038 from 1101-1/add_actual_random_useragent | 2025-01-06 11:51:15 -03:00
    feat: use new random user agent instead of the old one
Kirill | 736c3675b6 | use new agent generation instead of expired one | 2025-01-05 17:07:14 +04:00
Nicolas | ceb2104960 | Merge pull request #1034 from mendableai/sdk/fixed-none-undefined-on-response | 2025-01-04 16:31:41 -03:00
    [SDK] fixed none and undefined on response
Gergő Móricz | 461842fe8c | fix(v1/crawl-status): handle job's returnvalue being explicitly null (db race) | 2025-01-04 17:24:33 +01:00
Gergő Móricz | b92a4eb79b | fix(queue-worker): only do redirect handling logic on crawls, not batch scrape | 2025-01-04 16:59:35 +01:00
Nicolas | d48ddb8820 | Update canonical-url.test.ts | 2025-01-03 23:55:05 -03:00
Nicolas | f2e0bfbfe3 | Nick: url normalization | 2025-01-03 23:54:03 -03:00