firecrawl

mirror of https://github.com/mendableai/firecrawl.git synced 2026-01-31 12:29:02 +00:00

Author	SHA1	Message	Date
rafaelsideguide	49e3e64787	bugfix for pdfs and logging pdf events, also added trycatchs for docx	2024-07-29 14:13:46 -03:00
Nicolas	ff4266f09e	Update pdfProcessor.ts	2024-07-26 17:21:09 -04:00
rafaelsideguide	6208ecdbc0	added logger	2024-07-23 17:30:46 -03:00
Nicolas	56d42d9c9b	Nick:	2024-06-24 16:33:07 -03:00
rafaelsideguide	21d29de819	testing crawl with new.abb.com case; many unnecessary console.logs for tracing the code execution	2024-06-24 16:25:07 -03:00
Rafael Miller	f9c7ca9388	Merge branch 'main' into feat/issue-266	2024-06-14 11:47:58 -03:00
rafaelsideguide	bb859ae9a7	Added metadata.pageStatusCode and metadata.pageError properties to the responses	2024-06-13 17:08:40 -03:00
rafaelsideguide	e37d151404	added parsePDF option to pageOptions; user can decide if they are going to let us take care of the parse or they are going to parse the pdf by themselves	2024-06-12 15:06:47 -03:00
Nicolas	cbf8d79cce	Update pdfProcessor.ts	2024-06-04 00:13:37 -07:00
Nicolas	5be208f595	Nick: fixed	2024-05-17 10:40:44 -07:00
rafaelsideguide	8eb2e95f19	Cleaned up	2024-05-13 16:13:10 -03:00
rafaelsideguide	f4348024c6	Added check during scraping to deal with pdfs. Checks if the URL is a PDF during the scraping process (single_url.ts). TODO: Run integration tests - Does this strat affect the running time? ps. Some comments need to be removed if we decide to proceed with this strategy.	2024-05-13 09:13:42 -03:00
rafaelsideguide	f8b207793f	changed the request to do a HEAD to check for a PDF instead	2024-04-29 15:15:32 -03:00
Nicolas	c5cb268b61	Update pdfProcessor.ts	2024-04-19 13:13:42 -07:00
Nicolas	43cfcec326	Nick: disabling in crawl and sitemap for now	2024-04-19 13:12:08 -07:00
Nicolas	140529c609	Nick: fixes pdfs not found	2024-04-19 13:05:21 -07:00
rafaelsideguide	57e5b36014	[Feat] Adding pdf parser	2024-04-18 11:43:57 -03:00

17 Commits