Nicolas
6c939d534d
Nick: small refactor
2024-05-29 19:43:51 -07:00
Eric Ciarla
37915e11e8
Final push
2024-05-29 21:18:24 -04:00
Eric Ciarla
a0e404f94e
init commit
2024-05-29 18:56:57 -04:00
rafaelsideguide
ee9a2184e2
Added custom scraping conditions for readme docs
2024-05-29 13:39:43 -03:00
Nicolas
1b3547dcf2
Nick:
2024-05-28 12:56:24 -07:00
Nicolas
a8ff295977
Update single_url.ts
2024-05-21 18:50:42 -07:00
Nicolas
a5e718b084
Nick: improvements
2024-05-21 18:34:23 -07:00
Nicolas
df6c3d1e7d
Merge branch 'main' into detect-pdfs
2024-05-17 09:55:51 -07:00
Nicolas
d10f81e7fe
Nick: fixes
2024-05-15 11:28:20 -07:00
Nicolas
a96fc5b96d
Nick: 4x speed
2024-05-13 20:45:11 -07:00
rafaelsideguide
8eb2e95f19
Cleaned up
2024-05-13 16:13:10 -03:00
rafaelsideguide
f4348024c6
Added check during scraping to deal with pdfs
Checks if the URL is a PDF during the scraping process (single_url.ts).
TODO: Run integration tests. Does this strategy affect the running time?
P.S. Some comments need to be removed if we decide to proceed with this strategy.
2024-05-13 09:13:42 -03:00
Nicolas
d21091bb06
Update single_url.ts
2024-05-09 17:52:46 -07:00
Nicolas
be85008622
Nick: better
2024-05-09 17:48:11 -07:00
Nicolas
be5661a768
Nick: a lot better
2024-05-09 17:45:16 -07:00
rafaelsideguide
e1f52c538f
nested includeHtml inside pageOptions
2024-05-07 13:40:24 -03:00
rafaelsideguide
509250c4ef
changed to includeHtml
2024-05-06 19:45:56 -03:00
rafaelsideguide
538355f1af
Added toMarkdown option
2024-05-06 11:36:44 -03:00
Nicolas
768166b066
Update single_url.ts
2024-04-30 16:57:44 -07:00
Caleb Peffer
3ca9e5153f
Caleb: trying to get logging working
2024-04-30 09:20:15 -07:00
Nicolas
b69feab916
Merge branch 'main' into llm-extraction
2024-04-29 08:40:44 -07:00
Caleb Peffer
6ee1f2d3bc
Caleb: initially pulled inspiration code from https://github.com/mishushakov/llm-scraper
2024-04-28 13:59:35 -07:00
Nicolas
68838c9e0d
Update single_url.ts
2024-04-28 12:44:00 -07:00
Nicolas
8e44696c4d
Nick:
2024-04-28 11:34:25 -07:00
Nicolas
fdb2789eaa
Nick: added url as return param
2024-04-23 17:14:34 -07:00
Nicolas
f0695c7123
Update single_url.ts
2024-04-23 17:04:10 -07:00
Nicolas
0146157876
Nick: mvp
2024-04-23 15:28:32 -07:00
Nicolas
306cfe4ce1
Nick:
2024-04-23 11:15:11 -07:00
Nicolas
ca2bf9cc12
Update single_url.ts
2024-04-17 18:27:08 -07:00
Nicolas
36abe0f7f9
Nick:
2024-04-17 18:24:46 -07:00
Nicolas
08ed68ff55
Nick: fixes
2024-04-17 12:44:23 -07:00
rafaelsideguide
ff622739b7
Added a html to markdown table parser
2024-04-17 11:01:19 -03:00
Nicolas
93627ae87c
Nick:
2024-04-16 12:06:46 -04:00
Nicolas
a6c2a87811
Initial commit
2024-04-15 17:01:47 -04:00