Nicolas | 78badf8f72 | Nick: wip | 2024-10-28 16:02:07 -03:00
Nicolas | d965f2ce7d | Nick: fixes | 2024-10-24 23:13:30 -03:00
Nicolas | d8abd15716 | Nick: from bulk to batch | 2024-10-23 15:37:24 -03:00
Nicolas | 66e505317e | Merge branch 'main' into mog/bulk-scrape | 2024-10-23 14:36:26 -03:00
Thomas Kosmas | acde353e56 | skipTlsVerification on robots.txt scraping | 2024-10-23 01:07:03 +03:00
Thomas Kosmas | bd55464b52 | skipTlsVerification | 2024-10-22 22:28:02 +03:00
Nicolas | d2344aa14b | Revert "Nick: improved map ranking algorithm"; This reverts commit 7acd8d2edb6abc45a63fe1060377d2acb398ec36. | 2024-10-21 16:11:32 -03:00
Nicolas | 7acd8d2edb | Nick: improved map ranking algorithm | 2024-10-19 13:27:47 -03:00
Gergő Móricz | 03b37998fd | feat: bulk scrape | 2024-10-17 19:40:18 +02:00
Nicolas | 081d7407b3 | Merge pull request #788 from mendableai/nsc/log-extractpr-options; Extractor options logging v1 fix | 2024-10-16 23:51:22 -03:00
Nicolas | 06b8d24a4c | Update scrape.ts | 2024-10-16 23:50:21 -03:00
Nicolas | a73b06589c | Merge pull request #785 from mendableai/nsc/support-for-all-metadata; Return all the website metadata | 2024-10-16 23:37:26 -03:00
Nicolas | c0384ea381 | Nick: added tests | 2024-10-16 23:32:44 -03:00
Nicolas | b4f6a0f919 | Nick: geolocation | 2024-10-15 21:12:33 -03:00
rafaelsideguide | 4afcd16e02 | performance improv for ws | 2024-10-15 10:12:27 -03:00
rafaelsideguide | 3afaab13d9 | feat/improv-crawl-status-filters | 2024-10-14 18:14:00 -03:00
Nicolas | 961b1010cf | Nick: rm the cache for map for 24hrs | 2024-10-12 17:48:37 -03:00
rafaelsideguide | 2d3d7c827a | fix/added unkwown status to job filter | 2024-10-11 15:40:29 -03:00
rafaelsideguide | 8cbd94ed2d | fix/filters failed and unknown jobs now | 2024-10-11 09:45:51 -03:00
busaud | c6ebbc6f6a | bugfix: self-host crawling doesnt respect limit | 2024-10-09 22:52:49 +00:00
Nicolas | 497ac3328b | Merge pull request #732 from mendableai/fix/url-validation-params; [BUG] Fixed URLs with params | 2024-10-03 17:43:37 -03:00
rafaelsideguide | cfd776a5de | fix: now urls with params are passing validation; example: https://www.granitecreek.com?asljhda=akjshd | 2024-10-03 17:37:04 -03:00
Nicolas | 49bd95327e | Update types.ts | 2024-10-03 17:00:33 -03:00
Nicolas | 1a1ac9fd60 | Nick: | 2024-10-03 16:37:58 -03:00
Nicolas | c6717fecaa | Nick: got rid of job interval sleep and math.min | 2024-10-01 16:11:12 -03:00
Nicolas | 18f9cd09e1 | Nick: fixed more stuff | 2024-10-01 16:04:39 -03:00
Nicolas | 37299fc035 | Update types.ts | 2024-10-01 15:18:11 -03:00
Nicolas | 4d5477f357 | Nick: resolved conflicts | 2024-10-01 14:39:57 -03:00
Nicolas | 96245e387d | Update crawl.ts | 2024-10-01 14:29:53 -03:00
Nicolas | 445fc432e9 | Reapply "fix(v1/crawl): always use sitemap"; This reverts commit 339b19ce9d57fd15b11820e1cfbe4d7b5f44cf30. | 2024-10-01 14:03:07 -03:00
Nicolas | 339b19ce9d | Revert "fix(v1/crawl): always use sitemap"; This reverts commit 5dc0fcf644bfc64b2b30dd345b2a61b64a4c1262. | 2024-10-01 13:59:49 -03:00
Gergő Móricz | 5dc0fcf644 | fix(v1/crawl): always use sitemap | 2024-10-01 18:49:44 +02:00
Nicolas | 1af26fe1b4 | Nick: sitemap fix | 2024-10-01 12:38:48 -03:00
Gergő Móricz | 3621e191bd | feat(concurrency-limit): set limit based on plan | 2024-09-28 00:19:54 +02:00
Gergő Móricz | d5e2a80e4a | fix(crawl-status): keep 10 megabyte pages if they're the only thing in the output | 2024-09-27 20:41:41 +02:00
Gergő Móricz | e98f858eb6 | fix(api): playground scrape errors | 2024-09-26 22:28:14 +02:00
Gergő Móricz | 84bff8add8 | fix(billTeam): update cached ACUC after billing | 2024-09-26 22:15:15 +02:00
Gergő Móricz | f22ab5ffaf | feat(db): implement bill_team RPC | 2024-09-26 22:15:15 +02:00
Gergő Móricz | f8c70fe5dd | feat(db): implement auth_credit_usage_chunk RPC | 2024-09-26 22:15:15 +02:00
Gergő Móricz | 29815e084b | feat(v1/Document): add warning field | 2024-09-26 21:19:05 +02:00
Gergő Móricz | b696bfc854 | fix(crawl-status): avoid race conditions where crawl may be deemed failed | 2024-09-26 21:00:27 +02:00
Gergő Móricz | e67cbc2ca1 | fix(billTeam): update cached ACUC after billing | 2024-09-25 21:37:01 +02:00
Gergő Móricz | 5a8eb17a82 | feat(db): implement bill_team RPC | 2024-09-25 20:57:45 +02:00
Gergő Móricz | 331e826bca | feat(db): implement auth_credit_usage_chunk RPC | 2024-09-25 19:25:18 +02:00
Gergő Móricz | f00c0b82f9 | fix(v1/scrape): add total wait specified in request to timeout | 2024-09-24 21:56:22 +02:00
Gergő Móricz | 3e661a2087 | fix(v1/crawl-cancel): avoid double authing | 2024-09-24 20:01:34 +02:00
Gergő Móricz | a59b5836d5 | Revert error tallying | 2024-09-24 10:27:49 +02:00
Nicolas | db161ac55a | Nick: press + write | 2024-09-20 19:45:23 -04:00
Nicolas | 0690cfeaad | Merge branch 'main' into feat/actions | 2024-09-20 18:24:13 -04:00
Gergő Móricz | d663bbf0ca | feat(actions): add scroll | 2024-09-20 21:41:53 +02:00