unclecode
|
c8589f8da3
|
Update:
- Fix spaCy model issue
- Update Readme and requirements.txt
|
2024-05-16 19:50:20 +08:00 |
|
unclecode
|
6a6365ae0a
|
Refactor code to exclude the extraction of semantic text blocks from the HTML
|
2024-05-16 18:10:55 +08:00 |
|
unclecode
|
5b80be956d
|
Update:
- Debug
- Refactor code for new version
|
2024-05-16 17:31:44 +08:00 |
|
UncleCode
|
4a2e17447b
|
Update README.md
|
2024-05-16 08:57:58 +08:00 |
|
unclecode
|
f6e59157bf
|
- Test all methods
- Update index.html
- Update Readme
- Resolve some bugs
|
2024-05-14 21:27:41 +08:00 |
|
unclecode
|
5fea6c064b
|
Improve libraries import
|
2024-05-13 02:46:35 +08:00 |
|
unclecode
|
11393183f7
|
Add Colab setup script.
|
2024-05-13 00:39:06 +08:00 |
|
unclecode
|
7679064521
|
Add model parameter for clustering.
|
2024-05-13 00:06:16 +08:00 |
|
unclecode
|
cf087cfa58
|
Replace embedding model with smaller one
|
2024-05-12 23:55:57 +08:00 |
|
unclecode
|
5693e324a4
|
Add time measurements.
|
2024-05-12 23:35:27 +08:00 |
|
unclecode
|
b38bf64490
|
Exclude spaCy from requirements.txt
|
2024-05-12 22:59:26 +08:00 |
|
unclecode
|
82706129f5
|
Update:
- Text Categorization
- Crawler, Extraction, and Chunking strategies
- Clustering for semantic segmentation
|
2024-05-12 22:37:21 +08:00 |
|
unclecode
|
7039e3c1ee
|
- Issue Resolved: The HTML content of every <pre> tag is replaced with its inner text. This handles cases such as syntax highlighters, where each character may be wrapped in its own <span>, and prevents the minimum word threshold from discarding that content.
|
2024-05-12 14:08:22 +08:00 |
|
unclecode
|
8e536b9717
|
chore: Refactor README.md and project structure
|
2024-05-12 12:41:42 +08:00 |
|
unclecode
|
aac4e07389
|
chore: Update README.md and project structure
|
2024-05-12 12:39:31 +08:00 |
|
UncleCode
|
e3960ace68
|
Update README.md
Explain more about `extract_blocks_flag`
|
2024-05-11 22:11:16 +08:00 |
|
UncleCode
|
b0f97ab2b3
|
Update README.md
Public server is now available
|
2024-05-11 08:56:19 +08:00 |
|
unclecode
|
372c921429
|
Update: Fix bug when the user sets extract_blocks to False
|
2024-05-10 20:12:31 +08:00 |
|
ntohidi
|
aa126e436b
|
Add CORS middleware for allowing all origins to make requests
|
2024-05-10 12:27:40 +02:00 |
|
unclecode
|
20ef255c7f
|
Update README
|
2024-05-09 23:28:47 +08:00 |
|
unclecode
|
da7748a780
|
Update README file
|
2024-05-09 22:51:10 +08:00 |
|
unclecode
|
f74f4e88c0
|
Update README file
|
2024-05-09 22:48:42 +08:00 |
|
unclecode
|
a8e7218769
|
chore: Update README.md and project structure
|
2024-05-09 22:40:08 +08:00 |
|
unclecode
|
84f093593a
|
Update README
|
2024-05-09 22:37:45 +08:00 |
|
unclecode
|
88643612e8
|
chore: Update environment variable usage in config files
|
2024-05-09 22:37:01 +08:00 |
|
unclecode
|
6f99bad6f0
|
Update web application URL in README.md
|
2024-05-09 22:28:37 +08:00 |
|
unclecode
|
50d7a7e45d
|
chore: Update forced flag for single page fetch to use default value
|
2024-05-09 22:21:12 +08:00 |
|
unclecode
|
c71dd9189b
|
chore: Update import statements to use crawl4ai package
|
2024-05-09 22:17:15 +08:00 |
|
unclecode
|
3ff1d15702
|
Change the project folder name from crawler to crawl4ai
|
2024-05-09 22:16:28 +08:00 |
|
UncleCode
|
7ee8001b7d
|
Update README.md
Add configuration section
|
2024-05-09 21:49:04 +08:00 |
|
unclecode
|
b9d9d2bbd4
|
chore: Update URL for single page fetch to NBC News
|
2024-05-09 20:05:59 +08:00 |
|
unclecode
|
6320d07a93
|
chore: Update landing page URL and min words threshold
|
2024-05-09 20:05:31 +08:00 |
|
unclecode
|
181250cb93
|
chore: Add function to clear the database
|
2024-05-09 19:42:43 +08:00 |
|
unclecode
|
f7c031c097
|
chore: Remove unused code from test.py
|
2024-05-09 19:26:37 +08:00 |
|
unclecode
|
51095062d4
|
Update file names
|
2024-05-09 19:26:16 +08:00 |
|
unclecode
|
c71adb29ce
|
chore: Update .gitignore and README.md
|
2024-05-09 19:25:25 +08:00 |
|
unclecode
|
898ec30a18
|
chore: Update license information in README.md
chore: Update social media links in index.html
|
2024-05-09 19:14:48 +08:00 |
|
unclecode
|
343c4477f8
|
Update Crawl4AI web application URL in README.md
|
2024-05-09 19:13:20 +08:00 |
|
unclecode
|
99e0dd1ccd
|
chore: Update README.md with installation instructions for Crawl4AI library and local server
|
2024-05-09 19:12:39 +08:00 |
|
unclecode
|
b8e743cd8d
|
Initial Commit
|
2024-05-09 19:10:25 +08:00 |
|