100 Commits

Author SHA1 Message Date
yangdx
fd0ae4646f Fixes crash when processing files with UTF-8 encoding error
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
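The TypeError named in this commit is what Python raises when a caller unpacks two values from a function that returned a bare `False`. The following is a minimal sketch of the fix pattern, not the project's actual code: `enqueue_bytes_sketch`, `SUPPORTED_SUFFIXES`, and the placeholder track_id format are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical allow-list for illustration; not the project's real list.
SUPPORTED_SUFFIXES = {".txt", ".md"}


def enqueue_bytes_sketch(filename: str, raw: bytes) -> Tuple[bool, str]:
    """Return (success, track_id) on every code path, never a bare bool."""
    name = Path(filename)
    if name.suffix.lower() not in SUPPORTED_SUFFIXES:
        return False, ""  # unsupported file type
    try:
        content = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, ""  # UTF-8 encoding error
    if not content.strip():
        return False, ""  # no content extracted
    return True, f"upload_{name.stem}"  # placeholder track_id (assumed format)


# Callers can always unpack, even on failure.
ok, track_id = enqueue_bytes_sketch("notes.txt", b"\xff\xfe")
```

Because every branch returns a two-element tuple, the unpacking on the last line cannot fail regardless of which error path was taken.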
yangdx
c22315ea6d refactor: remove selective LLM cache clearing functionality
- Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods
- Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing
- Update API endpoint to ignore mode-specific parameters and clear all cache
- Simplify frontend clearCache() function to send empty request body

This change ensures all LLM cache is cleared together.
2025-08-05 23:51:51 +08:00
yangdx
e04d8ed8a7 Improved storage drop logging with namespace details
- Added namespace and workspace to drop logs
2025-08-04 00:56:39 +08:00
yangdx
7505195303 fix: add full_entities and full_relations to clear_documents storage list 2025-08-03 23:02:58 +08:00
yangdx
0eac1a883a Feat: add file path sorting for document manager
- Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB)
- Implement smart column header switching between "ID" and "File Name" based on display mode
- Add automatic sort field switching when toggling between ID and file name display
- Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance
- Update frontend to maintain sort state when switching display modes
- Add internationalization support for "fileName" in English and Chinese locales

This enhancement improves user experience by providing intuitive file-based sorting
while maintaining performance through optimized database indexes.
2025-07-30 18:46:55 +08:00
yangdx
74eecc46e5 feat(pagination): Implement document list pagination backends and frontend UI
- Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON)
- Implement RESTful API endpoints for paginated document queries and status counts
- Create reusable pagination UI components with internationalization support
- Optimize performance with database-level pagination and efficient in-memory processing
- Maintain backward compatibility while adding configurable page sizes (10-200 items)
2025-07-30 17:58:32 +08:00
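As a rough illustration of the in-memory pagination path with bounded page sizes, here is a sketch under stated assumptions: `paginate_sketch` is a hypothetical helper, and clamping out-of-range values to the 10-200 window is an assumption (the real endpoint may reject them instead).

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

MIN_PAGE_SIZE = 10   # lower bound quoted in the commit message
MAX_PAGE_SIZE = 200  # upper bound quoted in the commit message


def paginate_sketch(
    items: Sequence[T], page: int = 1, page_size: int = 50
) -> Tuple[List[T], int]:
    """Return (items on the requested page, total item count)."""
    # Clamp rather than reject; this choice is an assumption.
    page_size = max(MIN_PAGE_SIZE, min(MAX_PAGE_SIZE, page_size))
    page = max(1, page)  # pages are 1-based
    start = (page - 1) * page_size
    return list(items[start:start + page_size]), len(items)
```

A database backend would typically push the same arithmetic into the query as a LIMIT/OFFSET (or skip/limit) pair instead of slicing in Python.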
yangdx
c24c2ff2f6 Remove deprecated temp file saving function
- Delete unused save_temp_file function
2025-07-30 14:23:08 +08:00
yangdx
29e829113b Fix status key serialization issue in get_track_status 2025-07-30 04:45:48 +08:00
yangdx
7207598fc4 Fix track_id bugs and add track_id to scanning response 2025-07-30 03:06:20 +08:00
yangdx
6f958d5aee feat: add metadata timestamps to document processing and update frontend compatibility
- Add metadata field to doc_status storage with Unix timestamps for processing start/end times
- Update frontend API types: error -> error_msg, add track_id and metadata support
- Add getTrackStatus API method for document tracking functionality
- Fix frontend DocumentManager to use error_msg field for proper error display
- Ensure full compatibility between backend metadata changes and frontend UI
2025-07-30 00:04:27 +08:00
yangdx
6014b9bf73 feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
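The track_id mechanism can be pictured roughly as follows. This is only a sketch: the commit confirms the upload_/insert_ prefixes and the `get_docs_by_track_id()` method name, but the timestamp-plus-random-suffix format, the `_sketch` function names, and the dict-based storage are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def generate_track_id_sketch(prefix: str = "upload") -> str:
    """Build an ID such as 'upload_20250729_222421_ab12cd34' (format assumed)."""
    if prefix not in ("upload", "insert"):
        raise ValueError(f"unexpected prefix: {prefix!r}")
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{timestamp}_{uuid.uuid4().hex[:8]}"


def get_docs_by_track_id_sketch(
    docs: Dict[str, Dict[str, Any]], track_id: str
) -> Dict[str, Dict[str, Any]]:
    """Linear scan of an in-memory doc_status map, keyed by document ID."""
    return {
        doc_id: doc for doc_id, doc in docs.items()
        if doc.get("track_id") == track_id
    }
```

A linear scan is acceptable for a small JSON-style store; the database backends mentioned in the commit would rely on the new track_id indexes instead.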
yangdx
910c6973f3 Limit file deletion to current directory only after document cleaning 2025-07-16 20:35:24 +08:00
yangdx
033098c1bc Feat: Add WORKSPACE support to all storage types 2025-07-07 00:57:21 +08:00
yangdx
98150e80b8 Improved empty/whitespace file handling
- Better detection of whitespace-only files
- Changed error to warning for empty chunks
2025-07-05 23:16:39 +08:00
xuewei
49cb51b5dc Cannot parse content from PDF files 2025-07-05 13:47:47 +08:00
yangdx
04d793abbd Update logger message 2025-07-03 22:15:32 +08:00
yangdx
67f51597c2 Bump api version to 0178 2025-07-03 21:37:47 +08:00
yangdx
05231233f1 Feat: Check request_pending after document deletion
- Add double-check for pipeline status to prevent race conditions
- Implement automatic processing of pending indexing requests after deletion
2025-07-03 21:36:35 +08:00
yangdx
a506753548 Fix linting 2025-06-27 02:33:20 +08:00
yangdx
60777d535b fix: prevent Path Traversal vulnerability in upload endpoint
- Add sanitize_filename() function to validate and clean uploaded filenames
- Remove path separators, traversal sequences, and control characters
- Verify final paths stay within input directory using Path.resolve()
- Return HTTP 400 errors for unsafe filenames
- Prevents directory traversal attacks like ../../../etc/passwd
2025-06-27 02:33:05 +08:00
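The steps listed in this commit can be sketched as below. This is an illustrative approximation, not the project's `sanitize_filename()`: the name `sanitize_filename_sketch`, the exact character set removed, and the use of `ValueError` are assumptions. The real endpoint is described as returning HTTP 400, which a caller could do by catching the error.

```python
import re
from pathlib import Path


def sanitize_filename_sketch(filename: str, input_dir: Path) -> str:
    """Clean an uploaded filename and confirm it stays inside input_dir."""
    # Remove path separators and ASCII control characters.
    cleaned = re.sub(r"[/\\\x00-\x1f\x7f]", "", filename)
    # Remove traversal sequences, then surrounding whitespace and dots.
    cleaned = cleaned.replace("..", "").strip().strip(".")
    if not cleaned:
        raise ValueError("unsafe or empty filename")
    # Defense in depth: the resolved path must live under input_dir.
    base = input_dir.resolve()
    target = (base / cleaned).resolve()
    if base not in target.parents:
        raise ValueError("filename escapes the input directory")
    return cleaned
```

With this sketch, `../../../etc/passwd` is reduced to `etcpasswd`, and a name consisting only of dots and separators is rejected. The `resolve()` check still matters after cleaning because a symlink inside the input directory could point elsewhere.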
yangdx
8fb1c09b08 Refac: pipeline message 2025-06-26 01:00:54 +08:00
yangdx
bdcd55a871 Feat: Add delete upload file option to document deletion 2025-06-25 19:02:46 +08:00
yangdx
51bb0471cd Change the API for deleting documents to support deleting multiple documents at once. 2025-06-25 16:19:49 +08:00
yangdx
495d6c8cce Improve the pipeline status message for document deletion 2025-06-25 15:46:58 +08:00
yangdx
2aaa6d5f7d Fix linting 2025-06-25 14:59:45 +08:00
yangdx
49baeb7318 Change document deletion API to async 2025-06-25 14:59:10 +08:00
yangdx
922484915b Remove deprecated API endpoint. 2025-06-25 13:55:47 +08:00
yangdx
8b6dcfb6eb Pls do not use /delete_document API endpoint 2025-06-24 11:26:38 +08:00
yangdx
5ae945c1e5 Improved error handling for document deletion
Added HTTPException for not_found status
Added HTTPException for fail status
2025-06-24 01:12:25 +08:00
yangdx
c18065a912 Disable document deletion when LLM cache for extraction is off 2025-06-23 22:41:27 +08:00
yangdx
1973c80dca Feat: Add entity and relation deletion endpoints 2025-06-23 22:14:50 +08:00
yangdx
bd487dd252 Unify the status string returned by document APIs 2025-06-23 21:38:47 +08:00
yangdx
5099ac8213 Fix linting 2025-06-23 18:41:30 +08:00
yangdx
dffe659388 Feat: Add document deletion by ID API endpoint
- New DELETE endpoint for document removal
- Implements doc_id-based deletion
- Handles pipeline status during operation
- Includes proper error handling
- Updates pipeline status messages
2025-06-23 18:10:40 +08:00
yangdx
a6046bf827 Fix linting 2025-05-22 10:06:09 +08:00
Benjamin L
1b6ddcaf5b change validator method names 2025-05-21 16:06:35 +02:00
Benjamin L
62b536ea6f Adding file_source.s as optional attribute to text.s requests 2025-05-21 15:10:27 +02:00
yangdx
36f8787bc7 Fix linting 2025-05-01 10:04:31 +08:00
yangdx
a561be0cff Fix time zone problem of doc status 2025-05-01 02:16:19 +08:00
yangdx
31bd274601 Add Unicode collation for Chinese file sorting of document scanning 2025-04-25 01:02:09 +08:00
yangdx
3aab5b41f2 Fix linting 2025-04-24 14:15:10 +08:00
yangdx
fc425f1397 Send all found files to pipeline at once 2025-04-24 14:00:43 +08:00
cuikunyu
135a40d696 Optimize: Use python-docx for better parsing. 2025-04-11 03:10:20 +00:00
yangdx
bd2c528dba Merge branch 'optimize-config-management' into clear-doc 2025-04-04 19:46:45 +08:00
yangdx
b0f0f1ff84 refactor: improve document clearing status management
- Use update() for atomic status updates
- Improve history messages clearing while preserving list object
2025-04-01 14:03:45 +08:00
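The two bullet points above describe a common pattern for shared status objects: change several fields in one `update()` call, and empty a list in place so that other holders of the same list see the change. A minimal sketch, assuming a plain dict whose keys (`busy`, `job_name`, `latest_message`, `history_messages`) are illustrative rather than confirmed by the source:

```python
from typing import Any, Dict


def start_clearing_sketch(status: Dict[str, Any]) -> None:
    """Mark the pipeline busy and reset its history without rebinding the list."""
    # Empty the existing list object; assigning [] would create a new
    # list and leave other references pointing at the stale one.
    del status["history_messages"][:]
    # One update() call sets all fields together instead of key by key.
    status.update(
        {
            "busy": True,
            "job_name": "Clearing Documents",
            "latest_message": "Starting document clearing",
        }
    )
    status["history_messages"].append("Starting document clearing")
```

With a plain dict this is simply tidier; with a proxy such as a `multiprocessing.Manager` dict, a single `update()` is also a single round trip, which narrows the window in which readers can observe a half-updated status.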
yangdx
cd94e84267 Update clear cache endpoint path 2025-04-01 10:36:28 +08:00
yangdx
d54bda8d36 feat(api): Add Pydantic models for all endpoints in document_routes.py 2025-03-31 23:53:14 +08:00
yangdx
8845779ed7 Add clear cache API endpoint 2025-03-31 23:37:03 +08:00
yangdx
95a8ee27ed Fix linting 2025-03-31 23:22:27 +08:00
yangdx
04967b33cc feat(api): Add dedicated ClearDocumentsResponse class for document deletion endpoint 2025-03-31 19:13:27 +08:00