| -LAN- | 5f12c17355 | fix(core): use CreatedByRole enum for role consistency (#9607) | 2024-10-22 13:03:50 +08:00 |
| -LAN- | e61752bd3a | feat/enhance the multi-modal support (#8818) | 2024-10-21 10:43:49 +08:00 |
| Bowen Liang | 240b66d737 | chore: avoid implicit optional in type annotations of method (#8727) | 2024-10-09 14:36:43 +08:00 |
| Zhaofeng Miao | 369e1e6f58 | feat(website-crawl): add jina reader as additional alternative for website crawling (#8761) | 2024-09-30 09:57:19 +08:00 |
| crazywoola | bf64ff215b | fix: . is missing in file_extension (#8736) | 2024-09-25 10:09:20 +08:00 |
| Bowen Liang | a1104ab97e | chore: refurish python code by applying Pylint linter rules (#8322) | 2024-09-13 22:42:08 +08:00 |
| Bowen Liang | 6613b8f2e0 | chore: fix unnecessary string concatation in single line (#8311) | 2024-09-13 14:24:49 +08:00 |
| Bowen Liang | 40fb4d16ef | chore: refurbish Python code by applying refurb linter rules (#8296) | 2024-09-12 15:50:49 +08:00 |
| Bowen Liang | c69f5b07ba | chore: apply ruff E501 line-too-long linter rule (#8275) Co-authored-by: -LAN- <laipz8200@outlook.com> | 2024-09-12 14:00:36 +08:00 |
| takatost | 56c90e212a | fix(workflow): missing content in the answer node stream output during iterations (#8292) Co-authored-by: -LAN- <laipz8200@outlook.com> | 2024-09-12 13:59:48 +08:00 |
| Bowen Liang | 0f14873255 | chore: cleanup ruff flake8-simplify linter rules (#8286) Co-authored-by: -LAN- <laipz8200@outlook.com> | 2024-09-12 12:55:45 +08:00 |
| Bowen Liang | 2cf1187b32 | chore(api/core): apply ruff reformatting (#7624) | 2024-09-10 17:00:20 +08:00 |
| Jyong | af92f19291 | filter excel empty sheet (#8194) | 2024-09-10 14:55:08 +08:00 |
| Nam Vu | 2d7954c7da | Fix variable typo (#8084) | 2024-09-08 13:14:11 +08:00 |
| Jyong | 01581dd35f | improve the notion table extract (#7925) | 2024-09-03 17:52:07 +08:00 |
| Jyong | 6f33351eb3 | ignore linked images when image id is none (#7890) | 2024-09-02 19:37:05 +08:00 |
| Bowen Liang | ccb6ddd840 | chore: bump Ruff to 0.5.7 (#7174) | 2024-08-12 10:24:48 +08:00 |
| Jyong | 12095f8cd6 | extract docx filter comment element (#7092) | 2024-08-08 16:53:29 +08:00 |
| chenxu9741 | 72c75b75cf | feat: Add hyperlink parsing to the DOCX document. (#7017) | 2024-08-07 16:01:14 +08:00 |
| yanghx | c53875ce8c | fix #6902 .docx handles images within tables and handles cross-column tables (#6951) | 2024-08-06 17:14:24 +08:00 |
| Jyong | cf258b7a67 | add xlsx support hyperlink extract (#6722) | 2024-07-26 19:26:52 +08:00 |
| Yeuoly | 79cb23e8ac | security/SSRF vulns (#6682) | 2024-07-25 20:50:26 +08:00 |
| 灰灰 | 5e4ac11df3 | fix: code block segmentation problem of markdown document (#6465) | 2024-07-25 17:24:37 +08:00 |
| Poorandy | c8f5dfcf17 | refactor(rag): switch to dify_config. (#6410) Co-authored-by: -LAN- <laipz8200@outlook.com> | 2024-07-18 18:40:36 +08:00 |
| tangyoha | 0cbbaf3f68 | fix: markdown proc will remove image (#5855) | 2024-07-12 20:07:22 +08:00 |
| Matri | a9ee52f2d7 | Fix/firecrawl parameters issue (#6213) | 2024-07-12 12:59:50 +08:00 |
| Aurelius Huang | f546db5437 | fix: document truncation and loss in notion document sync (#5631) Co-authored-by: Aurelius Huang <cm.huang@aftership.com> | 2024-07-05 11:48:17 +08:00 |
| Jyong | 43335b5c87 | delete the deprecated method (#5612) | 2024-06-26 12:51:50 +08:00 |
| Bowen Liang | 39c14ec7c1 | improve: unify Excel files parsing in either xls or xlsx file format by Pandas (#4965) | 2024-06-20 16:14:49 +08:00 |
| takatost | 12c815c597 | fix: ExtractSetting optional value missing None as default val (#5238) | 2024-06-15 02:58:47 +08:00 |
| Jyong | ba5f8afaa8 | Feat/firecrawl data source (#5232) Co-authored-by: Nicolas <nicolascamara29@gmail.com> Co-authored-by: chenhe <guchenhe@gmail.com> Co-authored-by: takatost <takatost@gmail.com> | 2024-06-15 02:46:02 +08:00 |
| Bowen Liang | f976740b57 | improve: mordernizing validation by migrating pydantic from 1.x to 2.x (#4592) | 2024-06-14 01:05:37 +08:00 |
| Jyong | 3b60c28b3a | deal the external image when extract docx image (#5024) | 2024-06-07 20:00:39 +08:00 |
| YC | 9f8ca75a81 | fixing a bug of handling header row when parsing xls file, and tune xls/xlsx parsing result to be more structured (#3600) | 2024-06-05 15:28:43 +08:00 |
| Bowen Liang | 58db719a2c | dep: bump pandas from 1.x to 2.x (#4820) | 2024-06-04 13:24:28 +08:00 |
| Oliver Lee | 176d91937d | fix 'NoneType' and new ContentType supported. (#4818) | 2024-05-31 14:19:33 +08:00 |
| yalei | 026175c8f7 | feat: update notion extractor (#3898) Co-authored-by: duyalei <> | 2024-05-24 20:30:48 +08:00 |
| Jyong | 233c4150d1 | support images and tables extract from docx (#4619) | 2024-05-23 18:05:23 +08:00 |
| majian | b5204111da | Add UNSTRUCTURED_API_KEY env support (#4369) | 2024-05-20 13:14:17 +08:00 |
| Charlie.Wei | 97b65f9b4b | Optimize webscraper (#4392) Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM> Co-authored-by: crazywoola <427733928@qq.com> Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com> | 2024-05-15 15:23:16 +08:00 |
| Bowen Liang | 7919596a21 | fix: UP031 style rule violation (#3866) | 2024-04-26 11:24:08 +08:00 |
| Jyong | 0737e930cb | chore: remove Langchain tools import (#3407) | 2024-04-12 16:26:09 +08:00 |
| chenxu9741 | ad65c891e7 | add xls file suport (#3321) | 2024-04-12 14:53:44 +08:00 |
| LiuVaayne | b00466f025 | feat:api Add support for extracting EPUB files in ExtractProcessor (#3254) Co-authored-by: crazywoola <427733928@qq.com> | 2024-04-12 11:25:02 +08:00 |
| Jyong | 6164604462 | fix dataset retrival in dataset mode (#3334) | 2024-04-11 02:11:21 +08:00 |
| Jyong | 9eba6ffdd4 | Optimize csv and excel extract (#3155) Co-authored-by: jyong <jyong@dify.ai> | 2024-04-08 16:34:43 +08:00 |
| Vikey Chen | e4f686deb7 | fix unstructured api,remove unused parameters (#3056) | 2024-04-03 21:00:20 +08:00 |
| Jyong | b0b0cc045f | add mutil-thread document embedding (#3016) Co-authored-by: jyong <jyong@dify.ai> | 2024-03-28 17:02:35 +08:00 |
| Weaxs | 20bd49285b | excel: get keys from every sheet (#2796) | 2024-03-12 16:59:25 +08:00 |
| Bowen Liang | b163545771 | Use python-docx to extract docx files (#2654) | 2024-03-07 18:24:55 +08:00 |