196 Commits

Author SHA1 Message Date
gagb
5fc70864f2 Run pre-commit 2024-12-18 11:46:39 -08:00
Sugato Ray
39410d01df
Update CLI helpdoc formatting to allow indentation in code
Use `textwrap.dedent()` to allow indented cli-helpdoc in `__main__.py` file. The indentation increases readability, while `textwrap.dedent` helps maintain the same functionality without breaking code.
2024-12-18 14:22:58 -05:00
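The `textwrap.dedent()` approach described in commit 39410d01df can be sketched as follows. This is a minimal illustration, not the actual contents of `__main__.py`: the function name `build_parser`, the program name `tool`, and the help text are all hypothetical.

```python
import argparse
import textwrap


def build_parser() -> argparse.ArgumentParser:
    # The help text is indented to match the surrounding code.
    # textwrap.dedent() strips the common leading whitespace, so the
    # rendered help output starts at column 0 as before.
    description = textwrap.dedent(
        """
        Convert a file to Markdown.

        examples:
            tool input.pdf
            tool input.pdf > output.md
        """
    ).strip()
    return argparse.ArgumentParser(
        prog="tool",  # hypothetical program name
        description=description,
        # Keep the line breaks and relative indentation of the description.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
```

Only the whitespace shared by every line is removed, so the example commands keep their four-space indent relative to the `examples:` heading.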
Joel Esler
6e4caac70d Safeguard against path traversal for ZipConverter
fix: prevent path traversal vulnerabilities in ZipConverter

Added a secure check for path traversal vulnerabilities in the ZipConverter class.
Now validates extracted file paths using `os.path.commonprefix` to ensure all files
remain within the intended extraction directory. Raises a `ValueError` if a
path traversal attempt is detected.

- Normalized file paths using `os.path.normpath`.
- Added specific exception handling for `zipfile.BadZipFile` and traversal errors.
- Ensured cleanup of extracted files after processing when `cleanup_extracted` is enabled.
2024-12-18 13:12:55 -05:00
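The kind of check described in commit 6e4caac70d can be sketched as follows. This is not the code from that commit. The commit message names `os.path.commonprefix`, while this sketch swaps in `os.path.commonpath`, because `commonprefix` compares strings character by character and would treat a sibling directory such as `out_evil` as lying inside `out`. The helper names `is_within_directory` and `safe_extract` are hypothetical.

```python
import os
import zipfile


def is_within_directory(base_dir: str, member_name: str) -> bool:
    """Return True if member_name resolves to a path inside base_dir."""
    base = os.path.abspath(base_dir)
    # Normalize the entry name, then resolve it against the base directory.
    target = os.path.abspath(os.path.join(base, os.path.normpath(member_name)))
    # commonpath compares whole path components, not characters.
    return os.path.commonpath([base, target]) == base


def safe_extract(zip_path: str, dest_dir: str) -> None:
    """Extract zip_path into dest_dir, rejecting entries that escape it.

    Assumption: symlinks inside the archive are not handled here.
    """
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # Validate every entry before anything is written to disk.
            for name in archive.namelist():
                if not is_within_directory(dest_dir, name):
                    raise ValueError(f"Path traversal detected in zip entry: {name}")
            archive.extractall(dest_dir)
    except zipfile.BadZipFile as exc:
        raise ValueError(f"Not a valid zip file: {zip_path}") from exc
```

Validating all entries before calling `extractall` means a malicious archive is rejected without leaving partially extracted files behind.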
Petr@AP Consulting
224f1df0fc
Update README.md
I collapsed the section about batch processing, as suggested.
2024-12-18 09:28:18 +01:00
gagb
1deaba1c6c
Merge pull request #98 from waterimp/feature/fix-code-comments
fix incorrect comments for "bail if not ..." for WAV and image cases.
2024-12-17 17:57:25 -08:00
gagb
09cb048cbe
Merge branch 'main' into feature/fix-code-comments 2024-12-17 17:34:53 -08:00
gagb
b029ae1cd4
Merge pull request #108 from microsoft/gagb-readme
Simplify README
2024-12-17 17:30:49 -08:00
gagb
524aa0da75
Update README.md 2024-12-17 17:25:40 -08:00
gagb
de1b54d79f
Update README.md 2024-12-17 17:25:13 -08:00
gagb
1e7806a7ac Simplify 2024-12-17 17:21:39 -08:00
gagb
1163aa2b4e
Merge pull request #106 from microsoft/gagb-patch-1
Update README.md
2024-12-17 16:57:32 -08:00
gagb
3bcf2bdae7
Update README.md 2024-12-17 16:54:17 -08:00
gagb
41a10b9a35
Merge pull request #64 from l-lumin/add-devcontainer-config
feat(devcontainer): Add DevContainer Configuration for Easier Contribution Setup
2024-12-17 16:52:50 -08:00
gagb
f1e399eee4
Merge branch 'main' into add-devcontainer-config 2024-12-17 16:50:32 -08:00
gagb
8b02c0bf9f
Merge pull request #80 from diya155/main
Update README.md
2024-12-17 16:49:58 -08:00
gagb
1dda535330
Merge branch 'main' into main 2024-12-17 16:46:23 -08:00
gagb
362214323e
Merge branch 'main' into feature/fix-code-comments 2024-12-17 16:38:47 -08:00
lumin
457b6234e6
Merge branch 'main' into add-devcontainer-config 2024-12-18 09:14:31 +09:00
afourney
790031409b
Merge pull request #71 from AumGupta/main
feat: Add IpynbConverter
2024-12-17 15:41:51 -08:00
afourney
9e546a8588
Merge branch 'main' into main 2024-12-17 15:37:28 -08:00
afourney
ddf695cf81
Merge pull request #97 from Soulter/main
feat: Add RSSConverter
2024-12-17 15:34:22 -08:00
Adam Fourney
8d5f16ecd2 Fixed formatting. 2024-12-17 15:27:06 -08:00
afourney
a571021199
Merge branch 'main' into main 2024-12-17 15:12:59 -08:00
afourney
9add517510
Merge branch 'main' into feature/fix-code-comments 2024-12-17 14:56:16 -08:00
afourney
3ce21a47ab
Merge pull request #102 from microsoft/bump_version
Bump version.
v0.0.1a3
2024-12-17 13:55:12 -08:00
Adam Fourney
9518c01d4e Bump version. 2024-12-17 13:51:13 -08:00
afourney
22504551ef
Merge pull request #101 from microsoft/add_deprecation_warnings
Added deprecation warnings for mlm_* arguments.
2024-12-17 13:49:44 -08:00
Adam Fourney
95188a4a27 Merge main. 2024-12-17 13:46:26 -08:00
afourney
e69d012b86
Merge pull request #100 from microsoft/add_llm_tests 2024-12-17 13:36:36 -08:00
Adam Fourney
03a7843a0a Added deprecation warnings for mlm_* arguments. 2024-12-17 13:22:48 -08:00
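A deprecation warning for a renamed keyword argument, as in commit 03a7843a0a, is commonly written along these lines. This is a generic sketch under assumptions: the log only says `mlm_*` arguments were deprecated, so the argument names `mlm_client` and `llm_client` and the helper `resolve_llm_client` are hypothetical.

```python
import warnings
from typing import Any, Optional


def resolve_llm_client(
    llm_client: Optional[Any] = None,
    mlm_client: Optional[Any] = None,  # hypothetical deprecated name
) -> Optional[Any]:
    """Return the client to use, warning if the deprecated argument was passed."""
    if mlm_client is not None:
        warnings.warn(
            "'mlm_client' is deprecated; use 'llm_client' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        # Fall back to the old argument only if the new one was not given.
        if llm_client is None:
            llm_client = mlm_client
    return llm_client
```

With this pattern, existing callers keep working while being told which argument to switch to.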
Adam Fourney
248d64edd0 Added llm tests to the local test set. 2024-12-17 12:13:19 -08:00
Lee Bush
05a49ca129 fix incorrect comments for "bail if not ..." for WAV and image cases. 2024-12-17 08:10:53 -07:00
Soulter
752fbd333c feat: add tests for RSS converter 2024-12-17 22:45:27 +08:00
Soulter
7dc2695b96 feat: support converting Atom to Markdown 2024-12-17 21:44:50 +08:00
Soulter
53fad6eb31 feat: add rss converter 2024-12-17 21:22:27 +08:00
Petr@AP Consulting
f398f3d443
Update README.md
I added a description and a script for batch processing of files.
2024-12-17 10:26:09 +01:00
lumin
e0a30295ff docs: update README with Devcontainer instructions
Add instructions for using the Devcontainer to run tests. Remove the install script, since it is no longer needed.
Update trademark section for clarity.
2024-12-17 17:04:31 +09:00
lumin
07fe457a90 feat: add devcontainer configuration and installation script
Add a devcontainer configuration to streamline the development 
environment setup. Introduce an `install.sh` script to install 
the project in editable mode. Update the Dockerfile to use 
the `python:3.13-slim-bullseye` base image and install 
dependencies using `apt-get` for better compatibility.
2024-12-17 17:04:31 +09:00
Om Gupta
60c4a62917
Merge branch 'microsoft:main' into main 2024-12-17 10:33:40 +05:30
Om Gupta
3eb8cf385b Merge branch 'main' of https://github.com/AumGupta/markitdown 2024-12-17 10:24:30 +05:30
Om Gupta
8c91c11ea8 pre-commit run 2024-12-17 10:24:25 +05:30
diya155
14bd8d319a
Update README.md 2024-12-17 09:16:40 +05:30
gagb
ad5d4fb139
Merge pull request #77 from microsoft/kevinclb/main
Kevinclb/main
2024-12-16 18:14:09 -08:00
gagb
ad29122592 run precommit 2024-12-16 18:09:48 -08:00
gagb
898bfd4774
Merge branch 'main' into main 2024-12-16 18:00:26 -08:00
gagb
c8980d9f41
Merge pull request #75 from microsoft/cybernobie/main
Cybernobie/main
2024-12-16 17:40:13 -08:00
gagb
24b52b2b8f Improve readme 2024-12-16 17:35:47 -08:00
gagb
09159aa04e
Merge branch 'main' into main 2024-12-16 17:24:47 -08:00
gagb
77f620b568
Merge pull request #67 from DIMAX99/issue#65
fix issue #65
2024-12-16 17:18:53 -08:00
gagb
825d3bbb77
Merge branch 'main' into issue#65 2024-12-16 17:09:53 -08:00