Kenny Zhang
4e0a10ecf3
ran unit tests locally
2025-02-27 16:44:50 -05:00
Kenny Zhang
950b135da6
formatting
2025-02-27 15:08:10 -05:00
Kenny Zhang
b671345bb9
updated readme
2025-02-27 15:07:46 -05:00
Kenny Zhang
d9a92f7f06
added file obj unit tests for rss and json
2025-02-27 15:05:29 -05:00
Kenny Zhang
db0c8acbaf
added file obj support to rss and plain text converters
2025-02-27 14:55:49 -05:00
Kenny Zhang
08330c2ac3
added core unit tests for file obj support
2025-02-27 11:27:05 -05:00
Kenny Zhang
4afc1fe886
added non-binary example to README
2025-02-21 13:31:37 -05:00
Kenny Zhang
b0044720da
updated docs
2025-02-20 16:56:47 -05:00
Kenny Zhang
07a28d4f00
black formatting
2025-02-20 16:49:37 -05:00
Kenny Zhang
b8b3897952
modify ext guesser
2025-02-20 16:47:37 -05:00
Kenny Zhang
395ce2d301
close file object after using
2025-02-20 13:54:51 -05:00
Kenny Zhang
808401a331
added conversion path for file object in central class
2025-02-19 17:02:51 -05:00
Kenny Zhang
e75f3f6f5b
local path inputs to MarkitDown class adhere to new converterinput structure
2025-02-19 15:16:45 -05:00
Kenny Zhang
8e950325d2
refactored remaining converters
2025-02-19 14:01:43 -05:00
Kenny Zhang
096fef3d5f
refactored more converters to support input class
2025-02-19 13:34:28 -05:00
Kenny Zhang
52cbff061a
begin refactoring converter classes
2025-02-19 11:48:00 -05:00
Kenny Zhang
0027e6d425
added wrapper class for converter file input
2025-02-18 12:44:18 -05:00
Kenny Zhang
63a7bafadd
removed redundant priority setting
2025-02-18 12:18:49 -05:00
afourney
dbdf2c0c10
Added CLI tests. (#327)
2025-02-11 20:42:50 -08:00
KennyZhang1
97eeed5f32
Doc Intelligence fixes for refactored code (#325)
* added priority flag to doc intel converter constructor
* fixed analysis features bug for docx
2025-02-11 16:01:46 -08:00
afourney
935da9976c
Added priority argument to all converter constructors. (#324)
* Added priority argument to all converter constructors.
2025-02-11 12:36:32 -08:00
Ruijun Gao
5ce85c236c
Fix a typo in sample RTF plugin (#320)
2025-02-11 10:33:52 -08:00
Tomasz Kalinowski
3a5ca22a8d
Don't generate md links in 'pre' blocks (#322)
2025-02-11 07:13:17 -08:00
Adam Fourney
4b62506451
Small typo in README.
2025-02-10 15:24:28 -08:00
afourney
c73afcffea
Cleanup and refactor, in preparation for plugin support. (#318)
* Work started moving converters to individual files.
* Significant cleanup and refactor.
* Moved everything to a packages subfolder.
* Added sample plugin.
* Added instructions to the README.md
* Bumped version, and added a note about compatibility.
2025-02-10 15:21:44 -08:00
wunde005
73ba69d8cd
For CSV files, mimetypes.guess_type returns "application/vnd.ms-excel" on Windows, causing an invalid MIME type in PlainTextConverter. In reference to issue: https://github.com/microsoft/markitdown/issues/150 (#273)
2025-02-08 20:58:13 -08:00
Werner Robitza
2a4f7bb6a8
fix: argparse CLI option ordering, fixes #268 (#290)
* fix: argparse CLI option ordering, fixes #268
* Fixed formatting.
2025-02-08 20:50:38 -08:00
masquare
7cf5e0bb23
feat(pptx): support image description with LLM for pptx files (#306)
2025-02-08 20:37:34 -08:00
James Hickey
3090917a49
Typo fixed (#270)
2025-02-08 20:30:13 -08:00
ZeyuTeng96
7bea2672a0
remove leading and trailing \n for HtmlConverter (#262)
2025-02-08 20:28:35 -08:00
KennyZhang1
bf6a15e9b5
Kennyzhang/docintel docs (#312)
* updated docs to include doc intelligence
* include reference to doc intel setup docs
2025-01-31 22:23:26 -08:00
KennyZhang1
bfde857420
Add support for conversion via Document Intelligence (#303)
* added cli params for doc intel
* added DocumentIntelligenceConverter class implementation
* initialized doc intel client instance field
* added isolated doc_intel main conversion function
* temp fix for ContentFormat import bug
* ran tests for docintel and offline for many filetypes
* push doc intel converter to the top of the stack
* formatting changes
* modified project toml file
2025-01-24 14:09:32 -08:00
afourney
f58a864951
Set exiftool path explicitly. (#267)
2025-01-06 12:43:47 -08:00
afourney
265aea2edf
Removed the holiday away message from README.md (#266)
2025-01-06 09:06:21 -08:00
afourney
05b78e7ce1
Recognize json as plain text (if no other handlers are present). (#261)
* Recognize json as plain text (if no other handlers are present).
2025-01-03 16:40:43 -08:00
afourney
436407288f
If puremagic has no guesses, try again after ltrim. (#260)
2025-01-03 16:03:11 -08:00
afourney
731b39e7f5
Added a test for leading spaces. (#258)
2025-01-03 14:34:33 -08:00
yeungadrian
08ed32869e
Feature/ Add xls support (#169)
* add xlrd
* add xls converter with tests
2025-01-03 13:58:17 -08:00
Murat Can Kurtuluş
d248621ba4
feat: outlook ".msg" file converter (#196)
* feat: outlook .msg converter
* add test, adjust docstring
2025-01-03 13:34:39 -08:00
AbSadiki
4678c8a2a4
fix(transcription): IS_AUDIO_TRANSCRIPTION_CAPABLE should be initialized (#194)
2025-01-03 13:29:26 -08:00
Ikko Eltociear Ashimine
125e206047
docs: update README.md (#182)
faciliate -> facilitate
2024-12-21 01:51:30 -08:00
numekudi
f94d09990e
feat: enable Git support in devcontainer (#136)
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 18:09:17 -08:00
lumin
cfd2319c14
feat: add version option to markitdown CLI (#172)
Add a `--version` option to the markitdown command-line interface
that displays the current version number.
2024-12-20 16:24:45 -08:00
dependabot[bot]
73161982ff
Bump actions/setup-python from 2 to 5 (#179)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v5)
---
updated-dependencies:
- dependency-name: actions/setup-python
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:20:22 -08:00
dependabot[bot]
9b69467772
Bump actions/cache from 3 to 4 (#178)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3...v4)
---
updated-dependencies:
- dependency-name: actions/cache
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
Co-authored-by: afourney <adamfo@microsoft.com>
2024-12-20 16:17:43 -08:00
gagb
857a2d160d
Update README.md (#180)
2024-12-20 14:49:20 -08:00
Soulter
1123392306
fix: support -o param to avoid encoding issues (#116)
* perf: cli supports -o param
* doc: update README
---------
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:43:00 -08:00
dependabot[bot]
377a7eaa7d
Bump actions/checkout from 2 to 4 (#177)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v4)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-20 14:36:48 -08:00
lumin
c1a0d3deaf
chore: configure Dependabot for GitHub Actions updates (#112)
Sets up Dependabot to automatically check for updates to
GitHub Actions on a weekly basis, ensuring that the project
remains up-to-date with the latest dependencies and security
fixes.
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:28:55 -08:00
SigireddyBalasai
5276616ba1
Added support to use Pathlib (#93)
* Add support for Path objects in MarkItDown conversion methods
* Remove unnecessary blank line in test_markitdown_exiftool function
* Remove unnecessary blank line in test_markitdown_exiftool function
* remove pathlib path in test file
---------
Co-authored-by: afourney <adamfo@microsoft.com>
Co-authored-by: gagb <gagb@users.noreply.github.com>
2024-12-20 14:12:48 -08:00