markitdown

mirror of https://github.com/microsoft/markitdown.git synced 2025-06-26 22:00:21 +00:00

Author	SHA1	Message	Date
onefloid	da7bcea527	docs: rephrase sentence (#1278 )	2025-06-03 21:09:25 -07:00
afourney	3bfb821c09	Have the MarkItDown MCP server read MARKITDOWN_ENABLE_PLUGINS from ENV (#1273 ) * Have the MarkItDown MCP server read MARKITDOWN_ENABLE_PLUGINS from os.environ * Update the Dockerfile to enable plugins. No plugins are installed by default.	2025-06-03 09:35:33 -07:00
Tomasz Kalinowski	62b72284fe	pin onnxruntime on Windows (#1274 ) closes #1266	2025-05-28 13:13:51 -07:00
afourney	1dd3c83339	Promoting 0.1.2a1 to 0.1.2 (#1272 ) v0.1.2	2025-05-28 10:04:42 -07:00
afourney	9dc982a3b1	Small changes to favor streamable HTTP over deprecated SSE (#1264 )	2025-05-23 11:39:41 -07:00
afourney	effde4767b	Preparing a pre-release of 0.1.2 (#1260 ) v0.1.2a1	2025-05-21 15:24:56 -07:00
rtpacks	04bf831209	docs: fix typos (#1201 )	2025-05-21 15:12:22 -07:00
Betula-L	9fd680c366	support streamable http mcp (#1245 ) Co-authored-by: luhualin	2025-05-21 14:34:50 -07:00
一I	38261fd31c	Update Python version requirement and add .cursorrules to .gitignore (#1249 ) * update markdown * Update and install Python version suggestions * Update README with prerequisites. --------- Co-authored-by: Lucas Liu <lucas@LucasdeMacBook-Pro.local> Co-authored-by: afourney <adamfo@microsoft.com>	2025-05-21 10:47:29 -07:00
Yi-Cheng Wang	131f0c7739	feat: add Document Intelligence API version selection via kwargs (#1253 ) Co-authored-by: Yi-Cheng Wang <yicheng.wang@heph-ai.com> Co-authored-by: afourney <adamfo@microsoft.com>	2025-05-21 10:22:08 -07:00
JoshClark-git	56f7579ce2	FIX YouTube transcript errors (#1241 ) * FIX YouTube transcript errors * Fixed formatting. --------- Co-authored-by: Josh <jca351@sfu.ca> Co-authored-by: afourney <adamfo@microsoft.com>	2025-05-21 10:17:57 -07:00
t3tra	cb421cf9ea	Chore: Make linter happy (#1256 ) * refactor: remove unused imports * fix: replace NotImplemented with NotImplementedError * refactor: resolve E722 (do not use bare 'except') * refactor: remove unused variable * refactor: remove unused imports * refactor: ignore unused imports that will be used in the future * refactor: resolve W293 (blank line contains whitespace) * refactor: resolve F541 (f-string is missing placeholders) --------- Co-authored-by: afourney <adamfo@microsoft.com>	2025-05-21 10:02:16 -07:00
kira-offgrid	39e7252940	fix: python.lang.security.use-defused-xml-parse.use-defused-xml-parse-packages-markitdown-src-markitdown-converter_utils-docx-math-omml.py (#1251 )	2025-05-21 09:57:21 -07:00
afourney	bbcf876b18	Switched from the stdlib minidom parser to defusedxml. (#1259 )	2025-05-21 09:47:14 -07:00
createcentury	041be54471	Update README.md (#1187 ) updated subtle misspelling.	2025-04-13 09:31:40 -07:00
lentil32	ebe2684b3d	chore: fix typo in README.md (#1175 ) * chore: fix typo in README.md	2025-04-13 09:29:16 -07:00
Turdıbek	8576f1d915	Add CSV to Markdown table conversion - fixes #1144 (#1176 ) * feat: Add CSV to Markdown table converter - Add new CsvConverter class to convert CSV files to Markdown tables - Support text/csv and application/csv MIME types - Preserve table structure with headers and data rows - Handle edge cases like empty cells and mismatched columns - Fix Azure Document Intelligence dependency handling - Register CsvConverter in MarkItDown class ---- Thanks also to @benny123tw who submitted a very similar PR in #1171	2025-04-13 09:19:00 -07:00
Sathindu	3fcd48cdfc	feat: render math equations in .docx documents (#1160 ) * feat: math equation rendering in .docx files * fix: import fix on .docx pre processing * test: add test cases for docx equation rendering * docs: add ThirdPartyNotices.md * refactor: reformatted with black	2025-03-28 15:36:38 -07:00
afourney	9e067c42b6	Make it easier to use AzureKeyCredentials with Azure Doc Intelligence (#1151 ) * Make it easier to use AzureKeyCredentials with Azure Doc Intelligence * Fixed mypy type error. * Added more fine-grained options over types. * Pass doc intel options further up the stack.	2025-03-26 10:44:11 -07:00
afourney	9a951055f0	Update readme to point to the mcp package. (#1158 ) * Updated readme with link to the MCP package.	2025-03-25 15:00:04 -07:00
afourney	73b9d57312	Update badges (#1157 ) * Update badges in subpackages.	2025-03-25 14:52:24 -07:00
afourney	3ca57986ef	Basic SSE MCP Server for MarkItDown (#1155 ) * Added an initial minimal MCP server for MarkItDown * Added STDIO default option. * Added a Dockerfile, and updated the README accordingly. Also added instructions for Claude Desktop * Pin mcp version.	2025-03-25 14:38:22 -07:00
afourney	c1f9a323ee	Bump version. (#1154 ) v0.1.1	2025-03-24 23:26:30 -07:00
afourney	e928b43afb	convert_url renamed to convert_uri, and now handles data and file URIs (#1153 )	2025-03-24 21:43:04 -07:00
afourney	2ffe6ea591	Bump version. (#1150 ) v0.1.0	2025-03-22 11:21:32 -07:00
afourney	efc55b260d	Bump version and resolve a console encoding error. (#1149 ) v0.1.0a6	2025-03-21 09:27:25 -07:00
Yuzhong Zhang	52432bd228	Add support for preserving base64 encoded images (#1140 ) * optional reserve base64 string in markdown _CustomMarkdownify and pptx * add other converter para support * fix linter * Use kwarg to pass keep_data_uri para. Add module cli vector tests * Fixed formatting, and adjusted tests.	2025-03-20 18:50:23 -07:00
afourney	c0a511ecff	Updated docx file to include an image. (#1146 )	2025-03-20 12:25:56 -07:00
afourney	cd6aa41361	Adjust warning filters and update dependencies (#1143 ) Adjusts warning filters to be more contextual Updates dependencies for magika and youtube-transcript-api Updates the version to 0.1.0a5 in __about__.py v0.1.0a5	2025-03-19 22:09:14 -07:00
afourney	716f74dcb9	Consider anything with a charset as plain text-convertible. (#1142 )	2025-03-19 20:46:35 -07:00
afourney	a93e0567e6	EPub Support. Adapted #123 to not use epublib. (#1131 ) * Adapted #123 to not use epublib. * Updated README.md v0.1.0a4	2025-03-17 07:48:15 -07:00
afourney	c5f70b904f	Have magika read from the stream. (#1136 )	2025-03-17 07:39:19 -07:00
afourney	53834fdd24	Investigate and silence warnings. (#1133 )	2025-03-15 23:41:35 -07:00
afourney	5c565b7d79	Fix remaining mypy errors. (#1132 )	2025-03-15 23:12:48 -07:00
afourney	a78857bd43	Added epub test file. (#1130 )	2025-03-15 18:34:51 -07:00
afourney	09df7fe8df	Small fixes for autogen integration. (#1124 )	2025-03-12 19:18:11 -07:00
Adam Fourney	6a9f09b153	Updated Magika dependency.	2025-03-12 16:15:33 -07:00
afourney	0b815fb916	Bumping version to 0.1.0a2 (#1123 )	2025-03-12 11:44:19 -07:00
Emanuele Meazzo	12620f1545	Handle not supported plot type in pptx (#1122 ) * Handle not supported plot type in pptx * Fixed formatting.	2025-03-12 11:26:23 -07:00
afourney	5f75e16d20	Refactored tests. (#1120 ) * Refactored tests. * Fixed CI errors, and included misc tests. * Omit mskanji from streaminfo test. * Omit mskanji from no hints test. * Log results of debugging in comments (linked to Magika issue) * Added docs as to when to use misc tests.	2025-03-12 11:08:06 -07:00
yushihang	75140a90e2	fix: correct f-string formatting in FileConversionException (#1121 )	2025-03-12 10:15:09 -07:00
afourney	af1be36e0c	Added CLI options for extension, mimetypes, and charset. (#1115 )	2025-03-11 13:16:33 -07:00
Adam Fourney	2a2ccc86aa	Added mimetypes to _rss_converter	2025-03-10 16:17:41 -07:00
Adam Fourney	2e51ba22e7	Enhance type guessing.	2025-03-10 16:05:41 -07:00
afourney	8f8e58c9bb	Minimize guesses when guesses are compatible. (#1114 ) * Minimize guesses when guesses are compatible.	2025-03-10 15:30:44 -07:00
afourney	8e73a325c6	Switch from puremagic to magika. (#1108 )	2025-03-10 12:49:52 -07:00
Mohit Agarwal	2405f201af	fix typo in well-known path list (#1109 )	2025-03-08 19:32:44 -08:00
afourney	99d8e562db	Fix exiftool in well-known paths. (#1106 )	2025-03-07 21:47:20 -08:00
Sebastian Yaghoubi	515fa854bf	feat(docker): improve dockerfile build (#220 ) * refactor(docker): remove unnecessary root user The USER root directive isn't needed directly after FROM Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * fix(docker): use generic nobody nogroup default instead of uid gid Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * fix(docker): build app from source locally instead of installing package Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * fix(docker): use correct files in dockerignore Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * chore(docker): dont install recommended packages with git Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * fix(docker): run apt as non-interactive Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> * Update Dockerfile to new package structure, and fix streaming bugs. --------- Signed-off-by: Sebastian Yaghoubi <sebastianyaghoubi@gmail.com> Co-authored-by: afourney <adamfo@microsoft.com>	2025-03-07 20:07:40 -08:00
Richard Ye	0229ff6cb7	feat: sort pptx shapes to be parsed in top-to-bottom, left-to-right order (#1104 ) * Sort PPTX shapes to be read in top-to-bottom, left-to-right order Referenced from `39bef65b31/pptx2md/parser.py (L249)` * Update README.md * Fixed formatting. * Added missing import	2025-03-07 15:45:14 -08:00

279 Commits