mirror of https://github.com/docling-project/docling.git
synced 2025-06-27 05:20:05 +00:00
fix(integration): update the Apify Actor integration (#1619)

* fix(actor): remove references to missing docling_processor.py

Signed-off-by: Václav Vančura <commit@vancura.dev>

* chore(actor): update Actor README.md with recent repo URL changes

Signed-off-by: Václav Vančura <commit@vancura.dev>

* chore(actor): improve the Actor README.md local header link

Signed-off-by: Václav Vančura <commit@vancura.dev>

* chore(actor): bump the Actor version number

Signed-off-by: Václav Vančura <commit@vancura.dev>

* Update .actor/actor.json

Co-authored-by: Marek Trunkát <marek@trunkat.eu>
Signed-off-by: Jan Čurn <jan.curn@gmail.com>

---------

Signed-off-by: Václav Vančura <commit@vancura.dev>
Signed-off-by: Jan Čurn <jan.curn@gmail.com>
Co-authored-by: Jan Čurn <jan.curn@gmail.com>
Co-authored-by: Marek Trunkát <marek@trunkat.eu>
parent 84d0889829
commit 14d4f5b109
@@ -64,7 +64,6 @@ ENV EASYOCR_MODULE_PATH=/tmp/easyocr-models
 COPY --chown=1000:1000 .actor/actor.sh .actor/actor.sh
 COPY --chown=1000:1000 .actor/actor.json .actor/actor.json
 COPY --chown=1000:1000 .actor/input_schema.json .actor/input_schema.json
-COPY --chown=1000:1000 .actor/docling_processor.py .actor/docling_processor.py
 RUN chmod +x .actor/actor.sh

 # Copy the build files from builder

@@ -2,7 +2,7 @@

 [](https://apify.com/vancura/docling)

-This Actor (specification v1) wraps the [Docling project](https://ds4sd.github.io/docling/) to provide serverless document processing in the cloud. It can process complex documents (PDF, DOCX, images) and convert them into structured formats (Markdown, JSON, HTML, Text, or DocTags) with optional OCR support.
+This Actor (specification v1) wraps the [Docling project](https://github.com/docling-project/docling) to provide serverless document processing in the cloud. It can process complex documents (PDF, DOCX, images) and convert them into structured formats (Markdown, JSON, HTML, Text, or DocTags) with optional OCR support.

 ## What are Actors?

@@ -14,7 +14,7 @@ This Actor (specification v1) wraps the [Docling project](https://ds4sd.github.i
 2. [Usage](#usage)
 3. [Input Parameters](#input-parameters)
 4. [Output](#output)
-5. [Performance & Resources](#performance--resources)
+5. [Performance and Resources](#performance-and-resources)
 6. [Troubleshooting](#troubleshooting)
 7. [Local Development](#local-development)
 8. [Architecture](#architecture)
@@ -190,7 +190,7 @@ Access logs via:
 apify key-value-stores get-record DOCLING_LOG
 ```

-## Performance & Resources
+## Performance and Resources

 - **Docker Image Size**: ~4GB
 - **Memory Requirements**:

@@ -1,10 +1,10 @@
 {
     "actorSpecification": 1,
     "name": "docling",
-    "version": "0.0",
+    "version": "1.0",
     "environmentVariables": {},
     "dockerFile": "./Dockerfile",
-    "input": "./input_schema.json",
+    "inputSchema": "./input_schema.json",
     "scripts": {
         "run": "./actor.sh"
     }

@@ -154,17 +154,6 @@ else
     echo "Warning: No build files directory found. Some tools may be unavailable."
 fi

-# Copy Python processor script to tools directory
-PYTHON_SCRIPT_PATH="$(dirname "$0")/docling_processor.py"
-if [ -f "$PYTHON_SCRIPT_PATH" ]; then
-    echo "Copying Python processor script to tools directory..."
-    cp "$PYTHON_SCRIPT_PATH" "$TOOLS_DIR/"
-    chmod +x "$TOOLS_DIR/docling_processor.py"
-else
-    echo "ERROR: Python processor script not found at $PYTHON_SCRIPT_PATH"
-    exit 1
-fi
-
 # Check OCR directories and ensure they're writable
 echo "Checking OCR directory permissions..."
 OCR_DIR="/opt/app-root/src/.EasyOCR"