When the v2 fsspec connectors generate a relative path, they may
produce a path with a leading slash (this happens in the case of the
Box connector, which is a subclass of fsspec). Such a path is
unintentionally treated as an absolute path. As a result, the ingest
pipeline attempts to write files to directories at the root level,
which in turn raises permission errors.
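
To illustrate the failure mode, here is a minimal sketch using only the standard library. The `to_relative` helper and the `/tmp/output` directory are hypothetical names chosen for this example, not the connector's actual code; the point is only that joining a base directory with a path that starts with `/` discards the base.

```python
from pathlib import PurePosixPath


def to_relative(path: str) -> str:
    """Hypothetical helper: drop leading slashes so the path stays relative."""
    return path.lstrip("/")


# Assumed output directory, for illustration only.
output_dir = PurePosixPath("/tmp/output")

# A leading slash makes the right-hand side absolute, so the base is discarded.
broken = output_dir / "/folder/file.txt"

# Stripping the slash first keeps the result under the output directory.
fixed = output_dir / to_relative("/folder/file.txt")

print(broken)  # /folder/file.txt
print(fixed)   # /tmp/output/folder/file.txt
```

`os.path.join` behaves the same way on POSIX systems, which is why an unstripped leading slash ends up targeting the filesystem root.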
Note: the Box expected results needed to be updated now that the test
is no longer failing.
Aside: our tests were unintentionally skipping the `box.sh` tests. We
intended to skip only `dropbox.sh`, but a regex is used to check whether
a given test is in the skip list, so `box.sh` matched as well. This PR
forces an exact match.
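
The skip logic lives in the test scripts, not in Python, so the following is only an illustrative sketch of the same pitfall. The function names and the skip list are assumptions made for this example: an unanchored regex search for `box.sh` succeeds inside `dropbox.sh`, while an exact comparison does not.

```python
import re

# Hypothetical skip list containing only the test we meant to skip.
SKIPPED_TESTS = ["dropbox.sh"]


def is_skipped_regex(test_name: str, skipped: list) -> bool:
    """Unanchored regex search: the test name may match inside a longer name."""
    return any(re.search(test_name, entry) for entry in skipped)


def is_skipped_exact(test_name: str, skipped: list) -> bool:
    """Exact comparison: only identical names count as skipped."""
    return test_name in skipped


print(is_skipped_regex("box.sh", SKIPPED_TESTS))      # True (unintended)
print(is_skipped_exact("box.sh", SKIPPED_TESTS))      # False
print(is_skipped_exact("dropbox.sh", SKIPPED_TESTS))  # True
```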
## Changes
* Strip leading slashes during the creation of relative paths in fsspec
connectors
* Add expected results for Box connector
* (bonus): `make tidy` altered an unrelated file by removing an
unnecessary call of `pass`
* (bonus): check exact match for skipped ingest tests which fixes Box
tests getting skipped
## Testing
[Tests](https://github.com/Unstructured-IO/unstructured/actions/runs/9461928289/job/26093475612#step:7:2085)
for the Box connector were failing and had been accidentally skipped
(see changes above). They are no longer skipped and now pass.
Part one of the issue described here:
https://github.com/Unstructured-IO/unstructured/issues/2461
It does not change how the hashing algorithm works; it only reworks how
IDs are assigned:
> Element ID Design Principles
>
> 1. A partitioning function can assign only one of two available ID
types to a returned element: a hash or UUID.
> 2. All elements that are returned come with an ID, which is never
None.
> 3. No matter which type of ID is used, it will always be in string
format.
> 4. Partitioning a document returns elements with hashes as their
default IDs.
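
The principles above can be sketched roughly as follows. This is not the library's implementation: the function name, the `unique_element_ids` flag, the choice of SHA-256 over the element text, and the 32-character truncation are all assumptions made for illustration. It only demonstrates the stated properties: the ID is either a hash or a UUID, is never `None`, is always a string, and defaults to a hash.

```python
import hashlib
import uuid


def assign_element_id(text: str, unique_element_ids: bool = False) -> str:
    """Hypothetical helper: return a string ID that is a hash by default, or a UUID."""
    if unique_element_ids:
        # UUIDs are converted to str so both ID types share one format.
        return str(uuid.uuid4())
    # Assumed hash input and length; the real algorithm may differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


hash_id = assign_element_id("Hello world")
uuid_id = assign_element_id("Hello world", unique_element_ids=True)
print(type(hash_id).__name__, type(uuid_id).__name__)  # str str
```

In this sketch the hash ID is deterministic for the same input, while the UUID differs on every call.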
Big thanks to @scanny for explaining the current design and suggesting
ways to do it right, especially with chunking.
Here's the next PR in line:
https://github.com/Unstructured-IO/unstructured/pull/2673
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: micmarty-deepsense <micmarty-deepsense@users.noreply.github.com>
Closes #1843
Ingest connector for HubSpot. Supports:
- Calls: Logs from calls related to contacts, companies and tickets
- Communications: Logs from SMS/WhatsApp related to contacts, companies
and tickets
- Notes: Notes related to CRM notes
- Products: CRM products
- Emails: Logs from emails sent to CRM objects.
- Tasks: CRM tasks
From each record, the `body`/`description` information is grabbed. When
a title property is available, it is written at the beginning of the
output file. The CLI receives three params:
- `api-token`: [Private
app](https://developers.hubspot.com/docs/api/private-apps) token.
- `object-types`: One or more of the supported objects listed above, as
a comma-separated list, e.g. `calls,products,tasks`
- `custom-properties`: Custom properties to grab information from. Must
be in the form
`<object_type>:<custom_property_id>,<object_type>:<custom_property_id>`
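
As a rough illustration of the `custom-properties` format, the sketch below parses such a string into a mapping from object type to property IDs. The `parse_custom_properties` helper and the property IDs in the example are hypothetical, and the connector's own parsing may differ.

```python
from typing import Dict, List


def parse_custom_properties(value: str) -> Dict[str, List[str]]:
    """Hypothetical parser for '<object_type>:<custom_property_id>,...' strings."""
    result: Dict[str, List[str]] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        object_type, sep, property_id = pair.partition(":")
        if not sep or not object_type or not property_id:
            raise ValueError(f"Expected <object_type>:<custom_property_id>, got {pair!r}")
        # Group property IDs under their object type, preserving input order.
        result.setdefault(object_type, []).append(property_id)
    return result


# Made-up property IDs, for illustration only.
parsed = parse_custom_properties("calls:prop_a,tasks:prop_b,calls:prop_c")
print(parsed)  # {'calls': ['prop_a', 'prop_c'], 'tasks': ['prop_b']}
```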
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: rvztz <rvztz@users.noreply.github.com>