100 Commits

Author SHA1 Message Date
Nathan Evans
ac504e31a0
Add stricter filtering and tests for cli data directory discovery (#910)
* Add stricter filtering and tests for cli data directory discovery

* Semver

* Ignore ruff on error type

* Format

* Fix for windows paths

* Fix for windows paths

* Uncomment blob tests

* Sort by timestamp name instead of modified date

* Format

* Add additional folder name test
2024-08-13 17:34:14 -06:00
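Note on "Sort by timestamp name instead of modified date" above: the sketch below illustrates the general idea of picking the newest run folder by parsing its name rather than trusting filesystem modification times. It is not the project's code. The `%Y%m%d-%H%M%S` folder-name format and the helper names `parse_run_name` and `latest_run` are assumptions made for illustration.

```python
from datetime import datetime

# Assumed folder-name format for illustration, e.g. "20240813-173414".
TIMESTAMP_FORMAT = "%Y%m%d-%H%M%S"


def parse_run_name(name):
    """Return the datetime encoded in a folder name, or None if it does not match."""
    try:
        return datetime.strptime(name, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_run(names):
    """Pick the newest folder by the timestamp in its name, skipping non-matching names."""
    dated = [(parse_run_name(n), n) for n in names]
    valid = [(ts, n) for ts, n in dated if ts is not None]
    if not valid:
        return None
    return max(valid)[1]
```

Sorting on the parsed name keeps the result stable when folders are copied or touched, which would change their modified date but not their name.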
Alonso Guevara
d68e323193
Disable fail fast on tests (#911) 2024-08-13 12:20:14 -06:00
Alonso Guevara
f9c1bdd748
Release v0.3.0 (#912) v0.3.0 2024-08-12 18:14:52 -06:00
Alonso Guevara
4b9f268604
Fix/query embedding (#909)
* fix strategy config in entity_extraction

* should not post token list to the embedding model

* fix embedding in local query

* add semversioner

* remove strategy

---------

Co-authored-by: KylinMountain <kose2livs@gmail.com>
2024-08-12 17:12:51 -06:00
benx13
3f31af80d2
typo summarize prompt (#907)
* typo in entity_summarization prompt

* typo in summarize prompt

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 16:03:08 -06:00
Andres Morales
5a7dbaa051
Fix sort_context max_tokens & max_tokens param in verb (#888)
* Fix sort_context max_tokens & max_tokens param in verb

* Fix sort_context for windows test

* add semversioner file

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 15:55:31 -06:00
Josh Bradley
238f1c2adc
Implement prompt tuning API (#855)
* initial setup commit

* cleanup API and CLI interfaces

* move datatype definition to types.py

* code cleanup

* add semversioner file

* remove unused import

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 15:09:00 -06:00
Josh Bradley
4bcbfd10eb
Implement query api (#839)
* initial API redesign

* typo fix

* update docstring

* update docstring

* remove artifacts caused by the merge from main

* minor typo updates

* add semversioner check

* switch API to async function calls

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-12 13:40:10 -06:00
Alonso Guevara
7fd23fa79c
Stabilize smoke tests for query community context building (#908)
* Stabilize smoke tests for query community context building

* Fix CODEOWNERS
2024-08-12 13:17:40 -06:00
Alonso Guevara
073f650ba9
Fix/json dumps ascii (#873)
* Ensure ascii false in json dumps, support for non ASCII chars

* Format

* Semver
2024-08-09 17:05:48 -06:00
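Note on "Ensure ascii false in json dumps" above: a minimal standard-library sketch of what the `ensure_ascii` flag of `json.dumps` changes. The sample record is made up for illustration and is not taken from the project.

```python
import json

# Made-up sample data containing non-ASCII characters.
record = {"title": "café", "author": "王俊"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object; the difference is only in how readable the serialized text is for non-English content.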
Alonso Guevara
7376f149d2
Release v0.2.2 (#872) v0.2.2 2024-08-08 16:48:47 -06:00
dependabot[bot]
85a5a61340
Bump tenacity from 8.5.0 to 9.0.0 (#823)
Bumps [tenacity](https://github.com/jd/tenacity) from 8.5.0 to 9.0.0.
- [Release notes](https://github.com/jd/tenacity/releases)
- [Commits](https://github.com/jd/tenacity/compare/8.5.0...9.0.0)

---
updated-dependencies:
- dependency-name: tenacity
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-08-08 16:39:15 -06:00
dependabot[bot]
c88dbb3575
Bump json-repair from 0.25.3 to 0.26.0 (#824)
Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.25.3 to 0.26.0.
- [Release notes](https://github.com/mangiucugna/json_repair/releases)
- [Commits](https://github.com/mangiucugna/json_repair/compare/0.25.3...0.26.0)

---
updated-dependencies:
- dependency-name: json-repair
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-08 15:27:13 -06:00
Alonso Guevara
c451aa0093
Update smoke tests (#861)
* Run smoke tests on 4o

* Shorten dulce for smoke tests

* Update secrets for consistency
2024-08-08 13:07:44 -06:00
Dayenne Souza
1e10bd342e
Re-enable smoke tests (#848)
* add smoke tests again

* add smoke tests separated action

* add patch version

* disable blob test

* blob conn again

* add file as cache type

* remove cache type entirely

* increase timeout

* remove comment

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-07 12:23:46 -06:00
Nathan Evans
c749fe2a15
Docs updates aug06 (#852)
* Remove outdated references to entity resolution

* Clarify covariate extraction

* Minor edits from other PR feedback

* Remove duplicate line

* Semver

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-06 16:31:47 -07:00
Ha Trinh
8a1221e0e4
Fix community context builder for local search (#850)
* add a check for empty context

* remove log and format code

* add changelog

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-06 16:08:45 -07:00
Alonso Guevara
53268406fe
Release v0.2.1 (#835) v0.2.1 2024-08-05 18:45:28 -06:00
Alonso Guevara
bd326d2614
Only repair broken responses (#834)
* Only repair broken responses

* Format
2024-08-05 18:25:08 -06:00
Ha Trinh
482246528d
fix json parsing logic and warning message (#833)
* fix json parsing logic and warning message

* amended warning message

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-08-05 16:31:36 -06:00
Alonso Guevara
7b656af50c
Fix embeddings loading on local search cli (#831)
* Fix embeddings loading on local search cli

* Update lockfile

* Update rules in ruff check
2024-08-05 16:00:31 -06:00
Alonso Guevara
487cb96376
Repair json when LLM returns faulty responses on non json mode (#801)
* fixed json issue

* change to use try_parse_json_object only

* pyproject add json-repair

* add check extra description before and after json object

* json.loads() before repair_json, based on jbradley1 suggestion.

* Fix json parsing and formatting

* semver

* Nicer tuple parsing

---------

Co-authored-by: paulg <paul.guo@iag.com.au>
2024-08-01 19:38:50 -06:00
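Note on the JSON repair commits above ("json.loads() before repair_json", "add check extra description before and after json object"): the sketch below shows the general parse-first, repair-only-on-failure pattern using the standard library alone. It is not the project's implementation. The function name `parse_json_object` is hypothetical, and the "repair" step here is a simple stand-in that only strips text around the outermost braces.

```python
import json


def parse_json_object(text):
    """Try strict parsing first; fall back to a trimmed parse only when that fails.

    Hypothetical helper for illustration, not the project's function.
    """
    try:
        # Well-formed responses take this path and are never modified.
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Stand-in repair: drop any description before and after the JSON object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
```

Trying the strict parse first means valid output is returned untouched, and the more lenient path runs only for responses that are actually broken.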
Alonso Guevara
9020df1770
Update prompt tune prompts (#794)
* Update prompts in prompt tune

* Update prompt tuning meta prompts

* Semver

* Formatting

* Update examples
2024-08-01 17:27:08 -06:00
Ha Trinh
7e1529ac19
fix community context builder (#783)
fix and refactor community context builder

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-30 20:14:40 -06:00
Gabriel Nieves-Ponce
d26491a622
Gnievesponce/query client vector store (#771)
* added default title_column and collection_name values for workflows using the vector store option

* incorporated vector database support to the query client

* Updated documentation to reflect the new query client param.

* Fixed ruff formatting

* added new poetry lock file

---------

Co-authored-by: Gabriel Nieves-Ponce <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-30 17:59:04 -06:00
Gabriel Nieves-Ponce
fc9f29dccd
added default title_column and collection_name values for workflows u… (#677)
* added default title_column and collection_name values for workflows using the vector store option

* update poetry lockfile

* fixed ruff formatting

* ran semversioner

---------

Co-authored-by: Gabriel Nieves-Ponce <gnievesponce@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
2024-07-30 17:46:58 -06:00
dependabot[bot]
da100c7acf
Bump poethepoet from 0.26.1 to 0.27.0 (#764)
Bumps [poethepoet](https://github.com/nat-n/poethepoet) from 0.26.1 to 0.27.0.
- [Release notes](https://github.com/nat-n/poethepoet/releases)
- [Commits](https://github.com/nat-n/poethepoet/compare/v0.26.1...v0.27.0)

---
updated-dependencies:
- dependency-name: poethepoet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-30 17:17:28 -06:00
dependabot[bot]
ddbe7e1d80
Bump lancedb from 0.10.2 to 0.11.0 (#763)
Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.10.2 to 0.11.0.
- [Release notes](https://github.com/lancedb/lancedb/releases)
- [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md)
- [Commits](https://github.com/lancedb/lancedb/compare/python-v0.10.2...python-v0.11.0)

---
updated-dependencies:
- dependency-name: lancedb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-30 16:53:22 -06:00
dependabot[bot]
a1506ad26e
Bump textual from 0.72.0 to 0.74.0 (#762)
Bumps [textual](https://github.com/Textualize/textual) from 0.72.0 to 0.74.0.
- [Release notes](https://github.com/Textualize/textual/releases)
- [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Textualize/textual/compare/v0.72.0...v0.74.0)

---
updated-dependencies:
- dependency-name: textual
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-30 16:26:15 -06:00
Ha Trinh
70bd2d9a55
Fix default entity extraction prompt (#781)
* fixed default entity extraction prompts

* minor changes and formatting

* add missing parenthesis and changelog

* Updating dictionary

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-30 15:41:10 -06:00
Nathan Evans
4c181a5390
Update issues-autoresolve.yml (#780)
Switch the logic to only look for awaiting_response label
2024-07-30 11:35:55 -06:00
dependabot[bot]
64fe754397
Bump pytest from 8.3.1 to 8.3.2 (#761)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.1 to 8.3.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.3.1...8.3.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-29 16:27:24 -06:00
Chris Trevino
56db78ae38
system -> assistant (#773)
* system -> assistant

* semver
2024-07-29 14:56:55 -07:00
dependabot[bot]
5eb58b6826
Bump openai from 1.37.0 to 1.37.1 (#760)
Bumps [openai](https://github.com/openai/openai-python) from 1.37.0 to 1.37.1.
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-python/compare/v1.37.0...v1.37.1)

---
updated-dependencies:
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-29 14:53:27 -06:00
dependabot[bot]
da5ed4baf4
Bump actions/stale from 5 to 9 (#759)
Bumps [actions/stale](https://github.com/actions/stale) from 5 to 9.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v5...v9)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-29 14:34:13 -06:00
Chris Trevino
9d99f323ea
Add encoding model to entity/claim extraction config sections (#740)
* Add encoding-model configuration to entity & claim extraction

* add change note

* pr updates

* test fix

* disable GH-based smoke tests
2024-07-26 15:05:08 -07:00
Chris Trevino
8565cd66c5
Update the ConfigReader to allow for empty chunk-by arrays (#742) 2024-07-26 14:38:44 -07:00
Chris Trevino
4c229afec8
add encoding model to text-chunking config (#743)
* add encoding model to text-chunking config

* revert groupby fix, handled in other pr

* revert environment reader update for other pr
2024-07-26 14:15:17 -07:00
Nathan Evans
971e7d91f5
Update issue templates for more explicit guidance (#738) 2024-07-26 14:52:14 -04:00
Chris Trevino
f5c9c2bee0
Add History input to cache-key, cache data (#736)
* Update caching llm to use history inputs

* formatting

* linting

* update glean sections to have continuous history
2024-07-26 09:26:37 -07:00
Chris Trevino
4e6589b614
fix config reader to allow for zero gleans (#735) 2024-07-26 09:11:34 -07:00
Chris Trevino
41451675ba
Add user input to history tracking (#734)
add user input to history tracking
2024-07-26 09:11:18 -07:00
Alonso Guevara
61b5eea347
Fix version numbering on publication (#701) 2024-07-24 22:02:52 -06:00
Alonso Guevara
c8aefb23cb
v0.2.0 (#700)
* v0.2.0

* Update 0.2.0.json

* Update CHANGELOG.md
v0.2.0
2024-07-24 23:02:03 -04:00
Alonso Guevara
ac6f240e29
Add autoresolve and update publish workflows (#698) 2024-07-24 19:54:11 -06:00
王俊
60520dc8c3
fix the llm organization parameter is ineffective during queries (#612)
* fix the organization parameter is ineffective during queries

* add semver impact document

---------

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
2024-07-24 15:52:01 -06:00
Alonso Guevara
2a95e6771c
Update issue templates (#675)
* Update issue templates

* Quick fix

* Quick fix

* Template fix

* Remove required
2024-07-24 14:01:44 -06:00
Kapil Sachdeva
e8df283b24
fix - in run_local_search(), avoid re-reading create_final_nodes.parquet file (#592) 2024-07-24 15:26:37 -04:00
Julian Whiting
a1138216f0
Fix/few shot selection (#530)
* try to always use at least 3 few shot examples

* add args for auto tune

* use context-based KNN to select most relevant chunks

* enforce at least 3 few shot examples for generated prompts

* utils for content-based KNN

* sem version

* fix callback arg

* fixes

* switch back to no op callbacks

* make n few shot user controlled, default to 2
2024-07-24 13:24:00 -06:00
Josh Bradley
2ddee65c29
Read/write files as binary utf-8 (#639) 2024-07-24 13:28:22 -04:00