176 Commits

Author SHA1 Message Date
Le Minh Duc
4d5f9ba39c
ci: add commitlint (#170) v0.4.3 2024-09-01 23:10:03 +07:00
kan_cin
041d229282
feat: add test connection feature (#166)
* feat: add test connection feature

* fix: typo
v0.4.2
2024-09-01 08:22:36 +07:00
Tadashi
c1e8c37e5e
fix: update packaging script (bump:patch) v0.4.1 2024-08-31 07:07:28 +07:00
Tadashi
7daa9eb149
docs: update demo URL (bump:minor) v0.4.0 2024-08-30 23:46:18 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
09f8f91510
docs: update README (#157) 2024-08-30 23:29:31 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
9354ad8241
fix: update default settings and local model guide (#156) 2024-08-30 23:18:31 +07:00
Quang (Albert)
4b2b334d2c
fix: refine kotaemon/pyproject.toml (#153) 2024-08-30 23:02:14 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
d880294153
fix: pwd change in settings (#147) 2024-08-29 13:41:12 +07:00
ian
971ffcc9d0
add github star history (#137) 2024-08-28 17:19:20 +07:00
Quang (Albert)
fcefb80fa6
feat: Add contribution templates (#none) (#139)
* feat: Add PR template

* feat: Add issue templates

* style: Comfort pre-commit

* style: Comfort pre-commit
2024-08-28 17:18:50 +07:00
John Freier
1cdefe7ba3
Update mkdocs.yml (#129)
Documentation Navigation URL Fix
v0.3.6
2024-08-28 06:37:17 +07:00
ian
5946fd33de
change default bump to patch, don't create release if there is no bump (#126) v0.3.5 2024-08-28 06:30:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
bb56ef4f8e
chore: update workflow (#124) 2024-08-26 09:52:16 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
2570e11501
feat: merge develop (#123)
* Support hybrid vector retrieval

* Enable figures and table reading in Azure DI

* Retrieve with multi-modal

* Fix mixing up table

* Add txt loader

* Add Anthropic Chat

* Raising error when retrieving help file

* Allow same filename for different people if private is True

* Allow declaring extra LLM vendors

* Show chunks on the File page

* Allow elasticsearch to get more docs

* Fix Cohere response (#86)

* Fix Cohere response

* Remove Adobe pdfservice from dependency

kotaemon no longer relies on pdfservice for its core functionality,
and pdfservice uses very outdated dependencies that cause conflicts.

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* Add confidence score (#87)

* Save question answering data as a log file

* Save the original information besides the rewritten info

* Export Cohere relevance score as confidence score

* Fix style check

* Upgrade the confidence score appearance (#90)

* Highlight the relevance score

* Round relevance score. Get key from config instead of env

* Cohere return all scores

* Display relevance score for image

* Remove columns and rows in Excel loader which contain all NaN (#91)

* remove columns and rows which contain all NaN

* back to multiple joiner options

* Fix style

---------

Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: trducng <trungduc1992@gmail.com>

* Track retriever state

* Bump llama-index version 0.10

* feat/save-azuredi-mhtml-to-markdown (#93)

* feat/save-azuredi-mhtml-to-markdown

* fix: replace os.path to pathlib change theflow.settings

* refactor: base on pre-commit

* chore: move the func of saving content markdown above removed_spans

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: losing first chunk (#94)

* fix: losing first chunk.

* fix: update the method of preventing losing chunks

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: adding the base64 image in markdown (#95)

* feat: more chunk info on UI

* fix: error when reindexing files

* refactor: allow more information exception trace when using gpt4v

* feat: add excel reader that treats each worksheet as a document

* Persist loader information when indexing file

* feat: allow hiding unneeded setting panels

* feat: allow specific timezone when creating conversation

* feat: add more confidence score (#96)

* Allow a list of rerankers

* Export llm reranking score instead of filter with boolean

* Get logprobs from LLMs

* Rename cohere reranking score

* Call 2 rerankers at once

* Run QA pipeline for each chunk to get qa_score

* Display more relevance scores

* Define another LLMScoring instead of editing the original one

* Export logprobs instead of probs

* Call LLMScoring

* Get qa_score only in the final answer

* feat: replace text length with token in file list

* ui: show index name instead of id in the settings

* feat(ai): restrict the vision temperature

* fix(ui): remove the misleading message about non-retrieved evidences

* feat(ui): show the reasoning name and description in the reasoning setting page

* feat(ui): show version on the main window

* feat(ui): show default llm name in the setting page

* fix(conf): append the result of doc in llm_scoring (#97)

* fix: constraint maximum number of images

* feat(ui): allow filter file by name in file list page

* Fix exceeding token length error for OpenAI embeddings by chunking then averaging (#99)

* Average embeddings in case the text exceeds max size

* Add docstring

* fix: Allow empty string when calling embedding

* fix: update trulens LLM ranking score for retrieval confidence, improve citation (#98)

* Round when displaying not by default

* Add LLMTrulens reranking model

* Use llmtrulensscoring in pipeline

* fix: update UI display for trulens score

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* feat: add question decomposition & few-shot rewrite pipeline (#89)

* Create few-shot query-rewriting. Run and display the result in info_panel

* Fix style check

* Put the functions to separate modules

* Add zero-shot question decomposition

* Fix fewshot rewriting

* Add default few-shot examples

* Fix decompose question

* Fix importing rewriting pipelines

* fix: update decompose logic in fullQA pipeline

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add encoding utf-8 when saving temporary markdown in vectorIndex (#101)

* fix: improve retrieval pipeline and relevant score display (#102)

* fix: improve retrieval pipeline by extending first round top_k with multiplier

* fix: minor fix

* feat: improve UI default settings and add quick switch option for pipeline

* fix: improve agent logics (#103)

* fix: improve agent progress display

* fix: update retrieval logic

* fix: UI display

* fix: less verbose debug log

* feat: add warning message for low confidence

* fix: LLM scoring enabled by default

* fix: minor update logics

* fix: hotfix image citation

* feat: update docx loader to handle merged table cells + handle zip file upload (#104)

* feat: update docx loader to handle merged table cells

* feat: handle zip file

* refactor: pre-commit

* fix: escape text in download UI

* feat: optimize vector store query db (#105)

* feat: optimize vector store query db

* feat: add file_id to chroma metadatas

* feat: remove unnecessary logs and update migrate script

* feat: iterate through file index

* fix: remove unused code

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add openai embedding exponential back-off

* fix: update import download_loader

* refactor: codespell

* fix: update some default settings

* fix: update installation instruction

* fix: default chunk length in simple QA

* feat: add share conversation feature and enable retrieval history (#108)

* feat: add share conversation feature and enable retrieval history

* fix: update share conversation UI

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: allow exponential backoff for failed OCR call (#109)

* fix: update default prompt when no retrieval is used

* fix: create embedding for long image chunks

* fix: add exception handling for additional table retriever

* fix: clean conversation & file selection UI

* fix: elastic search with empty doc_ids

* feat: add thumbnail PDF reader for quick multimodal QA

* feat: add thumbnail handling logic in indexing

* fix: UI text update

* fix: PDF thumb loader page number logic

* feat: add quick indexing pipeline and update UI

* feat: add conv name suggestion

* fix: minor UI change

* feat: citation in thread

* fix: add conv name suggestion in regen

* chore: add assets for usage doc

* chore: update usage doc

* feat: pdf viewer (#110)

* feat: update pdfviewer

* feat: update missing files

* fix: update rendering logic of info panel

* fix: improve thumbnail retrieval logic

* fix: update PDF evidence rendering logic

* fix: remove pdfjs built dist

* fix: reduce thumbnail evidence count

* chore: update gitignore

* fix: add js event on chat msg select

* fix: update css for viewer

* fix: add env var for PDFJS prebuilt

* fix: move language setting to reasoning utils

---------

Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: trducng <trungduc1992@gmail.com>

* feat: graph rag (#116)

* fix: reload server when add/delete index

* fix: rework indexing pipeline to be able to disable vectorstore and splitter if needed

* feat: add graphRAG index with plot view

* fix: update requirement for graphRAG and lighten unnecessary packages

* feat: add knowledge network index (#118)

* feat: add Knowledge Network index

* fix: update reader mode setting for knet

* fix: update init knet

* fix: update collection name to index pipeline

* fix: missing req

---------

Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>

* fix: update info panel return for graphrag

* fix: retriever setting graphrag

* feat: local llm settings (#122)

* feat: expose context length as reasoning setting to better fit local models

* fix: update context length setting for agents

* fix: rework threadpool llm call

* fix: fix improve indexing logic

* fix: fix improve UI

* feat: add lancedb

* fix: improve lancedb logic

* feat: add lancedb vectorstore

* fix: lighten requirement

* fix: improve lanceDB vs

* fix: improve UI

* fix: openai retry

* fix: update reqs

* fix: update launch command

* feat: update Dockerfile

* feat: add plot history

* fix: update default config

* fix: remove verbose print

* fix: update default setting

* fix: update gradio plot return

* fix: default gradio tmp

* fix: improve lancedb docstore

* fix: fix question decompose pipeline

* feat: add multimodal reader in UI

* fix: update docs

* fix: update default settings & docker build

* fix: update app startup

* chore: update documentation

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
Co-authored-by: cin-ace <ace@cinnamon.is>
Co-authored-by: Linh Nguyen <70562198+linhnguyen-cinnamon@users.noreply.github.com>
Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: cin-jacky <101088014+jacky0218@users.noreply.github.com>
Co-authored-by: jacky0218 <jacky0218@github.com>
Co-authored-by: kan_cin <kan@cinnamon.is>
Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>
2024-08-26 08:50:37 +07:00
ian
86d60e1649
Update docs (#88)
Co-authored-by: ian <ian@cinnamon.is>
2024-05-31 17:49:02 +07:00
trducng
ebf1315569
(pump:minor) Allow the indexing pipeline to report the indexing progress onto the UI (#81)
* Turn the file indexing event to generator to report progress

* Fix React text's trimming function

* Refactor delete file into a method
2024-05-25 22:09:41 +07:00
trducng
56dfc8fb53
Allow the application name to be configurable in settings (#80)
* Make app name configurable

* Use app name in browser tab
2024-05-20 22:37:24 +07:00
trducng (john)
04e602161b
Fix Yaml datetime format (#79) 2024-05-20 17:36:14 +07:00
trducng (john)
5ca3c25404
Avoid empty chat message (#78) 2024-05-20 16:20:50 +07:00
ian_Cin
b2296cfcdf
(bump:patch) Feat: Show app version in the Help page (#68)
* typo

* show version in the Help page

* update docs

* bump duckduckgo-search

* allow app version to be set by env var
v0.3.4
2024-05-16 14:27:51 +07:00
ian_Cin
bd34facddc
(bump:patch) force push with set upstream (#67) v0.3.3 2024-05-15 18:15:03 +07:00
ian_Cin
a122dc0a94
(bump:patch) Fix: llama-cpp-python security bug and setup local latest branch in github action (#66)
* update llama-cpp-python version in response to https://github.com/Cinnamon/kotaemon/security/dependabot/1

* setup local latest branch in github action
v0.3.2
2024-05-15 17:57:37 +07:00
ian_Cin
fc35f9f918
(bump:patch) push latest branch after update pointer (#64)
* push latest branch after update pointer

* force push update latest pointer

* update docs
v0.3.1
2024-05-15 17:22:41 +07:00
ian_Cin
654501e01c
(bump:minor) Feat: Add mechanism for user-site update and auto creating releases (#56)
* move flowsettings.py and launch.py to root

* update docs

* sync sub package versions

* rename launch.py to app.py and make run scripts work with installation package

* add update scripts

* auto version for root package

* rename authors and update doc dir

* Update auto-bump-and-release.yaml to trigger on push to main branch

* latest as branch instead of tag

* pin deps versions

* cache the changelogs
v0.3.0
2024-05-15 16:34:50 +07:00
ian_Cin
eb198e0ff3
fix bug in delete file, remove file delete confirmation (#59) 2024-05-09 16:21:56 +07:00
Albert
466adf2d94
Feat/Add ReAct and ReWOO Reasoning Pipelines (#43)
* Add ReactAgentPipeline by wrapping the ReactAgent

* Implement stream processing for ReactAgentPipeline and RewooAgentPipeline

* Fix highlight_citation in Rewoo and remove highlight_citation from React

* Fix importing ktem.llms inside kotaemon

* fix: Change Rewoo::solver's output to LLMInterface instead of plain text

* Add more user_settings to the RewooAgentPipeline

* Fix LLMTool

* Add more user_settings to the ReactAgentPipeline

* Minor fix

* Stream the react agent immediately

* Yield the Rewoo progress to info panel

* Hide the agent in flowsettings

* Remove redundant comments

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2024-05-09 16:06:24 +07:00
Duc Nguyen (john)
ec11b54ff2
Add Azure AI Document Intelligence loader (#52)
* Add azureai document intelligence loader

* Add load_data interface to Azure DI

* Bump version

* Access azure credentials from environment variables
2024-04-29 14:49:55 +07:00
ian_Cin
bbe862fe47
Update docs (#49) 2024-04-25 17:33:19 +07:00
Duc Nguyen (john)
a8725710af
Allow users to select reasoning pipeline. Fix small issues with user UI, cohere name (#50)
* Fix user page

* Allow changing LLM in reasoning pipeline

* Fix CohereEmbedding name
2024-04-25 17:18:12 +07:00
Duc Nguyen (john)
e29bec6275
Allow file index to be private (#45)
* Fix breaking reranker

* Allow private file index

* Avoid setting default to 1 when user management is enabled
2024-04-25 14:24:35 +07:00
Duc Nguyen (john)
456f020caf
Enable MHTML reader (#44)
* Enable mhtml loader

* Use default supported file types

* Add tests and bump version
2024-04-23 14:16:24 +07:00
Duc Nguyen (john)
fbe983ccb3
Add relevant chat context when query the index (#42)
* Add context for query

* Add older messages in the chat

* Update the indexing

* Make some hard-code values configurable

* Remove hard-code values
2024-04-22 14:32:30 +07:00
Duc Nguyen (john)
749c9e5641
Remove redundant attributes in the file index (#41) 2024-04-20 18:21:32 +07:00
Duc Nguyen (john)
c6045bcb9f
Update the Citation pipeline according to new OpenAI function call interface (#40) 2024-04-20 01:12:23 +07:00
Duc Nguyen (john)
1b2082a140
Allow file selector to be disabled (#36)
* Allow file selector to be disabled

* Update docs and variable names
2024-04-16 18:43:56 +07:00
ian_Cin
e19893a509
fix typo (#35) 2024-04-15 23:16:32 +07:00
ian_Cin
1130aa78d1
add demo gif (#34) 2024-04-15 22:57:02 +07:00
ian_Cin
5286ff48bc
Fix info panel overflow (#33)
* update chatbot placeholder

* fix chat info panel overflow bug

* set azure_endpoint to required in AzureChatOpenAI

* update screenshots
2024-04-14 09:34:14 +07:00
ian_Cin
8985963e1e
Setup app data dir (#32)
* setup local data dir

* update readme

* update chat panel

* update help page
2024-04-13 23:26:06 +07:00
Duc Nguyen (john)
0417610d3e
Refactor reasoning pipeline (#31)
* Move the text rendering out for reusability

* Refactor common operations in the reasoning pipeline

* Add run method

* Provide dedicated method for invoke
2024-04-13 23:13:04 +07:00
ian_Cin
af38708b77
Setup root toml file and stop gradio auto reloading (#30)
* stop gradio auto reload

* setup root toml file
2024-04-13 18:59:24 +07:00
ian_Cin
4022af7e9b
allow LlamaCppChat to auto download model from hf hub (#29) 2024-04-13 18:57:04 +07:00
Duc Nguyen (john)
917fb0a082
Treat index id as auto-generated field (#27)
* Treat index id as auto-generated field

* fix Can't create index: KeyError: 'embedding' #28

* update docs

* Update requirement

* Use lighter default local embedding model

---------

Co-authored-by: ian <ian@cinnamon.is>
2024-04-13 18:29:37 +07:00
Duc Nguyen (john)
66905d39c4
Allow adding, updating and deleting indices (#24)
* Allow adding indices

* Allow deleting indices

* Allow updating the indices

* When there are multiple indices, group them below Indices tab

* Update elem classes
2024-04-12 15:41:09 +07:00
ian_Cin
4efe9c02a8
Update documentations (#23) 2024-04-11 19:41:45 +07:00
Duc Nguyen (john)
5ce6bac03d
Allow listing indices (#22) 2024-04-11 16:28:04 +07:00
Duc Nguyen (john)
3ed50b0f10
Improve LLMs and Embedding models resources experience (#21)
* Fix inconsistent default values
* Disallow LLM's empty name. Handle LLM creation error on UI
2024-04-11 07:50:53 +07:00
Duc Nguyen (john)
f3e82b2e70
Put the preparation step in FileIndex to on_start (#20) 2024-04-10 19:30:45 +07:00
ian_Cin
b507eef541
Improve manuals (#19)
* Rename Admin -> Resources
* Improve ui
* Update docs
2024-04-10 17:04:04 +07:00
Duc Nguyen (john)
7b3307e3c4
Provide embedding manager (#16)
* Provide the Embedding management UI

* Update Fastembed documentation

* Add validation when adding / updating embeddings

* Stop using the old ktem embeddings manager

* Set default local embedding models

* Move the local embeddings below in flowsettings

* Update flowsettings
2024-04-10 15:11:44 +07:00