* Feat/headless browser (retargeted) (#1832)
* Add headless browser to the WebSurferAgent, closes #1481
* replace soup.get_text() with markdownify.MarkdownConverter().convert_soup(soup)
* import HeadlessChromeBrowser
* implicitly wait for 10s
* increase max. wait time to 99s
* fix: trim trailing whitespace
* test: fix headless tests
* better bing query search
* docs: add example 3 for headless option
---------
Co-authored-by: Vijay Ramesh <vijay@regrello.com>
* Handle missing Selenium package.
* Added browser_chat.py example to simplify testing.
* Based browser on mdconvert. (#1847)
* Based browser on mdconvert.
* Updated web_surfer.
* Renamed HeadlessChromeBrowser to SeleniumChromeBrowser
* Added an initial POC with Playwright.
* Separated Bing search into its own utility module.
* Simple browser now uses Bing tools.
* Updated Playwright browser to inherit from SimpleTextBrowser
* Got Selenium working too.
* Renamed classes and files for consistency.
* Added more instructions.
* Initial work to support other search providers.
* Added some basic behavior when the BING_API_KEY is missing.
* Cleaned up some search results.
* Moved to using the requests.Session object. Moved Bing SERP parsing to mdconvert to be more broadly useful.
* Added backward compatibility to WebSurferAgent
* Selenium and Playwright now grab the whole DOM, not just the body, allowing the converters access to metadata.
* Fixed printing of page titles in Playwright.
* Moved installation of WebSurfer dependencies to contrib-tests.yml
* Fixing pre-commit issues.
* Reverting conversable_agent, which should not have been changed in prior commit.
* Added RequestMarkdownBrowser tests.
* Fixed a bug with Bing search, and added search test cases.
* Added tests for Bing search.
* Added tests for md_convert
* Added test files.
* Added missing pptx.
* Added more tests for WebSurfer coverage.
* Fixed guard on requests_markdown_browser test.
* Updated test coverage for mdconvert.
* Fix browser_utils tests.
* Removed image test from browser, since exiftool isn't installed on test machine.
* Disable Selenium GPU and sandbox to ensure it runs headless in Docker.
* Added option for Bing API results to be interleaved (as Bing specifies), or presented in a categorized list (Web, News, Videos, etc.).
* Print more details when requests exceptions are thrown.
* Added additional documentation to markdown_search
* Added documentation to the selenium_markdown_browser.
* Added documentation to playwright_markdown_browser.py
* Added documentation to requests_markdown_browser
* Added documentation to mdconvert.py
* Updated agentchat_surfer notebook.
* Update .github/workflows/contrib-tests.yml
Co-authored-by: Davor Runje <davor@airt.ai>
* Merge main. Resolve conflicts.
* Resolve pre-commit checks.
* Removed offending LFS file.
* Re-added offending LFS file.
* Fixed browser_utils tests.
* Fixed style errors.
---------
Co-authored-by: Asapanna Rakesh <45640029+INF800@users.noreply.github.com>
Co-authored-by: Vijay Ramesh <vijay@regrello.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
Co-authored-by: Davor Runje <davor@airt.ai>