📃Adding Sources
Sources are the data you use to train your bots. The amount of data you can use (measured in words) depends on your account plan.
Data Types
For now, you can use data coming from websites (URLs) or documents (pdf, doc, docx...).
For PDFs, even scanned files can be used thanks to Tesseract.
Tesseract OCR is an open-source optical character recognition engine that is widely used to convert images of text into editable and searchable documents.
Web sources (links)
The endpoint to add links as sources is: https://www.owlbot.ai/api/weblinks/learn
This endpoint only accepts POST requests carrying a valid authentication token, as explained in the dedicated section.
Here are the parameters you have to provide:
Remember that the method used to extract data from the links you provide is called scraping.
Under the general rules of internet etiquette and legal standards, you are only allowed to scrape data from websites that you own or for which you have explicit written permission from the owner. Unauthorized scraping of a website that you do not own may violate copyright law and the website's Terms of Service.
Web scraping may seem harmless, but it can have serious consequences. Unauthorized web scraping can lead to legal repercussions, including potential lawsuits for copyright infringement and privacy violations. Moreover, it can burden the web servers, leading to performance issues and even outages, which negatively impact all users.
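As an illustration of the weblinks endpoint described above, here is a minimal sketch of such a POST request. It is not an official client: the `Bearer` authorization scheme, the JSON request body, and the `urls` field used in the usage comment are assumptions made for the example; check the authentication section and the parameter list for the actual names.

```javascript
// Endpoint taken from this page.
const WEBLINKS_ENDPOINT = "https://www.owlbot.ai/api/weblinks/learn";

// Builds the fetch() options for a weblinks request.
// ASSUMPTION: the token is sent as a Bearer token and the parameters as JSON.
// Neither detail is confirmed by this page.
function buildWeblinksRequest(token, payload) {
  return {
    method: "POST", // the endpoint only accepts POST
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

// Sends the request and returns the raw Response so the caller can inspect it.
async function addWebLinks(token, payload) {
  const response = await fetch(
    WEBLINKS_ENDPOINT,
    buildWeblinksRequest(token, payload)
  );
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response;
}

// Usage (hypothetical field name "urls"; only submit sites you own
// or have written permission to scrape):
// addWebLinks(myToken, { urls: ["https://example.com/docs"] });
```

Keeping the request construction in its own function makes it possible to check the method, headers, and body without sending anything over the network.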
Adding File / Document
The endpoint to add files as sources is: https://www.owlbot.ai/api/upload
We support POSTing binary files directly to our storage space. You have to use FormData to do so.
Response
The response is an HTTP 200 with a JSON source object:
Where uuid is the unique identifier of the uploaded file.
Some examples:
In JavaScript / Fetch
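A minimal sketch of an upload with `fetch` and `FormData`, reading the `uuid` from the JSON source object in the response. The form field name `file` and the `Bearer` authorization scheme are assumptions made for the example and are not confirmed by this page.

```javascript
// Endpoint taken from this page.
const UPLOAD_ENDPOINT = "https://www.owlbot.ai/api/upload";

// Wraps the file contents in a FormData body.
// ASSUMPTION: the field name "file" is hypothetical.
function buildUploadForm(blob, filename) {
  const form = new FormData();
  form.append("file", blob, filename);
  return form;
}

// Uploads the file and returns the uuid of the created source.
async function uploadDocument(token, blob, filename) {
  const response = await fetch(UPLOAD_ENDPOINT, {
    method: "POST",
    // ASSUMPTION: Bearer scheme. Content-Type is left unset on purpose:
    // fetch adds the multipart boundary itself when the body is FormData.
    headers: { Authorization: `Bearer ${token}` },
    body: buildUploadForm(blob, filename),
  });
  if (response.status !== 200) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  const source = await response.json(); // JSON source object
  return source.uuid; // unique identifier of the uploaded file
}

// Usage in a browser, with a file picked from an <input type="file">:
// const file = document.querySelector("input[type=file]").files[0];
// uploadDocument(myToken, file, file.name).then((uuid) => console.log(uuid));
```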
Supported File Types