Data Ingestion API Reference
Learn about the data ingestion endpoints and how to add content to your namespaces.
Data ingestion is the process of adding content to your namespaces. RAGaaS supports multiple ingestion methods to handle different content sources.
Ingest Text
Ingest text content into the namespace.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for text ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
TEXT
- Name
config
*- Type
- object
- Description
- Configuration for text ingestion
- Name
text
*- Type
- string
- Description
- Text content to ingest
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/text \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "TEXT",
"config": {
"text": "RAGaaS is a platform for building AI applications.",
"metadata": {
"title": "Sample Document",
"category": "documentation"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your text ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_123"
}
}
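The same request can be built from Python with only the standard library. This is a minimal sketch rather than an official client: `build_text_ingest_request` is a hypothetical helper name, and the endpoint, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.ragaas.dev/v1"  # base URL taken from the curl example


def build_text_ingest_request(api_key, namespace_id, text, metadata=None,
                              chunk_size=None, chunk_overlap=None):
    """Build (but do not send) a POST request for /ingest/text."""
    config = {"text": text}
    if metadata is not None:
        config["metadata"] = metadata
    ingest_config = {"source": "TEXT", "config": config}
    # chunkConfig is optional, but chunkSize and chunkOverlap are both
    # required once it is present, so only add it when both are given.
    if chunk_size is not None and chunk_overlap is not None:
        ingest_config["chunkConfig"] = {
            "chunkSize": chunk_size,
            "chunkOverlap": chunk_overlap,
        }
    body = {"namespaceId": namespace_id, "ingestConfig": ingest_config}
    return urllib.request.Request(
        f"{API_BASE}/ingest/text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. The `ingestJobRunId` in the response identifies the queued job run.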
Ingest File
Ingest file content into the namespace.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Form Data
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
FILE
- Name
file
*- Type
- file
- Description
- File to ingest
- Name
metadata
- Type
- object (stringified)(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object (stringified)(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/file \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F 'namespaceId="ns_123"' \
-F 'source="FILE"' \
-F 'file=@"/Users/Downloads/sample.pdf"' \
-F 'metadata="{\"title\": \"Sample Document\", \"category\": \"documentation\"}"'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your file ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_123"
}
}
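Because this endpoint takes form data rather than JSON, the `metadata` and `chunkConfig` objects have to be serialized to strings before they are attached. The sketch below shows one way to prepare the text fields in Python; `build_file_form_fields` is a hypothetical helper, and the field names are taken from the table above.

```python
import json


def build_file_form_fields(namespace_id, metadata=None, chunk_config=None):
    """Return the text fields of the multipart form as a dict of strings."""
    fields = {"namespaceId": namespace_id, "source": "FILE"}
    # Form fields are plain strings, so nested objects are sent as JSON text.
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    if chunk_config is not None:
        fields["chunkConfig"] = json.dumps(chunk_config)
    return fields


fields = build_file_form_fields(
    "ns_123",
    metadata={"title": "Sample Document", "category": "documentation"},
    chunk_config={"chunkSize": 1000, "chunkOverlap": 200},
)
print(fields["chunkConfig"])  # {"chunkSize": 1000, "chunkOverlap": 200}
```

These fields would then be passed to a multipart-capable HTTP client together with the file part, which must be named `file`.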
Ingest URLs
Ingest content from a list of URLs.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for URLs ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
URLS_LIST
- Name
config
*- Type
- object
- Description
- Configuration for URLs ingestion
- Name
urls
*- Type
- array<string>
- Description
- List of URLs to ingest
- Name
scrapeOptions
- Type
- object(optional)
- Description
- Options for web scraping
- Name
includeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to include
- Name
excludeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to exclude
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/urls \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "URLS_LIST",
"config": {
"urls": [
"https://example.com/docs/page1",
"https://example.com/docs/page2"
],
"scrapeOptions": {
"includeSelectors": [".content"],
"excludeSelectors": [".navigation", ".footer"]
},
"metadata": {
"source": "documentation",
"version": "1.0"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your urls ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_124"
}
}
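Hand-written JSON bodies are easy to get wrong, so it can help to build the body as a dictionary and let a JSON library serialize it. The following is a sketch under stated assumptions: `build_urls_ingest_body` is a hypothetical helper, and the check that each URL is an absolute http(s) URL is a client-side convenience, not a documented API rule.

```python
import json
from urllib.parse import urlparse


def build_urls_ingest_body(namespace_id, urls, include_selectors=None,
                           exclude_selectors=None, metadata=None):
    """Build the JSON body for /ingest/urls as a plain dict."""
    for url in urls:
        parsed = urlparse(url)
        # Client-side sanity check (an assumption, not a documented rule).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    config = {"urls": list(urls)}
    scrape_options = {}
    if include_selectors:
        scrape_options["includeSelectors"] = list(include_selectors)
    if exclude_selectors:
        scrape_options["excludeSelectors"] = list(exclude_selectors)
    if scrape_options:
        config["scrapeOptions"] = scrape_options
    if metadata is not None:
        config["metadata"] = metadata
    return {
        "namespaceId": namespace_id,
        "ingestConfig": {"source": "URLS_LIST", "config": config},
    }


body = build_urls_ingest_body(
    "ns_123",
    ["https://example.com/docs/page1", "https://example.com/docs/page2"],
    include_selectors=[".content"],
    exclude_selectors=[".navigation", ".footer"],
)
print(json.dumps(body, indent=2))
```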
Ingest Sitemap
Ingest content from a sitemap.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for sitemap ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
SITEMAP
- Name
config
*- Type
- object
- Description
- Configuration for sitemap ingestion
- Name
url
*- Type
- string
- Description
- URL of the sitemap
- Name
scrapeOptions
- Type
- object(optional)
- Description
- Options for web scraping
- Name
includeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to include
- Name
excludeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to exclude
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/sitemap \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "SITEMAP",
"config": {
"url": "https://example.com/sitemap.xml",
"scrapeOptions": {
"includeSelectors": [".content"],
"excludeSelectors": [".navigation", ".footer"]
},
"metadata": {
"source": "website",
"version": "1.0"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your sitemap ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_125"
}
}
Ingest Website
Ingest content from a website.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for website ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
WEBSITE
- Name
config
*- Type
- object
- Description
- Configuration for website ingestion
- Name
url
*- Type
- string
- Description
- URL of the website
- Name
maxDepth
- Type
- number(optional)
- Description
- Maximum depth to crawl (1-10)
- Name
maxLinks
- Type
- number(optional)
- Description
- Maximum number of links to process
- Name
includePaths
- Type
- array<string>(optional)
- Description
- URL paths to include
- Name
excludePaths
- Type
- array<string>(optional)
- Description
- URL paths to exclude
- Name
scrapeOptions
- Type
- object(optional)
- Description
- Options for web scraping
- Name
includeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to include
- Name
excludeSelectors
- Type
- array<string>(optional)
- Description
- CSS selectors for content to exclude
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/website \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "WEBSITE",
"config": {
"url": "https://example.com",
"maxDepth": 3,
"maxLinks": 100,
"includePaths": ["/docs"],
"excludePaths": ["/admin", "/docs/internal"],
"scrapeOptions": {
"includeSelectors": [".content"],
"excludeSelectors": [".navigation", ".footer"]
},
"metadata": {
"source": "website",
"version": "1.0"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your website ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_126"
}
}
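The crawl options can be checked on the client before a request is queued. The sketch below assembles the `config` object for a website ingestion and enforces the 1-10 range that the field table gives for `maxDepth`; `build_website_config` is a hypothetical helper, not part of any RAGaaS SDK.

```python
def build_website_config(url, max_depth=None, max_links=None,
                         include_paths=None, exclude_paths=None):
    """Build the `config` object for a WEBSITE ingestion request."""
    config = {"url": url}
    if max_depth is not None:
        # The field table documents a 1-10 range for maxDepth.
        if not 1 <= max_depth <= 10:
            raise ValueError("maxDepth must be between 1 and 10")
        config["maxDepth"] = max_depth
    if max_links is not None:
        config["maxLinks"] = max_links
    if include_paths:
        config["includePaths"] = list(include_paths)
    if exclude_paths:
        config["excludePaths"] = list(exclude_paths)
    return config


config = build_website_config(
    "https://example.com",
    max_depth=3,
    max_links=100,
    include_paths=["/docs"],
    exclude_paths=["/admin", "/docs/internal"],
)
```

The result goes under `ingestConfig.config`, next to `"source": "WEBSITE"`.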
Ingest Notion
Ingest content from a Notion database.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for Notion ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
NOTION
- Name
config
*- Type
- object
- Description
- Configuration for Notion ingestion
- Name
connectionId
*- Type
- string
- Description
- ID of the Notion connection
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/notion \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "NOTION",
"config": {
"connectionId": "conn_abc123",
"metadata": {
"source": "notion",
"workspace": "My Workspace"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your Notion ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_128"
}
}
Ingest Google Drive
Ingest content from Google Drive.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for Google Drive ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
GOOGLE_DRIVE
- Name
config
*- Type
- object
- Description
- Configuration for Google Drive ingestion
- Name
connectionId
*- Type
- string
- Description
- ID of the Google Drive connection
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/google-drive \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "GOOGLE_DRIVE",
"config": {
"connectionId": "conn_abc123",
"metadata": {
"source": "google-drive",
"workspace": "My Workspace"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your Google Drive ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_128"
}
}
Ingest Dropbox
Ingest content from Dropbox.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for Dropbox ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
DROPBOX
- Name
config
*- Type
- object
- Description
- Configuration for Dropbox ingestion
- Name
connectionId
*- Type
- string
- Description
- ID of the Dropbox connection
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/dropbox \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "DROPBOX",
"config": {
"connectionId": "conn_abc123",
"metadata": {
"source": "dropbox",
"workspace": "My Workspace"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your Dropbox ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_128"
}
}
Ingest OneDrive
Ingest content from selected OneDrive files.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Request Body
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace to ingest into
- Name
ingestConfig
*- Type
- object
- Description
- Configuration for OneDrive ingestion
- Name
source
*- Type
- enum<string>
- Description
- Type of ingestion source. Available options:
ONEDRIVE
- Name
config
*- Type
- object
- Description
- Configuration for OneDrive ingestion
- Name
connectionId
*- Type
- string
- Description
- ID of the OneDrive connection
- Name
metadata
- Type
- object(optional)
- Description
- Metadata to associate with the document(s)
- Name
chunkConfig
- Type
- object(optional)
- Description
- Configuration for text chunking
- Name
chunkSize
*- Type
- number
- Description
- Size of each chunk in tokens
- Name
chunkOverlap
*- Type
- number
- Description
- Number of tokens to overlap between chunks
Request
curl -X POST https://api.ragaas.dev/v1/ingest/onedrive \
-H "Authorization: Bearer $RAGAAS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespaceId": "ns_123",
"ingestConfig": {
"source": "ONEDRIVE",
"config": {
"connectionId": "conn_abc123",
"metadata": {
"source": "onedrive",
"workspace": "My Workspace"
}
},
"chunkConfig": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}
}'
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Response
{
"success": true,
"message": "Added your OneDrive ingestion request to the queue successfully",
"data": {
"ingestJobRunId": "job_123"
}
}
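The Notion, Google Drive, Dropbox, and OneDrive endpoints accept the same body shape and differ only in the `source` value and the URL path, so a single builder can cover all four. This is a sketch: `build_connector_ingest` and `CONNECTOR_PATHS` are hypothetical names, and the paths are copied from the curl examples in this reference.

```python
# Endpoint path for each connection-based source, as used in the curl examples.
CONNECTOR_PATHS = {
    "NOTION": "/ingest/notion",
    "GOOGLE_DRIVE": "/ingest/google-drive",
    "DROPBOX": "/ingest/dropbox",
    "ONEDRIVE": "/ingest/onedrive",
}


def build_connector_ingest(source, namespace_id, connection_id, metadata=None):
    """Return (path, body) for a connection-based ingestion request."""
    if source not in CONNECTOR_PATHS:
        raise ValueError(f"unsupported source: {source}")
    config = {"connectionId": connection_id}
    if metadata is not None:
        config["metadata"] = metadata
    body = {
        "namespaceId": namespace_id,
        "ingestConfig": {"source": source, "config": config},
    }
    return CONNECTOR_PATHS[source], body


path, body = build_connector_ingest(
    "ONEDRIVE", "ns_123", "conn_abc123", metadata={"source": "onedrive"}
)
```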
Ingest Job Run Status
Get the status of an ingestion job run.
Authorization
- Name
Authorization
*- Type
- string
- Description
- Bearer token authentication. Include your API key as
Bearer your_api_key
Path Parameters
- Name
ingestJobRunId
*- Type
- string
- Description
- ID of the ingestion job run
Query Parameters
- Name
namespaceId
*- Type
- string
- Description
- ID of the namespace
Request
curl -X GET "https://api.ragaas.dev/v1/ingest-job-runs/ijr_abc123?namespaceId=ns_abc123" \
-H "Authorization: Bearer $RAGAAS_API_KEY"
Response Body
- Name
success
*- Type
- boolean
- Description
- Indicates whether the request is successful or not. This is always true for success responses.
- Name
message
*- Type
- string
- Description
- Human readable message mentioning the result of the request
- Name
data
*- Type
- object
- Description
- Data returned from the API.
- Name
id
*- Type
- string
- Description
- ID of the ingestion job run
- Name
status
*- Type
- enum<string>
- Description
- Current status of the ingestion job. Available options:
QUEUED, PRE_PROCESSING, PROCESSING, COMPLETED
- Name
documents
*- Type
- object
- Description
- Status of individual documents
- Name
queued
*- Type
- array<object>
- Description
- Documents waiting to be processed
- Name
id
*- Type
- string
- Description
- Document ID
- Name
status
*- Type
- string
- Description
- Document status
- Name
error
*- Type
- string | null
- Description
- Error message if any
- Name
processing
*- Type
- array<object>
- Description
- Documents currently being processed
- Name
id
*- Type
- string
- Description
- Document ID
- Name
status
*- Type
- string
- Description
- Document status
- Name
error
*- Type
- string | null
- Description
- Error message if any
- Name
completed
*- Type
- array<object>
- Description
- Documents that have been processed
- Name
id
*- Type
- string
- Description
- Document ID
- Name
status
*- Type
- string
- Description
- Document status
- Name
error
*- Type
- string | null
- Description
- Error message if any
- Name
failed
*- Type
- array<object>
- Description
- Documents that failed processing
- Name
id
*- Type
- string
- Description
- Document ID
- Name
status
*- Type
- string
- Description
- Document status
- Name
error
*- Type
- string | null
- Description
- Error message if any
Response
{
"success": true,
"message": "Fetched the ingestion job run status successfully",
"data": {
"id": "ijr_abc123",
"status": "PROCESSING",
"documents": {
"queued": [
{ "id": "doc_1", "status": "QUEUED", "error": null }
],
"processing": [
{ "id": "doc_2", "status": "PROCESSING", "error": null }
],
"completed": [
{ "id": "doc_3", "status": "SUCCESS", "error": null }
],
"failed": [
{
"id": "doc_4",
"status": "FAILED",
"error": "File format not supported"
}
]
}
}
}
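Since ingestion is queued, a client typically polls this endpoint until the run reaches `COMPLETED`. The sketch below shows one way to do that. `summarize_job_run` and `wait_for_job_run` are hypothetical helpers, the polling interval and attempt limit are arbitrary choices, and `fetch_status` stands for whatever function the caller uses to perform the GET request and parse the JSON.

```python
import time


def summarize_job_run(response):
    """Count documents per bucket and report whether the run has completed."""
    data = response["data"]
    counts = {bucket: len(docs) for bucket, docs in data["documents"].items()}
    return {
        "status": data["status"],
        "counts": counts,
        "finished": data["status"] == "COMPLETED",
    }


def wait_for_job_run(fetch_status, interval_seconds=5.0, max_attempts=60):
    """Call fetch_status() until the run is COMPLETED or attempts run out.

    fetch_status is a caller-supplied function that performs the GET request
    and returns the parsed JSON response as a dict.
    """
    for _ in range(max_attempts):
        summary = summarize_job_run(fetch_status())
        if summary["finished"]:
            return summary
        time.sleep(interval_seconds)
    raise TimeoutError("ingestion job run did not complete in time")
```

The documented run statuses do not include a failure state, so a completed run can still contain failed documents. Checking `counts["failed"]` after the run finishes catches that case.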
Error Codes
- Name
NAMESPACE_NOT_FOUND
- Description
The specified namespace does not exist
- Name
WEB_SCRAPER_CONFIG_NOT_SET
- Description
Web scraper config is not set for the namespace
- Name
INVALID_INGEST_CONFIG
- Description
Invalid ingestion configuration provided
- Name
INVALID_CHUNK_CONFIG
- Description
Invalid chunk configuration provided
- Name
INVALID_SCRAPE_OPTIONS
- Description
Invalid scraping options provided
- Name
INGESTION_FAILED
- Description
Internal error during ingestion process
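A client can map these codes to a suggested next step. The sketch below is illustrative only: this reference does not show the shape of an error response, so the function takes the code as a plain string, and treating `INGESTION_FAILED` as the only retryable code is an assumption based on its description as an internal error.

```python
# Descriptions copied from the error code table.
ERROR_DESCRIPTIONS = {
    "NAMESPACE_NOT_FOUND": "The specified namespace does not exist",
    "WEB_SCRAPER_CONFIG_NOT_SET": "Web scraper config is not set for the namespace",
    "INVALID_INGEST_CONFIG": "Invalid ingestion configuration provided",
    "INVALID_CHUNK_CONFIG": "Invalid chunk configuration provided",
    "INVALID_SCRAPE_OPTIONS": "Invalid scraping options provided",
    "INGESTION_FAILED": "Internal error during ingestion process",
}

# Assumption: only the internal failure is worth retrying unchanged; the
# other codes point at something the caller has to fix first.
RETRYABLE_CODES = {"INGESTION_FAILED"}


def explain_error(code):
    """Return a one-line explanation and suggested action for an error code."""
    if code not in ERROR_DESCRIPTIONS:
        return f"{code}: unknown error code"
    if code in RETRYABLE_CODES:
        action = "retry later"
    else:
        action = "fix the request or namespace setup"
    return f"{code}: {ERROR_DESCRIPTIONS[code]} ({action})"


print(explain_error("INVALID_CHUNK_CONFIG"))
```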
Chunk Configuration
- Name
Chunk Size
- Description
Number of tokens per chunk (default: 1000)
- Name
Chunk Overlap
- Description
Number of tokens shared between consecutive chunks (default: 200)
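To see how the two settings interact, the sketch below splits a sequence into overlapping windows. It is a generic sliding-window illustration and not the service's actual chunker: `chunk_with_overlap` is a hypothetical function, and it works on any sequence of units (tokens, words, or characters).

```python
def chunk_with_overlap(units, chunk_size, chunk_overlap):
    """Split a sequence into windows of chunk_size that share chunk_overlap items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + chunk_size])
        if start + chunk_size >= len(units):
            break  # the last window already reaches the end
    return chunks


words = "a b c d e f g h i j".split()
print(chunk_with_overlap(words, chunk_size=4, chunk_overlap=1))
# [['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
```

Under this scheme, the defaults of 1000 and 200 would start each new chunk 800 units after the previous one.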
Scrape Options
- Name
Include Selectors
- Description
CSS selectors of elements to include in the scraped content
- Name
Exclude Selectors
- Description
CSS selectors of elements to exclude from the scraped content
Website Ingestion Options
- Name
Max Depth
- Description
Maximum crawl depth from the start URL
- Name
Max Links
- Description
Maximum number of website links to scrape
- Name
Include Paths
- Description
Include only URLs that contain these paths
- Name
Exclude Paths
- Description
Exclude all URLs that contain these paths
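The path filters can be pictured as a predicate applied to each discovered URL. The sketch below is an interpretation, not the crawler's actual logic: `should_crawl` is a hypothetical function, matching is done by simple substring test on the URL path, and giving exclusions priority over inclusions is an assumption (it is what makes a pair such as `/docs` and `/docs/internal` meaningful).

```python
from urllib.parse import urlparse


def should_crawl(url, include_paths=None, exclude_paths=None):
    """Apply include/exclude path filters using simple substring matching."""
    path = urlparse(url).path
    # Assumption: an exclude match wins even if an include path also matches.
    if exclude_paths and any(p in path for p in exclude_paths):
        return False
    if include_paths:
        return any(p in path for p in include_paths)
    return True


include = ["/docs"]
exclude = ["/admin", "/docs/internal"]
print(should_crawl("https://example.com/docs/intro", include, exclude))       # True
print(should_crawl("https://example.com/docs/internal/x", include, exclude))  # False
print(should_crawl("https://example.com/blog/post", include, exclude))        # False
```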