Data Ingestion API Reference

Learn about the data ingestion endpoints and how to add content to your namespaces.

POST /v1/ingest/text

Ingest Text

Ingest text content into the namespace.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for text ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: TEXT
    • Name
      config*
      Type
      object
      Description
      Configuration for text ingestion
      • Name
        text*
        Type
        string
        Description
        Text content to ingest
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/text
curl -X POST https://api.ragaas.dev/v1/ingest/text \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "TEXT",
      "config": {
        "text": "RAGaaS is a platform for building AI applications.",
        "metadata": {
          "title": "Sample Document",
          "category": "documentation"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/text
{
  "success": true,
  "message": "Added your text ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_123"
  }
}
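
The request above can also be built from code. The following is a minimal sketch using only the Python standard library. It assumes the base URL from the curl example, and the helper names (`build_text_ingest_request`, `send`) are hypothetical rather than part of an official SDK. It mirrors the documented rule that `chunkConfig` is optional but requires both `chunkSize` and `chunkOverlap` when present.

```python
import json
import urllib.request

BASE_URL = "https://api.ragaas.dev"  # taken from the curl example


def build_text_ingest_request(api_key, namespace_id, text, metadata=None,
                              chunk_size=None, chunk_overlap=None):
    """Build (but do not send) a POST /v1/ingest/text request."""
    config = {"text": text}
    if metadata is not None:
        config["metadata"] = metadata
    ingest_config = {"source": "TEXT", "config": config}
    # chunkConfig is optional, but when present both fields are required.
    if chunk_size is not None or chunk_overlap is not None:
        if chunk_size is None or chunk_overlap is None:
            raise ValueError("chunk_size and chunk_overlap must be set together")
        ingest_config["chunkConfig"] = {"chunkSize": chunk_size,
                                        "chunkOverlap": chunk_overlap}
    body = {"namespaceId": namespace_id, "ingestConfig": ingest_config}
    return urllib.request.Request(
        f"{BASE_URL}/v1/ingest/text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return data.ingestJobRunId from the response."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["data"]["ingestJobRunId"]
```

For example, `send(build_text_ingest_request(os.environ["RAGAAS_API_KEY"], "ns_123", "RAGaaS is a platform for building AI applications."))` would queue the text and return the job run ID. Keeping the builder separate from `send` makes the payload easy to inspect or test without network access.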

POST /v1/ingest/file

Ingest File

Ingest file content into the namespace.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Form Data

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    source*
    Type
    enum<string>
    Description
    Type of ingestion source
    Available options: FILE
  • Name
    file*
    Type
    file
    Description
    File to ingest
  • Name
    metadata
    Type
    object (stringified)(optional)
    Description
    Metadata to associate with the document(s)
  • Name
    chunkConfig
    Type
    object (stringified)(optional)
    Description
    Configuration for text chunking
    • Name
      chunkSize*
      Type
      number
      Description
      Size of each chunk in tokens
    • Name
      chunkOverlap*
      Type
      number
      Description
      Number of tokens to overlap between chunks

Request

POST
/v1/ingest/file
curl -X POST https://api.ragaas.dev/v1/ingest/file \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F 'namespaceId="ns_123"' \
  -F 'source="FILE"' \
  -F 'file=@"/Users/Downloads/sample.pdf"' \
  -F 'metadata="{\"title\": \"Sample Document\", \"category\": \"documentation\"}"'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/file
{
  "success": true,
  "message": "Added your file ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_123"
  }
}
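
Because this endpoint takes form data rather than JSON, `metadata` and `chunkConfig` must be sent as stringified JSON fields. The sketch below hand-encodes a `multipart/form-data` body with the Python standard library, which has no built-in multipart encoder. The helper names are hypothetical, the base URL is taken from the curl example, and filenames are inserted without quote escaping, so treat it as an illustration rather than a production client.

```python
import json
import uuid
import urllib.request

BASE_URL = "https://api.ragaas.dev"  # taken from the curl example


def encode_multipart(fields, file_field, filename, file_bytes,
                     file_type="application/octet-stream"):
    """Encode text fields plus one file part as multipart/form-data.

    Values and the filename are inserted verbatim (no quote escaping),
    which is enough for simple ASCII names like the ones in this reference.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        chunks.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    head = (f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n")
    chunks.append(head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_file_ingest_request(api_key, namespace_id, filename, file_bytes,
                              metadata=None, chunk_config=None):
    """Build (but do not send) a POST /v1/ingest/file request."""
    fields = {"namespaceId": namespace_id, "source": "FILE"}
    # metadata and chunkConfig are documented as stringified JSON objects.
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    if chunk_config is not None:
        fields["chunkConfig"] = json.dumps(chunk_config)
    body, content_type = encode_multipart(fields, "file", filename, file_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/v1/ingest/file",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type},
        method="POST",
    )
```

A file on disk can be passed as `Path("sample.pdf").read_bytes()` (from `pathlib`), and the resulting request sent with `urllib.request.urlopen`. The boundary must appear in the `Content-Type` header, which is why the header is generated together with the body.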

POST /v1/ingest/urls

Ingest URLs

Ingest content from a list of URLs.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for URLs ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: URLS_LIST
    • Name
      config*
      Type
      object
      Description
      Configuration for URLs ingestion
      • Name
        urls*
        Type
        array<string>
        Description
        List of URLs to ingest
      • Name
        scrapeOptions
        Type
        object(optional)
        Description
        Options for web scraping
        • Name
          includeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to include
        • Name
          excludeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to exclude
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/urls
curl -X POST https://api.ragaas.dev/v1/ingest/urls \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "URLS_LIST",
      "config": {
        "urls": [
          "https://example.com/docs/page1",
          "https://example.com/docs/page2"
        ],
        "scrapeOptions": {
          "includeSelectors": [".content"],
          "excludeSelectors": [".navigation", ".footer"]
        },
        "metadata": {
          "source": "documentation",
          "version": "1.0"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/urls
{
  "success": true,
  "message": "Added your urls ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_124"
  }
}
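
Hand-written JSON bodies are easy to break with a missing or trailing comma. One way to avoid that is to build the body as a dictionary and serialize it, omitting optional fields that are not set. The sketch below does this for a `URLS_LIST` request. The function names are hypothetical, and the URL and chunk checks are client-side assumptions, not documented server rules.

```python
import json
from urllib.parse import urlparse


def _drop_none(obj):
    """Recursively remove keys whose value is None so optional fields are omitted."""
    if isinstance(obj, dict):
        return {k: _drop_none(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [_drop_none(v) for v in obj]
    return obj


def build_urls_ingest_body(namespace_id, urls, include_selectors=None,
                           exclude_selectors=None, metadata=None,
                           chunk_size=None, chunk_overlap=None):
    """Return a JSON string for POST /v1/ingest/urls."""
    # Client-side sanity checks; assumptions, not documented server rules.
    if not urls:
        raise ValueError("urls must contain at least one URL")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if (chunk_size is None) != (chunk_overlap is None):
        raise ValueError("chunk_size and chunk_overlap must be given together")

    scrape = None
    if include_selectors is not None or exclude_selectors is not None:
        scrape = {"includeSelectors": include_selectors,
                  "excludeSelectors": exclude_selectors}
    chunk = None
    if chunk_size is not None:
        chunk = {"chunkSize": chunk_size, "chunkOverlap": chunk_overlap}

    body = {
        "namespaceId": namespace_id,
        "ingestConfig": {
            "source": "URLS_LIST",
            "config": {"urls": list(urls), "scrapeOptions": scrape,
                       "metadata": metadata},
            "chunkConfig": chunk,
        },
    }
    # json.dumps always emits syntactically valid JSON.
    return json.dumps(_drop_none(body))
```

The returned string can be used as the `-d` payload or as the request body in any HTTP client. Note that `_drop_none` also removes `None` values nested inside `metadata`.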

POST /v1/ingest/sitemap

Ingest Sitemap

Ingest content from a sitemap.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for sitemap ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: SITEMAP
    • Name
      config*
      Type
      object
      Description
      Configuration for sitemap ingestion
      • Name
        url*
        Type
        string
        Description
        URL of the sitemap
      • Name
        scrapeOptions
        Type
        object(optional)
        Description
        Options for web scraping
        • Name
          includeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to include
        • Name
          excludeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to exclude
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/sitemap
curl -X POST https://api.ragaas.dev/v1/ingest/sitemap \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "SITEMAP",
      "config": {
        "url": "https://example.com/sitemap.xml",
        "scrapeOptions": {
          "includeSelectors": [".content"],
          "excludeSelectors": [".navigation", ".footer"]
        },
        "metadata": {
          "source": "website",
          "version": "1.0"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/sitemap
{
  "success": true,
  "message": "Added your sitemap ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_125"
  }
}
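
Before queueing a sitemap, it can help to check which pages it actually lists. The sketch below parses a sitemap document following the sitemaps.org protocol with Python's `xml.etree.ElementTree`. It is a client-side preview only: this reference does not say how the service expands sitemaps (for example, whether it follows sitemap index files), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return (page_urls, child_sitemap_urls) listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    # Every <loc> element holds one URL; surrounding whitespace is trimmed.
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        # A sitemap index points at other sitemaps, not at pages.
        return [], locs
    return locs, []
```

The XML can be fetched with `urllib.request.urlopen("https://example.com/sitemap.xml").read()` and passed in directly, since `ET.fromstring` accepts bytes as well as strings.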

POST /v1/ingest/website

Ingest Website

Ingest content from a website.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for website ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: WEBSITE
    • Name
      config*
      Type
      object
      Description
      Configuration for website ingestion
      • Name
        url*
        Type
        string
        Description
        URL of the website
      • Name
        maxDepth
        Type
        number(optional)
        Description
        Maximum depth to crawl (1-10)
      • Name
        maxLinks
        Type
        number(optional)
        Description
        Maximum number of links to process
      • Name
        includePaths
        Type
        array<string>(optional)
        Description
        URL paths to include
      • Name
        excludePaths
        Type
        array<string>(optional)
        Description
        URL paths to exclude
      • Name
        scrapeOptions
        Type
        object(optional)
        Description
        Options for web scraping
        • Name
          includeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to include
        • Name
          excludeSelectors
          Type
          array<string>(optional)
          Description
          CSS selectors for content to exclude
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/website
curl -X POST https://api.ragaas.dev/v1/ingest/website \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "WEBSITE",
      "config": {
        "url": "https://example.com",
        "maxDepth": 3,
        "maxLinks": 100,
        "includePaths": ["/docs"],
        "excludePaths": ["/admin", "/docs/internal"],
        "scrapeOptions": {
          "includeSelectors": [".content"],
          "excludeSelectors": [".navigation", ".footer"]
        },
        "metadata": {
          "source": "website",
          "version": "1.0"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/website
{
  "success": true,
  "message": "Added your website ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_126"
  }
}
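
The crawl options can be checked before the request is sent. The sketch below builds the `config` object for a `WEBSITE` ingestion and enforces the documented `maxDepth` range of 1-10. Treating `maxDepth` as an integer and requiring a positive `maxLinks` are assumptions, since the reference only gives the type as `number`. The function name is hypothetical.

```python
def build_website_config(url, max_depth=None, max_links=None,
                         include_paths=None, exclude_paths=None):
    """Build the ingestConfig.config object for POST /v1/ingest/website."""
    config = {"url": url}
    if max_depth is not None:
        # Documented range is 1-10; integer-only is an assumption.
        if not isinstance(max_depth, int) or not 1 <= max_depth <= 10:
            raise ValueError("maxDepth must be an integer from 1 to 10")
        config["maxDepth"] = max_depth
    if max_links is not None:
        # Assumption: a positive link budget; the reference gives no bounds.
        if not isinstance(max_links, int) or max_links < 1:
            raise ValueError("maxLinks must be a positive integer")
        config["maxLinks"] = max_links
    # Empty path lists are omitted rather than sent as [].
    if include_paths:
        config["includePaths"] = list(include_paths)
    if exclude_paths:
        config["excludePaths"] = list(exclude_paths)
    return config
```

The result goes under `ingestConfig.config`, next to `"source": "WEBSITE"`, with optional `scrapeOptions` and `metadata` added the same way.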

POST /v1/ingest/notion

Ingest Notion

Ingest content from a Notion database.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for Notion ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: NOTION
    • Name
      config*
      Type
      object
      Description
      Configuration for Notion ingestion
      • Name
        connectionId*
        Type
        string
        Description
        ID of the Notion connection
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/notion
curl -X POST https://api.ragaas.dev/v1/ingest/notion \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "NOTION",
      "config": {
        "connectionId": "conn_abc123",
        "metadata": {
          "source": "notion",
          "workspace": "My Workspace"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/notion
{
  "success": true,
  "message": "Added your Notion ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_128"
  }
}

POST /v1/ingest/google-drive

Ingest Google Drive

Ingest content from Google Drive.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for Google Drive ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: GOOGLE_DRIVE
    • Name
      config*
      Type
      object
      Description
      Configuration for Google Drive ingestion
      • Name
        connectionId*
        Type
        string
        Description
        ID of the Google Drive connection
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/google-drive
curl -X POST https://api.ragaas.dev/v1/ingest/google-drive \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "GOOGLE_DRIVE",
      "config": {
        "connectionId": "conn_abc123",
        "metadata": {
          "source": "google-drive",
          "workspace": "My Workspace"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/google-drive
{
  "success": true,
  "message": "Added your Google Drive ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_128"
  }
}

POST /v1/ingest/dropbox

Ingest Dropbox

Ingest content from Dropbox.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for Dropbox ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: DROPBOX
    • Name
      config*
      Type
      object
      Description
      Configuration for Dropbox ingestion
      • Name
        connectionId*
        Type
        string
        Description
        ID of the Dropbox connection
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/dropbox
curl -X POST https://api.ragaas.dev/v1/ingest/dropbox \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "DROPBOX",
      "config": {
        "connectionId": "conn_abc123",
        "metadata": {
          "source": "dropbox",
          "workspace": "My Workspace"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/dropbox
{
  "success": true,
  "message": "Added your Dropbox ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_128"
  }
}

POST /v1/ingest/onedrive

Ingest OneDrive

Ingest content from selected OneDrive files.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Request Body

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace to ingest into
  • Name
    ingestConfig*
    Type
    object
    Description
    Configuration for OneDrive ingestion
    • Name
      source*
      Type
      enum<string>
      Description
      Type of ingestion source
      Available options: ONEDRIVE
    • Name
      config*
      Type
      object
      Description
      Configuration for OneDrive ingestion
      • Name
        connectionId*
        Type
        string
        Description
        ID of the OneDrive connection
      • Name
        metadata
        Type
        object(optional)
        Description
        Metadata to associate with the document(s)
    • Name
      chunkConfig
      Type
      object(optional)
      Description
      Configuration for text chunking
      • Name
        chunkSize*
        Type
        number
        Description
        Size of each chunk in tokens
      • Name
        chunkOverlap*
        Type
        number
        Description
        Number of tokens to overlap between chunks

Request

POST
/v1/ingest/onedrive
curl -X POST https://api.ragaas.dev/v1/ingest/onedrive \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_123",
    "ingestConfig": {
      "source": "ONEDRIVE",
      "config": {
        "connectionId": "conn_abc123",
        "metadata": {
          "source": "onedrive",
          "workspace": "My Workspace"
        }
      },
      "chunkConfig": {
        "chunkSize": 1000,
        "chunkOverlap": 200
      }
    }
  }'


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      ingestJobRunId*
      Type
      string
      Description
      ID of the ingestion job run

Response

POST
/v1/ingest/onedrive
{
  "success": true,
  "message": "Added your OneDrive ingestion request to the queue successfully",
  "data": {
    "ingestJobRunId": "job_123"
  }
}
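
The Notion, Google Drive, Dropbox, and OneDrive endpoints take the same body shape and differ only in the `source` value and the path. A single helper can therefore cover all four. The sketch below is a hypothetical client-side helper using the Python standard library; it assumes the `connectionId` already exists, since creating connections is not covered in this section.

```python
import json
import urllib.request

BASE_URL = "https://api.ragaas.dev"  # taken from the curl examples

# Connector sources and their endpoint paths, as listed in this reference.
CONNECTOR_PATHS = {
    "NOTION": "/v1/ingest/notion",
    "GOOGLE_DRIVE": "/v1/ingest/google-drive",
    "DROPBOX": "/v1/ingest/dropbox",
    "ONEDRIVE": "/v1/ingest/onedrive",
}


def build_connector_ingest_request(api_key, namespace_id, source, connection_id,
                                   metadata=None, chunk_config=None):
    """Build (but do not send) an ingestion request for a connector source."""
    if source not in CONNECTOR_PATHS:
        raise ValueError(f"unsupported connector source: {source!r}")
    config = {"connectionId": connection_id}
    if metadata is not None:
        config["metadata"] = metadata
    ingest_config = {"source": source, "config": config}
    if chunk_config is not None:
        ingest_config["chunkConfig"] = chunk_config
    body = {"namespaceId": namespace_id, "ingestConfig": ingest_config}
    return urllib.request.Request(
        BASE_URL + CONNECTOR_PATHS[source],
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Keying the path table on the `source` enum keeps the endpoint and the body's `source` field from drifting apart.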

GET /v1/ingest-job-runs/:ingestJobRunId

Ingest Job Run Status

Get the status of an ingestion job run.

Authorization

  • Name
    Authorization*
    Type
    string
    Description
    Bearer token authentication. Include your API key as Bearer your_api_key

Path Parameters

  • Name
    ingestJobRunId*
    Type
    string
    Description
    ID of the ingestion job run

Query Parameters

  • Name
    namespaceId*
    Type
    string
    Description
    ID of the namespace

Request

GET
/v1/ingest-job-runs/:ingestJobRunId
curl -X GET "https://api.ragaas.dev/v1/ingest-job-runs/ijr_abc123?namespaceId=ns_abc123" \
  -H "Authorization: Bearer $RAGAAS_API_KEY"


Response Body

  • Name
    success*
    Type
    boolean
    Description
    Indicates whether the request is successful or not. This is always true for success responses.
  • Name
    message*
    Type
    string
    Description
    Human-readable message describing the result of the request
  • Name
    data*
    Type
    object
    Description
    Data returned from the API.
    • Name
      id*
      Type
      string
      Description
      ID of the ingestion job run
    • Name
      status*
      Type
      enum<string>
      Description
      Current status of the ingestion job
      Available options: QUEUED, PRE_PROCESSING, PROCESSING, COMPLETED
    • Name
      documents*
      Type
      object
      Description
      Status of individual documents
      • Name
        queued*
        Type
        array<object>
        Description
        Documents waiting to be processed
        • Name
          id*
          Type
          string
          Description
          Document ID
        • Name
          status*
          Type
          string
          Description
          Document status
        • Name
          error*
          Type
          string | null
          Description
          Error message if any
      • Name
        processing*
        Type
        array<object>
        Description
        Documents currently being processed
        • Name
          id*
          Type
          string
          Description
          Document ID
        • Name
          status*
          Type
          string
          Description
          Document status
        • Name
          error*
          Type
          string | null
          Description
          Error message if any
      • Name
        completed*
        Type
        array<object>
        Description
        Documents that have been processed
        • Name
          id*
          Type
          string
          Description
          Document ID
        • Name
          status*
          Type
          string
          Description
          Document status
        • Name
          error*
          Type
          string | null
          Description
          Error message if any
      • Name
        failed*
        Type
        array<object>
        Description
        Documents that failed processing
        • Name
          id*
          Type
          string
          Description
          Document ID
        • Name
          status*
          Type
          string
          Description
          Document status
        • Name
          error*
          Type
          string | null
          Description
          Error message if any

Response

GET
/v1/ingest-job-runs/:ingestJobRunId
{
  "success": true,
  "message": "Fetched the ingestion job run status successfully",
  "data": {
    "id": "ijr_abc123",
    "status": "PROCESSING",
    "documents": {
      "queued": [
        { "id": "doc_1", "status": "QUEUED", "error": null }
      ],
      "processing": [
        { "id": "doc_2", "status": "PROCESSING", "error": null }
      ],
      "completed": [
        { "id": "doc_3", "status": "SUCCESS", "error": null }
      ],
      "failed": [
        {
          "id": "doc_4",
          "status": "FAILED",
          "error": "File format not supported"
        }
      ]
    }
  }
}
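
Ingestion endpoints only queue work, so clients typically poll this endpoint until the job finishes. The sketch below shows one way to do that with the Python standard library. It assumes `COMPLETED` is the only terminal status (it is the last one listed) and that a completed run can still contain entries in the `failed` bucket. The polling interval, timeout, and function names are hypothetical; `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.ragaas.dev"  # taken from the curl example


def job_status_url(ingest_job_run_id, namespace_id):
    """Build the status URL, with namespaceId as a query parameter."""
    path = urllib.parse.quote(ingest_job_run_id, safe="")
    query = urllib.parse.urlencode({"namespaceId": namespace_id})
    return f"{BASE_URL}/v1/ingest-job-runs/{path}?{query}"


def fetch_status(api_key, ingest_job_run_id, namespace_id):
    """GET the job run and return its data object."""
    req = urllib.request.Request(
        job_status_url(ingest_job_run_id, namespace_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


def wait_for_job(fetch, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until status is COMPLETED, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        data = fetch()
        if data["status"] == "COMPLETED":
            return data
        if clock() >= deadline:
            raise TimeoutError(f"job {data['id']} still {data['status']}")
        sleep(interval)


def summarize(data):
    """Count documents per bucket and map failed document IDs to errors."""
    docs = data["documents"]
    buckets = ("queued", "processing", "completed", "failed")
    counts = {bucket: len(docs[bucket]) for bucket in buckets}
    errors = {doc["id"]: doc["error"] for doc in docs["failed"]}
    return counts, errors
```

A typical call is `wait_for_job(lambda: fetch_status(api_key, "ijr_abc123", "ns_abc123"))`, followed by `summarize` to report any documents that failed, such as the unsupported file format in the example above.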

Error Codes

  • Name
    NAMESPACE_NOT_FOUND
    Description

    The specified namespace does not exist

  • Name
    WEB_SCRAPER_CONFIG_NOT_SET
    Description

    Web scraper config is not set for the namespace

  • Name
    INVALID_INGEST_CONFIG
    Description

    Invalid ingestion configuration provided

  • Name
    INVALID_CHUNK_CONFIG
    Description

    Invalid chunk configuration provided

  • Name
    INVALID_SCRAPE_OPTIONS
    Description

    Invalid scraping options provided

  • Name
    INGESTION_FAILED
    Description

    Internal error during the ingestion process

Chunk Configuration

  • Name
    Chunk Size
    Description

    Number of characters per chunk (default: 1000)

  • Name
    Chunk Overlap
    Description

    Number of overlapping characters between chunks (default: 200)
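
The arithmetic behind these two settings is a sliding window: each chunk starts `chunkSize - chunkOverlap` units after the previous one, so with the defaults each new chunk advances by 800 units and repeats the last 200 of its predecessor. Since the endpoint tables describe the unit as tokens while this section says characters, the sketch below counts abstract units. It shows only the size and overlap arithmetic, not the service's actual splitter, which may also respect text boundaries. Rejecting an overlap greater than or equal to the chunk size is an assumption.

```python
def chunk_spans(length, chunk_size=1000, chunk_overlap=200):
    """Return (start, end) spans of a fixed-size sliding window with overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: overlap must be smaller than the chunk so the window advances.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    spans = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        spans.append((start, end))
        if end == length:  # the last chunk reached the end of the text
            break
        start += step
    return spans
```

For a 2,600-unit document with the defaults, `chunk_spans(2600)` yields three chunks: `(0, 1000)`, `(800, 1800)`, and `(1600, 2600)`.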

Scrape Options

  • Name
    Include Selectors
    Description

    CSS selectors of elements to include in the scraped content

  • Name
    Exclude Selectors
    Description

    CSS selectors of elements to exclude from the scraped content

Website Ingestion Options

  • Name
    Max Depth
    Description

    Maximum crawl depth from the start URL (1-10)

  • Name
    Max Links
    Description

    Maximum number of website links to scrape

  • Name
    Include Paths
    Description

    Include only those URLs which contain these paths

  • Name
    Exclude Paths
    Description

    Exclude all those URLs which contain these paths
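
The path and link options can be read as a filter over the URLs a crawl discovers. The sketch below applies them to a list of candidate URLs. It assumes "contain" means a substring match on the URL path, that an exclude match wins over an include match, and that `maxLinks` caps the number of kept URLs in discovery order. Crawl depth is not modeled, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def select_links(urls, include_paths=None, exclude_paths=None, max_links=None):
    """Apply include/exclude path filters and a link cap to candidate URLs."""
    selected = []
    for url in urls:
        path = urlparse(url).path
        # Keep only URLs whose path contains at least one include path.
        if include_paths and not any(p in path for p in include_paths):
            continue
        # Drop URLs whose path contains any exclude path (exclusion wins).
        if exclude_paths and any(p in path for p in exclude_paths):
            continue
        selected.append(url)
        if max_links is not None and len(selected) >= max_links:
            break
    return selected
```

With the options from the website example (`includePaths: ["/docs"]`, `excludePaths: ["/admin", "/docs/internal"]`), a page under `/docs/intro` is kept, while `/docs/internal/keys`, `/admin`, and `/blog/post` are dropped.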