Document Management

Learn how to manage and retrieve your documents effectively in RAGaaS.

Overview

RAGaaS provides a powerful document management system that allows you to:

  • Track all ingested documents
  • Filter documents by various criteria
  • Update document metadata
  • Organize content across namespaces

Document Structure

Each document in RAGaaS has:

  • A unique identifier (id) - Generated by RAGaaS
  • An external identifier (externalId) - Assigned by your own system
  • A document type (e.g., TEXT, PDF, DOCX) - Based on ingestion source
  • Custom metadata - Your organization-specific data
  • Ingestion status - Current processing state
  • Creation and update timestamps

Example document:

{
  "id": "doc_abc123",
  "externalId": "contract-2024-01",
  "documentType": "PDF",
  "ingestionStatus": "SUCCESS",
  "metadata": {
    "department": "legal",
    "category": "contracts",
    "version": "1.0",
    "author": "jane.doe",
    "lastUpdated": "2024-01-15T10:00:00Z"
  },
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
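
On the client side, it can help to load such a response into a typed structure. The following is a minimal sketch, not an official SDK type: the Document class and the parse_document helper are hypothetical names, and treating externalId as optional is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Document:
    """Hypothetical client-side mirror of the fields listed above."""
    id: str
    document_type: str
    ingestion_status: str
    created_at: datetime
    updated_at: datetime
    external_id: Optional[str] = None  # assumption: may be absent
    metadata: dict = field(default_factory=dict)


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_document(raw: dict) -> Document:
    return Document(
        id=raw["id"],
        document_type=raw["documentType"],
        ingestion_status=raw["ingestionStatus"],
        created_at=parse_timestamp(raw["createdAt"]),
        updated_at=parse_timestamp(raw["updatedAt"]),
        external_id=raw.get("externalId"),
        metadata=raw.get("metadata", {}),
    )
```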

Using Metadata

Metadata helps organize and filter your documents. Here's a comprehensive example:

{
  "metadata": {
    "department": "engineering",
    "docType": "api-spec",
    "version": "2.0",
    "status": "published",
    "platform": "mobile",
    "language": "en",
    "lastReviewer": "john.smith",
    "lastReviewedAt": "2024-01-15T10:00:00Z"
  }
}

Best practices for metadata:

  • Use consistent keys and naming conventions
  • Follow standard date formats (ISO 8601)
  • Include relevant identifiers for filtering
  • Keep values standardized for effective searching
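
One way to enforce these practices is to check metadata on the client before ingestion or update. The sketch below is illustrative only: SCHEMA, DATE_KEYS, and validate_metadata are hypothetical names, the allowed values are made up for the example, and metadata values are assumed to be strings.

```python
from datetime import datetime

# Hypothetical schema: key -> set of allowed values (None means any value).
SCHEMA = {
    "department": {"engineering", "legal"},
    "status": {"draft", "published", "archived"},
    "lastReviewedAt": None,
}
# Keys whose values should be ISO 8601 timestamps.
DATE_KEYS = {"lastReviewedAt"}


def validate_metadata(metadata: dict) -> list:
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    for key, value in metadata.items():
        if key not in SCHEMA:
            problems.append(f"unknown key: {key}")
            continue
        allowed = SCHEMA[key]
        if allowed is not None and value not in allowed:
            problems.append(f"unexpected value for {key}: {value}")
        if key in DATE_KEYS:
            try:
                # fromisoformat rejects a trailing "Z" before Python 3.11.
                datetime.fromisoformat(str(value).replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"not ISO 8601: {key}")
    return problems
```

Because keys are compared exactly, a stray "Department" or "Status" is reported as unknown rather than silently creating a second, inconsistent key.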

Retrieving Documents

Fetch documents by filtering on document IDs, external IDs, document types, or metadata:

curl -X POST https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_456"],
      "documentExternalIds": ["contract-2024-01"],
      "documentTypes": ["PDF", "DOCX"],
      "metadata": {
        "department": "engineering",
        "status": "published"
      }
    }
  }'
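
The same request can be assembled in Python. This is a sketch under assumptions: build_filter_config and build_fetch_request are hypothetical helpers, and leaving an unused filter out of filterConfig is assumed to be acceptable to the API. The request is built but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.ragaas.dev/v1/documents"  # endpoint from the curl example


def build_filter_config(document_ids=None, external_ids=None,
                        document_types=None, metadata=None):
    """Assemble a filterConfig, leaving out any filter that was not supplied."""
    candidates = {
        "documentIds": document_ids,
        "documentExternalIds": external_ids,
        "documentTypes": document_types,
        "metadata": metadata,
    }
    return {key: value for key, value in candidates.items() if value}


def build_fetch_request(namespace_id, filter_config, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"namespaceId": namespace_id, "filterConfig": filter_config})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_fetch_request(
    "ns_abc123",
    build_filter_config(document_types=["PDF"], metadata={"status": "published"}),
    os.environ.get("RAGAAS_API_KEY", ""),
)
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```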

Managing Documents

Updating Documents

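Update metadata for every document that matches a filter. In the request body, filterConfig selects the documents and data carries the metadata values to apply: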

curl -X PATCH https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "documentTypes": ["PDF"],
      "metadata": {
        "department": "legal",
        "status": "pending"
      }
    },
    "data": {
      "metadata": {
        "status": "reviewed",
        "reviewedBy": "john.doe",
        "reviewedAt": "2024-01-15T10:00:00Z"
      }
    }
  }'
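
When the update body is generated by a script, the review timestamp can be produced in the same ISO 8601 form used throughout this page. A minimal sketch, assuming hypothetical helper names iso_utc and build_review_update; the filter and field names are copied from the curl example.

```python
from datetime import datetime, timezone


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2024-01-15T10:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_review_update(namespace_id, reviewer, reviewed_at=None):
    """Build the PATCH body that marks pending legal PDFs as reviewed."""
    reviewed_at = reviewed_at or datetime.now(timezone.utc)
    return {
        "namespaceId": namespace_id,
        "filterConfig": {
            "documentTypes": ["PDF"],
            "metadata": {"department": "legal", "status": "pending"},
        },
        "data": {
            "metadata": {
                "status": "reviewed",
                "reviewedBy": reviewer,
                "reviewedAt": iso_utc(reviewed_at),
            }
        },
    }
```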

Deleting Documents

Delete multiple documents using filters:

curl -X DELETE https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "metadata": {
        "status": "archived",
      }
    }
  }'
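
Because a filter-based delete can remove many documents at once, a client-side guard against accidentally empty filters may be worthwhile. This page does not say how the API treats an empty filterConfig, so the sketch below is purely a precaution; build_delete_body is a hypothetical helper.

```python
def _has_condition(value) -> bool:
    """True if the value contains at least one non-empty filter condition."""
    if isinstance(value, dict):
        return any(_has_condition(inner) for inner in value.values())
    if isinstance(value, (list, tuple)):
        return len(value) > 0
    return value is not None and value != ""


def build_delete_body(namespace_id: str, filter_config: dict) -> dict:
    """Build a DELETE body, refusing filters that select nothing specific."""
    if not _has_condition(filter_config):
        raise ValueError("refusing to build a delete request with an empty filterConfig")
    return {"namespaceId": namespace_id, "filterConfig": filter_config}
```

For example, build_delete_body("ns_abc123", {"metadata": {}}) raises ValueError instead of producing a request whose scope is unclear.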

Best Practices

  1. Organization

    • Use separate namespaces for different projects/environments
    • Define a consistent metadata schema

  2. Performance

    • Use specific filters to reduce result sets
    • Batch updates when possible
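
As one illustration of batching, per-document changes that share the same metadata can be grouped so that each distinct change becomes a single PATCH body rather than one request per document. This is a sketch under assumptions: group_updates is a hypothetical helper, and it assumes the documentIds filter shown under Retrieving Documents is also accepted by the update endpoint.

```python
import json
from collections import defaultdict


def group_updates(namespace_id: str, changes: dict) -> list:
    """Turn {document_id: metadata} into one PATCH body per distinct metadata."""
    grouped = defaultdict(list)
    for doc_id, metadata in changes.items():
        # Serialize with sorted keys so equal metadata dicts share one group.
        grouped[json.dumps(metadata, sort_keys=True)].append(doc_id)
    return [
        {
            "namespaceId": namespace_id,
            "filterConfig": {"documentIds": doc_ids},
            "data": {"metadata": json.loads(key)},
        }
        for key, doc_ids in grouped.items()
    ]


bodies = group_updates("ns_abc123", {
    "doc_1": {"status": "reviewed"},
    "doc_2": {"status": "reviewed"},
    "doc_3": {"status": "archived"},
})
# Three per-document changes collapse into two request bodies.
```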