Document Management
Learn how to manage and retrieve your documents effectively in RAGaaS.
Overview
RAGaaS provides a powerful document management system that allows you to:
- Track all ingested documents
- Filter documents by various criteria
- Update document metadata
- Organize content across namespaces
All document operations are namespace-scoped. Make sure you have the correct namespaceId before making any requests.
Document Structure
Each document in RAGaaS has:
- A unique identifier (id) - System-generated unique ID
- An external identifier (externalId) - Generated by external system
- A document type (e.g., TEXT, PDF, DOCX) - Based on ingestion source
- Custom metadata - Your organization-specific data
- Ingestion status - Current processing state
- Creation and update timestamps
Example document:
{
  "id": "doc_abc123",
  "externalId": "contract-2024-01",
  "documentType": "PDF",
  "ingestionStatus": "SUCCESS",
  "metadata": {
    "department": "legal",
    "category": "contracts",
    "version": "1.0",
    "author": "jane.doe",
    "lastUpdated": "2024-01-15T10:00:00Z"
  },
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
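To work with a document like this in client code, it can help to parse it into a typed structure. The following is a minimal sketch, not an official client: the `Document` class, its snake_case field names, and the `parse_timestamp` helper are illustrative assumptions. Only the JSON keys come from the example above.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

# The example document from this page, used as sample input.
EXAMPLE_DOCUMENT = """
{
  "id": "doc_abc123",
  "externalId": "contract-2024-01",
  "documentType": "PDF",
  "ingestionStatus": "SUCCESS",
  "metadata": {"department": "legal", "category": "contracts"},
  "createdAt": "2024-01-15T10:00:00Z",
  "updatedAt": "2024-01-15T10:00:00Z"
}
"""


def parse_timestamp(value: str) -> datetime:
    # Replace the trailing "Z" with an explicit UTC offset, because older
    # Python versions of datetime.fromisoformat reject the "Z" suffix.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Document:
    # Hypothetical client-side model; field names are our own choice.
    id: str
    external_id: Optional[str]
    document_type: str
    ingestion_status: str
    created_at: datetime
    updated_at: datetime
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "Document":
        data = json.loads(raw)
        return cls(
            id=data["id"],
            external_id=data.get("externalId"),
            document_type=data["documentType"],
            ingestion_status=data["ingestionStatus"],
            created_at=parse_timestamp(data["createdAt"]),
            updated_at=parse_timestamp(data["updatedAt"]),
            metadata=data.get("metadata", {}),
        )
```

Calling `Document.from_json(EXAMPLE_DOCUMENT)` yields an object whose timestamps are timezone-aware `datetime` values, which makes later comparisons and sorting less error-prone than working with raw strings.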
Using Metadata
Metadata helps organize and filter your documents. For example:
{
  "metadata": {
    "department": "engineering",
    "docType": "api-spec",
    "version": "2.0",
    "status": "published",
    "platform": "mobile",
    "language": "en",
    "lastReviewer": "john.smith",
    "lastReviewedAt": "2024-01-15T10:00:00Z"
  }
}
Best practices for metadata:
- Use consistent keys and naming conventions
- Follow standard date formats (ISO 8601)
- Include relevant identifiers for filtering
- Keep values standardized for effective searching
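One way to enforce these practices is to check metadata on the client before ingestion. The sketch below is an assumption-laden illustration rather than a RAGaaS feature: the camelCase rule, the convention that keys ending in `At` hold timestamps, and the `ALLOWED_VALUES` schema are all hypothetical choices you would replace with your own.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

# Assumed naming convention: camelCase keys, as in the examples on this page.
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")

# Hypothetical schema of standardized values; adjust to your organization.
ALLOWED_VALUES = {
    "status": {"draft", "pending", "reviewed", "published", "archived"},
    "language": {"en", "de", "fr"},
}


def is_iso8601(value: Any) -> bool:
    """Return True if value is a string that parses as an ISO 8601 date/time."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def validate_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Collect human-readable problems; an empty list means the metadata passed."""
    problems = []
    for key, value in metadata.items():
        if not CAMEL_CASE.match(key):
            problems.append(f"{key}: key is not camelCase")
        # Assumed convention: keys ending in "At" carry timestamps.
        if key.endswith("At") and not is_iso8601(value):
            problems.append(f"{key}: expected an ISO 8601 timestamp")
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and not (isinstance(value, str) and value in allowed):
            problems.append(f"{key}: unexpected value {value!r}")
    return problems
```

Running `validate_metadata` in an ingestion pipeline and rejecting documents with a non-empty result keeps keys and values uniform, which is what makes metadata filters reliable later.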
Retrieving Documents
Fetch documents using various filters:
curl -X POST https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "documentIds": ["doc_123", "doc_456"],
      "documentExternalIds": ["contract-2024-01"],
      "documentTypes": ["PDF", "DOCX"],
      "metadata": {
        "department": "engineering",
        "status": "published"
      }
    }
  }'
Filters are combined with AND logic. A document must match all specified criteria to be included in the results.
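The AND semantics can be illustrated with a small local matcher. This is a sketch of the described behavior, not the server's implementation. It assumes that a list-valued filter such as `documentTypes` matches when the document's value is any one of the listed entries, and that metadata values are compared by exact equality; neither detail is specified on this page. The `matches` function is a hypothetical helper.

```python
from typing import Any, Dict


def matches(document: Dict[str, Any], filter_config: Dict[str, Any]) -> bool:
    """Return True only if the document satisfies every criterion present."""
    checks = []

    # Assumption: each list filter accepts any one of its listed values.
    if "documentIds" in filter_config:
        checks.append(document.get("id") in filter_config["documentIds"])
    if "documentExternalIds" in filter_config:
        checks.append(
            document.get("externalId") in filter_config["documentExternalIds"]
        )
    if "documentTypes" in filter_config:
        checks.append(document.get("documentType") in filter_config["documentTypes"])

    # Assumption: every metadata key in the filter must match exactly.
    doc_metadata = document.get("metadata", {})
    for key, expected in filter_config.get("metadata", {}).items():
        checks.append(doc_metadata.get(key) == expected)

    # AND logic: one failed criterion excludes the document.
    return all(checks)


documents = [
    {"id": "doc_1", "documentType": "PDF",
     "metadata": {"department": "engineering", "status": "published"}},
    {"id": "doc_2", "documentType": "PDF",
     "metadata": {"department": "engineering", "status": "draft"}},
    {"id": "doc_3", "documentType": "TEXT",
     "metadata": {"department": "engineering", "status": "published"}},
]
filter_config = {
    "documentTypes": ["PDF", "DOCX"],
    "metadata": {"department": "engineering", "status": "published"},
}
selected = [d["id"] for d in documents if matches(d, filter_config)]
```

Here only `doc_1` is selected: `doc_2` fails the `status` criterion and `doc_3` fails the `documentTypes` criterion, even though each satisfies the other filters.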
Managing Documents
Updating Documents
Update metadata for multiple documents using filters:
curl -X PATCH https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "documentTypes": ["PDF"],
      "metadata": {
        "department": "legal",
        "status": "pending"
      }
    },
    "data": {
      "metadata": {
        "status": "reviewed",
        "reviewedBy": "john.doe",
        "reviewedAt": "2024-01-15T10:00:00Z"
      }
    }
  }'
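The same update can be prepared from Python using only the standard library. The sketch below mirrors the curl request above; `build_update_request` is a hypothetical helper, and the refusal of an empty filter is a precaution of our own, since this page does not say what an empty `filterConfig` matches.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.ragaas.dev/v1/documents"  # endpoint taken from the curl example


def build_update_request(
    namespace_id: str,
    filter_config: Dict[str, Any],
    metadata: Dict[str, Any],
    api_key: str,
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates metadata in bulk."""
    if not filter_config:
        # Precaution (our assumption): avoid accidentally targeting every document.
        raise ValueError("filter_config must contain at least one criterion")

    body = {
        "namespaceId": namespace_id,
        "filterConfig": filter_config,
        "data": {"metadata": metadata},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_update_request(
    namespace_id="ns_abc123",
    filter_config={
        "documentTypes": ["PDF"],
        "metadata": {"department": "legal", "status": "pending"},
    },
    metadata={"status": "reviewed", "reviewedBy": "john.doe"},
    api_key="example-key",  # placeholder; read the real key from RAGAAS_API_KEY
)
```

Passing `request` to `urllib.request.urlopen` would send it. Separating construction from sending makes it easy to log or review the exact payload before a bulk change is applied.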
Deleting Documents
Delete multiple documents using filters:
curl -X DELETE https://api.ragaas.dev/v1/documents \
  -H "Authorization: Bearer $RAGAAS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespaceId": "ns_abc123",
    "filterConfig": {
      "metadata": {
        "status": "archived"
      }
    }
  }'
Document deletion is permanent. All associated vectors and embeddings are also removed. Consider using soft deletion via metadata if you need to preserve history.
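Soft deletion could be sketched as a metadata update instead of a DELETE call. In the example below, the `status: "archived"` marker follows the delete filter shown above, while the `archivedAt` key and the `soft_delete_payload` helper are hypothetical. Whether an update merges with or replaces existing metadata is not stated on this page, so verify that before relying on this pattern.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def soft_delete_payload(
    namespace_id: str,
    document_ids: List[str],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a PATCH body that marks documents as archived instead of deleting them."""
    if not document_ids:
        raise ValueError("document_ids must not be empty")

    # Normalize to UTC so the "Z" suffix is accurate.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "namespaceId": namespace_id,
        "filterConfig": {"documentIds": list(document_ids)},
        "data": {
            "metadata": {
                "status": "archived",
                "archivedAt": timestamp,  # hypothetical key for audit purposes
            }
        },
    }


payload = soft_delete_payload(
    "ns_abc123",
    ["doc_123", "doc_456"],
    now=datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc),
)
```

Retrieval requests then need to exclude archived documents by filtering on a non-archived `status`, and a later cleanup job can permanently delete documents whose `status` is `archived` once their history is no longer needed.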
Best Practices
- Organization
  - Use separate namespaces for different projects/environments
  - Define a consistent metadata schema
- Performance
  - Use specific filters to reduce result sets
  - Batch updates when possible