Remote Agent - 3Sixty RAG
Overview
What This Application Does
3Sixty RAG (Retrieval-Augmented Generation) is an enterprise AI service that enables intelligent search and question-answering capabilities across organizational documents and content repositories. It acts as a bridge between your document storage systems and Large Language Models (LLMs), allowing users to ask natural language questions and receive accurate, contextually-aware answers with source citations.
The application works by indexing documents into a searchable database (OpenSearch), breaking them into smaller chunks, and generating semantic embeddings (vector representations) of the content. When a user asks a question, the system searches for the most relevant document chunks using both traditional keyword search (BM25) and semantic similarity search (vector embeddings), then uses an LLM to generate a comprehensive answer based on the retrieved context. This approach ensures that AI responses are grounded in actual organizational documents rather than relying solely on the LLM's training data.
Built specifically for enterprise environments, 3Sixty RAG integrates with the 3Sixty content management platform and supports optional connections to Nexus repositories and Confluence wikis, making it a centralized AI-powered search and knowledge extraction tool across multiple enterprise content sources.
Who This Is For
- Enterprise Knowledge Workers: Quickly find information across large document repositories without manually searching through files
- Compliance & Legal Teams: Get answers to regulatory questions with direct citations to relevant policy documents and guidelines
- Technical Support Teams: Access product documentation and troubleshooting guides through conversational queries
- Business Analysts: Extract insights from reports and business documents using natural language questions
- IT Administrators: Deploy and manage intelligent search capabilities across organizational content systems
- Integration Developers: Build custom applications that leverage RAG capabilities through a standard OpenAI-compatible API
Key Features
1. Intelligent Document Indexing
What it does: Processes and indexes documents from various sources into a searchable vector database, automatically chunking content and generating semantic embeddings.
Why it matters: Enables semantic search capabilities that understand the meaning of content, not just keywords. Different chunking strategies (semantic, character-based, tiktoken, etc.) optimize retrieval quality for different document types.
How to use it: Send documents via the /v1/index API endpoint with metadata (file name, source ID, connector ID), choose a chunking method, and the system handles the rest. Documents are automatically split, embedded, and stored in OpenSearch.
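As a sketch, an index request might be assembled as below. The exact field names (content, fileName, sourceId, connectorId) and the shape of the split object are assumptions based on the description above; verify them against the API schema at /docs before use.

```python
import json
import urllib.request

# Hypothetical payload for the /v1/index endpoint; field names are
# illustrative and should be checked against the service's /docs schema.
def build_index_request(text: str, file_name: str, source_id: str,
                        connector_id: str, index: str) -> dict:
    return {
        "index": index,                      # target OpenSearch index
        "content": text,                     # raw document text
        "metadata": {
            "fileName": file_name,
            "sourceId": source_id,
            "connectorId": connector_id,
        },
        "split": {                           # chunking configuration
            "method": "recursive_character",
            "chunkSize": 1000,
            "chunkOverlap": 100,
        },
    }

payload = build_index_request("Annual leave policy ...", "hr-policy.pdf",
                              "doc-42", "conn-1", "hr-docs")
req = urllib.request.Request(
    "http://localhost:5000/v1/index",        # host port from the compose example
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment once the service is running
```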
2. Hybrid Search Architecture
What it does: Combines traditional keyword-based search (BM25) with modern vector similarity search, using configurable weights to balance between the two approaches.
Why it matters: Keyword search excels at finding specific terms or phrases, while vector search understands semantic meaning and context. The ensemble approach provides better retrieval accuracy than either method alone. Users can tune the balance based on their specific use cases.
How to use it: Configure TEXT_RETRIEVER_K, VECTOR_RETRIEVER_K, RANKER_TEXT_WEIGHT, and RANKER_VECTOR_WEIGHT environment variables to control how many documents each method retrieves and how results are weighted in the final ranking.
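The ensemble idea can be illustrated with a toy score fusion. The real fusion happens inside the service and may normalize scores differently; the weights here simply mirror the roles of RANKER_TEXT_WEIGHT and RANKER_VECTOR_WEIGHT.

```python
# Toy illustration of weighted ensemble ranking over two retrievers.
# Each retriever returns {doc_id: score}; missing docs score 0.0.
def ensemble_rank(text_hits: dict, vector_hits: dict,
                  text_weight: float = 0.5, vector_weight: float = 0.5) -> list:
    docs = set(text_hits) | set(vector_hits)
    scored = {
        doc: text_weight * text_hits.get(doc, 0.0)
             + vector_weight * vector_hits.get(doc, 0.0)
        for doc in docs
    }
    return sorted(scored, key=scored.get, reverse=True)

# BM25 favors doc_a; vector similarity favors doc_b. A text-heavy
# weighting (0.7 / 0.3) lets the keyword signal win.
text_hits = {"doc_a": 0.9, "doc_b": 0.2}
vector_hits = {"doc_a": 0.1, "doc_b": 0.8}
ranking = ensemble_rank(text_hits, vector_hits,
                        text_weight=0.7, vector_weight=0.3)
```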
3. Retrieval-Augmented Generation (RAG)
What it does: Enhances LLM responses by first retrieving relevant documents from the knowledge base, then feeding that context to the language model to generate accurate, source-backed answers.
Why it matters: Prevents AI hallucinations by grounding responses in actual organizational documents. Provides citations so users can verify information and access source materials. Keeps answers current with the latest indexed content rather than relying on outdated LLM training data.
How to use it: Send chat messages to /v1/chat/completions with a model parameter matching your index name. The system automatically extracts search queries from the conversation, retrieves relevant documents, and generates answers with source citations.
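A minimal chat request might look like the following sketch; the index name "hr-docs" is an assumption, and the request body otherwise follows the standard OpenAI chat format described below.

```python
import json
import urllib.request

# Hypothetical chat request: the "model" value names the index to search,
# per the description above.
payload = {
    "model": "hr-docs",  # index name acts as the model identifier
    "messages": [
        {"role": "user", "content": "How many vacation days do new hires get?"}
    ],
}
req = urllib.request.Request(
    "http://localhost:5000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with the service running
```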
4. OpenAI-Compatible API
What it does: Exposes a REST API that mimics OpenAI's chat completion format, making it compatible with existing LLM applications and frameworks.
Why it matters: Developers can integrate RAG capabilities into existing applications with minimal code changes. Tools and libraries built for OpenAI's API work seamlessly with 3Sixty RAG. Provides a familiar interface for teams already using LLM services.
How to use it: Point your OpenAI client library at the 3Sixty RAG base URL and use standard endpoints like /v1/models, /v1/chat/completions, and /v1/index. The API documentation is available at /docs when the service is running.
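Because only the base URL changes, any OpenAI-style client can be redirected at the service. As a dependency-free sketch (the base URL below assumes the host port from the Docker example):

```python
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: host port from the compose file

def models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    # GET /v1/models lists indexed collections (and any enabled plugins)
    return urllib.request.Request(f"{base_url}/v1/models")

req = models_request()
# with urllib.request.urlopen(req) as resp:  # with the service running
#     print(resp.read())
```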
5. Multi-Source Content Integration
What it does: Integrates with 3Sixty content services, Nexus repositories, and Confluence wikis to provide unified search across enterprise content platforms.
Why it matters: Organizations store knowledge in multiple systems. Instead of searching each platform separately, users get unified answers that draw from all available sources. Nexus and Confluence plugins enable real-time search without pre-indexing content.
How to use it: Enable plugins via environment variables (ENABLE_NEXUS, ENABLE_CONFLUENCE) and configure connection credentials. Once enabled, these sources appear as available "models" in the /v1/models endpoint and can be queried like any indexed content.
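A hedged .env sketch follows; only ENABLE_NEXUS and ENABLE_CONFLUENCE come from the description above, and the credential variable names are placeholders to be replaced with the real keys from sample.env.

```env
ENABLE_NEXUS=true
ENABLE_CONFLUENCE=true
# Credential variable names below are illustrative; see sample.env
NEXUS_URL=https://nexus.example.com
CONFLUENCE_URL=https://confluence.example.com
```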
6. Document-Level Permissions & Security
What it does: Enforces access control by filtering search results based on user permissions stored in document metadata (user lists and group memberships).
Why it matters: Ensures users only receive answers based on documents they're authorized to access. Maintains data security and compliance requirements in multi-tenant or role-based environments.
How to use it: Include simflofyUsers and simflofyUserGroups fields in document metadata during indexing. When making chat requests, include the user field (and optionally userGroups) in the request body. The system automatically filters results to authorized documents.
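Putting the two halves together, the permission fields on an indexed document and the matching fields on a chat request might look like this sketch (the index name and identity values are made up for illustration):

```python
# Document metadata at indexing time: simflofyUsers and simflofyUserGroups
# record who is allowed to see this document in search results.
index_metadata = {
    "fileName": "benefits-policy.pdf",
    "simflofyUsers": ["alice@example.com"],
    "simflofyUserGroups": ["hr-team"],
}

# Matching chat request: the service filters retrieved chunks so only
# documents visible to this user/groups combination are used as context.
chat_request = {
    "model": "hr-docs",  # assumed index name
    "messages": [{"role": "user", "content": "What is the parental leave policy?"}],
    "user": "alice@example.com",
    "userGroups": ["hr-team"],
}
```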
7. Flexible Text Chunking Strategies
What it does: Offers seven different methods for splitting documents into searchable chunks: semantic, character-based, recursive character, tiktoken, HuggingFace, SpaCy, and NLTK.
Why it matters: Different document types benefit from different chunking approaches. Technical documentation might work better with semantic chunking, while structured reports might prefer token-based splitting. Optimal chunking improves retrieval quality.
How to use it: Specify the split parameter in index requests with your chosen method and configuration (chunk size, overlap, separators). Each method has specific parameters documented in the API schema.
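A few illustrative split configurations follow; the method names come from the list above, but the exact parameter spellings (chunkSize, chunkOverlap, separators) are assumptions to be verified in the API schema.

```python
# Illustrative split objects for index requests; parameter names are
# assumed and should be checked against the documented schema.
semantic_split = {"method": "semantic"}

recursive_split = {
    "method": "recursive_character",
    "chunkSize": 800,        # characters per chunk
    "chunkOverlap": 80,      # overlap keeps context across chunk boundaries
    "separators": ["\n\n", "\n", " "],
}

tiktoken_split = {
    "method": "tiktoken",
    "chunkSize": 256,        # counted in tokens rather than characters
    "chunkOverlap": 32,
}
```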
8. Conversation Context Management
What it does: Tracks conversation history and token usage, automatically warning users when approaching context limits to maintain response quality.
Why it matters: LLMs have context window limits. Long conversations can degrade performance or exceed limits. By monitoring token usage, the system helps users maintain high-quality interactions and suggests starting new threads when needed.
How to use it: Configure RAG_CONVERSATION_LIMIT to set the threshold. When total tokens exceed this limit, users receive a message suggesting they start a new conversation. The system tracks usage across the entire message history.
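The limit check can be sketched roughly as below. The real service counts tokens with a proper tokenizer; whitespace splitting here is only a stand-in, and the tiny limit is for demonstration.

```python
RAG_CONVERSATION_LIMIT = 8  # deliberately tiny for the example

def approaching_limit(messages: list, limit: int = RAG_CONVERSATION_LIMIT) -> bool:
    # Crude token estimate: word count across the whole message history.
    total_tokens = sum(len(m["content"].split()) for m in messages)
    return total_tokens > limit

history = [
    {"role": "user", "content": "Summarize the travel policy"},
    {"role": "assistant",
     "content": "Employees may book economy flights for trips under six hours"},
]
# Once the running count exceeds the limit, the service would suggest
# starting a new conversation thread.
```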
Docker Installation Example
1. Create a new folder in the project root for your Docker deployment, and within it, create a .env file based on sample.env.
2. Set up a docker-compose.yml within your Docker deployment folder that runs the 3Sixty RAG image (built later via docker build), an Ollama server, OpenSearch, and OpenSearch Dashboards. This is an example docker-compose.yml:
```yaml
services:
  3sixty-rag:
    image: 3sixty-rag
    ports:
      - "5000:8000"
    env_file:
      - ./3sixty-rag.env
    depends_on:
      - opensearch
      - ollama
  ollama:
    image: ollama/ollama:latest
    ports:
      - "51434:11434"
    volumes:
      - "./data/ollama:/root/.ollama"
  opensearch:
    image: opensearchproject/opensearch:3.2.0
    ports:
      - "19200:9200"
      - "19600:9600"
    environment:
      http.host: 0.0.0.0
      network.host: 0.0.0.0
      transport.host: 127.0.0.1
      discovery.type: single-node
      bootstrap.memory_lock: true  # Disable JVM heap memory swapping
      OPENSEARCH_JAVA_OPTS: "-Xms4g -Xmx4g"
      DISABLE_SECURITY_PLUGIN: "true"
    volumes:
      - opensearch-data:/usr/share/opensearch/data:rw
  opensearch-dashboard:
    image: opensearchproject/opensearch-dashboards:3.2.0
    ports:
      - "15601:5601"
    environment:
      OPENSEARCH_HOSTS: '["http://opensearch:9200"]'
      DISABLE_SECURITY_DASHBOARDS_PLUGIN: "true"
    depends_on:
      - opensearch
volumes:
  opensearch-data:
```
3. Make sure that the 3sixty-rag container's logs folder has been created and corresponds to LOG_FILE_PATH in your .env file.
4. Run docker build -t [3sixty-rag_image_name] . in the command line from the project root folder; the tag must match the image name used in your docker-compose.yml.
5. Make sure that you have pulled the embedding model which you want to use. Using the sample.env as an example, you can do this by running ollama pull llama3.1 from within the ollama container.
You should now be ready to use this project.
Usage
- Run docker compose up in your Docker deployment folder to start 3Sixty RAG.
- You can sign in to the OpenSearch dashboard with the OPENSEARCH_USER/OPENSEARCH_PASSWORD which you set in your .env file and send requests in the Dev Tools.