OpenSearch
OpenSearch is a search engine based on the Lucene library. It provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. OpenSearch is developed in Java.
-
Version Support: 3Sixty currently supports only versions > 3 of OpenSearch and does not support version 8.
Authentication Connection
-
Name: Unique connection name
-
Server URL: The server URL, including protocol, host and port. Example: http://127.0.0.1:9200/
-
Socket Timeout in milliseconds: How long to wait before requests fail
-
Authentication Type: Select the authentication method to connect to OpenSearch
-
Basic Authentication: Enter a username and password to access the server
-
AWS Signature V4: Specify how AWS credentials should be provided for signing requests
-
Basic Authentication
-
Username: Username for authentication. Leave blank if no authentication is needed.
-
Password: Password for authentication. Leave blank if no authentication is needed.
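To check a server URL and credentials outside 3Sixty before saving the connection, a minimal Python sketch such as the following can help. It uses only the standard library. The function names build_ping_request and ping are illustrative and are not part of 3Sixty; the request targets the cluster root path, which OpenSearch answers with basic cluster information.

```python
import base64
import urllib.request


def build_ping_request(server_url, username="", password=""):
    """Build a GET request for the cluster root, adding Basic auth only if a username is set."""
    request = urllib.request.Request(server_url.rstrip("/") + "/")
    if username:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def ping(server_url, username="", password="", socket_timeout_ms=5000):
    """Send the request and return the HTTP status code.

    urllib expects its timeout in seconds, so the millisecond value is converted.
    """
    request = build_ping_request(server_url, username, password)
    with urllib.request.urlopen(request, timeout=socket_timeout_ms / 1000) as response:
        return response.status
```

Calling ping("http://127.0.0.1:9200/", "admin", "admin") against a reachable cluster should return 200 when the credentials are accepted; the username and password shown are placeholders.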
AWS Signature V4 Authentication
-
AWS Region: AWS Region where the OpenSearch domain is hosted (e.g., us-east-1, eu-west-1, ap-southeast-2)
-
AWS Service Name: AWS service identifier for request signing. Use 'es' for Amazon OpenSearch Service, 'aoss' for OpenSearch Serverless
-
AWS Credential Source: How AWS credentials should be provided for signing requests
-
Default Credential Provider Chain: Use this to locate AWS credentials automatically without hardcoding them. It simplifies deployment by checking environment variables, profile files, and IAM roles in order, and improves security by avoiding secret exposure. This is selected by default.
The default chain checks providers in this sequence:
-
Java System Properties: aws.accessKeyId, aws.secretAccessKey, aws.sessionToken
-
Environment Variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
-
Web Identity Token and IAM role ARN
-
Shared Credentials and Config File
-
Typically located at ~/.aws/credentials and ~/.aws/config
-
-
ECS Container Credentials
-
Instance Profile Credentials
-
-
Explicit Access Keys
Explicit AWS access keys (Access Key ID and Secret Access Key) should generally be avoided in favour of the Default Credential Provider Chain (which uses IAM roles, environment variables, or profile files). However, explicit keys make sense in specific scenarios where you need to override the automatic behaviour, such as in local development, cross-account access, or in systems where IAM roles cannot be used.
-
AWS Access Key ID: IAM Access Key ID for explicit credential authentication
-
AWS Secret Access Key: IAM Secret Access Key for explicit credential authentication
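The idea behind the Default Credential Provider Chain, trying each source in a fixed order and stopping at the first hit, can be sketched as follows. This is a simplified illustration and not the AWS SDK implementation: it covers only the environment variables and the shared credentials file, and the function name resolve_credentials and its return shape are assumptions made for the example.

```python
import configparser
import os
from pathlib import Path


def resolve_credentials(env=None, credentials_file=None, profile="default"):
    """Return the first credentials found, checking the environment before the shared file."""
    env = os.environ if env is None else env

    # Step 1: environment variables.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {
            "source": "environment",
            "access_key_id": key,
            "secret_access_key": secret,
            "session_token": env.get("AWS_SESSION_TOKEN"),
        }

    # Step 2: shared credentials file. ConfigParser.read ignores missing files.
    path = Path(credentials_file) if credentials_file else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)
    if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
        return {
            "source": "shared-file",
            "access_key_id": parser.get(profile, "aws_access_key_id"),
            "secret_access_key": parser.get(profile, "aws_secret_access_key", fallback=None),
            "session_token": parser.get(profile, "aws_session_token", fallback=None),
        }

    # A real chain would continue with container and instance profile credentials.
    return None
```

A result whose source is "environment" when you expected an IAM role is a common reason for requests being signed with the wrong identity, so checking which source wins can be useful when troubleshooting.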
-
-
Integration Connection
-
Connection Name: This is a unique name given to the connector instance upon creation
-
Description: A description of the connector to help identify it better
-
Authentication Connector: Your OpenSearch Auth Connector
Job Configuration
Note: ID ENCODING
3Sixty uses the source repository id of a document as a default value for the id in OpenSearch. These can sometimes contain illegal characters, especially if they are file paths, such as from a Filesystem or Amazon S3. As part of the indexing process, the value of this field will be encoded to ensure its validity. Currently, only slashes, spaces and apostrophes are encoded, but this will likely change to full encoding in the future to better support non-standard character sets.
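As an illustration of the encoding step, the sketch below replaces only the three characters named above. The source does not state which encoding scheme 3Sixty applies, so the percent-encoded replacements and the function name encode_document_id are assumptions.

```python
# Assumed replacements: percent-encoding for the three characters the note names.
ENCODED_CHARACTERS = {"/": "%2F", " ": "%20", "'": "%27"}


def encode_document_id(source_id):
    """Replace slashes, spaces and apostrophes; leave every other character untouched."""
    return "".join(ENCODED_CHARACTERS.get(char, char) for char in source_id)
```

Under these assumptions, a file path such as /reports/Q1 'final'.pdf becomes %2Freports%2FQ1%20%27final%27.pdf, which is safe to use in a document URL.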
Note: OpenSearch does not support writing multiple versions of a document and will only write the latest one picked up in a migration. All other versions will be ignored instead of being audited, and will not be counted as Skipped.
-
ID Attribute: The field that will be used to set the document id
-
Index Name: The name of the collection where the indexes will be created.
-
If the collection already exists and does not have the required mappings, 3Sixty will attempt to update the mappings
-
-
Include Unmapped Properties: All metadata on the document will be processed. This will alter the index mappings in OpenSearch. Each field on the document will become a "keyword" type field.
-
Index Binaries: Index the binary content as a Base64 string if "Include Content" is On for the job.
-
Batch size: The number of documents to generate before sending a request.
-
For a large number of documents, keep the batch size small (ideally less than 5), as OpenSearch might not be able to handle many documents at once.
-
-
Output Renditions as array to the renditionData field: If there are multiple renditions, they will be stored as a list of base64 encoded strings.
-
Term Vectors: Term vectors increase the size of an index but are required for highlighting and More Like This searches.
-
All text based default 3Sixty fields are included by default
-
Term vectors can only be applied to text fields.
-
Term vectors will be enabled for any custom text field added to mappings
-
-
Content Field: Field that holds extracted content. Documents without the field or with empty content will be processed normally
-
Max Content Size: Maximum batch size in megabytes (MB). OpenSearch recommends 5-15MB. Documents with content exceeding this size will be skipped or truncated.
-
Exceeded Size Action: Action to take if a document makes the batch exceed the maximum size. Options are Skip ('s') or Truncate ('t').
-
Truncation Length: Attempt to reduce the extracted content size to this number of kilobytes (KB) if the max content size is exceeded. 1KB is approximately 1000 characters.
Note: If a document is truncated and still exceeds the maximum size, the full document will be sent with the next batch, assuming it fits without being truncated.
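How Batch size, Max Content Size, Exceeded Size Action and Truncation Length interact can be sketched as below. This is one literal reading of the settings above, not 3Sixty's actual implementation: the function name plan_batches, the use of JSON length as the size measure, and skipping a document that does not fit even in an empty batch are all assumptions. The limit is passed in bytes to keep the arithmetic explicit.

```python
import json


def size_of(doc):
    """Approximate the request size of one document as UTF-8 JSON bytes."""
    return len(json.dumps(doc).encode("utf-8"))


def plan_batches(docs, batch_size, max_bytes, action="s", truncation_kb=1,
                 content_field="content"):
    """Group documents into batches, applying Skip ('s') or Truncate ('t') on overflow."""
    batches, skipped = [], []
    current, current_bytes = [], 0

    def flush():
        nonlocal current, current_bytes
        if current:
            batches.append(current)
        current, current_bytes = [], 0

    for doc in docs:
        if len(current) >= batch_size:
            flush()
        size = size_of(doc)
        if current_bytes + size > max_bytes:
            if action == "s":
                skipped.append(doc)
                continue
            # Truncate: cut the content to roughly truncation_kb * 1000 characters.
            shortened = dict(doc)
            shortened[content_field] = str(doc.get(content_field, ""))[: truncation_kb * 1000]
            if current_bytes + size_of(shortened) <= max_bytes:
                doc, size = shortened, size_of(shortened)
            else:
                # Still too large: close this batch and try the full document in the next one.
                flush()
                if size > max_bytes:
                    skipped.append(doc)
                    continue
        current.append(doc)
        current_bytes += size
    flush()
    return batches, skipped
```

With a small batch size, as recommended above, most batches close on the document count long before the size limit applies; the size handling matters mainly for documents with very large extracted content.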
Content Search Connection
A Content View Connector defines the who, what and how of search. A better term may be "Data Set" because the data you search and find is based on the configuration of the Content View Connection.
Search Configuration
Legacy Fields: All other fields in this tab are legacy features used for the Solr Search Connection and will be removed in future releases.
-
Collection: The name of the collection to query against. OpenSearch refers to these as "indexes", but for our purposes they are collections.
-
Sort Field/Order: Will contain the values in your field list. Allows you to choose which field to sort on and whether to sort ascending or descending.
-
Facet Fields: Facet fields are simply occurrence counts for the entered fields. Content type counting is the most common example. Facet fields are required for a number of sidebar widgets.
-
Field List: The field values to return in a result set. Similar to the SELECT Field1, Field2 clause in SQL.
-
Result Link: Used on the Discovery UI to determine what to do when a user clicks on the link to the document.
-
Facet Limit: Maximum number of facet values to return.
-
Highlight: Yes if you want contextual highlighting, No otherwise.
-
Highlighted Fields: Comma-delimited list of fields for highlighting (e.g. content).
-
Highlight Field Length: The maximum number of characters to highlight.
-
External Links: Setup external links for the search results.
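For orientation, the settings above correspond roughly to standard parts of an OpenSearch search request body. The sketch below shows one plausible translation; how 3Sixty builds its requests internally is not documented here, so the function name build_search_body and the exact mapping of settings to request keys are assumptions.

```python
def build_search_body(query, field_list, sort_field, sort_order="asc",
                      facet_fields=(), facet_limit=10,
                      highlight_fields=(), highlight_length=100):
    """Assemble a search body from field list, sort, facet and highlight settings."""
    body = {
        "query": query,
        "_source": list(field_list),                    # Field List
        "sort": [{sort_field: {"order": sort_order}}],  # Sort Field/Order
    }
    if facet_fields:
        # Facet Fields and Facet Limit: one terms aggregation per field.
        body["aggs"] = {
            field: {"terms": {"field": field, "size": facet_limit}}
            for field in facet_fields
        }
    if highlight_fields:
        # Highlighted Fields and Highlight Field Length.
        body["highlight"] = {
            "fields": {
                field: {"fragment_size": highlight_length}
                for field in highlight_fields
            }
        }
    return body
```

Sending the resulting dictionary as JSON to the index's _search endpoint is a quick way to check that the chosen fields exist and return the counts you expect.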
Search Security
Only one of these options may be selected at a time:
-
Filter: The authenticated user's group id is added to each search request. Used in tandem with the User group index task to only allow specified ids to search indexed content
-
Restrict: The restricted users or groups cannot use this connector. Views that use it will not be visible to them, and they will not be able to use it through the Search APIs
-
Default Query: (3.1.1+) This field lets you add a query, which becomes a Wrapper Query and is added to all other search parameters sent through this connection. For example, to only ever see content created between specific dates, use the following:
{
  "range": {
    "simflofy_created": {
      "gte": "2018-12-22T10:39:00",
      "lte": "2021-12-26T10:39:00"
    }
  }
}
Note that the usual {"query":{}} wrapper is not present. Including it will cause an error on search.
-
elastic_q: This parameter, added by clicking Add Custom Parameter, allows a user to pass in a JSON-formatted query to the elastic search server. When using this query method, you must replace double quotes with single quotes.
Here is an example query:
{
  'bool': {
    'must': [
      {
        'match': {
          'document_type': 'accounting'
        }
      },
      {
        'match': {
          'account_type': ''
        }
      }
    ]
  }
}
Run your query (with proper double quotes) directly against your elastic index using a REST call to test it before adding it to the configuration.
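The two query settings above can be illustrated with a short sketch. An OpenSearch wrapper query carries another query as a Base64-encoded JSON string, and the elastic_q value is ordinary JSON with the quote characters swapped. The helper names below are hypothetical, and whether 3Sixty encodes the Default Query in exactly this way is an assumption.

```python
import base64
import json


def to_wrapper_query(default_query):
    """Encode a query clause as a wrapper query (Base64-encoded JSON string)."""
    encoded = base64.b64encode(json.dumps(default_query).encode("utf-8")).decode("ascii")
    return {"wrapper": {"query": encoded}}


def to_elastic_q(query):
    """Serialize a query with single quotes, as the elastic_q parameter expects."""
    text = json.dumps(query)
    # Apostrophes or escaped double quotes inside values would not survive the swap.
    if "'" in text or '\\"' in text:
        raise ValueError("values containing quote characters cannot be converted safely")
    return text.replace('"', "'")


def from_elastic_q(value):
    """Restore double quotes so the query can be tested directly against the index."""
    return json.loads(value.replace("'", '"'))
```

Using from_elastic_q on a configured value gives back the double-quoted form, which is what a direct REST test against the index needs.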
Response Buffer Size:
Memory (in bytes) used to process responses from OpenSearch. This memory is allocated per search, so use caution when raising it. Default value is 150MB, minimum is 25MB, maximum is 250MB.
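Because the buffer is allocated per search, the memory cost grows with the number of concurrent searches. The sketch below checks a value against the limits stated above and estimates the worst case. It assumes 1MB means 1024 × 1024 bytes, which the source does not specify, and the function names are illustrative.

```python
DEFAULT_MB, MIN_MB, MAX_MB = 150, 25, 250
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def response_buffer_bytes(size_mb=DEFAULT_MB):
    """Validate a buffer size in MB against the documented limits and return it in bytes."""
    if not MIN_MB <= size_mb <= MAX_MB:
        raise ValueError(f"buffer size must be between {MIN_MB}MB and {MAX_MB}MB")
    return size_mb * BYTES_PER_MB


def worst_case_memory_bytes(size_mb, concurrent_searches):
    """Estimate total buffer memory if every concurrent search fills its buffer."""
    return response_buffer_bytes(size_mb) * concurrent_searches
```

For example, four concurrent searches at the 250MB maximum could reserve roughly 1GB for response buffers alone.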
Indexing Content:
Tip: PREREQUISITES AND THE FEDERATION WIZARD
These steps can be performed automatically by using the Federation Wizard, but will still require job configuration. If you use the wizard, skip steps 1 and 2.
For indexing content you will need:
* A working Authentication Connection for your source system
* An Integration Connection for your source system
* A Content Service Connection for your source system
* A working Authentication Connection for OpenSearch
* An Integration Connection for OpenSearch
-
Create a job using your two connections
-
In the Details tab, set the source repository's content service connection directly below the job name.
-
In the Details tab make sure the start and end times are set to a wide enough range to capture all the data you wish to index
-
In the Tasks tab, select the Tika Extractor Task.
-
This task will extract the content from a file and set it as a field on the document for indexing
-
In the Mappings tab, select "Basic OpenSearch Mapping" from the Additional Mappings drop-down
-
If this is not present, simply add the field you set on the task in step 2 as a field mapping.
-
The default is content so the mapping would be content ----Field Mapping----> content
-
-
(optional) Add any additional mappings. The target fields will be created and mapped dynamically as part of the migration
-
-
In the Output Specification, select your id attribute (or leave it as the default) and pick what collection to index to.
-
(optional) If you wish to enable highlighting and your extracted content is not in the "content" field, place the name of your content field from your Tika task in the Term Vector field.
-
(optional) If you wish to use More Like This (MLT) searches on custom fields, add them to the Term Vector field.
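To see what the resulting index mapping might look like, the sketch below builds a mapping body with term vectors enabled on text fields and unmapped properties stored as keyword fields. The term_vector parameter is a standard option on OpenSearch text fields, but the value 3Sixty writes is not stated in this document, so with_positions_offsets is an assumption, as are the function and field names.

```python
def build_index_mapping(text_fields, keyword_fields=(),
                        term_vector="with_positions_offsets"):
    """Build an index mapping body with term vectors on every text field."""
    properties = {}
    for name in text_fields:
        # Term vectors can only be applied to text fields.
        properties[name] = {"type": "text", "term_vector": term_vector}
    for name in keyword_fields:
        properties[name] = {"type": "keyword"}
    return {"mappings": {"properties": properties}}
```

Comparing this shape with the output of a GET request on the index's _mapping endpoint after a job run is a simple way to confirm that the Term Vector field setting took effect for a custom content field.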
Viewing Indexed Content
-
Create a Search Connection for OpenSearch if you have not already. Use the authentication connection you used for indexing
-
Using the configuration section above, pick the fields you wish to see and get counts for.
-
You can add the basic 3Sixty metadata by clicking Add All Default Fields
-
-
Under the Federation Menu > Content Views, Create A New Content View.
Content Service Connector
This section covers the specific configuration of the Content Service Connector. For a description of how to set up a content services connector generically see Content Service Connectors.
Supported Methods
-
createFile
-
createFolder
-
deleteObjectByID
-
getFileContent
-
getObjectProperties
-
getTypes
-
listFolderItems
-
updateFile
-
updateProperties