PII Detection
The PII Detection Job Task uses regular expressions to detect PII in any document or metadata passing through 3Sixty. The regular expressions are stored in a .properties file.
PII detection requires Tesseract OCR 5.x:
- Windows install: https://tesseract-ocr.github.io/tessdoc/Downloads.html
- Mac install: https://formulae.brew.sh/formula/tesseract (command line: brew install tesseract)
- Ubuntu install: https://ubuntuhandbook.org/index.php/2021/12/install-tesseract-ocr-5-ubuntu/
Once Tesseract is installed, check the box to attach text as metadata and enter the field name for the extracted content. We recommend using "content" if it does not conflict with other metadata fields in your run.
Important: File size limit is 95MB
Note (PII flag): This task always adds the boolean field hasPii for mapping and analysis.
Configuration
To use this task, go to the task tab in your job. Select the task from the drop-down and click the plus circle to configure it. After making any changes, click done to save.
Condition check
The task runs when the condition evaluates to 'true', 't', 'on', '1', or 'yes' (case-insensitive). If the condition is left empty, the task runs for every document. The condition is evaluated separately for each document.
Example: To run this task only for PDF documents, use the expression: equals('#{rd.mimetype}',"application/pdf")
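The accepted condition results can be illustrated with a minimal Python sketch. This is not 3Sixty's implementation: the function name `should_run` is hypothetical, `None` stands in for a condition that was left empty, and trimming surrounding whitespace is an assumption.

```python
# Values that count as "true" for the condition check (compared case-insensitively).
TRUTHY = {"true", "t", "on", "1", "yes"}

def should_run(condition_result=None):
    """Return True when the task should run for a document.

    condition_result is the evaluated condition as text; None models a
    condition that was left empty (hypothetical convention for this sketch).
    """
    if condition_result is None:
        # No condition configured: the task runs for every document.
        return True
    # Assumption: surrounding whitespace is ignored before comparing.
    return str(condition_result).strip().lower() in TRUTHY
```

With this sketch, `should_run("Yes")` and `should_run(None)` are both true, while `should_run("false")` is false.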
Field To Mark
The output metadata property in which detected PII is stored. The value of this field is a map, for example:
{
"PhoneNumber": 20,
"Names": 200
}
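To make the shape of this map concrete, here is a rough Python sketch that reads name=regex pairs and counts matches per name. It assumes the map values are match counts, which the example suggests but the text does not state. The pattern names and regexes are hypothetical, not the rules shipped with 3Sixty, and the simplified parser reads each regex literally rather than applying Java .properties escape handling.

```python
import re

# Hypothetical patterns in a .properties-like layout (name=regex).
PROPERTIES_TEXT = r"""
# comment lines and blank lines are skipped
PhoneNumber=\b\d{3}-\d{3}-\d{4}\b
Email=[\w.+-]+@[\w-]+\.[\w.]+
"""

def load_patterns(text):
    """Parse name=regex lines into a dict of compiled patterns."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so the regex itself may contain "=".
        name, _, regex = line.partition("=")
        patterns[name.strip()] = re.compile(regex.strip())
    return patterns

def count_pii(text, patterns):
    """Return a map of pattern name to number of matches, omitting zero counts."""
    counts = {}
    for name, pattern in patterns.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[name] = hits
    return counts

sample = "Call 555-123-4567 or 555-987-6543, or write to jane@example.com."
result = count_pii(sample, load_patterns(PROPERTIES_TEXT))
has_pii = bool(result)  # mirrors the idea of the hasPii flag
```

For the sample text, `result` is `{"PhoneNumber": 2, "Email": 1}` and `has_pii` is `True`.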
Break up PII data into individual fields with a prefix
Instead of adding the PII as a map, 3Sixty breaks it up into individual fields with a prefix for easier mapping and processing.
The fields in the example below can be mapped as pii.phonenumber and pii.names.
Prefix for PII fields
If breaking up PII data, the prefix to use for each field. If left blank, 'pii' is used.
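The break-up behaviour can be sketched in Python as follows. The function name `break_up` is hypothetical, and the dot separator and lowercased names are inferred from the mapping names pii.phonenumber and pii.names mentioned above; the field names 3Sixty actually writes may differ.

```python
def break_up(pii_map, prefix=""):
    """Flatten a PII map into individual prefixed fields."""
    # A blank prefix falls back to 'pii', as described for this option.
    prefix = prefix or "pii"
    # Assumption: fields are named <prefix>.<lowercased PII type>.
    return {f"{prefix}.{name.lower()}": value for name, value in pii_map.items()}

fields = break_up({"PhoneNumber": 20, "Names": 200})
```

Here `fields` is `{"pii.phonenumber": 20, "pii.names": 200}`.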
Fields To Check
Source properties and/or document to check for PII. Use ALL_PROPS to check all properties, BINARY to check the document (extracted via Tika), or list individual property names.
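As a rough illustration of how these values might select what gets scanned, the Python sketch below resolves a Fields To Check setting into the texts to examine. The function name `texts_to_check` is hypothetical, and the comma-separated format for combining several values is an assumption, not something this page confirms.

```python
def texts_to_check(spec, properties, document_text):
    """Resolve a 'Fields To Check' value into a dict of source name to text."""
    # Assumption: multiple values are separated by commas.
    names = [part.strip() for part in spec.split(",") if part.strip()]
    selected = {}
    for name in names:
        if name == "ALL_PROPS":
            # Every metadata property is checked.
            selected.update({key: str(value) for key, value in properties.items()})
        elif name == "BINARY":
            # The text extracted from the document itself is checked.
            selected["BINARY"] = document_text
        elif name in properties:
            # A single named property is checked; unknown names are skipped.
            selected[name] = str(properties[name])
    return selected
```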
Examples
API Keys
Processor: PIIDetectionTask
Key | Display Name | Type |
---|---|---|
use_condition | Check a condition before executing this task. | Boolean |
task_condition | Condition | String |
task_stop_proc | Stop Processing | Boolean |
pii_field_to_mark | Field to Mark | String |
breakup_list | Break up PII data into individual fields with a prefix. (Ex. prefix_CreditCard) | Boolean |
field_prefix | Prefix for pii fields. | String |
pii_what_to_check | Fields to Check | String |
attach_content | Attach extracted text as metadata. | Boolean |
content_field_name | Field name for extracted text content | String |
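
For orientation, the keys above could be assembled into a configuration like the Python sketch below. The values are illustrative only (for instance, the field name piiFound is made up), the helper `type_errors` is hypothetical, and how such a configuration is submitted to 3Sixty is not covered here.

```python
# Illustrative values only; the keys and types come from the API Keys table.
pii_task_config = {
    "use_condition": True,
    "task_condition": "equals('#{rd.mimetype}',\"application/pdf\")",
    "task_stop_proc": False,
    "pii_field_to_mark": "piiFound",  # hypothetical field name
    "breakup_list": True,
    "field_prefix": "pii",
    "pii_what_to_check": "ALL_PROPS",
    "attach_content": True,
    "content_field_name": "content",
}

# Expected Python type for each key, following the Type column.
EXPECTED_TYPES = {
    "use_condition": bool,
    "task_condition": str,
    "task_stop_proc": bool,
    "pii_field_to_mark": str,
    "breakup_list": bool,
    "field_prefix": str,
    "pii_what_to_check": str,
    "attach_content": bool,
    "content_field_name": str,
}

def type_errors(config):
    """Return the keys whose values are missing or of the wrong type."""
    return [key for key, expected in EXPECTED_TYPES.items()
            if not isinstance(config.get(key), expected)]
```

A quick check such as `type_errors(pii_task_config)` returns an empty list when every key has a value of the documented type.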