Text Cleanup

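This task cleans up the text in a given field by replacing characters that commonly cause compatibility problems: accented and other non-Latin characters, characters that are not allowed in filenames, whitespace, and non-printable characters.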


Configuration

To use this task, go to the task tab in your job, select the task from the drop-down, and click the plus circle to configure it. Click Done to save your changes.

Condition check

The task executes only when the condition evaluates to 'true', 't', 'on', '1', or 'yes' (case-insensitive). If the condition is left empty, the task runs for every document. The condition is evaluated separately for each document.
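The rule for deciding whether the task runs can be sketched as follows. This is a minimal illustration of the documented behavior, not the product's implementation; the function name should_run and the evaluate callable (standing in for the engine's expression evaluator) are hypothetical.

```python
# Values that count as "true" for a condition result (compared case-insensitively).
TRUTHY_VALUES = {"true", "t", "on", "1", "yes"}

def should_run(condition, evaluate):
    """Decide whether the task runs for one document.

    condition: the configured condition expression ("" if left empty).
    evaluate:  hypothetical callable that evaluates the expression for the
               current document and returns its result.
    """
    if not condition:
        # No condition configured: the task runs for every document.
        return True
    # Lower-case the result so that "TRUE", "Yes", "ON" and so on all match.
    return str(evaluate(condition)).lower() in TRUTHY_VALUES
```

For example, a result of "Yes" or "TRUE" runs the task, while "false" or "0" skips it for that document.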

Example: If I only want to run this task for PDF documents, I would use the expression: equals('#{rd.mimetype}',"application/pdf")

More on Conditions

Input Field

The field to perform a text cleanup on.

Output field

The field where the resulting cleaned-up text will be saved.

Replace with closest Latin character

Replaces accented and other composite characters with their closest Latin equivalents by normalizing the text using the normalization forms described in Unicode Standard Annex #15, Unicode Normalization Forms (https://www.unicode.org/reports/tr15/).
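A common way to get the "closest Latin character" from Unicode normalization is to decompose the text with NFKD and then drop the combining marks. The sketch below uses Python's standard unicodedata module; it assumes this approach, and the documentation does not say which normalization form the task actually uses. Note that characters with no decomposition, such as "ß", "ø" or "Ł", pass through unchanged with this technique.

```python
import unicodedata

def to_closest_latin(text):
    # NFKD splits precomposed characters into a base character plus combining
    # marks (for example "é" becomes "e" + U+0301) and also applies
    # compatibility mappings (for example the ligature "ﬁ" becomes "fi").
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the base characters by dropping every combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(to_closest_latin("Crème brûlée à Zürich"))  # Creme brulee a Zurich
```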

Replace filename incompatible characters

Replaces characters that are not allowed in filenames (/, \, *, >, ", :, |, <) with the configured replacement text.
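The replacement can be sketched as a per-character substitution. This is a hypothetical helper, assuming that the quote in the list above is the straight double quote (") and that each incompatible character is replaced individually rather than runs being collapsed; the replacement argument plays the role of the filename_incompatible_replacement_text setting.

```python
# Characters treated as filename incompatible in this section.
FILENAME_INCOMPATIBLE = set('/\\*>":|<')

def replace_filename_incompatible(text, replacement):
    # Substitute the replacement text for every incompatible character.
    return "".join(replacement if ch in FILENAME_INCOMPATIBLE else ch for ch in text)

print(replace_filename_incompatible('Q3: "Draft" <v2>', "_"))  # Q3_ _Draft_ _v2_
```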

Replace whitespace characters

Replaces whitespace characters with the configured replacement text.
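A minimal sketch of whitespace replacement, assuming that "whitespace" covers all Unicode whitespace (as Python's str.isspace() does) and that each whitespace character is replaced individually. The helper name is hypothetical; replacement corresponds to the whitespace_replacement_text setting.

```python
def replace_whitespace(text, replacement):
    # str.isspace() is True for spaces, tabs, newlines and other Unicode
    # whitespace such as the no-break space (U+00A0).
    return "".join(replacement if ch.isspace() else ch for ch in text)

print(replace_whitespace("Annual Report\t2024\n", "_"))  # Annual_Report_2024_
```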

Replace non-printable characters

Replaces non-printable characters with the configured replacement text.
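The documentation does not define "non-printable". The sketch below assumes it means any character in the Unicode "Other" general categories (control, format, surrogate, private use and unassigned), which includes NUL, tab, newline and the zero-width space. The helper name is hypothetical; replacement corresponds to the non_printable_replacement_text setting.

```python
import unicodedata

def replace_non_printable(text, replacement):
    # Assumption: a character is non-printable if its Unicode general category
    # starts with "C": Cc (control), Cf (format), Cs (surrogate),
    # Co (private use) or Cn (unassigned).
    return "".join(
        replacement if unicodedata.category(ch).startswith("C") else ch
        for ch in text
    )

print(replace_non_printable("in\x00voice\u200b.pdf", ""))  # invoice.pdf
```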


API Keys

Processor: textCleanupTask

Key                                     Display Name                                   Type
use_condition                           Check a condition before executing this task   Boolean
task_condition                          Condition                                      String
task_stop_proc                          Stop Processing                                Boolean
input_field                             Input Field                                    String
output_field                            Output field                                   String
replace_latin_closest                   Replace with closest Latin character           Boolean
replace_filename_incompatible           Replace filename incompatible characters       Boolean
filename_incompatible_replacement_text  Replace filename incompatible characters with  String
replace_whitespace                      Replace whitespace characters                  Boolean
whitespace_replacement_text             Replace whitespace characters with             String
replace_non_printable                   Replace non-printable characters               Boolean
non_printable_replacement_text          Replace non-printable characters with          String
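The sketch below shows how the API keys in the table above might fit together: a configuration dictionary using those keys, and a function that reads input_field from a document, applies the enabled cleanup options, and writes the result to output_field. It is illustrative only. The document is modeled as a plain dict, the field names "title" and "clean_title" are hypothetical, the order of the steps is an assumption (the documentation does not specify one), the character definitions follow the assumptions of the earlier sketches, and use_condition, task_condition and task_stop_proc are not modeled.

```python
import unicodedata

# Illustrative configuration using the API keys of textCleanupTask.
config = {
    "use_condition": False,
    "task_condition": "",
    "task_stop_proc": False,
    "input_field": "title",          # hypothetical field name
    "output_field": "clean_title",   # hypothetical field name
    "replace_latin_closest": True,
    "replace_filename_incompatible": True,
    "filename_incompatible_replacement_text": "_",
    "replace_whitespace": True,
    "whitespace_replacement_text": "-",
    "replace_non_printable": True,
    "non_printable_replacement_text": "",
}

FILENAME_INCOMPATIBLE = set('/\\*>":|<')

def clean_text(text, cfg):
    # Assumed order: Latin normalization, filename characters, whitespace,
    # then non-printable characters.
    if cfg.get("replace_latin_closest"):
        text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                       if not unicodedata.combining(ch))
    if cfg.get("replace_filename_incompatible"):
        rep = cfg.get("filename_incompatible_replacement_text", "")
        text = "".join(rep if ch in FILENAME_INCOMPATIBLE else ch for ch in text)
    if cfg.get("replace_whitespace"):
        rep = cfg.get("whitespace_replacement_text", "")
        text = "".join(rep if ch.isspace() else ch for ch in text)
    if cfg.get("replace_non_printable"):
        rep = cfg.get("non_printable_replacement_text", "")
        text = "".join(rep if unicodedata.category(ch).startswith("C") else ch
                       for ch in text)
    return text

def run_task(document, cfg):
    # Read the input field, clean it, and store the result in the output field.
    document[cfg["output_field"]] = clean_text(document[cfg["input_field"]], cfg)
    return document

doc = run_task({"title": "Résumé: Jane Doe\x00"}, config)
print(doc["clean_title"])  # Resume_-Jane-Doe
```

Because whitespace is handled before non-printable characters in this sketch, a tab or newline is replaced with the whitespace replacement text rather than removed; a different step order would change that result.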