InputManagement uses artificial intelligence and machine learning to classify documents. Currently, two classifiers are supported: optical and textual.
To classify documents, make sure the classes exist and are correctly configured.
Training
A classifier must be trained before it can classify documents. Training happens within the context of a pipeline. Pipelines can be configured in the Administration area. Each pipeline has multiple class definitions. A class definition is a reference to a class together with a collection of example documents. These examples can be uploaded directly to the class definition. The better the examples for each class, the better the resulting classification will be.
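The relationship between pipelines, class definitions, and example documents can be pictured with a small data model. This is only an illustrative sketch: the names `Pipeline`, `ClassDefinition`, `example_files`, and `classes_without_examples` are hypothetical and do not reflect InputManagement's actual internals or API.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDefinition:
    """Hypothetical model: a reference to a class plus its example documents."""
    class_name: str
    example_files: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """Hypothetical model: a pipeline groups several class definitions."""
    name: str
    class_definitions: list[ClassDefinition] = field(default_factory=list)

    def classes_without_examples(self) -> list[str]:
        # A class definition with no uploaded examples cannot contribute to training.
        return [d.class_name for d in self.class_definitions if not d.example_files]
```

In this picture, a quick look at `classes_without_examples()` before training would show which class definitions still need uploads.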
A good training set has the following characteristics:
- no attachments (for example, those that accompany Forderungsanmeldungen (proofs of claim) and Kontoauszüge (bank statements))
- only the first 1-5 pages of longer documents
- no extra large documents
- no handwritten documents
- good scan quality (300 dpi)
- only PDFs
- 80 documents per class
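Some of these criteria can be checked locally before the examples are uploaded. The following is a minimal sketch, assuming the examples for one class are collected in a single local folder. The function name `check_class_folder` and the 10 MB size limit are assumptions made for illustration; InputManagement does not prescribe them. Page count, handwriting, and scan resolution are not checked here, since that would need a PDF library.

```python
from pathlib import Path

TARGET_DOCS_PER_CLASS = 80         # taken from the checklist above
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: threshold for "extra large"


def check_class_folder(folder: Path) -> list[str]:
    """Return human-readable warnings for one class's example folder."""
    warnings = []
    files = sorted(p for p in folder.iterdir() if p.is_file())
    for path in files:
        # Checklist: only PDFs.
        if path.suffix.lower() != ".pdf":
            warnings.append(f"{path.name}: not a PDF")
            continue
        # A PDF file starts with the bytes "%PDF-".
        with path.open("rb") as fh:
            if fh.read(5) != b"%PDF-":
                warnings.append(f"{path.name}: missing PDF header")
        # Checklist: no extra large documents (limit is an assumption).
        if path.stat().st_size > MAX_FILE_BYTES:
            warnings.append(f"{path.name}: larger than {MAX_FILE_BYTES} bytes")
    pdf_count = sum(1 for p in files if p.suffix.lower() == ".pdf")
    if pdf_count != TARGET_DOCS_PER_CLASS:
        warnings.append(
            f"{pdf_count} PDFs found, checklist suggests {TARGET_DOCS_PER_CLASS}"
        )
    return warnings
```

Calling `check_class_folder(Path("examples/invoice"))` (a hypothetical path) returns an empty list when the folder passes all of these checks.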