Classification

InputManagement uses artificial intelligence and machine learning to classify documents. Two classifiers are currently supported: optical and textual.

Before classifying documents, make sure the required classes exist and are configured correctly.

Training

A classifier has to be trained before it can classify documents. Training happens within the context of a pipeline, and pipelines are configured in the Administration area. Each pipeline has multiple class definitions. A class definition references a class and holds a collection of example documents for that class. Examples can be uploaded directly to the class definition. The better the examples for each class, the more accurate the later classification will be.
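
The relationship between pipelines, class definitions, and examples can be pictured as a small data model. The following is only an illustrative sketch in Python: the names Pipeline, ClassDefinition, define_class, and add_example, as well as the sample class and file names, are hypothetical and are not part of the InputManagement API.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ClassDefinition:
    """Hypothetical model: a reference to a class plus its example documents."""
    class_name: str
    examples: list[Path] = field(default_factory=list)

    def add_example(self, path: str) -> None:
        # Stands in for uploading an example document to the class definition.
        self.examples.append(Path(path))


@dataclass
class Pipeline:
    """Hypothetical model: a pipeline groups several class definitions."""
    name: str
    class_definitions: dict[str, ClassDefinition] = field(default_factory=dict)

    def define_class(self, class_name: str) -> ClassDefinition:
        # Create the class definition on first use, then reuse it.
        return self.class_definitions.setdefault(
            class_name, ClassDefinition(class_name)
        )


# Illustrative names only; real classes are configured in the product.
pipeline = Pipeline("incoming-mail")
pipeline.define_class("Invoice").add_example("invoice_001.pdf")
pipeline.define_class("Invoice").add_example("invoice_002.pdf")
pipeline.define_class("Contract").add_example("contract_001.pdf")
```

In this sketch, each class appears at most once per pipeline, and its examples live with the class definition rather than with the class itself, which matches the description above.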

A good training set has the following characteristics:

  • no attachments, for example those belonging to proofs of claim (Forderungsanmeldungen) or account statements (Kontoauszüge)
  • only the first 1-5 pages of longer documents
  • no extra-large documents
  • no handwritten documents
  • good scan quality (300 dpi)
  • only PDFs
  • 80 documents per class
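
Some of these criteria can be checked automatically before the examples are uploaded. The following is a minimal sketch in Python, not a feature of InputManagement: the function check_training_set, the folder layout (one subfolder per class), and the 10 MB size limit are assumptions made for illustration. The figure of 80 documents per class is treated here as a target. Scan quality, handwriting, attachments, and page counts still need a manual check or additional tooling.

```python
from pathlib import Path

TARGET_PER_CLASS = 80          # taken from the checklist above
MAX_BYTES = 10 * 1024 * 1024   # assumed limit; the checklist gives no number


def check_training_set(root: str) -> list[str]:
    """Return warnings for a folder that has one subfolder per class."""
    warnings = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        pdf_count = 0
        for path in sorted(p for p in class_dir.iterdir() if p.is_file()):
            if path.suffix.lower() != ".pdf":
                warnings.append(f"{class_dir.name}: {path.name} is not a PDF")
                continue
            pdf_count += 1
            if path.stat().st_size > MAX_BYTES:
                warnings.append(f"{class_dir.name}: {path.name} is very large")
        if pdf_count != TARGET_PER_CLASS:
            warnings.append(
                f"{class_dir.name}: {pdf_count} PDFs, target is {TARGET_PER_CLASS}"
            )
    return warnings
```

Calling check_training_set on a folder of examples returns a list of warning strings; an empty list means that none of the automated checks failed.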