EVA module

The EVA application allows automatic or manual extraction of chart data from PDF files.

Working principle

The currently implemented EVA algorithms are based on parsing vector graphics from PDF files. In contrast to raster graphics, where all objects are represented as pixel intensities, vector graphics encode all objects as geometric primitives.

In the case of vector graphics, we can parse information about individual objects (e.g. points, lines, figures) as well as text elements directly from the PDF file using dedicated Python packages, and then classify these objects into categories (e.g. axes, labels, experimental points, error bars) based on their properties.
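
The classification step described above can be illustrated with a minimal sketch. It is not EVA's actual implementation: the `Primitive` class, the `classify` function, and all thresholds are hypothetical and chosen only to show how geometric properties (type, size, orientation) might map to categories. Parsing the PDF itself is left out, so the objects are constructed by hand.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    """Simplified vector object (hypothetical structure, not EVA's own)."""
    kind: str   # "line", "rect", "curve" or "text"
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return abs(self.x1 - self.x0)

    @property
    def height(self) -> float:
        return abs(self.y1 - self.y0)


def classify(obj: Primitive, plot_width: float, plot_height: float,
             tol: float = 0.5, max_marker: float = 10.0) -> str:
    """Assign a category from simple geometric rules (assumed thresholds)."""
    if obj.kind == "text":
        return "label"
    if obj.kind == "line":
        horizontal = obj.height <= tol
        vertical = obj.width <= tol
        # A straight line spanning most of the plot area is treated as an axis.
        if horizontal and obj.width >= 0.8 * plot_width:
            return "axis"
        if vertical and obj.height >= 0.8 * plot_height:
            return "axis"
        # Shorter horizontal or vertical strokes are treated as error bars.
        if horizontal or vertical:
            return "errorbar"
        return "other"
    if obj.kind in ("rect", "curve"):
        # Small, roughly square shapes are treated as data point markers.
        if max(obj.width, obj.height) <= max_marker and \
                abs(obj.width - obj.height) <= tol:
            return "point"
    return "other"


objects = [
    Primitive("line", 0, 0, 200, 0),      # long horizontal line
    Primitive("line", 50, 40, 50, 60),    # short vertical stroke
    Primitive("rect", 48, 48, 52, 52),    # small square marker
    Primitive("text", 0, -10, 20, -5),    # tick label
]
categories = [classify(o, plot_width=200, plot_height=150) for o in objects]
print(categories)
```

Rule-based thresholds like these are easy to inspect and adjust, but a real chart would likely need more properties (stroke colour, line width, grouping) to separate, for example, grid lines from error bars.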

In the case of raster graphics, we have to use neural networks to detect and classify objects in the image. We also need optical character recognition (OCR) tools to extract text from the image.

Currently introduced features:

PDF file support
Automatic extraction of points and error bars from scatter plots
Manual correction of the extraction results
Manual digitization of raster images
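
Manual digitization generally requires converting the positions picked on the image into data values. The sketch below shows one common way to do this for a linear axis, using two reference positions with known values. It is an assumption about how such a calibration could work, not a description of EVA's procedure, and the function name `make_axis_mapper` is hypothetical.

```python
def make_axis_mapper(pos_a: float, value_a: float,
                     pos_b: float, value_b: float):
    """Return a function mapping an image position to a data value.

    Assumes a linear axis calibrated by two reference positions
    (e.g. two tick marks) whose data values are known.
    """
    if pos_a == pos_b:
        raise ValueError("reference positions must differ")
    scale = (value_b - value_a) / (pos_b - pos_a)

    def to_data(pos: float) -> float:
        return value_a + (pos - pos_a) * scale

    return to_data


# X axis: pixel 100 corresponds to 0.0, pixel 500 corresponds to 10.0.
x_map = make_axis_mapper(100, 0.0, 500, 10.0)
# Y axis: image rows grow downwards, so the larger pixel is the smaller value.
y_map = make_axis_mapper(400, 0.0, 100, 6.0)

clicked = (300, 250)  # hypothetical pixel position of a data point
point = (x_map(clicked[0]), y_map(clicked[1]))
print(point)
```

A logarithmic axis would need the same mapping applied to the logarithm of the reference values; that case is not covered here.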

To be implemented or fixed

Rotated images
Raster images
Other plot types (box plots, bar plots, survival curves)