EVA module
The EVA application allows automatic or manual extraction of chart data from PDF files.
Working principle
The currently implemented EVA algorithms are based on parsing vector graphics from PDF files. In contrast to raster graphics, where all objects are represented as pixel intensities, vector graphics encode all objects as geometric primitives.
In the case of vector graphics, we can parse information about individual objects (e.g. points, lines, shapes), as well as text elements, directly from the PDF file using Python-based packages, and then classify these objects into categories (e.g. axes, labels, experimental points, errorbars) based on their properties.
In the case of raster graphics, we have to use neural networks to detect and classify objects in the image. We also need optical character recognition (OCR) tools to extract text from the image.
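The classification step for vector graphics can be illustrated with a minimal sketch. It is not EVA's actual implementation: the `Segment` class, the `classify_segments` function and the rules it applies (longest horizontal and vertical lines are the axes, remaining vertical lines are errorbar candidates) are hypothetical and chosen only to show the idea of sorting parsed primitives by their geometric properties. In practice the segments would come from a PDF parsing package rather than being written by hand, and tick marks or grid lines would need additional rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight line in page coordinates (hypothetical container)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def length(self) -> float:
        return ((self.x1 - self.x0) ** 2 + (self.y1 - self.y0) ** 2) ** 0.5

    @property
    def is_horizontal(self) -> bool:
        return abs(self.y1 - self.y0) < 1e-6

    @property
    def is_vertical(self) -> bool:
        return abs(self.x1 - self.x0) < 1e-6


def classify_segments(segments):
    """Sort segments into axes, errorbar candidates and everything else.

    Assumed rules (illustrative only): the longest horizontal segment is
    the x axis, the longest vertical segment is the y axis, and all other
    vertical segments are treated as errorbar candidates.
    """
    result = {"x_axis": None, "y_axis": None, "errorbars": [], "other": []}

    horizontals = [s for s in segments if s.is_horizontal]
    verticals = [s for s in segments if s.is_vertical]
    if horizontals:
        result["x_axis"] = max(horizontals, key=lambda s: s.length)
    if verticals:
        result["y_axis"] = max(verticals, key=lambda s: s.length)

    for s in segments:
        if s is result["x_axis"] or s is result["y_axis"]:
            continue
        if s.is_vertical:
            result["errorbars"].append(s)
        else:
            result["other"].append(s)
    return result


# Hand-written example input standing in for primitives parsed from a PDF.
segments = [
    Segment(50, 50, 350, 50),     # long horizontal line
    Segment(50, 50, 50, 250),     # long vertical line
    Segment(120, 100, 120, 140),  # short vertical line
    Segment(200, 150, 200, 170),  # short vertical line
    Segment(100, 90, 130, 120),   # diagonal line
]
classified = classify_segments(segments)
print(classified["x_axis"], classified["y_axis"])
print(len(classified["errorbars"]), "errorbar candidates")
```

Rule-based classification of this kind is only possible because each vector object carries explicit coordinates; for raster images the same categories would have to be inferred from pixels.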
Currently introduced features:
- PDF files support
- Automatic extraction of points/errorbars from scatterplots
- Manual correction of the extracted results
- Manual digitization of raster images
To be implemented/fixed:
- Rotated images
- Raster images
- Other plot types (boxplots/barplots, survival curves)