KISS - Data Scraping
Introduction
Electrical equivalent circuits are used in electrochemistry, electrical engineering, biology, and medicine; consequently, schematics of these circuits are widespread in the scientific literature.
In our project, there is great interest in the capability to use these equivalent circuits as a data source for training machine-learning algorithms.
As equivalent circuits are most often published as images, they must first be converted into a machine-readable circuit representation for this application. To this end, we developed a system that performs this conversion automatically. While our system focuses on the elements and circuits used in electrochemistry, it is generic and can convert any circuit that conforms to the requirements of our circuit encoding; see the eisgenerator model format documentation for more information.
Implementation Details
To meet this challenge, methods and algorithms were devised to solve the three core problems that impede the creation of equivalent circuit datasets from scientific publications containing EIS (electrochemical impedance spectroscopy) data:
- An object detection algorithm that identifies candidate equivalent circuits on the pages of scientific publications.
- An algorithm that parses images of equivalent circuit schematics into net-lists.
- An algorithm that converts net-lists into the model strings used by eisgenerator.
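The third step can be illustrated with a classic series-parallel reduction. The sketch below is not the project's implementation: the net-list layout (one `(label, node_a, node_b)` tuple per element), the function name `netlist_to_string`, and the output notation ("A-B" for series, "(A,B)" for parallel) are all assumptions chosen for illustration, and the real target syntax is defined in the eisgenerator model format documentation. The idea is to repeatedly merge elements that share the same node pair (parallel) and remove internal nodes touched by exactly two elements (series) until a single element spans the two terminals.

```python
from collections import Counter


def netlist_to_string(netlist, terminals):
    """Collapse a two-terminal series-parallel net-list into one nested string.

    netlist:   list of (label, node_a, node_b) tuples, one per two-terminal element
               (hypothetical layout, for illustration only).
    terminals: (input_node, output_node) of the whole circuit.
    Placeholder notation, NOT necessarily eisgenerator's syntax:
    "A-B" means A in series with B, "(A,B)" means A in parallel with B.
    """
    if any(a == b for _, a, b in netlist):
        raise ValueError("self-loops are not supported")
    edges = list(netlist)
    changed = True
    while len(edges) > 1 and changed:
        changed = False
        # Parallel reduction: elements connected to the same pair of nodes become one group.
        groups = {}
        for label, a, b in edges:
            groups.setdefault(frozenset((a, b)), []).append(label)
        edges = []
        for key, labels in groups.items():
            a, b = tuple(key)
            if len(labels) > 1:
                labels = ["(" + ",".join(sorted(labels)) + ")"]
                changed = True
            edges.append((labels[0], a, b))
        # Series reduction: an internal node touched by exactly two elements is eliminated.
        degree = Counter()
        for _, a, b in edges:
            degree[a] += 1
            degree[b] += 1
        for node, deg in degree.items():
            if deg == 2 and node not in terminals:
                (i, e1), (j, e2) = [(k, e) for k, e in enumerate(edges) if node in e[1:]]
                # Outer endpoints of the two elements meeting at `node`.
                end1 = e1[2] if e1[1] == node else e1[1]
                end2 = e2[2] if e2[1] == node else e2[1]
                rest = [e for k, e in enumerate(edges) if k not in (i, j)]
                edges = rest + [(e1[0] + "-" + e2[0], end1, end2)]
                changed = True
                break  # recompute groups and degrees after each series merge
    if len(edges) != 1 or {edges[0][1], edges[0][2]} != set(terminals):
        raise ValueError("net-list is not a two-terminal series-parallel circuit")
    return edges[0][0]


# A resistor in series with a parallel RC pair (Randles-like, without a Warburg element).
randles = [("R0", "n0", "n1"), ("R1", "n1", "n2"), ("C1", "n1", "n2")]
print(netlist_to_string(randles, ("n0", "n2")))  # R0-(C1,R1)
```

Bridge topologies such as a Wheatstone bridge cannot be reduced this way and raise `ValueError`; such circuits would need a different encoding.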
Model Detection
For circuit detection, we use a system based on the YOLOv5 CNN architecture. You can get information on how to acquire the dataset used to train this stage here.
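Once the detector has run on a page image, each candidate box has to be turned into a pixel region before the schematic can be cropped and handed to the next stage. The sketch below assumes detections arrive as `(x_center, y_center, width, height, confidence)` tuples normalized to [0, 1], as in the YOLO label convention; the function name `detections_to_crop_boxes` and the 0.5 confidence threshold are hypothetical choices, not taken from this project.

```python
def detections_to_crop_boxes(detections, page_width, page_height, min_conf=0.5):
    """Turn normalized YOLO-style detections into pixel crop boxes.

    detections: iterable of (x_center, y_center, width, height, confidence),
                coordinates normalized to [0, 1] relative to the page image
                (assumed layout).
    Returns (left, upper, right, lower) integer boxes, highest confidence first,
    clamped to the page so they can be passed to e.g. PIL's Image.crop().
    """
    scored = []
    for xc, yc, w, h, conf in detections:
        if conf < min_conf:
            continue  # discard low-confidence circuit candidates
        left = max(0, round((xc - w / 2) * page_width))
        upper = max(0, round((yc - h / 2) * page_height))
        right = min(page_width, round((xc + w / 2) * page_width))
        lower = min(page_height, round((yc + h / 2) * page_height))
        if right > left and lower > upper:  # skip boxes that collapse after clamping
            scored.append((conf, (left, upper, right, lower)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in scored]


dets = [(0.5, 0.5, 0.2, 0.1, 0.9), (0.1, 0.1, 0.4, 0.4, 0.3), (0.95, 0.95, 0.2, 0.2, 0.7)]
print(detections_to_crop_boxes(dets, 1000, 2000))
```

Clamping matters because boxes near the page edge can extend past the image after denormalization; here the third detection is cut off at the bottom-right corner, and the second is dropped by the threshold.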