KISS - Data Scraping

Introduction

Electrical equivalent circuits are used in electrochemistry, electrical engineering, biology and medicine; consequently, schematics of such circuits are widespread in the scientific literature.

In our project, there has been great interest in developing the capability to use these equivalent circuits as a data source for training machine-learning algorithms.

Since equivalent circuits are most often published as images, this application requires converting them into a machine-readable circuit representation. To this end, we developed a system that performs this conversion automatically. Although the system focuses on the elements and circuits used in electrochemistry, it is generic and can convert any circuit that conforms to the requirements of our circuit encoding; see the eisgenerator model format documentation for more information.

Implementation Details

To meet this challenge, we devised methods and algorithms that solve the three core problems impeding the creation of equivalent-circuit datasets from scientific publications containing EIS (electrochemical impedance spectroscopy) data:

  • An algorithm that performs object detection on the pages of scientific publications to identify candidate equivalent circuits.
  • An algorithm capable of parsing images of equivalent circuit schematics into net-lists.
  • An algorithm capable of converting net-lists into the model strings used by eisgenerator.

Model Detection

For circuit detection, we use a system built on the YOLOv5 CNN architecture. Information on how to acquire the dataset used to train this stage is available here.
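The detector's output still has to be turned into image regions that the parsing stage can work on. The following is a minimal sketch of that hand-over step, not the project's actual code. It assumes each detection is a hypothetical `Detection` record with pixel-space corners and a confidence score (YOLOv5 results can be exported in a similar `x1, y1, x2, y2, confidence` layout), and the names `candidate_regions` and `crop` as well as the values of `min_conf` and `margin` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detector box, in page pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def candidate_regions(detections, page_w, page_h, min_conf=0.5, margin=10):
    """Return padded, page-clamped integer boxes for confident detections,
    most confident first. min_conf and margin are illustrative defaults."""
    regions = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if det.confidence < min_conf:
            continue  # discard low-confidence candidates
        # Pad the box so wires touching its border are not cut off,
        # then clamp it to the page.
        x1 = max(0, math.floor(det.x1) - margin)
        y1 = max(0, math.floor(det.y1) - margin)
        x2 = min(page_w, math.ceil(det.x2) + margin)
        y2 = min(page_h, math.ceil(det.y2) + margin)
        regions.append((x1, y1, x2, y2))
    return regions


def crop(page, box):
    """Crop a page stored as a list of pixel rows; box is (x1, y1, x2, y2), end-exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in page[y1:y2]]
```

For a 400 x 300 page, a detection `Detection(100, 50, 300, 150, 0.9)` becomes the padded region `(90, 40, 310, 160)`; padding is a cheap safeguard because a detector box that clips a terminal wire would later show up as a missing connection.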

Circuit Parsing

As shown in the flowchart below, circuit parsing begins by detecting the elements of the circuit with a second CNN. A pipeline combining Zhang-Suen thinning and the Hough transform then detects lines in the image, and these are extensively filtered so that only the lines serving as connections between circuit elements remain. Finally, the elements and lines are tested for connections and sorted into a net-list. Up to this stage, the system could be used for arbitrary circuits, including for reverse engineering.

[Figure: Flowchart]
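Zhang-Suen thinning reduces the strokes of a binarized schematic to one-pixel-wide skeletons, so that each wire is represented by a thin line before line detection. The following is a minimal pure-Python sketch of the textbook two-sub-iteration algorithm, not the project's implementation. It assumes a binary image stored as a list of rows of 0/1 values and treats pixels outside the image as background; the function name `zhang_suen_thin` is hypothetical. The resulting skeleton would then be handed to a Hough transform (for instance OpenCV's `cv2.HoughLinesP`), which is not shown here.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) to a one-pixel-wide skeleton.

    Returns a new image; the input is left unchanged.
    """
    img = [list(row) for row in image]
    h, w = len(img), (len(img[0]) if img else 0)

    def px(r, c):
        # Pixels outside the image count as background.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    def deletable(r, c, first_pass):
        # Neighbours P2..P9, clockwise starting at north.
        n = [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
             px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]
        p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
        b = sum(n)  # number of foreground neighbours
        a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)  # 0->1 transitions
        if not (2 <= b <= 6 and a == 1):
            return False
        if first_pass:  # removes south-east boundary and north-west corner pixels
            return p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
        return p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0  # removes north-west boundary

    changed = True
    while changed:
        changed = False
        for first_pass in (True, False):
            # Mark first, delete afterwards: each sub-iteration is evaluated in parallel.
            to_delete = [(r, c) for r in range(h) for c in range(w)
                         if img[r][c] and deletable(r, c, first_pass)]
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

Applied to a 3-pixel-high horizontal bar, the sketch leaves a single row of pixels along the bar's centre. Because deletions are only marked during a sub-iteration and applied afterwards, the result does not depend on the scan order.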
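The connection test in the parsing stage can be pictured as grouping touching wires and element terminals into nets. The sketch below illustrates this under simplifying assumptions and is not the project's code. Every element is a hypothetical `Element` with exactly two terminal coordinates, every retained line is a segment given by its two endpoints, and two items count as connected when an endpoint lies within `tol` pixels of the other item. A union-find structure merges connected items, and each element is finally mapped to the pair of net numbers at its terminals.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]
Segment = tuple[Point, Point]


@dataclass
class Element:
    name: str
    pins: tuple[Point, Point]  # hypothetical: the element's two terminal coordinates


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - ax, p[1] - ay)
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def segments_touch(a: Segment, b: Segment, tol: float) -> bool:
    """True if an endpoint of either segment lies within tol of the other (covers T-junctions)."""
    return any(point_segment_distance(e, b) <= tol for e in a) or any(
        point_segment_distance(e, a) <= tol for e in b
    )


def build_netlist(elements, lines, tol=3.0):
    """Map each element name to the (net, net) pair its two pins connect to."""
    pins = [(el.name, k, p) for el in elements for k, p in enumerate(el.pins)]
    parent = list(range(len(lines) + len(pins)))  # union-find items: lines first, then pins

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if segments_touch(lines[i], lines[j], tol):
                union(i, j)
    for k, (_, _, p) in enumerate(pins):
        for i, seg in enumerate(lines):  # pin lying on a wire
            if point_segment_distance(p, seg) <= tol:
                union(len(lines) + k, i)
        for m in range(k + 1, len(pins)):  # pins that (nearly) coincide
            if math.dist(p, pins[m][2]) <= tol:
                union(len(lines) + k, len(lines) + m)

    # Number the nets compactly in the order they are first encountered.
    net_ids, netlist = {}, {}
    for k, (name, pin_idx, _) in enumerate(pins):
        net = net_ids.setdefault(find(len(lines) + k), len(net_ids))
        netlist.setdefault(name, [None, None])[pin_idx] = net
    return {name: tuple(nets) for name, nets in netlist.items()}
```

Only endpoints are tested against other items, so two wires that merely cross without an endpoint on each other stay separate, matching the usual schematic convention that a crossing without a junction is not a connection.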

String Generation

Using the net-list created by the previous stage, the eisgenerator model string is generated by repeatedly traversing the net-list and collapsing it, as illustrated in the figure below.

[Figure: Network collapse into string]
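The collapse can be understood as classic series-parallel reduction of a two-terminal network: two elements spanning the same pair of nets are merged into a parallel group, and an internal net touched by exactly two elements is eliminated by chaining them in series, until a single edge between the input and output nets remains. The sketch below implements this reduction and is not the project's code. Its output notation (`A-B` for series, `(A|B)` for parallel) is a hypothetical stand-in rather than eisgenerator's actual model syntax, for which the eisgenerator model format documentation is authoritative, and the function names are illustrative. Networks that are not series-parallel, such as a bridge, cannot be collapsed this way and raise an error.

```python
def collapse_network(edges, source, sink):
    """Collapse a two-terminal net-list into a nested series/parallel string.

    edges: iterable of (net_a, net_b, label) tuples, one per circuit element.
    Hypothetical notation: "A-B" for series, "(A|B)" for parallel.
    """
    edges = list(edges)
    while len(edges) > 1:
        if not (_merge_parallel(edges) or _merge_series(edges, source, sink)):
            raise ValueError("net-list is not series-parallel reducible")
    if len(edges) == 1 and {edges[0][0], edges[0][1]} == {source, sink}:
        return edges[0][2]
    raise ValueError("net-list does not form a single path between source and sink")


def _merge_parallel(edges):
    """Merge the first pair of elements that span the same two nets."""
    for i in range(len(edges)):
        a, b, label_i = edges[i]
        if a == b:
            continue  # ignore self-loops
        for j in range(i + 1, len(edges)):
            c, d, label_j = edges[j]
            if {a, b} == {c, d}:
                edges[i] = (a, b, f"({label_i}|{label_j})")
                del edges[j]
                return True
    return False


def _merge_series(edges, source, sink):
    """Eliminate the first internal net that connects exactly two elements."""
    incident = {}
    for idx, (a, b, _) in enumerate(edges):
        incident.setdefault(a, []).append(idx)
        incident.setdefault(b, []).append(idx)
    for net, idxs in incident.items():
        if net in (source, sink) or len(idxs) != 2:
            continue
        i, j = idxs
        if i == j:
            continue  # a self-loop on this net, not a series connection
        # Nets at the far ends of the two elements.
        x = edges[i][0] if edges[i][1] == net else edges[i][1]
        y = edges[j][0] if edges[j][1] == net else edges[j][1]
        edges[i] = (x, y, f"{edges[i][2]}-{edges[j][2]}")
        del edges[j]
        return True
    return False
```

For a net-list in which `R0` connects nets `n0` and `n1` while `R1` and `C1` both connect `n1` and `n2`, `collapse_network(..., "n0", "n2")` returns `R0-(R1|C1)`. Parallel merges are tried before series merges so that groups are formed as early as possible, but for series-parallel networks either order yields an equivalent expression.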

Source and Data

All sources, including network weights and, as far as possible, training data, are available on GitHub.