Using IEEE DataPort to Gain Exposure

Detecting Corrupt or Abnormal Data in Grid-Tied Photovoltaic Plants

By: Dr. Matam Manjunath. Research conducted on behalf of the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy


The research discussed in this case study is part of a project titled “Levelized Cost Of Energy Reduction Through Proactive Operations Of Photovoltaic Systems,” sponsored by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Number DE-EE0008157. As part of this project, a 6.2 kW grid-tied photovoltaic (PV) plant was set up to export its generated electricity to the conventional power grid.

The day-to-day monitoring and performance assessment of this PV plant rely on data collected from the electrical sensors installed across the plant. The measurements are stored in a database and used for daily, monthly, and annual historical performance assessment. However, while reviewing the recorded data, we noticed that the data on certain days appeared abnormal or corrupt.

Here, corrupt data refers to substandard measurements or irrelevant values that are stored in the database in place of actual measurements. Communication failures between the data logger and the sensors can sometimes cause corrupt data to be written to the database, but there are likely other causes as well.

The problem of corrupt data is not limited to a few PV plants; it affects PV plants of all sizes and types, both domestic and commercial. Currently, PV plant users typically detect corrupt data by inspecting graphical plots. This approach can catch days with severely corrupted data, but it does not reveal every day with corrupt data, and the data filtering methods presently available are not robust enough to close that gap.

Corrupt or abnormal data is a significant problem in PV plant monitoring, and there is a clear need to detect and filter the corrupt data hidden in the database. I therefore set out to develop a programmatic method, rather than the usual visual inspection of plots, to detect the days on which abnormal or corrupt data is present.

A New Method for Detecting Corrupt or Abnormal PV Plant Data

We developed four programmatic methods to detect abnormal data in the PV plant database. These methods can be used to filter the following six parameters of a PV plant:

  • Plane of Array (POA) irradiance
  • Ambient temperature
  • PV array DC voltage
  • DC current
  • AC current
  • AC voltage

The proposed methods rely on the inherent relationships that exist between the parameters of a grid-tied PV plant such as the following:

  • A proportional relationship between the POA irradiance, the PV array DC current, and the AC power (the product of AC voltage and AC current)
  • An inverse relationship between the ambient temperature and PV array DC voltage
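To make these two relationships concrete, the sketch below stores one reading of the six monitored parameters in a hypothetical `Sample` record, derives AC power as the product of AC voltage and AC current, and computes Pearson correlations for the pairs listed above. It is an illustration under assumptions, not the author's code: the field names, units, and the use of Pearson correlation are choices made here. On clean data one would expect strongly positive correlations for the proportional pairs and a negative correlation for ambient temperature versus DC voltage.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of the six monitored parameters (hypothetical field names and units)."""
    poa_irradiance: float  # W/m^2
    ambient_temp: float    # deg C
    dc_voltage: float      # V
    dc_current: float      # A
    ac_voltage: float      # V
    ac_current: float      # A

    @property
    def ac_power(self) -> float:
        # AC power as the product of AC voltage and AC current, as described in the text
        return self.ac_voltage * self.ac_current


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Undefined (raises ZeroDivisionError) if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relationship_checks(samples: list[Sample]) -> dict[str, float]:
    """Correlations for the parameter pairs the detection methods rely on."""
    g = [s.poa_irradiance for s in samples]
    return {
        # Expected strongly positive (proportional relationship)
        "irradiance_vs_dc_current": pearson(g, [s.dc_current for s in samples]),
        "irradiance_vs_ac_power": pearson(g, [s.ac_power for s in samples]),
        # Expected negative (inverse relationship)
        "temp_vs_dc_voltage": pearson(
            [s.ambient_temp for s in samples], [s.dc_voltage for s in samples]
        ),
    }


if __name__ == "__main__":
    # Made-up readings for illustration only
    demo = [
        Sample(200, 15, 250, 1.6, 240, 1.3),
        Sample(400, 17, 247, 3.2, 240, 2.6),
        Sample(600, 19, 244, 4.8, 240, 3.9),
        Sample(800, 21, 241, 6.4, 240, 5.2),
    ]
    print(relationship_checks(demo))
```

Correlation alone only shows direction and strength; turning that into a per-day pass/fail decision needs a bounded score and a threshold.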

Each method compares a set of parameters and assigns every day a score between -∞ and 1, where -∞ indicates highly corrupt data and 1 indicates a close match between the parameters. In this work, the data on a given day is categorized as useful if its score is 0.8 or higher.
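The text does not say how the score is computed. A score capped at 1 with no lower bound matches the coefficient of determination, R² = 1 − SS_res / SS_tot, which equals 1 for a perfect match and becomes arbitrarily negative as predictions get worse. The sketch below assumes that interpretation; `r2_score`, `is_useful_day`, and `USEFUL_THRESHOLD` are hypothetical names, and only the 0.8 threshold comes from the text.

```python
def r2_score(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Equals 1 for a perfect match and has no lower bound, so it spans (-inf, 1].
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        # A flat series leaves no variance to explain: exact match -> 1, anything else -> -inf
        return 1.0 if ss_res == 0 else float("-inf")
    return 1.0 - ss_res / ss_tot


USEFUL_THRESHOLD = 0.8  # threshold stated in the text


def is_useful_day(observed: list[float], predicted: list[float],
                  threshold: float = USEFUL_THRESHOLD) -> bool:
    """A day is kept when its score reaches the threshold."""
    return r2_score(observed, predicted) >= threshold


if __name__ == "__main__":
    print(r2_score([1, 2, 3, 4], [4, 3, 2, 1]))        # -3.0: predictions worse than the mean
    print(is_useful_day([1, 2, 3, 4], [1.1, 2, 3, 3.9]))  # True
```

Under this reading, the lower bound of -∞ is not a special value; it simply reflects that a badly mismatched day can score arbitrarily low.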

Of the four methods, two check the three parameters POA irradiance, PV array DC current, and AC power for a proportional relationship, and one checks the two parameters ambient temperature and DC voltage for an inverse relationship. The fourth method compares the ratio of POA irradiance to ambient temperature with the ratio of PV array DC current to DC voltage.
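The text does not specify how each comparison is carried out. As one possible reading, the sketch below scores every comparison with an R²-style score (1 − SS_res / SS_tot): a least-squares fit through the origin for the proportional checks, a straight-line fit whose slope must be negative for the inverse check, and a proportional fit of I/V against G/T for the ratio check, where G is POA irradiance, T ambient temperature, I DC current, and V DC voltage. The function names are hypothetical; "inverse" is interpreted here as a negative linear slope (a 1/x fit would be another valid reading); and ambient temperature is converted to kelvin before forming G/T, an assumption made so the ratio never divides by zero or by a negative Celsius value.

```python
def r2(observed: list[float], predicted: list[float]) -> float:
    """R^2-style score: 1 for a perfect fit, unbounded below."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else float("-inf")
    return 1.0 - ss_res / ss_tot


def proportional_score(x: list[float], y: list[float]) -> float:
    """Least-squares fit of y = k * x (no intercept), scored with r2()."""
    denom = sum(a * a for a in x)
    if denom == 0:
        return float("-inf")  # e.g. an all-zero irradiance series: nothing to compare
    k = sum(a * b for a, b in zip(x, y)) / denom
    return r2(y, [k * a for a in x])


def inverse_score(x: list[float], y: list[float]) -> float:
    """Least-squares fit of y = a + b * x; a non-negative slope b counts as a mismatch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        return float("-inf")  # constant x: slope is undefined
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    if b >= 0:
        return float("-inf")
    a = my - b * mx
    return r2(y, [a + b * v for v in x])


def ratio_score(irradiance, temp_c, dc_current, dc_voltage) -> float:
    """Proportional fit of I/V against G/T, with T converted to kelvin (an assumption)."""
    g_over_t = [g / (t + 273.15) for g, t in zip(irradiance, temp_c)]
    i_over_v = [i / v for i, v in zip(dc_current, dc_voltage)]
    return proportional_score(g_over_t, i_over_v)


if __name__ == "__main__":
    g = [200, 400, 600, 800]  # made-up POA irradiance, W/m^2
    print(proportional_score(g, [1.6, 3.2, 4.8, 6.4]))  # close to 1: consistent day
    print(proportional_score(g, [1.6, 3.2, 0.0, 6.4]))  # about 0.28: one dropped reading
    print(inverse_score([15, 17, 19, 21], [250, 247, 244, 241]))  # close to 1
```

With the 0.8 cut-off from the text, the day with a single dropped DC current reading would be flagged, while the consistent day would be kept.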

In terms of complexity, two of the methods use simple statistical tools and the other two use machine learning (ML) tools. The statistical methods are easy to implement and can be applied from the first day of operation of the PV plant. The ML methods, however, require historical data before they can be applied, because they must be trained on days with no corrupt data.
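The text does not name the ML tools. As a stand-in, the sketch below uses a minimal k-nearest-neighbours regressor written in plain Python: it is trained only on samples from days already known to be clean, predicts DC current from POA irradiance and ambient temperature, and scores a new day with an R²-style score (1 − SS_res / SS_tot). The class and function names, the choice of features and target, and k = 3 are all assumptions for illustration. It also shows why such methods need history first: without clean training days, the model has nothing to predict from.

```python
import math


def r2(observed, predicted):
    """R^2-style score: 1 for a perfect fit, unbounded below."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else float("-inf")
    return 1.0 - ss_res / ss_tot


class KNNRegressor:
    """Minimal k-nearest-neighbours regressor (illustrative stand-in for the unnamed ML tools)."""

    def __init__(self, k=3):
        self.k = k
        self._features = []
        self._targets = []

    def fit(self, features, targets):
        # Store training samples; these should come only from days judged free of corrupt data
        self._features = [tuple(f) for f in features]
        self._targets = list(targets)
        return self

    def predict(self, features):
        if not self._features:
            raise RuntimeError("model has not been trained on any clean days")
        preds = []
        for f in features:
            # Features are unscaled here, so irradiance dominates the Euclidean distance
            nearest = sorted(
                (math.dist(f, x), y) for x, y in zip(self._features, self._targets)
            )[: self.k]
            preds.append(sum(y for _, y in nearest) / len(nearest))
        return preds


def score_day(model, features, observed):
    """Score one day by how well the trained model predicts its observed DC current."""
    return r2(observed, model.predict(features))


if __name__ == "__main__":
    # Made-up clean training data: DC current proportional to irradiance at 20 deg C
    train_x = [(g, 20.0) for g in range(100, 1001, 100)]
    train_y = [0.008 * g for g, _ in train_x]
    model = KNNRegressor(k=3).fit(train_x, train_y)
    day_x = [(200, 20.0), (400, 20.0), (600, 20.0), (800, 20.0)]
    print(score_day(model, day_x, [1.6, 3.2, 4.8, 6.4]))  # close to 1: consistent day
    print(score_day(model, day_x, [1.6, 3.2, 0.0, 6.4]))  # well under 0.8: dropped reading
```

A real deployment would standardize the features and use a richer model, but the workflow is the same: fit on clean history, then score each new day against what the model expects.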

The proposed methods were tested on open-source data collected from the 271 kW ground-mounted PV plant of the National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland, USA. The methods, programmed in Python, have been made open source for the PV community to use.

Benefits of Using the IEEE DataPort Platform

I used the 2 TB of free storage on the IEEE DataPort platform to upload my research dataset so that it could reach a wider audience. Within two months of uploading the dataset to IEEE DataPort, my research had reached an audience of more than 700, and I received direct feedback from some of the readers.

Dr. Matam Manjunath’s dataset was voted the third-place winner of the Spring 2019 IEEE DataPort data upload contest.