Python program for Detecting abnormal Data in a PV plant Database - Validation Results from a 273kW NIST PV plant dataset


This folder contains two CSV files and one .py file. One CSV file contains imported NIST ground PV plant data: 902 days of raw data consisting of PV plant POA irradiance, ambient temperature, inverter DC current, DC voltage, AC current, and AC voltage. The second CSV file contains user-created data. The Python file imports both CSV files and executes four proposed corrupt-data detection methods on the NIST ground PV plant data. The first and fourth methods are statistical approaches that perform a direct comparison of the parameters; they can be applied from the first day a PV plant is installed. The second and third methods are machine-learning-based approaches involving training and testing procedures; they need some days of historical data before they can be applied. This program is useful to PV plant users, researchers, PV plant monitors, and third-party service providers who want to clean their PV plant datasets. By replacing the existing dataset with their own, users can apply the program to filter their data. The program requires the PV dataset to have six parameters: POA irradiance, ambient temperature, inverter DC current, DC voltage, AC current, and AC voltage.
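To make the idea of a "direct comparison of the parameters" concrete, here is a minimal Python sketch. It is not the method implemented in the attached program: the column names, the thresholds, and the three rules below are assumptions chosen purely for illustration, and the sample records are made up.

```python
from typing import Dict, List

# Hypothetical column names; the headers in the real CSV files may differ.
POA = "poa_irradiance"  # plane-of-array irradiance, W/m^2
DC_I = "dc_current"     # inverter DC current, A


def flag_record(row: Dict[str, float],
                dark_irradiance: float = 5.0,
                bright_irradiance: float = 200.0,
                min_current: float = 1.0) -> List[str]:
    """Return the reasons a record looks abnormal (empty list if none).

    All thresholds are illustrative assumptions, not values from the program.
    """
    reasons = []
    if row[POA] < 0:
        reasons.append("negative irradiance")
    # The inverter reports current although the sensor sees (almost) no light.
    if row[POA] <= dark_irradiance and row[DC_I] >= min_current:
        reasons.append("current without irradiance")
    # Plenty of light, but the inverter reports (almost) no current.
    if row[POA] >= bright_irradiance and row[DC_I] < min_current:
        reasons.append("irradiance without current")
    return reasons


# Made-up records, for demonstration only.
records = [
    {POA: 650.0, DC_I: 120.0},  # plausible daytime record
    {POA: 0.0, DC_I: 45.0},     # current reported in the dark
    {POA: 800.0, DC_I: 0.0},    # bright conditions, no output
]
flags = [flag_record(r) for r in records]
```

A rule-based check of this kind needs no training data, which is consistent with the statement that the statistical methods can be applied from the first day of operation.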


Instructions to use the attached program and dataset

1. Download the attached zip file.

2. Extract the zip file contents into one folder.

3. Make sure Python 3.6 or a later version is installed on the computer where you extracted the folder.

4. Run the Python file in that folder. It may take about 90 minutes to complete.

5. The Python file executes the four methods and produces a number of PNG and PDF files. The generated files are saved in the same folder.

6. This completes the program run.
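The version requirement and the generated output described in the steps above can be checked with a short helper script. This is a sketch only: the function names are hypothetical, and it assumes nothing about the program beyond what the steps state (Python 3.6 or later, PNG and PDF files written to the same folder).

```python
import sys
from pathlib import Path


def python_is_supported(version=sys.version_info, minimum=(3, 6)) -> bool:
    """True if the interpreter is at least the minimum (major, minor) version."""
    return tuple(version[:2]) >= tuple(minimum)


def generated_files(folder="."):
    """Names of the PNG and PDF files found in the given folder, sorted."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".png", ".pdf"}
    )


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.6 or later is required.")
    else:
        # Run this after the program has finished to list its output files.
        for name in generated_files("."):
            print(name)
```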

Instructions to use the program for your PV dataset

1. Make sure you have data for the six required parameters stated in the abstract. All parameters should share the same timestamps and have a high resolution, for example 1 min to 5 min.

2. At the start of the Python file, enter your project name and the data resolution of your dataset. The generated PNG and PDF files will have this name appended.

3. Before running the program on your dataset, remove the files related to the NIST dataset, in particular the manual results file. Otherwise, you may see erroneous results that are not related to your dataset.

4. Run the program and review the results.
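Before running the program on your own data, it can help to confirm that the file meets the requirements listed above. The following sketch uses only the Python standard library. The header names, the timestamp format, and the `check_dataset` helper are all assumptions for illustration; adapt them to your own CSV. The sample rows are made up.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical header names; rename them to match your own CSV file.
REQUIRED = [
    "poa_irradiance", "ambient_temperature",
    "dc_current", "dc_voltage", "ac_current", "ac_voltage",
]
TIMESTAMP = "timestamp"


def check_dataset(csv_file, time_format="%Y-%m-%d %H:%M:%S"):
    """Return (missing_columns, median_step_minutes) for an open CSV file.

    median_step_minutes is None if the timestamp column is missing or the
    file has fewer than two rows.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in [TIMESTAMP] + REQUIRED if c not in header]
    if TIMESTAMP in missing:
        return missing, None
    times = [datetime.strptime(row[TIMESTAMP], time_format) for row in reader]
    steps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return missing, (statistics.median(steps) if steps else None)


# Made-up sample with a 1-minute resolution, for demonstration only.
sample = io.StringIO(
    "timestamp,poa_irradiance,ambient_temperature,"
    "dc_current,dc_voltage,ac_current,ac_voltage\n"
    "2021-01-01 12:00:00,650,21.5,120,410,95,277\n"
    "2021-01-01 12:01:00,655,21.6,121,409,96,277\n"
    "2021-01-01 12:02:00,660,21.6,122,409,97,277\n"
)
missing, step = check_dataset(sample)
resolution_ok = not missing and step is not None and 1 <= step <= 5
```

To check a real file, open it with `open(path, newline="")` and pass the file object to `check_dataset`.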



I am a PhD student at FCUL, Lisbon. I am doing research on forecasting energy production for PV. I am interested in seeing your research and data. Can you share your dataset with me? If so, please send a download link to my email address:


Submitted by Joao Simoes on Mon, 10/25/2021 - 12:01