
Analysis

Forecasting Experts

Citation Author(s):
Anjanita Das (Cognizant Technology Solutions)
Syed Allaudeen Abdul Sathar (Cognizant Technology Solutions)
Manikantha Srinivas Chitturi (Cognizant Technology Solutions)
Mittal J Ashra (Cognizant Technology Solutions)
Priyanka Das (Cognizant Technology Solutions)
Srijit Ghatak (Cognizant Technology Solutions)
Submitted by:
ANJANITA DAS

Abstract

Approach: Imputation using a trend-based approach

 

The dataset provided contains energy consumption values for 3248 meters at half-hour intervals for the year 2017.

Our approach is as follows:

1) We started with exploratory data analysis and found many missing half-hourly values for many meters. Because consumption is reported every half hour, a complete dataset with no missing values would contain (48 readings/day) * (365 days) = 17520 readings per meter for the year. Our analysis found that only 789 meters have more than 75% of these readings, while the remaining 2459 meters have fewer than 75%.
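The completeness check in step 1 can be sketched as below. It assumes, hypothetically, that each meter's 2017 readings are held in memory as a list of 17520 half-hour slots with `None` marking a missing reading; the function name and data layout are illustrative, not the team's code.

```python
READINGS_PER_DAY = 48
DAYS_IN_2017 = 365
EXPECTED_READINGS = READINGS_PER_DAY * DAYS_IN_2017  # 17520 half-hour slots


def split_by_completeness(readings, threshold=0.75):
    """Split meters by the share of non-null half-hourly readings in 2017.

    readings: dict meter_id -> list of values with None for missing slots
    (a hypothetical in-memory layout). Meters with more than `threshold`
    of EXPECTED_READINGS go to the first list, the rest to the second.
    """
    well_covered, sparse = [], []
    for meter_id, values in readings.items():
        present = sum(v is not None for v in values)
        if present > threshold * EXPECTED_READINGS:
            well_covered.append(meter_id)
        else:
            sparse.append(meter_id)
    return well_covered, sparse


readings = {
    "m1": [0.2] * EXPECTED_READINGS,
    "m2": [0.2] * 1000 + [None] * (EXPECTED_READINGS - 1000),
}
print(split_by_completeness(readings))  # (['m1'], ['m2'])
```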

2) Because so many half-hourly readings are missing, we aggregated to a monthly level: for each meter and month, we computed a daily average consumption value by dividing the total monthly consumption by the number of non-null readings in that month.
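A minimal sketch of the step 2 aggregation follows, taking the step's definition literally (monthly total over the count of non-null readings). The input shape, parallel lists of `datetime` stamps and values with `None` for gaps, is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def monthly_daily_average(timestamps, values):
    """Step 2 as worded: total consumption in each month divided by the number
    of non-null readings in that month. Months with no readings are omitted,
    so they remain missing for the imputation in step 4."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in zip(timestamps, values):
        if value is None:
            continue  # missing half-hour slot: excluded from total and count
        totals[ts.month] += value
        counts[ts.month] += 1
    return {month: totals[month] / counts[month] for month in totals}


stamps = [datetime(2017, 1, 1, 0, 0), datetime(2017, 1, 1, 0, 30),
          datetime(2017, 1, 1, 1, 0), datetime(2017, 3, 5, 12, 0)]
print(monthly_daily_average(stamps, [0.25, None, 0.75, 0.5]))  # {1: 0.5, 3: 0.5}
```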

3) We split the master dataset into two datasets:

                a) Dataset 1: meters whose daily average consumption exceeds 0.1 MWh in at least one month (3242 meters)

                b) Dataset 2: meters whose daily average consumption is below 0.1 MWh in every month (6 meters)
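The split in step 3 reduces to a single test per meter, sketched here under the assumption that monthly daily averages are stored as a dict of months per meter. The write-up states the two conditions as strictly above and strictly below 0.1 MWh; this sketch places the exact-0.1 edge case in Dataset 2, which is a choice of ours.

```python
THRESHOLD_MWH = 0.1


def segregate(monthly_avgs):
    """Step 3: split meters by whether any month's daily average exceeds the
    threshold. monthly_avgs: dict meter_id -> dict month -> daily average (MWh).
    A meter whose best month is exactly 0.1 MWh falls into Dataset 2 here."""
    dataset1, dataset2 = {}, {}
    for meter_id, months in monthly_avgs.items():
        if any(value > THRESHOLD_MWH for value in months.values()):
            dataset1[meter_id] = months
        else:
            dataset2[meter_id] = months
    return dataset1, dataset2


d1, d2 = segregate({"a": {1: 0.05, 2: 0.30}, "b": {1: 0.02, 2: 0.04}})
print(sorted(d1), sorted(d2))  # ['a'] ['b']
```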

4) Imputing missing values for:

                a) Dataset 1:

                                i) Monthly daily average consumption values below 0.1 MWh are replaced with nulls.

                                ii) For every meter, we count the number of months for which a daily average consumption value is available.

                                iii) For a given meter (meter_id ‘a’), we identify the neighbor population from the non-null daily average consumption values. A meter qualifies as a neighbor only if it has values for every month in which meter_id ‘a’ has a non-null value, and also for the month being imputed and the month before it.

                                iv) We use a rule-based approach to reduce the neighbor population to a meaningful set and cut computation time. The rules below are applied in order until at least 6 neighbors are found; each rule keeps the meters whose daily average consumption falls within the given range:

                --> [daily average consumption of meter_id 'a' * 0.3, daily average consumption of meter_id 'a' * 1.7]

                --> [daily average consumption of meter_id 'a' * 0.3, daily average consumption of meter_id 'a' * 2]

                --> [daily average consumption of meter_id 'a' * 0.3, daily average consumption of meter_id 'a' * 3]

                --> [daily average consumption of meter_id 'a' * 0.3, daily average consumption of meter_id 'a' * 4]

                --> [daily average consumption of meter_id 'a' * 0.3, daily average consumption of meter_id 'a' * 6]

                --> The entire neighbor population from step (iii)

                                v) For each month, the Euclidean distance between the daily average consumption of meter_id ‘a’ and that of each neighbor from the previous step is computed, to identify the neighbors that are most consistently close to meter_id ‘a’ across months. Depending on the size of the resulting population, a minimum of 6 and a maximum of 20 neighbors are kept for the next step.

                                vi) Among the neighbors selected in the previous step, the daily average consumption values for the given month are examined, and outliers are removed so that they do not distort the estimate of the missing value.

                                vii) The average net trend increase among the neighbors is then calculated by comparing each neighbor's value for the previous month with its value for the given month. This average trend is used to derive the missing month's value for meter_id ‘a’.

                                viii) The seven steps above are repeated for each meter and for each missing monthly daily average consumption value until the dataset for 2017 is complete.
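Steps iii to vii for Dataset 1 can be sketched as one function per step. Several details are not specified in the write-up and are assumptions here: a meter's "level" for the band rules is its mean over meter ‘a’'s non-null months; step v counts how often each neighbor ranks among the 20 closest per month; outliers in step vi use Tukey's 1.5 x IQR fences; and the trend in step vii is multiplicative (mean month-over-month ratio). All function names are hypothetical, and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics

LOWER = 0.3
UPPER_MULTIPLIERS = [1.7, 2, 3, 4, 6]  # step iv bands, in order of execution
MIN_NEIGHBORS, MAX_NEIGHBORS = 6, 20


def candidate_pool(target, others, month):
    """Step iii: keep meters that have values for every non-null month of the
    target, plus the month being imputed and the month before it (month >= 2)."""
    needed = set(target) | {month, month - 1}
    return {m: v for m, v in others.items() if needed <= set(v)}


def band_filter(target, pool):
    """Step iv: widen the upper bound until at least MIN_NEIGHBORS remain.
    Assumption: a meter's level is its mean over the target's non-null months."""
    months = list(target)
    level = statistics.mean(target.values())
    for upper in UPPER_MULTIPLIERS:
        low, high = level * LOWER, level * upper
        kept = {m: v for m, v in pool.items()
                if low <= statistics.mean(v[k] for k in months) <= high}
        if len(kept) >= MIN_NEIGHBORS:
            return kept
    return pool  # final rule: the whole population from step iii


def closest_neighbors(target, pool):
    """Step v, one reading of it: rank neighbors by per-month distance to the
    target and keep the ones that rank among the closest most often."""
    hits = dict.fromkeys(pool, 0)
    for month, value in target.items():
        # For a single month the Euclidean distance reduces to |x - y|.
        ranked = sorted(pool, key=lambda m: abs(pool[m][month] - value))
        for m in ranked[:MAX_NEIGHBORS]:
            hits[m] += 1
    return sorted(pool, key=lambda m: -hits[m])[:MAX_NEIGHBORS]


def impute(target, others, month):
    """Steps iii-vii for one missing month. Assumes the target's previous
    month is already known (observed or imputed earlier, as in step viii)."""
    pool = band_filter(target, candidate_pool(target, others, month))
    chosen = closest_neighbors(target, pool)
    values = [pool[m][month] for m in chosen]
    kept = chosen
    if len(values) >= 4:
        # Step vi, assuming Tukey's 1.5 x IQR fences as the outlier rule.
        q1, _, q3 = statistics.quantiles(values, n=4)
        fence = 1.5 * (q3 - q1)
        kept = [m for m in chosen if q1 - fence <= pool[m][month] <= q3 + fence]
    # Step vii, assuming a multiplicative trend (mean month-over-month ratio).
    trend = statistics.mean(pool[m][month] / pool[m][month - 1] for m in kept)
    return target[month - 1] * trend


# Toy example: seven similar neighbors plus one with an outlying March value.
target = {1: 1.0, 2: 1.0}
others = {f"n{i}": {1: 1.0, 2: 1.0, 3: 1.5} for i in range(7)}
others["spike"] = {1: 1.0, 2: 1.0, 3: 8.0}
print(impute(target, others, month=3))  # 1.5
```

In the toy example the "spike" neighbor falls outside the IQR fences and is dropped, so the remaining neighbors' 1.5x trend is applied to meter ‘a’'s February value.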

 

                b) Dataset 2:

                                i) Missing values for a given meter are imputed with the average of all non-null monthly daily average consumption values for the same meter.
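For the six Dataset 2 meters, the fill is a per-meter mean. A minimal sketch, assuming each meter's monthly daily averages are a dict keyed by month number with missing months absent; the function name is hypothetical.

```python
import statistics


def fill_with_meter_mean(monthly, months=range(1, 13)):
    """Dataset 2, step i: fill each missing month with the mean of the meter's
    own non-null monthly daily averages. monthly: dict month -> value."""
    meter_mean = statistics.mean(monthly.values())
    return {m: monthly.get(m, meter_mean) for m in months}


filled = fill_with_meter_mean({1: 0.02, 2: 0.04, 5: 0.06})
print(round(filled[3], 4))  # 0.04
```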

 

5) Predicting total monthly consumption for 2018:

                a) To predict the total consumption for a given month in 2018 (d), we use the daily average consumption for the same month in 2017 (a), together with the values for the previous month (b) and the next month (c). Their weighted average is computed with the following formula:

                d = (((a*5)+(b*2)+(c*2))/9) * 1.02 * (number of days in that month)   [assuming a 2% year-on-year increase in consumption]
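The forecasting formula can be applied to a completed 2017 table as sketched below. The write-up does not say which neighbors January and December use; this sketch assumes they wrap around within 2017 (December as January's previous month, January as December's next month). As a worked check, a = 2.0, b = 1.0 and c = 1.5 in a 30-day month give (10 + 2 + 3)/9 * 1.02 * 30 = 51.0.

```python
import calendar

YOY_GROWTH = 1.02  # 2% year-on-year increase, as stated in step 5


def predict_2018(daily_avg_2017):
    """daily_avg_2017: dict month (1-12) -> completed 2017 daily average.
    Returns dict month -> predicted 2018 monthly total.
    Assumption: January and December wrap around within 2017 for b and c."""
    predictions = {}
    for month in range(1, 13):
        a = daily_avg_2017[month]
        b = daily_avg_2017[12 if month == 1 else month - 1]
        c = daily_avg_2017[1 if month == 12 else month + 1]
        weighted = (5 * a + 2 * b + 2 * c) / 9
        days = calendar.monthrange(2018, month)[1]  # number of days in the month
        predictions[month] = weighted * YOY_GROWTH * days
    return predictions


daily = {m: 1.0 for m in range(1, 13)}
daily.update({4: 2.0, 5: 1.5})  # April: a = 2.0, b = 1.0 (March), c = 1.5 (May)
print(round(predict_2018(daily)[4], 6))  # 51.0
```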