Forecasting Connoisseurs

Citation Author(s): Anjanita Das (Cognizant Technology Solutions)

Syed Allaudeen Abdul Sathar (Cognizant Technology Solutions)

Manikantha Srinivas Chitturi (Cognizant Technology Solutions)

Mittal J Ashra (Cognizant Technology Solutions)

Priyanka Das (Cognizant Technology Solutions)

Srijit Ghatak (Cognizant Technology Solutions)
Submitted by: ANJANITA DAS
Last updated: Sun, 11/15/2020 - 21:23

Abstract

Approach: Imputing using Neighbors' mean:


To predict the monthly electricity consumption of 3248 households for the coming year (January to December 2018), we followed a neighbor-based imputation approach. It comprises exploratory data analysis to find the missing meter readings, computation of the daily average consumption of every meter, segregation of the master dataset, and imputation of the missing values in the dataset so that the monthly consumption in 2018 can be predicted.

Identifying the Missing Meter level Consumption Values: Exploratory data analysis showed that many meters are missing consumption values at half-hour intervals. Since consumption is recorded every half hour, a complete dataset with no missing values has (48 readings/day) * (365 days) = 17520 readings per year for every meter. Only 789 meters have >75% of the total readings; the remaining 2459 meters have <75% of the total readings.
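The completeness check described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names, the sample meter ids and the reading counts are hypothetical, and a meter sitting exactly at the 75% mark is assumed to fall in the lower group.

```python
READINGS_PER_DAY = 48
DAYS_PER_YEAR = 365
EXPECTED_READINGS = READINGS_PER_DAY * DAYS_PER_YEAR  # 17520 readings per meter per year


def completeness(reading_count):
    """Fraction of the expected half-hourly readings that are present."""
    return reading_count / EXPECTED_READINGS


def split_by_completeness(counts, threshold=0.75):
    """Split meter ids into those above the threshold and the rest.

    Assumption: a meter exactly at the threshold goes into the lower group.
    """
    above = [m for m, n in counts.items() if completeness(n) > threshold]
    rest = [m for m, n in counts.items() if completeness(n) <= threshold]
    return above, rest


# Hypothetical non-null reading counts per meter.
counts = {"m1": 17520, "m2": 13140, "m3": 4380}
above, rest = split_by_completeness(counts)
```

Here "m2" has exactly 75% of the expected readings, so under the stated assumption it lands in the lower group together with "m3".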

Computation of daily average consumption of all the meters: As the next step, we computed the daily average consumption at monthly level for each meter by dividing the total monthly consumption by the corresponding number of non-null readings in that month.
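A minimal sketch of this averaging step, assuming the readings for one meter arrive as (month, value) pairs with None marking a missing reading. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def monthly_average(readings):
    """Average the non-null readings of one meter per month.

    readings: iterable of (month, value) pairs; value is None when missing.
    Months without any non-null reading are left out of the result.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, value in readings:
        if value is None:
            continue  # missing readings do not count towards the divisor
        totals[month] += value
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}


# Hypothetical readings: month 2 has no usable reading at all.
readings = [(1, 0.2), (1, None), (1, 0.4), (2, None), (3, 1.0)]
averages = monthly_average(readings)
```

Months that have no non-null reading stay absent from the result, which is what marks them for imputation later.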

Segregation of meters based on the average consumption: We segregated the meters into three datasets based on their average consumption:

1) Meters with daily average consumption at monthly level for at least 1 month >0.5 MWH (total count: 3117 meters)

2) Meters with daily average consumption at monthly level where there is a single instance of >200% increase/decrease month-on-month (total count: 117 meters)

3) Meters with daily average consumption for all months < 0.5 MWH (total count: 14 meters)
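The segregation can be sketched as below. The text does not state how the three conditions take precedence or how a ">200%" change is measured, so this sketch makes two loudly labelled assumptions: a change counts when one month is more than three times the adjacent one, and the checks run in the order Dataset 3, Dataset 2, Dataset 1. All names are hypothetical.

```python
THRESHOLD_MWH = 0.5
JUMP_RATIO = 3.0  # assumption: ">200%" means one value is more than 3x the other


def count_jumps(values):
    """Count adjacent month pairs whose values differ by more than JUMP_RATIO."""
    jumps = 0
    for prev, cur in zip(values, values[1:]):
        if prev is None or cur is None:
            continue  # a missing month cannot form a month-on-month pair
        low, high = min(prev, cur), max(prev, cur)
        if low > 0 and high / low > JUMP_RATIO:
            jumps += 1
    return jumps


def assign_dataset(values):
    """Return 1, 2 or 3 for a meter's monthly daily-average values.

    Assumed precedence: Dataset 3, then Dataset 2, otherwise Dataset 1.
    """
    available = [v for v in values if v is not None]
    if available and all(v < THRESHOLD_MWH for v in available):
        return 3
    if count_jumps(values) == 1:
        return 2
    return 1
```

For example, `assign_dataset([0.6, 0.7, 2.8, 2.9])` returns 2 under these assumptions, because only the step from 0.7 to 2.8 exceeds the ratio.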

Computing the missing consumption values in the datasets for 2017:

Dataset 1: For the meters with daily average consumption at monthly level for at least 1 month >0.5 MWH (3117 meters)

Step 1: We analyzed the daily average consumption values below 0.5 MWH for these meters.

Step 2: For every meter, we found the number of months for which daily average consumption is available.

Step 3: For a given meter id (a), we identified the neighbor population by considering the non-null daily average consumption values.

The neighbor population is chosen so that every neighbor has consumption values for all the months in which meter id (a) is non-null. In addition, every neighbor must have a consumption value for the month whose value we have to impute.

Step 4: We took a rule-based approach to reduce the neighbor population to a meaningful set and so cut the computational power/time.

The iterative rules are given below in their order of execution; they are applied one after another until we find a minimum of 4 neighbors.

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 1.7]

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 2]

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 2.5]

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 3]

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 3.5]

--> [daily average consumption of meter_id 'a' * 0.7, daily average consumption of meter_id 'a' * 4]

--> All the population of meters derived from Step 3
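The widening rules of Step 4 can be sketched as follows. The sketch assumes each range selects the neighbors whose daily average consumption lies inside it, and that each meter is represented by a single daily-average value for the comparison; the text does not say which month's value is used. The names are hypothetical.

```python
LOWER_FACTOR = 0.7
UPPER_FACTORS = [1.7, 2, 2.5, 3, 3.5, 4]  # tried in this order
MIN_NEIGHBORS = 4


def select_candidates(target_value, population):
    """Return the neighbor ids from the first range holding enough meters.

    target_value: daily average consumption of meter 'a' (assumed single value).
    population: {meter_id: daily average consumption} from Step 3.
    Falls back to the whole Step 3 population when no range is wide enough.
    """
    low = target_value * LOWER_FACTOR
    for upper in UPPER_FACTORS:
        high = target_value * upper
        band = [m for m, v in population.items() if low <= v <= high]
        if len(band) >= MIN_NEIGHBORS:
            return band
    return list(population)


# Hypothetical Step 3 population for a meter whose daily average is 1.0.
population = {"a": 0.8, "b": 1.5, "c": 1.9, "d": 2.4, "e": 5.0}
candidates = select_candidates(1.0, population)
```

With this sample population the first two ranges hold only two and three meters, so the third range (upper factor 2.5) is the first to reach four neighbors.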

Step 5: The Euclidean distance between each month's daily average consumption value of a given meter (a) and that of each neighbor from the above step is computed, to find the neighbors that appear most frequently across the months. A minimum of 5 and a maximum of 30 neighbors are then chosen for the next step, depending on the size of the resultant neighbor population.

Step 6: Among the resultant neighbors from the previous step, the daily average consumption values of the nearest 5 neighbors are considered to get a reasonable estimate of the missing values.

Step 7: The missing value of a given month is imputed by scaling the daily average consumption values for the same month from the resultant nearest neighbor.

The above seven steps are repeated for each meter and for each missing daily average consumption value at a monthly level until we get a complete data set for year 2017.
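Steps 5 to 7 can be sketched as a nearest-neighbor imputation. This is a simplified illustration under stated assumptions, not the authors' implementation: it ranks neighbors by one Euclidean distance over all available months rather than by per-month frequency, and the scaling rule (ratio of the target's mean to the neighbor's mean over the shared months) is an assumption, since the text does not spell it out. All names are hypothetical.

```python
import math


def distance(target, neighbor, months):
    """Euclidean distance between two meters over the given months."""
    return math.sqrt(sum((target[m] - neighbor[m]) ** 2 for m in months))


def impute_month(target, neighbors, missing_month, k=5):
    """Estimate the target's value for missing_month from its k nearest neighbors.

    target: {month: value} for the months where the target is non-null.
    neighbors: {meter_id: {month: value}}, each covering the target's months
               and missing_month.
    Returns None when no neighbor yields a usable estimate.
    """
    months = sorted(target)
    ranked = sorted(neighbors, key=lambda n: distance(target, neighbors[n], months))
    target_mean = sum(target[m] for m in months) / len(months)
    estimates = []
    for n in ranked[:k]:
        neighbor_mean = sum(neighbors[n][m] for m in months) / len(months)
        if neighbor_mean <= 0:
            continue  # cannot scale from a neighbor with zero consumption
        scale = target_mean / neighbor_mean  # assumed scaling rule
        estimates.append(neighbors[n][missing_month] * scale)
    return sum(estimates) / len(estimates) if estimates else None


# Hypothetical data: the target meter is missing month 3.
target = {1: 1.0, 2: 2.0}
neighbors = {
    "n1": {1: 2.0, 2: 4.0, 3: 6.0},
    "n2": {1: 1.0, 2: 2.0, 3: 3.0},
    "far": {1: 10.0, 2: 20.0, 3: 90.0},
}
estimate = impute_month(target, neighbors, missing_month=3, k=2)
```

Scaling before averaging lets a neighbor with a similar monthly profile but a different overall level still contribute a sensible estimate.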

Dataset 2: For meters with daily average consumption at monthly level where there is a single instance of >200% increase/decrease month-on-month (total count: 117 meters), the missing values of a given meter are imputed by following all the steps detailed for Dataset 1. The nearest neighbors are drawn from the 3117 meters in Dataset 1, and the missing value is obtained by scaling the daily average consumption values for the same month from the resultant nearest neighbor.

Dataset 3: For meters with daily average consumption for all months < 0.5 MWH (total count: 14 meters), the missing values of a given meter are imputed by the average of all the non-null monthly level daily average consumption values for the same meter.
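The Dataset 3 rule is a plain mean fill, sketched below under the assumption that a meter's monthly values are held in a list with None for missing months. The function name is hypothetical.

```python
def fill_with_own_mean(values):
    """Replace missing months with the mean of the meter's own non-null months."""
    available = [v for v in values if v is not None]
    if not available:
        return list(values)  # nothing to average; leave the gaps as they are
    mean = sum(available) / len(available)
    return [mean if v is None else v for v in values]


# Hypothetical monthly daily-average values with one missing month.
filled = fill_with_own_mean([0.2, None, 0.4])
```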

Forecasting consumption values for 2018:

Once the 2017 dataset is complete, meters with a >150% increase/decrease month-on-month are identified.

For these meters, the average consumption of the last 3 months of 2017 and the median consumption of all the months of 2017 are computed.

i) The average consumption of the last 3 months of 2017 is greater than the median consumption of all the months of 2017:

a. For these meters, monthly consumption values less than the median consumption value are replaced by null

b. The null values are computed by the neighbor logic discussed above

ii) The average consumption of the last 3 months of 2017 is less than the median consumption of all the months of 2017:

a. For these meters, monthly consumption values greater than the median consumption value are replaced by null

b. The null values are computed by the neighbor logic discussed above

iii) The monthly consumption values of 2017 are scaled by 0.94 to account for the rapid increase in the use of smart meters and energy-efficient devices. This factor is based on our analysis of historical half-hourly energy readings for the 3248 smart meters.
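The forecasting rules can be sketched as below. The sketch assumes one meter's twelve 2017 values are in a list ordered January to December with no gaps left, that values equal to the median are kept, and that re-imputation of the nulled months happens separately with the neighbor logic. The function names are hypothetical.

```python
from statistics import median

SCALE_2018 = 0.94  # scaling factor stated in the text


def mask_against_trend(values_2017):
    """Null out the months that disagree with the end-of-year level.

    values_2017: twelve non-null monthly values, January to December.
    The nulled months are expected to be re-imputed by the neighbor logic.
    """
    last3_mean = sum(values_2017[-3:]) / 3
    med = median(values_2017)
    if last3_mean > med:
        return [None if v < med else v for v in values_2017]
    if last3_mean < med:
        return [None if v > med else v for v in values_2017]
    return list(values_2017)  # assumption: no change when the two are equal


def forecast_2018(values_2017):
    """Scale the completed 2017 monthly values to obtain the 2018 forecast."""
    return [v * SCALE_2018 for v in values_2017]


# Hypothetical meter whose consumption stepped up mid-year.
masked = mask_against_trend([1.0] * 6 + [5.0] * 6)
forecast = forecast_2018([100.0, 200.0])
```

In the sample, the last three months sit above the median of 3.0, so the six low months are nulled and would be re-estimated from neighbors before scaling.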
