BLOND, a building-level office environment dataset of typical electrical appliances
Reducing the energy footprint of buildings requires a deeper understanding of consumption patterns, which involves tasks such as load profile forecasting, power disaggregation, appliance identification, and start-up event detection.
Publicly available data sets are used to test, validate, and benchmark possible solutions to these problems.
For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates, with common appliances and load profiles.
We provide voltage and current readings for the aggregated circuits and matching, fully labeled ground truth data (individual appliance measurements).
The dataset contains 53 appliances (16 classes) in a 3-phase power grid.
BLOND-50 contains 213 days of measurements sampled at 50 kSps (aggregate) and 6.4 kSps (individual appliances).
BLOND-250 consists of the same setup: 50 days, 250 kSps (aggregate), 50 kSps (individual appliances).
These are the longest continuous measurements at such high sampling rates with fully labeled ground truth that we are aware of.
Electrical energy metering (EEM) has seen a large influx of research activity in recent years due to the transition from mechanical to electronic measurement techniques.
Metering devices that measure electrical energy consumption (EEC) for billing consumers are subject to higher scrutiny with respect to accuracy and reliability.
The migration to fully digital EEM is usually motivated by the promise of energy savings and higher comfort for occupants.
EEC profiles can be generated at smaller time intervals (e.g., daily, hourly, or by the minute) because smart meters allow automated meter reading.
Recent studies on the psychological effects of EEM feedback have shown that energy savings and active management of one's own EEC require frequent feedback over a long time, ideally broken down by appliance.
However, this requires significant investment in metering hardware, infrastructure, and reliable communication channels in order to collect data from a fleet of small meters.
Non-intrusive appliance load monitoring (NIALM) tries to solve this problem with a single-point EEM, ideally utilizing existing smart meters, to provide a disaggregated view of the total building EEC.
Researchers use public datasets to study the characteristics of appliances and to build models that represent load profiles and per-appliance usage.
This benefits energy saving and emission reduction efforts, pattern recognition, energy demand forecasting, and similar research areas.
Existing datasets primarily cover households and residential environments, as their occupants can expect cost savings.
Large household appliances (e.g., HVAC, washing machines) are targeted first for an immediate reduction in EEC, since households typically contain a manageable number of them.
These appliances are easier to detect than multiple smaller appliances, so most datasets use a measurement interval of 1 sample per second (Sps), 1 minute, or less frequent.
For NIALM and appliance identification research questions, sampling rates above 10 kSps help to distinguish the total number and types of appliances in a circuit.
The amount of information contained in the electrical signal increases steadily with sampling rates of up to 1 MHz.
Higher sampling rates can capture subtle changes (high-frequency ripple), which is useful for appliance identification.
Capturing voltage and current waveforms allows energy disaggregation algorithms like BOLT to extract patterns directly from raw measurement data.
As far as we know, only the datasets in (ref.) cover office environments.
Office buildings have great potential for reducing EEC, since most office workers are not aware of the energy costs they cause.
Modern office environments are equipped with appliances that use switched-mode power supplies (SMPSs).
Most information and communication technology (ICT) appliances, including computers, monitors, network devices, and battery chargers, run on direct current (DC) and therefore require a power supply unit.
Recently, field studies and experiments have been carried out on buildings that provide DC power outlets, eliminating the need for SMPSs.
A recent study found that SMPSs have a significant impact on EEM accuracy, with deviations of up to 582% when comparing smart meters with conventional electromechanical meters.
This is mainly caused by electromagnetic interference from non-linear and fast-switching loads, which distort the current sensor readings.
A large part of the reported error is caused by ripple currents in a frequency range of up to 150 kHz, which no dataset currently covers.
The authors found a significant correlation between the type of sensor and its measurement accuracy.
One type of sensor showed positive deviations (higher readings), whereas Hall effect-based sensors returned mainly negative deviations, compared with conventional electromechanical meters.
To enable research on typical office appliances, in particular ICT appliances with SMPSs, in the context of NIALM and EEM, we present BLOND: a building-level office environment dataset.
We provide long-term continuous measurements of voltage and current waveforms in a 3-phase power grid of a typical office environment, collected in Germany from October 2016 to May 2017.
The dataset contains readings of the aggregated circuits (smart meter) and perfectly matching, fully labeled ground truth for individual appliances, both as high sampling rate waveforms of voltage and current.
In total, 53 appliance types and 74 appliance instances, grouped into 16 classes, are distributed over 111 recorded channels.
All signal traces are precisely time-stamped with a globally synchronized clock.
The dataset consists of two measurement series with different sampling rates in the same environment.
BLOND-50 contains 213 days of continuous readings of all 3 phases (aggregate) at 50 kSps, and ground truth data (individual appliances) at 6.4 kSps.
BLOND-250 consists of 50 days with 250 kSps (aggregate) and 50 kSps (individual appliances).
The setup consists of a data acquisition system for the aggregated circuits and 15 units for recording individual appliances, with up to 6 measurable appliances per unit.
We also provide pre-computed 1-second data summaries to enable research on data with lower sampling rates.
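As a rough illustration of what a 1-second summary can contain, the following sketch reduces raw voltage and current samples to one record per second. The function name, the chosen fields (RMS voltage, RMS current, mean real power), and the synthetic 230 V test signal are assumptions made for this example; the summary files shipped with the dataset may be organized differently.

```python
import math

def summarize_seconds(voltage, current, sample_rate):
    """Reduce raw waveforms to one summary record per full second.

    The field names below are hypothetical, chosen for illustration only.
    """
    summaries = []
    n_seconds = min(len(voltage), len(current)) // sample_rate
    for s in range(n_seconds):
        v = voltage[s * sample_rate:(s + 1) * sample_rate]
        i = current[s * sample_rate:(s + 1) * sample_rate]
        summaries.append({
            "voltage_rms": math.sqrt(sum(x * x for x in v) / sample_rate),
            "current_rms": math.sqrt(sum(x * x for x in i) / sample_rate),
            # Mean of instantaneous power v(t) * i(t) over the second.
            "real_power": sum(a * b for a, b in zip(v, i)) / sample_rate,
        })
    return summaries

# Synthetic 2-second test signal (assumed values): 230 V RMS, 1 A RMS,
# 50 Hz, voltage and current in phase.
RATE = 6400  # individual-appliance sampling rate of BLOND-50
t = [n / RATE for n in range(2 * RATE)]
volts = [230 * math.sqrt(2) * math.sin(2 * math.pi * 50 * x) for x in t]
amps = [math.sqrt(2) * math.sin(2 * math.pi * 50 * x) for x in t]
summary = summarize_seconds(volts, amps, RATE)
```

For a purely resistive load such as this test signal, the mean real power equals the product of the two RMS values.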
To create a new dataset that focuses on ICT appliances with SMPSs, has an advantage over existing public datasets, and is suitable for NIALM-related fields, we defined the following requirements and desired properties.
High sampling rates, to extract high-frequency characteristics from SMPSs and other non-linear loads.
Existing datasets (ref.) cover the range between 10 and 20 kilo-samples per second (kSps), which only covers the lower region of the sampling frequency tiers described in (ref.).
New research questions can be posed at higher sampling rates, which may lead to improved accuracy and new types of algorithms.
Raw voltage and current waveforms, which provide more information than data at lower sampling rates (e.g., 1-second averages).
It is therefore beneficial to collect per-appliance EEC at a sampling rate that can represent the actual electrical waveforms of voltage and current.
This is useful when the required information cannot easily be extracted during data collection because the use case is not yet known, or because different algorithms and filters might discard important data.
This enables us to calibrate and optimize signal quality for a given task.
Long-term continuous measurements, resulting in gap-less data capture for the entire circuit.
Previously recorded datasets contain large gaps in which data was not recorded or not received at all, for various reasons.
While technical systems always have a certain margin of error, completeness and integrity should be a high-priority concern when dealing with high-frequency energy datasets.
Precise clock synchronization, to allow accurate matching between aggregate and ground truth samples.
The time-stamping accuracy is marginally affected by high sampling rates.
Since most datasets are collected by distributed sensors, maintaining an accurate global clock is critical to the overall timing accuracy.
Without proper synchronization, some sensors may drift over time and blur the aggregate-to-ground-truth relation.
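To give a feeling for why clock drift matters more at high sampling rates, the sketch below converts a clock error into a number of misaligned samples. The 20 ppm drift figure is an assumed, illustrative oscillator tolerance and not a value measured for the BLOND hardware.

```python
def drift_in_samples(drift_ppm, elapsed_s, sample_rate):
    """Samples by which an unsynchronized clock is off after elapsed_s seconds."""
    return drift_ppm * 1e-6 * elapsed_s * sample_rate

# Assumed example: 20 ppm drift, one hour without synchronization.
offset_50k = drift_in_samples(20, 3600, 50_000)  # high-rate waveform data
offset_1sps = drift_in_samples(20, 3600, 1)      # 1 Sps summary data
```

The same 72 ms clock error is negligible for 1 Sps data but spans thousands of samples in a 50 kSps waveform.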
The BLOND dataset was collected in a typical office building in Germany, whose main users are academic institutes and their researchers.
The measured circuits cover part of a single floor with 9 dedicated offices and 160 m² of office space with central (non-electric) heating.
The average power density on working days over the entire measurement period is 11.7 W/m², which places the building in the category of typical office buildings with 9.5 to 13.5 W/m² (ref.).
Throughout the data collection, the number of people working in the monitored offices ranged from 15 to 20.
Occupancy closely follows the typical office work schedule in Germany: most occupancy falls between 9:00 and 18:00, Monday to Friday.
The office space is almost unused on weekends, so there is little to no electricity consumption.
Major public holidays such as Christmas and New Year also show very little occupancy in the building, as do periods when individual occupants are absent.
These absences include personal vacations, business trips, sick leave, and other 'out-of-office' days.
Such data was not collected due to privacy restrictions.
All occupants performed light-duty office work using PCs, monitors, and other appliances typical for such an environment.
Individuals working in this building spend most of their working hours at their desks, with occasional breaks for meetings or other activities outside their designated office.
Some occupants participate in academic work and teaching, giving lectures or attending meetings on a weekly basis.
The electrical system is a 50 Hz mains supply with 3 circuits at a nominal phase offset of 120° (a typical 3-phase supply).
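The relation between the three phases can be sketched as three sine waves shifted by 120° each. The 230 V nominal RMS voltage is an assumption (the usual European value) and is not stated in the text; the function and constant names are illustrative.

```python
import math

MAINS_FREQ_HZ = 50.0
PHASE_OFFSETS_DEG = (0.0, 120.0, 240.0)
NOMINAL_RMS_V = 230.0  # assumed nominal value, not taken from the dataset

def phase_voltages(t):
    """Ideal instantaneous voltage of each of the 3 phases at time t (seconds)."""
    peak = NOMINAL_RMS_V * math.sqrt(2)
    return tuple(
        peak * math.sin(2 * math.pi * MAINS_FREQ_HZ * t - math.radians(offset))
        for offset in PHASE_OFFSETS_DEG
    )

# One 20 ms mains period sampled at 50 kSps yields 1000 samples per phase.
samples = [phase_voltages(n / 50_000) for n in range(1000)]
# In an ideal balanced system the three voltages cancel at every instant.
max_sum = max(abs(sum(v)) for v in samples)
```

Real measurements deviate from this ideal shape, which is part of what makes raw waveform data informative.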
Each office room is connected to one or two circuits, with neighboring offices on alternating circuits.
The building is not equipped with electric space heaters or air conditioners.
Therefore, the dataset contains only user-operated appliances and base loads.
To keep the rewiring effort to a minimum, the existing separate circuits for regular and emergency lighting were excluded from the measurement scope; only the user-accessible wall sockets are part of the measurement.
The offices are divided into two sections, each with its own 3-phase circuit breaker, giving 6 circuits in total.
Since the goal of this dataset is to collect the aggregate mains EEC, the circuits of each phase are combined for measurement purposes, which allows us to use a single 3-phase energy data acquisition system.
The mains EEM is performed at the distribution board with a CLEAR unit designed to meet the requirements of BLOND.
CLEAR, the Circuit-Level Energy Appliance Radar, is a specialized data acquisition system capable of measuring voltage and current waveforms at high sampling rates and bit-rates for a 3-phase power grid.
The power required to operate the sensors and the CLEAR system itself comes from a different circuit that is not part of the measurement setup.
The CLEAR system comprises three Hall effect-based current sensors installed in the electrical cabinet and a measurement box in an adjacent room containing all electronics and processing units.
The electrical cabinet and sensors are connected to the measurement box via 2 CAT-6 cables, which provide shielded signal transmission and power supply.
The voltage signals are tapped directly from the incoming mains lines.
An analog-to-digital converter (ADC) samples all six channels (3 phases: voltage and current) simultaneously at 250 kSps (ref.).
Each signal channel has 16-bit precision and a bipolar value range, allowing the AC mains waveform to be mapped directly into the digital data stream.
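A bipolar 16-bit channel can be read as a signed integer and scaled back to a physical quantity with a linear calibration factor. The sketch below shows this mapping; the function name and the example factor (full-scale code corresponding to 400 V) are hypothetical and do not reflect the actual CLEAR calibration values.

```python
ADC_BITS = 16
ADC_MIN = -(1 << (ADC_BITS - 1))      # -32768
ADC_MAX = (1 << (ADC_BITS - 1)) - 1   # 32767

def adc_to_physical(raw_code, calibration_factor):
    """Map a signed ADC code to a physical value via a linear factor."""
    if not ADC_MIN <= raw_code <= ADC_MAX:
        raise ValueError(f"code {raw_code} outside signed {ADC_BITS}-bit range")
    return raw_code * calibration_factor

# Hypothetical factor for illustration: code -32768 corresponds to -400 V.
VOLTAGE_FACTOR = 400.0 / 32768

zero_crossing = adc_to_physical(0, VOLTAGE_FACTOR)
half_scale = adc_to_physical(16384, VOLTAGE_FACTOR)
negative_full_scale = adc_to_physical(ADC_MIN, VOLTAGE_FACTOR)
```

Because the range is bipolar, the zero crossing of the AC waveform maps to code 0 and no offset term is needed in this simplified model.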
The ADC is controlled by an FPGA, which triggers the capture and reads the data into memory for buffering.
The resulting packets are forwarded to a USB interface chip for direct communication with a single-board PC.
The Linux-based single-board PC receives the data, stores it in files, and then sends them over the network to the data center for storage.
Each circuit in each room is protected by a 16 A circuit breaker; each mains phase is protected by a 25 A circuit breaker.
Preliminary checks showed that, over the course of a day, the total EEC on each phase is typically below 16 A.
This allows us to use 3 primary turns on each current sensor to improve the effective signal bandwidth without exceeding the sensor's nominal primary current of 50 A (3 × 16 A = 48 A).
The sensors are pre-calibrated, and the calibration factor (a linear mapping) is calculated from the datasheet.
The voltage signal is obtained through an AC-AC transformer, whose output depends only on the circuit voltage and the minimal load during measurement.
The calibration factor for the voltage ADC signal is determined by mapping a high-precision reference measurement onto the ADC signal.
Individual appliance EEM is performed by a fleet of 15 MEDAL units, which provide the ground truth data for the mains measurements.
MEDAL, a Mobile Energy Data Acquisition Laboratory, is an off-the-shelf 6-port power strip extended with voltage and current sensing infrastructure in a compact and portable enclosure.
A single-board PC collects EEC data from the sensing hardware and runs the same software stack as CLEAR.
Therefore, the fleet of MEDAL systems and CLEAR behave identically during setup and operation.
Each MEDAL unit can measure up to 6 user appliances simultaneously, with sockets labeled 1 to 6.
All power outlets in the offices are either directly connected to a MEDAL system, used for base load appliances, or made unavailable to prevent the use of unmonitored appliances.
All monitored energy consumption is included in the CLEAR measurements and in exactly one MEDAL data stream.
MEDAL uses the same voltage sensing circuit and calibration as CLEAR.
Each socket generates an independent current signal using a family of Hall effect-based ICs with ranges of 5, 20, or 30 A per socket.
Because we expected mostly ICT appliances with SMPSs, we configured each MEDAL unit with one high-power socket (socket 1, up to 3600 W) and 5 low-power sockets (sockets 2 to 6, up to 815 W).
The maximum safe wattage is clearly marked on the enclosure next to each socket.
If an appliance exceeds this limit, the signal is clipped at the maximum value while operation remains safe.
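The effect of the per-socket limits can be sketched as follows. The wattage limits are the ones stated above; the helper names and the assumption that a low-power socket clips at a 5 A full-scale reading are illustrative only.

```python
# Per-socket power limits as stated in the text (watts).
SOCKET_LIMIT_W = {1: 3600, 2: 815, 3: 815, 4: 815, 5: 815, 6: 815}

def exceeds_limit(socket, nominal_power_w):
    """True if an appliance's nominal power is above the socket's limit."""
    return nominal_power_w > SOCKET_LIMIT_W[socket]

def clip_signal(samples, full_scale):
    """Model sensor saturation: values beyond +/- full_scale are clipped."""
    return [max(-full_scale, min(full_scale, s)) for s in samples]

# Assumed example: a low-power socket whose sensor saturates at 5 A.
clipped = clip_signal([-7.0, 0.5, 6.2], 5.0)
kettle_on_socket_2 = exceeds_limit(2, 1000)
kettle_on_socket_1 = exceeds_limit(1, 1000)
```

Clipped samples show up as flat tops in the recorded current waveform, so appliances close to the limit should be interpreted with care.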
The EEC of the MEDAL system itself is less than 5 W and is not part of the ground truth data.
Most common ADCs that provide simultaneous sampling of all channels are expensive and not suitable for a large-scale data collection system.
MEDAL therefore uses seven separate single-channel ADCs, each with 12-bit resolution and up to 50 kSps (ref.).
Precise timing and simultaneous sampling are achieved by using a microcontroller as the command and control IC.
Office environments with a large and mobile population can be an ever-changing setting for collecting energy data.
A list of the observed appliances and their grouping into classes is provided in (ref.).
These appliances are mostly compact and portable, meaning they can be moved around, plugged into different sockets, or simply appear and disappear from day to day.
To prevent incorrect labeling of the ground truth, the mapping between MEDAL sockets and the appliances actually plugged in is recorded in an appliance log: a spreadsheet containing timestamp, class name, appliance name, nominal power consumption, and socket number.
The full log for each MEDAL is provided in a JSON-based file format and as a spreadsheet file for easy printing and visual inspection.
Although the appliance log is mainly based on reports from occupants and regular inspections by trained personnel, certain errors cannot be avoided.
The data was curated to the best of our ability, with appropriate skill, care, and diligence.
The appliance log was checked and updated every month.
Occupants were instructed to report changes so that updates could be entered into the appliance log.
To further improve data quality, an in-depth retrospective assessment of the daily EEC was conducted.
If a mismatch with the actual metering data was found, additional entries were added to the appliance log.
This was done only when the mismatch could be resolved by using data from adjacent days or by asking the occupant responsible for the MEDAL system.
Sockets marked as empty in the appliance log were manually verified by checking the daily EEC of the corresponding MEDAL system.
If a mismatch was detected, the log was updated accordingly.
An entry in the log assigns a socket to a specific appliance.
It does not contain information about whether the appliance is switched on or plugged in, and serves only as a reservation.
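A log of this kind can be queried for the appliance assigned to a socket at a given time by taking the most recent entry at or before that time. The sketch below assumes a flat JSON list with hypothetical field names and made-up example entries; the actual file layout in the dataset may differ.

```python
import json
from datetime import datetime

# Made-up example entries; field names are assumptions for illustration.
LOG_JSON = """
[
  {"timestamp": "2016-10-01T00:00:00", "socket": 1,
   "class_name": "Laptop", "appliance_name": "Example Laptop A", "power": 65},
  {"timestamp": "2016-12-05T09:30:00", "socket": 1,
   "class_name": "Monitor", "appliance_name": "Example Monitor B", "power": 30},
  {"timestamp": "2016-10-01T00:00:00", "socket": 2,
   "class_name": null, "appliance_name": null, "power": null}
]
"""

def appliance_at(entries, socket, when):
    """Return the latest log entry for `socket` at or before `when`, or None."""
    current = None
    # ISO 8601 timestamps of equal format sort chronologically as strings.
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        if entry["socket"] != socket:
            continue
        if datetime.fromisoformat(entry["timestamp"]) <= when:
            current = entry
    return current

entries = json.loads(LOG_JSON)
in_november = appliance_at(entries, 1, datetime(2016, 11, 1))
in_january = appliance_at(entries, 1, datetime(2017, 1, 1))
```

As the text notes, such an entry is only a reservation: whether the appliance was actually drawing power has to be read from the measurement data itself.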
The BLOND goal of long-term continuous measurements requires a certain fault tolerance at the transport layer; this renders wireless or mesh networks unsuitable for the task.
The building is equipped with spare Ethernet connections in each room, which serve as a reliable transport network for forwarding all data to a centralized storage system.
Ethernet, IPv4, TCP, and SSH all provide mechanisms to ensure data integrity and, with very high probability, automatically detect and retransmit corrupted data.
BLOND-50 employed a pull-based strategy, in which a single central server periodically fetches new data files from each measurement unit and moves them to a distributed storage system.
CLEAR and MEDAL convert the raw data into HDF5 files and can buffer data for hours or days if nobody collects it.
The central server only needs to move data between systems, and can buffer data for up to 24 hours when the storage system is unavailable.
The architecture decouples the stages to allow for downtime and scheduled maintenance.
Buffer sizes and temporary storage devices were chosen carefully to maximize the time available before data loss occurs.
BLOND-250 uses a significantly higher sampling rate, which makes the pull-based strategy infeasible due to memory and compute performance limitations.
Instead, a push-based strategy is used, in which each measurement system sends raw data files (chunked) directly to the data center.
The server then converts the files and moves them to the storage system.
Due to the higher sampling rate and file sizes, the available buffer time at each stage is also reduced.
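The reduction in buffer time follows directly from the raw data rate. The sketch below estimates uncompressed rates from the channel counts and sampling rates given in the text, assuming 2 bytes per sample (12-bit samples stored in 16 bits) and a hypothetical 8 GB local buffer; real file sizes depend on the container format and any compression.

```python
def raw_rate_bytes_per_s(channels, sample_rate, bytes_per_sample=2):
    """Uncompressed data rate of one measurement unit."""
    return channels * sample_rate * bytes_per_sample

def buffer_hours(buffer_bytes, rate_bytes_per_s):
    """Hours of data that fit into a buffer at the given rate."""
    return buffer_bytes / rate_bytes_per_s / 3600

BUFFER_BYTES = 8 * 10**9  # hypothetical buffer size, for illustration only

clear_50 = raw_rate_bytes_per_s(6, 50_000)     # CLEAR in BLOND-50
clear_250 = raw_rate_bytes_per_s(6, 250_000)   # CLEAR in BLOND-250
medal_250 = raw_rate_bytes_per_s(7, 50_000)    # MEDAL in BLOND-250

hours_50 = buffer_hours(BUFFER_BYTES, clear_50)
hours_250 = buffer_hours(BUFFER_BYTES, clear_250)
```

Under these assumptions, moving from 50 kSps to 250 kSps cuts the time a CLEAR unit can buffer locally by a factor of five.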
CLEAR and MEDAL are built on the same software stack, which allows us to reuse most of the collection software and buffering strategies.
Each measurement system can buffer multiple gigabytes of raw data on a local storage device (SD card or USB flash drive) in case of a network failure or data center outage.
This allows data collection to continue for multiple days without any transmission capability.
After the network connection is re-established, all buffered files are transferred in batches at a limited rate to prevent network congestion.
Further measures improve fault tolerance: a 'RAM-first' buffering approach keeps I/O access to a minimum and reduces the risk of storage wear (write endurance of NOR/NAND flash memory).
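One way to picture a RAM-first buffer is a component that accumulates incoming chunks in memory and writes to flash only in large, infrequent batches. The class below is a minimal sketch of that idea under assumed names and thresholds; it is not the buffering code used by CLEAR or MEDAL.

```python
import os
import tempfile

class RamFirstBuffer:
    """Collect chunks in RAM and append them to a file in large batches."""

    def __init__(self, spill_path, flush_threshold_bytes):
        self.spill_path = spill_path
        self.flush_threshold_bytes = flush_threshold_bytes
        self.flush_count = 0  # number of actual writes to storage
        self._chunks = []
        self._size = 0

    def append(self, chunk):
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._size >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        if not self._chunks:
            return  # nothing buffered, avoid an unnecessary write
        with open(self.spill_path, "ab") as f:
            f.write(b"".join(self._chunks))
        self._chunks.clear()
        self._size = 0
        self.flush_count += 1

# Ten 1000-byte chunks with a 4096-byte threshold cause only two writes.
spill_path = os.path.join(tempfile.mkdtemp(), "buffer.bin")
buf = RamFirstBuffer(spill_path, flush_threshold_bytes=4096)
for _ in range(10):
    buf.append(bytes(1000))
buf.flush()  # final flush; a no-op here because the buffer is already empty
```

A larger threshold means fewer writes and less flash wear, at the cost of losing more unwritten data if power is lost.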
While the underlying hardware of CLEAR and MEDAL consists of general-purpose, low-end computing devices, the measurement task requires real-time capabilities.
These are achieved by carefully selecting data structures, in-memory buffer sizes, and I/O access patterns to guarantee error-free data collection.
All measurement systems are connected to the same Ethernet network and share a synchronized clock via NTP.
Two stratum-3 time servers are located on the same layer-2 Ethernet segment.
The internal system clock is backed by a dedicated real-time clock chip with a backup battery.
A daemon runs in the background and continuously synchronizes the system clock; CLEAR and MEDAL each use their own synchronization daemon.
We have implemented most of the data collection, technical verification, data processing, and utilities in Python 3.
A single source file is available under the MIT license in the BLOND repository.
Due to the large amount of data, it is most practical to process it with distributed and parallelized methods.
We provide examples of usage that can be extended and run in a distributed computing environment.
The software used during the BLOND data collection includes tools for converting and collecting measurement data from a fleet of data acquisition units.
All steps in the technical validation section can be reproduced with the provided scripts.
The 1-second data summaries were created from the raw measurements and can be fully recreated.