Data Pre-Processing: A Crucial Element of Analytics-Driven Embedded Systems
The goal of the Internet of Things (IoT) is to acquire data from various embedded systems and apply analytics to that data to improve performance, efficiency, and business outcomes. The ability to create analytics that process massive amounts of business and engineering data is enabling designers in many industries to develop intelligent products and services. Designers can use analytics to describe and predict a system’s behavior, and can combine analytics with embedded control systems to automate actions and decisions.
Whether the analytics run in the cloud or on an embedded device, the first step in developing them is to access the wealth of available data to explore patterns and develop deeper insights. This abundance of data is explored through data pre-processing, a crucial yet often underappreciated step in the creation of analytics-driven embedded systems.
Datasets can be large, come from many different sources, and represent many different attributes. Therefore, the software tools used for exploratory analysis and analytics development should be capable of accessing all the data sources and formats that comprise the dataset. Because real-world data tends to be incomplete, noisy, and inconsistent, data preparation is a very important procedure for both data warehousing and data mining.
Data preparation includes data cleaning, data integration, data transformation, and data reduction. Data cleaning steps can be used to fill in missing values, smooth noisy data, identify outliers, and correct data inconsistencies. Data integration combines data from multiple sources to form a coherent data store. Data transformation routines convert data into appropriate forms for mining. Data reduction can be used to obtain a reduced representation of the data while minimizing the loss of information content.
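To make the cleaning and transformation steps above concrete, here is a minimal Python sketch that uses only the standard library. The function names, the sample readings, the median fill, the median-absolute-deviation outlier rule, and the min-max scaling are all illustrative assumptions, not a method prescribed in this article; a real project would likely use a dedicated data analysis toolchain.

```python
from statistics import median

def fill_missing(values):
    """Data cleaning: replace missing readings (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=3.0):
    """Data cleaning: flag points more than k scaled MADs away from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    limit = k * 1.4826 * mad
    return [abs(v - med) > limit for v in values]

def min_max_scale(values):
    """Data transformation: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical temperature readings with one gap and one spike.
readings = [20.1, 20.3, None, 20.2, 95.0, 20.4, 20.0]

filled = fill_missing(readings)
flags = flag_outliers(filled)
# Replace flagged outliers with the median so they do not distort the scaling.
center = median(filled)
cleaned = [center if is_outlier else v for v, is_outlier in zip(filled, flags)]
scaled = min_max_scale(cleaned)
```

In this sketch only the 95.0 spike is flagged, and the scaled output spans 0 to 1 across the remaining plausible readings. The median-based rules were chosen because they are less sensitive to the very outliers they are meant to catch; other choices (interpolation, z-scores, standardization) may suit a given dataset better.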
Why Data Pre-processing?
When exploring this wealth of information, data pre-processing cleans and prepares the data before predictive models are developed. Predictions from incorrect data can be difficult to debug or, worse, can lead to inaccurate or misleading results that impact system performance and reliability. The goal here is to find the most predictive features of the data and filter the data to enhance the predictive power of the analytics model. Some common techniques include feature selection to reduce high-dimensional data, feature extraction and transformation for dimensionality reduction, and domain analysis such as signal, image, and video processing.
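As one illustration of feature selection, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the quantity to be predicted and keeps the top few. This is only one simple filter-style approach, chosen here for brevity; the feature names and numbers are made up for the example and do not come from any dataset mentioned in this article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def select_features(features, target, k):
    """Keep the k features whose absolute correlation with the target is highest."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical vehicle signals (one value per trip) and a target to predict.
features = {
    "rpm": [1000, 1500, 2000, 2500, 3000],
    "cabin_temp": [21, 23, 20, 22, 21],
    "throttle": [10, 22, 29, 41, 50],
    "constant": [1, 1, 1, 1, 1],
}
fuel_rate = [2.0, 3.1, 3.9, 5.2, 6.0]

selected, scores = select_features(features, fuel_rate, k=2)
```

With this toy data the two signals that track fuel rate closely (rpm and throttle) are kept, while the weakly related and constant signals are dropped. Correlation only captures linear, one-feature-at-a-time relationships, so it is a starting point rather than a complete selection strategy.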
The information gathered from data pre-processing is then put to use across a number of analytics-driven embedded systems. One example is the use of Big Data and analytics to make cars smarter. Automotive OEMs are collecting enormous amounts of data from real-world driving situations (think millions of miles of driving), recording data such as engine performance, video, radar, and other signals. This data is used to generate important metrics such as fuel economy and performance at the fleet level. Engineering teams are also using this real-world data to design, develop, and test new types of automotive systems, such as advanced driver assistance systems (ADAS).
In some implementations, analytics are performed in the cloud with the intent to improve the performance of existing embedded systems. For example, BuildingIQ is a leading provider of advanced energy management software and designs climate-control systems to reduce energy consumption in commercial buildings. The analytics incorporated into these systems include engineering data from power meters, thermometers, pressure sensors, and other HVAC sensors combined with business data from weather forecasts, real-time energy prices, and demand response data. The result is a cloud-based service that is able to adjust the building’s existing HVAC embedded systems and lower energy consumption by up to 25%.
In other cases, analytics are better run directly in an embedded system. For instance, a design team at Scania, the Swedish truck manufacturer, embeds analytics into its emergency braking systems to provide real-time crash avoidance, reduce accidents, and meet stringent EU safety regulations. Engineering data from cameras and radar is processed in real time for object and road-marking detection, and the results are then fused to issue collision warning alerts and automatic brake requests.
As the Scania case shows, there’s a growing need to put more of the data pre-processing and data reduction on the sensor or embedded device itself in order to optimize speed and power. The accelerating Internet of Things trend towards smarter and more connected sensor networks is only adding to that pressure. Processing data locally has the benefit of shrinking the amount of data that is transferred over the network, which reduces the cost of transmission and can lower the power consumption of wireless devices. For this reason, good system designers should conduct local pre-processing wherever possible and upload only the useful information or predictive signal.
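The idea of local pre-processing and data reduction can be sketched in a few lines of Python. In this hypothetical example, a device summarizes each fixed-size window of sensor samples and reports a summary only when the window mean has moved by more than a chosen deadband since the last report. The window size, the deadband, the summary fields, and the sample values are all assumptions made for illustration; real devices would typically implement this kind of logic in firmware.

```python
def summarize(window):
    """Reduce a window of raw samples to a small summary record."""
    return {
        "mean": sum(window) / len(window),
        "min": min(window),
        "max": max(window),
    }

def reduce_stream(samples, window_size, deadband):
    """Return (start_index, summary) pairs for windows worth uploading.

    A window is reported only if its mean differs from the last reported
    mean by more than `deadband`; the first full window is always reported.
    """
    reports = []
    last_mean = None
    for start in range(0, len(samples) - window_size + 1, window_size):
        summary = summarize(samples[start:start + window_size])
        if last_mean is None or abs(summary["mean"] - last_mean) > deadband:
            reports.append((start, summary))
            last_mean = summary["mean"]
    return reports

# Hypothetical temperature samples: steady, a tiny drift, then a real change.
samples = [20.0] * 8 + [20.1] * 4 + [25.0] * 4

reports = reduce_stream(samples, window_size=4, deadband=0.5)
uploaded_fraction = len(reports) / len(samples)
```

Here 16 raw samples collapse into 2 uploaded summaries: the initial baseline and the window where the temperature actually changed. The trade-off is that anything happening inside the deadband is never seen upstream, so the threshold has to be set with the downstream analytics in mind.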
As the number of possible applications increases, a wider variety of industries will benefit from analytics-driven embedded systems, and data pre-processing will continue to play an important role in how we shape our world and how we operate in it.
About the Author
Paul Pilotte has more than 20 years of experience in technical marketing and development in the technical computing, security software, data communications, and test-equipment markets. He is currently a Technical Marketing Manager at MathWorks focusing on MATLAB toolboxes for statistics, optimization, symbolic math, and computational finance. He holds Bachelor’s and Master’s degrees in electrical engineering from MIT and an MBA from Babson College.