A tablet device shows data modelling onscreen, with a hand shown exploring the data by touching the screen

Using data in engineering

Sometimes considered dry and boring, data is an important asset that can open up significant opportunities when used well. In engineering, it is being used in multiple ways to accelerate the transition to renewable energy and reduce unnecessary use of resources.

Did you know?

  • Data science is an exciting discipline that’s becoming increasingly important across many industries, including engineering
  • Since 2014, the government has invested more than £2.3 billion in data and AI
  • However, there is a skills gap, with as many as 234,000 data science roles needing to be filled in the UK

Using data to gain insights has become a major priority across all industries in recent times, with the ability to capture and process data opening up many opportunities. In engineering, data is transforming whole industries: through real-time analytics and machine learning for predictive maintenance, for example, and by allowing engineers to efficiently manage large-scale physical assets by generating virtual representations as digital twins (Creating a virtual replica, Ingenia 87). Data is fast becoming the world’s most valuable resource.

The untapped potential of data-driven innovation to boost productivity, create new products and services, and create new businesses and jobs has led to major investment from both the UK government and the private sector. The government has invested more than £2.3 billion since 2014 and has a 10-year vision to transform the UK’s data capabilities via the National Data Strategy and National Artificial Intelligence (AI) Strategy. Venture capital investment in data-driven AI companies is also skyrocketing. In 2021 Tech Nation reported that UK AI companies received $3.3 billion between January and August that year, already surpassing the $3 billion invested during the whole of 2020.

Open data in particular – which the Open Data Institute defines as data that anyone can access, use or share – is already used by hundreds of UK companies to offer new insights into everything from travel to recruitment. Transport for London (TfL) and the Ordnance Survey are two of the most high-profile data providers; their data has fuelled the creation of dozens of popular apps, particularly for travel planning and geolocation.

Innovating using data is hitting the mainstream and making transformative changes across engineering disciplines, from aerospace and automotive to energy and construction.

Making use of data

Data science is an interdisciplinary area where three main fields intersect: mathematics and statistics; computer science; and business knowledge, which is specific to the domain where data science is applied. The aim of data science is to extract insights and uncover patterns from data that can be in unstructured formats – such as images and audio – or structured formats – such as tables with rows and columns. Often, these disparate formats also contain meaningless data items, known as noisy data.

A sustainable fleet

A data model helped ENGIE electrify its maintenance fleet

ENGIE is a leading energy and services company focused on producing and supplying low-carbon energy, and performing energy services and facility maintenance. In the UK, ENGIE has a fleet of 2,000 vehicles for its maintenance and service engineers, many of which are fuelled by diesel. Before transitioning to using electric vehicles, ENGIE wanted to know how many charging points it needed and where, with the ultimate aim of installing charging points so that engineers’ vehicles can be charged while they work.

ENGIE worked with the National Innovation Centre for Data (NICD) to develop its new ‘Go Electric’ data model, which aims to ensure that transitioning the company’s fleet to electric vehicles causes minimal disruption to its activities. The team designed a model to identify the best locations for electric vehicle charging points while measuring the impact on productivity. To do this, they used telematics data from 54 ENGIE diesel vans over a six-month period to understand the engineers’ journeys. The data described routes between 300 buildings in the Wakefield area, to understand where engineers stopped and for how long. After initial data cleaning and exploration, the team used mixed-data programming models, optimisation, and event-driven simulation to identify key charging locations.
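Siting problems like this are often framed as set-cover or facility-location optimisation. The sketch below is a minimal greedy illustration only, with invented stop and depot names; ENGIE’s actual model was built on six months of telematics data, optimisation and event-driven simulation.

```python
# Greedy sketch of the charging-point siting idea: choose a small set of
# buildings so that every recorded stop is within range of a charger.
# All identifiers and coverage sets below are hypothetical toy data.

def choose_charging_sites(stops, candidates, covers, max_sites):
    """Greedily pick the candidate buildings that cover the most stops.

    stops      -- set of stop identifiers observed in the telematics data
    candidates -- list of candidate building identifiers
    covers     -- dict: candidate -> set of stops it can serve
    """
    uncovered = set(stops)
    chosen = []
    while uncovered and len(chosen) < max_sites:
        # pick the candidate that serves the most still-uncovered stops
        best = max(candidates, key=lambda c: len(covers[c] & uncovered))
        if not covers[best] & uncovered:
            break  # no remaining candidate helps any further
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

stops = {"s1", "s2", "s3", "s4", "s5"}
covers = {
    "depot_A": {"s1", "s2", "s3"},
    "depot_B": {"s3", "s4"},
    "depot_C": {"s4", "s5"},
}
print(choose_charging_sites(stops, list(covers), covers, max_sites=2))
```

A greedy heuristic like this gives no optimality guarantee, which is one reason the real project paired optimisation with event-driven simulation to measure the impact on productivity.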

Go Electric has now been used across Wakefield, Manchester and Rotherham. ENGIE is also using the model to support its own clients to achieve the same fleet-transition objective.

Side of an electric vehicle showing the charger plugged in

As data science evolves, so does the data science process workflow that’s usually followed to complete a project. This workflow is defined by four main phases, which are usually interconnected:

  • business understanding, which involves scoping and planning the project. It’s imperative to address the ‘right’ question before diving into the relevant data.
  • data preparation and understanding, which focuses on importing and aggregating data from various data sources, cleaning it by correcting incorrect or inaccurate data, and then wrangling the data by mapping it to more valuable formats for the task at hand.
  • modelling, which involves building and evaluating a model to define assumptions about the data-generating process, to find patterns or to make predictions about unseen data.
  • deployment, in which the model is integrated into a software environment suitable for users and is regularly checked and updated.
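As a loose illustration, the four phases above can be sketched as a chained pipeline; every function name and data item below is a hypothetical placeholder, not part of any real project.

```python
# Skeleton of the four-phase data science workflow described above.
# All names and values are illustrative placeholders.

def scope_question():
    # business understanding: agree the 'right' question before touching data
    return "which machines are likely to fail this month?"

def prepare(raw_records):
    # data preparation: drop incomplete rows (a stand-in for cleaning/wrangling)
    return [r for r in raw_records if r.get("temp_c") is not None]

def model(clean_records):
    # modelling: a trivial threshold rule standing in for a trained model
    return [r["id"] for r in clean_records if r["temp_c"] > 90]

def deploy(predictions):
    # deployment: in practice, serve via an API and monitor for drift
    return {"at_risk": predictions}

raw = [{"id": "m1", "temp_c": 95},
       {"id": "m2", "temp_c": 70},
       {"id": "m3", "temp_c": None}]
question = scope_question()
report = deploy(model(prepare(raw)))
print(question, report)
```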

Data science is multidimensional and involves many related tasks. Consequently, it has led to many job specialisations, including data analysts, data engineers and machine-learning engineers, each focusing on specific parts of the process, with many overlapping skills and blurred boundaries. There are challenges associated with all these roles and their associated tasks that are common to many scientific disciplines including engineering.

Data challenges

One of the biggest barriers to successful adoption of data science is insufficient investment in data engineering, which focuses on building stable and optimised data systems, and development and deployment ecosystems, to support the data science process. There are, for example, increasing demands for data storage and management skills. This is for several reasons, including the large volumes of data being collected and analysed, growing requirements around high computational efficiency, and the need to run engineering simulations that can explore behaviours under variable conditions.

Another reason is the need to understand emerging computing paradigms, such as edge computing, in which system architectures are decentralised, moving computing and storage closer to the data source to deal with issues such as unpredictable networks, latency and bandwidth limitations. These are all data engineering skills, and taking advantage of these methods and tools can help build robust systems that handle high-volume and high-velocity data flows and can be scaled and adapted.

View looking up at the Tokyo Skytree, the world's tallest broadcast tower, a structure that is made of steel

Civil engineers use descriptive analytics to analyse factors that affect the stability and rigidity of buildings during earthquakes © Unsplash

Making batteries last longer

Developing data models to make sustainable energy storage more reliable

Connected Energy (CE) is an engineering-led innovator in energy storage, with its systems in use across the UK and Europe. It specialises in technology systems for grid decarbonisation by repurposing second-life electric vehicle (EV) batteries. These systems help the transition to more sustainable, but intermittent, power generation methods such as wind and solar by storing excess generated energy and discharging the batteries when generation is low. Importantly, by redeploying partially degraded EV batteries in stationary storage, CE almost doubles the usable lives of the batteries before recycling.

The project with NICD initially focused on developing models to better understand how battery health degrades over time. Initial time-series forecasting exposed a need for more accurate data, which the team addressed by performing several physical experiments and creating a cloud-based automatic data-cleaning pipeline to process and store the data for future reporting and investigation. This allowed CE and its external customers to access periodic descriptive reports, via PDF or dashboard.

As a result, CE’s reporting and monitoring process improved greatly, ensuring that batteries are discharged equally, and outputs are highly accurate. Improved data accuracy has increased confidence in offering warranties on second-life battery systems, improving the attractiveness of the overall offer.

A battery bank in a shipping container shows rows of batteries for electric vehicles

A stacked battery bank for electric vehicles © Connected Energy

Data analytics is the science of analysing data to extract valuable insights, and it is becoming a core skill set for engineering. Tools developed to handle large and complex datasets have now enabled large-scale, real-time analytics. Four broad categories – descriptive, diagnostic, predictive, and prescriptive analytics – can be used depending on the requirements. Descriptive analytics focuses on using historical data to identify trends and relationships. Civil engineering, for example, uses descriptive analytics to identify relationships between parameters that interact and affect the stability and rigidity of buildings during earthquakes. Similarly, mechanical engineering uses descriptive analytics to identify trends in equipment performance and spot faulty machinery, to reduce the time taken for failure recovery and increase productivity.
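A minimal descriptive-analytics sketch of the equipment-monitoring idea: compare each sensor reading against a trailing moving average to flag machinery drifting away from its recent trend. The readings and threshold below are invented.

```python
# Descriptive-analytics sketch: flag readings well above the trailing mean.
# Vibration values and the 1.5x threshold are hypothetical toy choices.
from statistics import mean

def flag_anomalies(readings, window=3, factor=1.5):
    """Return indices where a reading exceeds `factor` x the trailing mean."""
    flagged = []
    for i in range(window, len(readings)):
        trailing = mean(readings[i - window:i])  # mean of the previous window
        if readings[i] > factor * trailing:
            flagged.append(i)
    return flagged

vibration = [1.0, 1.1, 0.9, 1.0, 1.2, 3.5, 1.1]  # mm/s, invented values
print(flag_anomalies(vibration))
```

The spike at index 5 (3.5 mm/s against a trailing mean near 1.0) is the only reading flagged; in practice the window size and threshold would be tuned to the equipment.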

Predictive analytics is used to make predictions about future outcomes using data-driven models. Here, classical statistical methodologies, machine learning and deep learning techniques are all used for predictive modelling, depending on the task, type and volume of data available. In practice, predictive analytics can be used in various fields of engineering. In transport planning, for example, engineers use time-series analysis for traffic forecasting to design transport management solutions. Electric vehicle (EV) engineers can use models to predict the ability of EVs to complete specific journeys for given battery levels and collaborate with transport engineers to determine the optimal locations of charging points (as in ‘A sustainable fleet’). Chemical engineers use predictive analytics to calculate chemical properties of mixtures and materials. A model trained on spectrometry data for mixtures, for example, could predict the chemical components of an unseen mixture to ensure that it meets the specified requirements, before being further used in production. This is a major asset for improving productivity and supporting quality assurance.

Optimising hull protection

Using machine learning to assess a ship's condition

AkzoNobel is a global expert in paint and coatings, and owns familiar brands such as Dulux. Its products are used to decorate homes and businesses, protect infrastructure such as pipelines and turbines, and to coat aircraft, vehicles and marine vessels.

Working with NICD, AkzoNobel focused on the performance of anti-corrosive coatings, which help to protect its customers’ boat hulls from everything from saltwater damage to barnacle adhesion. Although the coatings can last for up to 30 years, ships need to return to dry dock every five years for inspection. Vessel operators want to keep time in dry dock to a minimum, to save time, money and fuel. AkzoNobel was looking for a way to allow customers to predict how their hulls were holding up, without having to wait for a docking or an expensive dive inspection.

The project developed a new approach to handle data capture. This included data cleaning and identification of the most informative characteristics in the data (feature extraction), across AkzoNobel’s internal coating and vessel inspection data, along with external data describing vessel movements and global water conditions. The datasets used were too large to be processed locally so the project performed batch processing in the cloud for this task. After evaluating several classification algorithms, the result was a random forest classifier (a commonly used machine-learning algorithm) that can accurately predict the level of corrosion present on a given vessel. AkzoNobel was then able to develop a minimum viable product that could offer real value to its customers.
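The random-forest idea – many simple trees, each trained on a random resample of the data, voting on the class – can be shown with a toy, hand-rolled version. The project itself would have used a standard machine-learning library; everything below, including the single feature (years since last coating) and the labels, is invented for illustration.

```python
# Toy random-forest-style ensemble: bootstrap-resampled one-rule "stumps"
# vote on whether a hull shows heavy corrosion. All data are invented.
import random

def train_stump(sample):
    """Pick the threshold on the single feature with the fewest errors.

    The stump predicts class 1 when feature >= threshold, else 0.
    """
    best_t, best_err = None, float("inf")
    for t, _ in sample:
        err = sum((x >= t) != bool(y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def forest_predict(data, x_new, n_trees=25, seed=0):
    """Majority vote of stumps, each trained on a bootstrap resample."""
    rng = random.Random(seed)  # seeded for a reproducible ensemble
    votes = 0
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        t = train_stump(sample)
        votes += x_new >= t
    return 1 if votes * 2 >= n_trees else 0

# (years since coating, heavy corrosion observed?)
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]
print(forest_predict(data, x_new=5.5), forest_predict(data, x_new=1.5))
```

Real random forests grow full decision trees over many features and also subsample the features at each split; the averaging over resamples shown here is what makes the ensemble more robust than any single tree.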

A ship's hull from below that is painted black at the top and red at the bottom. There are chains at each side of the hull

© AkzoNobel

Finally, diagnostic analytics is used to identify causal relationships by comparing trends and investigating the reasons behind them, such as a drop in performance, whereas prescriptive analytics uses data to determine an optimal course of action. Electrical engineers use prescriptive analytics to design an optimal power grid that avoids failures and outages in a building by introducing interventions that prevent costly maintenance activities and downtime.

However, while harnessing data analytics skill sets is vastly improving the efficiency and accuracy of solutions and driving progress, finding people with these highly technical skills is becoming a major problem for most industries, including engineering (see ‘Building data skills’). This is a problem that will need to be solved if data science is to be used to its full potential to unlock innovation across sectors in the UK.

Building data skills

Data science is a relatively new discipline, and the skills required to address more complex problems are highly specialised and wide-ranging. The 2021 report Quantifying the UK Data Skills Gap found that there were as many as 234,000 data roles still to be filled.

As demand for data skills continues to accelerate, engineering will need to remain creative about ways to recruit, retain and retrain its existing workforce. A change in engineering education can also ensure that all engineering graduates are equipped with the data science skills they will need to drive innovation over the coming decades and help to instil a data-driven culture across engineering businesses.

Apart from the technical and coding skills that a data scientist should have, the abilities to work collaboratively and to communicate findings in an audience-specific way are equally important. Typical challenges for data scientists include combining the appropriate scientific tools, aggregating various data sources, preparing the data and asking the right questions.

The back of a woman with long blonde hair in a ponytail wearing a red dress. She is pointing at data across several screens and someone unseen from the side also points at the screen

© bp

There are numerous paths that someone can take to become a data scientist. In recent years, many universities have added bachelor’s degrees in data science to their curricula. Similarly, more and more universities offer relevant master’s courses that usually accept candidates with undergraduate degrees in IT, maths, computer science, engineering, physics, or related fields. For people who want a more focused background, a PhD can provide not only in-depth technical knowledge but also the transferable skills to apply cutting-edge techniques in industry and, of course, in research.

But hiring new talent is very competitive and demand is vastly outstripping supply, so upskilling existing employees is a viable alternative. Training is a perfectly good way to upskill, provided the employer accepts the constraints: it can take months to learn the ideas behind one data science skill and years to become proficient.

Alternatively, consultancies can offer employers the comfort of a team approach and a broader set of data expertise. Outsourcing data skills can enable projects to progress immediately, but it will need a healthy budget.

There are also some pragmatic downsides for the business when knowledge is outsourced. When the consultants hand over the deliverables and leave the premises, will the business really understand what is happening under the hood, as insights are served up from a shiny new analytics dashboard? Will it realise when the dashboard has become inaccurate because the data on which the model was trained is now unrepresentative? It is therefore imperative that the data scientists left in the business ensure that all the relevant knowledge is transferred from the consultancy team.

The talent supply gap and the problems that industry faces in upskilling employees have led to new ideas around ways to engage existing talent on data-driven problems.

One model, run by NICD, supports an organisation’s existing workforce to better understand its own data. NICD’s team of data scientists brings skills covering data wrangling, analytics, scalable compute, machine learning, visualisation and more to address an industry problem. Upskilling is achieved through collaborative projects, where organisations use current business problems (resolved by using their own data) as the testbed to build relevant skills. These projects focus on innovations, creating efficiencies or exploring new products and services that can be integrated into their existing infrastructure to maximise impact. This engagement method also removes the need to take employees away from the ‘day job’ to undertake external training and allows the organisation to understand the relevant data skills required to address their own business needs.

The outcome of projects like these is to leave collaborating organisations with a solution that they understand, built from the ground up, which they can continue to develop using their own in-house talent and skills they have obtained on the journey.


This article has been adapted from "Embracing data in engineering", which originally appeared in the print edition of Ingenia 93 (December 2022).



Professor Hodgson has over 25 years of experience in both commercial and academic environments, working exclusively around software innovation and research. His ideas, generated from working in successful software spin outs and academic research, led to the creation of the NICD, where he focuses on addressing the data skills gap to improve overall UK productivity.

Dr Kontaratou obtained her PhD in computational statistics from Newcastle University. She helps organisations discover new data technologies and develop the skills they need to tackle their data science problems. She is currently focused on deep learning.
