The pace and scope of change in technology can be daunting, but if we design based on sound principles, we have nothing to fear. Rob Kellaway, from our Data Solutions team, looks at the options.
Data engineers are largely tasked with building and maintaining the data pipelines that analysts and data scientists use to address business use cases. Data engineering is a relatively young discipline, and accepted design principles are scarce, so it is worth exploring the areas that could form a framework for design.
Data engineering is not just Extract, Transform and Load (ETL). It is about problem-solving and designing for “immutable data”. This is data that is never modified in place; instead, each change is recorded as a new fact valid at a point in time. For example, you may know that Boris Johnson is Prime Minister in January 2020, but in June 2019 the Prime Minister was Theresa May. The post of Prime Minister hasn’t changed, just the person who is currently in post.
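The idea above can be sketched in a few lines of Python: rather than overwriting a single “prime_minister” field, each change is appended as a new record with the date it became valid, and queries ask who held the post as of a given date. The record layout and field names here are illustrative, not a prescribed schema.

```python
from datetime import date

# Append-only history: records are never updated in place; a change in
# who holds the post is a new record with its own valid-from date.
history = [
    {"post": "Prime Minister", "holder": "Theresa May", "valid_from": date(2016, 7, 13)},
    {"post": "Prime Minister", "holder": "Boris Johnson", "valid_from": date(2019, 7, 24)},
]

def holder_at(records, as_of):
    """Return whoever held the post on the given date."""
    current = None
    for rec in sorted(records, key=lambda r: r["valid_from"]):
        if rec["valid_from"] <= as_of:
            current = rec["holder"]
    return current

print(holder_at(history, date(2019, 6, 1)))   # Theresa May
print(holder_at(history, date(2020, 1, 1)))   # Boris Johnson
```

Because old records are never destroyed, any past state of the data can be reconstructed on demand.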
Alongside immutable data, the design consideration of “idempotent” processing needs to be taken into account. An idempotent operation produces the same output from the same input every time it runs, with no side effects. This gives a certainty of processing that provides a stable platform for Big Data analysis.
The likelihood of run-time or configuration errors can be reduced by writing unit tests that specifically target these scenarios. Few things are less efficient than compromising the safety of your processing by allowing configuration or run-time errors to creep in.
To reduce the risk of data pipelines that cannot be debugged, design your processes to reproduce their results with confidence. In ETL terms, you should be able to run the same job again on the same data and obtain exactly the same results as before.
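A minimal sketch of what this looks like in practice, under two assumptions of my own: the transform is a pure function of its input, and the load step keys each row so a re-run overwrites rather than duplicates. The function names and row shapes are hypothetical.

```python
def transform(rows):
    # Pure function: the same input rows always yield the same output rows.
    return [{"id": r["id"], "amount": round(r["amount"] * 1.2, 2)} for r in rows]

def load(target, rows):
    # Keyed upsert: re-loading the same rows leaves the target unchanged,
    # which is what makes the whole job safe to re-run.
    for r in rows:
        target[r["id"]] = r
    return target

source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
target = {}
load(target, transform(source))
first_run = dict(target)
load(target, transform(source))   # run the exact same job again
assert target == first_run        # identical results, no duplicates
```

If a job fails halfway, an idempotent design means you can simply run it again rather than untangling partial state.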
Validation of Data
The quality of your engineering solution must not be compromised by poor quality data. Therefore, effective data management should be employed to gatekeep what is coming into your pipeline.
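One way to picture such a gatekeeper is a validation step that lets clean rows through and quarantines the rest with a reason attached, so nothing bad silently enters the pipeline. The rules and field names below are purely illustrative assumptions.

```python
def validate(row):
    """Return a list of validation errors for a row (empty if clean)."""
    errors = []
    if not isinstance(row.get("id"), int):
        errors.append("id must be an integer")
    if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
        errors.append("amount must be a non-negative number")
    return errors

def gatekeep(rows):
    """Split incoming rows into clean rows and quarantined rows."""
    clean, quarantined = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            quarantined.append({"row": row, "errors": errors})
        else:
            clean.append(row)
    return clean, quarantined

clean, bad = gatekeep([{"id": 1, "amount": 9.5}, {"id": "x", "amount": -1}])
```

Quarantining rather than discarding bad rows preserves the evidence needed to fix the upstream source.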
Tracing Data Lineage
Immutable data facilitates data lineage mapping, allowing you to trace where data was sourced from, how it has been transformed in the pipeline, and its target state.
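A toy sketch of lineage capture: each pipeline step records its name alongside its output, so any value in the target can be traced back through every transformation to its source. The step names and payload shape are assumptions made for illustration.

```python
def step(name, func, payload):
    """Apply a transformation and append its name to the lineage trail."""
    return {
        "data": func(payload["data"]),
        "lineage": payload["lineage"] + [name],
    }

# Lineage starts at the source and grows with every step applied.
raw = {"data": [1, 2, 3], "lineage": ["source:orders_csv"]}
staged = step("transform:double", lambda xs: [x * 2 for x in xs], raw)
final = step("load:warehouse", lambda xs: xs, staged)

print(final["data"])     # [2, 4, 6]
print(final["lineage"])  # ['source:orders_csv', 'transform:double', 'load:warehouse']
```

In a real pipeline this metadata would typically live in a catalogue or lineage store rather than travelling with the payload, but the principle is the same.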
Building Legendary Service
At Nationwide, we are investing significantly in people and technology to increase the value of our data. We recognise data design is a key tool for managing our data as an asset.
Sound interesting? We’re committed to listening to our members and making changes to improve our service to them, using data to make informed decisions. This can be amazingly rewarding.
Check out our job vacancies or start a conversation with us above and join Nationwide’s #data revolution.