Do you like your data curated?
The key to understanding the distinction between lakes and warehouses for data storage is the notion of curated data.
Data lakes are typically not modelled: they are just files of data from disparate sources brought together in a file system, so data can be pulled together at the last moment to answer the question of the day. This is called ‘late binding’. It requires different technologies to work well together, and good performance can be difficult to guarantee. However, it can be done with the right effort and investment.
By contrast, in warehouses the data is curated.
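To make the contrast concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a description of any real platform: the sample data, the `member_balance` table and both function names are hypothetical. The first function interprets and joins raw files only when a question is asked (late binding); the second validates and models the same data once, up front, so later queries run against a curated table.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw "lake" files from two unrelated sources, stored as they arrived.
ACCOUNTS_CSV = "member_id,balance\n1,250.00\n2,75.50\n3,not available\n"
CONTACTS_JSONL = (
    '{"member_id": 1, "region": "North"}\n'
    '{"member_id": 2, "region": "South"}\n'
)


def read_regions(contacts_jsonl):
    """Parse the JSON-lines contact file into a member_id -> region lookup."""
    regions = {}
    for line in contacts_jsonl.splitlines():
        record = json.loads(line)
        regions[record["member_id"]] = record["region"]
    return regions


def late_bind_balance_by_region(accounts_csv, contacts_jsonl):
    """Lake style: interpret and join the raw files only when the question is asked."""
    regions = read_regions(contacts_jsonl)
    totals = {}
    for row in csv.DictReader(io.StringIO(accounts_csv)):
        try:
            balance = float(row["balance"])
        except ValueError:
            continue  # uncurated data: bad values only surface at read time
        region = regions.get(int(row["member_id"]), "unknown")
        totals[region] = totals.get(region, 0.0) + balance
    return totals


def build_curated_warehouse(accounts_csv, contacts_jsonl):
    """Warehouse style: validate and model the data once, before anyone queries it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member_balance ("
        "member_id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "balance REAL NOT NULL)"
    )
    regions = read_regions(contacts_jsonl)
    for row in csv.DictReader(io.StringIO(accounts_csv)):
        member_id = int(row["member_id"])
        try:
            balance = float(row["balance"])
        except ValueError:
            continue  # rejected at load time, so it never reaches a report
        if member_id not in regions:
            continue  # incomplete records are also kept out of the curated table
        conn.execute(
            "INSERT INTO member_balance VALUES (?, ?, ?)",
            (member_id, regions[member_id], balance),
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(late_bind_balance_by_region(ACCOUNTS_CSV, CONTACTS_JSONL))
    warehouse = build_curated_warehouse(ACCOUNTS_CSV, CONTACTS_JSONL)
    print(warehouse.execute(
        "SELECT region, SUM(balance) FROM member_balance "
        "GROUP BY region ORDER BY region"
    ).fetchall())
```

In the lake-style function the cost of interpreting the files is paid on every question, but any new question can be asked of the raw data. In the warehouse-style function that cost is paid once at load time, which suits questions that are known in advance.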
Rob Kellaway, Head of Data Design, explores which is better.
In fact, it's not possible to say whether lakes or warehouses are better. It is more a question of which is the more cost-effective solution and how sure you are that you know the questions you need to ask. These considerations will ultimately dictate the choice.
Lakes and warehouses are two specific methods of storing data for reporting purposes. The data lake is better suited to the needs of data scientists working on use cases in a lab environment. Access to multiple data sources and the ability to prototype analysis quickly are a natural fit for data scientists, who gain insight by bringing those sources together and making sense of what they contain.
On the other hand, a data lake is not effective for running operational reporting, because its data is uncurated. Data warehouses are usually populated with structured, good-quality data that has been verified at source.
Hybrid systems are being developed that enable lakes and warehouses to work in tandem using cloud technology. Rather than always productionising in the data warehouse, we will see an ecosystem of technologies emerge, and we will pick the right platform, for the right reason, at the right price point, while ensuring consistent governance over our data.
Data engineers will need to continue to work with data scientists to ensure that hybrid systems work well together.
At Nationwide, we are looking for data engineers and data scientists to take us further on a journey to provide legendary service and to continue to serve our members in the best way possible.
If you’re looking for a new challenge in your career, check out our opportunities above and 'start a conversation' about joining Nationwide’s Data & Analytics Community.