Building Capacity – The Architecture of Data Science
Data Science is attracting attention. Across the world, the ability to marry the nuances of statistics with the discipline of programming is recognised as key to improving customer service and keeping pace with changing markets.
A 2016 survey of Data Scientists concluded that around 80% of their time is spent preparing and cleansing data (the so-called 80/20 rule). That finding became received wisdom and a widely recognised problem statement for the discipline.
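To make the claim concrete, here is a minimal sketch of the kind of preparation and cleansing work the survey describes, using only the Python standard library. The field names, date formats, and cleansing rules are hypothetical, chosen purely for illustration.

```python
import datetime

# A hypothetical raw feed: inconsistent whitespace, mixed date
# formats, a missing value, and a duplicate record.
raw_records = [
    {"customer_id": " 1001 ", "joined": "2016-03-01", "balance": "2500.00"},
    {"customer_id": "1002",   "joined": "01/03/2016", "balance": ""},
    {"customer_id": "1001",   "joined": "2016-03-01", "balance": "2500.00"},
]

def parse_date(text):
    """Accept the two date formats seen in this raw feed."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    return None

def clean(records):
    """Strip whitespace, normalise dates, type-convert balances,
    and drop duplicate customer ids."""
    seen, cleaned = set(), []
    for rec in records:
        cid = rec["customer_id"].strip()
        if cid in seen:
            continue
        seen.add(cid)
        cleaned.append({
            "customer_id": cid,
            "joined": parse_date(rec["joined"]),
            "balance": float(rec["balance"]) if rec["balance"] else None,
        })
    return cleaned

rows = clean(raw_records)
```

Even this toy feed needs four distinct cleansing rules before any modelling can begin, which is why preparation dominates the working day.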
This is a key issue for any Data Science implementation: an architecture that minimises the burden of data preparation should be at the heart of every design decision.
From a Data Scientist's perspective, the particular technology employed matters little, because the models and algorithms in use are defined mathematically. The trusted source of truth is therefore the mathematical definition of the algorithm, not any one implementation of it.
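As an illustration of that point, here is simple least-squares regression written directly from its mathematical definition (slope = cov(x, y) / var(x)). Any library in any language fitting the same data must reproduce these numbers, which is what makes the definition, rather than the tooling, the source of truth. The example data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor, straight from the
    textbook formulas rather than any particular library."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```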
However, the non-functional requirements are not as straightforward. For example, the availability and cost of experts in a given programming language or technology varies widely, and when it comes to maintenance, the chosen technology has a major impact on a project's success.
The challenge, therefore, is to provide a platform with high interoperability at minimum cost. The choice between a technology stack from a single vendor and “best in class” products for specific functions has to be evaluated carefully: the benefits of a stacked approach should outweigh the cost of the further work needed to integrate separate systems.
Another challenge of Data Science tooling is how to productionise systems and processes when much of the work originates in proofs of concept (PoCs) or hackathons. When it comes to industrialisation and enterprise projects, architectural guidance on technology usage must be in place.
Data Scientists are great innovators. They can usually progress rapidly to a solution without necessarily having non-functional requirements (NFRs) such as scalability and maintainability in mind. An architectural framework is therefore needed to underpin their work and ensure that NFRs are properly addressed.
Data Science can work well with Agile methods of project management: prototyping and rapid development mean the discipline can flourish within an Agile environment.
Sound interesting? At Nationwide Building Society, we are invested in listening to our members and making changes that improve our service to them. This can be amazingly rewarding: saving members time and money is key to what we do.
Data Scientists find insight. At Nationwide, we are enriching our data-enabled capability, with Data Scientists helping us make our service to members even better while reducing our costs in the process.
Check out our jobs and see if helping our members with Data Science at Nationwide is for you.