One of the biggest projects under way in the Ocado Technology data team has been building our own solution for storing and processing extremely large sets of customer data in the cloud. An online-only supermarket, Ocado.com has a dedicated customer base of over 580,000 active users who visit our website several times a week and spend more than £100 per shop.
The nature of our business brings a level of complexity unlike anything else in the retail world; indeed, many have claimed that grocery is the holy grail of the retail market - and data analytics is playing a big role in customer retention.
Managing this sea of customer data is no easy task. Although we initially used Hadoop, the rapid growth of our customer base meant we had to find something that could scale quickly, and with minimal maintenance overhead, so we refocused on BigQuery and the Google Cloud Platform.
When we made the decision to adopt the cloud, we had several end goals in mind. Firstly, we wanted to improve the customer experience and streamline the feature design and testing of our Ocado.com webshop. Secondly, we realized that a cloud-first strategy would truly empower our business teams to have greater insight into our merchandising operations. Finally, we were looking to improve the responsiveness of the Ocado.com webshop by using some of the intrinsic advantages of the cloud (i.e. elastic performance and storage).
The initial work was carried out by a small cross-functional team that included a product owner, software engineers, business analysts, and business users. Even though the product owner was part of Ocado Technology, he had a horizontal line into the retail division to ensure that the solution remained cost-effective and viable for the business. The business user was expected to adopt the report, which would ease the transition from initial proof of concept to production. Overall, the entire team needed to be agile: the technology was new, so the final solution could change quickly along the way.
We timeboxed the first instance of the project and ran it in Kanban to deliver a proof of concept as quickly and cheaply as possible, hoping to demonstrate the value of the project early and then implement it at a larger scale.
Switching from Hadoop to a managed service such as BigQuery partway through the project brought a series of cost and performance improvements. We no longer needed to decide how many instances to bring up in the cluster or wait for it to start; we simply ran our queries and paid for the I/O. We also didn’t have to perform maintenance on the servers or the Hadoop distribution, because Google handled all of the uptime needs and backend patching. Most of all, though, we saw BigQuery outperform our Hadoop cluster by a factor of more than 80 on our largest dataset, and for half the cost.
Scaling our cluster to match the BigQuery performance level would not have been cost-effective. A further side benefit was that the data in BigQuery was accessible from various other GCP services (e.g. Dataproc), so we could leverage that power in multiple ways while storing only a single copy of the data.
At the beginning, we accepted that the data came with lower quality assurance and zero SLAs. Once the proof of concept proved valuable, the pipelines were productionised. This is where we learned a great deal: the data teams turned out not to be best placed to productionise the pipelines, especially with respect to quality and governance. We then spent a lot of time figuring out who could best handle this task and found that the data producing teams were best placed to do it.
The next phase was to involve the data producing teams (in this case, the webshop) and have them build the pipelines using the technology platform we supported. This way, we made sure that producers had direct control over the quality, timeliness and meaning of their data. Organisationally, it also allowed the new features being built by the webshop team at the request of the retail business to be considered in this pipeline, with the data department acting more as advisors.
To learn more about how Ocado Technology adopted BigQuery, please register for this webcast.
Looking back, this project allowed the data team to gain valuable insight into the process of moving to the cloud and, more specifically, of using the Google Cloud Platform for customer analytics. With the Ocado Smart Platform, we’re looking forward to replicating this success story and rolling it out to other grocery retailers, as well as continuing to use it ourselves.
Dan Nelson, Head of Data