One of the biggest projects under way in the Ocado Technology data team has been building our own solution for storing and processing extremely large sets of customer data in the cloud. An online-only supermarket, Ocado.com has a dedicated customer base of over 580,000 active users who visit our website several times a week and spend more than £100 per shop.
The nature of our business brings a level of complexity unlike anything else in the retail world; indeed, many have claimed that grocery is the holy grail of the retail market – and data analytics is playing a big role in customer retention.
Managing this sea of customer data is no easy task. Although we initially used Hadoop, the rapid growth of our customer base meant we had to find something that could scale quickly, and with minimal maintenance overhead, so we refocused on BigQuery and the Google Cloud Platform.
When we made the decision to adopt the cloud, we had several end goals in mind. Firstly, we wanted to improve the customer experience and streamline the feature design and testing of our Ocado.com webshop. Secondly, we realised that a cloud-first strategy would truly empower our business teams to have greater insight into our merchandising operations. Finally, we were looking to improve the responsiveness of the Ocado.com webshop by using some of the intrinsic advantages of the cloud (i.e. elastic performance and storage).
The initial work was carried out by a small cross-functional team that included a product owner, software engineers, business analysts, and business users. Although the product owner was part of Ocado Technology, he had a horizontal line into the retail division to ensure that the solution remained cost-effective and viable for the business. The business user was expected to adopt the report to make the transition from initial proof of concept to production easier. Overall, the entire team needed to be agile because the technology was new and the final solution could change quickly along the way.
We timeboxed the first instance of the project and ran it in Kanban to deliver a proof of concept quickly and at minimal cost, hoping to demonstrate the value of the project early and then implement it at a larger scale.
Switching from Hadoop to a managed service such as BigQuery part way through the project brought a series of cost and performance improvements. We no longer needed to decide how many instances to bring up in the cluster or wait for it to instantiate; we simply ran our queries and paid for the IO. We also didn’t have to perform maintenance on the servers or the Hadoop distribution, because Google handled all of the uptime needs and backend patching. Most of all, though, we saw BigQuery outperform our Hadoop cluster by over 80 times on our largest dataset, for half the cost.
Scaling our cluster to match BigQuery’s performance would have been prohibitively expensive. A further side benefit was that the data in BigQuery was accessible from various other GCP services (e.g. Dataproc), so we could use it in multiple ways while storing only a single copy of the data.
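To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a Hadoop cluster's speed and cost both scale linearly with its size, which real clusters rarely manage. The only figures taken from our experience are the 80 times speed-up and the halved cost; the function name and the cost units are hypothetical.

```python
def relative_cost_to_match(speedup: float, managed_cost_ratio: float) -> float:
    """Estimate how many times more a scaled-up cluster would cost than the
    managed service when both deliver the same speed.

    speedup: how many times faster the managed service ran the workload.
    managed_cost_ratio: managed service cost as a fraction of the current
        cluster cost.
    Assumption (hypothetical): cluster speed and cost scale linearly with size.
    """
    if speedup <= 0 or managed_cost_ratio <= 0:
        raise ValueError("both ratios must be positive")
    # Growing the cluster by `speedup` multiplies its cost by the same factor,
    # measured in units of the current cluster cost.
    scaled_cluster_cost = speedup * 1.0
    return scaled_cluster_cost / managed_cost_ratio


# Figures from the text: over 80 times faster, for half the cost.
ratio = relative_cost_to_match(speedup=80, managed_cost_ratio=0.5)
print(f"A cluster scaled to match would cost roughly {ratio:.0f}x as much")
```

Even under this generous linear-scaling assumption, matching the managed service would mean paying on the order of 160 times as much for the same result.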
At the beginning, we accepted that the data came with lower quality assurance and zero SLAs. Once the proof of concept proved valuable, the pipelines were productionised. This is where we learned a great deal: the data teams weren’t actually best placed to productionise the pipelines, especially with respect to quality and governance. We then spent a lot of time figuring out who could best handle this task and found that the data-producing teams were best placed to do it.
The next phase was to involve the data-producing teams (in this case, the webshop) and have them build the pipelines using the technology platform we supported. This way, we made sure that producers had direct control over the quality, timeliness and meaning of their data. Organisationally, it also allowed new features being built by the webshop team at the request of the retail business to be considered in this pipeline, with the data department acting more as advisors.
To learn more about how Ocado Technology adopted BigQuery, please register for this webcast.
Looking back, this project allowed the data team to gain valuable insight into the process of moving to the cloud and, more specifically, of using the Google Cloud Platform for customer analytics. With the Ocado Smart Platform, we’re looking forward to replicating this success story and rolling it out to other grocery retailers, as well as continuing to use it ourselves.
Dan Nelson, Head of Data
Dan Nelson March 10th, 2017
Posted In: Blog
Welcome to the Ocado Technology Webinars, where you can hear from the people building the ground-breaking, game-changing technology that powers Ocado, the world’s largest online-only grocery retailer.
In this webinar Alex Howard Whitaker, cloud services engineer at Ocado Technology, talks about the challenges and opportunities involved in adopting a cloud-first strategy using Amazon AWS.
00:46: Starting from scratch means there are many choices and there is no right or wrong answer
01:08: Ocado Technology wanted to use AWS for existing systems and had to take an Agile approach to its implementation
01:39: Adopting a public cloud solution brought questions around security and performance
02:28: The development team looked at various cloud success stories, including Netflix
02:49: In order to ensure consistent adoption, the cloud teams created a set of best practices and guidelines
03:36: Ocado chose to use managed services offered by a service provider wherever possible to accelerate development without any downtime
04:08: The systems created to manage the cloud were also hosted in exactly the same way as the cloud applications themselves
04:34: The cloud teams evaluated both AWS and GCP and found the former better suited to front-end, operational services, while the latter was more focused on back-end, data analytical systems
06:08: Amazon AWS provides a myriad of services and there was a lot to learn about their individual characteristics
06:28: The first AWS implementation was relatively simple and straightforward
06:59: The second attempt joined the network hub account with a VPN back end
07:50: The third configuration aimed to decentralize various end points to improve access speed
08:44: The cloud team learned a lot from deploying live applications into the AWS configurations, particularly around data segregation, service limits and throttling
11:53: In the fourth version of the AWS implementation, the Ocado Technology team used the information gained from live deployment to create a more flexible configuration that could scale easily
12:19: The new architecture was based on microservices that used APIs to ensure abstraction of resources
13:00: Using access control and tagging to create better permissions for AWS applications
15:40: The architecture of our deployment process included an app registry, cloud provisioning, AMI build automation, cloud formation scripts and more
17:25: Using AWS Elastic Beanstalk, Ocado Technology deployed 250+ applications over a choice of stacks (Java, Python, NodeJS) and servers
17:52: Concluding remarks
You can keep up to date with the webinars by subscribing to our YouTube channel. This article provides clickable links that take you directly to the highlighted part of the video clip.
Alex Voica January 9th, 2017
Being the world’s largest online-only grocery supermarket with over 500,000 active customers means we get the opportunity to interact with people all across the UK on a daily basis. Ocado prides itself on offering the best customer service in the industry, which is one of the many reasons why our customers keep coming back.
Since Ocado doesn’t have physical stores, there are mainly two ways our customers and our employees interact directly. The first (and probably most common) is when our drivers deliver the groceries to the customers’ doorsteps; the second is when customers call or email us using our contact center based in the UK.
Today we’re going to tell you a bit more about how a customer contact center works and how Ocado is making it smarter.
On the surface, Ocado operates the kind of contact center most people are already familiar with; we provide several ways for our customers to get in touch, including social media, a UK landline number, and a contact email.
Customers can email, tweet or call Ocado
When it comes to emails, we get quite a variety of messages: from general feedback and redelivery requests to refund claims, payment or website issues – and even new product inquiries.
Getting in touch with a company can sometimes feel cumbersome. To make the whole process nice and easy for our customers, we don’t ask them to fill in any forms or self-categorise their emails. Instead, all messages get delivered into a centralised mailbox no matter what they contain.
Ocado customer service representatives filtering customer emails
However, a quick analysis of the classes of emails mentioned above reveals that not all of them should be treated with the same priority. In an old-fashioned contact centre, each email would be read and categorised by one of the customer service representatives and then passed on to the relevant department.
This model has a few major flaws: if the business starts scaling up quickly, customer service representatives may find it challenging to keep up, leading to longer delays which will anger customers. In addition, sifting through emails is a very repetitive task that often causes frustration for contact centre workers.
Clearly there must be a better way!
Unbeknownst to many, Ocado has a technology division of 1000+ developers, engineers, researchers and scientists working hard to build an optimal technology infrastructure that revolutionises the way people shop online. This division is called Ocado Technology and includes a data science team that constantly finds new ways to apply machine learning and AI techniques to improve the processes related to running retail operations and beyond.
After analysing the latest research on the topic, the data science team discovered that machine learning algorithms can be adapted to help customer centres cope with vast amounts of emails.
The diagram below shows how we created our AI-based software application that helps our customer service team sort through the emails they receive daily.
The new AI-enhanced contact centre at Ocado
One of the fields related to machine learning is natural language processing (NLP), a discipline that combines computer science, artificial intelligence, and computational linguistics to help computers process and understand human language. Let’s use an email from a recent customer as an example to understand how we’ve deployed machine learning and NLP in our contact centres:
The machine learning model identifies that the email contains general feedback and that the customer is happy
The software solution we’ve built parses the body of the email and creates tags that help contact centre workers determine the priority of each email. In our example, there is no immediate need for a representative to get in touch; the customer is satisfied with their order and has written a message thanking Ocado for their service.
We strive to deliver the best shopping experience for all our 500,000+ active customers. However, working in an omnichannel contact centre can be challenging, with the team receiving thousands of contacts each day via telephone, email, webchat, social media and SMS. The new software developed by the Ocado Technology data science team will help the contact centre filter inbound customer contacts faster, enabling a quicker response to our customers, which in turn will increase customer satisfaction levels. – Debbie Wilson, contact centre operations manager
In the case of a customer raising an issue about an order, the system detects that a representative needs to reply to the message urgently and therefore assigns the appropriate tag and colour code.
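The tagging step described above can be sketched roughly as follows. This is a minimal illustration rather than our production code: the `Prediction` class, the `triage` function, the set of urgent categories, and the colour names are all hypothetical, and the model's output is simply assumed to be a category plus a sentiment label.

```python
from dataclasses import dataclass

# Hypothetical set of categories that always need a prompt reply, loosely
# based on the kinds of email listed earlier in this post.
URGENT_CATEGORIES = {
    "refund claim",
    "redelivery request",
    "payment issue",
    "website issue",
}


@dataclass(frozen=True)
class Prediction:
    """Assumed output of a trained classifier for a single email."""

    category: str   # e.g. "general feedback" or "refund claim"
    sentiment: str  # "positive", "neutral" or "negative"


def triage(prediction: Prediction) -> dict:
    """Map a prediction to the tag and colour code shown to representatives."""
    if prediction.category in URGENT_CATEGORIES or prediction.sentiment == "negative":
        priority, colour = "urgent", "red"
    elif prediction.sentiment == "neutral":
        priority, colour = "normal", "amber"
    else:
        priority, colour = "low", "green"
    return {"category": prediction.category, "priority": priority, "colour": colour}


# A happy customer leaving general feedback needs no immediate reply.
print(triage(Prediction("general feedback", "positive")))
# A customer raising an issue about an order is flagged for an urgent reply.
print(triage(Prediction("refund claim", "negative")))
```

Keeping the triage rules separate from the model means the contact centre can adjust priorities without retraining anything.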
This new ML-enhanced contact centre demonstrates how Ocado is using the latest technologies to make online shopping better for everyone.
Ocado was able to successfully deploy this new product in record time as a result of the close collaboration between three departments: data science, contact centre systems, and quality and development. Working together allowed us to share data and update models quickly, which we could then deploy in a real-world environment. Unlike a scientific demonstration where you’re usually working with a known set of quantities, the contact centre provided a much more dynamic scenario, with new data arriving constantly. – Pawel Domagala, product owner, last mile systems
Our in-house team of data scientists (check out our job openings here) trained the machine learning model on a large set of past emails. During the research phase, the team compared different architectures to find a suitable solution: convolutional neural networks (CNNs), long short-term memory networks (LSTMs) and others. Once the software architecture was created, the models were implemented using the TensorFlow library and the Python programming language.
Python is the de facto standard programming language in the data science community and provides the simple syntax and expressiveness we were looking for.
TensorFlow is a popular open-source machine learning toolkit that scales from research to production. TensorFlow is built around data flow graphs that can easily be constructed in Python, but the underlying computation is handled in C++, which makes it extremely fast.
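For readers unfamiliar with the idea of a data flow graph, here is a tiny pure-Python illustration. It is not TensorFlow's API and does none of its optimisation; the `Node` class and helper functions are hypothetical names invented for this sketch. It only shows the general pattern of describing a computation as a graph first and running it with concrete inputs later.

```python
import math


class Node:
    """One operation in the graph; its inputs are other nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Evaluate upstream nodes first, then apply this node's operation.
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(feed, *args)


def placeholder(name):
    # A value that is supplied only when the graph is run.
    return Node(lambda feed: feed[name])


def constant(value):
    return Node(lambda feed: value)


def add(a, b):
    return Node(lambda feed, x, y: x + y, a, b)


def multiply(a, b):
    return Node(lambda feed, x, y: x * y, a, b)


def sigmoid(a):
    return Node(lambda feed, x: 1.0 / (1.0 + math.exp(-x)), a)


# Build the graph once: score = sigmoid(weight * feature + bias).
feature = placeholder("feature")
score = sigmoid(add(multiply(constant(2.0), feature), constant(-1.0)))

# Run it with different inputs without rebuilding it.
print(score.evaluate({"feature": 0.5}))  # 2.0 * 0.5 - 1.0 = 0.0, and sigmoid(0.0) = 0.5
print(score.evaluate({"feature": 3.0}))
```

Because the graph is plain data describing the computation, a library is free to hand the actual number crunching to a faster backend, which is the role C++ plays in TensorFlow.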
We’re thrilled that TensorFlow helped Ocado adapt and extend state-of-the-art machine learning techniques to communicate more responsively with their customers. With a combination of open-source TensorFlow and Google Cloud services, Ocado and other leading companies can develop and deploy advanced machine learning solutions more rapidly than ever before. – Zak Stone, Product Manager for TensorFlow on the Google Brain Team
Understanding natural language is a particularly hard problem for computers. To overcome this obstacle, data scientists need access to large amounts of computational resources and well-defined APIs for natural language processing. Thanks to the Google Cloud Platform, Ocado was able to use the power of cloud computing and train our models in parallel. Furthermore, Ocado has been an early adopter of Google Cloud Machine Learning (now available to all businesses in public beta) as well as the Cloud Natural Language API.
If you want to learn more about the technologies presented above, check out this presentation from Marcin Druzkowski, senior software engineer at Ocado Technology.
Make sure you also have a look at our Ocado Smart Platform for an overview of how Ocado is changing the game for online shopping and beyond.
Alex Voica October 13th, 2016