Christopher Robison, Lead Data Scientist, Overstock.com
Some time ago, I participated in a panel about data engineering and its use cases. Among the panel members were leaders in big data, data engineering, enterprise data warehousing and me, a humble data scientist. What brought us together? A common fascination with the changing state of data and our need to adapt quickly to keep up with that changing data landscape. Amazingly, even with varied backgrounds, perspectives, and experiences, we all arrived at one central conclusion: we need real-time data to stay competitive.
As I’ve explored this need for real-time data and implemented its uses in a nearly $2 billion enterprise like Overstock.com, I’ve seen how important it is to interact with users immediately and personalize in real time. In my work, I’ve discovered three important aspects that help build an organization that functions on real-time data: (1) tools, (2) people, and (3) insights.
As a data scientist, I’ve spent most of my career consuming data on various platforms, analyzing bulky batch queries in large relational databases, but as my career has progressed to using real-time data, my data consumption has also changed to a nuanced streaming infrastructure.
In an ever-changing landscape of catchy computation engines and a sea of NoSQL databases, it’s tough to tell which database or platform is the best or the right one for me, my team, or my organization. My fellow data gurus on the panel agreed: there is no silver bullet for this problem. While real-time diagnostics require the latest and greatest in cutting-edge marketing architecture, the majority of organizations will never move completely away from the need for a reliable, stable, relational database for post-hoc analytics and business strategy. In fact, the top four databases on both Stack Overflow’s developer survey and db-engines.com are all relational databases, suggesting a continued and present need for the traditional approach.
By combining a relational database with nuanced streaming infrastructure, organizations can create a customized marketing architecture to help enable real-time data use. There are several good organizations out there that can help build the architecture that is right for you, but the key is finding one that fits your organization, team, and goals.
After choosing the right tools, you need to have the right people with a good mix of skillsets, backgrounds, and views, just like the panel I sat on. This diversity leads to healthy debates around data governance and infrastructure, bringing maturity, sophistication and a sense of competitiveness to a discipline that is quickly evolving. Within this fast-moving landscape, the decisions we take lightly today can have an enormous influence on the tech debt and problems we face in the future.
Most important, just as systems need to work together, people and teams across an organization must collaborate. We have all heard that data scientists spend 80 percent of their time on non-data science tasks, centered around gathering and preparing data. By breaking down silos, you can shift that balance: build workflows that let your data scientists use the skills they were hired for, and rely on data engineers and ETL professionals to handle that other 80 percent.
You’ve chosen the right set of flexible and scalable tools for your organization and you have organized a team of strong, diverse, and intellectually curious individuals. What’s next?
You need to be able to drive actionable insights while the data is still relevant and reliable. You then need the ability to do post-hoc analysis on decisions and strategy, and to roll what you learn into future real-time decisions. For a large e-commerce company, this all comes down to driving customer expectations. By utilizing real-time technologies to personalize immediately and robust analytics to evaluate strategy, organizations can become leaders in setting customer expectations rather than playing catch-up to competitors.
Our efforts should result in technology and tools that empower our business partners to make decisions on our insights and act strategically based on decades of domain knowledge and recent verified data. We must remember that “cool” and “innovative” tech that goes unused is a waste of time and resources. In a recent article, Shant Hovsepian, co-founder and CTO of Arcadia Data and member of Forbes Technology Council, said, “Ultimately, the usefulness of streaming technologies will be measured by the businesses who depend on them for critical capabilities and use cases.”
Building an environment that is designed for real-time data usage is not done overnight. It takes a serious investment of time and resources, resources focused on the right people and tools and driven by actionable insights, but the benefit of that investment is crucial to future and immediate success.