A Friday Fight and the Internet of BI
On BI Maximalism, the Open BI protocol and many other new terms nobody asked for.
Last Friday, for once, I decided to accept a routine challenge to a fight from Benn Stancil (a.k.a. the modern data bully). The topic? The future of BI. Benn had once again written a great piece on the latest trend in the dataverse: the "unbundling of BI". The BI of the 2000s included ingestion of data, storage, transformation, visualization and discovery. All of these problems needed to be solved so that the business could have a pretty dashboard showing the most important KPIs at a reasonable speed. But in the last decade or so, we have started stripping BI of its clothes, piece by piece, to find out what it is all about in the end. Ingestion of data? We've got Airbyte and Fivetran for that. Storage? Snowflake. Transformation? We have dbt now and some new kids on the block: the metrics layer (shout out to the folks at metriql). Discovery? We have Amundsen. Not to mention all the great competition for every single piece of the "old BI" closet.
The future of BI, as Benn puts it, is all about answering questions: "If you—an analyst, an executive, or any person in between—have a question about data, your BI tool should have the answer." I'm all in with Benn on this one. That is exactly what drove me to start my analytics startup Veezoo (with my colleague Till Haug and my brother Marcos): "to make access to information as simple as just asking for it". Like, literally asking for it.
Ok, so what is there to fight about? Bear with me still.
Besides limiting BI to the consumption of information, Benn argues that it should include all modes of consumption. Now, in an abstract sense, I agree with Benn on this one as well. Users don't want to think about where they will get their answers from. They want answers. However, what I believe is implied between the lines (and the further discussion on Twitter made it clearer) is that this tool will likely be a single product. The ultimate BI solution that unites data scientists and business users.
BI Maximalism
This got me thinking. Is that our final destiny? After we've stripped all of BI's clothes and have chosen, step by step, the right tool for each job. After we've done our dimensional modelling and our metrics modelling in a central place and avoided vendor lock-in. After we've created the "data OS". Did we do all of this so that we could have one and only one ultimate BI product that manages to cover all of our needs - from dashboards to reports to notebooks to alerts to ad-hoc exploration to NLQ (natural language querying) to automatic analysis of data to spreadsheets? One solution that caters to the needs of all these heterogeneous groups of users? From business users to data scientists?
To borrow a term from the crypto space: people there call this "maximalism" (as in Bitcoin maximalist), the belief that the trend is towards consolidation around a single solution that fits all our needs. So without intending to diminish any of the merits of such a position, I will from here on call this "BI maximalism". This, I would argue, is the status quo of the industry, mostly for practical reasons. Just try bringing a second BI solution into a company and you will hear well-argued and quite reasonable complaints about the costs of maintaining separate systems. And yes, tools targeted at data scientists are usually considered to be outside of the BI umbrella, which allows them to co-exist separately.
As I mentioned to Benn, here is a collection of possible disadvantages, as of today, of a multi-BI setup: inconsistent answers, high maintenance effort, creation of silos and fragmented collaboration. But I urge you to consider: are these problems inherent to the approach? If we end up having a "data OS" that centralizes our data definitions and resources, wouldn't we want separate data apps? Isn't that the trend we are following? And how far could we take this? Could we get around these theoretical caveats? What would it require?
But first, why would we want this?
Why not BI Maximalism?
A simple answer would be: it just seems more likely that we will get better individual solutions if these are developed by diverse sets of teams, not under the same vision and the same culture. That is the story of every David that went after a Goliath in the tech space. A highly motivated, missionary team that was so focused on a specific problem with a specific solution that it just kept on learning, iterating and innovating, ultimately outpacing its competition. There is a lot of learning involved in building a perfect data notebook, as there is in building an NLQ solution or a great dashboard interface.
And if we decrease the cost of innovation and lower the entry barrier for new data apps, we may experience a real BI Renaissance. I may not be well-informed, but I don't know of any successful standalone "data alerts" solution out there. (If you know one, just drop me an email or write a comment somewhere.) This is probably even on your BI checklist: "be able to create alerts on certain conditions in the data". And there go all the BI solutions out there, reinventing the wheel so that we can have data alerts. And all the data alerts startups have to expand into reporting just so they can have an MVP. This just seems like a lot of wasted potential.
I have recently read the thought-provoking book Radical Markets by Eric Posner and Glen Weyl, where they argue that property is a kind of monopoly. This outrageous idea is based on the fact that if you own something, you are the sole person to dictate how this thing can be used, which, as the authors argue, may lead to inefficiencies. Similarly, in the case of BI, the incumbent solution inside a company monopolizes how data gets consumed. Maybe it is not the best interface for the business user, but, hey, we’ve already spent so much money putting it in place and it just doesn't make sense to bring in product X just because its feature Y is better.
Ultimately, I believe that a more open, ecosystem-like BI approach would lead us to a more efficient, competitive and innovative market. And this view of BI, I will (hesitantly) call "the Internet of BI" (or “BI mesh”? “Micro BI”?).
Data OS, Data Apps and... the Internet of BI?
(This is the place where I try to bring another analogy that nobody asked for.)
So, let's say we have solved the problem of inconsistent answers. We have a data OS that abstracts away the access to the data, definitions, metrics, well-defined dimensions, that regulates how we use the underlying resources, etc. Well, this also decreases the cost of maintenance, since we only need to adapt things in one place using one language. Great! So what's missing for our multi-BI ecosystem panacea? We still have siloed analysis with fragmented collaboration.
This is where a protocol between solutions would come in handy to break through the walled gardens. Our own "TCP/IP" for a connected BI ecosystem. The Open BI protocol (working title).
What would this Open BI protocol (or Open BI standard) look like? I don't know yet for sure, I'm just the provoker here. But I can imagine some way of handing over an analysis to another solution. Would this involve SQL? Maybe. Maybe with some more meta information. Like "hey, this is an answer to a question, please add this to the dashboarding solution". Or: "Please create an alert on that". Or: "Please do some crazy fancy automatic analysis on it". We may just hand over answers and let the other side interpret them. You probably have some ideas to share by now. Just chime in with your 2 cents.
One thing I could see very easily happening here is agreeing on a standard representation of an "answer" (a "widget" or whatever you wanna call it, but I still prefer “answer”) and of a collection of “answers” (a "dashboard", a "report"?). So that we could - and I think I am with all the orphaned Chartio users on this one - swap dashboarding solutions as easily as we change clothes. (Sure, there is still all the transformation stuff that Chartio did, but we use dbt for that now.)
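To make the idea a bit more tangible, here is a minimal sketch of what a shared "answer" representation and a hand-over message could look like. Nothing here is an existing standard: the class names (`Answer`, `AnswerCollection`, `Handoff`), the field names and the intent strings are all my own assumptions, and JSON is just one plausible wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Answer:
    """One 'answer': a question plus what is needed to recompute it (hypothetical schema)."""
    question: str                       # the question as the user asked it
    sql: str                            # query that produces the answer (assumed payload)
    visualization: str = "table"        # rendering hint; the receiver is free to ignore it
    metrics: List[str] = field(default_factory=list)  # metrics-layer names, if any


@dataclass
class AnswerCollection:
    """An ordered collection of answers, e.g. a dashboard or a report."""
    title: str
    answers: List[Answer] = field(default_factory=list)


@dataclass
class Handoff:
    """Envelope one data app sends to another (hypothetical)."""
    intent: str                         # e.g. "add_to_dashboard", "create_alert"
    answer: Answer
    target: Optional[str] = None        # e.g. the name of a dashboard, if relevant


def to_wire(handoff: Handoff) -> str:
    """Serialize a hand-over to JSON; asdict() also converts the nested Answer."""
    return json.dumps(asdict(handoff), sort_keys=True)


def from_wire(payload: str) -> Handoff:
    """Rebuild a hand-over on the receiving side."""
    raw = json.loads(payload)
    return Handoff(
        intent=raw["intent"],
        answer=Answer(**raw["answer"]),
        target=raw.get("target"),
    )


if __name__ == "__main__":
    answer = Answer(
        question="What was revenue by month?",
        sql="SELECT month, SUM(amount) FROM orders GROUP BY month",
        visualization="line",
        metrics=["revenue"],
    )
    message = Handoff(intent="add_to_dashboard", answer=answer, target="Sales overview")
    print(to_wire(message))
```

The point of the envelope is that the sender only states an intent and hands over the answer; how a dashboarding or alerting tool interprets it stays entirely on the receiving side.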
Gartner BI leaders are reading this thinking: "yeah, right, no way we will implement this standard, vendor lock-in is our bread and butter". But well, maybe it's not for them to come up with this standard. Perhaps it's for the new generation of BI solutions of the dbt Era to take the first step. Even if you are not the new, fancy open-source solution out there, implementing an open standard (like this Open BI proposal, which may involve the underlying metrics layer definition) could already go a long way in convincing new prospects to adopt your solution without reluctance.
Ok. So are we done here? We can have solutions talking to other solutions, but is this enough? It still seems rather fragmented. And where do I go for the answers?
Introducing the BI Browser
(By now you may have realized that I have the tendency to introduce new terms… like the rest of our industry.)
So, hear me out, a single interface that combines all the interfaces together in one solution… Did we just go back to where we started? Not quite. The BI browser would implement the Open BI standard and offer a single point of access to the underlying data apps. It will probably need some kind of search on top, maybe a sidebar for the categories (e.g. datasets, reports, etc.) and an open marketplace to discover solutions you may be lacking, e.g. that data alert startup that didn't exist before. Unlike BI Maximalism, it’s not a BI solution, but an open platform.
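As a rough illustration of that "single point of access", here is a toy sketch of a BI browser as a registry of data apps. Everything in it is an assumption made for the example: the `DataApp` and `BIBrowser` classes, the idea that apps declare which hand-over intents they support, and the made-up app names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataApp:
    """A data app plugged into the browser (hypothetical descriptor)."""
    name: str
    category: str          # e.g. "dashboards", "alerts", "notebooks"
    intents: List[str]     # hand-over intents the app claims to support


@dataclass
class BIBrowser:
    """Toy registry: the browser owns no analysis features, it only points to apps."""
    apps: Dict[str, DataApp] = field(default_factory=dict)

    def register(self, app: DataApp) -> None:
        self.apps[app.name] = app

    def sidebar(self) -> Dict[str, List[str]]:
        """Group app names by category, as a sidebar might."""
        groups: Dict[str, List[str]] = {}
        for app in self.apps.values():
            groups.setdefault(app.category, []).append(app.name)
        return groups

    def search(self, term: str) -> List[str]:
        """Case-insensitive match on app name or category."""
        term = term.lower()
        return sorted(
            app.name
            for app in self.apps.values()
            if term in app.name.lower() or term in app.category.lower()
        )

    def route(self, intent: str) -> List[str]:
        """Names of the apps that could take over a hand-over with this intent."""
        return sorted(app.name for app in self.apps.values() if intent in app.intents)


if __name__ == "__main__":
    browser = BIBrowser()
    browser.register(DataApp("dashy", "dashboards", ["add_to_dashboard"]))
    browser.register(DataApp("alertly", "alerts", ["create_alert"]))
    print(browser.sidebar())
    print(browser.route("create_alert"))
```

An empty result from `route` is where the open marketplace would come in: no installed app supports that intent yet, so go and find one.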
Ok, a portal. You just invented the infamous portal of the corporate BI world. Yes, and it's actually not such a bad idea. It doesn't need to be clunky and owned by a single vendor like SAP. But people want to know where they should go if they want answers. Will data catalogs be our entry point? Maybe. Perhaps you've read the Minerva blog post from Airbnb. If not, take a few minutes to read it after you're done with my ramblings. Personally, I think data catalogs are too 'data-y', 'tables-y'. And I also don't fully agree with "business users talk about metrics, not 'tables'". Not that they talk about 'tables'. More that they often talk about other things apart from 'metrics'. There are many operational use cases for data that go beyond analyzing a certain metric and breaking it down by XYZ (or “Analytical Mad Libs”, if you will).
In any case, the "BI browser" would be our gateway to the answers we are seeking. You may be a data scientist or a business user, but that's where you will find your answer with the right interface for you.
And this is how we come back to where we started. This is when I realize that I don't fundamentally disagree with Benn - it's hard to disagree with him, even though we do see things differently on many topics (e.g. natural language interfaces and self-service). But these are topics for another post.
Personal, high-falutin' rants on data democratization, modern data stack and natural language interfaces by JP Monteiro.