PAW 2019 Podcast Series

PAW 2019 Conference Series – Tony Ayaz – Gemini Data Inc.

Welcome to the 2019 Predictive Analytics World (PAW) Conference Series. Recorded live in Las Vegas, this series is bringing interviews straight to you from exhibitors and speakers at this year’s event. In this interview, host Jamin Brazil interviews Tony Ayaz, CEO and Co-founder of Gemini Data Inc.

Find Tony Online:



Gemini Data Inc.


Tony, when did you start Gemini?


We started the company in 2015.


2015.  We’re in the Happy Market Research Podcast right now.  We’re at Predictive Analytics World and Marketing Analytics World and there’s lots of worlds in this particular conference. I think there’s like twelve.  Have you guys been to this conference before? 


This is actually our first time at this conference.  And for us, I think it’s a win because what we’re looking for is real users with real pain a little bit beyond the typical IT folks that we’re looking for.


Got it.  So, you’re based out of San Francisco.  You started the business in 2015; so, you’ve had some success obviously.  Tell me a little bit about what you guys do.


Sure. At Gemini Data, we help our customers with digital transformation initiatives. What I mean by that is we help customers achieve data availability. And data availability is a necessary requirement today if you’re really looking to do something significant on digital transformation, AI, or ML initiatives. And what we mean by that is that you have to have access to data and you have to make that data available. And there’s a lot of talk about out-of-the-box machine learning solutions and things that are out there in the market. But the reality is that if you’re doing complex things and trying to run your business, you need data diversity, and you only get that through data availability. And so, what we do is we leverage the customer’s existing investments in various data platforms: it could be in a CSV; it could be in a data lake. It doesn’t really matter to us. We have a method that we apply called Zero Copy Data Virtualization that actually accesses your data at the source without you having to move or copy that data or do the complex ETL processes that we’ve all been used to for the past two to three decades, which just simply don’t scale with AI.


Data diversity is a term I’ve never heard before, but it is my favorite one in this conference.  Diversity is something that we’ve… we’re becoming more and more aware, especially in the Bay Area, like Silicon Valley…  I’d say globally you’re seeing… The math is that if you have more diversity in your senior leadership team, then you have a better world view, which gives you an improved advantage in the marketplace, right?       




And what’s interesting is how you’re connecting that in with data.  It isn’t about a single… right? It’s about different types of data.  You mentioned CSV versus data lake, which are vastly different, like profoundly different.  Your system is able to ingest both of those? 


I wouldn’t say ingest; I’d say access those systems.


Got it.


We don’t want you to move or copy the data, but we allow you to access it in a unified way.


OK, cool. So that bypasses some PII?


Yes, it bypasses it in the sense that you’re giving access to people that should have access to it. So we follow the same protocols of data access that their role or authority would provide them. But we take it a step further by looking five years from now, when Zero Trust Networks are going to be deployed, which is a new, let’s call it, security protocol or methodology, which basically changes things versus where we’re at today. Today it’s perimeter defense: I’m going to put firewalls around things; I’m going to give you access to things you should have; and then when you’re no longer an employee, for example, I take you off. Think of this as more of a real-time basis of how you should have access and when you should have access, right as you, as a user, are using the system. Nobody has to manually set things up for you. The machine kind of knows what you should have access to, what you shouldn’t. It protects you. And this is something that’s far deeper and can evolve, but you can only do that by applying modern architectures that have been around less than five years, I would say, to go to this next level for security.


That’s really interesting.  You’ve been in the industry a long time.  What do you see as some big trends both from things that have evolved relatively recently in the last two to three years and then where we’re going in the next two to three years?


If I may go even a little past two to three years… So, 2005 is where I like to start; that’s when the evolution of Big Data started, right? It was after the dotcom crash, but then things were coming up. Big Data: grab all the data; that’s going to solve world hunger. It’s going to be awesome.


I actually think I saw that tweet.


Yeah, probably. Right. At the time, there was nothing wrong with that. That’s what you had to do. There was a whole bunch of data coming in. Nobody knew how to collect it. So the idea was to centralize all this data. Just grab it. And then there were a lot of successful companies that came through, one of which I had the pleasure of being at. It was called Splunk in the early days. We grabbed the data, brought it in, centralized it, and made it easy for people. Well, that was 2005, and at the same time, data lakes came out and the whole Hadoop and open source movement. Fast forward to 2013 or into current time, you’re dealing with data chaos. And what’s happening now is that everybody has actually collected everything you could imagine. I call it a messy filing cabinet. Imagine if you went to your filing cabinet and didn’t have proper files and you just shoved papers in there; every time you need to go look through the papers, you have to sift through them one by one. Now, think about the petabytes of data that’s out there.




That strategy does not work. If you’re just collecting it, you’re making it very hard to access. And so where we’re going tomorrow, meaning the industry, from an AI perspective is back to that point that AI needs data diversity, right? You need to make sure that you’re looking at all different data. So, if somebody tells you to move your data somewhere else and port it here or put it in the cloud, they’re doing you a disservice because we’re playing the same game again. You’re moving that data again, waiting to get access to it. What I think customers need today, if they’re thinking about AI, is this: I’ve made my investments, but I need to make the data easier to access. And the way we access that is we make it easy for you to apply standard [unclear] that’s been around for three decades, and you can use it across all these complex systems and bring the data together. Whether it’s in a CSV, a database, or a data lake, it doesn’t really matter; we’re giving you a uniform way to access it.
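
To make the idea of one uniform query language across a CSV file and a database concrete, here is a minimal sketch. It assumes the standard language is SQL (which Tony names later in the interview) and uses Python's built-in sqlite3 and csv modules. The table names, columns, and sample values are hypothetical, and the sketch loads the CSV rows into a temporary in-memory table, which is a simplification and not the zero-copy approach Gemini describes.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export, inlined so the sketch is self-contained.
CSV_TEXT = "customer_id,region\n1,West\n2,East\n3,West\n"


def build_connection():
    """Return a connection where a database table and a CSV are both queryable."""
    conn = sqlite3.connect(":memory:")

    # Hypothetical source 2: an existing database table of orders.
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (3, 40.0), (1, 60.0)],
    )

    # Expose the CSV as a table so the same SQL works against it.
    # Simplification: this copies the rows into memory; a real data
    # virtualization layer would read the file at its source instead.
    conn.execute("CREATE TEMP TABLE customers (customer_id INTEGER, region TEXT)")
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    rows = [(int(r["customer_id"]), r["region"]) for r in reader]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def revenue_by_region(conn):
    """Join the CSV-backed table with the database table in one SQL query."""
    query = """
        SELECT c.region, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_by_region(build_connection()))
```

The point of the sketch is only that the analyst writes one query and does not care which rows came from a file and which came from a database.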


And then, once it’s accessed, is there also a way to display and interact with it on the other side?


Absolutely. So, we have our interface where you can look at the data; we integrate it with a graph database, much like you can use LinkedIn to see your first- or second-degree contacts. Imagine if you could do that with your data. So we bring the data together; we allow you to see the relationships, which by itself provides significant value to customers because 51% of a data scientist’s dilemma is getting access to the right data set to apply machine learning. And, if you’re an analytics person, you’re relying on IT way too much to get that access. So we provide that as an option. We have other analytics capabilities on top of that. But the other thing that we do that’s interesting is, if you’ve invested in a Tableau or a Looker or a business intelligence tool of your choice, we don’t want to disrupt the business user. So, they want data diversity. So we actually can send that data into their BI tool of choice as well.
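
The LinkedIn comparison can be sketched as a breadth-first search that groups related data sets by how many hops away they are. This is only an illustration of first- and second-degree relationships in a graph; the data set names and links are hypothetical, and it says nothing about how Gemini's graph database is actually implemented.

```python
from collections import deque

# Hypothetical relationships between data sets (undirected links).
LINKS = {
    "patients": ["claims", "devices"],
    "claims": ["patients", "providers"],
    "devices": ["patients"],
    "providers": ["claims"],
}


def neighbors_by_degree(graph, start, max_degree=2):
    """Group nodes reachable from `start` by hop count, up to `max_degree`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_degree:
            continue  # do not expand past the requested degree
        for nxt in graph.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Collect nodes into {degree: set of names}, skipping the start node.
    result = {}
    for node, hops in distance.items():
        if hops > 0:
            result.setdefault(hops, set()).add(node)
    return result


if __name__ == "__main__":
    print(neighbors_by_degree(LINKS, "patients"))
```

Starting from the hypothetical "patients" data set, degree 1 holds the directly linked sets and degree 2 holds the sets linked through one intermediary.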


So you’re really fitting in like an API basically, or this middleware (I don’t know what the right framework is), but that allows a Rosetta Stone of sorts, right, where it’s able to then interpret that messy data structure and then…


Yeah, think of it as a… If you want to classify data management and data integration technologies that have been around for two, three decades, we’re now at a point where people are trying to apply them to AI, which basically means there’s a lot of consulting and ETL and time and preparation and people needed to do that. With the amount of data that is being spit out and what you need to do with AI, that doesn’t scale. So, we’re bringing a modern approach from a cloud perspective on how you can access data at the source, not move it, and accelerate the analytics process.


Oh, that’s huge, that’s huge.  That speed-to-insight is what’s king right now.  


Exactly. And to your question about the industry, there have been recent acquisitions with Tableau and everything that’s happened. In our opinion, that’s kind of validated the need in the market. Now look, if I’m Salesforce and I have the large, diverse data sets and I need to integrate them together and bring Tableau into that, that’s a fantastic purchase. But what if you’re not ready to make that migration to the cloud? What if your data is on premise? What if you don’t want to move it around? Customers need to leverage those systems and bring that power to them. But, in reality, what people also have to think about is, how am I going to make it easier for my business users, who are not technical, to get access to that data? And that’s why we really rely on SQL, or we make it a graph interaction with the data so everybody can understand it. We all know the challenges of hiring technical talent.


War on talent for like the last three years and getting bloodier.  It’s just mind-blowing what’s happening right now on that front.  




So, who’s your ideal customer?


Starting from the top, I would probably say a Chief Data Officer or anybody in that function or reporting within that group.  Below that, I would say Head of Analytics or business intelligence leaders. And then I would say a layer below that, it would be data engineers that are sometimes tasked with getting this and we can provide tremendous automation for those folks as well.


Got it.  Favorite customer story?


Favorite customer story I would say is in the health care industry.  I’m not at liberty to mention the customer but… 


We never are if it’s pharma or health care.  It’s always top secret.  


It’s a health care company, but the best quote I heard from the Chief Technology Officer was, “Hey, you guys bring the best of both worlds to me.” He goes, “I have my data governance people here that are always telling me how to protect the data and make sure that we don’t violate any compliance issues or things like that. Then I have my Chief Research side that’s always looking at cutting-edge, innovative things. And they’re supposed to look at how AI can improve things. And you guys are bringing the best of both worlds.” And what he meant by that was we’re moving into this data-sharing economy where, let’s say, you’re a researcher for cancer and you have a cure for some ailment of cancer; and I’m a numbers-cruncher, but I have all this medical device data showing that when we applied this medication we could solve that problem; and a third party may be a health care provider trying to see how many patients are accessing that, or how it could improve the lives of people or reduce insurance rates or whatever it may be. This is all stored in different areas. If we could actually share our data in the context that we had together, those are tremendous things that we can solve together. And that’s why I really like the health care side, where we’re giving them access to so many different data sources that can have profound effects on the greater good and on the health and well-being aspects of life that we can hopefully provide for our customers.


That is super powerful.  I love that story. That just crystallizes the importance of the work that you guys are doing.  


Exactly.  Thank you.


If someone wants to get in contact with you or Gemini Data, how would they do that?


You go to our website or contact  I’m always accessible, so  Yeah, it’s very easy to talk to us.


Yeah, perfect.  Tony, thanks so much for being on the Happy Market Research Podcast.


Thank you for having us.


Everybody else that’s listening to this show, please take the time to screenshot it.  This is my – sorry to the other guests – favorite episode so far. The data-sharing part I thought was really interesting that Tony brought up.  The data-sharing economy: that is such a powerful framework for us to start understanding how we’re going to make better decisions as we incorporate more data diversity in those.  So definitely take the time, screenshot this. Hope you tag us on Twitter, LinkedIn, whatever your social media platform of choice is. Thanks so much for all the support. Have a great rest of your day.