Welcome to the MRMW NA 2019 Conference Series. Recorded live in Cincinnati, this series is bringing interviews straight to you from exhibitors and speakers at this year’s event. In this interview, host Jamin Brazil interviews Rudy Bublitz, Director of Digital Taxonomy.

Contact Rudy Online:



Digital Taxonomy


My guest is Rudy with Digital Taxonomy. DigitalTaxonomy.co.uk is the name of their website. It's a really interesting combination of AI and human judgment to transform unstructured text into actionable insights. I was impressed with their framework for deriving high-quality insights in what is really becoming a crowded space. I hope you'll check them out.


So, my guest today is Rudy with Digital Taxonomy.  Tell us a little bit about Digital Taxonomy.


We are a software provider. We have two products in our portfolio. We focus on verbatim coding within the survey research context, and that includes traditional verbatim coding where humans are employed. But our application tries to make the best use of natural language processing, text analytics, and sentiment analysis, as well as machine learning capabilities, to automate the things humans do. We don't want to replace humans; we just think we can optimize their performance by making the best use of these tools, which have been around for a while, and working them into a human interface. So, that's the real struggle.


So, talk to me a little bit about how you guys are different. There are a couple of different text analytics companies out there in the market research space.


A couple?


Yes.  [laughter]


And all the freebies and all the …


Right, and then you have AWS, you know what I mean, at scale.


Yeah, you can go out and write things in R, and you can do a lot of things on your own. But if you're going to create an interface that lets humans interact with those different technologies and control them, that's a bit harder. The human interface is really the most important piece, because you have to give a lot of people in an organization a way to use the software, not just specialists. A lot of the time, there's a roomful of three or four specialists who are in charge of text analytics and machine learning, and they go off and do their thing. We're trying to create an application that can be used by analysts, by coders, by DP staff, by end-customers, everyone.


What is one of your favorite projects that you guys worked on?


Oh, I've done a bunch of work. I'm kind of a Londonphile, and I've done a bunch of work on London hotels and restaurants, with massive numbers of reviews. So, I've come to know the restaurant and hotel industry in London quite well, which is very helpful. Many of these are very posh places. Using text analytics combined with a little bit of human assistance, I basically built my own Yelp for London hotels and restaurants. You pick the things that are important to you and the ratings of the restaurants, and I can show you selections in that category.


So, from a workflow perspective, how do companies interact with you?  Do they provide you their unstructured data in like a file or are you usually part of a quantitative study?  How does it…?


So, right now we license the software; an agency will license the software from us. The data can come in in any form. We have an open API as well, so if there's a consistent method for transporting data from tools like Decipher, Askia, SurveyMonkey, anywhere, we can set up an automated process so that it just flows in, in the evening. We also have the ability to code in surveys, so we can analyze an open-end as soon as it's entered in a survey and provide results directly back to the survey. So, most of that is automated, but there's definitely a file drag-and-drop as well; it's very simple. Then the agency or client will go ahead and use any of the tools, code the verbatims, and provide data for whatever is next: tables, reporting, and, more commonly today, visualizations, which is another really fun front for us.


Yeah, that’s super interesting.  Have you guys heard of a company called mTAB?


Yeah, mTAB is a wonderful package.  And I’ve spoken to them, yeah. You know it’s interesting to see these tabulation tools coming to the fore again.  Tabulation is kind of the dirty secret of market research. It’s how the data initially gets represented before it becomes the beautiful dashboard.  Yeah, they have quite a good product.


Yeah, they really do.  It seems like there’d be a really interesting partnership opportunity in that kind of a…


Yeah, and we’re open to that, absolutely.  You know with the API, for instance, you could plug our text analytics directly into a table.  And so, you click on a cell and here are the sentiment results for the people in that cell based on some open-end.  Could be done.


Yeah, it’s super interesting.  Well, if you’d like that connection, drop me a note and I’ll connect you to the CEO.  




So, we are live, obviously, on the floor of MRMW in Cincinnati, Day 2.  You guys are exhibiting. What do you think about the conference so far?


It's a good conference. It's a very high-tech conference. Most of the people here… The phrase "preaching to the choir" keeps hitting me, because it's going to be difficult… and I speak later today. I've altered my presentation based on Day 1; some points are just moot for this audience. They're a given, so I'd rather skip past them and talk about where we're thinking about going in the future. But it's a great conference: a lot of very smart people, a lot of great corporate involvement. I am a Cincinnatian; I'm very proud of the market research industry in Cincinnati. We have a few companies here that really impact the industry: Procter & Gamble, Burke Market Research, MarketVision, Directions; Kroger is here; Federated's here; and the old Jergens company, which is now Kao Research. We have a really long tradition in this city of consumer products: lots of soap, lots of candles, lots of shampoo.


Thank you, Procter & Gamble.


And food with Kroger.


Yeah, Kroger.  That’s right, absolutely right.  Massive brands.    


They're huge. And now Kroger's brought 84.51° here, the Dunnhumby component. Market research in this city is quite vibrant.


Your talk today, as you said, is going to look more toward the future. That's super interesting, especially in the context of text analytics. I like "right-now" technologies. Sometimes you can be too early, and sometimes you can be too late. Text analytics feels like the Goldilocks of the industry over the next probably three to five years. Video analytics is obviously trending as well, but I see text analytics as really the precursor to video analytics hitting scale, because you have to have that technology dialed in first. Whoever winds up being the standard in the text analytics space is going to have a big impact, I think, on the adoption of video analytics and how that data is consumed.


Yeah, I agree.


So, give us a little bit of highlight.  What are you thinking?


As we said, text analytics has become quite common; there are a lot of applications, and it's difficult to differentiate yourself on that alone. What we're thinking is this: in this industry, there are a lot of agencies; there are a lot of little studies; our samples are small. And we don't play together very well; agencies don't talk, and they don't share, even though many of us in this room have worked for multiple agencies that now compete.

So we're thinking: what if we could take a kinder, gentler approach and design a place where people could safely and securely pool their results, based on text analytics or more traditional machine learning, and make that available to the industry as a sort of consortium? How many code frames do we need on hotels and restaurants, and soap, and shampoo? There are probably hundreds, because each agency has its own. And yet they're collecting the same data, and they're reporting it in the same way. If we could pool the texts together, you might be able to go to a library somewhere and just pull down at least the core concepts in that category and start with a history of coded information, again securely protected. Maybe we could move the needle forward a little faster. And it wouldn't matter which text analytics tool or which machine learning tool you use. That's going to change. I mean, those tools are really sort of new in the last five years.

It's actually very old technology; I've been doing this for 20 years. But in the last five years, a lot of the big players have gotten involved, so it's going to get better. But philosophically, I think that if we were to work better together along those lines, it would push the needle forward faster than worrying about the actual bits and bytes of the technology.


This is a cool point you're making, and I actually see this at a quantitative level too. At a micro level, this is what happens inside of a survey: I recruit a panelist; the panel knows she's female; she comes into my survey; and, lo and behold, what's the first question? "What's your gender?" So all the sophistication that we've built out over the last 20 years with online data collection has really netted out to a lot of the same bad, redundant behaviors that had to exist… Actually, I'll tell you this: we were better off doing in-mall intercepts, because then I could sight-screen five questions, right, and I didn't have to ask you those. But now we've regressed into these protective shells. I think it's absolutely ridiculous, and so is the pain that we're causing at the panelist level.

What's interesting is that you've got Lucid and others who have been great at aggregating panels and getting them into survey platforms, but one of their big problems is that every panel company has a different definition of age, as an example. So you don't have a clear path, an API, a Triple-S structure, JSON or XML or whatever, that clearly defines what that categorization needs to look like so that you can then skip or auto-populate those questions. And whoever cracks that nut, by the way, wins the gold ring. I really believe this, and here's why: because then you can start taking things like unstructured social data, structuring it, and feeding that into the survey systems too. What you're talking about becomes very powerful… I mean, it's the tail that wags the dog in the industry, and it solves a lot of problems at a lot of levels inside the workflows.


I agree.  Age is great.  Age is just a number, by the way, Jamin.  


Well said, sir.  I’m knocking on 50, so I feel you.


I've got you beat. [laughter] Think about that in the context of open-ends. If I ask you "What do you like about your cellphone?" or "What do you like about your car?", you're going to say things similar to what everyone says. So why wouldn't I map the core driving competencies within that field, so that I can automatically categorize you, maybe by age and by which of those competencies you mentioned, and then structure the survey, changing what I ask you based on that? If I know that what you like about your cellphone is its size, then I can go in different directions. That's trickier because it's unstructured, but that's what we're saying. If we were to pool together all the times someone has asked the question "What do you like about your cellphone?" and come up with those driving categories, people could start from there. It would be much more proactive. You could certainly code it whenever you want, you know. That's why we say you have to have a human interface to the system. But I think we could push forward more quickly if we all played together.


You're right: the collaboration needs to exist. My guest today has been Rudy Bublitz, Digital Taxonomy. Rudy, if somebody wants to get in contact with you, how would they do that?


I would say email is best: Rudy@DigitalTaxonomy.co.uk. Yes, we have a London base. It's a long thing. Or just call me: 513-307-4925, day or night.


And we’ll, of course, leave that information in the show notes.  Rudy, thanks for being on the show today.  


My pleasure.  Thanks, Jamin.