Kunal Jain — June 10, 2015


With the growing usage of Hadoop, Datameer launched a custom big data analytics application to help people generate insights from data faster than ever before. Its benefits include a lightweight architecture, ease of integration, fast execution time and ease of use.

To understand their journey and product philosophy, we decided to connect with the management of Datameer. Fortunately, we got an opportunity to talk to Mr. Stefan Groschupf, Founder and CEO of Datameer.

Stefan was named one of the most “Innovative Germans Under 30” by Stern Magazine in 2003, and his company, Datameer, was voted one of the “World’s Most Innovative Companies” by Fast Company in 2013. We discuss the dynamic changes happening in the big data industry and how Datameer plans to innovate in this evolving field.

Below is the complete transcript of our conversation.

AV(Analytics Vidhya): Hi Stefan! Thanks for devoting your valuable time to this interview. To begin with, tell us about your journey from designing open source technologies to co-founding Datameer.

SG(Stefan Groschupf): When I was 16, one of the few movies you could see after the wall came down in East Germany was “Three Days of the Condor”, in which Robert Redford used a PDP-8 to analyze books. I always found that I couldn’t read enough books. I absolutely love reading and discovering new ways of thinking, new insights. The idea of being able to write software that could analyze text completely blew my mind. That’s why, at some point, I developed data classification and clustering algorithms.

I did quite a lot of work on open-source plugins for Eclipse and also worked on JBoss. That’s eventually how I started working on Nutch, the technology that spun out Hadoop. Again, I was fascinated early on by the creative process of creating functionality around data, specifically, text data. I worked on network word graphs, with early thesaurus datasets and with Weka, which is one of the first open-source data-mining frameworks.

Open source is a great way to learn new technologies but also enjoy the creative process.


AV: When you started working on open source technologies, managing big data was a rare thing. Not everyone thought of doing that. What were the challenges that you faced while building up this entity?

SG: We founded Datameer early when most investors and companies were thinking about infrastructure (e.g. Cloudera). We believe Hadoop is such a game changer that it requires an enterprise platform on top and that is where we can have the biggest impact to give power to self-service. We had the idea early when the market wasn’t quite ready for it but now we have first mover advantage where the market is exploding.

We have to be financially disciplined. We have only raised $37 million so far and our next competitor just billed that last year. I think there’s a lot of hype in this space for good reason. There are potentially huge returns on investment. I think that we have built a very solid company and we have phenomenal growth year-over-year. We finance that growth by having a great product that people are willing to pay for.

Overall, if there’s a great opportunity, Silicon Valley or the world of emerging technology will always try to capitalize on it. If you have such a unique shift in the market like big data or data in general, brought to the market, then it’s clear that there will be a number of start-ups that want to jump on that bandwagon. Guess what? Building a company is really hard. Not everyone will do it.


AV: What are Datameer’s current offerings? How have you positioned your products in the market?

SG: Companies understand the value of analyzing big data, but the real subject matter experts like doctors, marketers, or financial analysts need a way to access the data on their own without needing to rely on IT or a data scientist.

Datameer gives business users a self-service, end-to-end big data analytics tool that sits natively on Hadoop. This approach drastically reduces both the cost and the time it takes to get to insight. What once were complex algorithms are now buttons you can click that will “automagically” identify groups, relationships, patterns, and even build recommendations based on your data.

We’re built to run where, and how, you want so we can run on the cloud for business departments or on-premise for our onsite Enterprise edition.


AV: Since you have targeted multiple industries, how difficult is it to deal with data from multiple industries in a place where data privacy & security is the major concern of companies?

SG: As we don’t deliver packaged vertical applications, data privacy and security is an implementation detail specific to the industries, organizations and use cases where our product is deployed, not something we ship as a one-size-fits-all. However, as Datameer has robust, industry-leading capabilities to support data governance (including data privacy and security), organizations can easily implement their own policies, and the policies of the applicable regulators, by configuring Datameer according to their needs.

We are very serious about providing an enterprise-grade product for our customers and offer features such as encryption, anonymity, access control and auditing.


AV: How does Datameer challenge the frontiers of today’s big data analytics industry? Also, what challenges do you face in doing so?

SG: Traditional and legacy data warehouses and business intelligence systems are complex, expensive, and time-consuming to deploy. Hadoop brings a new way to store and analyze data: since it is linearly scalable on low-cost commodity hardware, it removes the limitation of storage and compute from the data analytics equation. Instead of pre-optimizing data as in the traditional ETL, data warehouse, and BI architecture, Hadoop stores all of the raw data and, with Datameer, applies all transformations and analytics on demand.

We’re also a self-service solution built for business users. Business users are the subject matter experts, and should be the ones working with the data directly. They shouldn’t have to go to IT every time they want to ask a new question.

Lastly, we’re an all-in-one tool for data integration, analytics and visualization. In traditional business intelligence, there are typically three different tools and three different teams involved. With these features combined, there is no other tool on the market that allows you to get from raw data to insight in such a short period of time.

The biggest challenge is the cultural shift that needs to happen for organizations to become truly data driven. IT has traditionally been the gatekeeper of information and they have been holding data, and the insights from that data, close to their chest. However, for businesses to truly find the benefits of big data, they have to democratize its access. It really needs to be the people who work with the data the most – the doctors, scientists, business analysts, etc. – that have access. It needs to be less about protecting the old way of doing things, just because that’s how it has always been done, and more about a new data management process.

There is also the learning curve that comes with the advent of any new technology. Businesses for decades have been using traditional and legacy BI technology and now there is an opportunity to transform the process and leverage Hadoop’s capabilities to get to new insights faster than ever. We have a lot of successful customers across a variety of vertical industries, such as Citi, Bank of America, AT&T, Vivint, American Airlines, and Cox Auto, just to name a few, that have seen instant results from their big data analytics. Our goal is to show other companies how they can find the same success.


AV: How do you envisage the growth of the Big Data & Internet of Things (IoT) industry across the world?

SG: I think that the IoT phenomenon has the potential to offer unlimited access to new data. It’s not the devices per se, but the ability to analyze the data that these devices capture. We’re already seeing this in motion — both on an individual level and on a larger, industrial level. For example, the U.S. Women’s Olympic Cycling team found their competitive advantage by recording and analyzing their physiological and psychological data. As a result, they went from a five-second deficit at the world championships to earning a Silver medal in the 2012 London Olympics by 8/100th of a second — a triumphant feat that was achieved not only through dedication and athletic ability, but also through enhancing training with insights gained from analyzing big data.

At the other end of the spectrum, a Fortune 500 energy company is collecting data about the average production of oil, gas, and water from each of its wells and combining it with historical well performance and geospatial data to look at efficiencies and deficiencies based on location and equipment. By extending this knowledge to non-IT users, like production engineers, the company was able to increase production by 60 percent and realized $100 million in incremental revenue.


AV: What is coming next from Datameer?

SG: We will continue to lower the barrier to entry for people to access insights from their data. I just took a look at the original drawing of the Datameer plans and they are 100 percent the same from when we started a few years ago. It was three boxes — data in, analytics and visualization. We still have a few things to build out, a few boxes we have to tick off that were in the original drawing, but we’re pretty close. We’re on the same path as we’ve always been on. That’s pretty cool, actually, compared to many other companies that had to find themselves and pivot many times. We’re still where we wanted to be.


AV: What would your advice be to aspiring young folks seeking to build their career in data analytics / big data analytics and other related domains?

SG: Become a domain expert. As you’re looking at your career choices, I think it’s important to understand what Moore’s Law and logarithmic growth of knowledge really means. For a Java programmer, it’s cool to know assembly, but no one is working in assembly anymore. We have entire school programs dedicated to data science. This is cool for the next few years, but it will go away and there will be completely different technology in the future. There will be cool open-source libraries that you just plug in and don’t need data scientists anymore. Become a domain expert.


We’d like to sincerely thank you for your valuable time and for sharing your uncommon experience with us. We are sure you will be an inspiration for many budding entrepreneurs.

About the Author

Kunal Jain

Kunal is a postgraduate from IIT Bombay in Aerospace Engineering. He has spent more than 10 years in the field of Data Science. His work experience ranges from mature markets like the UK to a developing market like India. During this period he has led teams of various sizes and has worked with various tools like SAS, SPSS, Qlikview, R, Python and Matlab.
