
19 Data Science Tools for people who aren’t so good at Programming



Programming is an integral part of data science. Among other things, it is widely held that a mind which understands programming logic, loops and functions has a higher chance of becoming a successful data scientist. So, what about people who never studied programming in school or college?

Are they doomed to have an unsuccessful career in data science?

I’m sure there are countless people who want to enter the data science domain but don’t understand coding very well. In fact, I too was a member of your non-programming league until I joined my first job. Therefore, I understand how terrible it feels when something you have never learnt haunts you at every step.

The good news is, I found a way! Rather, I’ve found 19 ways in which you can ignite your appetite to learn data science without coding. These tools typically obviate the programming aspect and provide a user-friendly GUI (Graphical User Interface) so that anyone with minimal knowledge of algorithms can simply use them to build predictive models.

Many companies (especially startups) have recently launched GUI-driven data science tools. I’ve covered most of the tools available in the industry today. Also, I’ve added some videos to enhance your learning experience.

Note: All the information provided is gathered from open-source information sources. We are just presenting facts, not opinions. In no manner do we intend to promote or advertise any of the products or services.



List of Tools

1. RapidMiner

RapidMiner (RM) was originally started in 2006 as an open-source stand-alone software named Rapid-I. Over the years, it was renamed RapidMiner and has attained ~35Mn USD in funding. Older versions of the tool (below v6) are open-source, but the latest versions come with a 14-day trial period and must be licensed after that.

RM covers the entire life-cycle of predictive modeling, starting from data preparation to model building and finally validation and deployment. The GUI is based on a block-diagram approach, something very similar to MATLAB Simulink. There are predefined blocks which act as plug-and-play devices. You just have to connect them in the right manner and a large variety of algorithms can be run without a single line of code. On top of this, they allow custom R and Python scripts to be integrated into the system.

Their current product offerings include the following:

  1. RapidMiner Studio: A stand-alone software which can be used for data preparation, visualization and statistical modeling
  2. RapidMiner Server: It is an enterprise-grade environment with central repositories which allow easy team work, project management and model deployment
  3. RapidMiner Radoop: Implements big-data analytics capabilities centered around Hadoop
  4. RapidMiner Cloud: A cloud-based repository which allows easy sharing of information among various devices

RM is currently being used in various industries including automotive, banking, insurance, life sciences, manufacturing, oil and gas, retail, telecommunication and utilities.


2. DataRobot

DataRobot (DR) is a highly automated machine learning platform built by some of the all-time best Kagglers, including Jeremy Achin, Thomas DeGodoy and Owen Zhang. Their platform claims to have obviated the need for data scientists. This is evident from a phrase on their website: “Data science requires math and stats aptitude, programming skills, and business knowledge. With DataRobot, you bring the business knowledge and data, and our cutting-edge automation takes care of the rest.”

DR claims the following benefits:

  • Model Optimization
    • Platform automatically detects the best data pre-processing and feature engineering by employing text mining, variable type detection, encoding, imputation, scaling, transformation, etc.
    • Hyper-parameters are automatically chosen depending on the error-metric and the validation set score
  • Parallel Processing
    • Computation is divided over thousands of multi-core servers
    • Uses distributed algorithms to scale to large data sets
  • Deployment
    • Easy deployment facilities with just a few clicks (no need to write any new code)
  • For Software Engineers
    • Python SDK and APIs available for quick integration of models into tools and software.
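
To make the hyper-parameter point above a little more concrete, here is a minimal sketch of what choosing a hyper-parameter from the validation set score can look like. This is not DataRobot’s code or API. The k-nearest-neighbour model, the toy data and names such as `select_k` are assumptions made purely for illustration.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)

def validation_mse(train, valid, k):
    """Mean squared error of the k-NN model on the validation set."""
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in valid)

def select_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error, plus all scores."""
    scores = {k: validation_mse(train, valid, k) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Toy data following y = 2x: training points on even x, validation points on odd x.
train = [(x, 2.0 * x) for x in range(0, 20, 2)]
valid = [(x, 2.0 * x) for x in range(1, 19, 2)]

best_k, scores = select_k(train, valid, candidates=[1, 2, 5])
print(best_k)  # → 2
```

On this toy data k = 2 wins because the two nearest training points sit symmetrically around every validation point. An automated platform would run the same kind of loop over many more models and settings.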

With funding of ~60Mn USD and more than 100 employees, DR looks in good shape for the future.


3. BigML

BigML is another platform with ~Mn USD in funding. It provides a good GUI which takes the user through 6 steps, as follows:

  • Sources: use various sources of information
  • Datasets: use the defined sources to create a dataset
  • Models: make predictive models
  • Predictions: generate predictions based on the model
  • Ensembles: create ensembles of various models
  • Evaluation: verify the model against validation sets

These steps will obviously be iterated in different orders. The BigML platform provides nice visualization of results and has algorithms for solving classification, regression, clustering, anomaly detection and association discovery problems. You can get a feel for how their interface works from their YouTube channel.


4. Google Cloud Prediction API


The Google Cloud Prediction API offers RESTful APIs for building machine learning models for Android applications. This platform is specifically for mobile applications based on Android OS. Some of the use cases include:

  • Recommendation Engine: Given a user’s past viewing habits, predict what other movies or products a user might like.
  • Spam Detection: Categorize emails as spam or non-spam.
  • Sentiment Analysis: Analyze posted comments about your product to determine whether they have a positive or negative tone.
  • Purchase Prediction: Guess how much a user might spend on a given day, given his spending history.

Though the API can be used by any system, there are also specific Google API client libraries built for better performance and security. These exist for various programming languages: Python, Go, Java, JavaScript, .NET, NodeJS, Obj-C, PHP and Ruby.
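
The Prediction API itself is called over REST, and its request format is not reproduced here. As a rough local stand-in for the spam detection use case listed above, the sketch below trains a tiny naive Bayes classifier in plain Python. The training sentences and the function names are made up for illustration, and nothing in it calls a Google service.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in word_counts:
        # Start from the prior probability of the label.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(examples)

print(classify("free prize money", *model))     # → spam
print(classify("monday team meeting", *model))  # → ham
```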


5. Paxata

Paxata is one of the few organizations which focus on data cleaning and preparation, NOT the machine learning or statistical modeling part. It is an MS Excel-like application that is easy to use, with visual guidance making it easy to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams. Like others mentioned here, Paxata eliminates coding or scripting, thereby overcoming the technical barriers involved in handling data.

The Paxata platform follows this process:

  1. Add Data: use a wide range of sources to acquire data
  2. Explore: perform data exploration using powerful visuals allowing the user to easily identify gaps in data
  3. Clean+Change: perform data cleaning using steps like imputation, normalization of similar values using NLP, detecting duplicates
  4. Shape: make pivots on data, perform grouping and aggregation
  5. Share+Govern: allows sharing and collaborating across teams with strong authentication and authorization in place
  6. Combine: a proprietary technology called SmartFusion allows combining data frames with 1 click as it automatically detects the best combination possible; multiple data sets can be combined into a single AnswerSet
  7. BI Tools: allows easy visualization of the final AnswerSet in commonly used BI tools; also allows easy iterations between data preprocessing and visualization
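
The Clean+Change step above can be sketched in plain Python. This is only an illustration of imputation, normalization of similar values and duplicate detection on a hypothetical customer table. Paxata does this through its GUI, and the simple lower-casing used below is just a placeholder for its NLP-based matching.

```python
from statistics import mean

def normalize_text(value):
    """Collapse case and extra whitespace so that similar values match."""
    return " ".join(value.lower().split())

def clean(rows):
    """Normalize text fields, impute missing ages with the mean, drop duplicates."""
    # The fill value is computed over all raw rows that have an age.
    known_ages = [row["age"] for row in rows if row["age"] is not None]
    fill_age = mean(known_ages)
    seen, cleaned = set(), []
    for row in rows:
        fixed = {
            "name": normalize_text(row["name"]),
            "city": normalize_text(row["city"]),
            "age": row["age"] if row["age"] is not None else fill_age,
        }
        key = (fixed["name"], fixed["city"])
        if key not in seen:  # keep only the first record per (name, city)
            seen.add(key)
            cleaned.append(fixed)
    return cleaned

# Hypothetical raw records with a near-duplicate and a missing age.
rows = [
    {"name": "Asha Rao", "city": "New  Delhi", "age": 30},
    {"name": "asha rao", "city": "new delhi ", "age": 30},
    {"name": "Ben Lee", "city": "Mumbai", "age": None},
    {"name": "Cara Diaz", "city": "Pune", "age": 60},
]

for row in clean(rows):
    print(row)
```

The second record collapses into the first after normalization, and the missing age is filled with the mean of the known ages.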

With funding of ~25Mn USD, Paxata has set its foot in the financial services, consumer goods and networking domains. It might be a good tool to use if your work requires extensive data cleaning.


6. Trifacta

Trifacta is another startup focussed on data preparation. It has 2 product offerings:

  • Wrangler – a free stand-alone software
  • Wrangler Enterprise – licensed professional version

Trifacta offers a very intuitive GUI for performing data cleaning. It takes data as input and provides a summary with various statistics by column. Also, for each column it automatically recommends some transformations which can be selected using a single click. Various transformations can be performed on the data using some pre-defined functions which can be called easily in the interface.

The Trifacta platform uses the following steps of data preparation:

  1. Discovering: this involves getting a first look at the data and distributions to get a quick sense of what you have
  2. Structuring: this involves assigning proper shape and variable types to the data and resolving anomalies
  3. Cleaning: this step includes processes like imputation, text standardization, etc. which are required to make the data model-ready
  4. Enriching: this step helps in improving the quality of analysis that can be done by either adding data from more sources or performing some feature engineering on existing data
  5. Validating: this step performs final sense checks on the data
  6. Publishing: finally the data is exported for further use
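
The per-column summary that drives the Discovering step can be approximated in a few lines of plain Python. This is a sketch under my own assumptions (a table held as a list of dictionaries with string values, and hypothetical column names), not Trifacta’s implementation.

```python
from statistics import mean

def profile_column(values):
    """Summarize one column: missing count, inferred type and basic statistics."""
    present = [v for v in values if v not in ("", None)]
    summary = {"count": len(values), "missing": len(values) - len(present)}
    if not present:
        summary["type"] = "empty"
        return summary
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one value is not a number, so treat the column as text.
        summary["type"] = "text"
        summary["distinct"] = len(set(present))
    else:
        summary["type"] = "numeric"
        summary.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return summary

def profile_table(rows):
    """Profile every column of a table given as a list of dictionaries."""
    columns = rows[0].keys()
    return {name: profile_column([row[name] for row in rows]) for name in columns}

# Hypothetical raw table, as it might arrive from a CSV file.
rows = [
    {"city": "Pune", "sales": "120"},
    {"city": "Delhi", "sales": ""},
    {"city": "Pune", "sales": "80"},
]

for name, summary in profile_table(rows).items():
    print(name, summary)
```

A profile like this is what makes it possible to suggest transformations per column, for example imputing the one missing `sales` value.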

With ~75Mn USD in funding, Trifacta is currently being used in the financial, life sciences and telecommunication industries.


7. Narrative Science

Narrative Science is based on a unique idea: it generates automated reports from data. It works like a data story-telling tool which uses advanced natural language processing to create reports. The output is something similar to a consulting report.

Some of the features of this platform include:

  • incorporates specific statistics and past data of the organization
  • makes use of the benchmarks, drivers and trends of the specific domain
  • helps generate personalized reports targeted to a specific audience

With ~30Mn USD in funding, Narrative Science is currently being used in financial, insurance, government and e-commerce domains. Some of its customers include American Century Investments, PayScale, MasterCard, Forbes, Deloitte, etc.

Having discussed some startups in this domain, let’s move on to some of the academic initiatives which are trying to automate some aspects of data science. These have the potential to turn into successful enterprises in the future.


8. MLBase

MLBase is an open-source project developed by AMP (Algorithms Machines People) Lab at University of California, Berkeley. The core idea is to provide an easy solution for applying machine learning to large scale problems.

It has 3 offerings:

  1. MLlib: It works as the core distributed ML library in Apache Spark. It was originally developed as part of the MLBase project, but now the Spark community supports it
  2. MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions.
  3. ML Optimizer: This layer aims to automate the task of ML pipeline construction. The optimizer solves a search problem over feature extractors and ML algorithms included in MLI and MLlib.
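
The idea of a search over feature extractors and ML algorithms can be shown with a toy example. The sketch below is not the MLI or MLlib API. The two feature extractors, the two learners and all names are hypothetical stand-ins, and the search is a plain exhaustive loop scored on a validation set.

```python
from itertools import product
from statistics import mean

# Hypothetical feature extractors: each maps a raw input to one numeric feature.
FEATURES = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
}

def fit_mean(xs, ys):
    """Baseline learner: always predict the training mean."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """Least-squares line on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return lambda x: slope * (x - x_bar) + y_bar

LEARNERS = {"mean": fit_mean, "line": fit_line}

def search(train, valid):
    """Try every (feature extractor, learner) pair and score it on validation data."""
    results = {}
    for (f_name, feature), (l_name, learner) in product(FEATURES.items(), LEARNERS.items()):
        model = learner([feature(x) for x, _ in train], [y for _, y in train])
        errors = [(model(feature(x)) - y) ** 2 for x, y in valid]
        results[(f_name, l_name)] = mean(errors)
    best = min(results, key=results.get)
    return best, results

# Toy data generated from y = x^2 + 1.
train = [(x, x * x + 1) for x in [-3, -2, -1, 0, 1, 2, 3]]
valid = [(x, x * x + 1) for x in [-2.5, 0.5, 2.5]]

best, results = search(train, valid)
print(best)  # → ('square', 'line')
```

A real optimizer would prune this search rather than try every combination, but the objective is the same: pick the pipeline with the best validation score.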

This undertaking is still under active development and we should hear about the developments in the near future.



9. Weka

Weka is a data mining software written in Java, developed at the Machine Learning Group at the University of Waikato, New Zealand. It is a GUI-based tool which is very good for beginners in data science, and the best part is that it is open-source. You can learn about it using the MOOC offered by the University of Waikato here. You can learn more about it in this article.

Though Weka is currently used more in the academic community, it might be a stepping stone to something big in the future.


10. Automatic Statistician


Automatic Statistician is not a product per se but a research organization which is creating a data exploration and analysis tool. It can take in various kinds of data and use natural language processing to generate a detailed report. It is being developed by researchers who have worked at Cambridge and MIT, and it won Google’s Focused Research Award with a prize of $750,000. Though it is still under development and very little information is available about the project, it looks like it is being backed by Google. You can find some information here.


More Tools

I have discussed a selected set of 10 examples above but there are many more like these. I’ll briefly name a few of them here and you can explore further if this isn’t enough to whet your appetite:

  • MarketSwitch – This tool is more focussed on optimization rather than predictive analytics
  • algorithms.io – This tool works in the domain of IoT (Internet of Things) and performs analytics on connected devices
  • wise.io – This tool is focussed on customer handling and ticket system analytics
  • Predixion – This is another tool which works on data collected from connected devices
  • Logical Glue – Another GUI based machine learning platform which works from raw data to deployment
  • Pure Predictive – This tool uses a patented Artificial Intelligence system which obviates the part of data preparation and model tuning; it uses AI to combine 1000s of models into what they call “supermodels”
  • DataRPM – Another tool for making predictive models using a GUI and no coding requirements
  • ForecastThis – Another proprietary technology focussed on machine learning using a GUI
  • FeatureLab – It allows easy predictive modeling and deployment using GUI

If you’re hearing these names for the first time, you’ll be surprised (like I was :D) that so many tools exist. The good thing is that they haven’t had a disruptive impact as of now. The real question is: will these technologies achieve their goals? Only time will tell!


End Notes

In this article, we have discussed various initiatives working towards automating various aspects of solving a data science problem. Some of them are at a nascent research stage, some are open-source, and others are being used in the industry with millions in funding. All of these pose a potential threat to the job of a data scientist, which is expected to grow in the near future. These tools are best suited for people who abhor programming and coding.

Do you know any other startups or initiatives working in this domain? Please feel free to drop a comment below and enlighten us!




  • Mayur P says:

    Hi… How about Tibco Spotfire? Is it good to learn and follow?

    • Aarshay Jain says:

      Hi.. I’m not sure about Tibco Spotfire. I just had a quick look, and the first impression is good. I recommend searching for people on LinkedIn or Quora who are using the tool and getting feedback from them.

  • Kannan Chandrasekaran says:

    Why not Azure ML? It doesn’t require a lot of coding effort; it is a cloud-based machine learning platform. Drag and drop to create the analytics/machine learning model.

    We can create a workspace in Azure Machine Learning Studio; it is free for exploration. Just a Live/Hotmail/Outlook account is required.

    • Aarshay Jain says:

      Thanks for pointing that out. Yes, AzureML is another one. As mentioned in the article, it’s not an exhaustive list.

  • Sharddha says:

    So the question to be asked here is: why are companies still asking only for R, SAS or Python skills? Why are they not using the GUI tools?

    • Aarshay Jain says:

      I agree the penetration is low. I don’t think these tools are meant for the IT industry. They will infiltrate domains like healthcare and finance, which are still dominated by domain experts.
      Another thing is that although the tools are there, they have not reached their full potential. I have downloaded the free versions wherever possible and found that there is a lot of scope for improvement. But some of these are backed by big funding, and they can grow over time and enhance their scope.

      • Brijesh J says:

        That’s actually a very good question. My guess is that it has more to do with reluctance to adopt any technologies (read software) and be more open to the general concept and what it can do. My experience in Silicon Valley is that most people are still fumbling around and trying out new things. The dust has not yet settled, and when it settles, Python will still be around, along with a ‘few’ of these software tools.

        • Aarshay Jain says:

          Thanks Brijesh for some first hand experience.. Totally agree with your thoughts..

      • Wasifur says:

        Even in domain-intensive fields like quantitative finance, Python is heavily used in pricing and risk calculation APIs for highly customized financial products. This flexibility is not available in the drag-and-drop tools.
        As far as SAS is concerned, it has 200+ horizontal and vertical solutions which require no programming, for example Customer Intelligence, Cyber Security, Credit Risk, AML, etc. SAS also has several drag-and-drop tools like E-Miner, Visual Analytics, etc. Most of the world’s largest commercial banks use E-Miner for credit scoring.

  • Himanshu says:

    The post is nice but a bit demotivating in the sense that the field of data science has also started moving towards automation 🙁
    Will the boom in this field die down in the near future?
    Please throw some light!

    • Aarshay Jain says:

      A pertinent question, but something that only time can tell. Nothing is eternal, but if you keep your eyes open and you’re ready to up your game as and when required, you should do fine 🙂

  • Asesh Datta says:

    Hi Aarshay,
    Compliments on the kind of research you have done to compile this list. I am one of those who do not have much knowledge of current-day programming, so I was fascinated by the idea of using at least one of those top ten. Please let me know how these tools are used in the Indian job scenario. I know data analysis is the sexiest job of the decade starting 2016.
    I appreciate the efforts of startups to come up with various types of data analysis tools which do not require programming skills. Is there any training requirement to learn and use these tools?
    Appreciate your response.

    • Aarshay Jain says:

      Thanks Asesh. A very good question indeed. Unfortunately, I am not really the right person to answer these queries. I recommend searching for specific tools on LinkedIn and you’ll get people working on these. 🙂

  • Good compilation Aarshay. Thank you. Another one to add to the list is IBM Watson. It provides exploration, refinement and prediction in a drag & drop kind of mode without the requirement for coding. Though the list of ML algorithms is limited right now in the Watson Cloud version I think it will grow in the future.

    • Aarshay Jain says:

      Thanks Karthikeyan! I didn’t know IBM Watson also has such offerings. I’m sure IBM Watson will grow it further. Will check it out for sure 🙂

  • James says:

    Gosh, should I still stick to R?

    I know the tools are scattered all over the industry. Do you have a chart that shows the 4 quadrants of leading products?

    Honestly, each tool seems to duplicate the functions of the others, each trying to outbid the rest?

    Perhaps we should stick to the big names that can fund products, like Google and Microsoft?

    • Aarshay Jain says:

      I’m sorry I don’t have such a chart. I do, however, have some thoughts:
      1. I would not recommend leaving R. R would provide a good base and then using these tools will be just a matter of time.
      2. Not all tools are exactly the same. Some are differentiated on specific use cases. Some are more prominent in a few particular industries. So unless you are focussed on an industry and you are confident that a particular tool is being used there, I don’t think learning any one tool would help.
      3. I have downloaded the free versions of RapidMiner and Trifacta. My general perception is that these have a learning curve involved. You have to get used to the interface and identify its merits and demerits.

      Hope this helps!

      • Shivendra Sharma says:


        Leaving R will be suicidal, considering that R is the ‘lingua franca’ of data science. It has only grown and will keep evolving and retain its position as the standard tool for data science. Python has just entered the realm, but it will never replace R, if only because it is a general-purpose programming language.

        As far as the GUI tools are concerned, they have very limited usage but are enjoyable to use. So I’m in agreement with Aarshay here. They might seem very easy to use, but when you start getting deep inside, GUI programs can become quite difficult. SPSS was one such case where I actually found R to be cleaner and more straightforward to use, honestly.

        • Shivendra Sharma says:

          By the way Aarshay, I would suggest Rcommander as an addition, but only for those who are experienced in R. It gives a fresh new option to R users who otherwise work 100% on the command line. It looks somewhat like Excel, and the best part is that executing every command in Rcommander shows the user the exact code involved. So it’s a really nice way to learn R’s nuts and bolts.

          • Aarshay Jain says:

            Thanks for sharing your thoughts, Shivendra. I completely agree with you on this. I’ve never used Rcommander, but it should be a good option since you’ve used it 🙂

  • Hi Aarshay, thanks for a great article. I would add Automatic Business Modeler to this list. Automatic Business Modeler provides full automation of the essential, yet time-consuming activities in the predictive modeling process. These include: fast variable selection, finding interactions between variables, transformations of variables, and best model selection. The system requires no programming skills or advanced knowledge of model construction.

    • Aarshay Jain says:

      Thanks for the information. Yeah, sure, it’s another good tool. This is just a list of a few tools to get the idea across that such tools exist. ABM would be another one 🙂

  • Tami C says:

    Do you have thoughts about Alteryx? I haven’t done a deep look into it, but it is on my short list and I would be interested in your perspective.

    • Aarshay Jain says:

      Didn’t come across it earlier. Had a quick look. It looks good. Thanks!

  • Manel Navarro says:

    What about KNIME? Visual flows, open source and multiple add-ons available. It can include Java and R snippets.

  • Nazly Santos says:

    Thanks for the big list of tools!
    Even if it is an expensive tool, IBM SPSS Modeler is great to do real data mining and text mining. We use it at my company with great results, and my colleagues (who are not used to programming) feel really good with it.
    Another tool I recently used was TIMi (http://www.anatella.com/html/timi-suite.html), which is very simple to use, and useful in CRM data analysis (it is also expensive, but the trial works fine).
    Just to mention them as they are not in the list, (even if it is more common for companies to purchase them than for individuals) 🙂

    • Aarshay Jain says:

      Thanks Nazly for sharing some first-hand experience.
      Agree that IBM SPSS Modeler must be a good tool. I would probably like to share a different thought on your last point. Though IBM SPSS is a big brand, I think the new tools are comparable and work as MLaaS (ML as a Service). So they would be much more competitive in terms of price and might give it tough competition in the near future.

  • Brijesh J says:

    Also Moa from the Weka family.

  • suresh says:

    Such an excellent blog. Detailed explanation; I recommend it to all my students. Wonderful information. One of the best blogs for learners. Recommended. Our best regards from sbrtrainings.

  • Thanks for the excellent review, Aarshay.

    We’re happy to be included, but had to smile at the title of your post.

    Saying DataRobot is for people who aren’t so good at programming is like saying spreadsheets are for people who aren’t so good at calculators.

    Most of our users are excellent programmers. In fact, many, like @twiecki, say “(These models) could be done using scikit-learn, but (DataRobot) it is a huge time saver and produced results better than my own humble attempts.” See Thomas’ full comments here: https://www.datarobot.com/blog/using-machine-learning-to-predict-out-of-sample-performance-of-trading-algorithms/

    The platform does accelerate the work of both business analysts and expert data scientists by automating many routine math and coding tasks, but more importantly it takes data science to a new level by applying massive compute power to build and test thousands of models in parallel to very quickly discover and deploy the optimal model for each specific data science problem.

    Doing that simply isn’t possible manually… anymore than calculating the results of each cell on a spreadsheet would be… no matter how good you are at Reverse Polish Notation. ;-))

    Happy Modeling!

    • Aarshay Jain says:

      Hi Kirby,

      Thanks for sharing your thoughts. I totally agree with you. I think I did a bad job at putting forward my thoughts in the title statement.

      I would like to clarify that when I say “Tools for people not so good at programming”, I meant it as an advantage. Programming experts can definitely use both types of tools, ones which require that skill or others which don’t. But tools like DataRobot empower the non-programmers as well to be at par with expert coders. Like you said, similar tasks can be done using scikit-learn, but one needs to know python programming to do that. DataRobot is actually empowering someone who doesn’t know Python to make the same level of models in a few clicks, if not more.

      I hope this thought process is aligned with the ideology behind DataRobot. I don’t have a first-hand experience at using DataRobot, but I really appreciate your efforts towards automating some aspects of data science and I would definitely use your platform whenever I get an opportunity. 🙂


  • Aarshay,

    You’re correct – data savvy business people can definitely use DataRobot, it’s not just for advanced data scientists.

    The process is dead simple: 1) upload data, 2) specify what you want to predict, and 3) press GO. After that, you can explore features of the data, observe various models competing on the leaderboard, and a (very) short time later explore the leading models, choose one, and deploy it into production. I am no data scientist, and even I can get enormous value out of the platform.

    People who are good at math and programming can get even more value out of it, and perhaps can appreciate even more how difficult it would be to try to replicate this process using more traditional approaches.

    I hope you get a chance to try DataRobot soon, virtually everyone I know who’s experienced it first hand has become an instant fan.


    • Aarshay Jain says:

      It sounds pretty cool. I’m sure it’ll be an amazing experience.

      Thanks for your thoughts!


  • Prasanna says:

    Tool No. 4 is Google Cloud Prediction API, and not Could. Just a minor typo in an otherwise very nice review.

  • Bijay says:

    Aarshay – Thanks for sharing the list of ready-to-go tools. It’s a quick snapshot of what can be experimented with.

    For the existing Microsoft stack community, Azure ML is also picking up fast, especially when a lot of domain knowledge is required. It makes a data analyst’s or domain expert’s job easy when data is more structured and all you need to do is upload it to Azure and run different features/ML etc.

    Azure ML is definitely taking a bigger piece of the pie out of “open source” data scientist work areas.


    • Aarshay Jain says:

      Thanks for sharing your thoughts. I haven’t worked on Azure, but it’s definitely a competitor in the rat race 🙂

  • Azhar says:

    Hello folks, what about Microsoft’s SSIS and SSAS and SAP’s BO and BODS?
    I mean, do these tools also fall under the data science category?
    If not, then what makes the difference?

    • Aarshay Jain says:

      Azhar, I’m sorry I haven’t used these tools. Maybe others can share their thoughts. It might also be a good idea to start a discussion thread on discuss.analyticsvidhya.com!

  • John Taveras says:

    I would add a newcomer, Easy Regression at http://www.easyregression.com. Currently it has three offerings — regression, logistic regression and clustering — and provides a nice and simple interface for building models.

  • Alivia Smith says:

    The thing about some of these tools is that they are really black boxes: you put data in and they spit predictions out. This sounds miraculous, but in my experience it’s hard to put into production and leads to operational teams who don’t understand where your scores come from, so they won’t use them.

    Have you heard of Dataiku though? [full disclosure, I work there]. We believe that it’s important to keep control of your data and understand what happens to it. So our tool has a GUI that allows people with no coding experience to click along and clean data, and deploy models (we integrate scikit-learn and MLlib algorithms). It also allows “clickers” to collaborate with coders who work in Python, R, SQL, Hive, PySpark etc., so you have a project that’s production-ready. That way you can track all the different steps of the process, iterate on them, understand where your scoring comes from, and monitor it.

    Check it out and let me know what you think: http://www.dataiku.com/dss/

  • lalthan says:

    Revolution Analytics actually had a partnership with Alteryx in which all the data blending, data exploration etc. would be seamlessly integrated right out of their drag-and-drop workflow, and the heavy workload would be done by Revolution Analytics in the background, with all the R libraries and scripts available out of the box without having to write R code manually. Sadly, I believe this partnership stopped after Microsoft took over Revolution Analytics.

  • Go Mavankal says:

    Great thread, so much information, just saying thanks. I just started a graduate school program in Data Science and was reassured to learn that the R programming I am learning is not just yet another tool, but worth the effort to master.
