19 Data Science and Machine Learning Tools for People Who Don’t Know Programming

Aarshay Jain 31 May, 2020 • 10 min read

This article was originally published on 5 May, 2016 and updated with the latest tools on May 16, 2018.

Introduction

Programming is an integral part of data science. It is widely acknowledged that a person who understands programming logic, loops, and functions has a better chance of becoming a successful data scientist. But what about those folks who never studied programming in their school or college days?

Is there no way for them to become a data scientist then?

With the recent boom in data science, a lot of people are interested in getting into this domain but don’t have the slightest idea about coding. In fact, I too was a member of the non-programming league until I joined my first job. So I understand how terrible it feels when something you have never learned haunts you at every step.


The good news is that there is a way for you to become a data scientist, regardless of your programming skills! There are tools that largely obviate the programming aspect and provide a user-friendly GUI (Graphical User Interface), so that anyone with minimal knowledge of algorithms can use them to build high-quality machine learning models.

Many companies (especially startups) have recently launched GUI-driven data science tools. I have tried to cover a few important ones in this article and have provided videos wherever possible.

Note: All the information provided has been gathered from publicly available sources. We are presenting facts, not opinions, and in no manner do we intend to promote or advertise any of these products or services.

 

List of Tools

RapidMiner

RapidMiner (RM) started in 2006 as an open-source stand-alone application named Rapid-I. Over the years it was renamed RapidMiner, and the company has raised roughly $35Mn in funding. Older versions (below v6) are open source, but the latest versions come with a 14-day trial period and require a license after that.

RM covers the entire life-cycle of predictive modeling, from data preparation through model building to validation and deployment. The GUI is based on a block-diagram approach, very similar to MATLAB Simulink. There are predefined blocks which act as plug-and-play devices; you just have to connect them in the right manner, and a large variety of algorithms can be run without a single line of code. On top of this, RapidMiner allows custom R and Python scripts to be integrated into the system, as sketched below.
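
For readers who eventually want to mix in a little code, here is a minimal sketch of what such a custom script might look like. It assumes RapidMiner's Execute Python operator convention (as documented at the time of writing) of calling a function named `rm_main` with the incoming example set as a pandas DataFrame; the `price` column is purely illustrative.

```python
import numpy as np
import pandas as pd

# RapidMiner's Execute Python operator calls a function named rm_main,
# passing each connected example set in as a pandas DataFrame; whatever
# the function returns flows out of the operator's output port(s).
def rm_main(data: pd.DataFrame) -> pd.DataFrame:
    data = data.dropna()                         # simple cleaning step
    data["log_price"] = np.log1p(data["price"])  # illustrative derived feature
    return data
```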

Their current product offerings include the following:

  1. RapidMiner Studio: A stand-alone software which can be used for data preparation, visualization and statistical modeling
  2. RapidMiner Server: An enterprise-grade environment with central repositories which allows easy teamwork, project management and model deployment
  3. RapidMiner Radoop: Implements big-data analytics capabilities centered around Hadoop
  4. RapidMiner Cloud: A cloud-based repository which allows easy sharing of information among various devices

RM is currently used in various industries including automotive, banking, insurance, life sciences, manufacturing, oil and gas, retail, telecommunications and utilities.

 

DataRobot

DataRobot (DR) is a highly automated machine learning platform built by some of the all-time best Kagglers, including Jeremy Achin, Thomas DeGodoy and Owen Zhang. The platform claims to have obviated the need for data scientists, which is evident from a line on their website – “Data science requires math and stats aptitude, programming skills, and business knowledge. With DataRobot, you bring the business knowledge and data, and our cutting-edge automation takes care of the rest.”

DR claims to offer the following benefits:

  • Model Optimization
    • The platform automatically detects the best data pre-processing and feature engineering steps, employing text mining, variable type detection, encoding, imputation, scaling, transformation, etc.
    • Hyper-parameters are automatically chosen depending on the error-metric and the validation set score
  • Parallel Processing
    • Computation is divided over thousands of multi-core servers
    • Uses distributed algorithms to scale to large data sets
  • Deployment
    • Easy deployment facilities with just a few clicks (no need to write any new code)
  • For Software Engineers
    • Python SDK and APIs available for quick integration of models into tools and software (see the sketch after this list)
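
As a rough illustration of that SDK, the sketch below kicks off a DataRobot project from Python. It assumes the `datarobot` package and an API token; the file name and `churn` target column are placeholders, and exact method names may differ across client versions.

```python
import datarobot as dr

# Connect to a DataRobot instance (endpoint and token are placeholders).
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Upload the data, name the target, and let autopilot handle the
# preprocessing, model selection and hyper-parameter tuning.
project = dr.Project.start(
    "train.csv",              # local file, URL or pandas DataFrame
    target="churn",           # hypothetical target column
    project_name="churn-demo",
)
project.wait_for_autopilot()

# The leaderboard is sorted by validation score; grab the top model.
best_model = project.get_models()[0]
print(best_model)
```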

 

BigML

BigML provides a clean GUI which takes the user through six steps:

  • Sources: use various sources of information
  • Datasets: use the defined sources to create a dataset
  • Models: make predictive models
  • Predictions: generate predictions based on the model
  • Ensembles: create ensembles of various models
  • Evaluation: verify models against validation sets

In practice, these steps are iterated in different orders. The BigML platform provides nice visualizations of results and has algorithms for solving classification, regression, clustering, anomaly detection and association discovery problems. They offer several packages bundled together in monthly, quarterly and yearly subscriptions. There is even a free package, but the size of the dataset you can upload is limited to 16MB.
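
The same six steps are also exposed programmatically. Here is a rough sketch using BigML's Python bindings (the `bigml` package); the file path and input fields are illustrative, and credentials are assumed to be set via the `BIGML_USERNAME` and `BIGML_API_KEY` environment variables.

```python
from bigml.api import BigML

api = BigML()  # reads credentials from the environment

source = api.create_source("data/iris.csv")          # 1. Sources
api.ok(source)                                       # block until ready
dataset = api.create_dataset(source)                 # 2. Datasets
api.ok(dataset)
model = api.create_model(dataset)                    # 3. Models
api.ok(model)
prediction = api.create_prediction(                  # 4. Predictions
    model, {"petal length": 4.2, "petal width": 1.3}
)
ensemble = api.create_ensemble(dataset)              # 5. Ensembles
api.ok(ensemble)
evaluation = api.create_evaluation(model, dataset)   # 6. Evaluation
api.ok(evaluation)
api.pprint(prediction)
```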

You can get a feel of how their interface works using their YouTube channel.

 

Google Cloud AutoML

Cloud AutoML is part of Google’s Machine Learning suite of offerings that enables people with limited ML expertise to build high-quality models. The first product in the Cloud AutoML portfolio is Cloud AutoML Vision. This service makes it simpler to train image recognition models. It has a drag-and-drop interface that lets the user upload images, train the model, and then deploy the trained model directly on Google Cloud.

Cloud AutoML Vision is built on Google’s transfer learning and neural architecture search technologies (among others). This tool is already being used by a lot of organizations. Check out this article to see two amazing real-life examples of AutoML in action, and how it’s producing better results than any other tool.

 

Paxata

Paxata is one of the few organizations that focus on data cleaning and preparation rather than machine learning or statistical modeling. It is an MS Excel-like application that is easy to use. It also provides visual guidance, making it easy to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams. Like the other tools mentioned in this article, Paxata eliminates coding or scripting, overcoming the technical barriers involved in handling data.

The Paxata platform follows this process (a rough pandas analogue of the core steps appears after the list):

  1. Add Data: use a wide range of sources to acquire data
  2. Explore: perform data exploration using powerful visuals allowing the user to easily identify gaps in data
  3. Clean+Change: perform data cleaning using steps like imputation, normalization of similar values using NLP, and duplicate detection
  4. Shape: make pivots on data, perform grouping and aggregation
  5. Share+Govern: allows sharing and collaborating across teams with strong authentication and authorization in place
  6. Combine: a proprietary technology called SmartFusion allows combining data frames with 1 click as it automatically detects the best combination possible; multiple data sets can be combined into a single AnswerSet
  7. BI Tools: allows easy visualization of the final AnswerSet in commonly used BI tools; also allows easy iterations between data preprocessing and visualization
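
Paxata itself is entirely point-and-click, but to make the middle steps concrete, here is a rough pandas analogue of Explore, Clean+Change, and Shape. The sales data and column names are purely illustrative.

```python
import pandas as pd

df = pd.read_csv("sales.csv")                        # Add Data

print(df.isna().sum())                               # Explore: spot gaps in the data

df["region"] = df["region"].str.strip().str.title()  # Clean: normalize similar values
df["revenue"] = df["revenue"].fillna(df["revenue"].median())  # Clean: impute
df = df.drop_duplicates()                            # Clean: remove duplicates

pivot = df.pivot_table(                              # Shape: pivot, group, aggregate
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(pivot)
```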

Paxata has set foot in the financial services, consumer goods and networking domains. It might be a good tool to use if your work requires extensive data cleaning.

 

Trifacta

Trifacta is another startup with a heavy focus on data preparation. It has 3 product offerings:

  • Wrangler: A free stand-alone application. Allows up to 100MB of data
  • Wrangler Pro: An upgraded version of the above. It supports both single-user and multi-user setups, with a data volume limit of 40GB
  • Wrangler Enterprise: The ultimate offering from Trifacta. It does not have any limit on the amount of data you process and allows unlimited users. Ideal for big organizations

Trifacta offers a very intuitive GUI for performing data cleaning. It takes data as input and provides a summary with various statistics by column. Also, for each column it automatically recommends some transformations which can be selected using a single click. Various transformations can be performed on the data using some pre-defined functions which can be called easily in the interface.
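
To give a flavour of the per-column summaries Trifacta surfaces automatically, here is a minimal pandas sketch that profiles each column of a hypothetical input file:

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical input

# Print a small profile per column: type, missing rate, cardinality,
# and basic statistics for numeric columns.
for col in df.columns:
    print(f"--- {col} ({df[col].dtype}) ---")
    print(f"missing: {df[col].isna().mean():.1%}")
    print(f"unique:  {df[col].nunique()}")
    if pd.api.types.is_numeric_dtype(df[col]):
        print(df[col].describe()[["mean", "std", "min", "max"]])
```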

The Trifacta platform structures data preparation into the following steps:

  1. Discovering: this involves getting a first look at the data and distributions to get a quick sense of what you have
  2. Structuring: this involves assigning proper shape and variable types to the data and resolving anomalies
  3. Cleaning: this step includes processes like imputation, text standardization, etc. which are required to make the data model ready
  4. Enriching: this step helps in improving the quality of analysis that can be done by either adding data from more sources or performing some feature engineering on existing data
  5. Validating: this step performs final sense checks on the data
  6. Publishing: finally the data is exported for further use

Trifacta is primarily used in the financial, life sciences and telecommunication industries.

 

MLBase

MLBase is an open-source project developed by the AMPLab (Algorithms, Machines, and People Lab) at the University of California, Berkeley. The core idea behind it is to provide an easy solution for applying machine learning to large-scale problems.

It has 3 offerings:

  1. MLlib: It works as the core distributed ML library in Apache Spark. It was originally developed as part of the MLBase project, but is now supported by the Spark community (see the PySpark sketch after this list)
  2. MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions
  3. ML Optimizer: This layer aims to automate the task of ML pipeline construction. The optimizer solves a search problem over the feature extractors and ML algorithms included in MLI and MLlib
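
Since MLlib is the piece most people will actually touch (it ships with every Spark distribution), here is a small PySpark sketch that trains a classifier on a toy dataset:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Toy dataset: two numeric features and a binary label.
df = spark.createDataFrame(
    [(1.0, 0.5, 1.0), (0.2, 1.3, 0.0), (2.1, 0.1, 1.0), (0.4, 2.2, 0.0)],
    ["f1", "f2", "label"],
)

# MLlib estimators expect a single vector column of features.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(maxIter=10).fit(train)
model.transform(train).select("label", "prediction").show()
```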

 

Auto-WEKA

Auto-WEKA is a data mining software written in Java, developed by the Machine Learning Group at the University of Waikato, New Zealand. It is a GUI-based tool which is very good for beginners in data science. The best part is that it is open source, and the developers have provided tutorials and papers to help you get started. You can learn more about it in AV’s article.

It is primarily used for educational and academic purposes for now.

 

Driverless AI

Driverless AI is an enterprise platform from h2o.ai that supports automatic machine learning. A one-month trial version is available as a Docker image at this link. Using simple dropdowns, you select the training and test files and the metric with which you want to track model performance. Then sit back and watch as the platform, through an intuitive interface, trains on your dataset and delivers results on par with what an experienced data scientist could come up with.

Some of the standout features of Driverless AI:

  • Multi-GPU support for XGBoost, GLM, K-Means and more, which results in excellent training speeds even for large, complex datasets
  • Automatic feature engineering, tuning and ensembling of a variety of models to produce highly accurate predictions
  • Great model interpretation features, along with a panel showing real-time feature importance rankings during the training process

 

Microsoft Azure ML Studio

With so many big-name players in this field, how could Microsoft lag behind? Azure ML Studio is a simple yet powerful browser-based ML platform. It has a visual drag-and-drop environment, so there is no requirement for coding. Microsoft has published comprehensive tutorials and sample experiments for newcomers to get the hang of the tool quickly. It employs a simple five-step process (the scikit-learn sketch after the list shows the same flow in code):

  • Import your dataset
  • Perform data cleaning and other preprocessing steps, if necessary
  • Split the data into training and testing sets
  • Apply built-in ML algorithms to train your model
  • Score your model and get your predictions!
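
For comparison, here is what those five drag-and-drop steps look like when written out by hand with scikit-learn; the CSV file and `target` column are illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")                          # 1. Import your dataset
df = df.dropna()                                      # 2. Clean / preprocess

X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(  # 3. Split into train/test
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier().fit(X_train, y_train)  # 4. Train a built-in algorithm

preds = model.predict(X_test)                         # 5. Score the model
print("accuracy:", accuracy_score(y_test, preds))
```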

 

MLJar

MLJar is a browser-based platform for quickly building and deploying machine learning models. It has an intuitive interface and allows you to train models in parallel. It comes with built-in hyper-parameter search and makes deploying your model easier. MLJar offers integration with NVIDIA’s CUDA, Python and TensorFlow, among others.

You only need to perform three steps to build a decent model (sketched in code below):

  1. Upload your dataset
  2. Train and tune many Machine Learning algorithms and select the best one
  3. Use the best models for predictions and share your results
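
For the Python route, the sketch below shows roughly how those three steps looked with MLJar's Python client at the time of writing; the project and experiment names and the training data are placeholders, and the exact API may well have changed since.

```python
import pandas as pd
from mljar import Mljar  # MLJar's Python client (API may have evolved)

# Hypothetical training data; "target" is the column to predict.
df = pd.read_csv("train.csv")
X, y = df.drop(columns="target"), df["target"]

# 1-2. Upload the data and train/tune many algorithms in MLJar's cloud;
# the best model found is kept for predictions.
model = Mljar(project="demo-project", experiment="first-experiment")
model.fit(X, y)

# 3. Use the best model for predictions.
preds = model.predict(X)
```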

Currently the tool works on a subscription plan. It has a free plan as well with a 0.25GB dataset limit. It’s definitely worth checking out.

 

Amazon Lex

Amazon Lex provides an easy-to-use console for building your own chatbot in a matter of minutes. You can build conversational interfaces into your applications or website using Lex. All you need to do is supply a few sample phrases and Amazon Lex does the rest! It builds a complete natural language model through which customers can interact with your app using both voice and text.

It also comes with built-in integration with the Amazon Web Services (AWS) platform. Amazon Lex is a fully managed service, so as your user engagement increases, you don’t need to worry about provisioning hardware or managing infrastructure to improve your bot experience.
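
Once a bot is published, talking to it from code is a one-call affair. Here is a minimal sketch using boto3's Lex runtime client; the bot name, alias, and utterance are placeholders (OrderFlowers is the standard Lex example bot).

```python
import boto3

client = boto3.client("lex-runtime", region_name="us-east-1")

# Send one text utterance to a published bot and read back its reply.
response = client.post_text(
    botName="OrderFlowers",     # placeholder bot
    botAlias="prod",            # placeholder alias
    userId="demo-user-42",      # any stable per-conversation id
    inputText="I would like to order some roses",
)
print(response["message"])      # the bot's textual reply
print(response["dialogState"])  # e.g. ElicitSlot, Fulfilled
```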

 

IBM Watson Studio

How could we leave IBM Watson out of this list? It is one of the most recognizable brands in the world. IBM Watson Studio provides a beautiful platform for building and deploying your machine learning and deep learning models. You can interactively discover, clean and transform your data, use familiar open-source tools with Jupyter notebooks and RStudio, access the most popular libraries, and train deep neural networks, among a vast array of other things.

For people just starting out in this field, they have provided a bunch of videos to ease the introductory phase, including one that guides you through creating a project in Watson Studio. You can take a free trial and check out this tool for yourself.

 

Automatic Statistician


The Automatic Statistician is not a product per se but a research project that is creating a data exploration and analysis tool. It can take in various kinds of data and uses natural language processing at its core to generate a detailed report. It is being developed by researchers who have worked at Cambridge and MIT, and who won a Google Focused Research Award with a prize of $750,000.

It is still under active development but it’s one to keep an eye on in the near future. You can check out a few examples of how the final reports pan out here.

 

More Tools

  • KNIME – This tool is awesome for training machine learning models. It takes some getting used to initially, but the GUI is great to get started with. It produces results on par with most tools and is free of cost as well
  • FeatureLab – It allows easy predictive modeling and deployment using a GUI. One of its best selling points is automated feature engineering
  • MarketSwitch – This tool is more focused on optimization rather than predictive analytics
  • Logical Glue – Another GUI-based machine learning platform which works from raw data to deployment
  • Pure Predictive – This tool uses a patented Artificial Intelligence system which takes care of data preparation and model tuning; it uses AI to combine thousands of models into what they call "supermodels"
  • ATH Precision – This tool by Analyttica has 600+ built-in analytical functions, available at the click of a button. ATH Precision also gives you the R & Python code for all 600+ functions

If you’re hearing a lot of these names for the first time, you won’t be the only one! The market for automated machine learning is expanding as more and more data is collected. Will these tools flood the market in the next few years? Time will tell. But they are excellent options for organizations that are looking to start out with machine learning or are looking for alternatives to add to their existing catalogue.

 

End Notes

In this article, we have discussed various initiatives working towards automating different aspects of solving a data science problem. Some of them are at a nascent research stage, some are open source, and others are already being used in industry with millions in funding. All of them pose a potential challenge to the data scientist’s job, a trend that is expected to grow in the near future. These tools are best suited for people who are not familiar with programming and coding.

Do you know any other startups or initiatives working in this domain? Please feel free to drop a comment below and enlighten us!

 

Aarshay Jain 31 May 2020

Aarshay graduated with an MS in Data Science from Columbia University in 2017 and is currently an ML Engineer at Spotify New York. He works at the intersection of applied research and engineering, designing ML solutions to move product metrics in the required direction. He specializes in designing ML system architecture, developing offline models and deploying them in production for both batch and real-time prediction use cases.


Responses From Readers

Mayur P 05 May, 2016

Hi... How about TIBCO Spotfire? Is it good to learn & follow?

Kannan Chandrasekaran 05 May, 2016

Why not Azure ML? It doesn't require a lot of coding effort; it is a cloud-based machine learning platform with drag & drop to create the analytics/machine learning model. We can create a workspace in Azure Machine Learning Studio, and it is free for exploration. Just a Live/Hotmail/Outlook account is required.

Sharddha 05 May, 2016

So the question to be asked here is why companies are still asking for only R & SAS or Python skills. Why are they not using the GUI tools?

Himanshu 05 May, 2016

The post is nice but a bit demotivating, in the sense that the field of data science has also started moving towards automation :( Will the boom in this field die down in the near future? Please throw some light!

Asesh Datta 05 May, 2016

Hi Aarshay, compliments on the kind of research you have done to compile this list. I am one of those who does not have much knowledge of current-day programming, so I was fascinated to use at least one of the top ten. Please let me know what the use of these tools is in the Indian job scenario. I know data analysis is the sexiest job of the decade starting 2016. I appreciate the future of startups coming up with various types of data analysis tools which do not require programming skills. Is there any training required to learn and use these tools? Appreciate your response.

Karthikeyan Sankaran 05 May, 2016

Good compilation Aarshay. Thank you. Another one to add to the list is IBM Watson. It provides exploration, refinement and prediction in a drag & drop kind of mode without the requirement for coding. Though the list of ML algorithms is limited right now in the Watson Cloud version I think it will grow in the future.

James 06 May, 2016

Gosh, should I still stick to R? I know the tools are scattered all over the industry; do you have a chart that shows the four quadrants of leadership products? Honestly, each tool seems to duplicate the functions of the others, each trying to outbid the rest. Perhaps we should stick to the big names that can fund products, like Google and Microsoft?

Aleksandra Besińska 06 May, 2016

Hi Aarshay, thanks for a great article. I would add Automatic Business Modeler to this list. Automatic Business Modeler provides full automation of the essential, yet time-consuming activities in the predictive modeling process. These include: fast variable selection, finding interactions between variables, transformations of variables, and best model selection. The system requires no programming skills or advanced knowledge of model construction.

Tami C 06 May, 2016

Do you have thoughts about Alteryx? I haven't done a deep look into it, but it is on my short list and I would be interested in your perspective.

Manel Navarro 06 May, 2016

What about KNIME? Visual flows, open source and multiple add ons available. It can include java and R snippets.

Nazly Santos 06 May, 2016

Thanks for the big list of tools! Even if it is an expensive tool, IBM SPSS Modeler is great for doing real data mining and text mining. We use it at my company with great results, and my colleagues (who are not used to programming) feel really good with it. Another tool I recently used was TIMi (http://www.anatella.com/html/timi-suite.html), which is very simple to use and useful in CRM data analysis (it is also expensive, but the trial works fine). Just mentioning them as they are not on the list (even if it is more common for companies to purchase them than for individuals) :)

Brijesh J 07 May, 2016

Also MOA, from the WEKA family.

suresh 09 May, 2016

Such an excellent blog. Detailed explanation; I recommend it to all my students. Wonderful information, one of the best blogs for learners. Recommended. Our best regards from sbrtrainings.

Kirby Wadsworth 09 May, 2016

Thanks for the excellent review, Aarshay. We're happy to be included, but had to smile at the title of your post. Saying DataRobot is for people who aren't so good at programming is like saying spreadsheets are for people who aren't so good at calculators. Most of our users are excellent programmers; in fact, many, like @twiecki, say "(These models) could be done using scikit-learn, but (DataRobot) is a huge time saver and produced results better than my own humble attempts." See Thomas' full comments here: https://www.datarobot.com/blog/using-machine-learning-to-predict-out-of-sample-performance-of-trading-algorithms/ The platform does accelerate the work of both business analysts and expert data scientists by automating many routine math and coding tasks, but more importantly it takes data science to a new level by applying massive compute power to build and test thousands of models in parallel to very quickly discover and deploy the optimal model for each specific data science problem. Doing that simply isn't possible manually... any more than calculating the results of each cell on a spreadsheet would be... no matter how good you are at Reverse Polish Notation. ;-) Happy Modeling!

Kirby Wadsworth 09 May, 2016

Aarshay, you're correct - data-savvy business people can definitely use DataRobot; it's not just for advanced data scientists. The process is dead simple: 1) upload data, 2) specify what you want to predict, and 3) press GO. After that, you can explore features of the data, observe various models competing on the leaderboard, and a (very) short time later explore the leading models, choose one, and deploy it into production. I am no data scientist, and even I can get enormous value out of the platform. People who are good at math and programming can get even more value out of it, and perhaps can appreciate even more how difficult it would be to try to replicate this process using more traditional approaches. I hope you get a chance to try DataRobot soon; virtually everyone I know who's experienced it first hand has become an instant fan. Kirby

Prasanna 09 May, 2016

Tool No. 4 is Google Cloud Prediction API, and not Could. Just a minor typo in an otherwise very nice review.

Bijay 11 May, 2016

Aarshay - Thanks for sharing the list of ready-to-go tools. It's a quick snapshot of what can be experimented with. For the existing Microsoft stack community, Azure ML is also picking up fast, especially when a lot of domain knowledge is required. It's making the data analyst's or domain expert's job easy when data is more structured and all you need to do is upload it to Azure and run different features/ML etc. Azure ML is definitely taking a bigger slice of the "open source" data scientist's work areas. Thanks, Bijay

Azhar 11 May, 2016

Hello folks, what about Microsoft's SSIS and SSAS, and SAP's BO and BODS? Do these tools also fall under the data science category? If not, what makes the difference?

John Taveras 16 May, 2016

I would add a newcomer, Easy Regression at www.easyregression.com. Currently it has three offerings -- regression, logistic regression and clustering -- and provides a nice and simple interface for building models.

Alivia Smith 27 May, 2016

The thing about some of these tools is that they're really black boxes: you put data in and it spits predictions out. This sounds miraculous, but in my experience it's hard to put into production and leads to operational teams who don't understand where your scores come from, so they won't use them. Have you heard of Dataiku though? [Full disclosure, I work there.] We believe that it's important to keep control of your data and understand what happens to it. So our tool has a GUI that allows people with no coding experience to click along, clean data, and deploy models (we integrate scikit-learn and MLlib algorithms). It also allows "clickers" to collaborate with coders who work in Python, R, SQL, Hive, PySpark etc., so you have a project that's production ready. That way you can track all the different steps of the process, iterate on them, understand where your scores come from, and monitor them. Check it out and let me know what you think: http://www.dataiku.com/dss/

lalthan 02 Jun, 2016

Revolution Analytics actually had a partnership with Alteryx where all the data blending/data exploration etc would be seamlessly integrated right out of their drag and drop workflow and the heavy work load would be done by Revolution Analytics on the background with all the R libraries and scripts available out of the box without manually requiring to write R codes. Sadly, I believe this partnership had stopped after Microsoft took over Revolution Analytics.

Go Mavankal 11 Jun, 2016

Great thread, so much information; just saying thanks. I just started a graduate school program in Data Science and was reassured to learn that the R programming I am learning is not just yet another tool, but worth the effort to master.

