Getting started with Julia – a high-level, high-performance language for computing

Learning new tools and techniques in data science is a bit like running on a treadmill – you have to keep running to stay on top of it. The minute you stop, you start falling behind.

As part of this learning, I keep an eye out for new developments in tools and techniques. That is how I came across Julia about a year back. It was in its very early stages then – and it still is!

But there is something special about Julia, which makes it a compelling tool to learn for all future data scientists. So I decided to write a few articles on it. This is the first of these articles; it covers the motivation to learn Julia, its installation, the packages currently available and ways to become part of the Julia community.

 

What is Julia?

Julia is a high-level, high-performance dynamic programming language for technical computing, with easy-to-write syntax. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.
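
To give a feel for the syntax, here is a minimal sketch of a small numerical function. It is not taken from the Julia documentation: the function name mean_and_std and the sample data are made up for illustration. No type declarations are needed, and the code reads much like the underlying formula.

```julia
# Illustrative helper (hypothetical name): sample mean and standard deviation.
function mean_and_std(xs)
    n = length(xs)
    m = sum(xs) / n                                  # arithmetic mean
    s = sqrt(sum((x - m)^2 for x in xs) / (n - 1))   # sample standard deviation
    return m, s
end

m, s = mean_and_std([1.0, 2.0, 3.0, 4.0])
println("mean = ", m, ", std = ", s)
```

The same function works unchanged on a vector of integers or on a range such as 1:10, which is the "dynamic" part of the description above.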

 

Why another programming language?

The simplest way to understand its power is to think of it as a language that has a wide range of statistical packages like R, is as easy to write and learn as Python, and has execution speed similar to C / C++. If you are still not convinced, have a look at the results for a few common benchmarks below:
[Figure: Julia benchmarks]

C compiled by gcc 4.8.2, taking best timing from all optimization levels (-O0 through -O3). C, Fortran and Julia use OpenBLAS v0.2.12. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.8.2) functions; the rest are pure Python implementations.
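
If you want to try a timing yourself, the following is a rough sketch in the spirit of the rand_mat_mul benchmark named above. It is an assumption about what such a test might look like, not the official benchmark code: the function body, the matrix size of 200 and the five repetitions are all chosen for illustration.

```julia
# Hypothetical stand-in for a rand_mat_mul-style benchmark:
# multiply two random n-by-n matrices.
function rand_mat_mul(n)
    a = rand(n, n)
    b = rand(n, n)
    return a * b
end

rand_mat_mul(10)   # warm-up call so that compilation time is not measured

# Take the best (smallest) wall-clock time over five runs, in seconds.
best = minimum([@elapsed(rand_mat_mul(200)) for _ in 1:5])
println("best time for a 200x200 multiply: ", best, " seconds")
```

Taking the minimum over several runs mirrors the "best timing" approach described in the note above and reduces the effect of other activity on the machine.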

 

A Summary of Features in Julia

Some of the important features to highlight from data science capabilities are:

A more comprehensive list of features can be accessed here

 

Installation of Julia

Now that you might be raring to give Julia a try after all the promises made above, let me quickly walk through the various options to test drive your new sedan (which has sports-car-like acceleration):

  • Option 1: Try Juliabox in browser – The simplest option – no setup required. Just go to Juliabox, sign in using Google (sorry if you don’t have a Google account – try the next option) and your instance is ready to fire.

  • Option 2 – Use an IDE – Juno seems to be the best IDE available right now. Sadly, JuliaStudio is no longer supported. The best way to install Juno is to download the combo package from the Julia site itself.
  • Option 3 – Using Command line – If you are a hardcore programmer who can’t think of a programming language without a command line, don’t worry! There is an option for you as well. You can download the package here.
  • Option 4 – Using IJulia notebooks – If you are a Python explorer and have used IPython for your interactive data exploration – here is some awesome news. IJulia notebooks are equally awesome and carry over a similar interface. In order to install IJulia, you need to install IPython first, then install Julia 0.3 or later. Next, start Julia, add the package “IJulia” and start using it. You can find more details here.

The installation was pretty simple and straightforward. I have tried Juliabox as well as Juno. Options 1 and 2 come with a few demo examples beforehand. You can just follow the comments (starting with #) to understand the code and give it a test run.

 

A few important packages

There are a total of 610 packages for Julia as of 9th July 2015. If you filter out packages whose tests have failed or which have not been tested, you are left with only 381 packages. From these, I have picked the ones that are related to data science and have more than 15 stars. That leaves us with the following packages:

| Package | Description | Version | Stars |
|---|---|---|---|
| BackpropNeuralNet | A neural network in Julia | 0.0.3 | 18 |
| Bokeh | Bokeh Bindings for Julia | 0.1.0 | 26 |
| Boltzmann | Restricted Boltzmann Machines in Julia | 0.1.0 | 19 |
| Calculus | Calculus functions in Julia | 0.1.8 | 46 |
| Clustering | A Julia package for data clustering | 0.4.0 | 33 |
| Convex | A julia package for disciplined convex programming. | 0.0.6 | 108 |
| Cpp | Utilities for calling C++ from Julia | 0.1.0 | 18 |
| DataArrays | Data structures that allow missing values | 0.2.16 | 21 |
| DataFrames | library for working with tabular data in Julia | 0.6.7 | 206 |
| DataFramesMeta | Metaprogramming tools for DataFrames | 0.0.1 | 33 |
| DataStructures | Julia implementation of Data structures | 0.3.10 | 52 |
| DecisionTree | Decision Tree Classifier and Regressor | 0.3.8 | 36 |
| Distances | A package for evaluating distances(metrics) between vectors. | 0.2.0 | 21 |
| Distributions | A package for probability distributions & associated functions. | 0.7.4 | 101 |
| DSP | Filter design, periodograms, window functions, and other digital signal processing functionality | 0.0.8 | 32 |
| FunctionalCollections | Functional and persistent data structures for Julia | 0.1.2 | 34 |
| Gadfly | Crafty statistical graphics for Julia. | 0.3.13 | 684 |
| GeneticAlgorithms | A lightweight framework for writing genetic algorithms in Julia | 0.0.3 | 86 |
| GLM | Generalized linear models in Julia | 0.4.6 | 78 |
| GLMNet | Wrapper for fitting Lasso/ElasticNet GLM models using glmnet | 0.0.4 | 23 |
| Graphs | Working with graphs in Julia | 0.5.5 | 90 |
| HDF5 | Saving and loading Julia variables | 0.4.18 | 65 |
| HypothesisTests | Hypothesis tests for Julia | 0.2.9 | 16 |
| Images | An image library for Julia | 0.4.39 | 73 |
| JuMP | Modeling language for Mathematical Programming (linear, mixed-integer, conic, nonlinear) | 0.9.2 | 162 |
| MachineLearning | Julia Machine Learning library | 0.0.3 | 37 |
| Mamba | Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia | 0.4.11 | 44 |
| Markdown | Markdown parsing for Julia | 0.3.0 | 21 |
| Match | Advanced Pattern Matching for Julia | 0.1.3 | 29 |
| MixedModels | A Julia package for fitting (statistical) mixed-effects models | 0.3.22 | 41 |
| MLBase | A set of functions to support the development of machine learning algorithms | 0.5.1 | 41 |
| Mocha | Deep Learning framework for Julia | 0.0.8 | 297 |
| MultivariateStats | A Julia package for multivariate statistics & data analysis (e.g. dimension reduction) | 0.2.1 | 21 |
| NLopt | Package to call the NLopt nonlinear-optimization library from the Julia language | 0.2.1 | 31 |
| OpenStreetMap | Julia OpenStreetMap Package | 0.8.1 | 20 |
| Optim | Optimization functions for Julia | 0.4.2 | 116 |
| Orchestra | Heterogeneous ensemble learning for Julia. | 0.0.5 | 27 |
| PGM | A Julia framework for probabilistic graphical models. | 0.0.1 | 25 |
| PyCall | Package to call Python functions from the Julia language | 0.8.1 | 183 |
| RCall | Embedded R within Julia | 0.2.1 | 16 |
| RDatasets | Julia package for loading many of the data sets available in R | 0.1.2 | 34 |
| Regression | Algorithms for regression (e.g. linear / logistic regression) | 0.3.2 | 17 |
| Rif | Julia-to-R interface | 0.0.12 | 47 |
| StatsBase | Basic statistics for Julia | 0.6.15 | 57 |
| StreamStats | Compute statistics over data streams in pure Julia | 0.0.2 | 27 |
| TimeSeries | Time series toolkit for Julia | 0.5.10 | 37 |

P.S. There is a lot of development happening on the language and the libraries. So this can change very quickly.

 

A few things to note:

  • Gadfly looks to be the most popular package. This might well be because it is being used as a showcase library across all the products in the ecosystem.
  • The core data science libraries look more evolved than some of the other libraries. Mocha for deep learning, Orchestra for ensemble learning, DataFrames and Distributions are all on comparatively more evolved versions.

 

How to install & use a package?

Installing and using a package in Julia is dead simple. If you want to install / add a package, simply type this at the Julia prompt:

Pkg.add("Gadfly")  # on Julia 0.7 and later, run "using Pkg" first

This will install the package as well as its dependencies.

 

Once the package is installed, you can load it simply by calling “using”

using Gadfly

Simple!
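
The same “using” keyword also loads the modules that ship with Julia, so you can try the mechanism without installing anything. The sketch below assumes Julia 1.x, where Statistics and Random are standard library modules. It is only meant to show the difference between “using” and “import”, and is not specific to Gadfly.

```julia
using Statistics   # brings exported names such as mean into scope
import Random      # loads the module; its functions must be qualified

Random.seed!(42)   # fix the seed so repeated runs draw the same numbers
xs = rand(100)     # 100 uniform random numbers in [0, 1)
avg = mean(xs)
println("average of xs: ", avg)
```

With “using”, exported functions can be called directly (mean), while “import” keeps them behind the module name (Random.seed!), which helps avoid name clashes when several packages are loaded.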

 

The Julia ecosystem:

Julia is supported by a close-knit community of developers. Here are a few mailing lists you can join:

  • julia-news – for important announcements, such as new releases.
  • julia-users – discussion around the usage of Julia. New users of Julia can ask their questions here.
  • julia-stats – special purpose mailing list for discussions related to statistical programming with Julia. Topics of interest include DataFrame support, GLM modeling, and automatic generation of MCMC code for Bayesian models.
  • julia-opt – discussions related to numerical optimization in Julia. This includes Mathematical Programming (linear, mixed-integer, conic, semi-definite, etc.), constrained and unconstrained gradient-based and gradient-free optimization, and related topics.

In addition to these mailing lists, you can also look at juliabloggers.com. The site is still developing as of now, though.

 

End Notes

I hope that you now have a good overview of this powerful language under development. I was pretty excited when I first saw it, and I continue to follow its development closely. In the next articles, we will look at the data structures available in Julia and its interface with other languages (e.g. Python), and solve a case study using Julia to understand its power.

What do you think of Julia? Are you all set to give it a try? Does the future excite you? Do let us know your thoughts through comments below.
