An Introduction to Julia for Data Analysis
This article was published as a part of the Data Science Blogathon.
Which language comes to mind for data analysis? Python, of course. But another language for data analysis is growing rapidly: Julia. I always like to learn something new, so I gave it a try for data analysis, and I was pretty impressed. I hope you will like the language too after reading this article.
You can find articles about the Julia programming language that discuss the basics. In this article, I will discuss only the features that are used most often when analyzing data in Julia.
So, without further ado, let’s get started.
The “Julia” Things
While learning Python, you will hear from the instructor that everything in Python is an object. Julia, in contrast, is built around generic functions, and it has no classes. Instead of classes, its designers made functions more powerful and gave them new features. If you are familiar with data analysis, you will have noticed that we hardly use classes but use functions heavily. Julia was created with scientific computing, data science, and machine learning in mind, so it leaves classes out. Julia has:
- Easy-to-use syntax like Python
- Statistical capabilities like R, and
- Fast performance like C++
So you can expect syntax in Julia that resembles all three of these languages.
Now let’s talk about some inbuilt data structures and functions we use for data analysis.
Data Structures in Julia
On hearing the name "Array", you may recall the arrays of C or C++ and worry that we are again going to deal with arrays where we have to implement every method ourselves. Don't worry. Although it is named Array, it behaves much like a Python list with a few tweaks. In Julia, we declare arrays just like we declare lists in Python.
As the code above shows, we use the println() function to print to the console.
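The original listing is not reproduced here, so the following is only a minimal sketch of declaring and printing an array. Only the first element, "Julia", is mentioned in the article; the other values are assumptions chosen for illustration.

```julia
# Declare an array with square brackets, as you would a Python list.
# Only "Julia" comes from the article; the other elements are illustrative.
my_list = ["Julia", "Python", "R"]

# println() writes the value followed by a newline.
println(my_list)  # prints ["Julia", "Python", "R"]
```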
Now here comes the interesting part. If you check the type of a list in Python with type(), you get <class 'list'>.
In Julia, typeof() reports something different.
Although the data structure is called Array, Julia describes an array of strings as a vector of strings, written Vector{String}, which is an alias for a one-dimensional Array. Vectors are the dynamic arrays in C++, remember? So we already have a feature of C++ in Julia.
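As a small sketch of the type check described above (the array contents are illustrative assumptions, and the exact printed form depends on the Julia version):

```julia
my_list = ["Julia", "Python", "R"]  # illustrative values

# typeof() plays the role of Python's type().
println(typeof(my_list))  # Vector{String} on Julia 1.6+; older versions show Array{String,1}

# Vector{String} is just an alias for a one-dimensional Array of strings.
println(typeof(my_list) == Array{String, 1})  # prints true
```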
Now, how do we insert and delete an element in a Julia array? Just like we did with the stack data structure in C++, using push and pop. In C++ we call these as methods on an object, but in Julia they are plain functions with a small change in the name: push!() and pop!().
Now you will ask why there is a bang (!) after the function name. In Julia, there are two kinds of functions: mutating and non-mutating. A mutating function takes its arguments and modifies them. A non-mutating function takes its arguments and returns a new result calculated from them. By convention, the name of a mutating function ends with a bang, and the name of a non-mutating function does not.
Here push!() and pop!() are mutating functions because they modify the existing array instead of returning a new one.
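Since the original listing is missing, here is a hedged sketch of push!() and pop!() on an array; the array contents and the pushed value are assumptions for illustration.

```julia
my_list = ["Julia", "Python", "R"]  # illustrative values

push!(my_list, "C++")      # appends "C++" to the end of the array
println(my_list)           # prints ["Julia", "Python", "R", "C++"]

last_item = pop!(my_list)  # removes the last element and returns it
println(last_item)         # prints C++
println(my_list)           # prints ["Julia", "Python", "R"]
```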
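The same distinction can be seen in Julia's built-in sort() and sort!() functions. The sketch below uses a hypothetical array named numbers to contrast the two.

```julia
numbers = [3, 1, 2]  # hypothetical example data

sorted_copy = sort(numbers)  # non-mutating: returns a new sorted array
println(numbers)             # prints [3, 1, 2]
println(sorted_copy)         # prints [1, 2, 3]

sort!(numbers)               # mutating: sorts the existing array in place
println(numbers)             # prints [1, 2, 3]
```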
You can also access array elements by writing the index in square brackets after the array name. Just remember one thing: in Python, indexes start from 0, but in Julia they start from 1. For example, to get the first element of the my_list array, print the value of my_list[1]. It will print Julia.
You can also replace the value at a specific index by assigning a new value to it, just like in Python. Here is an example.
my_list[2] = "Java"  # replaces the second element; assumes my_list has at least two elements
In Python, we create dictionaries using curly braces or the dict() function. But in Julia, we use Dict(). Below is an example of a dictionary.
my_dict = Dict(1 => "one", 2 => "two", 3 => "three", 4 => "four")
println(my_dict)
From the above code, you can see that there is a "=>" sign between each key and its value. In Python, we use ":" between key-value pairs. If we print the dictionary, the entries may appear in a different order from the one we wrote.
That is because a Dict is not ordered. If you want an ordered dictionary, you have to use OrderedDict from the OrderedCollections package. See this link.
You can retrieve a value from a dictionary by indexing it with the corresponding key. For example, if we print my_dict[1], we get the output one.
println(my_dict[1])  # prints one
To add a new entry, you index the dict by the desired key and assign a value with the = operator, just like in Python.
my_dict[5] = "five"
println(my_dict)  # prints Dict(5 => "five", 4 => "four", 2 => "two", 3 => "three", 1 => "one"); entry order may vary
You can delete a key from the dictionary by using the delete!() function.
delete!(my_dict, 5) # it removes the key 5 and its corresponding value
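If you want to get the deleted value back, use the pop!() function instead of delete!(). It removes the key and returns its value, so the key must still be present when you call it.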
popped_value = pop!(my_dict, 5)
Remember how we glued two lists with the zip() function and then created a dictionary with it in Python? You can do this here too. See the example below.
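The original example is not reproduced here; the following is a minimal sketch of the idea, using hypothetical variable names keys_list and values_list.

```julia
keys_list = [1, 2, 3]                  # hypothetical keys
values_list = ["one", "two", "three"]  # hypothetical values

# zip() pairs elements position by position; Dict() turns the pairs into entries.
zipped_dict = Dict(zip(keys_list, values_list))

println(zipped_dict[2])       # prints two
println(length(zipped_dict))  # prints 3
```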
In Python, we can write strings with either single quotes or double quotes. In Julia, strings must be written with double quotes, and single quotes are reserved for single characters. You will easily recognize this convention from C++. Here is an example of a string.
my_str = "Julia is an awesome language. I really like it."
You can also write multiline strings. See the example below.
my_string = "
Bravo 6,
Going Dark.
"
In Julia, we can write multiline strings with a single pair of double quotes, which we can't do in Python. If you want to keep Python's convention for multiline strings, you can use triple double quotes too.
my_str = """
All stations,
This is Bravo 6.
Get down.
"""
If you are familiar with the Call of Duty game franchise, you will easily recognize whose quote this is. We can also concatenate two strings by using the "*" operator.
str1 = "hello"
str2 = "goodbye"
println(str1 * str2)  # prints hellogoodbye
If you want a space between the two strings when joining them, use the join() function with a space as the separator.
join(["hello", "goodbye"], " ")  # returns "hello goodbye"
Julia has one more feature that I always wanted in Python: string interpolation. See the example below.
Name = "Subhradeep Rang"
println("My name is $Name")  # prints My name is Subhradeep Rang
From the above code, you can see that we can pass a variable inside a string by adding a ‘$’ sign before the variable name. This feature is also available in Kotlin.
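Interpolation is not limited to plain variable names. As a small sketch with hypothetical variables a and b, wrapping an expression in $( ) interpolates its result:

```julia
a = 3  # hypothetical values
b = 4

# $(...) evaluates the expression and inserts the result into the string.
message = "The sum of $a and $b is $(a + b)"
println(message)  # prints The sum of 3 and 4 is 7
```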
Broadcasting operators in Julia
In Python, we can do almost everything using lists, but element-wise array operations are not one of those things. That is where NumPy arrays come in. Julia does not need a separate library for this, because it has broadcasting operators. See the example below.
list1 = [1, 2, 3, 4]
println(list1 .+ 2)  # prints [3, 4, 5, 6]
In the above example, we add 2 to every array element. If we want to do this in Python, we have to use NumPy arrays. In Julia, we use a broadcasting operator instead, which means adding a "." before the operator. This works for functions too, which we discuss next.
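The dot works with other operators as well, and between two arrays of the same length. A brief sketch, where list2 is a hypothetical second array added for illustration:

```julia
list1 = [1, 2, 3, 4]
list2 = [10, 20, 30, 40]  # hypothetical second array

println(list1 .* list2)  # element-wise product, prints [10, 40, 90, 160]
println(list1 .^ 2)      # element-wise power, prints [1, 4, 9, 16]
```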
Now here comes the most interesting part. In Julia, we define functions a little differently. We don't need curly braces or mandatory indentation. See the example below.
function square(x::Int64)
    return x*x
end
See, it is that simple. We end the function with the end keyword, and we can specify the argument type with a double colon (::). We can also broadcast functions. If we want the square of every element of the list1 array that we used previously, we call the function with a dot before the parentheses, as below.
println(square.(list1)) # it will print [1, 4, 9, 16]
Interesting, right? I like Julia for exactly these features. You can also write compact functions in Julia. If we want to write the square function as a compact function, we write it like the one below.
square(x::Int64) = x*x
You can use this form when the function body fits in a single expression. Now you may ask: so far we have only created non-mutating functions, so what about mutating ones? Of course, we can create that type of function too. See the example below for a mutating function.
I think you have already guessed what this function does. We define a mutating function just like we did previously, but this time we add a bang (!) between the function name and the opening parenthesis. Thanks to the broadcasting operators, we don't have to write a for loop. Here we want to add 1 to each element of an array. As this is a mutating function, we don't have to return any value. Just pass the array, and boom! Your array is now modified.
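The original listing is missing, so the following is only a sketch of such a function. The name add_one! is a hypothetical choice; the body adds 1 to each element in place using a broadcasted assignment.

```julia
# Hypothetical name: the trailing ! signals that the argument is modified.
function add_one!(arr::Vector{Int64})
    arr .+= 1       # broadcasted in-place addition, no explicit for loop
    return nothing  # nothing useful to return; the caller's array is changed
end

list1 = [1, 2, 3, 4]
add_one!(list1)
println(list1)  # prints [2, 3, 4, 5]
```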
A question may arise among beginners: which language should we learn first, Python or Julia? For beginners, I recommend learning Python first. Python is not only easy to understand but also widely used in data science and machine learning. If you already have a good grasp of Python, then you can try Julia. It will add a new skill to your toolkit. From what I have read about the language, and if I am not wrong, Julia will not stay under the radar for long.
It is impossible to include all of Julia's features in one article. But if you want to learn all the stuff, follow this link. In this article, we covered the following:
- How to use the inbuilt data structures of Julia, i.e., arrays, dictionaries, etc.
- What the broadcasting operator is and how to use it.
- Different types of functions, i.e., mutating and non-mutating functions, and their implementations in Julia.
- How to apply one function on a list of values by broadcasting.
Julia also has packages that we can use just as we do in Python, but I am not covering them in this article. I will talk about those packages later. Until then, stay tuned. And don't forget to check out my other articles on Analytics Vidhya.