How to Interact with the Operating System Using Python and Jupyter Notebook
This article was published as a part of the Data Science Blogathon
Python is one of the most popular and frequently used languages of recent times. It is used for data science, data analytics, web development, machine learning, and automating all kinds of tasks, and many of these tasks depend on the operating system. I use Python through Jupyter Notebook.
While working on a lot of projects, especially when I tried to automate one task or another, I felt an urge to talk to the operating system directly from my Jupyter Notebook, instead of frantically clicking the mouse or tapping the touch-pad whenever I needed to create or delete a directory (folder). Or even for the simplest of tasks: to know “Where am I?”
Oh, don’t worry, I am not going on a philosophical rant. All I wanted to know was which directory I am working in right now, and how to change that directory without leaving Jupyter.
Being able to access the operating system, or in other words, to talk to my computer directly from inside my Jupyter Notebook, gives me immense freedom to fully automate my tasks.
Without further ado, let us see what this magical tool is, and why I am so excited about it.
Import Miscellaneous operating system interfaces
import os
import platform
What does each of these modules do, and where do you install them from?
The good news first: you need not install either of these modules. They come pre-installed with Python as part of its standard library. So just open your Jupyter Notebook, import them as shown above, and you are good to go.
os is short for operating system.
os is one of Python’s standard utility modules. It lets you interact with the operating system directly from within a Jupyter Notebook, which makes it possible to perform many operating system tasks automatically. The module has functions for creating a directory, listing its contents, showing the current directory, changing the current directory, and many more.
Python also provides the built-in platform module, another of its standard utility modules, which fetches system information.
The platform module retrieves information about the system (platform) on which you are currently working, that is, the one on which the program is being executed: your device, your OS, the version of the OS, the node name, and more. This is particularly useful for checking whether your system is compatible with the Python version installed on it, or whether your system (for example, its OS and processor architecture) meets the requirements of the program, software, application, or game you plan to use.
Note: Throughout this blog, I will be using “Folder” and “Directory” interchangeably. They mean the same, just called by different names in different operating systems.
Tell Me about my system
Let us start by getting our system information using platform. We have already imported the module and are ready to use its functions.
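A minimal sketch of the call that produces a named tuple like the one below is platform.uname(); the values you get will of course describe your own machine.

```python
import platform

# uname() returns a named tuple with the fields
# system, node, release, version, machine and processor
system_info = platform.uname()
print(system_info)
```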
uname_result(system='Windows', node='LP000007031003', release='10', version='10.0.17763', machine='AMD64', processor='AMD64 Family 23 Model 17 Stepping 0, AuthenticAMD')
Let us understand the output in brief.
system: the operating system family, such as ‘Linux’, ‘Darwin’, ‘Java’, or ‘Windows’.
node: the computer’s network name (may not be fully qualified!). An empty string is returned if the value cannot be determined.
release: the operating system’s release, e.g. ‘2.2.0’, ‘NT’, or ‘10’. It makes sense to read it together with system; here, it is Windows 10 (system + release).
version: the system’s release version, e.g. ‘#3 on degas’, or ‘10.0.17763’ in the output above. An empty string is returned if the value cannot be determined.
machine: the machine type (in other words, the processor family), e.g. ‘i386’ or ‘AMD64’. An empty string is returned if the value cannot be determined.
processor: the (real) processor name, e.g. ‘amdk6’. An empty string is returned if the value cannot be determined. Note that many platforms do not provide this information or simply return the same value as machine().
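Each field can also be read on its own, either as an attribute of the named tuple or through the matching platform function (platform.system(), platform.release(), platform.machine(), and so on). A small sketch; the variable names are just for illustration:

```python
import platform

info = platform.uname()

# Fields are attributes of the named tuple
os_label = f"{info.system} {info.release}"  # e.g. "Windows 10" on the author's machine
print("OS      :", os_label)
print("Machine :", info.machine)

# The platform module also exposes one function per field
print("Same OS family via platform.system():", platform.system())
```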
What’s my Python Version?
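A minimal sketch of reading the interpreter version with platform.python_version(), which returns it as a string. The sys.version_info comparison is an extra illustration, handy when a script needs “at least Python 3.7”; the threshold is only an example.

```python
import platform
import sys

# python_version() returns the version as a 'major.minor.patchlevel' string
py_version = platform.python_version()
print(py_version)  # e.g. 3.7.6 on the author's machine

# sys.version_info is tuple-like, which makes version comparisons easy
is_recent_enough = sys.version_info >= (3, 7)
print("Python 3.7 or newer:", is_recent_enough)
```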
The above code returns exactly what it says. Nothing more, nothing less.
The Python version installed here is Python 3.7.6. When you run the code, you will get your own version as output.
Now that we have enough information about our machine, system, and Python version, let us go into more depth.
Let us see what our current working directory is. The current working directory is the directory the Python process is running in; in Jupyter, it is usually the folder that contains the notebook.
Knowing the current working directory is useful because we often refer to files in Python. A file in the current working directory can be accessed directly by its name alone. For files in other directories, you need to give the path to the file, either an absolute path (e.g. C:\Users\nilabhnishchhal\PYTHON_BOOK_NILABH\Blogs\Python Modules Blogs) or a relative path (relative to the current working directory).
You can see this concept in action in my Image Processing Automation blog on Analytics Vidhya.
What’s My Current Working Directory (CWD)?
You can think of the Current Working Directory (CWD) as the folder where Python is operating. When a file is referred to by just its name (and not by its path), Python looks for it in the CWD, so referencing a file by name alone works only if the file is in the CWD. It is worth stressing that the CWD is the folder the Python process is running in, which is not necessarily the folder where the Python script is located.
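A minimal sketch: os.getcwd() returns the current working directory as an absolute path string. In a notebook, leaving os.getcwd() as the last expression of a cell displays the path in quotes, as in the output below; print() shows it without quotes. Your path will differ.

```python
import os

# getcwd() returns the absolute path of the current working directory
cwd = os.getcwd()
print(cwd)
```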
'C:\Users\nilabhnishchhal\PYTHON_BOOK_NILABH\Blogs\Python Modules Blogs\OS'
The output above is your current working directory. If for some reason you do not want to work in this directory, don’t worry: you are not stuck there forever. You can change it with ease using the os module as well.
But why would you want or need to change the current working directory?
Suppose you want to bring a data file or an image file from your computer into your Python environment to work on it. You can import files from any directory on your computer, but you need to give the full path of each file. That is fine if you only have to do it once. But if you have to work in a particular directory, and create more subdirectories and files inside it, it is better to change the current working directory once, because that removes the need to give the full path every time.
Refer to my blog on Image Processing using Python, and you will appreciate how much effort is saved when the directory you want to work in is your current working directory.
How to change my CWD?
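# the r prefix makes this a raw string, so backslashes (as in \Users or \nilabh...) are not read as escape sequences
os.chdir(r'C:\Users\nilabhnishchhal\PYTHON_BOOK_NILABH\Blogs\Python Modules Blogs')
os.getcwd()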
'C:\Users\nilabhnishchhal\PYTHON_BOOK_NILABH\Blogs\Python Modules Blogs'
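When you only need another folder for a few steps, a safe pattern is to remember the old CWD and switch back in a finally block. A minimal sketch, assuming a throwaway temporary folder as a stand-in for the directory you actually want to work in; notes.txt is a hypothetical file name:

```python
import os
import tempfile

# Remember where we started so we can come back afterwards
original_cwd = os.getcwd()

# A throwaway folder stands in for the directory you want to work in
with tempfile.TemporaryDirectory() as work_dir:
    os.chdir(work_dir)
    try:
        # Relative names now resolve inside work_dir
        with open("notes.txt", "w") as f:
            f.write("hello")
        created_here = os.path.exists(os.path.join(work_dir, "notes.txt"))
    finally:
        # Always restore the previous CWD, even if something above fails
        os.chdir(original_cwd)
```

On Python 3.11 and later, contextlib.chdir() packages the same pattern as a context manager.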
Who has logged in?
This gives you the username of the user currently logged in on the machine.
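A minimal sketch, assuming os.getlogin() is the call in question. os.getlogin() can raise OSError when the process has no controlling terminal (for example on some servers or in containers), so the sketch falls back to getpass.getuser(). The helper name current_user is hypothetical.

```python
import getpass
import os


def current_user():
    """Return the login name, falling back to getpass if os.getlogin() fails."""
    try:
        # Name of the user logged in on this process's controlling terminal
        return os.getlogin()
    except OSError:
        # getpass.getuser() checks environment variables such as LOGNAME and USER
        # (and, on Unix, the password database)
        return getpass.getuser()


user = current_user()
print(user)
```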
What’s there in my Directory?
Once you know which directory you are in, it is time to learn how to look inside it. But before we do that, let’s ask ourselves why we would want to.
There can be many reasons, but one use case readily comes to mind. Say you want to access all the text files in a particular folder, or in several folders, and merge their content. To do so, you first have to see which files are in the directory and get their names; os.listdir() returns them as a list. Once you have the file names, you can use string methods to pick out the text files (or files with any other extension you are interested in). For a taste of a similar scenario, see my article Automate your Image manipulation using Python, where I used this method to access all the jpg (image) files in a folder and convert them all to pdf files in one go.
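A minimal sketch of the call: os.listdir() with no argument lists the current working directory. It returns bare names (not full paths) of both files and folders, in arbitrary order, without the special entries ‘.’ and ‘..’.

```python
import os

# With no argument, listdir() lists the current working directory
entries = os.listdir()
print(entries)
```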
['blogs_folder_screenshot.png', 'blogs_folder_screenshot_2.png', 'blogs_folder_screenshot_3.png', 'blogs_folder_screenshot_4.png', 'blogs_folder_screenshot_5.png', 'blogs_folder_screenshot_6.png', 'blogs_folder_screenshot_child.png', 'blogs_folder_screenshot_grandchild.png', 'OS']
Running the above code without specifying a path returns the files and directories in the current working directory. But that is not a limitation: you can list any directory’s contents by passing the path to the directory as an argument.
os.listdir(path='C:/Users/nilabhnishchhal/PYTHON_BOOK_NILABH/Blogs/')
['.ipynb_checkpoints', '0eb9d5af5e9d8b5837f12e453762e638.jpg', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON - Jupyter Notebook.pdf', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON.ipynb', 'Baops.gif', 'Boolean Logic.docx', 'Cloud Blogs', 'Image Processing Blogs', 'Link to all my blogs.txt', 'Memory Blogs', 'Number System Blogs', 'Python Modules Blogs', 'TDS Checklist.pdf', 'Useless.txt']
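The text-file merging use case described earlier can be sketched like this. The helper merge_text_files, the output name merged.txt, and the demo files are hypothetical; the key points are filtering the names from os.listdir() with string methods and joining each name with its folder to get a full path.

```python
import os
import tempfile


def merge_text_files(folder, output_name="merged.txt"):
    """Concatenate every .txt file in folder (sorted by name) into output_name."""
    # listdir() returns bare names: keep regular files ending in .txt
    # (case-insensitively) and skip the output file itself
    txt_names = sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".txt")
        and name != output_name
        and os.path.isfile(os.path.join(folder, name))
    )
    merged_path = os.path.join(folder, output_name)
    with open(merged_path, "w", encoding="utf-8") as out:
        for name in txt_names:
            # Join the bare name with the folder to get a usable path
            with open(os.path.join(folder, name), encoding="utf-8") as src:
                out.write(src.read())
    return merged_path


# Usage on a throwaway folder with made-up files
with tempfile.TemporaryDirectory() as demo:
    for name, text in [("a.txt", "first\n"), ("b.txt", "second\n"), ("photo.jpg", "")]:
        with open(os.path.join(demo, name), "w", encoding="utf-8") as f:
            f.write(text)
    merged = merge_text_files(demo)
    with open(merged, encoding="utf-8") as f:
        merged_text = f.read()
```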
How to create a new Directory?
To automate your tasks, you will often need to create a new directory, sometimes to keep things organized and at times just to store stuff (files, other directories, etc.) temporarily.
The os module lets you create your own directory by specifying the path where you want to create it. We are going to create a new directory named “Trial Folder” in the Blogs folder. The image below shows that there is no such folder in Blogs at present.
os.mkdir(path='C:/Users/nilabhnishchhal/PYTHON_BOOK_NILABH/Blogs/Trial Folder')
Now let us have a look at the same folder once again. The image is the screenshot of the folder view after the above code is run.
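os.mkdir() creates exactly one directory, and it complains in two common situations: the folder already exists (FileExistsError), or a folder earlier in the path is missing (FileNotFoundError). A minimal sketch, assuming a temporary folder as a stand-in for Blogs:

```python
import os
import tempfile

# A throwaway folder stands in for the Blogs folder
with tempfile.TemporaryDirectory() as blogs:
    trial = os.path.join(blogs, "Trial Folder")
    os.mkdir(trial)  # creates exactly one new directory
    trial_exists = os.path.isdir(trial)

    # Running the same call again fails, because the folder now exists
    try:
        os.mkdir(trial)
        second_attempt = "created"
    except FileExistsError:
        second_attempt = "already exists"

    # mkdir() also fails if an intermediate folder is missing
    try:
        os.mkdir(os.path.join(blogs, "Parent", "Child"))
        missing_parent_error = False
    except FileNotFoundError:
        missing_parent_error = True
```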
How to make Multiple Directories?
As we saw above, creating one directory is easy in Python using
os.mkdir(). As you will see here, making multiple nested directories is no more difficult, but we use another function for that, i.e. os.makedirs().
We will create another directory in the same Blogs folder, named “Parent”, and create its child and grandchild directories nested inside each other. To do so, we write the full path down to the last directory (also known as the leaf directory) as if all the previous ones already existed. See the code, and the screenshot of the result, below.
This function makes directories recursively. This means that, while making the leaf directory (the last part of the path), if any intermediate directory is missing, the function creates it as well. Hence, in the following example, while creating the leaf directory (named “Grand Child” here), the function finds that the two directories before it, “Child” and “Parent”, do not exist yet, and creates all of them.
If you want to make nested directories, you can use the function os.makedirs().
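A minimal sketch of os.makedirs() building the Parent/Child/Grand Child chain in one call, using a temporary folder as a stand-in for Blogs. The exist_ok=True flag makes a repeated call a no-op instead of raising FileExistsError, which is convenient when a notebook cell is re-run.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as blogs:  # stands in for the Blogs folder
    leaf = os.path.join(blogs, "Parent", "Child", "Grand Child")

    # makedirs() creates Parent and Child on the way to the leaf directory
    os.makedirs(leaf)

    # exist_ok=True: no error if the leaf directory already exists
    os.makedirs(leaf, exist_ok=True)

    created = [
        os.path.isdir(os.path.join(blogs, *parts))
        for parts in (["Parent"], ["Parent", "Child"], ["Parent", "Child", "Grand Child"])
    ]
```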
How to delete files?
To delete a file, use
os.remove("path"), where “path” is the path of the file to be removed. This function deletes only files: if the path given is a folder (directory), it raises an OSError (IsADirectoryError on Linux; other systems, such as Windows, may raise a different subclass, such as PermissionError).
First, let us see which files are in the directory, then delete the one we want, and finally use the listdir function again to see the directory and its contents.
# show the contents of the directory
os.listdir(r"C:\Users\nilabhnishchhal\PYTHON_BOOK_NILABH\Blogs")
['.ipynb_checkpoints', '0eb9d5af5e9d8b5837f12e453762e638.jpg', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON - Jupyter Notebook.pdf', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON.ipynb', 'Baops.gif', 'Boolean Logic.docx', 'Cloud Blogs', 'Image Processing Blogs', 'Link to all my blogs.txt', 'Memory Blogs', 'Number System Blogs', 'Parent', 'Python Modules Blogs', 'TDS Checklist.pdf', 'Trial Folder', 'Useless.txt']
Notice the “Useless.txt” file, which we are going to delete.
['.ipynb_checkpoints', '0eb9d5af5e9d8b5837f12e453762e638.jpg', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON - Jupyter Notebook.pdf', 'AUTOMATE_YOUR_IMAGE_PROCESSING_USING_PYTHON.ipynb', 'Baops.gif', 'Boolean Logic.docx', 'Cloud Blogs', 'Image Processing Blogs', 'Link to all my blogs.txt', 'Memory Blogs', 'Number System Blogs', 'Parent', 'Python Modules Blogs', 'TDS Checklist.pdf', 'Trial Folder']
Notice in the output above that the file which we deleted is no longer in the list.
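A minimal sketch of the same steps in a temporary folder, also showing the two errors os.remove() can raise: FileNotFoundError when the file is already gone, and an OSError subclass (which one depends on the OS) when the path is a directory. The folder contents are made up to mirror the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as blogs:
    useless = os.path.join(blogs, "Useless.txt")
    with open(useless, "w") as f:
        f.write("nothing important")
    os.mkdir(os.path.join(blogs, "Trial Folder"))

    before = sorted(os.listdir(blogs))
    os.remove(useless)  # deletes the file
    after = sorted(os.listdir(blogs))

    # Deleting a file that no longer exists raises FileNotFoundError
    try:
        os.remove(useless)
        second_delete_failed = False
    except FileNotFoundError:
        second_delete_failed = True

    # remove() refuses directories; the exact OSError subclass depends on the OS
    try:
        os.remove(os.path.join(blogs, "Trial Folder"))
        directory_refused = False
    except OSError:
        directory_refused = True
```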
How to delete the folder (directory)?
To delete a folder, you can use
os.rmdir("path"), where “path” is the path to the folder. However, this method can delete only empty folders, which acts as a safeguard against accidental loss of data. If you want to delete a folder that has files in it, you first need to delete the files using the
os.remove() method mentioned above.
Remember the “Trial Folder” we created above? Let us delete it.
Note: Do not try to delete the current working directory with this function (on Windows, this fails). If you need to delete it, first change the current working directory using the method described earlier in this article.
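A minimal sketch of the “empty folders only” safeguard, in a temporary folder: os.rmdir() raises OSError on a folder that still contains a file, and succeeds once the folder has been emptied with os.remove(). The file keep.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as blogs:
    trial = os.path.join(blogs, "Trial Folder")
    os.mkdir(trial)
    with open(os.path.join(trial, "keep.txt"), "w") as f:
        f.write("data")

    # rmdir() refuses to delete a folder that still has something in it
    try:
        os.rmdir(trial)
        refused = False
    except OSError:
        refused = True

    # Empty the folder with os.remove(), then rmdir() succeeds
    for name in os.listdir(trial):
        os.remove(os.path.join(trial, name))
    os.rmdir(trial)
    trial_gone = not os.path.exists(trial)
```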
The os.removedirs() function removes directories recursively, working backward up the path. It works like rmdir(), except that if the leaf directory (think of the end of the line, like our “Grand Child” folder) is successfully removed, removedirs() tries to successively remove every parent directory mentioned in the path until an error is raised (which is ignored, because it generally means that a parent directory is not empty). For example, os.removedirs('C:/Users/nilabhnishchhal/PYTHON_BOOK_NILABH/Blogs/Parent/Child/Grand Child') first removes the directory ‘Blogs/Parent/Child/Grand Child’, then ‘Blogs/Parent/Child’, and then ‘Blogs/Parent’, as they are empty. Once it reaches the “Blogs” directory, which is not empty, it stops.
Raises OSError if the leaf directory could not be successfully removed.
You can see in the screenshot below that the Parent directory and all its subdirectories are deleted.
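A minimal sketch of the same walk-back behaviour, assuming a temporary folder as a stand-in for Blogs. The placeholder file matters: it keeps the stand-in folder non-empty, which is what makes os.removedirs() stop there instead of trying to remove it too.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as blogs:
    # A file keeps "blogs" non-empty, like the real Blogs folder
    with open(os.path.join(blogs, "Link to all my blogs.txt"), "w") as f:
        f.write("placeholder")
    leaf = os.path.join(blogs, "Parent", "Child", "Grand Child")
    os.makedirs(leaf)

    # Removes Grand Child, then Child, then Parent; the attempt on the
    # non-empty "blogs" folder fails and that error is ignored
    os.removedirs(leaf)

    parent_gone = not os.path.exists(os.path.join(blogs, "Parent"))
    blogs_kept = os.path.isdir(blogs)
```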
How to Rename a File or Folder?
os.rename(old, new) renames the file or directory old to
new. On Windows, if a file or folder named new already exists, the operation fails with an OSError (FileExistsError); on Unix, an existing file named new may instead be replaced silently.
Notice in the image of the blog folder above that there is an image file with a very long, unreadable numerical name. Let us change it to a more understandable name.
You can see in the image below that the name of the file is changed to “logic.jpg”.
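A minimal sketch of the rename, using an empty placeholder in a temporary folder instead of the real image file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as blogs:
    old = os.path.join(blogs, "0eb9d5af5e9d8b5837f12e453762e638.jpg")
    new = os.path.join(blogs, "logic.jpg")
    open(old, "wb").close()  # empty placeholder for the image file

    # Rename old to new within the same folder
    os.rename(old, new)
    names_after = os.listdir(blogs)
```

Because new is a full path, the same call can also move the file into another folder on the same filesystem.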
There is yet another method to rename a file or folder:
os.replace(). For a folder, it behaves like
os.rename(), but for a file it behaves differently.
For a directory, os.replace(old, new) renames the directory old to new. If
new already exists as a non-empty directory, an OSError is raised.
For a file, os.replace(old, new) renames the file old to
new. The difference from the directory case is that if a file named
new already exists, it is replaced silently, provided the user has permission. The operation may fail if old and
new are on different filesystems.
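A minimal sketch of the file case, with hypothetical names draft.txt and final.txt in a temporary folder: os.replace() overwrites the existing destination on every platform, whereas os.rename() would raise FileExistsError in this situation on Windows.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    draft = os.path.join(folder, "draft.txt")
    final = os.path.join(folder, "final.txt")
    for path, text in [(draft, "new version"), (final, "old version")]:
        with open(path, "w") as f:
            f.write(text)

    # final.txt already exists: os.replace() silently overwrites it
    os.replace(draft, final)

    with open(final) as f:
        final_text = f.read()
    remaining = sorted(os.listdir(folder))
```

This is why os.replace() is a common choice for “write to a temporary file, then swap it into place” updates.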
In this article, we explored some of the capabilities of Python’s built-in os module and saw examples of how they can be useful. More task-based examples can be found in the “AUTOMATE YOUR IMAGE PROCESSING USING PYTHON” article. The os module’s functions are useful in the world of Data Science and Analytics, and the module has a lot more to offer: based on your needs, you can build much more complex logic. Being able to interact with the operating system from inside your Python code is wonderful, and it opens many doors. Don’t you agree? Please let me know in the comments.
The implied lesson of this article is that you can use Python to make your daily or repetitive tasks easy by automating them. There may or may not be a straightforward solution, but if you are inclined to find one, there are enough resources at your disposal to find a way out. You can mix and match what you learn here with my book, PYTHON MADE EASY – Step by Step Guide to Programming and Data Analysis using Python for Beginners and Intermediate Level.
About the Author: I am Nilabh Nishchhal. I like making seemingly difficult topics easy and writing about them. Check out more at https://www.authornilabh.com/. My attempt to make Python easy and accessible to all is “Python Made Easy”.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.