Back in December 2017, the Google Brain team revealed it had discovered 2 new planets by applying Astronet – it’s deep neural network model for working on astronomical data. It was a monumental discovery that went to show the far-reaching impacts of machine learning in today’s world.
Now, Google Brain has released the entire code that went into making that technology and they’ve made it available for everyone. The model is based on a convolutional neural network (CNN) and you can see an outline of Google’s model in the below image:
Note that you will require the below packages to work with the data:
NASA’s Keplar space telescope of course. It has been scanning the Milky Way for years collecting and analyzing data about planets and stars. The dataset generated from all those years is a data scientists’s dream – plenty of outliers making for noisy data and enough unknown elements to keep the best in the business guessing.
But at the end of the day, the neural network requires human labeled data to understand, flag and verify the planets and stars. The research team have only worked on 600 out of 200,000 stars observed by Keplar.
You can view Google’s official blog post about it here and access the entire code on GitHub here.
The logic Google has given behind making the code available for everyone is that they hope this will speed up the process of improving the accuracy of their convolutional neural network. Note that it will take a bit of time to download the data because of it’s gigantic volume.
This is one of the most fascinating datasets a data scientist could work with. It does require domain knowledge but the vast nature of the problem and with Google’s walk-through on their GitHub page, it could be enough to get you started. So go ahead and dive in!
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
Interesting! But for the data, I suppose the model to work as well as on Kepler space telescope images or on earth observatory images.
Agreed. Google open sourcing its source code can help reduce the groundwork required to work on the project from scratch, and instead helps apply a similar technique to a much wider ranged dataset.