What is NASA Doing with Big Data? Check this Out

In the time it took you to read the sentence above, NASA could have collected 1.73 gigabytes of data from its roughly 100 currently active missions. The collection never stops, and its rate is growing exponentially, so managing this data is an uphill task. Yet the data is highly valuable and of immense significance to NASA’s science and research. NASA is working hard to make it as accessible as possible for its daily work, for predictions about the universe, and for human well-being through its innovation and creativity.

In version 2.0 of its “Open Government Plan”, published in 2012, NASA discussed its “Big Data” work without going into much depth, and acknowledged that it has much more to explore in this field.

Most of us already know what big data is and what it is used for, so I will skip the definitions and move straight on to what NASA is doing with it.

NASA’s Big Data Challenge

NASA’s big data challenge is not only an earthly one, and it goes beyond the typical big data problem. Many of its big data sets are described by significant metadata, yet they strain both current and future data management practices. NASA’s missions typically stream data continuously from spacecraft in space and on Earth, faster than it can be stored, managed and understood. NASA has two types of spacecraft: deep space spacecraft and Earth orbiters. Deep space spacecraft send data back at rates on the order of MB/s, while Earth orbiters send it back on the order of GB/s. NASA is using technologies such as optical laser communication to make downloading these huge volumes of data as much as 1,000 times faster. NASA cannot handle that much data today, but it is preparing to: it is planning extensive missions that will each handle 24 TB of data in a single day. For a single mission, that is about 2.4 times the entire Library of Congress.
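The figures above lend themselves to a quick back-of-the-envelope check. The sketch below converts 24 TB per day into an average sustained rate and works out the Library of Congress size that the "2.4 times" comparison implies. It assumes decimal units (1 TB = 10^12 bytes), which the article does not specify, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the mission data volume quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10 ** 12
MB = 10 ** 6

daily_volume_bytes = 24 * TB  # 24 TB handled by one mission per day

# Average sustained rate needed to move 24 TB in one day.
avg_rate_mb_s = daily_volume_bytes / SECONDS_PER_DAY / MB

# If 24 TB equals 2.4 Libraries of Congress, one library is implied to be:
implied_loc_tb = 24 / 2.4

print(f"Average rate: {avg_rate_mb_s:.1f} MB/s")
print(f"Implied Library of Congress size: {implied_loc_tb:.0f} TB")
```

In other words, a single such mission would need a sustained average of roughly 278 MB/s around the clock, and the comparison implies a Library of Congress of about 10 TB.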

NASA focuses on gathering the most important data rather than everything, because transferring even a single bit from a spacecraft down to NASA’s data centers is very expensive. Once the data has accumulated at the data centers, the big issues are storage, management, visualization and analysis. To give a rough idea of the scale, NASA’s climate change data repositories are estimated to grow to 230 petabytes by the end of 2030. For comparison, all the letters delivered by the US Postal Service in one year amount to about 5 petabytes.
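To put the two petabyte figures side by side, here is a minimal calculation using only the numbers quoted above. The variable names are illustrative.

```python
# Compare the projected climate repository size with a year of US mail.
climate_repository_pb = 230   # projected size by the end of 2030
postal_letters_pb_per_year = 5  # one year of letters delivered by the US Postal Service

years_of_mail = climate_repository_pb / postal_letters_pb_per_year
print(f"Equivalent to about {years_of_mail:.0f} years of US mail")
```

By this measure, the projected repository equals about 46 years of letters delivered by the US Postal Service.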

Spacecraft are not NASA’s only sources of data; online platforms, low-cost sensors and mobile devices contribute as well. As an October 2012 article in Harvard Business Review put it, “each of us is now a walking data generator”. For NASA, as for many organizations, the scale of the big data challenge seems daunting.

As you can guess, growing data volumes are not the only challenge NASA faces. As the data grows, so do the difficulties of transferring, indexing and searching it, among many others. On top of this, the growing complexity of algorithms and instruments, the increasing pace of technology refresh and a declining budget environment all shape NASA’s approach to big data. Fortunately, the Federal Government is paying close attention to the big data challenge. In March 2012, the Obama administration announced a “Big Data Research and Development Initiative”, which focuses on improving the tools and techniques needed to access, organize and glean discoveries from large quantities of digital data. Its goal is to transform the government’s ability to use big data for biomedical and environmental research, education, national security and scientific discovery.

Current Approaches

NASA treats building new ways of visualizing, analyzing and interpreting huge quantities of data as one of its highest priorities. Within the government itself, there is pressure from both the bottom up and the top down to handle big data effectively. In version 2.0 of its “Open Government Plan”, NASA identified many of its big data approaches and activities from the viewpoint of its Mission Directorates (Technology, Science, Human Space Exploration, Aeronautics and Operations).

World-class examples of how NASA archives, stores, manages, visualizes, analyzes and makes effective use of big data are as follows:

Managing and Processing

The Mission Data Processing and Control System (MPCS) demonstrates NASA’s approach to processing and managing huge quantities of data. It was recently used by the Curiosity rover on Mars. MPCS integrates with the deep-space framework and, in turn, with NASA’s Mars Reconnaissance Orbiter to deliver data to and from Curiosity and to handle the raw data in real time. Previously, this whole process took hours or even days to complete. The flight operations team uses custom data visualizations built by the system.

Storage

The NASA Center for Climate Simulation (NCCS), used primarily by NASA’s Goddard Institute for Space Studies and the Global Modelling and Assimilation Office, illustrates the agency’s approach to big data storage. NCCS focuses on weather and climate data and currently holds 32 petabytes of data, with a total capacity of 37 petabytes. NCCS also has an advanced visualization tool, a 17-by-6-foot visualization wall, which gives scientists a high-resolution surface for presenting animated content, images and video from NCCS data.

Archiving and Distribution

The Atmospheric Science Data Center (ASDC), which focuses on Earth science, and the Planetary Data System (PDS), which focuses on planetary science, are two examples of how NASA approaches archiving and distribution. ASDC is located at NASA Langley Research Center and is responsible for processing, archiving and distributing NASA Earth science data. It incorporates climate data gathered over the years and provides atmospheric data that is vital for understanding the causes and processes of global climate change and the effects of human activity on the climate. PDS archives scientific data from NASA planetary missions, astronomical observations and laboratory measurements and presents it through a single website. It provides access to over 100 TB of space images, models, telemetry and everything else connected with planetary missions from the past 30 years.

Analysis

NASA’s Pleiades supercomputer supports the analysis of challenging projects, ranging from space weather and solar flare scenarios to comprehensive space vehicle designs. Pleiades was recently used to process large quantities of star data collected by NASA’s Kepler spacecraft, which led to the discovery of Earth-sized planets in the Milky Way galaxy. About 1,200 users across the U.S. depend on the system for large, complex calculations. Pleiades was also used to develop the Bolshoi cosmological simulation, which analyzes how galaxies and the large-scale structures of the universe have evolved over billions of years.

Visualization

NASA Earth Exchange (NEX) is a virtual laboratory that integrates supercomputing, data systems, models and algorithms, data visualization and a large amount of online data with collaborative technology and social networking. Before NEX, scientists spent a great deal of time and effort building high-end computational methods instead of focusing on scientific problems. Now they can use the supercomputer to visualize Earth science data sets, share and run modelling algorithms, and collaborate on new or existing projects. Recently, a research team from the USA used the NEX environment to assemble an atmospherically corrected mosaic and derive global vegetation density at a resolution of 30 metres. Processing the full composite of 340 billion pixels took only a few hours on the Pleiades supercomputer, which let the team experiment with new approaches and algorithms with ease. NASA has also invested in a number of knowledge-sharing and collaboration platforms for the Earth science community that combine workflow management, Earth system modelling, NASA remote sensing data feeds and supercomputing to give researchers a holistic working environment.

Commercial Cloud Computing Services

The Mars Science Laboratory (MSL) mission demonstrates how NASA is modernizing its approach to big data through commercially available cloud computing and cloud storage. NASA engineered and migrated its websites and content management system to Amazon Web Services in less than four months. MSL depended heavily on mission-critical applications that could withstand the failure of about 10 data centers while serving almost 150 gigabits per second of traffic to a global community of scientists, operators and the general public. The team developed a solution that downloads telemetry and raw images from Curiosity directly. As the data streamed in, all the images from Mars were delivered, uploaded, stored and processed in the cloud. The data was catalogued in highly available, scalable databases and exposed to applications and users through a RESTful interface. This allowed the content managers for the Mars websites to produce informative web pages with powerful real-time images. It also enabled NASA to deliver 120 TB of dynamic content and 30 TB of static content on the first night, and to meet demand when its websites received over 8 million hits in less than a minute. The approach further allowed the team to make the most of the JPL Nebula and JPL Galaxy supercomputers, which ran almost 200 Monte Carlo simulations during the mission, each covering a 24-hour time span at 20 GB apiece.

Real-World Application of What NASA Is Doing with Big Data

NASA’s adoption of big data benefits not only the government; it also has real implications for you. A good real-world example of how NASA’s big data expertise influences your life is airline safety. NASA analyzes data gathered from aircraft to examine safety implications, which helps commercial airlines improve their maintenance procedures and avoid equipment failures. Using advanced algorithms, the agency extracts relevant information from large amounts of unstructured data, a difficult task, to help foresee and prevent safety problems. With an open-source algorithm known as Multiple Kernel Anomaly Detection (MKAD), the agency identified how two continuous data streams resemble each other and then analyzed them within a single framework to find patterns and automatically detect precursors of adverse events while an airplane is in flight.
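The idea of analyzing two data streams within a single framework can be illustrated with a small sketch. This is not NASA’s MKAD code. It is a simplified stand-in that computes one Gaussian (RBF) kernel per stream, combines them as a weighted sum, and scores each flight by its squared distance from the centroid in kernel space, instead of using a trained detector. The flight data, the function names and the `weight` and `gamma` parameters are all hypothetical.

```python
import math

def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two equal-length numeric sequences."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

def combined_kernel(f1, f2, weight=0.5, gamma=1.0):
    """Weighted sum of one kernel per stream; each flight is a (stream_a, stream_b) pair."""
    return (weight * rbf(f1[0], f2[0], gamma)
            + (1 - weight) * rbf(f1[1], f2[1], gamma))

def anomaly_scores(flights, weight=0.5, gamma=1.0):
    """Squared distance of each flight from the centroid in kernel feature space."""
    n = len(flights)
    K = [[combined_kernel(a, b, weight, gamma) for b in flights] for a in flights]
    mean_all = sum(sum(row) for row in K) / (n * n)
    return [K[i][i] - 2 * sum(K[i]) / n + mean_all for i in range(n)]

# Hypothetical, pre-scaled data: two sensor streams per flight.
flights = [
    ([0.1, 0.2, 0.3], [0.5, 0.5, 0.5]),
    ([0.1, 0.2, 0.3], [0.5, 0.6, 0.5]),
    ([0.2, 0.2, 0.3], [0.5, 0.5, 0.4]),
    ([0.9, 0.1, 0.9], [0.0, 1.0, 0.0]),  # deliberately unusual flight
]

scores = anomaly_scores(flights)
most_anomalous = max(range(len(flights)), key=scores.__getitem__)
print("Most anomalous flight index:", most_anomalous)
```

Because the two kernels are summed into one similarity measure, a flight that looks unusual in either stream, or in both together, ends up far from the centroid and receives a high score. Here that is the last flight in the list.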

A Big Data Opportunity

From studying solar plasma ejections in real time and monitoring global climate change to optimizing large-scale engineering designs and modernizing mission operations, NASA is certainly a leader in applying big data. NASA scientists are experimenting with innovative approaches to gain control over this shifting environment and to deal with the many challenges it poses to government and to the way NASA does business. The opportunities for NASA in exploring the universe of big data seem limitless.

The Open Government Plan outlines the actions NASA is taking to explore big data. NASA has created a website, data.nasa.gov, as a starting point for engaging with its data, although for now it is best described as a simple directory of the unique and amazing data NASA provides. NASA is also working to help users find relevant, high-quality data along with tools and applications that are easy to use.

NASA’s scientists have set a goal to “create new situations for much higher co-ordination towards big data opportunities of NASA and enhanced cooperation with other organizations”, with the aim of inspiring citizens to use raw datasets and build applications connected with NASA’s mission. NASA also joined the Department of Energy’s Office of Science and the National Science Foundation in announcing the “Big Data Challenge”, a set of contests held on the TopCoder platform. Competitors are tasked with imagining mobile applications that uncover new value hidden in discrete government information domains, and then showing how these might be shared as universal, cross-agency solutions that transcend the limits of individual silos. This is a fresh opportunity to work with NASA and to help turn observations into new and unique advances that matter to the future success of government.

All of this describes NASA’s path-breaking work in handling big data effectively and making the most of it. We can certainly say that big data is far more useful to us when we manage it well. Big data also offers strong job prospects, since it is used in organizations like NASA, as explained above. The number of people taking big data training is growing day by day as more top organizations around the world adopt it. Training in big data and gaining expertise in it can therefore open the road to job opportunities in prestigious organizations across the world.