November 9, 2016 The SpaceNet Challenge: help us to harness machine learning to make maps more current and complete
*This article first appeared on DigitalGlobe and was written by Todd Bacastow.
If you plan on competing in The SpaceNet Challenge, join us on YouTube Live on 11/10 at 9 AM Eastern for an expert discussion hosted by Topcoder and team members from SpaceNet. This discussion will help you prepare for the upcoming challenge!
We recently launched SpaceNet on AWS, an open corpus of training data established with the goal of enabling advancements in machine learning using satellite imagery. To accelerate this initiative, we’re thrilled to announce The SpaceNet Challenge in collaboration with CosmiQ Works and NVIDIA, which is being facilitated by Topcoder. This is the first in a series of recurring open innovation competitions focused on developing next-generation computer vision algorithms for automated mapping. With $34,500 in prizes, the first challenge tackles the automated extraction of 2D building footprints from imagery. The competition officially starts on 11/14, but you can pre-register today. It’s also worth exploring the resources page to learn more about the evaluation metric and check out the data visualizer.
We rely on maps in our daily lives but often take their currency and completeness for granted. Online map data is lacking for many places, such as developing countries and rapidly growing urban areas. At DigitalGlobe, we collect approximately 3 million square kilometers of imagery per day from our constellation of satellites. If that sounds like a lot of data, it is – about 70 terabytes' worth. If the average smartphone photo is about 3 megabytes, we take the data size equivalent of about 23.3 million smartphone photos each day. It’s impossible for humans to manually handle such massive amounts of data, so our vision is to one day extract map data from imagery automatically as a standard processing step.
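The smartphone comparison is simple arithmetic, and a quick sanity check can be sketched as follows. This is only an illustration: it assumes decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 megabyte is 10^6 bytes, and the function and constant names are hypothetical, chosen here for readability.

```python
# Back-of-the-envelope check of the daily data volume comparison.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
DAILY_VOLUME_TB = 70   # approximate daily imagery volume quoted in the article
PHOTO_SIZE_MB = 3      # assumed average smartphone photo size from the article

BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6


def photo_equivalents(volume_tb: float, photo_mb: float) -> float:
    """Return how many photos of size photo_mb fit into volume_tb of data."""
    return (volume_tb * BYTES_PER_TB) / (photo_mb * BYTES_PER_MB)


photos = photo_equivalents(DAILY_VOLUME_TB, PHOTO_SIZE_MB)
print(f"{photos / 1e6:.1f} million photos per day")  # prints "23.3 million photos per day"
```

With binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes) the result would come out slightly higher, so the figure is best read as an order-of-magnitude comparison.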
Having current and complete location data is especially important when a common understanding of place is needed. Uses for this data, like security planning for major events or responding to natural disasters, come to mind as I think about headlines from 2016 – the Rio Olympics and Hurricane Matthew. In both situations, current map data was lacking, which created the need to quickly update maps to inform decision makers and those on the ground. OpenStreetMap and platforms like Tomnod use crowdsourcing to help improve maps by enlisting volunteers. Human mappers can quickly learn to tag observable objects such as damaged infrastructure or draw vectors – points, lines, and polygons – to represent map features including roads and buildings. In crowdsourcing campaigns, thousands, sometimes millions, of contributors lend their time and mapping skills.
Without question, crowdsourcing has a significant impact in creating map data. However, crowdsourcing has limitations: someone must first identify the data gaps and assemble the crowd, and it is then difficult to scale data production while ensuring consistency, speed, and accuracy. Advancements in machine learning, specifically computer vision, show promise in using automation to extract features from the massive amount of imagery collected every day. In fact, much of the crowdsourced map data can also serve as training data for machine learning.
One machine learning approach is “deep learning,” which includes convolutional neural networks (CNNs). Training deep learning algorithms is accelerated by GPUs. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has demonstrated how deep learning techniques improve speed and accuracy for identifying objects in photos over classical machine learning algorithms. There is still much room for improvement in applying such algorithms to satellite imagery – especially given its unique characteristics, such as multi-spectral bands. These bands provide additional data from outside the visible light spectrum – data that everyday photos lack. We’re eager to see how this data can be used to improve algorithm performance, so we’ve included 8-band multi-spectral data in the SpaceNet corpus for you to explore.
We look forward to seeing the innovative algorithms that will be developed through The SpaceNet Challenge. Following this challenge, we will share the source code for the winning algorithms on GitHub for the community to learn from, use, and improve. We hope this will be a spark that enables more innovative applications of machine learning using geospatial data. These applications have the potential to impact not only how we create maps in the future but, more broadly, how we understand and interact with the world around us.