Feeling a little overwhelmed? Start here for everything you need to know about getting started with UN Datathon 2023.
You can also find more general information on the UN Datathon Essentials page.
Welcome to UN Datathon 2023!
During the Datathon, participants will be offered data, tools and services from our four technology partners: Amazon Web Services (AWS), Esri, NetApp Ocean for Apache Spark, and Oblivious.ai. This page briefly summarizes the resources available from our partners, such as Amazon SageMaker, ArcGIS Living Atlas, access to and processing of AIS data in Jupyter notebooks, and the tools you will use during the Privacy Enhancing Technologies (PETs) Challenge track. Each partner has also generously curated more detailed documentation on these tools and services, which you can find in the left-hand sidebar. If you see a ">", you can click on it to expand more pages below.
In addition to the technical resources, you will find vital information about the UN Datathon Theme, which your team's solution needs to align with, and other important general information about the Datathon. The Submission Guide will steer you through the process of submitting your projects, and the frequently asked questions can provide clarity on common queries. We also invite you to join the Discord server to interact live with our UN Datathon mentors, connect with other participants, and get tech support. If you are stuck during the Datathon, the mentors are there to help you get unblocked.
In addition, you have easy access to recordings of all the training webinars and office hours provided by our partners, ensuring that you're well-equipped with the knowledge and support needed to excel in the competition. Finally, to ignite your creativity and problem-solving skills, we've compiled a list of suggested datasets that align with the UN Datathon Theme. Your journey starts here, and we look forward to witnessing the innovative solutions you'll bring to life during this event.
Credentials / How to log on
You can use the following entry points to connect to the platforms from our Technical partners during the Datathon:
You should have received your credentials at the email address you used to register for the Datathon. If you have not received them yet, please reach out to us on Discord with your Team name and email address.
We have created accounts based on the email addresses for all those who registered. At your earliest possible convenience, please navigate to the Login screen and click on “Forgot Password?” to reset your password. If you need to register for an account, please click here and fill in your credentials. Please enter your Team Name exactly so it matches for all participants.
If you would like to work with AIS Data on NetApp Ocean for Apache Spark, please request your credentials as soon as possible using this form. We will create accounts for those who register and you will need to reset your password in the NetApp system here. We also maintain a simple Jupyter notebook system that you may use, accessible here.
- The PETs track main competition is hosted here: UN Datathon PETs Track Competition
- All registered team members will receive an email from Antigranular with their sign-in link.
- Team members should check that all their details are correct. Without this you will not have access to the secure enclave environment.
- Please check your junk email folder. If you have not received this email by the end of Thursday, please email firstname.lastname@example.org or email@example.com
- The team will also be available through Discord for any technical issues and queries.
- You can see all the steps for connecting to the server in this short video. We are listing them here for your convenience:
- After logging in to the Antigranular platform, go to the datasets section to access the PETs track dataset.
- In your Python code or notebook, run the "!pip install antigranular" command to install the private Python library.
- Copy your session details from the UN Datathon FAO dataset page on the Antigranular website.
- Paste the session details into your code to access the dataset from the secure enclave.
- Further tutorials for working with the dataset can be found in the Oblivious office hours videos.
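In a notebook, the connection steps above might look like the following. This is a minimal sketch, not the official setup: it assumes the antigranular package exposes login(client_id, client_secret, competition=...), which is the shape of the snippet you normally copy with your session details. The helper connect_to_enclave and its placeholder check are hypothetical. If the line in your session details differs, use that line instead.

```python
# Run once in a notebook cell first:  !pip install antigranular


def connect_to_enclave(client_id: str, client_secret: str, competition: str):
    """Open a session with the Antigranular secure enclave.

    All three arguments come from the session details you copy from the
    dataset page; nothing in this sketch is a real credential.
    """
    for name, value in [
        ("client_id", client_id),
        ("client_secret", client_secret),
        ("competition", competition),
    ]:
        # Catch empty values and untouched "<...>" placeholders early,
        # before any network call is attempted.
        if not value or value.startswith("<"):
            raise ValueError(f"{name} is missing: paste it from your session details")

    # Imported lazily so the check above works even before the package is installed.
    import antigranular as ag

    # Assumed call shape; prefer the exact line given in your session details.
    return ag.login(client_id, client_secret, competition=competition)
```

Calling connect_to_enclave("<client_id>", "<client_secret>", "<competition name>") raises a ValueError until the placeholders are replaced with your own session details.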
AWS S3 bucket credentials
Each team leader will receive an email with a set of AWS credentials that can be used from the NetApp system or the Jupyter environment to store data. Your S3 buckets are only accessible from within the private network inside the clusters, so do not be frustrated that you cannot take data out of the system. The S3 buckets are mainly provisioned to store the results of AIS-related computations while you are working within the AIS system.
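Storing a small intermediate result in your team bucket could be sketched as below. This assumes boto3 is installed in the notebook environment and that the code runs inside the cluster. The helper names make_s3_client and save_text are hypothetical, as are the bucket name and object key in the usage note. The credentials are the ones from the team leader's email.

```python
def make_s3_client(access_key_id: str, secret_access_key: str):
    """Build an S3 client from the credentials emailed to your team leader."""
    import boto3  # imported lazily; assumed to be available in the environment

    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )


def save_text(client, bucket: str, key: str, text: str) -> str:
    """Write a small text result (for example a CSV summary) to the bucket.

    Returns the s3:// URI of the stored object.
    """
    client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return f"s3://{bucket}/{key}"
```

For example, client = make_s3_client("<access key id>", "<secret access key>") followed by save_text(client, "<your-team-bucket>", "results/summary.csv", "day,count\n") would store a one-line CSV. Large Spark outputs are better written directly from Spark than routed through the driver this way.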
Publicly Available Data Sets
The only requirement for the datasets that you use for UN Datathon 2023 is that they must be publicly and openly available, free of charge, for the duration of the UN Datathon.
Please click here for a sample list of available datasets.
Technical Tools and Resources
The UN Datathon and its partners are delighted to offer you a range of powerful tools and resources to enhance your project submissions, unlock your full potential, and turn your innovative ideas into reality. These resources have been graciously made available to all UN Datathon participants completely free of charge. Whether you're a novice or an expert, these tools are entirely optional, allowing you the freedom to work with the tools you are most comfortable with.
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud that includes infrastructure as a service (IaaS) and platform as a service (PaaS) offerings. AWS services offer scalable solutions for compute, storage, databases, analytics, and more. During the UN Datathon 2023, you will have access to the datasets available on the Registry of Open Data on AWS and many AWS services such as Amazon Athena, Amazon EMR, Amazon QuickSight, Amazon SageMaker, Amazon Simple Storage Service (Amazon S3), AWS Glue, and AWS Identity and Access Management (AWS IAM).
The Registry of Open Data on AWS offers over 450 datasets including datasets from Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative. The registry also provides usage examples for datasets listed in the registry. You can explore the catalog here.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.
Amazon QuickSight is a fast, cloud-powered business intelligence service that delivers insights to everyone in your organization. As a fully managed service, Amazon QuickSight lets you easily create and publish interactive dashboards that include machine learning (ML) insights.
Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don't have to manage servers.
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile applications.
AWS Glue is a fully managed ETL (extract, transform, and load) AWS service. One of its key abilities is to analyze and categorize data. You can use AWS Glue crawlers to automatically infer database and table schema from your data in Amazon S3 and store the associated metadata in the AWS Glue Data Catalog.
With AWS Identity and Access Management (IAM), you can specify who or what can access services and resources in AWS, centrally manage fine-grained permissions, and analyze access to refine permissions across AWS.
Accessing AWS Services
For more detailed documentation on AWS services, check the Amazon Web Services (AWS) section of the wiki.
Esri is the global market leader in geographic information system (GIS) software, location intelligence, and mapping. For the UN Datathon, you have access to the ArcGIS system which includes ready-to-use spatial data, analysis and visualization tools, and web apps to share interactive outputs from your Datathon project. You'll also use an ArcGIS Survey123 form to submit your final entry to the judges.
The ArcGIS system can ingest data from web and local sources - documentation here on how to import your own data from a URL, file upload, or cloud location (such as AWS).
You can also use data in your project from the ArcGIS Living Atlas of the World. ArcGIS Living Atlas of the World is the foremost collection of geographic information from around the globe. It includes maps, apps, and data layers to support your work. Search for data by key word, or filter based on data type, topic, or geographic location. The topics and data types include administrative boundaries, environmental data, imagery, live data feeds, and much more.
ArcGIS Online Map Viewer
The ArcGIS Online Map Viewer enables users to perform analysis, visualize data and build compelling interactive maps, all from a web browser.
ArcGIS Online Web Apps
ArcGIS Online has a number of tools to create interactive web apps from your maps - descriptions of the tools and documentation to get started with each can be found here.
Here is a gallery of example dashboards, one of the app types you can configure in ArcGIS Online.
ArcGIS Pro is a full-featured professional desktop GIS application from Esri. With ArcGIS Pro, you can explore, visualize, and analyze data; create 2D maps and 3D scenes; and share your work to ArcGIS Online. Esri's desktop software is available for you to download once you are logged into ArcGIS Online. Follow the instructions here to download the software.
Introduction to ArcGIS Pro documentation is found here - there are additional sections of documentation on different aspects of the software that expand on the left side of the page. Given the time constraints of the UN Datathon, participants without previous experience using ArcGIS Pro may want to do their work in ArcGIS Online or ArcGIS Notebooks.
ArcGIS Notebooks provide a Jupyter notebook experience optimized for spatial analysis. Combine industry-leading spatial analysis algorithms with open-source Python libraries to build precise spatial data science models. Reduce time spent managing dependencies across data science ecosystems, and increase cross-team collaboration and transparency. Convey results with beautiful, interactive maps and apps for data storytelling that drives insight and action. Documentation to get started can be found here.
How to Access
Once you log in to ArcGIS Online you will see a tab at the top titled Notebook. If you choose to download ArcGIS Pro you can also run ArcGIS Notebooks within ArcGIS Pro.
ArcGIS StoryMaps transforms your analysis outputs into interactive content that informs and inspires. It makes it easy to explain complex topics related to your knowledge and experience. Bring your existing web maps, surveys, dashboards, and external web content into a memorable digital experience to add context to your analysis and provide interactive exploration of your datathon project.
How to Access
Within ArcGIS Online select the app launcher at the top right and click ArcGIS StoryMaps.
Follow the link to the UN Datathon ArcGIS Online organization and log in using SSO (single sign-on) with the account you have created for the Datathon, or create a new ArcGIS Online account. Access to ArcGIS Online will be limited to the duration of the Datathon, but StoryMaps created by teams and shared publicly will remain available for a year after the event if teams would like to use them in their portfolios.
For more detailed documentation on Esri services, check the Esri section of the wiki.
AIS Data on NetApp Ocean for Apache Spark
NetApp has a portfolio of leading data, application and storage solutions to help organizations manage applications and data across hybrid multi-cloud environments. During the UN Datathon you will have access to one of our CloudOps services called Ocean for Apache Spark. This service provides a heavily optimized serverless Spark-on-Kubernetes running in your AWS account.
There are several ways to access the platform and the data.
Option 1. Access through the UN Global Platform's Hackathon Notebook environment.
To launch a notebook, navigate to https://hackathon-notebooks.officialstatstics.org and log in using your UNGP credentials. Select the option "NetApp: Ocean for Apache Spark" and click "start". This creates your workspace, which may take 5-10 minutes during peak usage, although we will try to keep startup times well below that. Once your environment has launched, you should see a standard JupyterLab environment. Each notebook you create runs in a separate server-like environment with its own fixed CPU and memory allocation, so be judicious about how many you open.
When you're done, please run "spark.stop()" in your notebook context to stop the backing server infrastructure in each notebook.
AIS data. All AIS data are available inside the cluster in an S3 bucket: "ungp-ais-data-historical-backup/exact-earth-data/transformed/prod/"
This bucket is only accessible from within the private network on the UN Global Platform.
The total volume of S3 data exceeds 15 TB. To analyze this properly, you will need to use Spark. If you attempt to load too much data into the memory of the Spark driver, it will fail unceremoniously and you will need to restart your analysis. This is not a bug, but an inherent challenge of analyzing big data in a compute-limited environment.
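One way to stay within the driver's memory is to let Spark do the aggregation and bring back only a small result. The sketch below assumes the data can be read as Parquet through an s3a:// URI and that the notebook already provides a SparkSession named spark. The file format, the URI scheme, the grouping column, and the helper names are all assumptions to check against the partner documentation.

```python
AIS_PREFIX = "ungp-ais-data-historical-backup/exact-earth-data/transformed/prod/"


def ais_uri(scheme: str = "s3a") -> str:
    """Full URI of the AIS data; the s3a scheme is an assumption."""
    return f"{scheme}://{AIS_PREFIX}"


def count_by(spark, uri: str, column: str):
    """Count rows per value of `column`, computed on the executors.

    No rows are pulled into the driver here: the result is still a
    distributed DataFrame.
    """
    df = spark.read.parquet(uri)  # assumed format; the read itself is lazy
    return df.groupBy(column).count()


def preview(df, n: int = 20):
    """Bring at most `n` rows to the driver instead of the full dataset."""
    return df.limit(n).collect()
```

In a notebook this might be used as counts = count_by(spark, ais_uri(), "<column name>") followed by preview(counts). Calling collect() or toPandas() on the unaggregated data is the pattern that exhausts driver memory. Remember to run spark.stop() when you are done.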
Jupyter Lab integrated with Ocean for Apache Spark
VS Code IDE integrated with Ocean for Apache Spark
For more detailed documentation on NetApp services, check the AIS Data on NetApp Ocean for Apache Spark section of the wiki.
Please check back soon, as additional documentation for the Privacy Enhancing Techniques (PET) Challenge track will be added. For now, check out the documentation below.
In recognition of the need to better understand the impact of shocks on agricultural livelihoods in food crisis contexts, the Food and Agriculture Organization of the United Nations (FAO) established the Data in Emergencies Information System. Driven by regularly collected primary data in food crisis countries, its objective is to inform decision-making in support of agricultural livelihoods in fragile and shock-prone environments. Since the launch of the DIEM Hub in June 2020, DIEM surveys have been completed in over 30 countries reaching approximately 150 000 households per year. At the center of the DIEM Information System is the DIEM-Monitoring System which performs regular, standardized and frequent household surveys.
FAO Data in Emergencies Hub page on the Datathon Wiki
For more detailed documentation on the PET challenge, check the Privacy Enhancing Techniques (PET) Challenge section of the wiki.
The StoryMaps and Survey123 tools are available for you to use for your submission.
See Submission Guide for more information.
We also have a wealth of information contained in our UN Datathon Webinars which were organized by us and our partners.
See Webinars for more information.