New York, NY — Professionals in data science and related fields are converging at Google headquarters this weekend to volunteer for the DataKind NYC DataDive. They are using machine learning and other techniques to develop solutions for three organizations that help communities protect human rights and the environment.
DataKind is a global nonprofit organization that is "harnessing the power of data science in the service of humanity." Their DataDive is a weekend-long event where teams of data science volunteers work with nonprofit organizations to develop solutions that help them achieve their missions.
In this event, they are exploring how to use machine learning to track the real-time status of claims about harm to individuals or communities caused by internationally-financed projects, and how to automate processes to help organizations learn about new international development projects more quickly.
We will be there Friday evening through Sunday afternoon to interview participants and give updates. Elsevier has 10 volunteers, and the Elsevier Foundation is a sponsor along with the 11th Hour Project, American Airlines, Google Cloud and Teradata.
We will be posting updates here throughout the event. In addition, you can follow us on social media:
- On Instagram, follow @ElsevierConnect and search for #ElsevierLife.
- On Twitter, follow @ElsevierConnect and use the hashtag #DataDive.
5 lessons from my first DataDive
July 30, 2018
Check out my event recap: You too can be a sexy data unicorn — and other lessons from my first DataDive
And now for the results ...
Sunday, June 24, 12:30 pm
At the end of the DataDive, Project Champions from the three organizations presented the results of the projects. Each team accomplished what they set out to do, with more work needed to optimize their systems.
International Accountability Project: Developing project alerts
For IAP, Preksha Krishna Kumar led off by summing up a problem faced by countries around the world:
The problem is, banks, governments, big institutions, companies propose projects without the input of the local communities, and without the communities knowing that these projects exist.
To make sure communities are given notice of this development, IAP has an Early Warning System. This system collects information about proposed projects into a searchable database by scraping project data from the 13 major development finance institutions. IAP and its partners send summarized information and materials for a community-led response to those nearest the proposed projects before funding is approved.
IAP has a database of 7,000+ development projects ranging from those recently proposed to those that have been completed. Getting up-to-date information on these development projects is very difficult, and IAP would like to facilitate this process by using information generated from mainstream news sources. They can provide communities more timely information regarding development projects by linking projects to news articles, but this is an impossible manual task given the number of projects and the volume of information.
Additionally, IAP wants to add metadata to each article (information about country, sector and bank) so they can be more efficient in providing relevant information to advocates in certain areas of the world or to those interested in certain sector topics.
This was the challenge for the DataDive.
The team created algorithms to automatically tag the articles with sector, bank and country, and developed several methods for matching projects to articles by ranking the most likely matches. More development will be needed, but IAP reps said this was a fantastic start to developing a project matching algorithm.
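For readers curious what tagging and ranking can look like in code, here is a minimal sketch in Python. It is not the team's algorithm: the keyword lists, the sample article, the project IDs and the function names `tag_article` and `rank_projects` are all made up for illustration, and the ranking uses plain word overlap (Jaccard similarity), which is only a simple baseline.

```python
import re

# Hypothetical keyword lists; a real tagger would need far fuller vocabularies.
SECTOR_KEYWORDS = {
    "energy": {"dam", "hydropower", "pipeline", "power"},
    "transport": {"highway", "road", "railway", "port"},
}
COUNTRIES = {"kenya", "uganda"}
BANKS = {"world bank", "asian development bank"}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_article(text):
    """Tag an article with sectors, countries and banks found by keyword lookup."""
    lowered = text.lower()
    tokens = tokenize(text)
    return {
        "sector": sorted(s for s, words in SECTOR_KEYWORDS.items() if tokens & words),
        "country": sorted(c for c in COUNTRIES if c in tokens),
        "bank": sorted(b for b in BANKS if b in lowered),
    }


def rank_projects(article_text, projects):
    """Rank project IDs by Jaccard overlap between article and description words."""
    article_tokens = tokenize(article_text)
    scored = []
    for project_id, description in projects.items():
        project_tokens = tokenize(description)
        union = article_tokens | project_tokens
        score = len(article_tokens & project_tokens) / len(union) if union else 0.0
        scored.append((score, project_id))
    # Highest score first; ties are broken by project ID for a stable order.
    return [pid for score, pid in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Made-up article and project descriptions, for illustration only.
article = "The World Bank is weighing a loan for a hydropower dam in Uganda."
projects = {
    "P1": "Hydropower dam construction in Uganda",
    "P2": "Highway upgrade in Kenya",
}
tags = tag_article(article)
ranking = rank_projects(article, projects)
```

Word overlap misses articles that describe a project in different vocabulary than the bank uses, which is one reason a first pass like this would need further development.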
Inclusive Development International: Follow the money — web-scraping to find financial pressure points
The challenge, as this team put it: "the gift of automation":
- Manually searching the databases is time-consuming and labor-intensive.
- Limits how many people IDI can help pursue accountability.
- We were a motley crew of 25 volunteers: students, R developers, scientists, data scientists and professional web scrapers.
- Goal: Write 10-14 web scrapers of funding databases and present them to IDI in a ‘Flask’ front-end.
They accomplished their goal and created a Research Scraping Tool.
And the results? IDI is much better equipped to uncover financial links between development banks and harmful projects. "We're going to help more people — thousands of more people — pursue accountability and justice," said IDI Research and Communications Director Dustin Roasa. He added that they plan to use the system immediately for cases in Kenya and Uganda.
As for their biggest “ah-ha” moment: "Each database has quirks, but the code can conquer those quirks."
Dustin said he was impressed with the collaboration. "When people finished early, they moved on to helping other people in the group."
Accountability Counsel: Monitoring and understanding local complaints about international development projects
The challenge: "AC had a lot of websites to scrape with scrapers that frequently broke, plus a bunch of complaint data that had not been thoroughly analyzed."
Srinivas Avireddy, a software development engineer at Amazon and one of three Data Ambassadors for this project, described his team's key achievements:
- We were presented with the process of automating data collection from 8 Independent Accountability Mechanism (IAM) websites. 10 volunteers worked on 5 websites, and we successfully automated the process for 5 IAM websites.
- The time to automate the data collection was reduced by at least 50% for each of these websites.
- Another key problem was to automate the process of finding the updates made to the complaints. This would help Accountability Counsel move away from manual monitoring of individual complaints and have a centralized and automated process of finding updates.
- Data analysis was done for 2 datasets (historical complaints data and benchmarks data). Predictive modelling using R and Python was done to understand the reasons complaints are deemed ineligible (33% of complaints are ineligible). Several interesting plots (decision trees, time series, posterior probability estimation) were produced.
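One common way to find updates without checking each complaint by hand is to compare the latest scrape against the previous one. The sketch below is an assumption about how that could work, not a description of the team's code: the record fields, the complaint IDs and the function names `fingerprint` and `find_updates` are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable SHA-256 digest of a complaint record's fields."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_updates(previous, current):
    """Compare two snapshots keyed by complaint ID and classify the changes."""
    changes = {"new": [], "updated": [], "removed": []}
    for complaint_id, record in current.items():
        if complaint_id not in previous:
            changes["new"].append(complaint_id)
        elif fingerprint(record) != fingerprint(previous[complaint_id]):
            changes["updated"].append(complaint_id)
    changes["removed"] = [cid for cid in previous if cid not in current]
    for ids in changes.values():
        ids.sort()
    return changes


# Made-up snapshots of scraped complaint records, for illustration only.
last_week = {
    "C-001": {"status": "Eligibility review", "country": "Kenya"},
    "C-002": {"status": "Closed", "country": "Uganda"},
}
this_week = {
    "C-001": {"status": "Compliance review", "country": "Kenya"},
    "C-002": {"status": "Closed", "country": "Uganda"},
    "C-003": {"status": "Filed", "country": "Kenya"},
}
updates = find_updates(last_week, this_week)
```

Hashing each record means a monitoring job only has to store one short digest per complaint between runs, rather than the full scraped page.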
Day 3: putting on the finishing touches
Sunday, June 24, 11:15 am
With Ed Sheeran piped in on the PA, participants are putting the finishing touches on their projects. Soon they will prepare their presentations. Here, Emily Yelverton, a project manager for DataKind, collected data from her teammates on the Accountability Counsel project so she could prepare their presentation.
PS: This was breakfast:
"We all need to step up our data analysis skills"
Saturday, June 23, 3:45 pm
Jonathan Zimmerman has been using his analytical skills for years at Elsevier, currently as Associate Director of Customer Insights. But the work he's been doing at the DataDive is different. And that's one reason he's here. "It's about seeing if I could expand my horizons in data analysis," he said. "At Elsevier, we're really dealing with survey data from customers, and this is a whole different thing.
"With Elsevier becoming more and more focused on data analysis, we all need to step up our data analysis skills," he said. "Now, when I walk through the office, every single person is looking at data. That wasn't the case when I started working here."
Jhonny Almeida, a master's student in pure and applied mathematics at Montclair State University in New Jersey, compared results with Jonathan. Working on the Accountability Counsel project, they were trying to figure out which factors lead to international development complaints being deemed eligible or ineligible. Working separately, they got similar results.
"Sexy data unicorn"? Well, not quite
Saturday, June 23, 12:25 pm
Dr. Jake Porway knows a thing or two about data analytics. The founder and Executive Director of DataKind studied computer science at Columbia and got a PhD in statistics at UCLA. So when he told the teams how impressed he was with their midday progress, he was serious, even though most people were still at the data-preparation stage.
"Any good data science project is mostly data cleanup," he said. "Everyone thinks they're going to be the sexy data unicorn, but then you just discover you're the data janitor."
It's like you need to build a house, but first you have to go shopping at the Home Depot with the lights off. You don't know what's in the data until you upload it onto your computer and start analyzing it.
Making sure complaints are heard
Saturday, June 23, 12:10 pm
Cherisse Thomas just completed her MBA in data analytics. Here, she's working alongside Dr. William Gunn, Elsevier's Director of Scholarly Communication, on the Accountability Counsel project, which wants to find a better way to monitor complaints against international development to make sure the communities that are harmed can get recourse.
"I'm working on data visualizing to explain and tell the story of what I'm finding with the data," Cherisse said, during a short break for lunch. She showed the charts she created on the types of complaints people are filing for international development, which run the gamut from labor to corruption, fraud and displacement, though most are classified simply as "other." Now she's trying to visualize by country. But already she's uncovered a disturbing trend:
The vast majority are closed without results. That just shows you that a lot of issues are going unresolved — and that's why we're here.
Meanwhile, William is using the open-source statistical program R to figure out which factors are likely to make a complaint eligible or ineligible. So far, he found that the organization it's presented to and the nation it comes from tend to be significant. The type of issue also affects its eligibility. "Issues having to do with biodiversity or indigenous people's rights seem to go through at a higher rate," he said, adding that there is also a high volume of those complaints. Meanwhile, of the 41 "procurement" complaints filed, none have gone through.
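A first look at a question like this often starts with a simple tally of how many complaints in each category were found eligible. William worked in R; the sketch below uses Python instead, and the records, the field names and the function `eligibility_rates` are invented for illustration rather than taken from the Accountability Counsel data.

```python
from collections import defaultdict


def eligibility_rates(complaints, factor):
    """Return {factor value: (eligible count, total count, eligible share)}."""
    counts = defaultdict(lambda: [0, 0])
    for complaint in complaints:
        bucket = counts[complaint[factor]]
        bucket[0] += 1 if complaint["eligible"] else 0
        bucket[1] += 1
    return {value: (e, n, e / n) for value, (e, n) in counts.items()}


# Toy records, not real complaint data.
toy_complaints = [
    {"issue": "biodiversity", "eligible": True},
    {"issue": "biodiversity", "eligible": True},
    {"issue": "biodiversity", "eligible": False},
    {"issue": "procurement", "eligible": False},
    {"issue": "procurement", "eligible": False},
]
rates = eligibility_rates(toy_complaints, "issue")
```

The same function can be pointed at any other field, such as country or the organization the complaint was presented to. A tally like this only describes the data; deciding whether a difference is significant takes the kind of modelling the volunteers did.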
Here's a chart their team created to show the complaint lifecycle.
Update: William Gunn, who worked through lunch, was just seen in the break room eating a maple glazed donut topped with bacon.
Seeking data scientists!
Alan Krull is an important person to meet at the DataDive. He's a talent acquisition manager for technology at Elsevier, and he's scouting data scientists to work in our New York, Philadelphia and London offices.
The real work begins
Saturday, June 23, 10 am
A hush has fallen over the room except for quiet conversation and the tapping of keys. More soon.
3 projects for inclusive development
Friday, June 22, 7 pm
Leaders from each of the organizations talked about their work before outlining the project volunteers could take part in and the skills that would be needed. They spoke passionately about the hardships that can result when international development takes place in communities without their input, displacing people from their homes and ruining their livelihoods. And they sought volunteers with a range of technical and non-technical skills.
In a world where data science and technology are increasingly being eyed with suspicion, here's a chance to use our expertise for good. As Dr. Jake Porway, Founder and Executive Director of DataKind, put it:
We're not really working with data; we're working with people.
Making sure communities can take part in development that affects them
The International Accountability Project (IAP) wants communities to play a key role in the development process. "If a community is not involved in a development project, it will likely cause harm," said Ryan Schlief, Executive Director of IAP. "It could be put in a place that could hurt their work or their livelihoods."
His organization is tracking all development projects at 13 major banks. Now they want to develop an automated process to match news articles to projects. This would help them keep track of the impact of these projects in real time. "Our objective is to get the information about the development projects to the people in the communities nearest to the project," Ryan said.
A Data Ambassador pointed out the need for semantic matching, since the language used in news stories is often different than that used by banks and official project reports. Skills needed include basic coding in Python, basic modelling skills and machine learning techniques.
"Follow the money"
The mission of Inclusive Development International (IDI) is to ensure international development considers the human rights and environmental impact on communities. To do that, they "follow the money" between harmful projects and the institutions that finance them so complaints can be filed.
"We get requests from people who are harmed by the projects, people who have had their homes ruined, their farms, their crops," said Research and Communications Director Dustin Roasa. His organization tracks who is financing and benefiting from the development.
Right now, they manually collect search results from 14 online databases. "We are swimming in data, and we are understaffed, so we are humbled that you are all here today to help us," Dustin said. "Ultimately this benefits people."
The challenge is to write a suite of web scrapers to automate this process. They are seeking volunteers with familiarity with web data and APIs who use the Python web-scraping toolkit and pandas to munge results. They are also looking for non-technical skills, including research, testing, and documentation to create a user manual.
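At its simplest, a web scraper turns a page of search results into structured records. The sketch below shows the idea using only Python's standard library so it runs as is; the volunteers were asked for scraping packages and pandas, which this example deliberately swaps out. The sample HTML, the column names and the names `TableScraper` and `scrape_table` are made up.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Parse table rows in html into a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up search results page, for illustration only.
SAMPLE = """
<table>
  <tr><th>Project</th><th>Financier</th></tr>
  <tr><td>Example Dam</td><td>Example Bank</td></tr>
  <tr><td>Example Road</td><td>Another Bank</td></tr>
</table>
"""
records = scrape_table(SAMPLE)
```

Real funding databases differ in layout from one to the next, so in practice each one would likely need its own parsing rules on top of a shared core like this.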
Monitoring complaints against international development projects
Accountability Counsel helps communities hold corporations and institutions accountable for harmful effects of international development projects. The goal of their DataDive project is to monitor and understand local complaints about international development projects.
"Our focus is on making sure the communities that are harmed have recourse," said Research Director Samer Araabi.
The project has three parts:
- Standardizing the process of collecting complaints
- Finding a way to get updates on complaints
- Analyzing the complaints (to figure out what makes certain complaints more effective than others, for example)
They need people experienced with Python and its scraping packages. They also need people familiar with data exploration, visualization and predictive modeling "whether you're a d3.js wizard or just getting your feet wet in R."
Beyond that, they're seeking people with analytical skills. "If Python or scraping is not your thing, never fear — there is plenty you can do," said Emily Yelverton, a project manager for DataKind.
Knowing the skills being sought helped participants choose their teams.
"I'm looking at where I can make the biggest impact," said Dr. William Gunn, Director of Scholarly Communications at Elsevier. He said his programming experience and analytical skills seemed best suited to the Accountability Counsel's project, adding that he could offer his data visualization skills using Gephi or Neo4j.
Getting to know each other
Friday, June 22, 6:30 pm
Prepping for opening presentations
Friday, June 22, 6 pm
DataKind reps huddle with Project Champions, who will give presentations about their NGOs to the volunteers.
Meet the Elsevier volunteers
- Mike Carroll, Software Engineer (Sacramento area)
- James Chang, Senior Business Analyst, Engineering & Technology Products (NYC)
- Jessica Cox, Data Scientist, Elsevier Labs (NYC)
- Ale De Vries, Product Director, Platform Integration (NYC)
- William Gunn, PhD, Director of Scholarly Communications, Global Communications (San Francisco Bay area)
- Rebecca Poch, Software Engineer, IT Services (NYC)
- Raghavendra Ponnam, Data Engineer, IT Services (NYC)
- Wen Zhang, Senior Web Analyst (NYC)
- Jonathan Zimmerman, Associate Director, Customer Insights (NYC)
Alan Krull, Talent Acquisition Manager - Technology, will be there scouting talent for our technology teams in Philadelphia, New York, and Cambridge, Massachusetts.
The three DataDive organizations
Volunteers will be working to help one of these organizations with data science projects:
Accountability Counsel
Accountability Counsel is a nonprofit organization committed to helping communities hold corporations and institutions accountable for harmful effects of international development projects. They provide tools and training for these communities and advocate for policy changes to make financial institutions more accessible, independent, transparent and fair. They help communities worldwide by facilitating formal requests for accountability; this includes identifying the correct accountability office for a development project, filing and submitting the complaint, and keeping track of its progress.
Inclusive Development International
The mission of Inclusive Development International (IDI) is to ensure international development considers the human rights and environmental impact on communities. One activity that supports this mission is following the money between harmful projects and Development Finance Institutions (DFIs), where complaints can be filed with associated Independent Accountability Mechanisms (IAMs).
International Accountability Project
As an international advocacy organization, International Accountability Project (IAP) seeks to advance development principles and projects that prioritize human and environmental rights by:
- reinforcing how communities participate as central figures in the development process.
- influencing the policy and practice of development, assisting with specific community-led priorities.
- supporting communities and civil society to monitor and respond to development projects.
IAP leverages community-level expertise and experience to increase community-led participation and reinforce campaigns supporting community-led development.
Sources: Accountability Counsel, Inclusive Development International and International Accountability Project