Data Umbrella PyMC 2022 Open Source Report

Author: Reshama Shaikh

High Level Summary

Number of participants who:

  • Registered: 76
  • Attended: 38
  • Submitted >= 1 pull request: 24
  • Countries represented: 10

Sprint Background

The PyMC open source working sessions were organized by Data Umbrella to increase the participation of underrepresented persons in open source, Python and data science.

This report focuses on the summary, impact and lessons learned of the Data Umbrella PyMC Open Source Working Sessions.

Pre-Series Office Hours

Photo not available.

Session 1

Session 2

Session 3

Post-Series Office Hours

Event Sponsors

This event was supported by:

This is a 3-minute video by Mariatta Wijaya of Google with inspirational tips on contributing to open source.

Schedule of Sessions

  • 02-Jul-2022: Pre-series Office Hours (13-14:00 UTC) (1 hr)
  • 09-Jul-2022: Session #1 (13-16:00 UTC) (3 hrs)
  • 22-Jul-2022: Session #2 (16-19:00 UTC) (3 hrs)
  • 4/5-Aug-2022: Session #3 (23-2:00 UTC) (3 hrs)
  • 18-Aug-2022: Post-series Office Hours (23-24:00 UTC) (1 hr)
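The schedule arithmetic can be checked with a short sketch. It assumes only the dates, start times and durations listed above; the names `SCHEDULE` and `session_window` are illustrative and do not come from the event materials. It shows why Session #3 is listed as 4/5-Aug (it starts at 23:00 UTC and ends after midnight) and how many hours the series covered in total.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# (label, start time in UTC, duration in hours), copied from the schedule above
SCHEDULE = [
    ("Pre-series Office Hours", datetime(2022, 7, 2, 13, tzinfo=UTC), 1),
    ("Session #1", datetime(2022, 7, 9, 13, tzinfo=UTC), 3),
    ("Session #2", datetime(2022, 7, 22, 16, tzinfo=UTC), 3),
    ("Session #3", datetime(2022, 8, 4, 23, tzinfo=UTC), 3),
    ("Post-series Office Hours", datetime(2022, 8, 18, 23, tzinfo=UTC), 1),
]


def session_window(start, hours):
    """Return the (start, end) datetimes of a session."""
    return start, start + timedelta(hours=hours)


# Total scheduled time across all five events
total_hours = sum(hours for _, _, hours in SCHEDULE)

# Days between the start of the first and the start of the last event
span_days = (SCHEDULE[-1][1] - SCHEDULE[0][1]).days

for label, start, hours in SCHEDULE:
    begin, end = session_window(start, hours)
    print(f"{label}: {begin.isoformat()} to {end.isoformat()}")

print("Total scheduled hours:", total_hours)
print("Days from first to last event:", span_days)
```

The series spans 47 days from the first to the last event, which is consistent with the roughly 7-week period mentioned in the survey section.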

Number of Attendees

Session                     Data Umbrella Organizers   PyMC Mentors   Community Contributors   Note
Pre-series Office Hours     3                          2              24
Session #1                  3                          4              20
Session #2                  3                          4              12
Session #3                  1                          4              6                        Asia-Pacific (a)
Post-series Office Hours    1                          3              4                        Asia-Pacific (a)

(a) Session 3 and post-series office hours were for Asia-Pacific time zone.
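As a rough illustration, the attendance figures above can be totalled per session and turned into a contributors-per-mentor ratio. This is a sketch using only the counts in the table; the names `ATTENDANCE`, `totals` and `contributors_per_mentor` are illustrative. Because the same person may have joined several sessions, these are per-session head counts and should not be summed into a count of unique participants.

```python
# Per-session counts from the table above:
# (Data Umbrella organizers, PyMC mentors, community contributors)
ATTENDANCE = {
    "Pre-series Office Hours": (3, 2, 24),
    "Session #1": (3, 4, 20),
    "Session #2": (3, 4, 12),
    "Session #3": (1, 4, 6),
    "Post-series Office Hours": (1, 3, 4),
}

# Total head count per session (organizers + mentors + contributors)
totals = {session: sum(counts) for session, counts in ATTENDANCE.items()}

# Number of community contributors per PyMC mentor in each session
contributors_per_mentor = {
    session: contributors / mentors
    for session, (_, mentors, contributors) in ATTENDANCE.items()
}

for session in ATTENDANCE:
    print(
        f"{session}: {totals[session]} people, "
        f"{contributors_per_mentor[session]:.1f} contributors per mentor"
    )
```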

Contributions Statistics

The contributions made during the working sessions were tracked in this PyMC OS-WS spreadsheet. Contributions included both submitting pull requests and opening issues where problems were observed.

We worked on a few different repositories for the PyMC project:

  1. video-timestamps: this is a beginner-friendly list of issues where contributors watch a video from the PyMCon 2020 conference and add timestamps
  2. pymc-data-umbrella: this is the event website. Contributors could submit PRs to fix typos or clarify the contributing guide, as well as add their information to the list of participants
  3. pymc-dev/pymc: this is the main code repository for PyMC
  4. pymc-dev/pymc-examples: this is the repo that holds notebook examples for PyMC

As of the date of this report (28-Aug-2022), these are the PR stats:

  • Open: 2
  • Merged: 56
  • Issues opened: 6

Timestamps

Timestamps were added for 16 videos.

Event website

A number of PRs were submitted to update contributor information.

Updating Jupyter Notebooks

This was a more intermediate task for new contributors: updating notebooks so that they contain consistent information for Sphinx rendering.

PyMC documentation

These contributions were in the main code repository.

Demographics

Of the 76 people who registered, 38 attended. Of the 38 who attended, 24 submitted a pull request. This funnel graph shows the breakdown by gender.

A total of 38 contributors attended the sprint. 14 of 38 (37%) identified as she/her. 24 of 38 (63%) identified as he/him.
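The funnel and the gender percentages can be reproduced with a few lines of arithmetic. This is a minimal sketch; the registration count is taken from the High Level Summary, the other figures from this section, and the helper name `pct` is illustrative.

```python
# Figures reported in this document
registered = 76      # from the High Level Summary
attended = 38
submitted_pr = 24
she_her = 14
he_him = 24


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Funnel: registration -> attendance -> pull request
attendance_rate = pct(attended, registered)
pr_rate = pct(submitted_pr, attended)

# Gender split among attendees
she_her_share = pct(she_her, attended)
he_him_share = pct(he_him, attended)

print(f"Attended, of those registered: {attendance_rate}%")
print(f"Submitted a PR, of those who attended: {pr_rate}%")
print(f"she/her: {she_her_share}%, he/him: {he_him_share}%")
```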

Contributors joined from 10 different countries. Country information was provided based on where participants were joining from.

  1. United States of America: 13
  2. India: 6
  3. Ghana: 4
  4. Kenya: 4
  5. Germany: 3
  6. United Kingdom: 2
  7. Canada: 2
  8. Brazil: 2
  9. Colombia: 1
  10. Ireland: 1

Returning Contributors

There were 3 “returning” contributors. These contributors had participated in a previous scikit-learn sprint.

Spoken Languages

The event was run in English. Participants were asked on their registration forms to indicate if they needed a translator. No translators were requested.

This bar plot shows the primary spoken languages of the sprint participants.

Impact Report for Data Umbrella PyMC Open Source Working Sessions

Non-measurable Impact

Aside from the number of PRs that were merged and issues that were opened, there is non-quantifiable impact of the open source working sessions. Some examples include:

  • learning to set up a virtual environment
  • using Git (fork, clone, branch, fetching another’s PR)
  • introduction to tests such as: flake8 (linting, formatting), pytest, “continuous integration”
  • learning about sphinx and documentation
  • learning about NumPy validation
  • navigating through the codebase structure of pymc
  • digging into functions, learning about errors
  • interacting with contributors on GitHub
  • learning, in general
  • networking, meeting people from around the world
  • building confidence (making a dent in “imposter syndrome”)
  • having fun

Finding out About the Working Sessions

For those who attended the working sessions, this is how they learned of the event. The main avenues were by invitation from Data Umbrella, Meetup, Twitter, LinkedIn and their network (“word of mouth”).

Next Steps

Explore options to continue momentum of contributions.

Sessions Feedback

Feedback has been shared in a number of ways:

  • Event survey
  • Social media (Twitter, LinkedIn)
  • Casually, in conversation during the office hours and working sessions

Survey

We received 5 responses to the survey. The primary reason the response rate was so low is that these events were spread over a 7-week period and different people attended different events.

Overall, the feedback on the surveys was positive.

In response to the question “What are your favorite parts about the sessions?”:

  • Interacting with Mr. Christian and getting to know more about the community and workings.
  • Working with other people - a lot of time spent alone when learning usually so it’s a nice change and good to be exposed to other people’s ideas
  • Meeting core PyMC team and other contributors, networking, learning to contribute to open source project

Suggestions for Improvement

In response to the question “What could have worked better at the sessions?”:

  • I had (and still have) difficulty finding certain pages and links - between pymc contributing section and dataumbrella/pymc website I get confused, since the websites look similar but have different URLs
  • Call out need to fork both pymc and pymc-examples (or whichever one you plan to contribute to)

Challenges

Challenge 1: Emails going to spam

We communicated with registrants via email and Discord. For a number of people, the emails went to spam and they missed them. We do have a reminder on the registration form to keep an eye on their spam folder, but emails were still missed.

Challenge 2: Preparing by reading

The event had a comprehensive website, and the sessions were posted on Meetup with instructions. The process (join Discord, go through the website, submit a registration form) was also described in multiple places (event website, Discord, newsletters, emails). Despite numerous reminders, a number of people did not join Discord, some joined Discord only at the start of the event (which might indicate they missed the reminders), and some participants did not submit a registration form.

It is important that participants submit a registration form for these reasons:

  • They have read and agreed to the code of conduct.
  • They understand how the event will go.
  • Many participants have anonymous Discord profiles, so this information is needed to track who is joining the event and who can be added to the private channel.
  • We need to connect participants to their GitHub pull requests.
  • We need participants’ email addresses to communicate with them about the event.

Challenge 3: Discord

Some participants had technical issues with Discord. We have a 10-minute video on how to navigate Discord, though it is not apparent that all participants watched the video.

What’s Next

We hope to maintain the momentum by holding casual monthly “study groups” to continue contributing to PyMC.

Sessions: Social Media Shares

Carlo of Brazil

Pablo of Brazil

Igor of USA

Dustin of USA

Prince of Ghana

Rowan of Tennessee, USA

Benjamin

Chris Fonnesbeck, PyMC Team Member

Zoe


Social Media Promotion

We created a social media kit for the Data Umbrella PyMC Open Source working sessions to provide content for our community partners to share.

Twitter (English)

LinkedIn (English)

LinkedIn announcement


Acknowledgments

We thank the Data Umbrella & PyMC organizers who created the website, conducted outreach, marketing and so much more!

  • Reshama Shaikh
  • Beryl Kanali
  • Sandra Meneses
  • Sandy Weng
  • Cristina Mulas Lopez
  • Christian Luhmann
  • Oriol Abril Pla
  • Thomas Wiecki

We thank the PyMC team who mentored at the sessions and those who were online during the weekend afterwards to promptly review the submitted pull requests, particularly:

  • Christian Luhmann
  • Oriol Abril Pla
  • Ravin Kumar
  • Danh Phan
  • Chris Fonnesbeck
  • Alex Andorra
  • Michael Osthege
  • Fernando Irarrázaval

Addendum

  • [no addendums or updates at the time of publication]