First Time Contributor to Open Source — Data Umbrella Scikit-learn Virtual Sprint February 2021

“Data Umbrella Africa & Middle East scikit-learn sprint”

Fortune Uwha
Feb 12, 2021
Image Credits: Data Umbrella

On February 6, 2021, I participated in an open source sprint organized by Reshama Shaikh and Mariam Haji of Data Umbrella. The sprint aimed to encourage more people from underrepresented groups in tech to contribute to the open source machine learning library scikit-learn, and more generally to widen the pool of open source contributors.

Sponsors: Special thanks to Code for Science & Society and the Gordon and Betty Moore Foundation for supporting the scikit-learn sprint.

My motivation behind taking part in the sprint

I use some form of open source software every single day. Open source software, including the scikit-learn library, plays an important role in my life, and being able to use my skills to give back and help improve the software I depend on is a huge privilege.

I also get to learn a lot from the source code. Reading through the scikit-learn codebase, which has hundreds or even thousands of contributors, teaches me a great deal about best practices and code quality.

Setting up my virtual environment

This was a struggle, but the community provided some very helpful videos and links that made setting up my virtual environment much easier.

If there is no struggle, there is no progress

— Frederick Douglass

Other helpful links: Data Umbrella Playlist

Community bonding period 🤝

During this period, all selected participants joined the Discord server to prepare for the sprint. I introduced myself and made new friends there 🤗

Pair programming works!

Each of us was assigned a programming partner, and each pair had its own channel for written communication plus an interactive channel where we could share our screens and talk to each other. The interactive channel also allowed the organizers and core developers to jump in and help when needed. Additionally, there was a “help queue” channel where participants could post questions and an interactive “help” channel where participants could talk directly to available mentors.

I had an amazing sprint partner, C Thinwa, a returning contributor from Nairobi, Kenya. Together we worked on two separate issues from the scikit-learn repository.

The programming period 👩‍💻👩‍💻

During this period, we collaborated virtually with our pair programming partners for about four hours. This was great, because two heads are definitely better than one: when the driver (the person typing) hits a snag in the code, there are two people working to solve the problem. Having a second pair of eyes also helped us avoid repeating the same coding mistakes. The issues we worked on can be found here and here.

What have I achieved? 🏆

I gained more from the scikit-learn sprint than words can express, but I’ll highlight a few things:

  • Learnt to check code for style and programming errors with flake8, and to run tests with pytest
  • My pull request got approved and merged into the main scikit-learn repository, hurray! 🎉🎉 This brought an incredible sense of satisfaction and joy; it really is an amazing feeling. You can find the merged PRs from me and my programming partner here and here.
  • Met scikit-learn core contributors Andreas Mueller, Adrin Jalali, Guillaume Lemaitre, and Olivier Grisel
  • Connected with amazing fellow contributors 💓

That is all for now. Thanks for reading about my experience all the way to the end. If you enjoyed this article, please give it a 👏 below. Thank you!

Feel free to connect with me 🙂 on LinkedIn or Twitter, or simply drop me an email at



Fortune Uwha

Senior Data Analyst @ Western Union | Data Science Writer. Mathematician with a love for data (Math + Data = Adventure 😉). Connect: