Curriculum vitae.

Education.

MSc Artificial Intelligence

University of Amsterdam

2018 - 2021

 

Courses: Machine Learning, Deep Learning, Multi-Agent Systems, NLP, NLP II, Computer Vision, Information Retrieval, Knowledge Representation, Information Visualization
Thesis: As easy as APC: overcoming missing data and class imbalance in time series with self-supervised learning.


Honors BSc Statistics

University of Toronto

2012 - 2017

 

Minors: Psychology & Italian

Courses: Statistical Theory, Machine Learning, Methods of Data Analysis, Data Analysis II, Intro to Computer Programming, Design of Scientific Studies, Data Mining, Linear Algebra, Survey Sampling and Observational Data

Associations.

Women in AI

X, the moonshot factory

Chosen to participate in the Women in AI dinner series by Google X, the moonshot factory in Mountain View and London. The goal of the Women in AI dinner series is to identify and connect women doing impactful work in AI. This network is only available to explicitly invited guests as well as employees of X and other Alphabet companies.


As a volunteer for the WiML workshop at NeurIPS 2021, I oversaw the sponsor section, making sure our sponsors (e.g. DeepMind, Microsoft, NVIDIA) were well taken care of and answering any technical questions they had.


Women in AI Ethics is a global initiative with a mission to increase recognition, representation, and empowerment of women in AI Ethics. Nominated as one of the top pioneers in the section ‘Society + Sustainability’ for promoting the use of AI for social good, through access for marginalized groups, mitigation of environmental impact, and development of public interest policies.

Publications.

Abstracts:

  • Thomas-White, K., Wever, F., Navarro, P. (2024) ‘Women Who Report Pain with Sex Have Varying Microbiome Profiles Based on Age’ accepted for poster presentation at ISSWSH 2024

  • Thomas-White, K., Wever, F., Navarro, P. (2023) ‘Incidence and Symptom Profiling of Vaginitis Containing Aerobic and Anaerobic Pathogens’ accepted for poster presentation at IDSOG 2023

  • Olmschenk, G., Thomas-White, K., Wever, F., Navarro, P. (2023) ‘Gardnerella Species Variations Show Pathogenic and Metabolic Differences’ accepted for oral presentation at IDSOG 2023

  • Shea, A., Wever, F., Denis, G., Ventola, C., Vitzthum, V. (2022) ‘Assessment of potential recall bias and the multiple meanings of “heavy” in the study of menstrual bleeding’ Abstracts. American Journal of Human Biology, 34: e23740. https://doi.org/10.1002/ajhb.23740

Invited Talks.

Women in Data Science (WiDS) New York Regional Conference 2023 - 14 April, 2023

Talk: Leveraging data to close the gender health gap

Panel: AI Futures


Experience.

Machine Learning Data Scientist

Evvy

February 2022 - Present

Evvy is building a new understanding of the female body by discovering and analyzing new biomarkers, starting with the vaginal microbiome. Our intro product is the first-ever at-home vaginal microbiome test to leverage full-genome sequencing to give anyone with a vagina groundbreaking insight into what’s up down there, why it matters, and what they can do about it.

As the first Data Scientist, I play an integral role in setting up the proper data foundations and leveraging metagenomically sequenced microbial data to answer research questions.


Machine Learning Researcher

Clue & Bill & Melinda Gates Foundation

February 2020 - December 2021

• "As easy as APC" - Master's thesis: Proposed leveraging self-supervised learning to tackle missing values and class imbalance simultaneously for multivariate time-series classification. The proposed models, GRU-APC & GRUD-APC, improved results on two real-world datasets (PhysioNet & Clue) and outperformed existing published state-of-the-art work. Tech stack used: SQL, Python, TensorFlow & AWS.

• Queried data (SQL) from the Clue database containing 15 million+ users and billions of cycles. Filtered, cleaned & analyzed data (Python) to gather insights for a study on the impact of contraceptives on changes in symptom experience.


Biomedical Research Assistant

Stanford University

June - December 2019

 

• Investigated ~50 million self-tracked menstrual cycle data points from ~2 million cycles of 170k women, to research whether sexual intercourse patterns are predictive of cycle length variations.

• Performed a comparative analysis of different models for time series classification, such as MLR, ANN, LSTM and TCN.

Supervisor: Laura Symul, Stanford University


Data Scientist

Connecterra BV

March - May 2019

 

• Developed an auto-encoder LSTM to detect anomalies in cow behavior over time.

• Helped create a farm-specific cow-ranking system based on factors such as milk yield and rumen efficiency.


Data Science Intern

Microsoft

October 2017 - April 2018

 

• Implemented an SVM classifier to categorize client emails for a Dutch multinational bank, with ~88% accuracy. This model was deployed and is now in use by the client.

• Analyzed ~20k public government documents using NLP and helped create an Azure web app for efficiently searching keywords, summaries and document similarity scores.


Economics Research Intern

- with full-time offer

Thumbtack

June - September 2017

 

• Created an interactive choropleth map using D3.js showing important company metrics across the US on various geographical levels. This tool was presented to the CEO and is still used by the Marketing/Product team.

• Built dashboards on Mode Analytics (SQL & Python), which were presented to policymakers around the US, and at conferences in Washington D.C.

• Analyzed the final results for the annual Small Business Friendliness Survey.


Founder/President

Data Science Toronto

2016 - 2017

 

• Founded the first official Data Science Club at the University of Toronto, sponsored by IBM’s Big Data University.

• Organized workshops, seminars, and networking events to help students hone their data science skills.

• Grew the student club to ~500 members within a year, with an executive team of 6 members. Data Science Toronto is still active at UofT.


Course Developer

IBM - Big Data University

October 2016 - February 2017

 

Developed the course “Data Analysis with Python” for Big Data University, a community outreach initiative by IBM. The course, which includes topics like model development and validation, is now available to the 1 million users on their platform.


Director of Business Development

You’re Next Career Network

2016 - 2017

 

• Organized Canada’s largest Startup Career Fair with 90+ startups and 2000+ attendees.

• Organized a workshop on Data Mining by IBM and a data hackathon.


Data Analytics Intern

- with full-time offer

Whistle

May - August 2016

 

Whistle Labs has the largest comparative database of pet health information.

Nutritional Research: developed an improved logistic regression model for recommending daily caloric intake.

Financial Analysis: researched why users switch between subscription plans.

Geocoding: analyzed GPS location data to cluster activity events.

Data Journalism: produced data-backed articles with the PR team for Tech Insider.


Behavioral Economics Research Assistant

Rotman School of Management

2015 - 2017

 

Designed and conducted experiments, and independently collected and analyzed data, for a research project on consumer decision-making: “Pairwise Normalization: A neuroeconomic theory of multi-attribute choice.”