Seminar @ Cornell Tech: Timnit Gebru
Using Computer Vision to Study Society: Methods and Challenges
Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years, which is costly and labor intensive, data-driven, machine-learning-based approaches are cheaper and faster, with the potential to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to determine demographic attributes using the detected cars. To facilitate our work, we used a graph-based algorithm to collect a challenging fine-grained dataset of over 2600 classes of cars, composed of images from Google Street View and other web sources. Our prediction results correlate well with ground truth income (r=0.82), race, education, and voting data, as well as with sources investigating crime rates, income segregation, per capita carbon emission, and other market research.
Data-mining-based work such as this can be used for many types of applications, some ethical and others not. Finally, I will discuss work, inspired by my experiences on this project, on auditing and exposing biases found in computer vision systems. Using recent work on exposing the gender and skin type bias found in commercial gender classification systems as a case study, I will discuss how the lack of standardization and documentation in AI is leading to biased systems being used in high-stakes scenarios.
I will end with the concepts of AI datasheets for datasets and model cards for model reporting, which standardize information about datasets and pre-trained models, to push the field as a whole towards transparency and accountability.
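For readers unfamiliar with the r value quoted in the abstract, the following is a minimal sketch of how a Pearson correlation between model predictions and ground truth can be computed. It is not the speaker's code: the function name `pearson_r` and the numbers in the usage example are hypothetical placeholders chosen purely for illustration, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical placeholder values (NOT results from the talk):
    # per-city predicted vs. census income, in thousands of dollars.
    predicted = [41.0, 55.5, 62.0, 38.5, 70.0]
    census = [39.0, 58.0, 60.5, 42.0, 75.0]
    print(f"r = {pearson_r(predicted, census):.2f}")
```

A value near 1 means the predictions rise and fall with the ground truth across cities; it does not by itself say the predictions are unbiased for any particular group, which is the concern the second half of the talk addresses.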