Data Insights

As a PhD student, I have collected and analyzed data from various sources. Below are the main insights from each project, described in non-technical language. Click on each image for a short overview of a project, or go to the project's page for a full summary of the results.

Most of these insights come from experiments that allowed me to study behavioral biases and deviations from the fully rational behavior often assumed in economics.

I have also worked with data from the Wesleyan Media Project on political TV ads and with public data from the Federal Election Commission on independent expenditures.

More About My Work:

Analysis

I design experiments to test hypotheses and assumptions from economic theories. 

I use econometric analysis to estimate the parameters of the relevant models, test hypotheses, and evaluate the impact of different policies.

Economics and experiments have given me the tools to reason about causality at a deep level.

Tools

I use Python for most of my data cleaning and analysis, but I also have experience with R and SQL.

I recently expanded my skills to include supervised and unsupervised machine learning techniques. I have implemented support vector machines and naive Bayes classifiers, as well as k-means clustering. I am always excited to learn new tools and skills.
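
As a minimal sketch of how these techniques fit together (using scikit-learn on a synthetic dataset, not actual project data):

```python
# Illustrative sketch only: SVM and naive Bayes classifiers plus k-means
# clustering on synthetic data standing in for real project data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: support vector machine and naive Bayes
for model in (SVC(kernel="rbf"), GaussianNB()):
    model.fit(X_train, y_train)
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))

# Unsupervised: k-means clustering on the same features
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))
```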

Other

I have presented my research at various academic seminars and conferences.

I also have experience teaching math and microeconomics courses at the BA and MA levels.

Projects

Political Advertising

This is the project I am currently working on. I combine publicly available FEC data on independent expenditures, election results for the House of Representatives, data on political TV spots from the Wesleyan Media Project, and district geographic data from the Census TIGER databases to estimate the parameters of a model of election tournaments. I plan to use these estimates to evaluate the effects of counterfactual policies on reducing the incumbency advantage and lowering the barriers for new entrants.

The map above illustrates estimated expenditures on TV ads by electoral district. Expenditures are on a logarithmic scale; blank districts indicate missing data.
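
As a rough sketch of how a map like this can be produced with geopandas and a Census TIGER shapefile (the expenditure file and its column names are placeholders, not the actual project files):

```python
# Hypothetical sketch: choropleth of log TV ad expenditures by district.
# File names and the expenditure columns are placeholders.
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

districts = gpd.read_file("tl_2020_us_cd116.shp")   # Census TIGER congressional districts
spending = pd.read_csv("tv_ad_expenditures.csv")    # placeholder: estimated expenditures by district

merged = districts.merge(spending, on="GEOID", how="left")
merged["log_spend"] = np.log(merged["expenditure"])  # logarithmic scale, as in the map above

ax = merged.plot(column="log_spend", cmap="viridis", legend=True,
                 missing_kwds={"color": "white"},    # blank districts = missing data
                 figsize=(12, 7))
ax.set_axis_off()
plt.show()
```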

Stereotypes

In a setting where participants receive noisy information about the performance of a peer in six different tasks, I find that the prevalence of stereotypes differs depending on whether the stereotype is positive or negative.

When participants initially underestimate the performance of their peers, they tend to overreact to good news about peer performance and, consequently, begin overestimating it.

When they initially overestimate the performance, they do not overreact to bad news; instead, they blame exogenous factors for the negative outcomes and sustain their initial bias. This makes overestimation a more prevalent bias than underestimation in the data.
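
One way to make this asymmetry concrete is a Grether-style updating rule in which good and bad news receive different responsiveness weights. This is a stylized illustration with made-up numbers, not the project's estimation code:

```python
# Stylized model of asymmetric updating (all parameter values hypothetical).
def grether_update(prior, likelihood_ratio, weight=1.0):
    """Posterior odds = prior odds * LR**weight.
    weight > 1 means overreaction to the signal; weight < 1, underreaction;
    weight = 1 recovers standard Bayesian updating."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** weight
    return posterior_odds / (1 + posterior_odds)

prior = 0.3                           # initial underestimate of the peer
good_news_lr, bad_news_lr = 2.0, 0.5  # signals favoring / disfavoring high ability

print(grether_update(prior, good_news_lr, weight=1.5))  # overreaction to good news
print(grether_update(prior, bad_news_lr, weight=0.5))   # muted reaction to bad news
```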

see more

Overconfidence Is Due to Attribution Bias

In various settings, it has been observed that people tend to be overconfident. I designed an experiment that allows me to distinguish among three mechanisms that can lead people to become overconfident: they may hold dogmatic beliefs about themselves; they may have fallen into a learning trap; or they may incorporate information in a biased way.

I find that, although some people do fall into learning traps, most of what explains overconfidence is incorporating information in a self-serving manner: people attribute successes to themselves but blame exogenous circumstances when they fail. 

see more

Human Learning as Regularized Algorithms

In a learning environment where subjects have access to multiple sources of information, we investigate the extent to which people focus their attention on only a few sources to inform their predictions about an unknown outcome.

The average subject uses only two sources of information, and subjects do not choose these sources optimally. Instead, they settle on a set of variables they believe to be relevant and commit to it throughout.

We can determine this by randomizing the number of variables available to participants: allowing them to observe two variables improves their performance relative to a single variable, but providing three or more sources of information has no significant additional effect.
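
The regularization analogy in the project's title can be illustrated with a toy example (synthetic data, not the experiment's): a lasso-penalized regression, like our average subject, ends up relying on only a few of the available sources of information.

```python
# Toy analogy: lasso regularization selects a sparse set of predictors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # ten available information sources
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)  # only two matter

lasso = Lasso(alpha=0.3).fit(X, y)
used = np.flatnonzero(lasso.coef_)      # indices of sources the model actually uses
print("Sources used by the regularized model:", used)  # typically [0 1]
```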

see more

Experiment Interfaces:

All my experiments were coded in Python using oTree. You can find screenshots of the different experiments here and the code in my GitHub repositories.
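
For context, a minimal oTree app skeleton looks like the following. This is an illustrative placeholder, not one of the actual experiments (those are in the repositories linked above); the app name and the belief question are invented for the sketch.

```python
# Minimal oTree 5 app skeleton (illustrative placeholder only).
from otree.api import *


class C(BaseConstants):
    NAME_IN_URL = 'belief_updating'   # hypothetical app name
    PLAYERS_PER_GROUP = None
    NUM_ROUNDS = 6


class Subsession(BaseSubsession):
    pass


class Group(BaseGroup):
    pass


class Player(BasePlayer):
    # Elicited belief about peer performance, on a 0-100 scale
    belief = models.IntegerField(
        min=0, max=100,
        label="How well do you think your peer performed?")


class Belief(Page):
    form_model = 'player'
    form_fields = ['belief']


page_sequence = [Belief]
```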

Technical Details

Coding Languages: Python, R, SQL

Editors: VS Code and Jupyter Notebooks

Spoken Languages: English (Fluent), Spanish (Native), French (Beginner), Russian (Beginner)

Research Assistants: Two cats 😺😺

Nationality: Mexican 🇲🇽