I am a data scientist with a political science background. I work as an independent consultant with skills and experience in:
- Forecasting, prediction, and classification using statistical and machine learning models for tabular and time-series data.
- Extracting, cleaning, and merging data from various sources and in various formats, including events and spatial data.
- Developing and maintaining data and model pipelines.
- R, Python, SQL, PostgreSQL, MySQL/MariaDB, Docker, git/GitHub, bash, cloud servers (AWS EC2/S3).
- Substantively, developing and implementing geopolitical and governance risk forecasting systems.
- More than 10 years of experience working on client-driven projects, for clients including Leidos, the US Intelligence Advanced Research Projects Activity (IARPA), Duke University, and the University of Gothenburg (V-Dem Institute).
If you are interested in hiring me, get in touch or check out my LinkedIn profile for more information.
Where you can find me
What else is here
For a list of academic publications, see my research page.
For a brief period, I also blogged.
There are a couple of sub-pages:
et1000: the 1,000 most common Estonian words: I’ve been learning Estonian on and off for several years. When I started using flash cards, one challenge was how to prioritize which words to learn. At the time (2020), I couldn’t find an easily accessible, well-formatted list of the most commonly used Estonian words, so I made my own. It was also an attempt to practice some JavaScript, although in the end the main challenge was that Estonian words can take many different forms depending on the grammatical context in which they are used (lemmatizers to the rescue!).
And static documentation pages for several R packages:
icews: The ICEWS event data consists of more than 270 million event records extracted from global news stories. The raw data is delivered via Dataverse. The {icews} R package automates the process of keeping an up-to-date local copy of the ICEWS data, using either a file- or SQLite-based storage backend. (Not on CRAN.)
states: I frequently work with global data for independent states. This package has some utility functions that make it easier to work with the two major lists of state system membership, Gleditsch & Ward and Correlates of War (COW).
spduration: Implements a split-population duration regression model with time-varying covariates, for survival data where an unknown portion of the cases are immune to the failure event. Models of this kind are sometimes also called cure models.