If you have seen my previous post, you might know that I’m interested in automating gender detection. Luckily I’m not alone, and there are many options for solving this problem. Oddly, there is only one alternative in Python, and it’s not the best one.
That’s why I decided to come up with a new one. It didn’t make sense to build it from scratch because there are so many open source options, so I based my solution on the Ruby gem Beauvoir. I used the same statistical approach (based on the Agresti-Coull estimated value and a binomial confidence interval). I used a dataset from GenderTracker, a solution created by the OpenGenderTracking Project. And I made some performance improvements with an index based on the position of the words in the file.
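To give an idea of what the statistical part looks like, here is a minimal sketch of an Agresti-Coull estimate used to guess a gender from name counts. It is not the actual code of the tool or of Beauvoir: the function names, the 0.5 threshold and the 95% level (z = 1.96) are assumptions I picked for illustration.

```python
import math


def agresti_coull(successes, trials, z=1.96):
    """Return (estimate, lower, upper) of the Agresti-Coull interval.

    z=1.96 corresponds to roughly a 95% confidence level (assumed here).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    # Add z^2 pseudo-observations, half of them counted as successes.
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp the bounds so they stay valid proportions.
    return p_adj, max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def guess_gender(female_count, male_count, threshold=0.5, z=1.96):
    """Hypothetical helper: decide only when the whole interval
    sits on one side of the threshold, otherwise answer 'unknown'."""
    total = female_count + male_count
    if total == 0:
        return "unknown"
    _, lower, upper = agresti_coull(female_count, total, z)
    if lower > threshold:
        return "female"
    if upper < threshold:
        return "male"
    return "unknown"


if __name__ == "__main__":
    # Made-up counts, only to show the call.
    print(guess_gender(980, 20))
    print(guess_gender(5, 5))
```

The point of using the interval instead of the raw proportion is that a name seen only a handful of times gets a wide interval and ends up as "unknown", rather than being labeled with false confidence.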
Finally, I added support for Spanish names in Argentina and Uruguay. I used the data from the names query of the city of Buenos Aires and from the civil registry in Montevideo. The main problem with these datasets is that they are based on cities and not on the whole country. I’m trying to get country-wide data for Argentina but haven’t had any luck so far. The code I’ve used to get this data is here and here.
The code of the GenderDetection tool for Python is here and, of course, it’s open source. Hope you find it useful! If you find any bugs, don’t hesitate to let me know!