July 25, 2022

How to Simulate Turnout in Tunisia's Constitutional Referendum Using R and the Tidyverse

On coming home, I looked at R-bloggers, while vegetating over the Giants game. I have a keen interest in Middle Eastern politics and specifically, the machinations in North Africa's Maghreb region.. So, it is with keen interest that I read that Tunisia is having a constitutional referendum and that he's run a few simulations of the referendum. I've copied the run below:
require(tidyverse)
require(ggthemes) # we need these two and our man forgot to include them.

N <- 7000000

N -> number.of.voters # for clarity purposes

stations <- 4500
vote_assign <- sample(1:stations,number.of.voters,replace=T, prob=sample(1:3,stations,replace=T))
# loop over turnout, sample polls, estimate turnout

over_turnout <- parallel::mclapply(seq(.05,.5,by=.1), function(t) {
# polling station varying turnout rates

station_rates <- rbeta(n=number.of.voters,t*20,(1-t)*20)

# randomly let voters decide to vote depending on true station-level turnout rate

pop_turnout <- lapply(1:stations, function(s) {
tibble(turnout=rbinom(n=sum(vote_assign==s), size=1,prob = station_rates[s]), station=s)}) %>% bind_rows

over_samples <- lapply(1:1000, function(i) {

# sample 100 random polling stations 1,000 times
sample_station <- sample(1:stations, size=50)

turn_est <- mean(pop_turnout$turnout[pop_turnout$station %in% sample_station])

return(tibble(mean_est=turn_est, experiment=i))
}) %>% bind_rows %>%
mutate(Turnout=t)

over_samples
},mc.cores=10) %>% bind_rows over_turnout_biased %>%
group_by(Turnout) %>%
summarize(pop_est=mean(mean_est), low_est=quantile(mean_est,.05), high_est=quantile(mean_est, .95)) %>%
ggplot(aes(y=pop_est,x=Turnout)) +
geom_pointrange(aes(ymin=low_est, ymax=high_est),size=.5,fatten=1) +
geom_abline(slope=1,intercept=0,linetype=2,colour="red") +
theme_economist() + # I prefer this to theme_tufte, as Robert employed
theme(text=element_text(family="")) +
labs(y="Estimated Turnout",x="True Turnout", caption=stringr::str_wrap("Comparison of Mourakiboun estimated (y axis) versus actual turnout (x axis). Red line shows where true and estimated values are equal. Based on biased samples of 50 polling stations with higher turnout stations more likely to be sampled. However, simulation assumes no problems with recording votes.")) +
ylim(c(0,0.5)) +
xlim(c(0,0.5))
From our experience in dealing with Brexit, referenda tend to be very hard to predict if the country isn't accustomed to having them regularly. And, Robert does acknowledge this.

May 10, 2022

How I Hire

I've interviewed dozens of people for many positions. Some I waved through, a handful I rejected at every one of my variousorai employers. I've had a few cry out of frustration at not being able to solve the problem, I've had a handful who've been resourceful and found the answers on the web in real time. I'm. here to tell readers that knowing how to factor a polynomial (or not, as the case may be) is not the reason candidates are asked questions like "what is 2 to the 32nd power?" or "Why are manhole covers round?".

If you think that grinding on LeetCode or doing hackerrank perfectly is the end-all, be-all of your interview preparation, you will not be bumed ahead. If you feel that your graduating from valedectorian at Choate entitles you to a job, you are not going to get ahead.

Now, what will and, more importantly, why will it get you the job and keep you there?

We'll start with the "why" first. So, you went to Stanford. Congrats, you are one of 16,914 individuals who go to Stanford every year. What makes you deserving over any of them? You have a 4.0 GPA, what does that tell me? It tells me that you can give the grader what they want to see for 4 years straight. While your skills at reading people's intentions are commendable, how do I know that you'll continue to do this for us? You need to convince me (along with every other person) that you are the sort of person whose name I will look forward to seeing daily in my inbox -- you probably can gather where this is going...

What I want to know is "what makes you unique?" and not just nother hapless Cardinal, who found me on a list of alumni and reached out because we pay well enough to let you afford a monthly trip to a gentlemen's club. Do you like to travel? Was your home destroyed by a bomb in 1986? Did you grow up without parents? How did you get interested in this sort of work? You want me to remember you as a person, so that, when I look at your CV in a few days, the fact that you were the Stanford graduate that skiied Whistler at age 5 pops into my head or whatever it is.

September 23, 2020

How to Track ISP Uptime

The perl code below uses nmap (through the wonderful Nmap::Scanner module) and Text::CSV_XS -- though Text::CSV will work if you modify the import line:
use strict; use warnings; use diagnostics;

use Getopt::Long;
use Nmap::Scanner;
use Text::CSV_XS;

my $host = 'hasan.d8u.us';
GetOptions ('host=s' => \$host) or $host = 'hasan.d8u.us';
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
my $last = 'up';
my $scanner = new Nmap::Scanner;
$scanner->register_port_found_event(\&host_found);
$scanner->register_no_ports_open_event(\&host_found);
do {
    $scanner->scan("-p 22 --system-dns -Pn $host");
    sleep 30;
} while 1;

sub host_found {
    my $self     = shift;
    my $host     = shift;
    my $port     = shift;
    my $time     = time();
    
    if ($host->status ne $last) {
        $csv->say($time, $host->status);
        $last = $host->status;
    }
}

August 31, 2020

How to solve fizzbuzz

I was on a phone screen today and was given the famous (or infamous) FizzBuzz problem. I will provide a strategy to solve the class of problems that fizzbuzz represents, namely, take two numbers, a and b both positive. The question is phrased thusly:

Generate numbers 1 to n, print "fizz" for every number divisible by a, "buzz" for every number divisible by b, and "fizzbuzz" for every number divisible by a*b. If not, print the number itself.

The key here is to start with the most restrictive case first, the a*b, before tackling the larger of a and b as an else if clause and the default case being the final line of the loop. Hence, in pseudocode:

for number in range(1, infinity):
if number % (a*b) == 0:
console_print 'fizzbuzz'
else if number % b:
console_print 'buzz'
else if number % a:
conosle_print 'fizz'
else:
console_print number

This code isn't in any language, but is probably easy-enough to translate to whatever language you need to. Also, it is general enough to (hopefully) enumerate the general case. I do, however, welcome further questions and feedback. Many thanks!

August 17, 2020

How to Add Grain in the Gimp

I have long wanted to learn how to retouch photos I've taken. One of the reasons I pay for Smugmug is that it won't mess with my creations. So, I took Terry's photos, adjusted its saturation, before finally adding grain. Here's how I did it:

  • To allow for easy redo, you should start off by duplicating the layer, under the layers menu or by hitting CMD-SHFT-D. This will result in a new layer entitled "[filename] copy" with an identical thumbnail to the [filename] layer.
  • Under the Colours menu, choose "Saturation".
  • This will popup a dialog with options, check the "Split View" and adjust the line to the main part of the image to see how it will look.
  • Set the saturation to 0. You'll have to tab away to see your results in the image.
  • Save this setting for later use if you're happy with it.
  • Now the image should look like the second one linked -- all colour removed. Have a cup of coffee
  • Back now? Good... We will now add grain to the image. Under the "Filters" menu, choose "Noise" and "RGB Noise".
  • Unclick "Independent RGB" and adjust the value to suit, watching the preview after tabbing away from the field.
  • When you're done, you should have something looking like my third linked image.

May 24, 2020

How I Do Data Science

At a high level , a data scientist does one of 3 things. Given a time series, we want to determine one of three things:

  1. What happened before?
  2. What the values were in the middle?
  3. What comes afterwards?
To estimate all three, data must be fit into an equation. I just set up a web service where you can post a delimited list of numbers, walk away, have it come up with histograms of the various fits and be notified by email when it's ready -- look for the fit tests for sample calls.

February 21, 2020

How to Predict the Stock Market?

I just completed a script that leverages two of my Json endpoints on units to make stock market predictions. The script is open-sourced on that gist and forms part of my test suite for the project.

December 21, 2019

How to Reverse-Geocode Addresses for Free

I just added functionality to units to reverse-geocode addresses. My test method for this code follows:

 @Test
 public void geocodeWorks() {

  Map<String, String> expected = new Gson().fromJson(
    "[{\"request\":\"2017 North Beverly Glen Blvd, 
Los Angeles, CA 90077, USA\",\"displayName\":\"Emerging Market Services, 2017, 
North Beverly Glen Boulevard, Beverly Glen, Westwood, Los Angeles, Los Angeles 
County, California, 90077, United States of America\",\"latitude\":\"34.1056855
\",\"source\":\"\",\"longitude\":\"-118.4461788\"}]",
    new TypeToken<List<Map<String, String>>>() {
    }.getType());
  StringMap actual = this.restTemplate.postForObject("/geocode",
    Collections.singletonMap("addr",
      "2017 North Beverly Glen Blvd, 
Los Angeles, CA 90077, USA"),
    StringMap.class);
  Assert.assertEquals(expected, actual);
 }
There's no rate limit on this API (for now), so use it to your heart's content. And you can view the up-to-date test code, as they evolve here.

December 10, 2019

'Staggering' New Data Shows Income Of Top 1% Has Grown 100 Times Faster Than Bottom 50% Since 1970

I found a time series regarding common dreams varying income classes share of GDP and visualized it, in an, I feel, more illustrative way than the original visualization. I'd appreciate your feedback on making my visualization clearer.

income  <- read.csv('/tmp/income.csv', header=TRUE, row.names=NULL)
colnames(income)  <- c('Year', '0-50pct', '50-90pct', '90-99pct', '99-99.9pct', '99.9-99.99pct', '99.99-100pct')
ggplot(income, aes(x=Year)) + geom_line(aes(y=income[,2]), colblue')

December 9, 2019

How to Script the Web For Free

This weekend, I decided to revive units functionality to parse HTML to JSON and put it up here. Will enhance it further shortly, by adding an error message and usage instructions. Knock yourselves out and let me know if you encounter any issues.