attended r conference in the land of eusilc. data anonymization for open science course focused on statistical disclosure control: synthpop, simpop, generative adversarial networks, k-anonymity (satisfied if, for each combination of values of quasi-identifiers, at least k records in the dataset share that combination). perturbation methods: noise masking (adding normally-distributed errors), microaggregation (grouping similar records, replacing with means or medians), record-swapping, rounding, resampling, pram (probabilistically altered categorical data). puf techniques to thwart evildoers
longitudinal data are the hardest type to anonymize
the layout of hypercubes is codified in european legislation
preserve the correlation structure
simulated annealing
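a minimal sketch of the k-anonymity definition above, in plain python rather than anything shown in the course. the toy records, the column names (`age`, `region`, `income`) and both helper functions are made up for illustration, and the input is assumed to be non-empty.

```python
from collections import Counter

# Hypothetical toy microdata: "age" and "region" play the role of
# quasi-identifiers, "income" is the sensitive variable.
records = [
    {"age": "30-39", "region": "north", "income": 41000},
    {"age": "30-39", "region": "north", "income": 52000},
    {"age": "40-49", "region": "south", "income": 38000},
    {"age": "40-49", "region": "south", "income": 61000},
    {"age": "40-49", "region": "north", "income": 47000},
]


def combination_counts(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Largest k for which the data are k-anonymous: the smallest group size."""
    return min(combination_counts(rows, quasi_identifiers).values())


def violating_combinations(rows, quasi_identifiers, k):
    """Combinations shared by fewer than k rows, i.e. the risky records."""
    counts = combination_counts(rows, quasi_identifiers)
    return {combo: n for combo, n in counts.items() if n < k}


print(k_anonymity(records, ["age", "region"]))
print(violating_combinations(records, ["age", "region"], k=2))
```

dropping or coarsening a quasi-identifier (here, using only `age`) merges groups and raises k, which is the trade-off between disclosure risk and detail.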
keynoted: cran reached 25th birthday with 80 submissions daily, r core commit stats log-scaled due to brian ripley, abhishek agreed to debug 2009-me
maybe the code is fine but the world changed
r not zero indexed but you europeans and your 1st floor is the zero-th floor
r is reading your code as a parsed tree
"commit a bunch of files before lunch 🍝"
thanked hadley wickham for session scraping forbes, useful to add 400 richest to scf. i recognize your name from twotorials (in-class notes below)
read the frequently asked questions section of metaculus. demography ought to be bigger, featured tutorial obsolete points
we follow a few principles to elevate forecasting above simple guesswork
greater weight to predictions by forecasters with better track records
a good question will be unambiguously resolvable
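one way to picture the track-record weighting principle above. this is not metaculus's actual aggregation method, which the faq excerpt does not spell out. it is a hypothetical sketch that assumes each forecaster's weight is the inverse of their historical brier score; the forecaster names and numbers are invented.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def weighted_forecast(current, history, epsilon=1e-6):
    """Average current forecasts, weighting each forecaster by 1 / Brier score.

    current: {name: probability for the open question}
    history: {name: (past probabilities, past 0/1 outcomes)}
    epsilon avoids division by zero for a perfect track record.
    """
    weights = {
        name: 1.0 / (brier_score(*history[name]) + epsilon) for name in current
    }
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in current.items()) / total


# Invented example: ann has been sharp and right, bob always says 50 percent.
history = {
    "ann": ([0.9, 0.1], [1, 0]),
    "bob": ([0.5, 0.5], [1, 0]),
}
current = {"ann": 0.8, "bob": 0.4}

print(weighted_forecast(current, history))
```

the aggregate lands much closer to ann's 0.8 than the unweighted mean of 0.6 would, because her past forecasts scored better.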
read income inequality in the united states: using tax data to measure long-term trends by gerald auten and david splinter. valid critique of piketty but focused on surrogate outcome: income < consumption < wealth inequality. dragon lair treasure mounds swell by both acquisition and appreciation
marriage rates on tax returns declined from 67 to 37 percent between 1960 and 2019. however, marriage rates have remained high among the top one percent, decreasing only from 90 to 85 percent. declining marriage rates outside the top of the income distribution increases income shares at the top of the distribution. larrimore (2014) estimated that the differential decline of marriage rates explains 23 percent of the increase in household income gini coefficients between 1979 and 2007
note that top wealth shares ranked by wealth are higher than when ranked by income
in the 1960s, only a tiny fraction of taxpayers actually paid the top tax rates (fewer than five hundred tax returns in 1962), in part due to tax avoidance behavior
the kakwani index of tax progressivity summarizes average tax rates over the entire income distribution. while changing little between 1962 and 1985, this index increased dramatically from 0.07 to 0.29 between 1985 and 2019
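for reference, the kakwani index is conventionally defined as the concentration coefficient of taxes (units ranked by pre-tax income) minus the gini coefficient of pre-tax income. a small sketch of that textbook definition follows, not the authors' code; the four-person incomes and tax schedules are invented, and the discrete rank formula used here ignores survey weights.

```python
def concentration(values, rank_by):
    """Concentration coefficient of `values` with units sorted by `rank_by`.

    Uses the discrete formula C = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where i is the 1-based rank in ascending order of `rank_by`.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: rank_by[i])
    weighted = sum((rank + 1) * values[i] for rank, i in enumerate(order))
    return 2.0 * weighted / (n * sum(values)) - (n + 1.0) / n


def gini(income):
    """Gini coefficient: concentration of income ranked by itself."""
    return concentration(income, income)


def kakwani(income, tax):
    """Positive means progressive, zero proportional, negative regressive."""
    return concentration(tax, income) - gini(income)


# Invented toy data: four taxpayers and three hypothetical tax schedules.
income = [10, 20, 30, 40]
flat_tax = [1, 2, 3, 4]          # 10 percent of income for everyone
progressive_tax = [0, 1, 4, 10]  # average rate rises with income
lump_sum_tax = [2, 2, 2, 2]      # same amount regardless of income

for label, tax in [("flat", flat_tax),
                   ("progressive", progressive_tax),
                   ("lump sum", lump_sum_tax)]:
    print(label, round(kakwani(income, tax), 3))
```

a proportional tax has the same concentration as income itself, so the index is zero; the index grows as the tax burden shifts toward the top of the distribution.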
the u.s. tax system is more progressive than in european countries, which rely more on regressive value-added and payroll taxes
our estimates for after-tax income indicate that the top one percent share increased only 1.4 percentage points since 1979 and only 0.2 percentage points since 1962 .. using only market income on tax returns, piketty and saez (2003) argued that the top one percent share of income more than doubled since 1962. this analysis, however, did not include transfers and other income sources not reported on individual income tax returns, nor did it account for the effects of major tax reforms and changes in marriage rates. thus, it gave a distorted view of income inequality levels and trends
7/25