Showing posts with the label data

Census.gov QuickFacts Data Set

To create a visualization for https://www.sizzleanalytics.com/ I had to compile census information from the census.gov QuickFacts tool. As far as I could tell, they don't provide an easy way to automatically download all the data, so I manually downloaded the data for each state, then used a simple Node script to merge the files together. Data set available here: https://data.world/aaronhoffman/census-gov-state-quickfacts Hope this helps, Aaron
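For anyone attempting a similar merge, here is a minimal sketch of the idea in Node. It is not the script used for the published data set. It assumes each downloaded state file has already been parsed into an object with a `state` name and a `facts` map; that shape and the function name `mergeStateFacts` are hypothetical, and the real QuickFacts export layout may differ.

```javascript
// Hypothetical input shape: one { state, facts } object per downloaded state file.
// The parsing step that produces these objects is not shown and depends on the export format.
function mergeStateFacts(stateFiles) {
  // Collect the union of fact names so every merged row has the same columns.
  const factNames = [];
  for (const { facts } of stateFiles) {
    for (const name of Object.keys(facts)) {
      if (!factNames.includes(name)) factNames.push(name);
    }
  }
  // Build one row per state; facts missing from a state's file become null.
  return stateFiles.map(({ state, facts }) => {
    const row = { state };
    for (const name of factNames) {
      row[name] = name in facts ? facts[name] : null;
    }
    return row;
  });
}
```

Filling missing facts with null keeps every row the same width, which makes the merged result easy to write out as a single CSV.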

Hacker News Dataset October 2016

Our latest project on Sizzle is a visualization of the Top 10k Posts of All Time on Hacker News. To create the visualization, we first needed to collect the data. I noticed that there was an old copy of the Hacker News dataset available on BigQuery, but I needed an up-to-date copy, so I looked into the Hacker News Firebase API. The API allows you to get each item by ID. You can start by retrieving the current max ID, then walk backwards from there. (Items may be stories, comments, etc.; the same API endpoints serve all item types.) There is no rate limit, so I created the following script, which generates a text file with 10MM lines containing all of the URIs to retrieve. (We will then feed this file into wget using xargs.) Note: 10MM items was ~5 years' worth of data. Script to create the 10MM-line file of URIs to retrieve: https://gist.github.com/aaronhoffman/1f753c660d7364bb594a36af350b227c That script takes about 10 minutes to produce a file t
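As a rough illustration of the "start at the max ID and walk backwards" approach, here is a small Node sketch. It is not the linked gist. The generator name `itemUris` is made up for this example, and the endpoints used are the public `maxitem.json` and `item/<id>.json` routes of the Hacker News Firebase API.

```javascript
// Base URL of the public Hacker News Firebase API.
const HN_BASE = 'https://hacker-news.firebaseio.com/v0';

// Yield item URIs from maxId downwards, `count` of them at most,
// stopping at id 1 so we never ask for a non-positive id.
function* itemUris(maxId, count) {
  const lowest = Math.max(1, maxId - count + 1);
  for (let id = maxId; id >= lowest; id--) {
    yield `${HN_BASE}/item/${id}.json`;
  }
}

// Usage sketch (not executed here): look up the current max id,
// then write one URI per line to a file for wget to consume.
// const maxId = await (await fetch(`${HN_BASE}/maxitem.json`)).json();
// for (const uri of itemUris(maxId, 10_000_000)) out.write(uri + '\n');
```

A generator avoids holding all 10MM strings in memory at once; each line can be written to the output file as it is produced.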

Top Cities in the United States

When looking for data sources for visualizations, I often found myself searching for "Top 200 Cities in the US", "largest cities by land area", "largest metro area by population", etc. I would then have to combine the list I found with other data sources to get all the info I needed. I thought I'd try to keep an up-to-date copy of this list, in case others also find it helpful. You can find that list here: https://gist.github.com/aaronhoffman/e1893d32fa1254429abf57f5c0413fa3 I will try to keep this list up-to-date with additional information and API keys as I use it over time. Hope this helps, Aaron

Gather Metadata For Each Column of SQL Table

Often, when working with a new data set, I'll run these queries to learn more about each column of data: https://gist.github.com/aaronhoffman/eb30805ee2f5cafc64152dd1def800bd For example, you can run a single query to union together aggregate data on each numeric column. The result looks something like this: [Image: example query result] Hope this helps, Aaron
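To make the "union together aggregate data on each numeric column" idea concrete, here is a small Node sketch that builds such a query as text. It is not the query from the gist. The function name `buildNumericSummarySql`, the chosen aggregates, and the output column names are all assumptions for illustration, and some databases may need casts so the unioned columns share a type.

```javascript
// Only plain identifiers are accepted, because names are spliced directly into SQL text.
const IDENT = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Build one SELECT of aggregates per numeric column and join them with UNION ALL,
// so a single query returns one summary row per column.
function buildNumericSummarySql(table, columns) {
  for (const name of [table, ...columns]) {
    if (!IDENT.test(name)) throw new Error(`Unsafe identifier: ${name}`);
  }
  const selects = columns.map((col) =>
    `SELECT '${col}' AS column_name, ` +
    `MIN(${col}) AS min_value, MAX(${col}) AS max_value, ` +
    `AVG(${col}) AS avg_value, COUNT(DISTINCT ${col}) AS distinct_count, ` +
    `SUM(CASE WHEN ${col} IS NULL THEN 1 ELSE 0 END) AS null_count ` +
    `FROM ${table}`
  );
  return selects.join('\nUNION ALL\n');
}
```

Calling `buildNumericSummarySql('orders', ['quantity', 'unit_price'])` (hypothetical table and column names) yields two SELECT statements joined by UNION ALL, which you can paste into your SQL client.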