Stonelinks



FCC ECFS Comment Dataset

08/19/2014

Update (9-18-2014): Turns out the FCC just ended up releasing all the data in bulk, and other people have already cleaned it up and are doing cool things with it.

github.com/Stonelinks/FCC-Proceeding-14-28-Data

If you’ve followed US news over the last couple of months, you’ve no doubt heard something about “net neutrality”. Plenty of people have heard the term without knowing what it is or why it is important. John Oliver does an awesome job motivating the topic for the average person. Fair warning: it is a bit scatological.

The FCC has opened up its Electronic Comment Filing System (ECFS) for public comments about proposed rules that threaten the fairness and openness of the internet. The problem is that the FCC website is slow, serves comments only as PDFs and keeps only the last 100 pages of comments. The system is cumbersome at best and inaccessible at worst. To solve this, I wrote a scraper several months ago and have been scraping the comments on proceeding 14-28 almost every day. To date I’ve collected 153,720 comments and about 570 MB of comment plain text. You can find download instructions and more over on GitHub.

Today I’m releasing the dataset because I want to see what other people can do with it. I really want to use it to get my hands dirty with natural language processing, sentiment analysis and topic analysis. Hope someone finds it useful!
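As a starting point for that kind of exploration, here is a minimal sketch of a word-frequency pass over comment text. It assumes the comments have already been loaded into a plain list of strings; the `top_terms` helper, the tiny stop-word list and the sample comments are all made up for illustration and are not part of the dataset or its tooling.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would
# want a fuller one (for example from an NLP library).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in",
              "it", "that", "i", "for", "on", "be", "not", "this"}


def top_terms(comments, n=5):
    """Return the n most frequent non-stop-words across all comments."""
    counts = Counter()
    for text in comments:
        # Lowercase and split into runs of letters/apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample comments standing in for the real plain-text data.
sample = [
    "Please protect net neutrality.",
    "Net neutrality is important to me.",
    "Reclassify broadband under Title II to protect the open internet.",
]
print(top_terms(sample, 3))
```

Raw counts like this are crude, but they are a cheap way to spot form-letter campaigns and recurring phrases before reaching for heavier sentiment or topic models.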

Tags: Net neutrality | Datasets



Written by Lucas Doyle, a robotics engineer who does a lot of web development in San Francisco.