Lyric
RE: Project Aurora New Features
(10-14-2016 12:13 PM) georgia_tech_swagger Wrote:
(10-14-2016 12:01 PM) Lyric Wrote: As a developer myself who's interested in natural language processing, I'd love to see a developer API for this board that exposes publicly available data through an established interface. It could assist with research and/or allow others (such as myself) to layer third-party services on the forum that expand the user experience. For example, one could combine sentiment analysis and topic modeling to automatically take the "pulse" of a particular fan base on a per-topic basis, visualize how the mood changes over time, and make it available to share with all users. The possibilities are endless.
I could do that, but man it'd be way down the list. You'd probably be better off taking an existing web crawler kinda bot (like the ones they use to automatically update statistics/etc for Wikipedia ... and they're written in Python, so they're easy to modify to do just about anything).
But if you do that, please have a "cool off" timer between page requests so you don't abuse the server. If you just gobble down pages as fast as possible and you're NOT a major search engine ... you'll probably eventually get filtered at the firewall level by the server.
I understand -- I'm probably in the minority of people who would be interested in such a feature :). I currently rely on a web scraper that I built to get the data I need for some research, but I've used it very conservatively: it has a cool-off timer between requests, it remembers pages it has already seen so they aren't needlessly pulled down multiple times, and I run it sparingly against only a small subset of the forum. That's for all the reasons you mentioned (I don't want to hammer the website, disrupt its availability for other users, impose major costs on you, etc).
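For illustration only, the conservative practices described above (a cool-off timer between requests plus a memory of pages already seen) might be sketched in Python roughly as follows. This is not the actual scraper mentioned in this post: the class name `PoliteScraper`, the helper `fetch_page`, the 10-second default, and the User-Agent string are all assumptions made up for the example.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download one page. The User-Agent value is a hypothetical placeholder."""
    req = Request(url, headers={"User-Agent": "research-scraper-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class PoliteScraper:
    """Fetches pages one at a time, waiting between requests and never
    downloading the same URL twice during the lifetime of the object."""

    def __init__(
        self,
        cool_off_seconds: float = 10.0,  # assumed value, tune to the site
        fetch: Callable[[str], str] = fetch_page,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cool_off_seconds = cool_off_seconds
        self._fetch = fetch
        self._sleep = sleep
        self._clock = clock
        self._seen: Set[str] = set()
        self._last_request: Optional[float] = None

    def get(self, url: str) -> Optional[str]:
        """Return the page text, or None if this URL was already fetched."""
        if url in self._seen:
            return None
        # Cool-off: wait until enough time has passed since the last request.
        if self._last_request is not None:
            wait = self.cool_off_seconds - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        try:
            html = self._fetch(url)
        finally:
            # Count failed attempts too, so errors do not trigger rapid retries.
            self._last_request = self._clock()
        self._seen.add(url)
        return html

    def crawl(self, urls: Iterable[str]) -> Dict[str, str]:
        """Fetch a small, explicit list of URLs and return {url: page text}."""
        pages: Dict[str, str] = {}
        for url in urls:
            html = self.get(url)
            if html is not None:
                pages[url] = html
        return pages
```

In use, only the small subset of thread URLs of interest would be passed to `crawl()`. The seen-page memory here lives only in the running process; persisting it to disk between runs would be a natural extension but is left out of this sketch.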
I guess the reason I like APIs is that there's no ambiguity about what's acceptable with regard to the data collected, collection throughput, etc. But I can certainly continue the limited scraping I've done previously if that's OK (if it ever causes issues, let me know -- your needs come first and foremost).
10-14-2016 12:36 PM