Does Reddit's JSON API have undocumented artificial limits to prevent scraping?

It appears that the JSON API returns very different results to a browser than to other clients. Put this URL in your browser and look at the results, then try it with API Kitchen, curl, Mechanize, etc.:

http://www.reddit.com/r/guitar/new/.json?limit=100

You get 100 results with the browser. Retrieving it with any of the non-browser methods gets you 1-2 results. Is this a bug, or an intentional design to limit what web crawlers gather from Reddit?

On larger subreddits, this makes for incredibly inconsistent results, and the "after" parameter then becomes inaccurate for paging, resulting in a ton of duplicate results. Yet I can't find any documentation indicating that this is intentional and not a bug. If there are limits, that's cool; I just want to know what they are so I can respect them properly in my code.
It turns out that the problem was that authenticated and unauthenticated requests get different responses. Once you authenticate, the API returns the full set of results.
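
For anyone hitting the same thing, here is a rough Python sketch of paging through the listing with an authenticated request while skipping duplicates. It is only an illustration under some assumptions that are not from the original post: that you already have an OAuth bearer token, that you send it to the oauth.reddit.com host with a descriptive User-Agent, and the helper names make_fetcher and iter_posts are made up for this example. The only parts taken from the question are the subreddit, the limit of 100, and the "after" parameter.

```python
import json
import urllib.parse
import urllib.request


def make_fetcher(token, user_agent, subreddit="guitar"):
    """Return a function that fetches one page of /r/<subreddit>/new as parsed JSON.

    Assumption: `token` is an OAuth bearer token you obtained separately.
    """
    def fetch_page(after=None, limit=100):
        params = {"limit": limit}
        if after:
            params["after"] = after
        url = "https://oauth.reddit.com/r/%s/new?%s" % (
            subreddit, urllib.parse.urlencode(params))
        req = urllib.request.Request(url, headers={
            "Authorization": "bearer " + token,
            "User-Agent": user_agent,
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


def iter_posts(fetch_page, max_pages=10):
    """Yield each post once, following the listing's "after" cursor."""
    seen = set()  # fullnames (e.g. "t3_abc123") already yielded
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after=after)["data"]
        for child in listing["children"]:
            name = child["data"]["name"]
            if name not in seen:
                seen.add(name)
                yield child["data"]
        after = listing.get("after")
        if not after:  # no cursor means there are no further pages
            break
```

Usage would look something like `for post in iter_posts(make_fetcher(token, "my-script/0.1")): print(post["title"])`. Tracking the fullnames in a set is a belt-and-braces guard: even if a page overlaps with the previous one, each post is only handled once.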
