How do I data mine financial tables using HtmlUnit?

Using Java and HtmlUnit, I want to data mine (web scrape) a set of hedge fund SEC 13F filings, but I have no idea how to extract data from the SEC's .txt files, such as [this table][1].

The table layout looks clean and structured, but how do I grab the `<TABLE>` with its corresponding `<S>` and `<C>` markers? More specifically, how can I grab just the company names, the `<C>` Value (in column 3), and the `<C>` Shares Amt (in column 4)?

I'm not sure I'm on the right track. So far I use a `BufferedReader` to read the file line by line, but I don't know what to do next to get at the data inside the `<TABLE>`. Here's what I have so far:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class BufferedReaderExample {

    public static void main(String[] args) {
        try {
            // Create a URL for the desired page
            URL url = new URL("http://www.sec.gov/Archives/edgar/data/1047644/000104746912006072/a2209520z13f-hr.txt");

            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String str;
                while ((str = in.readLine()) != null) {
                    System.out.println(str);
                }
            }
        } catch (MalformedURLException e) {
            // Report the problem instead of silently swallowing it
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

  [1]: http://www.sec.gov/Archives/edgar/data/1047644/000104746912006072/a2209520z13f-hr.txt
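One possible direction, as a minimal sketch rather than a tested solution: because the filing is plain text and not HTML, this uses ordinary Java string handling instead of HtmlUnit. It assumes the row containing `<S>` and `<C>` marks the character offset where each column starts, and that the data rows are space-padded to those offsets; real filings may not follow this strictly (for example, long values can spill past a column boundary). The class name `FixedWidthTableSketch`, its helper methods, and the `EXAMPLE CORP` sample row are all made up for illustration. Column indexes 3 and 4 are taken from the question, counting the `<S>` column as 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of HtmlUnit or any SEC tooling.
class FixedWidthTableSketch {

    /** Returns the character offset of every <S> or <C> marker on the marker row. */
    static List<Integer> columnStarts(String markerLine) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < markerLine.length(); i++) {
            if (markerLine.startsWith("<S>", i) || markerLine.startsWith("<C>", i)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /** Cuts one data row into trimmed cells at the given offsets. */
    static String[] splitRow(String row, List<Integer> starts) {
        String[] cells = new String[starts.size()];
        for (int c = 0; c < cells.length; c++) {
            // Clamp to the row length so short rows yield empty cells instead of failing
            int from = Math.min(starts.get(c), row.length());
            int to = (c + 1 < starts.size())
                    ? Math.min(starts.get(c + 1), row.length())
                    : row.length();
            cells[c] = row.substring(from, to).trim();
        }
        return cells;
    }

    /** Collects the data rows that follow a marker row, up to the closing </TABLE>. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        List<Integer> starts = null; // null means "not inside a table body"
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("<S>")) {
                starts = columnStarts(line);
            } else if (t.startsWith("</TABLE>")) {
                starts = null;
            } else if (starts != null && !t.isEmpty() && !t.startsWith("<")) {
                rows.add(splitRow(line, starts));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample built with String.format so the columns line up exactly
        String layout = "%-20s%-10s%-11s%-10s%s";
        List<String> sample = Arrays.asList(
                "<TABLE>",
                "<CAPTION>",
                String.format(layout, "NAME OF ISSUER", "CLASS", "CUSIP", "VALUE", "SHARES"),
                String.format(layout, "<S>", "<C>", "<C>", "<C>", "<C>"),
                String.format(layout, "EXAMPLE CORP", "COM", "000000000", "1234", "56789"),
                "</TABLE>");

        for (String[] cells : parse(sample)) {
            // Index 0 = name, 3 = value, 4 = shares (assumed from the question)
            System.out.println(cells[0] + " | " + cells[3] + " | " + cells[4]);
        }
    }
}
```

To try this on the real filing, the lines read by the `BufferedReader` loop could be collected into a `List<String>` and passed to `parse` instead of being printed.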
