How do I data mine financial tables using HtmlUnit?

Using Java/HtmlUnit, I want to data mine (web scrape) a bunch of hedge fund SEC 13F filings. I have no clue how to data mine the SEC's .txt files such as [this table][1]. The table layout seems clean and structured, but how do I grab the `<TABLE>` with its corresponding `<S>` and `<C>` markers? Moreover, how can I grab just the company names, the `<C>` Value (in column 3), and the `<C>` Shares Amt (in column 4)?

I'm not sure if I'm on the right track, but I used `BufferedReader` and don't know what to do next to grab the data within the `<TABLE>`. Here's what I have so far:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class BufferedReaderExample {

        public static void main(String[] args) {
            try {
                // Create a URL for the desired page
                URL url = new URL("");
                BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
                String str;
                while ((str = in.readLine()) != null) {
                    System.out.println(str);
                }
                in.close();
            } catch (MalformedURLException e) {
            } catch (IOException e) {
            }
        }
    }

  [1]:
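For reference, here is a rough sketch of the kind of thing I am after. It assumes the filing is a fixed-width text table, that the line starting with `<S>` marks where each column begins, and that every data line is aligned to those offsets. I have not verified that real filings always behave this way. The class name `FixedWidthTableSketch`, the methods `columnStarts` and `parseRows`, the sample rows, and the column indexes printed in `main` are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedWidthTableSketch {

    /** Reads the start offset of every column from the <S>/<C> marker line. */
    public static List<Integer> columnStarts(String markerLine) {
        List<Integer> starts = new ArrayList<>();
        int i = 0;
        while (i < markerLine.length()) {
            if (markerLine.startsWith("<S>", i) || markerLine.startsWith("<C>", i)) {
                starts.add(i);
                i += 3; // skip past the 3-character marker
            } else {
                i++;
            }
        }
        return starts;
    }

    /** Slices each data line between <TABLE> and </TABLE> at the marker offsets. */
    public static List<String[]> parseRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        List<Integer> starts = null;
        boolean inTable = false;
        for (String line : lines) {
            String upper = line.trim().toUpperCase();
            if (upper.startsWith("<TABLE>")) { inTable = true; starts = null; continue; }
            if (upper.startsWith("</TABLE>")) { inTable = false; continue; }
            if (!inTable) continue;
            if (upper.startsWith("<S>")) { starts = columnStarts(line.toUpperCase()); continue; }
            // Lines before the marker line (caption, headers) and blank lines are skipped.
            if (starts == null || line.trim().isEmpty()) continue;
            String[] cells = new String[starts.size()];
            for (int c = 0; c < cells.length; c++) {
                int from = Math.min(starts.get(c), line.length());
                int to = (c + 1 < cells.length)
                        ? Math.min(starts.get(c + 1), line.length())
                        : line.length();
                cells[c] = line.substring(from, to).trim();
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data; String.format keeps the columns aligned.
        String fmt = "%-20s%-10s%-11s%-9s%s";
        List<String> sample = Arrays.asList(
                "<TABLE>",
                "<CAPTION>",
                String.format(fmt, "NAME OF ISSUER", "CLASS", "CUSIP", "VALUE", "SHARES"),
                String.format(fmt, "<S>", "<C>", "<C>", "<C>", "<C>"),
                String.format(fmt, "EXAMPLE CORP", "COM", "000000000", "1234", "56789"),
                String.format(fmt, "SAMPLE HOLDINGS", "COM", "111111111", "42", "1000"),
                "</TABLE>");
        for (String[] row : parseRows(sample)) {
            // Indexes 0, 3 and 4 are assumptions about this sample layout only.
            System.out.println(row[0] + " | " + row[3] + " | " + row[4]);
        }
    }
}
```

In my `BufferedReader` loop I would collect each `str` into a `List<String>` instead of printing it, then pass that list to `parseRows`. Which indexes hold the value and share amounts would need to be checked against the actual filing.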
