C: Reading a text file (with variable-length lines) line-by-line using fread()/fgets() instead of fgetc() (block I/O vs. character I/O)

Is there a `getline` function that uses `fread` (block I/O) instead of `fgetc` (character I/O)?

There's a performance penalty to reading a file character by character via `fgetc`. We think that to improve performance, we can use block reads via `fread` in the inner loop of `getline`. However, this introduces the potentially undesirable effect of reading past the end of a line. At the least, this would require the implementation of `getline` to keep track of the "unread" part of the file, which requires an abstraction beyond the ANSI C FILE semantics. This isn't something we want to implement ourselves!

We've profiled our application, and the slow performance is isolated to the fact that we are consuming large files character by character via `fgetc`. The rest of the overhead actually has a trivial cost by comparison. We're always sequentially reading every line of the file, from start to finish, and we can lock the entire file for the duration of the read. This probably makes an `fread`-based `getline` easier to implement.

So, does a `getline` function that uses `fread` (block I/O) instead of `fgetc` (character I/O) exist? We're pretty sure it does, but if not, how should we implement it?

***Update***

Found a useful article, [*Handling User Input in C*](http://www.azillionmonkeys.com/qed/userInput.html), by Paul Hsieh. It's a `fgetc`-based approach, but it has an interesting discussion of the alternatives (starting with how bad `gets` is, then discussing `fgets`):

> On the other hand the common retort from C programmers (even those considered experienced) is to say that **fgets()** should be used as an alternative. Of course, by itself, **fgets()** doesn't really handle user input per se. Besides having a bizarre string termination condition (upon encountering \n or EOF, but not \0) the mechanism chosen for termination when the buffer has reached capacity is to simply abruptly halt the **fgets()** operation and \0 terminate it. So if user input exceeds the length of the preallocated buffer, **fgets()** returns a partial result. To deal with this programmers have a couple choices; 1) simply deal with truncated user input (there is no way to feed back to the user that the input has been truncated, while they are providing input) 2) Simulate a growable character array and fill it in with successive calls to **fgets()**. The first solution, is almost always a very poor solution for variable length user input because the buffer will inevitably be too large most of the time because its trying to capture too many ordinary cases, and too small for unusual cases. The second solution is fine except that it can be complicated to implement correctly. Neither deals with **fgets'** odd behavior with respect to '\0'.
>
> Exercise left to the reader: In order to determine how many bytes was really read by a call to **fgets()**, one might try by scanning, just as it does, for a '\n' and skip over any '\0' while not exceeding the size passed to **fgets()**. Explain why this is insufficient for the very last line of a stream. What weakness of **ftell()** prevents it from addressing this problem completely?
>
> Exercise left to the reader: Solve the problem determining the length of the data consumed by **fgets()** by overwriting the entire buffer with a non-zero value between each call to **fgets()**.
>
> So with **fgets()** we are left with the choice of writing a lot of code and living with a line termination condition which is inconsistent with the rest of the C library, or having an arbitrary cut-off. If this is not good enough, then what are we left with? **scanf()** mixes parsing with reading in a way that cannot be separated, and **fread()** will read past the end of the string. In short, the C library leaves us with nothing. We are forced to roll our own based on top of **fgetc()** directly. So lets give it a shot.

So, does a `getline` function that's based on `fgets` (and doesn't truncate the input) exist?

To your new question at the end, yes, it exists. I outlined it in my answer. The article you've cited mentions a problem with a final non-newline-terminated line; I've made this a non-issue by pre-filling the buffer with '\n' and providing a way to detect the condition.
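
For readers who don't have the outlined answer at hand, here is a minimal sketch of the idea as I understand it: pre-fill the buffer with `'\n'`, call `fgets`, then look at where the first `'\n'` sits to work out how many bytes were actually stored. The names `fgets_len` and `read_line`, the starting capacity of 128, and the doubling growth policy are illustrative assumptions, not part of any standard library or of the original answer.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: like fgets(), but also reports how many bytes were
 * stored, even if the data contains '\0'. Returns 1 when a chunk was read,
 * 0 on EOF/error with nothing read. *got_newline is set only when the chunk
 * ends with a '\n' that really came from the stream. */
static int fgets_len(char *buf, size_t size, FILE *fp,
                     size_t *len, int *got_newline)
{
    char *nl;

    assert(size >= 2 && size <= INT_MAX);
    memset(buf, '\n', size);                 /* pre-fill with the sentinel */
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    nl = memchr(buf, '\n', size);
    if (nl == NULL) {
        /* No '\n' left at all: fgets filled the buffer, line continues. */
        *len = size - 1;
        *got_newline = 0;
    } else if ((size_t)(nl - buf) + 1 < size && nl[1] == '\0') {
        /* '\n' followed by fgets's terminator: a real end of line. */
        *len = (size_t)(nl - buf) + 1;
        *got_newline = 1;
    } else {
        /* First '\n' is leftover pre-fill, so the byte before it is the
         * terminator: the stream ended without a newline. */
        *len = (size_t)(nl - buf) - 1;
        *got_newline = 0;
    }
    return 1;
}

/* Hypothetical getline-like wrapper: returns a malloc'd, '\0'-terminated
 * line (newline included when present) and its length in *len_out, or NULL
 * on EOF or allocation failure. The caller frees the result. */
char *read_line(FILE *fp, size_t *len_out)
{
    size_t cap = 128, len = 0;               /* assumed starting capacity */
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;
    for (;;) {
        size_t got, room;
        int got_newline;

        if (cap - len < 2) {                 /* need room for 1 byte + '\0' */
            char *tmp = realloc(line, cap * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
        room = cap - len;
        if (room > INT_MAX)
            room = INT_MAX;                  /* fgets takes an int size */
        if (!fgets_len(line + len, room, fp, &got, &got_newline))
            break;
        len += got;
        if (got_newline)
            break;
    }
    if (len == 0) {                          /* nothing read: EOF or error */
        free(line);
        return NULL;
    }
    line[len] = '\0';                        /* the pre-fill may have clobbered it */
    if (len_out != NULL)
        *len_out = len;
    return line;
}
```

A caller would loop with something like `while ((s = read_line(fp, &n)) != NULL) { ...; free(s); }`. A final line without a trailing newline shows up as a result whose last byte is not `'\n'`. Note that the `memset` touches the whole remaining buffer on every call, so whether this actually beats a tuned `fgetc` loop is something to measure rather than assume.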
