I'm trying to remove comments and strings from a C source file. I'll just stick to comments for the examples. I have a sliding window, so I only have character n and character n-1 at any given moment. I'm trying to figure out an algorithm that does not use nested while loops if possible, although I will need one while loop to getchar through the input. My first thought was to loop until n=* and (n-1)=/, then loop until n=/ and (n-1)=*, but since this uses nested whiles I feel it is inefficient. I can do it this way if I have to, but I was wondering if anyone had a better solution.
The algorithm written with one while loop could look like this:

To decide whether the input char belongs to a comment, you can use a state machine. In the following example, it has 4 states; there are also rules for traversing to the next state.

The example above is very simple: it doesn't work correctly for /* in non-comment contexts such as C strings, it doesn't support // comments, etc.

Since you only wish to use two characters for the buffer and only one while loop, I would suggest a third char to track your state (whether skipping text or not). I've put together a test program for you with inline comments explaining the logic:
I've also posted this code on GitHub to make it easier to download and compile: https://gist.github.com/syzdek/5417109
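As a rough illustration of the single-loop, state-tracking idea described above (this is a sketch, not the code from the gist), a four-state machine might look like the following. The state names and the function strip_block_comments are hypothetical, chosen only for this example. It assumes the whole input is available as a string, and it shares the limitations already mentioned: no handling of strings or // comments.

```c
#include <stddef.h>

/* Hypothetical state names for this sketch. */
enum state {
    CODE,     /* ordinary text, copied through            */
    SLASH,    /* saw '/', which may start a comment       */
    COMMENT,  /* inside a block comment, nothing copied   */
    STAR      /* inside a comment, saw '*', may be ending */
};

/* Copies src to dst without block comments. dst must have room for
 * strlen(src) + 1 bytes. One loop, one character examined per pass. */
void strip_block_comments(const char *src, char *dst)
{
    enum state st = CODE;
    char c;

    while ((c = *src++) != '\0') {
        switch (st) {
        case CODE:
            if (c == '/')
                st = SLASH;            /* hold the '/' until we know more */
            else
                *dst++ = c;
            break;
        case SLASH:
            if (c == '*') {
                st = COMMENT;          /* comment confirmed: drop held '/' */
            } else if (c == '/') {
                *dst++ = '/';          /* emit the older '/', hold the new one */
            } else {
                *dst++ = '/';          /* false start: emit both characters */
                *dst++ = c;
                st = CODE;
            }
            break;
        case COMMENT:
            if (c == '*')
                st = STAR;
            break;
        case STAR:
            if (c == '/')
                st = CODE;             /* comment closed */
            else if (c != '*')
                st = COMMENT;          /* a run of '*' stays in STAR */
            break;
        }
    }
    if (st == SLASH)
        *dst++ = '/';                  /* input ended on a held '/' */
    *dst = '\0';
}
```

To filter standard input instead, the same switch can sit inside a loop of the form while ((c = getchar()) != EOF), with c declared as int and putchar replacing the writes to dst. The state variable plays the role of the "third char" suggested above.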
Doing this correctly is more complicated than one may at first think, as ably pointed out by the other comments here. I would strongly recommend writing a table-driven FSM, using a state transition diagram to get the transitions right. Trying to do anything more than a few states with case statements is horribly error-prone IMO.
Here's a diagram in dot/graphviz format from which you could probably directly code a state table. Note that I haven't tested this at all, so YMMV.
The semantics of the diagram are that <ch> is a fall-through: it is taken when none of the other inputs in that state match. End of file is an error in any state except S0, and so is any character not covered by an explicit transition or by <ch>. Every character scanned is printed except when in a comment (S4 and S5) and when detecting a start comment (S1). You will have to buffer characters when detecting a start comment, and print them if it's a false start, otherwise throw them away when sure it's really a comment.

In the dot diagram, sq is a single quote ' and dq is a double quote ".
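A table-driven machine along these lines could be sketched as follows. This is not a transcription of the diagram: the state names, the character classes, and the function strip_comments are hypothetical, and the state set differs from S0 to S5. The sketch assumes the input is a NUL-terminated string, keeps string and character literals intact, and does not handle // comments or backslash-newline continuations. Treating a trailing lone '/' as valid at end of input is also an assumption.

```c
#include <stddef.h>

enum state { S_CODE, S_SLASH, S_COMMENT, S_STAR,
             S_STR, S_STR_ESC, S_CHR, S_CHR_ESC, N_STATES };
enum cls   { C_SLASH, C_STAR, C_DQ, C_SQ, C_BSLASH, C_OTHER, N_CLASSES };

/* Map a character to its column in the transition table. */
static enum cls classify(char c)
{
    switch (c) {
    case '/':  return C_SLASH;
    case '*':  return C_STAR;
    case '"':  return C_DQ;
    case '\'': return C_SQ;
    case '\\': return C_BSLASH;
    default:   return C_OTHER;
    }
}

/* next_state[current state][character class] */
static const unsigned char next_state[N_STATES][N_CLASSES] = {
    /* columns:      slash      star       dq         sq         bslash     other     */
    [S_CODE]    = { S_SLASH,   S_CODE,    S_STR,     S_CHR,     S_CODE,    S_CODE    },
    [S_SLASH]   = { S_SLASH,   S_COMMENT, S_STR,     S_CHR,     S_CODE,    S_CODE    },
    [S_COMMENT] = { S_COMMENT, S_STAR,    S_COMMENT, S_COMMENT, S_COMMENT, S_COMMENT },
    [S_STAR]    = { S_CODE,    S_STAR,    S_COMMENT, S_COMMENT, S_COMMENT, S_COMMENT },
    [S_STR]     = { S_STR,     S_STR,     S_CODE,    S_STR,     S_STR_ESC, S_STR     },
    [S_STR_ESC] = { S_STR,     S_STR,     S_STR,     S_STR,     S_STR,     S_STR     },
    [S_CHR]     = { S_CHR,     S_CHR,     S_CHR,     S_CODE,    S_CHR_ESC, S_CHR     },
    [S_CHR_ESC] = { S_CHR,     S_CHR,     S_CHR,     S_CHR,     S_CHR,     S_CHR     },
};

/* Copies src to dst without block comments; dst needs strlen(src) + 1 bytes.
 * Returns 0 if the input ends in ordinary code, -1 if it ends inside a
 * comment, string, or character literal. */
int strip_comments(const char *src, char *dst)
{
    enum state cur = S_CODE;
    char c;

    while ((c = *src++) != '\0') {
        enum state nxt = (enum state)next_state[cur][classify(c)];

        /* A held '/' turned out not to start a comment: print it. */
        if (cur == S_SLASH && nxt != S_COMMENT)
            *dst++ = '/';
        /* Print c unless we are in a comment, entering one, or holding a '/'. */
        if (cur != S_COMMENT && cur != S_STAR &&
            nxt != S_SLASH && nxt != S_COMMENT)
            *dst++ = c;
        cur = nxt;
    }
    if (cur == S_SLASH) {              /* flush a '/' held at end of input */
        *dst++ = '/';
        cur = S_CODE;
    }
    *dst = '\0';
    return cur == S_CODE ? 0 : -1;
}
```

All of the decisions live in the table, so adding a state (for example one for // comments) means adding a row and adjusting the two print rules rather than threading new branches through a switch. The single held '/' is the only buffering needed, which matches the note above about buffering while a start comment is being detected.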