@Lady I thought I'd take a look, but discovered I'm not sure what this is for, so I got lost before I could figure out whether it's a good implementation of whatever it is 😅

I finally found, and appreciate, the comment at the bottom of the header file describing the syntax you're trying to match. But is that syntax specified by some other project? Are these paths coming in via HTTP requests or some other way? What motivated you to use C for this? I'm curious about many things 😁

And the most important question is, what kind of review were you hoping for? Suggestions on code style, or trying to find potential security issues, or something else?

@jamey the short version is i am trying to write a replacement for Gitweb (which i use, written in Perl) and Cgit (which i find aesthetically displeasing, written in C), which is to say, a CGI frontend for a directory of Git repositories. i’m doing this because i want something more configurable and don’t want to learn Perl, and i’m doing it in C because i want to get better at C and Cgit has already proven it’s possible.

so this commit is adding some code to process a path, provided to the CGI script, into a struct which represents its semantics. there aren’t any standards for this, but there is broad consensus on a format of `project/verb/base..target/subpath`. i’ve chosen some verbs that i’m planning to implement, like “log” and “show”.

because there is some flexibility in syntax, and to test the implementation, i’ve also written a serializer to get the canonical form of a path given the struct. the test code reads in a path from stdin, process it to a struct, and then serializes it back out. (it actually does this twice to confirm it is idempotent.) there is existing test infrastructure (in `sh/test.sh`) to compare the output of this program against the expected outputs and report errors.

so this is a very early step to enable me to start receiving and responding to requests over CGI, none of which is implemented yet.

in terms of code review, this is the first significant piece of C i’ve written in a long while, and i haven’t read that much C code either, so it’s kind of a shot in the dark. i’d be interested if there is anything significantly stylistically off, or if i am shirking best practices without good reason.

@Lady awesome, that explanation helped a ton, thank you!

with that in mind, there are no big changes I would suggest and only a few small things, which are mostly down to personal preference and you're welcome to ignore them 😁

the one thing I thought was actually strange was in the test runner, declaring single-element arrays. the pattern I expected to see was getline(&lineptr, &linelen, stdin), with those two variables not declared as arrays. that gets rid of the [0] subscripts in the rest of the function and means I don't sit there looking for the rest of the array elements. what you've written is valid and correct, just confusing in my opinion

there are tricks you could optionally use when building the string representation of a request. I find it difficult to maintain code where an array like "index" has to match its length (6) next to it. I think probably the simplest way to avoid that is to just call strlen on the string, which is going to be fast enough for almost any purpose since these strings are very short. but you could also consider e.g. sizeof("index") which counts the trailing nul so evaluates at compile time to 6. it's a little easier to visually check that the same string appears in both places. (note though that sizeof(action) is completely different and will always evaluate to the size of a pointer; for this trick to work you must write the string literal twice.) or you could write a small preprocessor macro, but I appreciate that you have avoided that so far and wouldn't blame you if you want to continue avoiding it

actually, I would generally prefer to put the lists of known strings into arrays that can be shared between the parser and printer so they can't get out of sync, and so the parser can be a simple loop over the array instead of a long list of strcmp calls. I can say more about that if you want. but the way you have it is okay too

getting into really minor things: sometimes you indent comments and sometimes you don't; I'm inclined to have them always match the indentation of the surrounding code, personally. and your choice of whitespace around const isn't any style I'm used to, but that's really not important

I hope that's helpful?

Follow

@jamey regarding using [0] and not &: this is a stylistic quirk i’ve picked up from Modern C, which takes the opinion that you should always use arrays when you know the pointer is not nullptr, and avoid & whenever possible. this definitely goes against existing practice but at this point i’m kind of inclined to agree? what you conceptually are doing with getline is passing in arrays with one slot each and it’s overwriting that slot. that’s just not how people usually write it.

regarding hardcoding, agreed. honestly i forgot about C23 constexpr while writing this code, and i think it could be cleaned up a fair bit by using it. i may try taking a stab at this.

regarding the indentation, i think that’s actually a bug in gitweb’s syntax highlighting :P

do you have any advice on naming global constants or struct members? this is probably one of the stylistic things i’m least certain about haha

@Lady fascinating, I hadn't encountered that advice on single-element arrays before. I guess my C experience is pre-modern 😂

similarly, I didn't know about constexpr, and I'm curious how you plan to use it!

I don't know that I have good advice on naming in C. the one piece of advice I picked up years ago that I personally like is to use whole words rather than abbreviating things, and if the result is uncomfortably long then consider if the thing you're trying to name is too complicated. I was thinking about mentioning that regarding your local variable names but forgot; I appreciated the comments explaining the names but I had to think more at every use to remember which meant what.

the only thing I can think of about struct fields is to not follow the old pattern of namespacing field names. like when struct stat names every field with an "st_" prefix; I don't think most code should do that.

oh, also I learned too late that type names should not end in "_t" because that's officially reserved. nobody told me that until after we released XCB 1.0 and committed to API stability, and I'm still annoyed 😅

Sign in to participate in the conversation
📟🐱 GlitchCat

A small, community‐oriented Mastodon‐compatible Fediverse (GlitchSoc) instance managed as a joint venture between the cat and KIBI families.