An experiment in AI coding with Google Gemini. I try to be fair. When I call generative AI mostly slop, I don’t do so blindly; I attempt to conduct reasonable tests in various contexts.
Yesterday I needed a couple of routines — one in Bash, the other in Python. I tried the Python one first. It required code to asynchronously access a remote site's API, authenticate, send and receive various data, and process what was returned, relying on a well-documented Python library on GitHub written specifically to deal with that site's API.
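To give a sense of the shape of that task, here's a minimal, hypothetical sketch using Python's standard asyncio. The actual site, its API, and the GitHub library aren't named above, so `authenticate()` and `fetch_data()` below are placeholders standing in for whatever the real library provides, not its actual calls:

```python
import asyncio

# Placeholder names only -- the real site, API, and wrapper library are not
# identified in the post. The asyncio.sleep(0) calls stand in for the real
# network round trips.
async def authenticate(api_key: str) -> str:
    await asyncio.sleep(0)  # placeholder for the authentication request
    return "session-token-for-" + api_key

async def fetch_data(token: str, query: dict) -> dict:
    await asyncio.sleep(0)  # placeholder for the actual API call
    return {"token": token, "result": sorted(query.values())}

async def main() -> dict:
    # Authenticate first, then use the session token for the data request.
    token = await authenticate("my-key")
    return await fetch_data(token, {"b": 2, "a": 1})

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Even in stub form, the structure (authenticate, then await each API call in turn) is the part that has to line up exactly with the library's real interface — which is where the generated code kept going wrong.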
After almost two hours, I gave up. Gemini was consistently cheerful and cooperative — almost to a creepy extent. It generated code that looked reasonable, was very well commented, and even provided helpful examples of how to configure, install, and run the code.
Unfortunately, none of it actually worked.
When I noted the problems, Gemini got oddly enthusiastic, with comments like “Wow, that’s a great explanation of the problems, and a very useful error message! Let’s figure out what’s wrong! Here is another version with more diagnostics that accesses the library more directly!”
Sort of made me feel like I was dealing with an earnest but incompetent TA in an undergraduate CS course at UCLA long ago. Which was not something I enjoyed back then!
After a bunch of iterations, I gave up. Even starting over didn't help. Gemini never seemed to produce the same code twice, no matter how I worded the prompts. The code would take a completely different approach each time: sometimes embedding configuration values, sometimes reading external files, sometimes taking command-line args. And the way it tried to use the Python library in question also varied enormously. It almost seemed random. Or at least pseudorandom.
I then spent half an hour writing and testing the code I needed from scratch. It worked on the second try, was about half the length of any of the code Gemini generated, and was much simpler, for whatever that's worth. By comparison, Gemini's code was bloated and definitely unnecessarily complex (as well as wrong).
I did give Gemini another chance. I also needed a simple Bash script to do some date conversions. I offered that task to Gemini since I didn't want to bother digging through the various date format parameters required. Gemini came up with something reasonable for this in about four tries. Whether it's completely bug-free I dunno for sure; I haven't dug into the code deeply, since it's not a critical application. But it seems to be working for now.
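For illustration, that kind of conversion is only a few lines. The actual formats involved aren't specified above (and the script was in Bash), so the US-style-to-ISO conversion here is just an assumed example, shown in Python:

```python
from datetime import datetime

# Hypothetical formats -- the post doesn't say which conversions were needed.
# This turns a US-style MM/DD/YYYY date into ISO 8601 (YYYY-MM-DD).
def us_to_iso(s: str) -> str:
    return datetime.strptime(s, "%m/%d/%Y").strftime("%Y-%m-%d")

print(us_to_iso("03/15/2024"))  # -> 2024-03-15
```

In Bash the equivalent is typically a single GNU `date -d ... +FORMAT` invocation; the fiddly part is exactly the format-specifier lookup the post mentions wanting to avoid.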
So really, I haven't seen a significant improvement in this area. There are probably some reasonable sets of problems where AI coding can reduce some of the grunt work, but once you get into anything more complex, the opportunities for errors, especially in larger chunks of code where detecting those errors might not be straightforward, seem to rise dramatically.
–Lauren–