OK, I finally took the time to run gesture mode without the RC connected. Results (no surprise): 1440x1080 selfies. So I powered everything down, turned on the RC (phone tethered with a USB cable), turned on the Spark, and let the two connect. Then I ran gesture mode again, keeping the powered-on RC in my bag and not in use, and the results were 3968x2976 selfies. Both times it was actively tracking me as I walked around.
So in my experience, if you want full-res selfies you need to have the RC connected.
I did some digging myself.
First, at least half the time I was in gesture mode I had my phone connected as a controller and got 1440x1080. Apparently, I never took any gesture shots with the RC attached (because that would defeat the purpose).
Second, I've never thoroughly tested gestures on my Mavic, so I decided to try it today. With the Mavic wifi-controlled by my phone, I took shots both with gestures and without. Interestingly, the gesture-triggered pics were, like the Spark's, video frame grabs, though on the Mavic that meant 3840x2160. When the shot was triggered from my phone, the resolution was 4000x3000 AND significantly clearer.
Finally, FWIW, mimicking your method, I flew the Mavic with a controller and tethered iPhone and got the same results as with the iPhone via wifi: a 4K image with the clarity of video.
So, my conclusion based on those results is that gesture mode is a form of video processing and requires the quad to be in a quasi-video mode. Therefore, a gesture-triggered still can only be a video frame grab. Since I no longer have a Spark, I can't duplicate your tests exactly, but to me that's moot, because I don't think I would ever take a selfie with gestures if I had the controller and iPhone connected.
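For anyone who wants to compare the numbers above, here is a rough Python sketch that works out the megapixel count and aspect ratio for each resolution mentioned in this thread. The labels and the `describe` helper are just my own names for illustration; the only real data is the four width/height pairs reported above.

```python
from math import gcd

# Resolutions reported in this thread as (width, height).
# The labels are informal descriptions, not official mode names.
SHOTS = {
    "Spark gesture, no RC": (1440, 1080),
    "Spark gesture, RC connected": (3968, 2976),
    "Mavic gesture": (3840, 2160),
    "Mavic phone-triggered": (4000, 3000),
}


def describe(width, height):
    """Return (megapixels rounded to 2 places, reduced aspect ratio)."""
    divisor = gcd(width, height)
    megapixels = round(width * height / 1e6, 2)
    return megapixels, f"{width // divisor}:{height // divisor}"


for label, (w, h) in SHOTS.items():
    mp, ratio = describe(w, h)
    print(f"{label}: {w}x{h} = {mp} MP, {ratio}")
```

The two low-res gesture sizes have 1080 and 2160 rows, the same heights as 1080p and 4K video, which at least fits the frame-grab idea. The 3840x2160 shot is also the only one that isn't 4:3.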