"Don't paint from nature too much. Art is an abstraction. Derive this abstraction from nature while dreaming before it, and think more of the creation that will result."
- Paul Gauguin
↓ Trailer for "Moving Still" (2022).
/ / What is Moving Still?
Moving Still is a 13-minute experimental short film and art installation. It takes the viewer on an odyssey through constantly morphing and pulsating nature scenes with an eerie, dreamlike atmosphere. The visuals, both interconnected and disintegrating, evoke a haunting liminal space.
An ever-growing stem of memories, past or present? Clutching onto silhouettes and shadows in a world too fast to perceive, running into the unknown abyss.
//Project Info
data = {
    "date": "2022-05", //+ continuous work till present
    "type": "personal project",
    "contributor": "Benno Schulze"
}
Moving Still was created as a passion project, stemming from lengthy experiments (more about that in the project insight) with GauGAN Beta and, later on, GauGAN 2. I found the basic concept of being able to produce artificial, photorealistic scenes of nature immensely intriguing. What I found even more fascinating, however, were the technical aspects—the inner workings of the GAN: understanding how it works, dissecting its processes, testing its limits, and using its weaknesses as a stylistic device rather than trying to create a perfect copy of reality.
Supporting and enhancing the visual narrative with the use of AI became my primary focus. From what I learned about GANs, I always drew parallels to the human brain: neurons firing, creating artificial imagery right before your very own eyes. You can imagine the shape of a house, the number of windows, the color of the door, and, drawing from images you've seen and environmental influences (essentially the training data), your brain fills in the shapes to produce a somewhat realistic image with ease. Back to the GAN: the strong divergence between its individual video frames stems directly from the limited capabilities of GauGAN Beta (2019) / GauGAN 2 (2021), developed by Taesung Park et al. at NVIDIA Research AI. Although it is no longer available, it was (to my knowledge) the first image generator made available to the public. The GAN (Generative Adversarial Network) was trained on 10 million—unconnected—reference images of landscapes and, as such, lacks frame consistency, since video synthesis was never part of its training data.
Even though I created the first version of the short film back in 2022, I have since made multiple additions to both the visual and auditory layers, and I still have things to work on and experiment with out of pure joy for the base idea. Some of those changes found their way into the project insight.
↓ Further technical input / documentation on GauGAN2
data = {
    "web-resources": [
        "Semantic Image Synthesis with Spatially-Adaptive Normalization",
            //[Taesung_Park;Ming-Yu_Liu;Ting-Chun_Wang;Jun-Yan_Zhu]
            //[arxiv.org][PDF]
        "Understanding GauGAN",
            //[Ayoosh_Kathuria]
            //[paperspace.com]
            //[Part1]:_Unraveling_Nvidia's_Landscape_Painting_GANs
            //[Part2]:_Training_on_Custom_Datasets
            //[Part3]:_Model_Evaluation_Techniques
            //[Part4]:_Debugging_Training_&_Deciding_If_GauGAN_Is_Right_For_You
        "GauGAN for conditional image generation",
            //[Soumik_Rakshit;Sayak_Paul]
            //[keras.io]
    ]
}
/ / Concept
The lack of frame consistency, which results in a surreal, abstract pulsation of shapes and edges, abrupt changes in lighting moods, or even the complete replacement of objects, sets a new layer of narration. The image surface is held together solely by the silhouette and composition of its visual elements. Further discomfort, intentionally evoked in the recipient, stems from the dissonance between various visual elements within a single frame. While the camera pans and objects or trees move, other elements, such as the ground, appear to remain static. Depending on the subjective focus of the viewer, the scenes, despite their linear progression, can therefore have a completely different impact and perceived controllable component. Apart from the image-controlling segmentation maps (LINK), the outcome is entirely left to the GAN. The recipient is watching a virtual, artificial copy of a landscape that never existed, or did it?
On another immersive level—parallel to the video—the auditory layer creates its own abstraction of the senses. At first, there are low-frequency sound effects, barely consciously perceptible, such as an almost omnipresent rattling, the playback of memories or video frames, similar to an old film projector. During certain phases, calibrated highs and lows offer the viewer moments to dive in as well as moments to breathe. The intra-diegetic soundscape is enriched with subtle, experimental music elements created by Azure Studios.
↓ Frame 6964 (Moving Still)
<INSIGHT>
Excerpt from the work-in-progress material, showcasing the steady improvement in quality and animation.
The short film began as an exploration of various techniques using Cinema4D and "GauGAN2". The core idea and workflow centered on creating segmentation maps, where solid colors were used to define shapes and objects.
Each specific hex color code corresponded to a distinct object or material type—such as light blue for the sky, green for a meadow, or gray for a stone. Further direction for "GauGAN2" can be given by uploading a style image as a reference for the rough color palette and mood.
Those segmentation maps were created in Cinema4D and rendered out as image sequences to be processed by "GauGAN2".
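To illustrate the principle, a single frame of such a map could also be generated in code. This is only a minimal sketch; the hex values below are placeholders, not the exact palette GauGAN2 expects:
from PIL import Image, ImageDraw

# Placeholder colors - in GauGAN2 each material is tied to one fixed hex code.
SKY, MEADOW, STONE = "#87CEEB", "#3BA13B", "#808080"

frame = Image.new("RGB", (512, 512), SKY)        # sky fills the whole canvas
draw = ImageDraw.Draw(frame)
draw.rectangle([0, 320, 511, 511], fill=MEADOW)  # lower third becomes meadow
draw.ellipse([200, 360, 300, 430], fill=STONE)   # one stone in the foreground
frame.save("frame_0001.png")                     # one frame of the sequence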
↓ Segmentation sequence, rendered in Cinema4D
GauGAN2 output with two different style filters enabled. Notice that the silhouettes are not strictly followed; they rather guide the overall composition, which the GAN is still free to adjust. In this example, the small patches of clouds connect with each other in the generated image, even though they are disconnected on the segmentation map.
The web interface of GauGAN (Beta) around 2020 (it was first released in 2019). As you can tell, it looks rather rudimentary compared to today's GANs.
Though one must keep in mind that it was the first generative adversarial network (GAN) for artificial image generation, at least the first one released to the public.
↓ Comment on Reddit about the newly released GauGAN back in 2019... 2025, here we are.
/ / Jumping to GauGAN2 + Automation
I had played around with GauGAN (Beta) a bit but kind of forgot about it. In 2022, I got back to it with "GauGAN2", initially for a Luft & Laune event, where it was used as a social media story ad and as live visual content on stage.
↓ The web interface of GauGAN 2 – just as with the beta, the processing was outsourced to servers provided by NVIDIA.
While creating still images and short video sequences was enjoyable, I found the long, aggressively pulsating video scenes of nature to be the most fascinating. This was due to the GAN's lack of frame consistency—unsurprising, given that it was only trained to generate single images.
As we’ve seen before, there’s always some variability in how the GAN processes input, even when using the exact same segmentation map.
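As a quick way to put a number on that variability, one could compare two outputs generated from the exact same segmentation map. A minimal sketch; the file names are placeholders:
import imageio
import numpy as np

# Two outputs rendered from the very same segmentation map (hypothetical names).
frame_a = imageio.imread("same_map_run_a.png").astype(np.float32)
frame_b = imageio.imread("same_map_run_b.png").astype(np.float32)

# Mean absolute per-pixel difference: 0 means identical, 255 means maximal.
difference = np.abs(frame_a - frame_b).mean()
print(f"Mean per-pixel difference: {difference:.1f} / 255")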
/ / Utilizing a .py (Python) script for bulk processing
The main issue, though, was the web interface, which, at the time, was the only way to use the GAN. It allowed just one upload at a time—you had to click “process,” wait about seven seconds, and then manually download the generated output. Doing this hundreds or even thousands of times would have been absolutely dreadful and mind-numbing.
So with the help of @Paul Schulze, I enhanced a Python script—originally created by @gormlabenz—for bulk uploading and downloading of input segmentation maps. Modifications also made it possible to set a style image and execute multiple iterations simultaneously.
import base64
import os
import time
from glob import glob
from tqdm import tqdm
import imageio
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
class Gaugan2Renderer:
    def __init__(self, waiting_time=5):
        # Seconds to wait for the NVIDIA servers to finish rendering a frame
        self.waiting_time = waiting_time
        self.output_images = []
        # Chrome setup kept for reference; the script currently drives Firefox
        chrome_options = Options()
        #chrome_options.add_argument("--headless")
        #chrome_options.add_argument("--remote-debugging-port=9222")
        #chrome_options.binary_location = "/usr/bin/chromedriver"
        self.driver = webdriver.Firefox(
            #ChromeDriverManager().install(),
            #options=chrome_options
        )

    def open(self):
        # Load the GauGAN2 web interface and wait until the viewport exists
        self.driver.get("http://gaugan.org/gaugan2/")
        WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located((By.ID, "viewport"))
        )
        self.close_popups()

    def close_popups(self):
        # Dismiss the intro popup and accept the terms and conditions
        close_button = self.driver.find_element(
            By.XPATH, "/html/body/div[2]/div/header/button")
        if close_button:
            close_button.click()
        terms_and_conditions = self.driver.find_element(
            By.XPATH, '//*[@id="myCheck"]')
        if terms_and_conditions:
            terms_and_conditions.click()

    def download_image(self, file_path):
        # Read the rendered result straight from the output canvas as PNG
        output_canvas = self.driver.find_element(By.ID, 'output')
        canvas_base64 = self.driver.execute_script(
            "return arguments[0].toDataURL('image/png').substring(21);",
            output_canvas)
        canvas_png = base64.b64decode(canvas_base64)
        with open(file_path, 'wb') as f:
            f.write(canvas_png)

    def create_output_dir(self):
        os.makedirs(self.output_path, exist_ok=True)

    def render_image(self, file_path, style_filter_path):
        # Upload the segmentation map
        self.driver.find_element(
            By.XPATH, '//*[@id="segmapfile"]').send_keys(file_path)
        self.driver.find_element(
            By.XPATH, '//*[@id="btnSegmapLoad"]').click()
        # Upload the custom style filter image
        self.driver.find_element(
            By.XPATH, '//*[@id="imgfile"]').send_keys(style_filter_path)
        self.driver.find_element(
            By.XPATH, '//*[@id="btnLoad"]').click()
        # Trigger the rendering
        self.driver.find_element(
            By.XPATH, '//*[@id="render"]').click()

    def run(self, input_folder, style_filter_path, output_path):
        # Process every segmentation map in the input folder, one by one;
        # sorting keeps the frames in sequence order for the final video
        self.image_paths = sorted(glob(input_folder + "/*.png"))
        self.output_path = output_path
        self.open()
        self.create_output_dir()
        for file_path in tqdm(self.image_paths):
            file_path = os.path.abspath(file_path)
            basename = os.path.basename(file_path)
            output_image = os.path.join(self.output_path, basename)
            self.render_image(file_path, style_filter_path)
            time.sleep(self.waiting_time)
            self.download_image(output_image)
            self.output_images.append(output_image)
        self.driver.close()

    def create_video(self, output_video):
        # Stitch the downloaded frames back together into a video
        images = [imageio.imread(image) for image in self.output_images]
        imageio.mimsave(output_video, images, fps=10)
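For reference, this is roughly how the class could be used; the folder and file names here are placeholders, not the actual project paths:
# Hypothetical usage of the renderer above.
renderer = Gaugan2Renderer(waiting_time=7)
renderer.run(
    input_folder="segmentation_maps",              # Cinema4D image sequence (.png)
    style_filter_path="/absolute/path/style.png",  # style reference image
    output_path="gaugan2_output",
)
renderer.create_video("preview.mp4")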
A quick test involved using shapes that didn’t align with their designated "colors" (object/material types, such as stone). I noticed that all objects and materials on the segmentation map seemed interconnected. For example, if a small patch of snow was placed in the foreground, trees in the background would also appear snow-covered, even if the segmentation map didn’t explicitly include snow in those areas. The same goes for the fog in the examples below.
↓ Fog + building
↓ Fog + stone
↓ Fog + stone
↓ Fog + tree
Same test but with a shark I rigged and animated
↓ Building + clouds
↓ Fog + building
↓ Fog + clouds
↓ Mountains + clouds
/ / Starting the journey
Over time, I kind of figured out what works and what doesn’t, discovered a visual aesthetic, and developed a visual narrative and perception I was excited to explore more deeply.
However, a major issue persisted. As mentioned before, every element of the segmentation map (e.g., dirt) is connected to the other elements on it (e.g., snow). But when the same elements appear on two otherwise different segmentation maps, even though they differ in size and location, the segmentation map seems to act similarly to a masking process.
This means that if the bottom half is covered in light blue, representing straw, this part — in its output — will almost always have the same look [1]. One could even say it’s the same picture. Even if the pattern is broken up by smaller dots, like stones or bushes (in the segmentation input) [2], it still remains unchanged, as the GAN only seems to include parts once they reach a certain size threshold.
And this isn’t an isolated issue with just this combination of elements — it happens with almost anything. This could be due to several factors: insufficient variation in the training data, issues with the seed (which basically adds a randomness factor to the result), or something in the script used for bulk processing.
Regardless, when attempting to create moving scenery, it becomes distracting — perhaps even nauseating — when some elements appear to move along while others, like the ground, seem to remain still, at least with this degree of persistence.
↓ [1] Issue visualized: Similar output, different input
↓ [2] Issue visualized: Similar output, different input
↓ Issue visualized: Lack of power to recognize camera movement
The learnings from this are that the ground and other large elements need to meet a few conditions:
A: They have to be small enough / far enough away that the difference between adjacent frames is large enough for the outputs to look distinguishable from one another.
B: They can’t be too small either, as beyond a certain threshold (a minimum size of roughly 15x15 px on a 512x512 full-resolution input map) elements are no longer processed (see the sketch below).
C: Big elements, such as the ground, need to be constantly broken up with various DIFFERENT elements (represented as colors in the segmentation map) in order for the camera movement to be recognized by the recipient.
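To check point B in advance, a segmentation frame can be scanned per color before uploading it. A minimal sketch: it counts the total area per color, which is only a rough approximation of the ~15x15 px observation above, not an official limit, and the frame path is a placeholder:
import numpy as np
from PIL import Image

MIN_PIXELS = 15 * 15  # rough, observed threshold on a 512x512 map

def undersized_elements(segmap_path):
    # Count how many pixels each color (= element type) covers in total.
    pixels = np.array(Image.open(segmap_path).convert("RGB")).reshape(-1, 3)
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return [(tuple(int(v) for v in c), int(n))
            for c, n in zip(colors, counts) if n < MIN_PIXELS]

# Hypothetical frame from the rendered segmentation sequence.
for color, area in undersized_elements("segmentation_maps/frame_0001.png"):
    print(f"{color} covers only {area}px and will likely be ignored")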
↓ Issue visualized: Lack of power to recognize camera movement, less visible here due to the low contrast of the sand texture
In the example below, you can tell that fixing the problems mentioned above (regarding the segmentation maps) substantially improved the output generated by GauGAN2.