A Creative’s Guide: Lessons from a Year Collaborating with AI — Will It Take Your Job and How to Utilize It

zzyw · Jan 22, 2024

by Yang Wang 黄瓜 of zzyw

(This article was originally written in Chinese and subsequently translated into English.)

Note: In this article, when I talk about AI, I’m specifically discussing the capabilities of GPT-4, Claude 2, MidJourney 5, and DALL-E 3 as they stood in November 2023. I’ve steered clear of future speculation, and any forward-looking statements are grounded in the technology as it exists today. So, the AI functionalities described here are limited to their state at that time.

Recently, numerous discussions have emerged about AI’s role in art and creativity, particularly following the rise in popularity of image-generating AI like MidJourney.

Various reasons have spurred media interest in sensationalizing narratives about job replacement and even eradication. However, current community discussions often occur within information silos — artists speak to artists, technologists to technologists. To date, I have yet to encounter a comprehensive and fair discourse in either Chinese or English circles.

Coincidentally, the surge in these AIs aligned with our research phase in the early to mid-stages of Other Spring (abbreviated as OS). Our engagement in reading, writing, and conceptual design was intense, and exploring new technologies was part of our job. Therefore, we naturally began integrating these new tools into our projects, particularly during the ideation process. MidJourney’s text-to-image capability was instrumental in visualizing many novel ideas. During our research into cryptography and network technology, ChatGPT became my most interacted-with ‘entity’ in January, surpassing even my conversations with Zhenzhen. Other Spring was a substantial project, nearing completion after nearly a year. Thus, I dare say we might be among the creators who have most deeply ‘collaborated’ with these AI tools.

Due to this experience, I feel not only qualified but also obligated to share some of my thoughts and experiences. This is the reason behind my decision to write this more informal piece.

Some readers may be familiar with our more academic articles on AI and automation, where we express critical views on the current technological paradigms that dominate industry thinking. However, today’s article avoids broader theoretical, political, and philosophical discussions. Instead, using project OS as a case study, I will practically examine AI’s capabilities and limitations based on my experiences over the past year, along with its potential impact on creative professionals.

This article is intended for artists, designers, writers, and anyone involved in creative professions. Finally, I warmly welcome contributions and corrections from AI practitioners, be they product managers, engineers, or investors. Let’s learn together.

What exactly can AI achieve?

The performance of AI can vary drastically depending on its application. Let’s break it down with specific examples.

1.1 Research-Oriented Use

Since OS is a research-oriented project, the first four months of the year-long timeline were dedicated to research. There was a course at ITP (the Interactive Telecommunications Program, a graduate program at New York University that I attended) called “Temporary Expert,” which is a fitting description of this phase.

OS is a “world building” project, and we wanted its framework to be rooted in current technological feasibility rather than in soft sci-fi created solely for artistic effect. During the initial months, I immersed myself in studying cryptography and network architecture. In this process, ChatGPT proved to be immensely helpful. The most challenging part of research often comes at the beginning — not knowing what you need; when facing a new field, you’re unsure which books and concepts to start with. ChatGPT functioned as an excellent index in this regard.

Since the research phase didn’t involve much creative work, its core was absorbing knowledge that already exists. I spent a lot of time asking ChatGPT numerous questions, then reading its answers, followed by a detailed exploration of specific Wikipedia entries, and even delving into deeper explanatory videos or articles. The goal was to acquire sufficient introductory knowledge of a field in a short time. Having ChatGPT at hand was akin to having a university graduate in that field ready to answer my questions. Of course, since AI can occasionally spout nonsense and isn’t always sharp, in the very early stages I actually met with a few (human) friends for guidance. They pointed me towards cryptography, which was the direction I needed. With this focus, using ChatGPT became more purposeful. Another research direction was specific network architecture, like understanding the exact working mechanisms of current network protocols — how a message from one user passes through various processes, nodes, and protocols before reaching the recipient. This kind of objective, textbook-like knowledge is well suited to learning through ChatGPT. The biggest advantage is that it’s easier to delve deeper, as we all prefer conversations over dry textbooks.

Some specific examples: I wanted to quickly grasp some current concepts in cryptography, like Differential Privacy, Network Topology, Federated Learning, Onion Routing, Data Sovereignty, Digital Self-Determination, and so on. On one hand, AI helped me discover the existence of these concepts or technologies, and on the other hand, it explained their basics, reducing the intimidation of plunging into lengthy Wikipedia walls of text. In terms of knowledge research, I find AI extremely helpful.
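To give a sense of the kind of “basics” I was after, here is a minimal, purely illustrative sketch of one concept from that list, Differential Privacy, using the Laplace mechanism. The dataset, bounds, and epsilon value are hypothetical; this is roughly the level of explanation ChatGPT would walk me through before I moved on to the formal definitions.

```python
import numpy as np

def private_mean(values, epsilon=0.5, lower=0.0, upper=100.0):
    """Differentially private mean via the Laplace mechanism (toy sketch)."""
    clipped = np.clip(values, lower, upper)       # bound each record's influence
    sensitivity = (upper - lower) / len(clipped)  # max change from altering one record
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Hypothetical survey data: the published mean reveals little about any one person.
ages = [23, 35, 41, 29, 52, 38]
print(private_mean(ages, epsilon=0.5))
```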

1.2 Creative Work

After several months of research comes the part commonly referred to as “production.” This includes two aspects: multimedia (architecture, visuals, sound) and text writing. Let’s discuss them separately.

1.2.1 Image Generation

Early Conceptual Design

After the research work was completed, we began transforming the research output and our settings for this virtual world into visible visual structures. The whole OS process resembled a conceptually focused urban or architectural design project. We even allocated a (major) portion of the project’s budget to form a team, including two exceptionally talented architectural designers.

Integrating abstract concepts like Differential Privacy or Proxy Networks into architectural design was both an intriguing and challenging task. Moreover, as a video production, OS’s output needed to be not only structurally sound but also visually symbolic, conveying the underlying concepts intuitively. Thus, in the early stages of design, we attempted to use MidJourney for extensive visualization work to aid in brainstorming.

This is where the AI exposed its most significant shortcoming for creative workers: its taste (using this word for lack of a better term) was rather mediocre. For example, when given an abstract and open-ended task to generate some concept images, like producing an idealized utopia for OS, the results from DALL-E 3 were as follows:

I deliberately used a rather abstract and open-ended prompt:

“Generate a picture featuring a village situated inside a mountain. The village should appear ancient Chinese yet somewhat futuristic, filled with a colorful ‘computational haze.’ There are futuristic technological apparatuses around the village, which is insulated by the haze.”

Whether in its imagination of ancient Chinese architecture or its interpretation of futurism, the AI’s approach was very literal. So, if you have a rough idea and expect AI to help with genuine imagination and design, you are likely to end up with a result that is a literal interpretation of your request filled with the most boring stereotypes.
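For readers who want to reproduce this kind of test programmatically rather than through the chat interface, a minimal sketch with the OpenAI Python SDK might look like the following. The abridged prompt, model name, and size follow OpenAI’s documented options as of late 2023; this is simply one way to run the experiment, not how we actually worked.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

result = client.images.generate(
    model="dall-e-3",
    prompt=(
        "A village situated inside a mountain, ancient Chinese yet somewhat "
        "futuristic, filled with a colorful computational haze and surrounded "
        "by futuristic technological apparatuses."
    ),
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # link to the generated image
```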

Sketch and Prototype Phase

After recognizing the significant limitations of AI, we reverted to a more traditional design process. This involved collecting and studying the works of other designers, especially drawing inspiration from Solarpunk and organic architectural design. Through keyword searches, we discovered many incredibly inspiring and exciting works.

Shown below are some reference images collected during the design process. You can see their profound influence on our final outcome.

Upon encountering these designs, we integrated them into our library and then engaged in many rapid iterative design attempts.

In the mid-stage of the design process, once we had some more concrete ideas, we returned to using AI to help us refine our designs. At this point, it was less about letting AI design and more about using AI to visualize our designs. This was very helpful for testing some ideas and also enabled us to quickly affirm or reject some early design concepts.

Stylization and Rendering Tests

In the middle to late stages of the project, when we had more concrete design ideas, we revisited some AI tools.

We found a particularly effective method: using our preliminary design drafts as input and then blending them with reference images of other works we admired. This approach produced several “renderings” that we were very fond of, such as this one:

It is actually a combination of these two images:

The image on the left is a design concept for the village by our architect Junling, created during the mid-to-late stages. The image on the right is a conceptual design by concept architect Joris Putteneers, which we found online and really liked. The combination produced by MidJourney further helped us visualize a possible final style for our existing models.

Therefore, it’s essential to have a clear vision. When engaging with AI, it’s best to already have a well-formed idea; otherwise, either the originality of the work is compromised, or something even more important is lost — your own creative voice.

1.2.2 Fiction and Creative Writing

OS is a docu-fiction video work, close to documentary in form, and thus requires some narrative voice-overs. These include parts like the monologue of the protagonist, “Huan,” and the lecture records of the architect, involving both colloquial and poetic, formal expressions. Throughout the writing process, I tried various methods, such as changing prompts, but still ran into issues with clichés. For example:

In the early stages of the project, we had only decided on the “Peach Blossom Spring” as the starting point of the story. Purely for experimentation, I once gave ChatGPT a task:

“Here’s the world overview of my docu-fiction work. Can you write a story inspired by the Chinese tale ‘The Peach Blossom Spring’? The story should be set in the background I’ve sent you.”

(World setting omitted)

ChatGPT’s response was something like this:

The protagonist lives in an oppressive, authoritarian society, constantly under surveillance. One day, a Glitch suddenly pulls them into a strange village where people live freely. Then, the protagonist discovers the village is in danger from the main world wanting to destroy it. So, they unite with the villagers to eliminate the threat and live happily together.

This story sounds somewhat too familiar, right? It’s a typical Hollywood trope.

So, I tried to guide it towards some revisions, like

“Can you make your view of good and evil more nuanced?”

ChatGPT then gave me a very obedient new version, summarized as:

The protagonist lives in a complex, oppressive, authoritarian society, constantly under surveillance. One day, a Glitch suddenly pulls them into a strange but morally ambiguous village where people live freely. Then, the protagonist discovers the village is in danger from the main world wanting to destroy it. But the main world has its nuanced reasons. So, they unite with the villagers to eliminate the threat and live happily together.

Basically, it just applied some literal patches… without fundamentally satisfying the need for revision.

There are more examples, but if you pay attention you’ll notice that, as with the images mentioned earlier, giving AI a general proposition to work with tends to yield results that only reach the level of existing clichés. If you have a more specific goal and vision, AI can help you test some ideas faster.

Overall, AI’s generative capabilities in text lag behind those for image creation. Image generation benefits from a structural advantage: 3D and 2D image-processing software remains inaccessible to many people, so AI art tools genuinely broaden who gets to create. Text, by contrast, is a medium of expression most people already command, and so the boundaries of AI’s creativity become more visible there.

1.2.3 Academic Writing

In my process of academic writing, the role of AI can be broadly divided into two phases: research and revision.

During the research phase, AI’s handling of specific academic field questions differs slightly from general research. Academic fields are highly specialized, and AI’s performance in surfacing niche literature can be limited. For example, when I sought critical media analysis of algorithms & computing in academia, ChatGPT referenced a popular press book by Eli Pariser. While a worthwhile read, this recommendation misses the mark for academic literature.

The quality of AI search results depends heavily on the phrasing of prompts. But this illustrates that AI sometimes conflates academic texts with more mainstream, digestible writings.

Other assessments of AI in the research phase align with the earlier discussion in section 1.1.

For revising academic writing, we expect AI to flag issues like logical gaps, incoherent sequencing of ideas, inferential leaps, or weak arguments. However, performance here remains underdeveloped, so we should not yet overly rely on AI for structural improvements. That said, AI editing can effectively polish grammar and punctuation.

1.2.4 Conclusion: The Major Issue

Clichés are a significant problem faced by current AI models.

Presently, when tasked with creative endeavors, the output of these models tends to be formulaic and stifling, lacking vibrant originality. It’s as if they lack a creative identity of their own. Descriptive terms that might apply include inauthentic, unoriginal, and stereotypical. If I were to assign a title to the current state of AI, I would call it a global aggregator of clichés.

If your job involves producing mainstream content by reusing established, popular ideas, then this might help you (or allow your company to automate your role). However, if you’re reading this, I’d estimate its creative potential realistically falls somewhere between limited and moderately useful.

To be fair, expecting it to nail our vision in one go might be too harsh, as even humans can’t always produce satisfying results on the first try. The real issue with AI currently is its weak understanding of feedback. As illustrated above, its comprehension is superficial, and we can’t incrementally suggest improvements to enhance the quality of writing or images until they reach the desired standard.

This problem varies across products from different companies. In general, for creative tasks, OpenAI’s products have the most severe issues, while Anthropic’s Claude is somewhat better.

This problem should improve with the advent of more fine-tuned models in the future. However, if your field is very niche, or if your work is already pushing the boundaries of the field, it’s hard to imagine any company having enough incentive to fine-tune a model of limited commercial value. This brings us to the importance of community and cultural localization in technology (not geographically but culturally localized), but let’s discuss that later.

1.3 Technical Development

I’m not sure if ChatGPT has been specifically optimized for programming, but its help in coding is a qualitative leap. Whether it’s completing a function from a TODO list, copying and pasting error messages for direct solutions, or generating API or other technical documentation, its performance is pretty good. I can hardly imagine coding without its assistance anymore.
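As a concrete illustration of the first use case, here is a minimal sketch of handing a TODO-marked function to the model through the OpenAI Python SDK. The stub function and the model choice are hypothetical, and in practice I usually just pasted such snippets into the chat window.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# A hypothetical stub, the kind of thing I would paste into the chat window.
todo_snippet = '''
def average_brightness(image_path):
    """TODO: open the image and return its mean grayscale brightness (0-255)."""
'''

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a careful programming assistant."},
        {"role": "user", "content": "Please complete this Python function:\n" + todo_snippet},
    ],
)
print(response.choices[0].message.content)  # the model's proposed implementation
```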

However, its role is still that of an assistant. It’s excellent for addressing specific issues, but when it comes to overall system architecture planning or in-depth problem-solving (beyond just patching), human intellect is still superior.

That said, nearly every programmer I know, at both small and large companies, interacts with it daily. A small-sample study conducted by GitHub claimed that Copilot (GitHub’s AI pair programmer, built on OpenAI models) could boost productivity by roughly 20%. In some scenarios this doesn’t seem exaggerated, but only in some scenarios.

Will Your Job Be Affected?

First, let’s discuss predictions about the development of artificial intelligence tools in the coming year.

Despite last year’s petition signed by numerous celebrities and scholars calling for a six-month halt in AI research, major companies have not slowed down their development of generative AI, but rather accelerated it. Currently, it seems that there won’t be a qualitative leap in the optimization of large models in the short term. However, an expansion in the context window (which can be understood as the memory capacity of AI) is evident, growing from the previous 8,000 tokens to several tens of thousands or even hundreds of thousands. Some of the more advanced models can now read an entire book.
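To make those context-window numbers concrete, here is a small sketch that uses OpenAI’s tiktoken library to count the tokens in a document before sending it to a model. The file name is hypothetical; the point is simply that token counts, not pages, determine whether a text fits.

```python
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-era OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

with open("manuscript.txt", encoding="utf-8") as f:
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens")

# An 8,000-token window holds roughly a long essay; windows of 100,000+
# tokens can take in an entire book in a single prompt.
```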

It appears that the focus of the industry is on developing the interoperability of AI, integrating other existing services and functionalities to make it “more useful.”

Based on the current shortcomings of generative AI mentioned above, it is foreseeable that jobs not involving creativity or decision-making are more likely to be affected.

Here, creativity is broadly defined. It doesn’t mean that jobs producing media (like images, sounds, etc.) are necessarily creative while text production or management jobs are not. Essentially, if your job follows a fixed process of identifying A, executing B, then moving to C, and is highly routinized, it is more likely to be affected. This is not specific to any particular type of job.

Let’s look at a few examples:

Technical Roles

Take software engineers at large companies, for example. The culture of each company and the responsibilities of each team vary. Some engineers’ jobs are closer to that of a product manager, involving a lot of decision-making and organizational work. Such engineers might not be greatly affected by automated programming tools.

However, engineers primarily tasked with execution, such as those implementing features based on given UI designs, might gradually feel the impact of these tools. Therefore, engineers should leverage Gen AI to enhance their efficiency while also broadening their knowledge (not just their company’s technical landscape, but also collaboration, market insight, and product knowledge) so that they can make high-quality decisions and remain difficult for AI to replace.

Designers

Design is too broad an umbrella term; speaking broadly, anyone who ‘designs’ could be called a designer. So what I am referring to here is limited to designers whose job titles include the word ‘design’ and who work daily with creative software from companies like Autodesk or Adobe to fulfill the major duties of their jobs.

Designers are typically visual creators, whether in architecture or graphic design. Therefore, text-based models cannot completely replace a designer’s job. However, designers often need to conduct extensive research, so becoming proficient in these tools can improve research efficiency and competitiveness.

Now to the main point: do current image-generating AIs have the capability to replace designers in the short term? I don’t think so, at least not in the next year or two. Current image-generating AIs are based on Diffusion models, which inherently limit the images they produce — for example, the output images are always rectangular, and they struggle to produce specific shapes accurately. Even Adobe’s Firefly model, which is experimenting with Text to Vector Art, requires a designer’s modifications before entering production.
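A toy sketch of why the canvas is fixed: diffusion sampling starts from a noise array with a preset height and width, and every denoising step refines that same rectangular grid. The denoise function below is only a stand-in for a trained network, and the sizes and step count are arbitrary.

```python
import numpy as np

H, W, STEPS = 64, 64, 50  # the canvas shape is decided before generation begins

def denoise(x, t):
    # Stand-in for a trained denoising network (not a real model).
    return x * 0.98

x = np.random.randn(H, W, 3)        # start from pure noise on a fixed H x W grid
for t in reversed(range(STEPS)):
    x = denoise(x, t)               # each step refines the same rectangle

# The model never chooses the canvas shape; it only fills in a preset rectangle,
# which is one reason arbitrary or precise shapes are hard to produce directly.
```

Beyond the fixed canvas, the harder practical limit in my experience was instruction-following on fine detail.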

Here’s an example. There was an “Other Spring” exhibition in Shanghai recently. Many visitors found the world setting of the exhibition complex and hard to fully understand from the film alone, so I wanted to create a large poster. After several days of modifications, the composition and information flow of the poster were roughly determined. Just then, OpenAI released the latest DALL-E 3, and I decided to test my specifications to see if it could achieve the desired effect. The result showed that while it could handle the main subject, it became chaotic once details were involved. Below are some records of my interaction with ChatGPT (the originals from that time were lost; this is a reenactment for demonstration purposes. By the way, my ChatGPT is named 73… don’t get hung up on these details).

Everyone can judge for themselves how well it follows instructions when it comes to generating images.

In summary, it’s difficult to say that generative models could completely replace designers. However, I can envision many initial conceptualizations and mockups needed for client presentations and pitching gradually being dominated by an increasingly diverse array of generative AI tools.

Another point of interest is when generative AI will have the capability to produce 3D models. Many teams in the industry, particularly those at big companies like Nvidia or Meta, are researching this area. However, based on their published papers, there’s still a long way to go. Whether it’s accuracy, capability, or generation speed, all are still in the early stages of exploration. So, it’s unlikely that high-precision 3D generation models will emerge in the next year or two. Even if they do, given the current level of AI’s ability to follow instructions, it’s unlikely that they will replace designers proficient in Rhino/Maya or other 3D production software anytime soon.

Editors and Writers

Before ChatGPT, Grammarly was almost an everyday tool for everyone working with text. Basic text editing, like correcting typos and grammatical issues, is exactly the forte of Gen AI, so if your job is solely proofreading, you might already feel the impact. However, the role of an editor often includes a lot of creative work, sometimes even more decision-making than the authors themselves. As I’ve said before, the more repetitive and purely executional the parts of your job are, the more likely they are to be impacted. Ideally, the rise of generative AI could liberate editors from the most tedious parts of their work, allowing them to focus on more interesting and impactful decisions.

But this is an “ideal world,” and sadly, we live in a world that pursues profit and is gradually moving towards TikTok-style media consumption. In the past month, some reputable traditional media outlets have been caught using AI-generated articles as substandard replacements. Although they have deleted the published articles, Pandora’s box has been opened. It’s hard to imagine a large number of non-top-tier media outlets rejecting the temptation of the cost reductions brought by AI. As a reader of this article, you may not yet be in crisis, but adjusting your work process is necessary.

Summary

  • AI is very effective in retrieving knowledge as a replacement for search engines, significantly speeding up various types of research.
  • AI-generated content, whether factual or fictional, tends to be clichéd.
  • AI currently follows simple instructions well, but once the instructions become complex or layered, it struggles.

Final Thoughts

We’ve finally reached this point. Our present technological trajectory (focused on resource extraction and automation) and our current economic system (yes, I’m looking at you, 21st-century capitalism) have always been the targets of our critique. While crafting the previous analyses, I kept reminding myself to stay on track: “Today, I am here to impartially report on the capabilities of AI, not to engage in media criticism.”

However, as I wrap up, I feel I’ve earned the right to share my perspective. So, here are some thoughts on navigating our challenging world.

1) To make a difference, you must first ensure your own survival.

No matter your stance on AI, it’s crucial to learn about it. Try to use it daily, weaving Gen AI tools into your workflow, be it with text, 3D, programming, or anything else. Gen AI can boost your competitiveness. The general belief in the industry is that AI won’t replace all jobs, but it will replace those who don’t adapt to working alongside it.

Moreover, to critique something effectively, you must first understand it thoroughly; that is how you earn the right to your criticism.

2) If possible, start exploring programming and machine learning more deeply.

If you’re even slightly interested in programming, I’d recommend investing more time in learning it, starting with small programs or scripts. Not everyone will love coding, but some knowledge of software can transform your perspective on numerous issues, whether you’re a curator, artist, or editor.

Silicon Valley companies churn out pure neoliberal products, and problematic design philosophy books like “Don’t Make Me Think” and “Hooked: How to Build Habit-Forming Products” have become influential, partly because many talents from the Arts & Humanities haven’t penetrated the tech industry. I’m not pointing fingers. This difficult fight requires us to do whatever is helpful. Plus, coding can be incredibly rewarding!

3) The practice of using “public” data to train Gen AI models is an infringement on human content creators.

Scraping artists’ works or our online chats, blogs, posts, and any published texts without the explicit consent of the creators (us) is a violation. The iOS ATT (App Tracking Transparency) policy is a good example. When given a choice, over 85% of users opted out of being tracked. Before ATT, it wasn’t that users consented to tracking; they simply lacked a mechanism to prevent it.

I believe if users were asked through a pop-up whether they’d like their content used for AI training when posting on platforms like Reddit, Medium or Stackoverflow, most would say no. These companies access our data not because we willingly provide it, but because we don’t expect our freely shared information to be harvested by a commercial entity to nurture their digital “deities.”

AI companies are likely aware that they operate in a legal gray area when it comes to consent for using people’s data. In an era where “technological innovation” is often uncritically glorified, the status quo tends to favor them. However, just because something may be currently legal or normalized does not mean it is ethically acceptable for us to passively enable it. There is also a common argument that if individual humans can freely absorb information and creativity from public sources to advance their knowledge, AI systems should be able to do the same. But there is a key difference: humans learning this way are individuals freely expanding their horizons, not commercial products turning knowledge into proprietary assets and, ultimately, profit. Humans also have free will in the political sense and can choose how to use what they absorb in far more nuanced ways than systems designed for commercial interests. The analogy simply does not hold up as a justification.

I also don’t entirely disagree that, by using products born of legally and ethically dubious means, we could be called complicit. To be completely honest, I don’t know a better solution within our current, flawed economic and political systems. But I must insist that the origin story of these tools should persistently sound an alarm whenever we open them. The products may seem immediately useful, but in using them we must not forget, and must remain respectful of, the labor that went into the content used as their training material. Being treated as training data has already flattened that content once; we mustn’t continue to treat it that way.

Ultimately, exploiting existing data for profit without consent is unethical, no matter how much it may benefit AI progress. Distinguishing right from wrong remains important, even when taking action against normalized violations seems difficult. I hope we do not lose sight of this principle going forward.

--

zzyw

zzyw is an art and research collective producing software, installations, and texts examining the cultural, political and educational imprints of computation.