Building a Visual Point-and-Click Editor on Top of Agent-Generated React: What It Takes

When a user clicks a hero section and changes the font size, what they see is simple. What has to happen underneath is not. The browser knows what they clicked, but the React source the agent wrote has no native concept of which element that DOM click corresponds to. Once those two are connected, the edit still has to travel back into the source without breaking anything around it, and then stay synchronized with what the agent sees the next time it runs.

That’s the problem. It’s a sync problem, running in both directions, across three systems that weren’t built to know about each other.

Building 10Web’s visual editor on top of agent-generated React means solving problems traditional editors never had to face. Traditional editors controlled the code they were editing. We didn’t.

The difference

Traditional visual editors generate the code, own the schema, and know exactly what they wrote. That assumption makes every other problem tractable.

An agent doesn’t write for an editor. It writes for a browser. By the time 10Web’s visual editor needs to operate on that output, the agent is done. Three systems now have to stay in sync: the rendered DOM the user clicks, the React source the agent wrote, and the backend source that persists between sessions. They share no common data structure, and none of them was designed to know about the others. Every problem that follows is a problem of keeping those three aligned.

Connecting a click to a line of code

What it takes: a traceable element identity built into every JSX node before the preview renders.

Nothing in the browser connects a rendered DOM node to a source file. Once code is compiled and served to an iframe, that context is gone. You have to build the bridge before the click fires.

At transform time in 10Web's build pipeline, we inject a traceable identity into every JSX element that encodes the element's location in the source. When a click fires, the clicked element resolves to that identity, and the identity resolves to a source position. The bridge exists before the interaction begins.
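A minimal sketch of that round trip, in TypeScript. It assumes the identity is a `file:line:column` string stored in a hypothetical `data-source-id` attribute; the article does not say what 10Web's encoding looks like. In a real pipeline the attribute would be added by a build-time JSX transform (a Babel or SWC plugin, for example); only the encode, decode, and click-resolution steps are shown here.

```typescript
// Hypothetical attribute name; the actual identity format is not described in the article.
const SOURCE_ID_ATTR = "data-source-id";

interface SourceLocation {
  file: string;
  line: number;
  column: number;
}

// Build-time side: encode a JSX element's position as an attribute value.
function encodeSourceId(loc: SourceLocation): string {
  return `${loc.file}:${loc.line}:${loc.column}`;
}

// Click side: decode the attribute value back into a source position.
function decodeSourceId(id: string): SourceLocation | null {
  // The greedy file group lets paths that contain ':' still parse.
  const match = /^(.*):(\d+):(\d+)$/.exec(id);
  if (!match) return null;
  return { file: match[1], line: Number(match[2]), column: Number(match[3]) };
}

// The subset of a DOM element this sketch needs; HTMLElement satisfies it structurally.
interface NodeLike {
  getAttribute(name: string): string | null;
  parentElement: NodeLike | null;
}

// Walk up from the click target to the nearest element that carries an identity.
function resolveClick(target: NodeLike | null): SourceLocation | null {
  for (let el = target; el; el = el.parentElement) {
    const id = el.getAttribute(SOURCE_ID_ATTR);
    if (id) return decodeSourceId(id);
  }
  return null;
}
```

Walking up to the nearest identified ancestor means a click on an unannotated child still lands on some source element, though not necessarily the exact one the user meant.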

Nested elements complicate this. A span inside an h2, or child nodes with no distinguishing attributes, can resist clean identity resolution. For those cases we support alternate encoding strategies, including position-based encodings kept for backward compatibility with older agent output. A system that only operates on code it just generated can make stronger assumptions; ours has to handle multiple generations of 10Web agent output.
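One way a position-based fallback could work, sketched under assumptions: describe an unidentified node as its nearest identified ancestor plus a path of child indexes, then replay that path to find the node again. `TreeNode`, `PositionalRef`, and both functions are hypothetical and are not 10Web's actual encoding.

```typescript
interface TreeNode {
  id: string | null; // identity injected at transform time, if this node got one
  children: TreeNode[];
  parent: TreeNode | null;
}

interface PositionalRef {
  anchorId: string; // identity of the nearest ancestor that has one
  path: number[]; // child indexes from that ancestor down to the target
}

// Climb until an identified ancestor is found, recording the route taken.
function toPositionalRef(target: TreeNode): PositionalRef | null {
  const path: number[] = [];
  let node: TreeNode = target;
  while (true) {
    if (node.id !== null) return { anchorId: node.id, path };
    const parent = node.parent;
    if (!parent) return null; // reached the root without finding any identity
    path.unshift(parent.children.indexOf(node));
    node = parent;
  }
}

// Inverse: follow the recorded child indexes from the anchor back down.
function fromPositionalRef(anchor: TreeNode, ref: PositionalRef): TreeNode | null {
  let node: TreeNode | undefined = anchor;
  for (const index of ref.path) {
    node = node?.children[index];
  }
  return node ?? null;
}
```

The weakness is visible in the design: a path of indexes is only valid while the surrounding structure is unchanged, which is why it works better as a fallback than as the primary identity.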

Writing the edit back without breaking the file

What it takes: structurally precise source modification, a decoupled update path, and three layers of state that have to agree.

Text manipulation is the wrong tool. The file is code, and replacing a string at a character position risks corrupting adjacent nodes.

The modification has to be structural:

  • parse the file into an AST
  • find the specific node
  • change the right attribute
  • regenerate cleanly
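The four steps can be illustrated with a toy JSX tree. This is a sketch, not a real parser: step one is assumed to have already produced the tree, which a production pipeline would get from a full parser such as Babel or the TypeScript compiler API. The node shape and the `findByLoc`, `setAttribute`, and `print` helpers are hypothetical.

```typescript
// Toy JSX element node (step 1, parsing, is assumed to have produced this tree).
interface JsxElementNode {
  tag: string;
  loc: { line: number; column: number };
  attributes: Map<string, string>;
  children: (JsxElementNode | string)[];
}

// Step 2: find the specific node by the source position the click resolved to.
function findByLoc(node: JsxElementNode, line: number, column: number): JsxElementNode | null {
  if (node.loc.line === line && node.loc.column === column) return node;
  for (const child of node.children) {
    if (typeof child === "string") continue;
    const hit = findByLoc(child, line, column);
    if (hit) return hit;
  }
  return null;
}

// Step 3: change only the edited attribute; Map keeps its original position.
function setAttribute(node: JsxElementNode, name: string, value: string): void {
  node.attributes.set(name, value);
}

// Step 4: regenerate source from the tree instead of splicing strings (no escaping; sketch only).
function print(node: JsxElementNode): string {
  const attrs = Array.from(node.attributes)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  const inner = node.children.map((c) => (typeof c === "string" ? c : print(c))).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// Usage: bump the heading's font size.
const tree: JsxElementNode = {
  tag: "section",
  loc: { line: 3, column: 4 },
  attributes: new Map([["className", "py-16"]]),
  children: [
    { tag: "h2", loc: { line: 4, column: 6 }, attributes: new Map([["className", "text-3xl font-bold"]]), children: ["Welcome"] },
    { tag: "p", loc: { line: 5, column: 6 }, attributes: new Map([["className", "text-3xl"]]), children: ["Sizes like text-3xl"] },
  ],
};

const target = findByLoc(tree, 4, 6); // position resolved from the click
if (target) setAttribute(target, "className", "text-4xl font-bold");
const updated = print(tree);
// The <p> keeps its own `text-3xl` (twice), which a naive string replace could have hit.
```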

Speed is the second problem. Done naively, a full roundtrip takes seconds:

  • parse the source
  • find the node
  • modify it
  • rebundle
  • re-render

We decouple what the user sees from when the code actually changes. The DOM in 10Web's preview iframe is updated immediately, before the source is touched, and the AST modification and rebundle run in the background. The user sees the change in under 100 milliseconds.
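A sketch of that decoupling, assuming an edit is just a new class list for one identified element. `EditPipeline`, `lookup`, and `rewriteSource` are hypothetical; `rewriteSource` stands in for the AST modification and rebundle, whatever form they take.

```typescript
interface StyleEdit {
  sourceId: string; // identity resolved from the click
  className: string; // new class list for that element
}

// The one property of a preview DOM element this sketch touches.
interface PreviewElement {
  className: string;
}

class EditPipeline {
  private queue: Promise<void> = Promise.resolve();
  private lookup: (sourceId: string) => PreviewElement | null;
  private rewriteSource: (edit: StyleEdit) => Promise<void>;

  constructor(
    lookup: (sourceId: string) => PreviewElement | null,
    rewriteSource: (edit: StyleEdit) => Promise<void>,
  ) {
    this.lookup = lookup;
    this.rewriteSource = rewriteSource;
  }

  apply(edit: StyleEdit): Promise<void> {
    // 1. Optimistic: patch the preview synchronously, before any source work.
    const el = this.lookup(edit.sourceId);
    if (el) el.className = edit.className;

    // 2. Background: chain the slow rewrite so successive edits land in order.
    const run = this.queue.then(() => this.rewriteSource(edit));
    // A failed rewrite must not block later ones, so the chain swallows the error;
    // the caller still sees the rejection through `run`.
    this.queue = run.catch(() => undefined);
    return run;
  }
}
```

Chaining background jobs on a single promise is one simple way to keep rapid edits from rewriting the file out of order; debouncing or batching would be reasonable alternatives.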

Three layers of state have to stay coherent: the in-memory preview (what the user sees), the Redux edits slice (full updated file contents plus undo history), and the backend source (written only on explicit Save). A visual edit that hasn't been saved is a change the agent doesn't know about, because the agent works from the backend source, not the client-side Redux state.
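The edits slice can be pictured as a plain Redux-style reducer. The state shape, action names, and `unsavedPaths` helper are assumptions for illustration; the article only says the slice holds full updated file contents plus undo history, and that the backend is written on explicit Save.

```typescript
interface EditsState {
  files: Record<string, string>; // path -> current TSX contents, including unsaved edits
  past: Record<string, string>[]; // earlier snapshots of `files`, for undo
  savedFiles: Record<string, string>; // mirror of what was last written to the backend
}

type EditsAction =
  | { type: "edits/fileUpdated"; path: string; contents: string }
  | { type: "edits/undo" }
  | { type: "edits/saved" }; // hypothetically dispatched after the backend write succeeds

function editsReducer(state: EditsState, action: EditsAction): EditsState {
  switch (action.type) {
    case "edits/fileUpdated":
      return {
        ...state,
        past: [...state.past, state.files],
        files: { ...state.files, [action.path]: action.contents },
      };
    case "edits/undo":
      if (state.past.length === 0) return state;
      return { ...state, files: state.past[state.past.length - 1], past: state.past.slice(0, -1) };
    case "edits/saved":
      // Only an explicit Save moves client state into the copy the agent reads.
      return { ...state, savedFiles: state.files };
  }
}

// Files whose edits exist only on the client, invisible to the agent.
function unsavedPaths(state: EditsState): string[] {
  return Object.keys(state.files).filter((p) => state.files[p] !== state.savedFiles[p]);
}
```

A non-empty `unsavedPaths` result is the situation described above: changes the user can see that the agent cannot.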

The agent constraint

What it takes: constraining the agent to a class vocabulary the editor can parse.

The visual editor doesn’t receive instructions from the agent about which controls to show. It derives that at selection time by reading the element. Tag name and DOM attributes determine which control panels appear. The existing Tailwind classes determine the values those panels display. A button with rounded-md surfaces a Radius slider set to medium. Adjust it, and the new value maps back to a Tailwind class written into the source.
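For the Radius example, reading and writing could look like the sketch below. The token list is Tailwind's default border-radius scale (v3 names, where the bare `rounded` class is the `DEFAULT` step); the `RadiusControl` shape and both functions are hypothetical.

```typescript
// Tailwind v3 default border-radius steps; `DEFAULT` is written as plain `rounded`.
const RADIUS_TOKENS = ["none", "sm", "DEFAULT", "md", "lg", "xl", "2xl", "3xl", "full"] as const;
type RadiusToken = (typeof RADIUS_TOKENS)[number];

interface RadiusControl {
  kind: "radius";
  value: RadiusToken; // the slider position shown in the panel
}

function radiusClass(token: RadiusToken): string {
  return token === "DEFAULT" ? "rounded" : `rounded-${token}`;
}

// Read: derive the slider value from the element's existing classes.
function readRadius(className: string): RadiusControl | null {
  for (const cls of className.split(/\s+/)) {
    const token = RADIUS_TOKENS.find((t) => radiusClass(t) === cls);
    if (token) return { kind: "radius", value: token };
  }
  return null;
}

// Write: drop any existing radius class, append the new one, keep everything else.
function writeRadius(className: string, token: RadiusToken): string {
  const radiusClasses = new Set(RADIUS_TOKENS.map(radiusClass));
  const kept = className.split(/\s+/).filter((c) => c.length > 0 && !radiusClasses.has(c));
  return [...kept, radiusClass(token)].join(" ");
}
```

Corner-specific classes such as `rounded-t-lg` are not in the list, so this sketch leaves them untouched rather than guessing at them.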

This only works if the agent writes classes the editor recognizes. Arbitrary values outside the predefined token vocabulary produce classes the editor can't map to a control, so 10Web's agent is constrained to that vocabulary. A more constrained agent produces more editable output. Generation quality and editability are not independent variables.
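A check on agent output along these lines is one way to enforce the constraint. It is a sketch: the vocabulary set stands in for whatever token list the editor actually supports, and the bracket test targets Tailwind's arbitrary-value syntax (`rounded-[7px]`, `bg-[#1da1f2]`).

```typescript
type ClassKind = "token" | "arbitrary" | "unknown";

// Tailwind arbitrary-value syntax: any square-bracketed value inside a class.
const ARBITRARY = /\[.+\]/;

function classify(cls: string, vocabulary: ReadonlySet<string>): ClassKind {
  if (vocabulary.has(cls)) return "token";
  if (ARBITRARY.test(cls)) return "arbitrary";
  return "unknown";
}

// Classes the editor could not map to a control; an empty result means fully editable.
function unmappableClasses(className: string, vocabulary: ReadonlySet<string>): string[] {
  return className
    .split(/\s+/)
    .filter((c) => c.length > 0 && classify(c, vocabulary) !== "token");
}
```

Run against generated output, a non-empty result could trigger a retry or a repair pass before the code ever reaches the editor.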

What survives when the agent runs again

What it takes: a clear rule for which values belong to the design system and which belong to the user.

In 10Web’s system, visual edits aren’t stored in a separate overrides layer. When a user changes a color, that change is written directly into the TSX source as a Tailwind class modification. The edit becomes the source.

For theme-level values, the rule is: elements set to a named system value (Default, Primary, Secondary, Background) inherit from the global theme and update when the theme updates. Elements a user has set to a custom value become independent. Future theme changes won’t touch them.
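The inheritance rule reduces to a two-case type: a color either names a system value and is resolved against the current theme, or it is pinned. The sketch below assumes its input is already known to be a background-color class; the `ColorValue` type, helper names, and hex values are illustrative, not 10Web's.

```typescript
type SystemColor = "primary" | "secondary" | "background";
type Theme = Record<SystemColor, string>;

type ColorValue =
  | { kind: "system"; name: SystemColor } // inherits from the global theme
  | { kind: "custom"; token: string }; // pinned: theme changes never touch it

// Hypothetical reader for a background-color class such as `bg-primary` or `bg-rose-600`.
function parseBackground(cls: string): ColorValue | null {
  if (!cls.startsWith("bg-")) return null;
  const value = cls.slice("bg-".length);
  if (value === "primary" || value === "secondary" || value === "background") {
    return { kind: "system", name: value };
  }
  return { kind: "custom", token: value };
}

// System values are looked up in the current theme; custom values pass through unchanged.
function resolveColor(value: ColorValue, theme: Theme): string {
  return value.kind === "system" ? theme[value.name] : value.token;
}
```

Because the distinction lives in the class name itself, no separate record of "which elements the user customized" is needed: the source already says it.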

The inline-style exception is deliberate. Agents that prioritize visual fidelity, such as when cloning an existing site or converting a Figma file, use inline styles for accuracy. Those outputs sit outside the theme system by design. The trade-off between looking exactly right and being fully editable is explicit, not hidden.

Why this is engineering work, not a product decision

What it takes: committing to operating on free-form agent output before you know the full cost.

Across 2M+ sites built on 10Web, the split between what users prompt for and what they click to change is consistent. They reach for the agent for structural changes. They reach for the editor for precise, local ones. Re-prompting for a 2px margin change is a measurably worse experience.

10Web spent a year on an Elementor integration: a visual editor built on a widget abstraction, layered on top of agent output. The abstraction imposed a ceiling on what the agent could generate, because anything it wrote had to be expressible in Elementor’s widget model. The ceiling was real. We hit it and shut the integration down.

The path we took is a visual editor that operates on free-form React, with no intermediary abstraction. A widget-based editor doesn’t have a DOM-to-source mapping problem, because the widget model is the source. It doesn’t need AST modification or a Tailwind vocabulary constraint in the same way. The harder path produces the harder problems. It also produces an editor with no ceiling on generation quality.

Conclusion

What is your visual editor operating on: a schema your team controls, or code the agent wrote freely?

The first gives you a manageable editor and a constrained agent. Most of the problems above don’t appear. The second gives you an unconstrained agent and forces you to solve all of them.

At 10Web, we chose the second. The problems were real. So is our solution.

FAQ

How does a visual editor know which element a user clicked when the source is React?


The browser has no native way to map a DOM click back to a line of React source. 10Web solves this by injecting a traceable identity into every JSX element at transform time, before the preview renders. When a click fires, it resolves to that identity, which maps back to the exact source position. The bridge is built before any interaction begins.

Why is editing agent-generated React harder than editing code your own team wrote?


Traditional visual editors own the code they generate. They write to a known schema, so they always know where everything is. 10Web’s editor operates on code an agent produced—code it didn’t write and can’t make strong assumptions about. That gap means DOM-to-source mapping, structural file modification, and multi-system state coherence all have to be solved from scratch.

Why use AST modification instead of find-and-replace when writing an edit back to source?


Text-based replacement is too blunt for code. Replacing a string at a character position can corrupt adjacent nodes or hit the wrong occurrence. AST modification parses the file into a structured tree, finds the exact node, changes only the right attribute, and regenerates cleanly. It’s more expensive but structurally precise, which matters when you’re editing source you didn’t generate.

How does 10Web show visual changes instantly if AST modification takes time?


The preview is decoupled from the code change. When a user edits, the DOM in the preview iframe updates immediately—under 100 milliseconds. The AST modification and rebundle run in the background. The user sees the result right away without waiting for the full roundtrip.

When should I use the AI agent versus the visual editor?


Across 2M+ sites built on 10Web, the pattern is consistent: the agent handles structural changes—adding sections, regenerating layouts, rewriting content. The visual editor handles precise, local ones—nudging a margin, swapping a color, adjusting font size. Re-prompting the AI for a 2px spacing change is measurably worse than clicking it directly.

Why did 10Web shut down the Elementor integration?


Elementor’s widget model imposed a ceiling on what the agent could generate—anything the agent wrote had to be expressible as an Elementor widget. That constraint limited generation quality in ways that couldn’t be engineered around. 10Web chose to operate directly on free-form React instead, accepting harder engineering problems in exchange for no ceiling on agent output quality.