As reported by Engadget:

It can also draw and combine multiple objects and provide different points of view, including cutaways and object interiors. Unlike past text-to-image programs, it even infers details that aren’t mentioned in the description but would be required for a realistic image. For instance, with the description “a painting of a fox sitting in a field during winter,” the agent was able to determine that a shadow was needed.

“Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in complete detail, DALL·E is often able to ‘fill in the blanks’ when the caption implies that the image must contain a certain detail that is not explicitly stated,” according to the OpenAI team.

OpenAI also exploits a capability called “zero-shot reasoning.” This allows an agent to generate an answer from a description and cue without any additional training, and it has already been used for translation and other tasks. This time, the researchers applied it to the visual domain to perform both image-to-image and text-to-image translation. In one example, given an image of a cat and the cue “the exact same cat on the top as a sketch on the bottom,” it was able to generate a sketch of that cat.

The system has numerous other talents, like understanding how telephones and other objects change over time, grasping geographic facts and landmarks, and creating images in photographic, illustration and even clip-art styles.

For now, DALL·E is pretty limited. Sometimes it delivers what you expect from the description, and other times you just get some weird or crappy images. As with other AI systems, even the researchers themselves don’t understand exactly how it produces certain images, due to the black-box nature of the system.

Still, if developed further, DALL·E has vast potential to disrupt fields like stock photography and illustration, with all the good and bad that entails. “In the future, we plan to analyze how models like DALL·E relate to societal issues like economic impact on certain work processes and professions, the potential for bias in the model outputs, and the longer term ethical challenges implied by this technology,” the team wrote.

Source link: engadget

