Dave Raggett, W3C HTML Architect on assignment from HP Labs

I first became aware of the seductive power of 3D computer models in 1980 when I was working at the University of Edinburgh on artificial intelligence. The papers by Whitted and Blinn included wonderful color plates of models of teapots, translucent glass, and surrealistic fruits. Computers clearly had the potential to make imagination tangible in a much more direct way than the written word or even movies and television.

Later I learned about the developing field of virtual reality (VR), but it seemed remote with its dependence on very powerful computers and expensive peripherals—something that only a privileged handful of people could play with.

By the early nineties I was working at HP Labs on hypertext ordering systems that converted sales manuals and other paperwork into interactive graphical displays. The project was canceled, but it gave me lots of ideas. I posted a wacky suggestion to the newsgroup "alt.hypertext" for a collaborative skunk-works project on a global hypertext system; soon I was led to the work being done by Tim Berners-Lee and Robert Cailliau at CERN and decided to work with them.

I met Tim on a flying visit to CERN in '92, and we talked and talked about ideas for the Web. I soon found myself spending my spare time writing code for browsers, servers, and firewall tunnels. My experience with the HP ordering system led me to write an Internet Draft for HTML+, an early attempt to define a richer model for Web hypertext. My experimental browser, Arena, was the first to support tables, text flow around images, and math. Arena was later transferred to CERN and used as the original test bed for developing style-sheet support for the Web.

NCSA's Mosaic browser—rugged, simple to use, and best of all free—helped to fuel the exponential explosion of the Web. Anyone with access to the Internet could download a browser and a server and within minutes have a Web site up and running! Graphics turned out to be very important, fueling people's creative impulses. Soon every company wanted a Web site of its own.

In the free-for-all that characterized the early development of the Web, rough consensus and running code held sway. Nonetheless, there were a lot of variations from one browser to the next. Every company thought it could introduce new tags of its own. Standardization efforts that started in the Internet Engineering Task Force (IETF) and were carried on by the World Wide Web Consortium have helped stave off the threat of fragmentation.

Standards are built with the hard work and willing cooperation of the parties involved. Sometimes standards cement existing practices, and sometimes they point the way forward. The high cost of maintaining existing code acts as a brake to the smooth adoption of standards. Standards moderate the excesses of the frontier spirit that has characterized the meteoric growth of the Web, and they provide stability to sustain further growth.

In early 1994, I turned my attention to the glittering promise of extending the Web to an interconnected 3D landscape that one could roam at will, to meet others for work, to buy and sell, or just out of curiosity. In part, this interest grew out of my frustration with remote working via telephone conferences, but my imagination had been fired by the new 3D games on PCs, especially "Doom."

Doom was essentially a tedious shoot-em-up game with a few puzzles to solve along the way. What was new was the intense immersive 3D experience you could enter using an ordinary PC. Doom showed that the necessary computing power was on hand and that it wasn't necessary to use the expensive helmets and gloves found in professional virtual reality systems.

It was easy to envisage the use of a 3D Web to bring to life historical scenes (e.g., medieval cities); to realize imaginary places that don't exist anywhere in the real world; to experience wildlife parks and undersea vistas; to visit places you could never normally go—the canyons on Mars, the international space station—or simply to get familiar with somewhere you are going to visit soon.

Around this time, tools appeared that exploited the power of fractals for building computer-generated natural landscapes—clouds, mountains, rivers, lakes, trees and forests, snow, rocks, and vegetation. These models took a long time to compute, but with increasing computer power (courtesy of Moore's law) and some sleight-of-hand, the opportunities looked grand!

Encouraged by my colleagues, I started work on a vision paper for the first international Web conference, to be held in CERN in the summer of 1994, and for the Internet Society Conference a few weeks later in Prague.

One of the sad things about the Internet is that you never have enough bandwidth. This actually got worse as more people started using modems from home. Unfortunately, 3D models involve large amounts of geometry data along with texture files. How, then, could 3D scenes, large enough to be interesting, be downloaded quickly enough to be worthwhile?

Bandwidth is increasing more slowly than processing power, which quadruples every three years, following Moore's law. The solution, I realized, was to avoid transferring such files whenever you could, instead transferring instructions which could be used to locally recreate the data needed by the rendering engine—in essence, to use local computer power to compensate for bandwidth limitations.

Perhaps a markup language could be used to describe models that in turn would be interpreted by scripts to create the 3D scenes? Thus was born the idea of VRML, an extensible virtual reality markup language.

The missing ingredient was an object-oriented scripting language for rendering the markup and adding behavior. Java and ECMAScript have since emerged, and I am now working with ECMA on improving ECMAScript's fit to this role. My "Spice" proposal covers a way to bind novel markup to object classes using style rules (http://www.w3.org/People/Raggett/Spice/).

Using these tools, skilled programmers would be able to define collections of object classes for interpreting matching markup. Less skilled programmers could tinker with the rendering and behavior via simple scripts and style sheets.

To get a feeling for what this means, you can imagine markup for creating a variety of buildings. The markup specifies parameters such as the number of floors, the color and type of wall claddings, the wallpaper in each room, etc. The markup could also allow you to specify an optional floor plan. A style sheet binds the tags to the code needed to interpret them. This code can then be downloaded as required. There is no need to standardize the tags, since their meaning is determined by the downloaded code used to interpret them.
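A minimal sketch of this idea in Python, with made-up parameter names standing in for the hypothetical markup (this is not actual VRML or Spice syntax; the point is that a few compact parameters, interpreted by downloaded code, expand locally into a much larger model):

```python
# Hypothetical sketch: the "markup" supplies only compact parameters;
# downloaded interpreter code expands them into the full scene locally,
# so the geometry itself never has to cross the network.

def build_building(floors, rooms_per_floor, cladding="brick"):
    """Expand compact building parameters into a full scene description."""
    scene = []
    for floor in range(floors):
        for room in range(rooms_per_floor):
            scene.append({
                "floor": floor,
                "room": room,
                "cladding": cladding,
                "wallpaper": "default",
            })
    return scene

# Two small numbers expand into a 24-room model on the client.
params = {"floors": 3, "rooms_per_floor": 8}
scene = build_building(**params)
print(len(scene))  # 24
```

The bandwidth win is exactly the one described above: the transfer cost is proportional to the size of the parameters, not the size of the generated geometry.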

The ease of Web-site construction has a critical impact on how many people are prepared to have a go. Most of us want practically instant gratification! Being able to use a text editor is much easier than having to struggle to learn a complex new tool. This observation motivated the idea of using simple markup and scripts.

Like many people, I have been attracted by the power of fractals to generate arbitrarily complex images from the simplest of rules. It occurred to me that this could be applied to synthetic virtual reality scenes other than naturalistic landscapes. One example that sprang to mind was the open-plan office cubicles familiar from every HP site. The cubicles are made from a small set of components and are populated with shelves, filing cabinets, white-boards, desks, computers, and so on. A handful of rules could construct an infinite number of varying cubicles and interconnecting corridors.

Provided that the code used to generate the offices gave the same results on different machines, you could give each cubicle a unique identifier to inform the Web server what personal effects are present in each cubicle—for instance, what books are on the shelves, what is written on the whiteboard, and whose files are on the computer.

In essence, you can think in terms of a biological metaphor, where seeds are transported to be grown into plants. In fact, you only need to transfer the DNA, if the cellular machinery is already present. This approach can be applied to much more than the geometry of a 3D scene. For instance, think of "animals," and you get the idea of DNA determining behavior as well as form. Perhaps the virtual meeting places of the future will be much more "organic" than today's architecture!
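The seed-and-DNA metaphor can be sketched concretely: if the growth rules are deterministic, seeding them with a cubicle's identifier yields the identical cubicle on every machine, so only the identifier ever needs to travel. The identifier and field names below are invented for illustration:

```python
import random

def grow_cubicle(cubicle_id):
    """Deterministically 'grow' a cubicle from its identifier (the DNA).

    Every machine running the same rules with the same identifier
    produces an identical cubicle, so only the id is transferred.
    """
    rng = random.Random(cubicle_id)  # seeded with the identifier
    return {
        "shelves": rng.randint(1, 4),
        "whiteboard": rng.choice([True, False]),
        "desk_orientation": rng.choice(["north", "east", "south", "west"]),
    }

# The server sends only the identifier; geometry is recreated locally.
a = grow_cubicle("hp-bristol-2f-117")
b = grow_cubicle("hp-bristol-2f-117")
assert a == b  # identical results on any machine with the same rules
```

The personal effects mentioned above (books, whiteboard contents, files) are the part that genuinely varies per cubicle, so they are what the Web server would supply, keyed by the same identifier.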

Another issue is how to provide realism without excessive computational demands. To date, the computer-graphics community has focused on geometrical and lighting models to achieve realism. An alternative approach could draw upon artistic metaphors for rendering scenes. Paintings are not like photographs, and artists don't try to capture all details; rather, they aim to distill the essentials with a much smaller number of brush strokes. This is akin to symbolic representations of scenes. We may be able to apply this to VR. As an example, consider the difficulty in modeling the folds of cloth on your shirt as you move your arm around. Modeling this computationally is going to be very expensive, but perhaps a few rules can be used to draw in some creases when you move your arms.

I presented my ideas for VRML at a meeting I ran together with Tim Berners-Lee at the CERN Web conference for those interested in exploring the idea of a 3D Web. The meeting was packed, particularly on account of Mark Pesce's great talk on the vision he and Tony Parisi had for 3D hypertext. After the conference ended, the VRML mailing list was set up and was swamped by some 2,000 subscribers in a matter of weeks.

I didn't get to read Stephenson's Snow Crash and similar books until much later. Stephenson describes the "Metaverse," a virtual world that replicates many of the properties of the real world including zoning laws and urban sprawl. The "Metaverse" supports virtual presence for its participants, allowing them to congregate together, even though separated by thousands of miles in the real world.

For an effective virtual presence, it would be important to be able to see people's facial expressions in real time with a low latency, preferably as low as 100 milliseconds. Longer latencies would progressively worsen the experience. Computer games have shown that the latency is more important to the sense of engagement than the quality of the graphics!

If you try "pinging" different Web sites, you can see that such latencies are achievable across the Internet, at least within the same country. Perhaps we could use a video camera to capture just people's facial expressions and combine these with a 3D model of people's heads and the rest of their bodies? By a strange coincidence, I found out about some research at the University of Essex that seemed ideally suited to this purpose.

British Telecom had commissioned Adrian Clark to investigate ways to improve the quality of low-bitrate videotelephony. The existing standard smeared people's lip movements so much that lip reading was impossible. The solution was to develop software that incorporated a model of the human head, which it used to interpret the video signal and to provide the instructions for pasting the image data back onto a wireframe model at the receiver. This yielded a 3D model that you could rotate, even though the starting point was a regular 2D video camera.

Model-based coding techniques seem well suited for adoption for virtual presence, since they make it straightforward to create lifelike avatars with minimum bandwidth demands. For instance, head movements can be compressed into simple changes in coordinates. Further gains in bandwidth could be achieved at a cost in accuracy by a real-time characterization of facial gestures in terms of a composition of "identikit" stereotypes (e.g., shots of mouths which are open or closed, smiling or frowning). The face is then built up by blending the static model of the user's face and jaw with the stereotypes for the mouth, cheeks, eyes, and forehead.

Before we can construct the "Metaverse," we must solve a number of tough scaling problems. Rendering engines slow down as the size of the geometric model increases. A realistic model of a skyscraper can run into millions of polygons used to construct scores of rooms, corridors, foyers, etc. Now picture a city composed of thousands of buildings, and the difficulty of the task becomes obvious.

Picture yourself inside a building. You see the room you are in, plus glimpses of others through doors, windows, and walkways. If for each room the geometric model identifies the visibility set of other rooms that can be seen from within it, then the rendering engine has an easy way to ignore the rooms that can't be seen. This technique can be combined with others. For things that are far away, a simple model will do. By representing objects at different levels of detail, the rendering engine can pick the level appropriate to the distance of the object and the total complexity of the scene. For outdoor views, the city "smog" can be used to shroud distant buildings, so that only nearby buildings need to be considered.
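The visibility-set technique above amounts to a simple lookup: each room carries a precomputed set of the rooms visible from it, and the engine renders only that set. A sketch in Python, with invented room names:

```python
# Illustrative visibility-set culling: each room lists the other rooms
# visible from inside it; the renderer considers only current room plus
# its visibility set, ignoring the rest of the building entirely.

def rooms_to_render(current_room, visibility_sets):
    """Return the set of rooms the engine must consider."""
    return {current_room} | visibility_sets.get(current_room, set())

visibility_sets = {
    "lobby": {"corridor", "atrium"},
    "corridor": {"lobby", "office-12"},
    "office-12": {"corridor"},
}

# From the lobby, a building of any size collapses to three rooms.
print(sorted(rooms_to_render("lobby", visibility_sets)))
# → ['atrium', 'corridor', 'lobby']
```

The same lookup serves the level-of-detail and people-routing ideas: distance from the viewer selects which model of each visible object to load, and a server can route real-time updates only to clients whose visibility sets contain the room in question.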

The information needed to construct the city could come from many different Web sites. If you are a frequent visitor, then the information can be cached locally, perhaps even in advance. There are many possibilities for doing this, ranging from background feeds to using a "standard" CD-ROM. Other challenges arise from the need to manage the information flows used to populate the city with people. Here again the notion of "visibility sets" proves helpful. Your computer needs to receive information only about the people you can see. The use of visibility sets allows servers to route real-time updates to the appropriate clients.

Solving these scaling problems will allow a cohesive 3D Web to appear. Individuals and companies will be able to set up Web sites that fit together to create buildings and cities that function collectively as an effective whole instead of isolated sites. Although I have mostly focused on immersive applications of 3D, there are many other opportunities—on Web pages devoted to mathematics, for instance, where the ability to animate polyhedra can make the topic more vivid and easier to grasp. On pages devoted to cars, 3D models offer potential customers a novel way to experience a new design before visiting a real showroom.

Sadly, my work on HTML and HTTP has left me little time for work on VRML. As you can read elsewhere in this book, the development of VRML has been focused on geometric modeling rather than the higher-level approach I envisaged in my paper. The acronym VRML was soon repositioned to mean "Virtual Reality Modeling Language" rather than markup language. I have been intrigued by the way the fates have brought matters back full circle, with work on Extensible 3D (X3D), VRML's successor, focusing on an XML-based markup solution for representing 3D scenes. The new emphasis on extensible markup, together with the success of Java and ECMAScript, suggests that the 3D Web is entering an exciting phase with wonderful opportunities for further work.

I would like to thank Aaron Walsh and Mikaël Bourges-Sévenier for their great efforts in pulling this information together into Core Web3D, and I encourage readers to use this material to bring about the next generation of 3D for the Internet.
