
Capturing China's knowledge:

Ancient terracotta warriors, scientific discoveries, even the 2008 Olympics to go online

HP Labs

China is using the DSpace archiving system to preserve digital copies of millions of objects from its museums and universities.
By Anne Stuart, November 2007

As the world’s most populous nation, the People’s Republic of China rarely does anything on a small scale — and its efforts to share its cultural and academic treasures with the rest of the world are no exception.

Using DSpace, a digital archiving system that HP Labs researchers helped create and continue to support, institutions throughout China are putting literally tens of millions of objects online for the first time. Those objects (or, more accurately, digital copies of them) range from up-to-the-minute scientific research reports to historic film clips, photos of traditional Chinese sporting events, and centuries-old calligraphy and paintings.



One Chinese DSpace initiative involving 18 major universities is well on its way to archiving up to 90 million objects.

A second, scheduled to coincide with the 2008 Summer Olympic Games in Beijing, will hold two terabytes of information – which, depending on how the information is compressed, is roughly as much content as you’d find in 1,000 feature-length movies, 300,000 photographs or 500,000 song-length music files.
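The comparison above implies an average size for each kind of item. A rough check, assuming decimal units (1 TB = 10^12 bytes, matching the "about two trillion bytes" given later) and the whole two terabytes split evenly across items of a single kind, might look like this; the per-item sizes are derived here and are not figures from the project:

```python
# Back-of-the-envelope check of the 2 TB comparison.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and the whole archive split evenly across items of one kind.
TOTAL_BYTES = 2 * 10**12  # "about two trillion bytes"

# Item counts quoted in the article.
EQUIVALENTS = {
    "feature-length movie": 1_000,
    "photograph": 300_000,
    "song-length music file": 500_000,
}


def implied_size_mb(total_bytes: int, count: int) -> float:
    """Average size per item, in decimal megabytes, for an even split."""
    return total_bytes / count / 10**6


if __name__ == "__main__":
    for item, count in EQUIVALENTS.items():
        print(f"{item}: about {implied_size_mb(TOTAL_BYTES, count):,.1f} MB each")
```

That works out to roughly 2 GB per movie, 6.7 MB per photo and 4 MB per song. Because those averages depend entirely on how each file is encoded, the comparison only holds "depending on how the information is compressed."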



Preserving digital knowledge

The Chinese projects are the most ambitious ones yet for DSpace, a digital archiving system for capturing, preserving and indexing academic information so that people anywhere can easily find and retrieve it. The open-source DSpace ("digital space") software, initially developed by researchers at HP and the Massachusetts Institute of Technology Libraries, provides access to a global group of repositories for digital data, text, images, audio and video.


Worldwide, some 265 universities, libraries, museums and research centers have stored more than one million documents on the DSpace platform, making each piece of information available far beyond its original organizational and national boundaries – at no cost to those who access it.

Researchers began thinking about DSpace seven or eight years ago, says Nick Wainwright, research director of the HP Labs Digital Media Systems Group in Bristol, U.K., who oversees HP’s involvement in the effort. "At that time, it was becoming obvious that most of the new knowledge that we were dealing with was in digital form," Wainwright says. (See "Preserving digital data for the ages," Aug. 2003.)


Worldwide DSpace community

In the past, research reports, presentations, theses, dissertations and other scholarly work started out on paper or in analog format; today, most such information starts out digital and stays that way. Although some institutions had already begun grappling with how to internally manage what Wainwright calls "their newborn digital intellectual property," most hadn’t gotten as far as figuring out ways to make such content available to outside researchers.

Enter the HP-MIT Alliance, a groundbreaking research partnership formed in August 2000. Business and academic scientists teamed up for numerous research projects, including a $1.8 million initiative to develop a digital archive for managing scholarly data at MIT.

That effort evolved into today’s DSpace, an open-source archive that now houses – and makes available to others – content from institutions in 43 countries. (Overseeing that community: the DSpace Foundation, a new Cambridge, Mass.-based nonprofit organization jointly launched by HP and MIT in mid-2007.)

Not surprisingly, China is currently among DSpace’s most enthusiastic participants. Attracted by DSpace’s easy-to-use open-source technology, Chinese officials and researchers selected DSpace to support two national initiatives: a digital museum project to archive knowledge from the nation's top 18 universities, and an effort to preserve photos, audio, video and more from the 2008 Summer Olympics in Beijing.


China Digital Museum

In the United States and other Western countries, museums and universities are typically separate institutions; in China, universities oversee many of the most important museums. Now, working with HP researchers and the Chinese Ministry of Education, 18 top Chinese universities are using a DSpace-based system to archive knowledge about biology, anthropology, the geosciences and technology.


The Chinese researchers digitize not only scholarly documents, photographs and audio and video content, but also the actual artifacts in their museums’ collections, often in three-dimensional formats. For example, researchers did "wraparound" scans of some of China’s famous terracotta warrior statues; DSpace users can spin the digital images to view the 2,200-year-old life-sized warriors from various angles. Similarly, they can read scanned books page by page, or zoom in for a closer view of a mineral specimen, a technical diagram or a hand-painted porcelain bowl.

The effort, launched in 2001, has been so successful that Chinese officials plan to build 30 similar virtual museums over the next three to five years.


Virtual Olympics Museum

This project, supported by HP’s University Relations team and being developed by technologists at Beihang University in Beijing, will create a digital archive for the 2008 Summer Olympic Games in Beijing. Plans call for the DSpace-based museum to offer more than two terabytes (about two trillion bytes) of content, covering far more than the 17-day extravaganza of August 2008.

Using a standard Internet connection, visitors worldwide will be able to access digital photos, audio and video clips and virtual environments providing information about modern and ancient Olympic competitions and traditional Chinese sports.

Initially, the archive will provide the information in four languages – Chinese, English, French and Russian; plans call for adding more languages down the road. The content will be available free to anyone who wants it – and, researchers hope, for all time.



Ongoing research

Meanwhile, researchers face plenty of challenges as they continue moving the DSpace software toward its next release (DSpace 2.0, due out sometime in 2008).

"We’re working toward making it into a more powerful and responsive system with more content – and more types of content – that can be more easily researched and retrieved on the Internet," Wainwright says.

Key research challenges include:
  • Increasing storage capacity: DSpace users are likely to keep archiving space-gobbling content such as 3-D models and sophisticated audio and video files, so the DSpace platform must be able to scale up quickly and automatically to meet that demand.
  • Capturing ephemeral content: With blogs, wikis, videos, e-mail and other tools, Web content is changing more frequently than ever before. The team is looking at ways to capture that content so there’s a definitive record of it at every stage, and the version you want is always available.
  • Protecting content integrity: When you squirrel something away in the archives, you want to make sure that what you get out later is an authentic version. To that end, HP’s DSpace researchers are adapting technology developed by HP’s Trusted Systems Lab to create, as Wainwright puts it, "a chain of trust for showing the integrity of any piece of content."
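The article does not describe how HP's "chain of trust" actually works. As a purely illustrative sketch of the general idea, one common way to make tampering detectable across successive versions of a document is a hash chain: each stored version gets a record holding the SHA-256 digest of its content plus a link digest that also covers the previous record, so altering any earlier version breaks every later link. The `Record`, `append` and `verify` names are hypothetical, and this is not the Trusted Systems Lab design:

```python
import hashlib
from dataclasses import dataclass
from typing import List

# Illustrative hash chain only; NOT the HP Trusted Systems Lab design.
# Record, append and verify are hypothetical names chosen for this sketch.


@dataclass(frozen=True)
class Record:
    content_hash: str  # SHA-256 hex digest of one stored version
    prev_link: str     # link hash of the previous record ("" for the first)
    link_hash: str     # SHA-256 over prev_link + content_hash


def _sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _link(prev_link: str, content_hash: str) -> str:
    return _sha256_hex((prev_link + content_hash).encode("ascii"))


def append(chain: List[Record], content: bytes) -> List[Record]:
    """Return a new chain with a record for `content` added at the end."""
    prev_link = chain[-1].link_hash if chain else ""
    content_hash = _sha256_hex(content)
    return chain + [Record(content_hash, prev_link, _link(prev_link, content_hash))]


def verify(chain: List[Record], contents: List[bytes]) -> bool:
    """True only if every version matches its record and every link is intact."""
    if len(chain) != len(contents):
        return False
    prev_link = ""
    for record, content in zip(chain, contents):
        content_hash = _sha256_hex(content)
        if record.content_hash != content_hash or record.prev_link != prev_link:
            return False
        if record.link_hash != _link(prev_link, content_hash):
            return False
        prev_link = record.link_hash
    return True


if __name__ == "__main__":
    versions = [b"wiki page, version 1", b"wiki page, version 2"]
    chain: List[Record] = []
    for version in versions:
        chain = append(chain, version)
    print(verify(chain, versions))                                # True
    print(verify(chain, [versions[0], b"silently edited copy"]))  # False
```

A plain digest per file would catch a corrupted bitstream, but chaining the digests also records the order of versions, which loosely mirrors the goal of keeping a definitive record of ephemeral content at every stage.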

That last area of research is also critical for helping DSpace achieve one of its founding goals: preserving content so that it can be easily and completely accessed down the road, even if its original format is no longer in use.

"We want to make sure that the information that went in 2007 can be read in 2017 – and beyond – in other formats," Wainwright says. "And we want to make sure it doesn’t lose anything in the transformation."

© 2009 Hewlett-Packard Development Company, L.P.