FRAMINGHAM 1 NOVEMBER 2010 - GeoCities, once the Internet's third most-visited domain, is back from the dead -- unofficially.
When Yahoo announced early last year that the Web hosting service would close on Oct. 26, 2009, digital historian Jason Scott knew he had to act. "For hundreds of thousands of people, this was their first Web site," he wrote in a blog post. "This was where you went to get the chance to publish your ideas to the largest audience you might ever have dreamed of having."
Founded in 1994 as Beverly Hills Internet, Yahoo GeoCities was one of the first services to offer an easy way for early Internet users to publish their own Web pages. Whereas most hosting options of the 1990s were expensive, GeoCities' free hosting space became the home for thousands of sites built around "neighborhoods," including those focused on conservation, fashion, military, sports, finance and travel.
Yahoo bought the service in 1999, but the availability of affordable personal hosting -- including Yahoo's own Web hosting -- led the search firm to announce in April 2009 that GeoCities would be shuttered and that any data its owners did not personally archive would be irrecoverable.
Scott, who has stored thousands of text files from the pre-Internet era on his Web site, TextFiles.com, mobilized a 25-person archive team to download as many of the GeoCities sites as they could. Since Yahoo refused to release a list of directories or users whose sites were hosted on GeoCities, the team used automated scripts to probe for data, downloading whatever they found over a period of six months. As the shutdown date approached, the team working with Scott learned of other groups doing the same thing, so they started sharing usernames, creating more comprehensive databases.
About 100,000 Web site accounts were captured and saved. "It was like running through a burning building and deciding what to save," Scott said in an interview.
Now, a year later, Scott has released a torrent file on Pirate Bay containing everything his Archive Team saved. The 642GB file, which took two weeks to compress, unpacks to 909GB of content and offers data Scott said will appeal primarily to academics, historians and collectors. It also shows the ease with which the data could be preserved and made available.