We provide a last-ditch disaster recovery service for websites, using archives from Google, Wayback and other online services. Sites crash on a daily basis, some seriously, and an alarming number of them are not backed up. However, our experienced team can ensure that these days nothing is lost forever.
Using our proprietary software, they can recreate your site word for word, even if the server and all the backup tapes were destroyed in a fire. While it may not be possible to retrieve more recent material, recovery levels of 99% are not ususual, which would save you 99% of the man-hours and cost of rebuilding in case of disaster.
Phoenix.com, the internet safety-net.
Cost-effective solutions for a dangerous world.
.......................................................
Of course I can barely code basic HTML, but I'm sure others could use this idea... Jutta? Egbert?
Warrick
http://warrick.cs.odu.edu/ Tool to do what you want. YMMV. [gtoal, Oct 07 2007]
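For illustration only, a minimal sketch of the sort of archive-based recovery the idea describes: pull one capture of every page of a lost site from the Wayback Machine. The CDX endpoint and its parameters are assumptions based on that API's public documentation, example.com stands in for the lost site, and a real rebuild would still need the merging and parsing discussed in the annotations below.

```
# Sketch: recover one archived copy of each page of a lost site from the
# Wayback Machine. The CDX endpoint and parameters are assumptions based on
# its public documentation; "example.com" is a placeholder domain.
import json
import os
import urllib.parse
import urllib.request

SITE = "example.com"          # hypothetical lost site
CDX = "https://web.archive.org/cdx/search/cdx"

query = urllib.parse.urlencode({
    "url": SITE + "/*",       # every captured URL under the site
    "output": "json",
    "filter": "statuscode:200",
    "collapse": "urlkey",     # keep one capture per distinct URL
    "fl": "original,timestamp",
})
with urllib.request.urlopen(f"{CDX}?{query}") as resp:
    rows = json.load(resp)[1:]    # first row is the field-name header

for original, timestamp in rows:
    # "id_" requests the raw capture, without the Wayback toolbar injected.
    snapshot = f"https://web.archive.org/web/{timestamp}id_/{original}"
    local = urllib.parse.urlparse(original).path.lstrip("/") or "index.html"
    if local.endswith("/"):
        local += "index.html"
    os.makedirs(os.path.dirname(local) or ".", exist_ok=True)
    with urllib.request.urlopen(snapshot) as page, open(local, "wb") as out:
        out.write(page.read())
```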
Phoenix has obviously been around a while. You got that link up while the hb search page on my machine spent about five minutes looking for all instances of "Phoenix"; there are a few there. Will drop him a line and tell him to headhunt Jutta.
There is an existing phoenix.com that has nothing to do with this idea. Would you mind naming your idea something else before it is misinterpreted as referring to the real company?
So, your "idea" is to do the same thing that I did, only ... for money? |
|
|
Yes, I thought about it. Figure it's too obvious to be worth posting. Still figure that.
Funny that he never counted himself among those who regularly destroyed their accounts.
I've been thinking about a peer backup system. One that uses techniques similar to the movie download sites -- they use it to prevent identification; I would use it for privacy and data protection.
Jutta's efforts are indeed worth a mint. [Jutta], if you accept donations or have a favored charity, please let me know. (Until then I'll send mine willy-nilly.)
I seriously doubt that proprietary software could replace her. I am no programmer, but it must require one to (re)build a custom database from overlapping, incomplete collections of unstructured data. Even with a service doing the grunt work of feeding in 99% of the plain HTML or custom-parsed data, a 99% work reduction seems unrealistically high.
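The mechanical part of that merge is simple enough; a toy sketch (the data layout is invented purely for illustration) that folds several partial collections into one best copy per URL. The hard part the annotation is getting at - reassembling those fragments into the site's real structure - is exactly what this doesn't do.

```
# Toy sketch: fold several overlapping, incomplete collections of captures
# into a single best-copy-per-URL set, keeping the newest capture of each.
# The url -> (timestamp, html) layout is invented purely for illustration.
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[str, str]]    # url -> (ISO timestamp, html)

def merge(collections: List[Snapshot]) -> Snapshot:
    best: Snapshot = {}
    for collection in collections:
        for url, (stamp, html) in collection.items():
            # Keep whichever capture of this URL is newest across all sources.
            if url not in best or stamp > best[url][0]:
                best[url] = (stamp, html)
    return best

google_cache = {"/": ("2007-09-30", "<html>front page</html>")}
wayback = {"/": ("2005-01-12", "<html>old front page</html>"),
           "/about": ("2006-06-01", "<html>about</html>")}

merged = merge([google_cache, wayback])
print(sorted(merged))    # ['/', '/about'] - the union of both partial sources
```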
The burden of work must be in the sophisticated stuff that I don't really understand.
I'm sure 99% is unrealistically high; it was more ad-speak than anything. Many sites will be largely uncached anyway for security reasons, especially commercial ones, I should imagine.
Well, imagine my surprise...
We actually use what you might call a peering system where I work. Backups migrate to a central system in-house, then offsite - automatically - overnight. It's not very sophisticated, but it works very well. There's no reason why multiple archives couldn't be created, bandwidth being the central issue. For true peering, two parties could agree to allow each other the use of a predetermined amount of storage.
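Something like the following is all that overnight migration needs to be - a sketch only, with every path, host and bandwidth figure made up, meant to be kicked off nightly by cron or any other scheduler:

```
# Sketch of the overnight migration described above: local backups are copied
# to a central in-house store, then mirrored offsite. Meant to be run nightly
# by a scheduler (cron or similar). All paths, hosts and limits are placeholders.
import shutil
import subprocess
from datetime import date
from pathlib import Path

LOCAL_BACKUPS = Path("/var/backups/local")            # produced on each machine
CENTRAL_STORE = Path("/srv/backups/central")          # in-house aggregation point
OFFSITE_DEST = "backup@offsite.example.net:/srv/backups/"   # peer or provider

def migrate() -> None:
    nightly = CENTRAL_STORE / date.today().isoformat()
    nightly.mkdir(parents=True, exist_ok=True)

    # Stage 1: gather the local backups onto the central in-house system.
    for item in LOCAL_BACKUPS.iterdir():
        if item.is_dir():
            shutil.copytree(item, nightly / item.name, dirs_exist_ok=True)
        else:
            shutil.copy2(item, nightly / item.name)

    # Stage 2: mirror the central store offsite. rsync over SSH here, but any
    # transport works; true peering would just point two of these at each
    # other, each capped at the storage the other party agreed to provide.
    subprocess.run(
        ["rsync", "-a", "--bwlimit=5000",
         str(CENTRAL_STORE) + "/", OFFSITE_DEST],
        check=True,
    )

if __name__ == "__main__":
    migrate()
```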
If anyone's interested, I'll see if I can put together a polished version. In all fairness, I've been inhumanly busy lately, so it might take a while to get around to it. Also, there's no facility for security, so you're on your own there.
//recovery levels of 99% are not ususual//
Umm, shouldn't that say unusual? This idea could actually work, but it would be expensive and would need a LOT of storage. After all, you probably can't store the entire Internet (or at least most of the really important, English-language sites) on a few USB drives.
Of course it would work - the majority of this site was recreated in exactly this way about three years ago. The technique was thought up by smarter users than me - I just shamelessly cashed in on the idea by suggesting it be turned into a commercial service, which is why I got those well-deserved bones.
Please don't bump this again - I might just get more...
I actually did this last year. Got most of my web site back from various repositories, although I'd shot myself in the foot a little by having had a robots.txt file that banned the Microsoft spider.
A couple of problems: spidering Google to get your own pages back - they have both a rate limit on what you can fetch, and an absolute limit on the number of results they'll present.
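For what it's worth, the fetching side of staying under that rate limit looks roughly like this. The cache URL form below is the one Google exposed at the time (it has since been retired) and the delay values are made up; the point is just the pacing and back-off.

```
# Sketch of polite fetching under a rate limit: fixed pacing between requests
# plus exponential back-off when the source starts refusing them. The cache
# URL form is the one Google exposed at the time (since retired) and the
# delays are arbitrary - illustration only.
import time
import urllib.error
import urllib.request

DELAY = 5.0      # seconds between requests, to stay under the rate limit

def fetch_cached(page_url: str, retries: int = 3):
    cache_url = "http://webcache.googleusercontent.com/search?q=cache:" + page_url
    wait = DELAY
    for _ in range(retries):
        try:
            with urllib.request.urlopen(cache_url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code in (403, 429, 503):   # being throttled: back off, retry
                time.sleep(wait)
                wait *= 2
            else:
                return None
    return None

for url in ["example.com/", "example.com/about.html"]:   # placeholder pages
    html = fetch_cached(url)
    time.sleep(DELAY)             # pace even the successful fetches
```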
Also the Alexa/Wayback Machine archive - there's a big gap between the last three months that Alexa will give you, and the historical pages that archive.org will return. I didn't manage to get a lot of my stuff back for six months while I waited for it to work its way through the invisible pipeline from Alexa to Archive.org.
There is a tool called Warrick out there that attempts to do multi-service recovery for you, but it didn't do much for me. I ended up writing my own.
Congratulations on piecing your stuff back together! The way I dealt with Google's 1,000-result limit back then was by throwing in (different) additional search terms that I knew would only appear in smaller subsets of the site. (Specific usernames, for example.)
I never hit the rate limit.
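That partitioning trick is easy to mechanise - a sketch, with the site and partition terms entirely made up; in practice they would be usernames, years or anything else known to appear on only a slice of the lost site:

```
# Sketch of the partitioning trick described above: repeat the same site:
# query with extra terms that each match only a small slice of the site, so
# no single query has to return more than the search engine's result cap.
# The site and partition terms are placeholders.
SITE = "example.com"
PARTITION_TERMS = ["alice", "bob", "2005", "2006", "widgets"]

def partitioned_queries(site, terms):
    queries = [f"site:{site}"]                        # the broad query first
    queries += [f'site:{site} "{term}"' for term in terms]
    return queries

# Results from overlapping partitions would be de-duplicated by URL before
# fetching; here the queries are just printed for whatever client runs them.
for query in partitioned_queries(SITE, PARTITION_TERMS):
    print(query)
```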
I once lost my mind, but I think I got it back. De-Boned by one.
// The way I dealt with Google's 1,000-result limit back then was by throwing in (different) additional search terms that I knew would only appear in smaller subsets of the site. (Specific usernames, for example.) // - that's sort of how I did it -- by using the 'allinurl:' tag and walking the hierarchy. Slowly.
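And the hierarchy walk, sketched with a made-up directory list; the pause between queries is the whole trick:

```
# Sketch of the hierarchy walk described above: one allinurl: query per
# directory of the lost site, issued slowly to stay under the rate limit.
# The site and directory list are placeholders for whatever structure the
# lost site is known (or discovered along the way) to have had.
import time

SITE = "example.com"
DIRECTORIES = ["", "blog/", "blog/2005/", "projects/", "images/"]

for directory in DIRECTORIES:
    query = f"allinurl:{SITE}/{directory}"
    print(query)        # hand this to whatever search client is in use
    time.sleep(10)      # "Slowly." - spread the queries out over time
```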
Note that the Warrick tool is no longer a downloadable utility that you run on your own system but is now a web-based service. So I guess the original proposal is now baked.