Text search in searched text (+2)
Search in websites that I have searched

I remember something I saw but can't find it. I want to search the full text of the pages listed in my browsing history.

(Or in any list of websites for that...)
-- pashute, Dec 08 2020

grep https://en.wikipedia.org/wiki/Grep
"a command-line utility for searching plain-text data sets" [8th of 7, Dec 08 2020]

I think you can run a search against your Google account's search history - but for that, you need to have fairly reliably run all your searches while signed in under your Google username, which might put some people off. It seems to be an option here in Chrome; I'm not sure whether it extends to other browsers.
-- zen_tom, Dec 08 2020


How do I do that? I don't mean searching for a phrase in the history list, I mean to search for the phrase in the websites themselves, those that my history list refers to.
-- pashute, Dec 08 2020
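
A minimal sketch of what is being asked for here, assuming the history has already been exported as a plain list of URLs (how to get that list out of a given browser is not covered). The names `html_to_text`, `fetch_page` and `search_pages` are made up for illustration; each page is re-downloaded, reduced to its visible text, and checked for the phrase. Pages that have changed or vanished since the visit will of course be missed.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined with a space, so a word split by an inline tag
    # (e.g. "<b>g</b>rep") will not match; acceptable for a sketch.
    return " ".join(" ".join(parser.chunks).split())


def fetch_page(url, timeout=10):
    """Download one page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def search_pages(urls, phrase, fetch=fetch_page):
    """Return the URLs whose page text contains the phrase (case-insensitive)."""
    needle = phrase.casefold()
    hits = []
    for url in urls:
        try:
            text = html_to_text(fetch(url))
        except (OSError, ValueError):
            # Unreachable or malformed URL: skip it and carry on.
            continue
        if needle in text.casefold():
            hits.append(url)
    return hits
```

Something like `search_pages(history_urls, "half-remembered phrase")` would then return the matching URLs, where `history_urls` is whatever list you managed to export.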


My bad - just tried it, and the secondary search only works on the webpage _titles_ that appear in your history, not the original full text. Sorry, as you were.
-- zen_tom, Dec 08 2020


grep.

<link>

If your device doesn't support grep, then you need to get one with a proper operating system.
-- 8th of 7, Dec 08 2020


I'm in the habit of saving any interesting pages I visit; it conserves bandwidth, defends against bitrot, and incidentally solves [pashute]'s problem.

As Socrates said, Ya cain't grep what ya hain't kep'
-- spidermother, Dec 09 2020
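
For devices without grep, a rough Python stand-in for searching a folder of saved pages, assuming they were saved as `.html`, `.htm` or `.txt` files under one directory. The function name `grep_saved_pages` and the suffix list are illustrative assumptions, and the match is against the raw file contents (markup included), much as a plain case-insensitive grep would do.

```python
from pathlib import Path


def grep_saved_pages(root, phrase, suffixes=(".html", ".htm", ".txt")):
    """Return (path, line_number, line) for each line containing the phrase."""
    needle = phrase.casefold()
    matches = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the expected suffixes.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                matches.append((path, number, line.strip()))
    return matches
```

Calling `grep_saved_pages("saved_pages", "some phrase")` walks the (hypothetical) `saved_pages` folder and its subfolders, and reports the file and line of every hit.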


I too am a webpage hoarder. Do you think we should form a support group?
-- pertinax, Dec 09 2020


I've been archiving news on a daily basis for the last 3 years - being able to enter a search term and see how it appears over time can sometimes be illuminating - but the process relies heavily on an ever-diminishing set of free-to-access RSS feeds. Interestingly, the idea of data as an asset has caused a number of information sources to charge for access to their data. Reuters turned off their free-to-access news feed, probably to encourage interested parties to pay for access to their curated repository as a commercial asset. A "seen-it-stored-it" facility would save a lot of fiddling about.
-- zen_tom, Dec 09 2020
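
The "search term over time" idea can be sketched as below, assuming the archive can be read back as `(publication_date, text)` pairs; that layout and the name `mentions_per_month` are assumptions for illustration, not a description of the archive mentioned above. It counts, per month, how many stored items mention the term.

```python
from collections import Counter
from datetime import date


def mentions_per_month(articles, term):
    """Count the articles mentioning the term, grouped by 'YYYY-MM'."""
    needle = term.casefold()
    counts = Counter()
    for published, text in articles:
        if needle in text.casefold():
            counts[published.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a timeline.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up sample data, purely to show the shape of the input.
    sample = [
        (date(2020, 11, 3), "Feed shut down without notice"),
        (date(2020, 12, 8), "Another free feed goes behind a paywall"),
        (date(2020, 12, 9), "Feed readers decline"),
    ]
    print(mentions_per_month(sample, "feed"))
```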


I was about to point you to the Colindale newspaper archive, which I have used happily in the past, [zen_tom], but it seems the news from there is not good.
-- pertinax, Dec 09 2020


// I too am a webpage hoarder. //

More deserving of pity than condemnation ...

// Do you think we should form a support group? //

What, like trestles, or a brick plinth or something ?

There's a TV program about going round to people's houses and "de-cluttering" them, with the aid of a skip.

Fortunately, a relatively short burst of 7.62mm automatic fire (1 round in 5 tracer) aimed just above head height is enough to send them scurrying away.

Maybe you need someone like them, but for your offline archive ? The trick with the automatic fire will probably work just as well if you don't...
-- 8th of 7, Dec 09 2020


I can see it coming: the next YouTube advertising trend, with "Declutter the Computer" experts.
-- pashute, Dec 10 2020


[pertinax] it's a pain; the only way to (non-commercially) get good content seems to be to manually scrape it from ever-changing websites - the hard part is automatically monitoring for when the format of those sites changes, so you can recode the scrapers. It's disheartening to clean up a dataset and find gaps in the middle where something stopped working due to a redesign.

[pashute] have you come across `youtube-dl`? It's a Unix utility for downloading content from YouTube - I've yet to find a tangible use for it, but it is handy, and does mean you can pull content down and watch it offline, at leisure, without the imposition of advertising. It can only be a matter of time before it gets disabled/blocked.
-- zen_tom, Dec 10 2020


