Here's how to save a large topic from the forum

Started by waltk, September 01, 2011, 09:40:21 PM


waltk

Hi All,

Some of the forum threads here are the most valuable resource you could ever find when building a new circuit.  Even if it's not YOUR thread, chances are good that someone else has encountered your problem, and has shared the answer for everyone.

Some of the most interesting threads are also excruciatingly long, and tedious to review in their entirety.  So how do you find the diamond in the rough?

One way is to use the forum search feature (only available for registered users).  If you are lurking here, and haven't registered, you are missing out on a valuable tool.

But suppose you are interested in "Tube boost + overdrive running off a 9 volt battery".  That thread has 125 pages as of this post.  Suppose you are dying to know in great detail everything about a "stutter box" - that's a long thread too.  Even after you find the thread of interest, you may be in for some serious browsing time to find what you are looking for.  Most of us computer-savvy folks know that you can search for things within a page (usually from the Edit menu of your browser).  This helps, but it only searches the one page you have open, so it doesn't work across a thread with many pages.

Here's the best part: right above the first post on any page, on the same line that has links to individual pages, all the way over at the right of your browser window - there's a button that reads: "Print".  If you click this button, you'll get a printable version of the entire topic.  I don't recommend printing a large topic (save the trees), but you can also see, browse, and search the entire topic at once.

What else can you do with the "Print" view? If you have Adobe Acrobat, you can create an offline viewable version of the topic.  If you save it to an HTML file on your PC, you can browse it any time - even if not connected.

What? You noticed a problem with this method?  Yes, none of the graphics - schematics, layouts, build photos, etc. - show up in the print view.  Those graphics are some of the best parts.  In their place is a reference to the location of the graphic.  Here's what I do to capture an entire thread with graphics for later reference and review...

1. Use the Print button to display an HTML page with the entire thread.
2. Save it on your computer as an HTML file.
3. Open the HTML file with a text editor that supports "regular expression" search and replace. (I use TextPad, but that's just my preference.)
4. (Here's the tricky part)  Construct a regular expression that replaces the text references to images with HTML <img> tags.
5. Save and open the resulting HTML page with your browser...

and you have the entire topic, including all graphics on one page.  You can then save it - as html, or a PDF (with Adobe Acrobat), or as MHT with IE.
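For anyone who would rather script steps 3 to 5 than do them in an editor, here is a minimal Python sketch of the same idea. It assumes the print view writes each image as a URL in parentheses, like "(http://host/path/pic.png)", and that only gif, jpg and png matter. The function name `restore_images` and the example URL are made up for illustration; treat this as a starting point, not a finished tool.

```python
import re

# Assumption: an image reference is an http:// URL in parentheses that ends
# in .gif, .jpg or .png. Group 1 captures the URL without the parentheses.
IMAGE_REF = re.compile(r"\((http://[^)\s]+\.(?:gif|jpg|png))\)", re.IGNORECASE)


def restore_images(html: str) -> str:
    """Replace parenthesized image URLs with <img> tags."""
    return IMAGE_REF.sub(r'<img src="\1" />', html)


if __name__ == "__main__":
    # Hypothetical line as it might appear in a saved print view.
    sample = "Here is the schematic (http://example.com/pics/fuzz.png) as built."
    print(restore_images(sample))
```

Parenthesized text that is not an image URL, such as "(see page 3)", is left alone because the pattern insists on the http:// prefix and an image extension.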

OK, I realize that regular expressions are not within the repertoire of non-computer-geeks.  If this topic grows to over one page, I will write and place in the public domain a single-purpose .NET executable program that will convert topic-print-html to an html version that will display all the graphics in that topic.

Anyone interested?

defaced

Me likey.

Dumb question, will this executable run on linux/mac?  I use neither, but figured I'd ask so it doesn't come up after the fact. 
-Mike

chi_boy

That's actually pretty cool considering it's been right in front of me for so long and I never clicked that button.

Thanks!
"Great minds discuss ideas, average minds discuss events, small minds discuss people." — Admiral Hyman G. Rickover - 1900-1986

The Leftover PCB Page

waltk

Quote: Dumb question, will this executable run on linux/mac?  I use neither, but figured I'd ask so it doesn't come up after the fact.

It will only run on Windows because Windows is the only platform that supports .Net (at least I don't think WINE supports it).

Linux geeks will undoubtedly already be familiar with regular expressions, and could easily do it with some ungodly complex single command-line statement (using GREP).
MAC users will not likely ever see this because they are more interested in touchy-feely artistic things than building their own stuff.

(Don't mean to start a firestorm of flaming here - the previous comments are truly meant to be funny, and not to slam people who don't know a real OS when they see one.)

head_spaz

Quote: MAC users will not likely ever see this because they are more interested in touchy-feely artistic things than building their own stuff.

Who knew? And here I always thought they spent all their time in their Volvos... or on jury duty.

I think it would be great if each post had a button for users to optionally attribute it with some kind of rating/ranking, in the sense that highly-rated postings would likely have the most pertinent content. Like all of R.G.'s posts.
The search function seems a bit handicapped to me. I especially hate it when it times out on me.
Deception does not exist in real life, it is only a figment of perception.

harmonic

Great bit of advice, Walt. I will use this so often! Thank you. :-)

jbgron

You can run basic .NET apps on Linux and Mac using Mono.

http://www.mono-project.com

I'd prefer the single command line option though  ;D

waltk

Quote: The search function seems a bit handicapped to me. I especially hate it when it times out on me.

Yes, it's frustrating when there are slowdowns.  I've noticed improvements in the forum software as time goes on.  The timeouts could be the fault of the forum software, or the underlying database that posts are stored in, or even just the internet connection speed at any node between you and the server.

If you look at the fine print at the bottom of every page, you'll see that the DIYSTOMPBOXES Forum is running on SMF (Simple Machines Forum) version 1.1.11.  SMF is an open-source PHP-based forum software.  If one is running a linux/apache web server, this is not a bad choice (obviously, as we have all been using this for a while now).

The 1.1.11 version was released in June 2009, and the current version in this branch is 1.1.14 released in June of 2011.  At the same time version 2.0 has been released, and it will be the "stable production" version going forward.

It's up to Aron as to when new versions are incorporated into the site.  I can tell you that migrating to a new version of any server-based software is a pretty big deal.  I haven't updated my own web site since the turn of the century (really, it's been running since then 24/7 without any changes).  I'm sure Aron will consider upgrading as time and resources allow.

-Walt

blooze_man

Quote from: jbgron on September 01, 2011, 11:29:33 PM
You can run basic .NET apps on Linux and Mac using Mono.

http://www.mono-project.com

I'd prefer the single command line option though  ;D

That's cool. Now I can do this on my "fake" OS.
Big Muff, Trotsky Drive, Little Angel, Valvecaster, Whisker Biscuit, Smash Drive, Green Ringer, Fuzz Face, Rangemaster, LPB1, Bazz Fuss/Buzz Box, Radioshack Fuzz, Blue Box, Fuzzrite, Tonepad Wah, EH Pulsar, NPN Tonebender, Torn's Peaker...

jbgron

Oh dear.  Let's not have an OS war in an electronics forum.

waltk

Quote: You can run basic .NET apps on Linux and Mac using Mono.

Hmmm... not being a LinuxHead, I'm not sure I understand what would be required to run this under linux.  According to documentation in the link you provided, mono is compatible with .Net binaries.  I'm thinking this probably means assemblies, and not executables.  If a .Net executable will run under mono, that's fine with me.  If it takes significant extra effort, I don't expect to do it.

I like command-line utilities also, and would probably create the app to handle a simple command-line interface as well (so if you pass it a source file name and target file name, it will just do the conversion with no UI).
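To make that command-line idea concrete, here is a rough Python sketch of a "source file in, target file out" converter. It is not the .NET program described in this thread. The function names, the image-reference pattern and the choice to work on raw bytes (so the page's character encoding is never touched) are all assumptions for illustration.

```python
import argparse
import re
import sys
from pathlib import Path

# Assumption: images appear as "(http://host/path.ext)" in the print view.
# A bytes pattern is used so the file's encoding passes through unchanged.
IMAGE_REF = re.compile(rb"\((http://[^)\s]+\.(?:gif|jpg|png))\)", re.IGNORECASE)


def convert_file(source: Path, target: Path) -> int:
    """Rewrite image references from source into target; return how many."""
    fixed, count = IMAGE_REF.subn(rb'<img src="\1" />', source.read_bytes())
    target.write_bytes(fixed)
    return count


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Turn text image references in a saved print view into <img> tags."
    )
    parser.add_argument("source", type=Path, help="HTML file saved from the Print view")
    parser.add_argument("target", type=Path, help="where to write the converted HTML")
    args = parser.parse_args(argv)
    count = convert_file(args.source, args.target)
    print(f"Converted {count} image reference(s)")
    return 0


# Only behave as a command-line tool when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    raise SystemExit(main())
```

Saved under a hypothetical name like fix_print.py, it would be run as `python fix_print.py print.html print_fixed.html`, with no UI involved.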

Remember though... the interesting part is the technique!  Taking the HTML source from the "Print" button, and just converting the text image references to a valid "img" tag - is the thrust of this post.  Some viewers would be able to figure out how to do this without a standalone tool.


jbgron

I haven't looked at Mono for a long time but last I heard you can run any .NET binary but their implementation of Windows.Forms is still a little buggy.  I'd ideally like to see this as a hosted solution that everyone can use, like http://url2pdf.com for example.

waltk

Quote: That's cool. Now I can do this on my "fake" OS.

Yes, and some day you can aspire to getting a REAL OS...

KIDDING!  Really!  My comments are meant as gentle teasing.  It happens that I'm a professional developer working almost entirely on the MS platform.  I've had some exposure to other platforms, and had some major frustrations with the MS OS's.  But as any developer will tell you, having acquired the programming expertise in one environment, switching to another is not trivial.  I don't believe there is any intrinsic "goodness" in one or the other - as a practical matter, however, I can earn a living and feed my family by designing and developing software in the environment I know best.

-Walt

jbgron

I'm also a professional developer working entirely on the UNIX platform.  I used to be an evangelist but I just can't be bothered anymore.  Somebody great once said, "Nobody has ever changed their mind as a result of losing an argument".  I couldn't agree more.

waltk

Quote: I'm also a professional developer working entirely on the UNIX platform.  I used to be an evangelist but I just can't be bothered anymore.  Somebody great once said, "Nobody has ever changed their mind as a result of losing an argument".

Well said, brother.

Quote: I'd ideally like to see this as a hosted solution that everyone can use

Yeah.  Hosting would be good.  I'm just not willing to jump into being the provider of a hosted solution that is so limited in usefulness.

Question for you as a unix guy, if you wanted to take an HTML source file, and just convert references to hosted images (embedded as text in parentheses instead of <img> tags), you would just use grep, right?

-Walt

jbgron

Quick and dirty, but this will do it:

sed -e 's|http://\([a-zA-Z0-9.\,:\/?\&=~%+_#-]*\)|<img src="http://\1" />|g;' print.html > print_fixed.html

It has a hangover of mangling non-image URLs too, but it's all I have in me this late on a Friday afternoon (in Australia).
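To see why a catch-all URL pattern mangles ordinary links, and how anchoring on an image extension avoids it, here is a small Python comparison. Both patterns and the helper name `wrap` are illustrative assumptions in the spirit of the one-liner above, not a translation of it.

```python
import re

# Broad pattern: any http:// URL built from the usual URL characters.
ANY_URL = re.compile(r"http://[A-Za-z0-9.,:/?&=~%+_#-]*")
# Narrower pattern: the same URL characters, but it must end in an
# image extension (assumed here to be gif, jpg or png).
IMAGE_URL = re.compile(r"http://[A-Za-z0-9.,:/?&=~%+_#-]*\.(?:gif|jpg|png)\b")


def wrap(pattern: re.Pattern, text: str) -> str:
    """Wrap every match of pattern in an <img> tag."""
    return pattern.sub(lambda m: f'<img src="{m.group(0)}" />', text)


if __name__ == "__main__":
    sample = "See http://www.mono-project.com and (http://example.com/layout.png)"
    print(wrap(ANY_URL, sample))    # both URLs become <img> tags
    print(wrap(IMAGE_URL, sample))  # only the .png URL is wrapped
```

The narrower pattern still leaves the surrounding parentheses in place. Matching and dropping them as well is what the parenthesized find expressions elsewhere in this thread do.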

waltk

Quote: sed -e 's|http://\([a-zA-Z0-9.\,:\/?\&=~%+_#-]*\)|<img src="http://\1" />|g;' print.html > print_fixed.html

Cool.  I'm not familiar with the sed syntax.

I would probably use this as the find expression in TextPad:

(\(http://[^)]+\(gif\|jpg\|png\)\))

and this as the replacement expression:

<img src="\1" />

The .Net regex syntax is a little different, so the find expression would be:

\((http://[^)]+(gif|jpg|png))\)

and the replacement expression would be:

<img src="$1" />

markeebee

Quote from: jbgron on September 02, 2011, 01:30:07 AM

sed -e 's|http://\([a-zA-Z0-9.\,:\/?\&=~%+_#-]*\)|<img src="http://\1" />|g;' print.html > print_fixed.html


Nice.  I see what you did there.


EDIT
I changed my avatar pic as a tribute.

pinkjimiphoton

if ya google up cute pdf writer (free) you can also have it print the entire thread, with graphics, into a pdf file.
i do it all the time. you need to set the "print preview" to print ALL the pages, not just the one you're looking at.

edit: i just checked it out, it works but only for 11 pages at a time max. but, you can have it print everything to pdf, including all graphics, backgrounds, etc etc.

so in a big thread, you may have to make parts 1, 2,3, etc...but you can indeed capture the whole thing with just a few clicks, and easier than editing hypertext markup if you're not used to coding.

and cute pdf is freeware. ;)
"When the power of love overcomes the love of power the world will know peace."
Slava Ukraini!
"try whacking the bejesus outta it and see if it works again"....
~Jack Darr