A Tiny Start Towards a Paperless Home, Scanning, Print To PDF

This article will show you how to get started eliminating paper in your home or office and going all digital with scanned images stored on your computer. It’s a bit long, but has lots of good information. Note, though, that it is just an introduction to the topic.

Note that it’s the DATA that’s really important on your computer. The more important data you have on there that you don’t want to lose, the more important it is to back up the data. All hard drives fail eventually. It’s just a matter of when. Scanning all your important papers, especially if you discard the paper, will make it all the more painful if your computer crashes. Also, be aware that scanning certain types of confidential papers into your computer makes it more important to maintain your computer security, and makes the consequences more serious if the data or the computer itself is stolen.

How many of you have ever heard of the term “paperless office” or “paperless home”?

How many of you have ever seen one? It turns out it’s a very hard thing to do for a variety of reasons.

http://en.wikipedia.org/wiki/Paperless_office

For some time, I’ve had a motive to at least move in the direction of paperless at home. I don’t have much space to file papers, and I’m terrible at doing so. They usually end up in a pile, or several piles, or they end up in piles of file folders. Sometimes, papers of similar nature end up in different piles.

Now, I do keep important papers like medical records or bills and do the best I can to find a place for them. But, there are many other papers I have which might be conducive to conversion to digital with the right technology. For a long time, I’ve been eyeballing high speed document scanners like the Fujitsu ScanSnap. However, I have been unable to afford the price of around $400.

Well, coincidentally, I just recently had a need to replace a printer. I ended up getting a Brother multifunction laser printer, scanner, and copier with a 35 page automatic document feeder (ADF). I’m not mentioning the model since this isn’t really intended as a printer review. But, I think the combination of things it has may help me get a start into going paperless.

Printer, Scanner, Copier, ADF (Automatic Document Feeder)

The printer and software have 6 key elements which help me out:

A) It’s wireless and attaches to my LAN. That means I can put the printer anywhere I want and I can attach to it from any PC.
B) It can print.
C) It can copy.
D) It can scan.
E) It has document management software, an older version of Nuance PaperPort.
F) It has a 35 page ADF or Automatic Document Feeder.

Combined, these features should help me on my road to going paperless. (PS, it’s a long road, and scanning and categorizing things is time consuming. So I’ll probably only do so for the most important things.)

It took a while to get the printer set up on the network. It also took a while to get the scanning software set the way I wanted on one PC and to document all of that with screenshots in case I want to replicate it later.

I designated a certain directory where my scans go. Under that, I have sub categories of receipts, computer instructions, and uncategorized scans. Note that I’m just getting started with this and many more categories are possible. Under receipts, I have sub folders of electronics, meds / nutrition, automotive, medical, and household. Under computer instructions, I have sub folders like Audacity, PaperPort, and Windows.
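As a rough illustration, the folder layout described above could be created with a short Python script rather than by hand. This is only a sketch: the root folder name `Scans`, the function name `create_scan_tree`, and the spelling `Meds - Nutrition` (a slash is not allowed in a folder name) are my own assumptions, not anything PaperPort requires.

```python
from pathlib import Path
import tempfile

# Category names come from the article; the dict layout and the
# "Meds - Nutrition" spelling are assumptions made for this sketch.
SCAN_TREE = {
    "Receipts": ["Electronics", "Meds - Nutrition", "Automotive", "Medical", "Household"],
    "Computer Instructions": ["Audacity", "PaperPort", "Windows"],
    "Uncategorized Scans": [],
}


def create_scan_tree(root, tree=SCAN_TREE):
    """Create each category folder and its sub folders under root.

    Returns the list of folders that now exist (created or already there).
    """
    folders = []
    for category, subfolders in tree.items():
        category_dir = Path(root) / category
        category_dir.mkdir(parents=True, exist_ok=True)
        folders.append(category_dir)
        for name in subfolders:
            sub_dir = category_dir / name
            sub_dir.mkdir(exist_ok=True)
            folders.append(sub_dir)
    return folders


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_scan_tree(Path(tmp) / "Scans"):
            print(folder)
```

Adding a new category later is just a matter of adding an entry to the dictionary and running the script again, since existing folders are left alone.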

Although I’m not using the default folders that PaperPort set up, they included categories for Articles, Bank Statements, Business Cards, Faxes, Investments, Photographs, Presentations, Real Estate, Samples (including some sample documents and PaperPort usage documents), Taxes, and Web Pages. I will add folders such as these to my folder structure as necessary.

Scanner Software

The scanner software contains 5 important attributes which are quite useful:

OCR – Optical Character Recognition – This is a fancy way of saying that the software can read English characters in common fonts right from the scan and save the text AS TEXT, not just a bunch of dots in an image.
Searchable PDF – This is a very handy type of PDF file. It not only includes an image of the page which was scanned, a giant collection of dots; but also includes the text which came from the OCR subsystem. So, for example, if you had a print out of this blog post, and you scanned it into a searchable PDF with OCR, you would not only see an exact image of the page, but you would also be able to select text from the page and copy it or search it from other programs.
Print to PDF – This allows you to print almost anything that’s printable from your computer, but the printout is directed straight to a PDF file, and, if you wish, a searchable PDF file. The utility of this is not immediately obvious, but it is very handy as I’ll describe below.
Stack and Unstack – This allows you to take several 1 page PDF files and turn them into 1 multi-page file. It also allows you to break a multi-page file into several 1 page files.
Minimal PDF editing, including orientation and annotation

Note, this software is not exclusively focused on PDFs, but that’s the main part of it that I’m using.

How Many Dots Per Inch Can Your Eyes See

One of the first questions you will confront when setting the scanner settings is how many dots per inch to use. I would encourage you to plod through your scanner settings and don’t just leave them at the default level, which may not meet your needs.

More dots per inch = better quality image = bigger file size.
Fewer dots per inch = poorer quality image = smaller file size.

I should note here that if you scan at a poor quality setting and then throw away or lose the paper, you can never recover the dots you missed. So, I like to scan at a high quality setting.

A related question is how many dots can your eyes see? What is good quality and what is poor quality? I found this article to help answer this:

The Resolution of the Human Eye

I didn’t take the time to understand everything in the article, but what I get out of it is this. At 4″ distance from the eye, a HEALTHY (and probably young) human eye can discern 2190 dots per inch (DPI). It also says that the legal norm for 20 / 20 vision at 4″ is 876 DPI.

If you’re further away than 4″, you can discern fewer dots. I didn’t take the time to follow all the math, but, it says magazines are typically printed at 300 DPI and fine art prints are typically printed at 720 DPI.
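To get a feel for how the numbers shrink with distance, here is a small Python sketch. It assumes the discernible DPI falls off in simple inverse proportion to viewing distance (the eye resolves a fixed angle, so dots can be proportionally bigger when they are further away). That simplification and the function name `discernible_dpi` are my own, not taken from the article.

```python
def discernible_dpi(dpi_at_reference, distance_in, reference_in=4.0):
    """Scale a DPI figure quoted at reference_in inches to another distance.

    Assumes inverse proportionality to distance, which is a
    simplification made for this sketch.
    """
    if distance_in <= 0:
        raise ValueError("distance must be positive")
    return dpi_at_reference * reference_in / distance_in


if __name__ == "__main__":
    # 2190 and 876 DPI at 4 inches are the figures quoted above.
    for distance in (4, 8, 12, 24):
        healthy = discernible_dpi(2190, distance)
        normal = discernible_dpi(876, distance)
        print(f'{distance:>2}": healthy eye ~{healthy:.0f} DPI, 20/20 norm ~{normal:.0f} DPI')
```

Under that assumption, the 20/20 figure works out to about 292 DPI at 12″, which is close to the 300 DPI quoted for magazines.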

I’ve chosen to set my scanner to 600 DPI unless I’m trying to go for a reduction in file size. This will provide a very good quality image if I’m looking at it on a screen or if I ever have to print it again. Note also that pages with small text may not OCR properly at resolutions less than about 400 DPI according to my scanner software documentation.

Beware the File Size

Even though disk drives are cheap and most people have extra disk space on their hard drive, the file size of scans can be your nemesis, particularly if you save lots of pages. Here are some example sizes for a single page letter size scan. Note that output sizes will vary substantially depending on content. They will also vary substantially depending on the specific scanner and software and the specific output file type and whether the file type uses compression.

These samples were taken from a scan of page 1 of this web page which has a fair amount of both text and graphics:

Wikipedia Page for Amoeba

Black and White, PDF, 600 DPI, 300 KB per page
Gray Scale, PDF, 600 DPI, 5 MB per page
Low Color, PDF, 256 color, 300 DPI, 4 MB per page
Full Color, PDF, 24 bit color, 600 DPI, 15 MB per page
Full Color, JPG, 24 bit color, 600 DPI, 10 MB per page
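For comparison, here is a Python sketch that works out how big a letter size page would be with no compression at all, just from the page size, the DPI, and the number of bits stored per dot. The function name `raw_scan_bytes` is made up for this example, and the 1, 8, and 24 bits per dot for black and white, gray scale / 256 color, and full color are the usual values, assumed here rather than read from my scanner software.

```python
def raw_scan_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one scanned page.

    Defaults to a US letter page (8.5 x 11 inches).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # (label, DPI, bits per dot) for the scan types listed above.
    modes = [
        ("Black and White, 600 DPI", 600, 1),
        ("Gray Scale, 600 DPI", 600, 8),
        ("Low Color (256 color), 300 DPI", 300, 8),
        ("Full Color (24 bit), 600 DPI", 600, 24),
    ]
    for label, dpi, bits in modes:
        megabytes = raw_scan_bytes(dpi, bits) / 1_000_000
        print(f"{label}: about {megabytes:.1f} MB uncompressed")
```

The measured files above are all considerably smaller than these raw figures, which shows how much work the compression in the output formats is doing.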

Scanner Settings

For most of my scanning, I use black and white at 600 DPI to minimize disk space usage. If I have a particular need for gray scale or color, I switch to that on a case by case basis.

Here are some settings I always like to use:

OCR – English – As described above, extracts the text from the scan.
Searchable PDF Output – As described above, provides a PDF with both the image and the text of the scan.
Auto Straighten – Automatically corrects the vertical alignment of the scan.
Auto Orient – Automatically orients the page right side up for reading text. This occasionally fails and I have to load the PDF in their viewer and reorient it.

Here are the scan settings I have created for different documents, based on the pre existing document types in the software:

Black and White Document, OCR on, scanner set to text, black and white, 600 DPI, searchable PDF output
Gray Scale Document, OCR on, scanner set to text, true gray scale, 600 DPI, searchable PDF output
Low Color Document, OCR on, scanner set to text, 256 color (8 bit), 300 DPI, searchable PDF output
Color Document, OCR on, scanner set to text, 24 bit true color, 600 DPI, searchable PDF output
Color Photograph, OCR off, scanner set to photo, 24 bit true color, 600 DPI, JPG output

For each document, I have the system set to auto generate a file name in the format of 2015-01-09-BW-OCR-9999, where the date automatically updates, the middle changes depending on the type of scan I’m doing, and the number at the end counts upwards for each scan I do during the day. In the case of this particular software, the counter only looks at files that are already in the folder being scanned to. Thus, if I scan 5 things, 0001-0005, then move them out of the folder and scan again, it will count 0001-0005 again. If I then move the new files to the same destination folder, I have to rename them, which the software allows me to do. I would rather it number the subsequent scans 0006-0010, but it doesn’t.
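The numbering I would prefer can be sketched in a few lines of Python. To be clear, this is not how PaperPort works and it is not a PaperPort feature. It is a hypothetical helper (`next_scan_name` is a made-up name) that looks in several folders, not just the scan folder, before picking the next number. It assumes the files end in `.pdf` and follow the date-type-number pattern described above.

```python
from datetime import date
from pathlib import Path
import tempfile


def next_scan_name(tag, folders, today=None):
    """Return the next free name, e.g. 2015-01-09-BW-OCR-0006.

    Looks for existing PDFs with the same date and tag in every folder
    given, so files already moved to a destination folder still count.
    """
    today = today or date.today()
    prefix = f"{today.isoformat()}-{tag}-"
    highest = 0
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue  # skip folders that do not exist yet
        for path in folder.glob(f"{prefix}*.pdf"):
            counter = path.stem[len(prefix):]
            if counter.isdigit():
                highest = max(highest, int(counter))
    return f"{prefix}{highest + 1:04d}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scan_dir = Path(tmp) / "Uncategorized Scans"
        dest_dir = Path(tmp) / "Receipts"
        scan_dir.mkdir()
        dest_dir.mkdir()
        # Pretend five scans from today were already moved to Receipts.
        for n in range(1, 6):
            (dest_dir / f"2015-01-09-BW-OCR-{n:04d}.pdf").touch()
        print(next_scan_name("BW-OCR", [scan_dir, dest_dir], date(2015, 1, 9)))
```

Checking only the scan folder reproduces the behavior I described (numbering starts again at 0001), while passing both folders continues at 0006.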

Work Flow for Handling Various Paper Items

Receipts

Receipts are an obvious candidate for scanning. Unfortunately, they are one of the hardest things to deal with. They’re usually printed on flimsy, oddball size thermal paper which fades over time. This makes them hard to file, hard to keep, and hard to scan.

I keep all my receipts for some period of time. For non critical receipts like food and miscellaneous purchases, I just throw them in a box in case the bank makes any errors with the transaction. After a year, I throw them out and start the box over. I do not try to scan these. Note that, while uncommon, some receipts have your credit card number on them. These should be shredded.

I do intend to scan most or all of my important receipts. This would include things like the categories I mentioned above and include anything that has a warranty associated with it, or anything which I might need to return more than a week later.

When I first get important receipts, if I’m not scanning them immediately, I put them in an Unscanned Receipts physical file. After I scan them, I put them in a Scanned Receipts physical file, but don’t attempt to further sort the paper versions.

For these receipts, I first physically group them by the receipt categories I mentioned earlier and then subgroup them by increasing date of transaction. I go through and write the category and the transaction date on each, since I may be scanning them on a different date from when the transactions occurred. I then put them one by one on the glass plate of the copier and make a copy. I don’t try to feed them through the document feeder, as that would probably turn out badly.

Copying the receipts does two useful things. First, since the printer is a laser printer, it turns the receipt from a fadable thermal item into a relatively permanent item. Second, it turns a flimsy oddball size piece of paper into a standard letter size piece of paper. These pieces of paper WILL be able to go through the document feeder.

I start the PaperPort software and go to my Uncategorized Receipts folder, unless I know all the receipts are going to the same folder, in which case I go to that one. I put the stack of copies in the document feeder, (usually) set the scanner for black and white OCR, and scan the whole stack. I have the software set to make a separate searchable PDF file for each page. So, I get a number of PDF files showing up in the PaperPort software.

At this point, I can double click on each PDF, see what the receipt is for, and then drag the file to the appropriate folder for that receipt. I can also optionally rename the PDF file so the name shows what the receipt is for. In some cases, particularly for more expensive electronics, I put the PDF in a sub folder to show how long the warranty is.

As I mentioned earlier, this is time consuming, and I only do this for important receipts. However, once this is done, I have a much greater chance of finding the receipts again if I need them. Note that the copier, not just the scanner, was an integral part of this workflow.

After I’m done scanning the copies of the receipts, I put the copies in an Important Receipts Copies physical file.

Print to PDF

The utility of this feature of the software is not immediately obvious. If you have something already on your PC screen, why would you want to print it to a PDF and not just print paper or save it as some other format? Well, I’m trying to avoid paper. And, I’ve noticed many of the papers I have around the house came from the printer. So, they were once on the computer.

Obviously, if you have something like a Word document, that is easy enough to save as a file. And, the document management software can manage those. But, for some other types of things I routinely used to print, it’s not so easy.

One example is web pages. Yes, you can just save a bookmark. But, the page may go away or change later. Yes, you can do a File, Save As on the web page. But, this normally generates an HTML file plus a folder containing all the little images and such associated with the page. After doing a number of them, this gets messy.

Printing to PDF is an attractive alternative. You get a PDF that looks exactly like the printout would if you printed to paper. You can then view this, email it, or even print it if you had to. I used this technique recently to save a digital “print” of a web page about adding partitions to a hard disk. I stuck that in my computer instructions folder for future reference.

Another thing that I routinely used to print was screenshots of various menu screens in various programs that I have to configure such as Firefox. Now while I may still print the images and put them in a binder, I am now going to start printing to PDF and saving the digital files also.

Scan and Toss

Scan and toss is a term I use to refer to papers that are important enough that I don’t want to throw away the information, but not important enough that I want to keep the paper.

This may include SOME things I get in the mail, maybe info about a subject I’m interested in but have no time or money to pursue. It may also include SOME printouts I’ve made in the past and put in piles related to some topic I was researching. Or perhaps it’s printouts of screenshots I’ve saved for some reason.

In any case, I simply set the software to scan to my Uncategorized Scans folder, scan the page, then toss the paper in my recycle bin. If the information is confidential, I shred it. Within a week, the paper is gone, but its essence remains. If I choose to, I can rename the file with a more relevant name and put it into some relevant category.

Scan and Keep

These are papers I would want to keep after scanning. Bills and medical records might fall into this category. Now, I might not scan these at all, but, if I did, I would still keep and file the paper. After scanning these, I would take the time to rename the digital file with a relevant name and move it to a relevant category folder on the PC. I would then file the paper somehow.

Hand Written Notes

This might include notes from club meetings or notes I’ve written about some topic I’m researching. Now, while the OCR probably won’t work, especially with MY handwriting, it’s still valuable to have the notes stored on the PC so I can get to them. These could be handled as scan and toss or scan and keep documents.

Old Papers

One of the motives for going through these exercises at all is I have a LARGE quantity of old papers around the house that I’ve accumulated over the years. Some are from outside sources, such as the mailbox, newspapers, flyers, etc. Many are from my own printer. So, what I intend to do is to periodically go through a stack of them and determine if they fall into a just toss it category, a scan and toss category, or a scan and keep category. I’ll try to minimize the latter, since, if I had a good place to file them, they wouldn’t be in piles in the first place.

Once I have several that need scanning, I’ll divide them into 35 page stacks, which is the capacity of the ADF, and scan them en masse. After that, I’ll toss the tossable ones and file the keepable ones … somewhere … somehow. For papers I need to scan which won’t go through the ADF, I’ll copy them first and then scan them or I will scan them directly from the glass plate of the copier.
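If you want to know ahead of time how many trips to the feeder a pile will take, the arithmetic is simple enough to sketch in Python. The function name `adf_batches` is made up for this example, and the 35 page capacity is specific to my printer.

```python
ADF_CAPACITY = 35  # pages my document feeder holds; yours may differ


def adf_batches(page_count, capacity=ADF_CAPACITY):
    """Split a pile of pages into feeder-sized stacks.

    Returns a list with the number of pages in each stack.
    """
    if page_count < 0 or capacity <= 0:
        raise ValueError("page_count must be >= 0 and capacity must be > 0")
    full_stacks, leftover = divmod(page_count, capacity)
    return [capacity] * full_stacks + ([leftover] if leftover else [])


if __name__ == "__main__":
    stacks = adf_batches(100)
    print(f"100 pages -> {len(stacks)} stacks: {stacks}")
```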

Using these techniques, I hope to at least start making a dent in my numerous piles of papers and hope to avoid printing others when I can. That said, I still don’t like reading long documents on a screen.

Hopefully, these techniques will help you start going paperless too.

Ron

Rons Tech Rant

Various musings about tech and security. I don't ALWAYS rant. ronstechrant AT techstarship DOT com