OCR for Invoices

Have you considered including optical character recognition (OCR) to automate reading and filling in purchase invoices, say from PDFs, received either by email or by cut and paste?

This is too difficult without some invoice standard being widely used. For example, I can imagine that in the future all invoices could contain a QR code with the relevant data, which accounting systems could recognize. But it will take time before the world gets there…

I think you just need to tell the computer where to look. Each company will have its own invoice “standard”, especially if its invoices come from a known email address (which serves as an identifier).

First recognise the logo, then show the program where to read the company name (by placing a text box over it), then what each of the columns contains (again with a text box).
It’s a matter of looping through text boxes: the bookkeeper places each one the first time, for each field, and then just checks that they all still line up on next month’s invoice from that company.
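To make the idea concrete, here is a minimal sketch of what such a per-supplier template might look like. Everything in it is an assumption for illustration: the `Box`, `Word` and `TEMPLATES` names, the coordinates, and the sender address are hypothetical, and the positioned words are assumed to come from some earlier OCR or PDF text extraction step that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle the bookkeeper drew over one field (page coordinates)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognised word and the position of its centre on the page."""
    text: str
    x: float
    y: float


# Hypothetical templates, keyed by the sender's email address.
TEMPLATES = {
    "billing@acme.example": {
        "supplier": Box(50, 40, 300, 70),
        "invoice_number": Box(400, 40, 560, 70),
        "total": Box(400, 700, 560, 730),
    },
}


def read_invoice(sender: str, words: list[Word]) -> dict[str, str]:
    """Collect the words that fall inside each box of the sender's template."""
    template = TEMPLATES.get(sender)
    if template is None:
        raise KeyError(f"no template set up for {sender}")
    fields = {}
    for name, box in template.items():
        inside = [w for w in words if box.contains(w.x, w.y)]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
        fields[name] = " ".join(w.text for w in inside)
    return fields
```

The sketch only works as long as the supplier keeps the same layout. If a field moves outside its box, the value silently comes back empty or wrong, so the boxes would need to be checked and redrawn by hand.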

Is that still too hard?

That’s exactly the problem. Each company will have its own invoice “standard”, which means that to make this work you are proposing a convoluted setup for each supplier, which also needs to be maintained. You will spend more time maintaining such a system than just manually entering that one invoice per month. That’s why nobody is doing it that way and most likely never will.

If there are many invoices, businesses will usually agree on some electronic standard, so they send each other invoices which can be read by machine without any upfront setup. Small businesses rarely deal with this issue though, so it’s not really such a big deal.

If Manager were to support this, it wouldn’t be OCR. It would be something more reliable, such as QR codes that embed the relevant data, or similar.

MYOB is doing it, so it will become the standard by default.
Just like a spellchecker: look for recognisable words, item numbers, etc.

They only released it 3 months ago, and the feedback from users on their forum suggests it can recognize very little (if anything at all).

I’m not blaming them. It’s a difficult problem to solve. That’s why you have companies like http://www.receipt-bank.com who use actual humans to transcribe scanned receipts and invoices. If it were that easy for machines to read scanned invoices properly, nobody would pay for services like Receipt Bank.

Well, at least for electronic PDFs (no OCR needed), it only takes a Google search to find tons of free tools that convert PDFs to Word or Excel. You set up an email address such as invoices@myco.com and they could all be opened in Excel.
That’s perfectly readable, and PDF invoices are the standard.

It’s not a problem for a tool to extract text from PDF.

The problem for any tool is making sense of it. You can “train” the system to recognize some patterns (as MYOB does), and maybe in the future it could recognize all patterns, but this would involve a major effort to build such a neural network, which is beyond the scope of Manager. And honestly, I don’t even believe it’s the future.
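As a rough illustration of why making sense of extracted text is the hard part, here is a small sketch that tries to pull an invoice number and a total out of plain text with hand-written patterns. The patterns and the `parse_invoice_text` name are assumptions made up for this example, not how MYOB or any other product works.

```python
import re
from decimal import Decimal

# Hypothetical patterns that fit one common English invoice layout.
INVOICE_NO = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
)
TOTAL = re.compile(r"\btotal\b[^\d\n]*([\d,]+\.\d{2})", re.IGNORECASE)


def parse_invoice_text(text: str) -> dict:
    """Return the fields the patterns can find; None for anything missed."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": Decimal(total.group(1).replace(",", "")) if total else None,
    }


# A layout the patterns were written for.
familiar = "ACME LTD\nInvoice No: INV-1001\nWidgets 2 x 50.00\nTotal due: 1,234.50\n"

# The same information worded and formatted differently.
unfamiliar = "Beta GmbH\nRef 1001\nAmount payable 1.234,50 EUR\n"
```

Here `parse_invoice_text(familiar)` finds both fields, while `parse_invoice_text(unfamiliar)` finds neither, even though a human reads both invoices without effort. Every new wording, language, or number format needs another pattern.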

The future is that invoices will contain metadata embedded in QR codes or similar. The result will be that machines will be able to read and understand the content of scanned invoices without error and without depending on some complicated neural network which would never deliver 100% accuracy anyway.
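A minimal sketch of what such embedded metadata could look like, assuming a made-up JSON payload. The field names and the `v` version marker are hypothetical and do not follow any existing invoice standard. Rendering the text as an actual QR image and scanning it back would need a separate library and is left out here.

```python
import json
from decimal import Decimal


def encode_invoice_payload(supplier, invoice_number, issue_date, total, currency):
    """Build the text that the issuer would place in the QR code."""
    payload = {
        "v": 1,  # hypothetical format version
        "supplier": supplier,
        "number": invoice_number,
        "date": issue_date,    # ISO 8601 date string
        "total": str(total),   # kept as a string to avoid float rounding
        "currency": currency,
    }
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)


def decode_invoice_payload(text):
    """Parse the scanned text back into invoice fields on the receiving side."""
    data = json.loads(text)
    if data.get("v") != 1:
        raise ValueError("unsupported payload version")
    data["total"] = Decimal(data["total"])
    return data


qr_text = encode_invoice_payload(
    "Acme Ltd", "INV-1001", "2015-06-30", Decimal("1234.50"), "USD"
)
invoice = decode_invoice_payload(qr_text)
```

Because the issuer writes the data in a structured form, the receiver does not have to guess where the total is or how it is formatted. The catch is that both sides must agree on the format first.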
